The following analysis was performed to address the question "What can the data offer in the way of guidance for a tornado forecaster?" More specifically, an attempt will be made to identify the particular variables that appear to be the best predictors of tornados.
Three data sets will be examined: the circulations detected by the Mesocyclone Detection Algorithm (MDA), those detected by the Tornado Detection Algorithm (TDA), and those detected by MDA and TDA jointly. The sample sizes are 59691, 5860, and 891, and the numbers of nontornadic (N_0) and tornadic (N_1) circulations in each data set are (N_0=58786, N_1=905), (N_0=5348, N_1=512), and (N_0=591, N_1=300), respectively. These constitute 29 days of data. For further details regarding the data, Gregory Stumpf and DeWayne Mitchell of the National Severe Storms Laboratory should be consulted.
It is well known, though often overlooked, that the question of which predictors are best is most reliably addressed in a univariate fashion, i.e., one variable at a time. There are many reasons for this, and a linear regression model offers one illustration. When the independent variables are collinear, the regression slopes in a multivariate analysis are meaningless as measures of predictive strength, because the fit can shift weight between correlated variables with little change in the overall error. This ambiguity is made worse in nonlinear models, partly due to the existence of local minima in the error function. Finally, the presence of interaction terms in a nonlinear regression model renders the question of the best predictors entirely meaningless. In short, a univariate analysis is often the most reliable method of assessing the predictive strength of a set of independent variables.
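To make the collinearity argument concrete, here is a minimal Python sketch on synthetic data (not the tornado data). A predictor x2 is constructed as a nearly exact copy of x1. Both have essentially the same univariate correlation with the response. The variance inflation factor (VIF), which is 1/(1 - r^2) for two predictors, shows that the variance of each individual slope is inflated roughly ten-thousand-fold. As a result, the way weight is split between x1 and x2 says little about predictive strength, and only their sum is well determined. The sample size, noise levels, and function names are illustrative assumptions.

```python
import numpy as np

def ols_slopes(X, y):
    """Least-squares slopes of y on the columns of X (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

def vif_two(x_a, x_b):
    """Variance inflation factor 1 / (1 - r^2) when there are only two predictors."""
    r = np.corrcoef(x_a, x_b)[0, 1]
    return 1.0 / (1.0 - r ** 2)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly exact copy of x1 (synthetic)
y = x1 + 0.5 * rng.normal(size=n)     # synthetic response driven by x1

# Univariate view: both predictors look equally strong.
r_x1 = np.corrcoef(x1, y)[0, 1]
r_x2 = np.corrcoef(x2, y)[0, 1]

# Multivariate view: individual slopes are poorly determined; only their sum is stable.
slopes = ols_slopes(np.column_stack([x1, x2]), y)
print(f"r(x1, y) = {r_x1:.3f}, r(x2, y) = {r_x2:.3f}")
print(f"slopes = {slopes}, sum = {slopes.sum():.3f}, VIF = {vif_two(x1, x2):.0f}")
```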
In this report, several univariate approaches will be employed to address the question of the best predictors of tornados.
Method:
A first approach is to examine the posterior probability of a tornado event, given the value of an independent variable, P_1(x). This probability can be calculated from the conditional frequency distribution, N_i(x), at a given value of x, where i=0,1 refers to nontornados and tornados, respectively. Specifically, Bayes' theorem implies

$$P_1(x) = \frac{N_1(x)}{N_1(x) + N_0(x)}$$
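The step from Bayes' theorem to a ratio of counts can be spelled out as follows. The derivation assumes that the class-conditional densities are estimated by histograms with a common bin width Δx and that the prior probabilities are estimated by the overall class frequencies N_i/N, where N = N_0 + N_1. The factor 1/(N Δx) then cancels between numerator and denominator.

```latex
P_1(x) = \frac{p(x \mid 1)\,P(1)}{p(x \mid 0)\,P(0) + p(x \mid 1)\,P(1)},
\qquad
p(x \mid i) \approx \frac{N_i(x)}{N_i\,\Delta x},
\qquad
P(i) = \frac{N_i}{N}

\Rightarrow\quad
p(x \mid i)\,P(i) \approx \frac{N_i(x)}{N\,\Delta x}
\quad\Rightarrow\quad
P_1(x) = \frac{N_1(x)}{N_1(x) + N_0(x)}
```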
A "good" predictor is one whose P_1(x) changes rapidly as a function of x. An example of a "good" predictor and a "bad" predictor is given in Figure 1. The solid curve is P_1(x) as a function of x, and the dotted curve is N_1(x) as a function of x. The latter quantity can be interpreted as placing a measure of confidence on the former. For instance, a P_1(x) that is accompanied by a relatively large N_1(x) is statistically more significant than one that has a small N_1(x). One may rank-order the independent variables according to the change in P_1(x) over the range of x.


Figure 1: Examples of a "good" predictor and a "bad" predictor (solid curves). Note the change of 25% along the y-axis of the left graph, in contrast to only 2% in the right graph. Dotted curves are the number of tornados at the corresponding value of x.
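The following is a minimal Python sketch of the posterior-probability method. It computes P_1(x) = N_1(x)/(N_1(x) + N_0(x)) in equal-width bins and scores each variable by the spread (maximum minus minimum) of P_1(x). Only bins that hold at least a minimum number of circulations are included. The bin choice, the min_count cutoff, and the function names are hypothetical; the report itself judges confidence visually from the N_1(x) curve. The demonstration uses synthetic data in which only x_good carries information about the outcome.

```python
import numpy as np

def posterior_curve(x, y, bins):
    """Return N_0(x), N_1(x) per bin and P_1(x) = N_1 / (N_1 + N_0)."""
    n1, _ = np.histogram(x[y == 1], bins=bins)
    n0, _ = np.histogram(x[y == 0], bins=bins)
    total = n0 + n1
    p1 = np.full(total.shape, np.nan)      # NaN marks empty bins
    filled = total > 0
    p1[filled] = n1[filled] / total[filled]
    return n0, n1, p1

def posterior_spread(x, y, bins, min_count=50):
    """Change in P_1(x) across bins holding at least min_count circulations."""
    n0, n1, p1 = posterior_curve(x, y, bins)
    keep = (n0 + n1) >= min_count           # hypothetical confidence cutoff
    if keep.sum() < 2:
        return 0.0
    return float(p1[keep].max() - p1[keep].min())

# Synthetic demonstration: tornado probability rises with x_good only.
rng = np.random.default_rng(1)
n = 20_000
x_good = rng.uniform(0, 1, n)
y = (rng.uniform(0, 1, n) < 0.02 + 0.3 * x_good).astype(int)
x_bad = rng.uniform(0, 1, n)                # unrelated to y
bins = np.linspace(0, 1, 11)

scores = {"good": posterior_spread(x_good, y, bins),
          "bad": posterior_spread(x_bad, y, bins)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```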
Another method for rank-ordering the variables is according to their correlation with the dependent variable (i.e., tornado ground truth), specifically, Pearson's correlation coefficient, r. When both the independent and the dependent variables are continuous, r is a measure of the linear correlation between the two. In the current case the dependent variable is binary (tornado/no-tornado). Here r still offers a measure of correlation, since it reduces to the point-biserial correlation coefficient, although "association" may be a better description.
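For a binary outcome y (1 = tornado, 0 = no tornado), Pearson's r reduces to the point-biserial form r = (M_1 - M_0) sqrt(p q) / s. Here M_i is the mean of x in class i, p and q are the fractions of tornadic and nontornadic cases, and s is the population standard deviation of x. The Python sketch below checks the two forms against each other on synthetic data and then ranks variables by |r|. Ranking by magnitude, the synthetic variables, and the function names are assumptions for illustration.

```python
import numpy as np

def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def point_biserial(x, y):
    p = y.mean()                               # fraction of tornadic cases
    m1, m0 = x[y == 1].mean(), x[y == 0].mean()
    # x.std() uses ddof=0 (population standard deviation), as the identity requires.
    return float((m1 - m0) * np.sqrt(p * (1 - p)) / x.std())

rng = np.random.default_rng(3)
y = (rng.uniform(size=5000) < 0.1).astype(int)
variables = {
    "shifted": rng.normal(size=5000) + 1.5 * y,   # informative (synthetic)
    "noise": rng.normal(size=5000),               # uninformative (synthetic)
}
r = {name: pearson_r(x, y) for name, x in variables.items()}
ranking = sorted(r, key=lambda k: abs(r[k]), reverse=True)
print(r, ranking)
```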
An alternative approach is offered by considering the way in which a forecaster uses the variables. He or she may wish to issue forecasts that maximize some dichotomous measure of performance. As such, an important quantity is the value of the decision threshold that reduces the continuous variable to a binary one (warning/no-warning). Consequently, the question of the "best" predictor becomes one of rank-ordering the variables according to the maximum value of some measure of performance that can be obtained from each variable. Three measures will be examined here: the Critical Success Index (CSI), the Heidke Skill Statistic (HSS), and entropy (ENT). Because each measure captures a different aspect of performance quality, the choice of the "best" predictors will depend on the choice of measure. The definitions of these measures are available upon request.
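The following is a minimal Python sketch of the measure-based approach. It assumes the warning rule "warn when x >= t"; for a variable negatively associated with tornados, the inequality would be reversed. It uses the standard 2x2 contingency-table definitions CSI = a/(a+b+c) and HSS = 2(ad - bc)/[(a+c)(c+d) + (a+b)(b+d)], where a counts hits, b false alarms, c misses, and d correct nulls. The report's own definitions are available only upon request and may differ. ENT is omitted because its definition is not given here. Every distinct value of x is tried as a candidate threshold, and the function names are hypothetical.

```python
import numpy as np

def contingency(x, y, t):
    """2x2 counts for the rule 'warn when x >= t' (an assumed direction)."""
    warn = x >= t
    a = int(np.sum(warn & (y == 1)))    # hits
    b = int(np.sum(warn & (y == 0)))    # false alarms
    c = int(np.sum(~warn & (y == 1)))   # misses
    d = int(np.sum(~warn & (y == 0)))   # correct nulls
    return a, b, c, d

def csi(a, b, c, d):
    return a / (a + b + c) if (a + b + c) else 0.0

def hss(a, b, c, d):
    den = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / den if den else 0.0

def best_threshold(x, y, measure):
    """Try every distinct value of x as a threshold; keep the first best score."""
    best_t, best_s = None, -np.inf
    for t in np.unique(x):
        s = measure(*contingency(x, y, t))
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y, csi))   # threshold 4, CSI 1.0
print(best_threshold(x, y, hss))   # threshold 4, HSS 1.0
```

In this toy example every case with x >= 4 is tornadic, so both measures reach their maximum of 1.0 at t = 4.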
Additionally, it is important to identify the variables, good or bad, that are correlated with one another; in this way one can further reduce the number of variables that must be examined. Pearson's correlation coefficient, r, can again be utilized to this end. However, the rare-event nature of the data sets under study can cause r, computed over all circulations pooled together, to be excessively large. For this reason the correlation coefficients must be computed separately for the two classes (nontornados and tornados). Variables that are highly correlated within both classes may be considered statistically equivalent.
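The following is a minimal Python sketch of the class-conditional screening, using the 0.8 cutoff applied in the Results section. The synthetic demonstration also illustrates one way pooling can inflate r. Variables a and b are independent within each class, but both are shifted upward in the rarer tornadic class, so their pooled correlation is large. By contrast, c is genuinely tied to a within each class. The data, the 10% class fraction, and the function names are assumptions for illustration. The mechanism is offered as a plausible example rather than as the report's stated explanation.

```python
import numpy as np
from itertools import combinations

def class_correlations(X, y):
    """Correlation matrices computed separately for nontornadic (0) and tornadic (1) cases."""
    r0 = np.corrcoef(X[y == 0], rowvar=False)
    r1 = np.corrcoef(X[y == 1], rowvar=False)
    return r0, r1

def equivalent_pairs(X, y, names, thr=0.8):
    """Variable pairs whose correlation is >= thr in both classes."""
    r0, r1 = class_correlations(X, y)
    return [(names[i], names[j], r0[i, j], r1[i, j])
            for i, j in combinations(range(X.shape[1]), 2)
            if r0[i, j] >= thr and r1[i, j] >= thr]

rng = np.random.default_rng(2)
n0, n1 = 1800, 200
y = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
shift = 4.0 * y                                   # class 1 sits 4 units higher
a = rng.normal(size=n0 + n1) + shift
b = rng.normal(size=n0 + n1) + shift              # independent of a within each class
c = a + 0.3 * rng.normal(size=n0 + n1)            # genuinely tied to a
X = np.column_stack([a, b, c])

pooled = np.corrcoef(a, b)[0, 1]                  # inflated by the class shift
r0, r1 = class_correlations(X, y)
pairs = equivalent_pairs(X, y, ["a", "b", "c"])
print(f"pooled r(a,b) = {pooled:.2f}, r0 = {r0[0, 1]:.2f}, r1 = {r1[0, 1]:.2f}")
print(pairs)
```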
Finally, instead of asking "What are the best predictors?", one may ask "Which variables capture most of the variance in the data?" This is the topic of principal components analysis. The two questions are related if "economy", i.e., finding the smallest set of "useful" variables, is of concern. However, as will be shown, the nature of the data does not allow for a drastic reduction in the number of variables. Furthermore, the variables that account for most of the variance in the data are not necessarily the best predictors of tornados. For these reasons the second question will not be addressed in detail.
Below is a list of all the variables examined. The number preceding each variable name is the label by which that variable is referred to in the remainder of this report (e.g., x12 denotes strength rank). The variables prefixed "TVS" are those of the Tornado Detection Algorithm (TDA); the remaining variables are computed by the Mesocyclone Detection Algorithm (MDA). Other abbreviations are self-evident.
| Label | Variable | Label | Variable |
| 8 | range | 55 | TREND base |
| 10 | base | 56 | TREND depth |
| 11 | depth | 57 | TREND strength rank |
| 12 | strength rank | 58 | TREND low-level diameter |
| 13 | low-level diameter | 59 | TREND maximum diameter |
| 14 | maximum diameter | 60 | TREND height of max diam |
| 15 | height of maximum diameter | 61 | TREND low-level rot velocity |
| 16 | low-level rotational velocity | 62 | TREND maximum rot velocity |
| 17 | maximum rotational velocity | 63 | TREND height of max rot vel |
| 18 | height of max rot velocity | 64 | TREND low-level shear |
| 19 | low-level shear | 65 | TREND maximum shear |
| 20 | maximum shear | 66 | TREND height of max shear |
| 21 | height of maximum shear | 67 | TREND low-level g-t-g del v |
| 22 | low-level gate-to-gate del v | 68 | TREND maximum g-t-g del v |
| 23 | max g-t-g del v | 69 | TREND hght max g-t-g del v |
| 24 | height of max g-t-g del v | 70 | TREND MSI weighted |
| 25 | core base | 71 | TREND MSIr "rank" |
| 26 | core depth | 72 | TREND relative depth |
| 27 | age | 73 | TREND low-level convergence |
| 28 | MSI weighted by ... | 74 | TREND mid-level convergence |
| 29 | strength index (MSIr) "rank" | 75 | TREND Vrtclly-int rot vel |
| 30 | relative depth | 76 | TREND Vrtclly-int Shear |
| 31 | low-level convergence | 77 | TREND Vrtclly-int g-t-g del v |
| 32 | mid-level convergence | 78 | TREND Vrtclly-int Rssmssn con |
| 33 | TVS base | 79 | TREND Vrtclly-int Rsmssn rot |
| 34 | TVS depth | 80 | TVS TSI |
| 35 | TVS low-lvl gtg del v | 81 | TVS CAPE |
| 36 | TVS max gtg del v | 82 | TVS SREH |
| 37 | TVS ht max gtg del v | 83 | TVS TREND base |
| 38 | TVS low-lvl shear | 84 | TVS TREND depth |
| 39 | TVS max shear | 85 | TVS TREND low-lvl gtg del v |
| 40 | TVS h max shear | 86 | TVS TREND max gtg del v |
| 48 | CAPE | 87 | TVS TREND h max gtg del v |
| 49 | SREH | 88 | TVS TREND low-lvl shear |
| 50 | Vertically-integrated rot v | 89 | TVS TREND max shear |
| 51 | Vertically-integrated Shear | 90 | TVS TREND h of max shear |
| 52 | Vertically-integrated g-t-g del v | 91 | TVS TREND TSI |
| 53 | Vrtclly-int Rasmussen convergence | 93 | TVS Range |
| 54 | Vrtclly-intgrtd Rsmssn rotation | | |
Results:
Even without any analysis, the numbers of tornadic and nontornadic circulations in the data sets imply that the a priori probability that an MDA-detected circulation is tornadic is about 2%. Similarly, a TDA-detected circulation has an a priori probability of about 9% of being tornadic, while a joint detection by MDA and TDA implies a 34% probability of a tornadic circulation. The drastic increase from single-digit probabilities for MDA and TDA individually to the double-digit probability of a joint detection is probably one of the best reasons for utilizing MDA and TDA *jointly*.
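The a priori probabilities follow directly from the class counts given at the beginning of this report, via N_1/(N_0 + N_1). A short Python check, with an illustrative dictionary layout, shows that the MDA value is about 1.5%, which the text rounds to about 2%.

```python
# (N_0, N_1) = (nontornadic, tornadic) counts quoted at the start of the report
counts = {"MDA": (58786, 905), "TDA": (5348, 512), "MDA/TDA": (591, 300)}
priors = {name: n1 / (n0 + n1) for name, (n0, n1) in counts.items()}
for name, p in priors.items():
    print(f"{name}: {p:.1%}")   # MDA: 1.5%, TDA: 8.7%, MDA/TDA: 33.7%
```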
The posterior probabilities for all the variables are enclosed as Appendix I for MDA, Appendix II for TDA, and Appendix III for joint MDA/TDA data. These figures can be consulted to view the behavior of any variable of interest. The summary of the "best" predictors according to this method is in the Summary section.
The correlation coefficients, r, between ground truth and each of the variables in the MDA, TDA, and joint MDA/TDA data sets are displayed in Figure 2. The "best" predictors are marked on each plot.

Figure 2: The correlation coefficients between the dependent variable (ground truth) and each of the independent variables for MDA (top), TDA (middle), and MDA/TDA jointly (bottom).
As for the measure-based method, Figures 3-5 show the highest values of the three measures (y-axis) as obtained by dichotomizing each variable (x-axis). The outstanding predictors according to this method are labeled on each graph.

Figure 3: The maximum value of three measures obtained by dichotomizing the MDA variables.

Figure 4: The maximum value of three measures obtained by dichotomizing the TDA variables.

Figure 5: The maximum value of three measures obtained by dichotomizing the MDA/TDA variables.
Another important quantity in this method is the value of the decision threshold (warning/no-warning) that yields the highest performance; those quantities are presented in Table 1 for some of the best predictors. The values of the decision thresholds for the remaining variables are available upon request.
Table 1: The values of decision threshold, required to yield the maximum of the corresponding performance measure, for some of the best predictors.
| Data set | Predictor | Threshold for CSI | Predictor | Threshold for HSS | Predictor | Threshold for ENT |
| MDA | x12 | 5.0 | x12 | 5.0 | x12 | 1.0 |
| | x27 | 35.0 | x27 | 35.0 | x27 | 4.0 |
| | x11 | 9961.0 | x11 | 10411.0 | x26 | 3001.0 |
| TDA | x35 | 19.0 | x35 | 9.0 | x33 | 1817.0 |
| | x36 | 37.0 | x36 | 41.0 | x35 | 19.0 |
| | x80 | 1750.0 | x80 | 1994.0 | x81 | 1906.0 |
| MDA/TDA | x73 | -40.0 | x81 | 1906.0 | x81 | 1906.0 |
| | x31 | 10.0 | x33 | 1778.0 | x25 | 1813.0 |
| | x39 | 18.0 | x25 | 1813.0 | x39 | 18.0 |
Pairs of variables with high (>= 0.8) correlation coefficients for both tornadic and nontornadic circulations are given in Table 2. In the table, r0[x][y] denotes the correlation coefficient between variables x and y for nontornadic circulations, and r1 denotes the same quantity for tornadic circulations. The probability that these values of r could be obtained by chance was computed and found to be zero (to 12 decimal places). Therefore, to a high level of significance, these variables are statistically equivalent.
| Data set | Correlated pair | Correlated pair |
| MDA | r0[23][17]=0.806, r1=0.826 | r0[67][61]=0.842, r1=0.876 |
| | r0[25][10]=0.994, r1=0.841 | r0[68][62]=0.859, r1=0.857 |
| | r0[50][28]=0.957, r1=0.899 | r0[71][70]=0.825, r1=0.886 |
| | r0[52][28]=0.832, r1=0.863 | r0[77][75]=0.879, r1=0.875 |
| | r0[52][50]=0.864, r1=0.820 | |
| TDA | r0[39][38]=0.876, r1=0.874 | r0[89][88]=0.863, r1=0.812 |
| | r0[40][37]=0.980, r1=0.953 | r0[90][87]=0.986, r1=0.957 |
| | r0[86][85]=0.820, r1=0.842 | r0[91][86]=0.849, r1=0.865 |
| MDA/TDA | r0[39][38]=0.812, r1=0.887 | r0[54][51]=0.869, r1=0.924 |
| | r0[40][37]=0.980, r1=0.938 | r0[79][76]=0.890, r1=0.899 |
| | r0[50][17]=0.807, r1=0.830 | r0[86][85]=0.806, r1=0.824 |
| | r0[51][20]=0.851, r1=0.828 | r0[90][87]=0.982, r1=0.941 |
| | r0[52][23]=0.868, r1=0.865 | r0[91][86]=0.859, r1=0.852 |
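The report does not state which significance test was used. One standard choice, shown here as an assumption, is Fisher's z-transformation. Under the null hypothesis of zero correlation, and assuming approximate bivariate normality, atanh(r) sqrt(n - 3) is roughly standard normal. The Python sketch below uses only the standard library. As a conservative bound, it pairs the weakest correlation in Table 2 (0.806) with the smallest class size in the data sets (300 tornadic MDA/TDA circulations). The resulting p-value is far below 10^-12. The function name is hypothetical.

```python
import math

def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

print(corr_p_value(0.806, 300))   # conservative bound for the pairs in Table 2
```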



Figure 6: Scatter plots between variables with r_0>0.9 and r_1>0.9. The top figure is for MDA, the middle two are for TDA, and the last two are for MDA/TDA. The larger (smaller) circles represent the tornadic (nontornadic) circulations.
As mentioned previously, another way of reducing the number of variables is via Principal Components Analysis (PCA). By construction, the first Principal Component (PC) accounts for at least as much of the variance as the second PC, the second at least as much as the third, and so on. PCA is useful only if the first few PCs turn out to account for most (say 95%) of the total variance; otherwise it is wise to retain the entire set of variables. In the present case, as seen from Figure 7, almost all of the PCs are required to account for 95% of the variance in the MDA, TDA, and MDA/TDA data sets. Since there is no significant reduction in the number of variables, PCA will no longer be considered a viable option.


Figure 7: Cumulative variance as a function of the number of principal components, for MDA, TDA, and MDA/TDA.
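The following is a minimal Python sketch of the cumulative-variance calculation of the kind shown in Figure 7. It assumes that the variables are standardized before PCA; the report does not say whether the covariance or the correlation matrix was used. The function returns the number of PCs needed to reach a given fraction of the total variance. On synthetic data, ten independent variables need all ten PCs, whereas ten noisy copies of one underlying factor need only one. The function name and the synthetic data are illustrative.

```python
import numpy as np

def n_components_for(X, frac=0.95):
    """Number of PCs of the standardized data needed to reach `frac` of the variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)                      # assumed standardization
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]   # largest first
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, frac)) + 1, cum

rng = np.random.default_rng(4)
independent = rng.normal(size=(5000, 10))
factor = rng.normal(size=(5000, 1))
one_factor = factor + 0.1 * rng.normal(size=(5000, 10))   # ten noisy copies of one factor

k_indep, _ = n_components_for(independent)
k_factor, _ = n_components_for(one_factor)
print(k_indep, k_factor)
```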
Linear discriminant analysis was also explored for rank-ordering the variables but was abandoned because its assumptions of normality and homoscedasticity (equal covariance matrices for the two classes) are violated by the current data.
Summary:
With almost no analysis one can conclude that an MDA detection alone corresponds to a 2% probability of tornado. Although a TDA detection corresponds to a 9% probability of tornado, a joint MDA/TDA detection raises that probability to 34%. This is a good reason for utilizing MDA and TDA *jointly*. Another good reason for utilizing MDA in addition to TDA is that the top three predictors in the joint MDA/TDA data set, according to some of the methods employed herein, are all MDA variables.
The outstanding predictors for MDA, TDA, and MDA/TDA, according to the posterior probability method are: x30, x65, x64, x27, x11 (for MDA), x35, x80, x36 (for TDA), and x51, x22, x19, x20, x81 (for MDA/TDA). As for the remaining methods, the best predictors are marked in Figures 2-5. Note that the choice of the best predictor depends on the choice of the method. This is because each method captures a different aspect of "goodness."
Also note that a "good" predictor for MDA may be a "bad" predictor for TDA, or vice versa. On the other hand, some variables are "bad" for all three data sets. A few examples: x11 is good in MDA but bad in MDA/TDA; x81 is bad in MDA and TDA but good in MDA/TDA; meanwhile, x82 is bad in MDA, TDA, and MDA/TDA.
Appendix I:
The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of MDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the MDA for a given x. See the text for a list of the variables.
Available upon request (contact Caren Marzban).
Appendix II:
The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of TDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the TDA for a given x. See the text for a list of the variables.
Available upon request (contact Caren Marzban).
Appendix III:
The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of MDA/TDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the MDA and TDA, jointly, for a given x. See the text for a list of the variables.
Available upon request (contact Caren Marzban).
vrf 2/10/97