An Analysis of MDA and TDA Data
by
Caren Marzban

Preliminaries:

The following analysis was performed to address the question "What can the data offer in the way of guidance for a tornado forecaster?" More specifically, an attempt will be made to identify the particular variables that appear to be the best predictors of tornados.

Three data sets will be examined - the circulations detected by the MDA, the TDA, and those detected by MDA and TDA, jointly. The sample sizes are 59691, 5860, and 891, and the number of tornadic (N_1) and nontornadic (N_0) circulations in each data set is (N_0=58786, N_1=905), (N_0=5348, N_1=512), and (N_0=591, N_1=300), respectively. These constitute 29 days of data. For further details regarding the data Gregory Stumpf and DeWayne Mitchell of the National Severe Storms Laboratory should be consulted.

It is well known, though often overlooked, that the issue of the best predictors is best addressed in a univariate fashion, i.e. one variable at a time. There are many reasons for this, but an illustration is offered by considering a linear regression model. The regression slopes in a multivariate analysis are unreliable as measures of predictive strength if there exists any collinearity among the independent variables, because collinear variables can trade weight with one another without changing the fit. This ambiguity is made worse in the context of nonlinear models, partly due to the existence of local minima in the error function. Finally, the presence of interaction terms in a nonlinear regression model makes it impossible to attribute predictive strength to any single variable. In short, a univariate analysis is often the most reliable method of assessing the predictive strength of a set of independent variables.

In this report, several univariate approaches will be employed to address the question of the best predictors of tornados.

Method:

A first approach is to examine the posterior probability of a tornado event, given the value of an independent variable, P_1(x). This probability can be calculated from the conditional frequency distribution, N_i(x), at a given value of x, where i=0,1 refers to nontornados and tornados, respectively. Specifically, Bayes' theorem implies

P_1(x)= N_1(x) / ( N_1(x)+N_0(x) )

A "good" predictor is one whose P_1(x) changes rapidly as a function of x. An example of a "good" predictor and a "bad" predictor is given in Figure 1. The solid curve is P_1(x) as a function of x, and the dotted curve is N_1(x) as a function of x. The latter quantity can be interpreted as placing a measure of confidence on the former. For instance, a P_1(x) that is accompanied by a relatively large N_1(x) is statistically more significant than one that has a small N_1(x). One may rank-order the independent variables according to the change in P_1(x) over the range of x.

Figure 1: Examples of a "good" predictor and a "bad" predictor (solid curves). Note the change of 25% in y-axis of the left graph in contrast to only 2% in the right graph. Dotted curves are the number of tornados at the corresponding value of x.
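
The posterior-probability ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for this report: the bin width, the function names, the toy data, and the reading of "change in P_1(x) over the range of x" as the maximum minus the minimum of P_1 over the bins are all assumptions.

```python
from collections import Counter

def posterior_by_bin(values, labels, bin_width):
    """Return {bin_left_edge: (P_1, N_1, N_0)} for one predictor.

    values    : predictor values x, one per circulation
    labels    : 1 for tornadic, 0 for nontornadic
    bin_width : width used to discretize x (an assumption of this sketch)
    """
    n1, n0 = Counter(), Counter()
    for x, y in zip(values, labels):
        b = (x // bin_width) * bin_width  # left edge of the bin containing x
        if y == 1:
            n1[b] += 1
        else:
            n0[b] += 1
    table = {}
    for b in sorted(set(n1) | set(n0)):
        total = n1[b] + n0[b]
        # P_1(x) = N_1(x) / (N_1(x) + N_0(x))
        table[b] = (n1[b] / total, n1[b], n0[b])
    return table

def posterior_range(table):
    """Change in P_1 over the range of x, taken here as max minus min."""
    probs = [p for p, _, _ in table.values()]
    return max(probs) - min(probs)

# Toy data, invented for illustration only.
x = [1, 2, 3, 11, 12, 13, 21, 22, 23, 24]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
table = posterior_by_bin(x, y, bin_width=10)
for left_edge, (p1, n_torn, n_non) in table.items():
    print(left_edge, round(p1, 3), n_torn, n_non)
print("range of P_1:", posterior_range(table))
```

Keeping N_1 next to each P_1 mirrors the dotted curves in Figure 1: a bin with a large P_1 but very few tornadic cases deserves less confidence.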

Another method for rank-ordering the variables is according to their correlations with the dependent variable (i.e. tornado ground truth), specifically, Pearson's correlation coefficient, r. When both the independent and the dependent variable are continuous, r is a measure of linear correlation between the two. In the current case the dependent variable is binary (tornado/no-tornado); r still offers a measure of the relationship, although a better description may be association rather than correlation.
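
As a small illustration of the preceding paragraph, Pearson's r can be computed directly between a continuous predictor and a 0/1 ground-truth label. The function name and the toy data below are hypothetical and are not taken from the MDA or TDA data sets.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy predictor values and binary ground truth (1 = tornado, 0 = no tornado).
predictor = [1, 2, 3, 4]
truth = [0, 0, 1, 1]
r = pearson_r(predictor, truth)
print(round(r, 4))
```

Variables would then be rank-ordered by the magnitude of r, one variable at a time.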

An alternative approach is offered by considering the way in which a forecaster uses the variables. He/she may be interested in issuing forecasts that maximize some dichotomous measure of performance. As such, an important quantity is the value of the decision threshold that reduces the continuous variable to a binary one (warning/no-warning). Consequently, the question of the "best" predictor becomes one of rank-ordering the variables according to the maximum value of some measure of performance that can be obtained from each variable. Three measures will be examined here: the Critical Success Index (CSI), the Heidke Skill Statistic (HSS), and entropy (ENT). Because each measure captures a different aspect of performance quality, the choice of the "best" predictors will depend on the choice of the measure. The definitions of these measures are available upon request.
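
The threshold scan described above can be sketched as follows. Because the report does not state its definitions, this sketch assumes the commonly used 2x2 contingency-table forms of CSI and HSS, and it assumes a warning is issued when the variable is at or above the threshold (for some variables the opposite direction may be the appropriate one). ENT is omitted because its definition is not given here. All function names and the toy data are hypothetical.

```python
def contingency(values, labels, threshold):
    """Return (a, b, c, d) = (hits, false alarms, misses, correct nulls).

    Assumes a warning is issued when x >= threshold.
    """
    a = b = c = d = 0
    for x, y in zip(values, labels):
        warn = x >= threshold
        if warn and y == 1:
            a += 1
        elif warn:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def csi(a, b, c, d):
    """Critical Success Index, assumed form a / (a + b + c)."""
    denom = a + b + c
    return a / denom if denom else 0.0

def hss(a, b, c, d):
    """Heidke Skill Statistic, assumed standard 2x2 form."""
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2 * (a * d - b * c) / denom if denom else 0.0

def best_threshold(values, labels, score):
    """Scan every observed value as a threshold; return (best score, threshold)."""
    candidates = sorted(set(values))
    return max((score(*contingency(values, labels, t)), t) for t in candidates)

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 0, 1]
print("CSI:", best_threshold(x, y, csi))
print("HSS:", best_threshold(x, y, hss))
```

Ranking the variables then amounts to comparing the best score each one attains, while the accompanying threshold is the quantity a forecaster would actually apply.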

Additionally, it is important to identify the variables - good or bad - that are correlated with one another. In this way one can further reduce the number of variables that must be examined. Pearson's correlation coefficient, r, can again be utilized to this end. However, the rare-event nature of the data sets under study can cause r to be excessively large. For this reason the correlation coefficients must be computed for the two classes (nontornados and tornados), separately. The variables that are highly correlated for both classes may be considered statistically equivalent.
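
The reason for computing the correlations separately for each class can be seen in a toy example. In the sketch below (hypothetical names and invented numbers), two variables are uncorrelated within each class, yet the pooled coefficient is close to 1 simply because both variables take larger values for the tornadic cases.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def classwise_r(u, v, labels):
    """Return (r_0, r_1): correlation of u and v within each class."""
    result = []
    for cls in (0, 1):
        u_c = [a for a, y in zip(u, labels) if y == cls]
        v_c = [b for b, y in zip(v, labels) if y == cls]
        result.append(pearson_r(u_c, v_c))
    return tuple(result)

# Invented values: first four cases nontornadic, last four tornadic.
u = [0, 1, 0, 1, 10, 11, 10, 11]
v = [1, 0, 0, 1, 11, 10, 10, 11]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

r_0, r_1 = classwise_r(u, v, labels)
r_pooled = pearson_r(u, v)
print("r_0 =", r_0, "r_1 =", r_1, "pooled r =", round(r_pooled, 3))
```

Only pairs for which both r_0 and r_1 are high would be treated as statistically equivalent under the criterion stated above.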

Finally, instead of asking "What are the best predictors?", one may ask "Which are the variables that capture most of the variance in the data?" This is the topic of principal components analysis. The two questions are related if "economy", i.e. finding the smallest set of "useful" variables, is of concern. However, as will be shown, the nature of the data does not allow for a drastic reduction in the number of variables. Furthermore, the set of variables that accounts for most of the variance in the data is not necessarily the set of best predictors of tornados. For these reasons the second question will not be addressed in detail.

Below is a list of all the variables examined. The numbers to the left indicate the label of the variable as discussed in the remainder of this report. The variables referred to as "TVS" are those of the Tornado Detection Algorithm (TDA), and the remaining variables are computed by the Mesocyclone Detection Algorithm (MDA). Other abbreviations are self-evident.

8 range                                 55 TREND base
10 base                                 56 TREND depth
11 depth                                57 TREND strength rank
12 strength rank                        58 TREND low-level diameter
13 low-level diameter                   59 TREND maximum diameter
14 maximum diameter                     60 TREND height of max diam
15 height of maximum diameter           61 TREND low-level rot velocity
16 low-level rotational velocity        62 TREND maximum rot velocity
17 maximum rotational velocity          63 TREND height of max rot vel
18 height of max rot velocity           64 TREND low-level shear
19 low-level shear                      65 TREND maximum shear
20 maximum shear                        66 TREND height of max shear
21 height of maximum shear              67 TREND low-level g-t-g del v
22 low-level gate-to-gate del v         68 TREND maximum g-t-g del v
23 max g-t-g del v                      69 TREND hght max g-t-g del v
24 height of max g-t-g del v            70 TREND MSI weighted
25 core base                            71 TREND MSIr "rank"
26 core depth                           72 TREND relative depth
27 age                                  73 TREND low-level convergence
28 MSI weighted by ...                  74 TREND mid-level convergence
29 strength index (MSIr) "rank"         75 TREND Vrtclly-int rot vel
30 relative depth                       76 TREND Vrtclly-int Shear
31 low-level convergence                77 TREND Vrtclly-int g-t-g del v
32 mid-level convergence                78 TREND Vrtclly-int Rssmssn con
33 TVS base                             79 TREND Vrtclly-int Rsmssn rot
34 TVS depth                            80 TVS TSI
35 TVS low-lvl gtg del v                81 TVS CAPE
36 TVS max gtg del v                    82 TVS SREH
37 TVS ht max gtg del v                 83 TVS TREND base
38 TVS low-lvl shear                    84 TVS TREND depth
39 TVS max shear                        85 TVS TREND low-lvl gtg del v
40 TVS h max shear                      86 TVS TREND max gtg del v
48 CAPE                                 87 TVS TREND h max gtg del v
49 SREH                                 88 TVS TREND low-lvl shear
50 Vertically-integrated rot v          89 TVS TREND max shear
51 Vertically-integrated Shear          90 TVS TREND h of max shear
52 Vertically-integrated g-t-g del v    91 TVS TREND TSI
53 Vrtclly-int Rasmussen convergence    93 TVS Range
54 Vrtclly-intgrtd Rsmssn rotation

Results:

Even without any analysis, the numbers of tornadic and nontornadic circulations in the data sets imply that the a priori probability is about 2% that an MDA-detected circulation is tornadic. Similarly, a TDA-detected circulation has an a priori probability of about 9% of being tornadic, while a joint detection by MDA and TDA implies a 34% probability of a tornadic circulation. The drastic increase from single-digit probabilities for MDA and TDA individually to the double-digit probability of a joint detection is probably one of the best reasons for utilizing MDA and TDA *jointly*.
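
These a priori probabilities follow directly from the counts quoted in the Preliminaries, as the short check below shows. The variable and function names are illustrative only; the counts are those given in this report.

```python
# (N_0, N_1) = (nontornadic, tornadic) circulations in each data set.
counts = {
    "MDA": (58786, 905),
    "TDA": (5348, 512),
    "MDA/TDA": (591, 300),
}

def prior_probability(n0, n1):
    """A priori probability that a detected circulation is tornadic."""
    return n1 / (n0 + n1)

priors = {name: prior_probability(n0, n1) for name, (n0, n1) in counts.items()}
for name, p in priors.items():
    print(f"{name}: {p:.1%}")
# MDA: 1.5%
# TDA: 8.7%
# MDA/TDA: 33.7%
```

Rounded to whole percentages these are the 2%, 9%, and 34% quoted in the text.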

The posterior probabilities for all the variables are enclosed as Appendix I for MDA, Appendix II for TDA, and Appendix III for joint MDA/TDA data. These figures can be consulted to view the behavior of any variable of interest. The summary of the "best" predictors according to this method is in the Summary section.

The correlation coefficients, r, between ground truth and each of the variables in MDA, TDA, and MDA/TDA jointly, are displayed in Figure 2. The "best" predictors are marked on each plot.

Figure 2: The correlation coefficients between the dependent variable (ground truth) and each of the independent variables for MDA (top), TDA (middle), and MDA/TDA jointly (bottom).

As for the measure-based method, Figures 3-5 show the highest values of the three measures (y-axis) as obtained by dichotomizing each variable (x-axis). The outstanding predictors according to this method are labeled on each graph.

Figure 3: The maximum value of three measures obtained by dichotomizing the MDA variables.

Figure 4: The maximum value of three measures obtained by dichotomizing the TDA variables.

Figure 5: The maximum value of three measures obtained by dichotomizing the MDA/TDA variables.

Another important quantity in this method is the value of the decision threshold (warning/no-warning) that yields the highest performance; those quantities are presented in Table 1 for some of the best predictors. The values of the decision thresholds for the remaining variables are available upon request.

Table 1: The values of decision threshold, required to yield the maximum of the corresponding performance measure, for some of the best predictors.

Data set   Predictor  Threshold for CSI   Predictor  Threshold for HSS   Predictor  Threshold for ENT
MDA        x12        5.0                 x12        5.0                 x12        1.0
           x27        35.0                x27        35.0                x27        4.0
           x11        9961.0              x11        10411.0             x26        3001.0
TDA        x35        19.0                x35        9.0                 x33        1817.0
           x36        37.0                x36        41.0                x35        19.0
           x80        1750.0              x80        1994.0              x81        1906.0
MDA/TDA    x73        -40.0               x81        1906.0              x81        1906.0
           x31        10.0                x33        1778.0              x25        1813.0
           x39        18.0                x25        1813.0              x39        18.0

Pairs of variables with high (>= 0.8) correlation coefficients for both tornadic and nontornadic circulations are given in Table 2. r_0[x][y] represents the correlation coefficient between x and y for nontornadic circulations, and r_1 represents the same quantity for tornadic circulations. The probability that these values of r could be obtained by chance was computed and found to be zero (to 12 decimal places). Therefore, to a high level of significance, these variables are statistically equivalent.

Table 2: Correlation coefficients for some of the highly correlated variables in the three data sets. The corresponding pair of variables may be considered statistically equivalent.
MDA:      r0[23][17]=0.806, r1=0.826    r0[67][61]=0.842, r1=0.876
          r0[25][10]=0.994, r1=0.841    r0[68][62]=0.859, r1=0.857
          r0[50][28]=0.957, r1=0.899    r0[71][70]=0.825, r1=0.886
          r0[52][28]=0.832, r1=0.863    r0[77][75]=0.879, r1=0.875
          r0[52][50]=0.864, r1=0.820
TDA:      r0[39][38]=0.876, r1=0.874    r0[89][88]=0.863, r1=0.812
          r0[40][37]=0.980, r1=0.953    r0[90][87]=0.986, r1=0.957
          r0[86][85]=0.820, r1=0.842    r0[91][86]=0.849, r1=0.865
MDA/TDA:  r0[39][38]=0.812, r1=0.887    r0[54][51]=0.869, r1=0.924
          r0[40][37]=0.980, r1=0.938    r0[79][76]=0.890, r1=0.899
          r0[50][17]=0.807, r1=0.830    r0[86][85]=0.806, r1=0.824
          r0[51][20]=0.851, r1=0.828    r0[90][87]=0.982, r1=0.941
          r0[52][23]=0.868, r1=0.865    r0[91][86]=0.859, r1=0.852

Figure 6: Scatter plots between variables with r_0>0.9 and r_1>0.9. The top figure is for MDA, the middle two are for TDA, and the last two are for MDA/TDA. The larger (smaller) circles represent the tornadic (nontornadic) circulations.

As mentioned previously, another way of reducing the number of variables is via Principal Components Analysis (PCA). By construction, the variance accounted for by the first Principal Component (PC) is larger than that of the second PC, and so on. The utility of PCA is realized if only the first few PCs turn out to account for most (say 95%) of the total variance. Otherwise it is wise to retain the entire set of variables. In the present case, as seen from Figure 7, almost all of the PCs are required to account for 95% of the variance in the MDA, TDA, and MDA/TDA data sets. Since there is no significant reduction in the number of variables, PCA will not be considered further.

Figure 7: Cumulative variance as a function of the number of principal components, for MDA, TDA, and MDA/TDA.
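
The 95% criterion can be made concrete with a short sketch that takes the variance accounted for by each PC and counts how many PCs are needed to reach the target. The two variance lists below are invented to contrast a case where PCA is useful with a case like the present one; they are not the values behind Figure 7, and the function name is hypothetical.

```python
def components_needed(variances, target=0.95):
    """Number of leading PCs whose cumulative variance fraction reaches target.

    variances : variance accounted for by each PC (eigenvalues of the
                covariance or correlation matrix), in any order
    """
    ordered = sorted(variances, reverse=True)  # first PC has the largest variance
    total = sum(ordered)
    running = 0.0
    for k, v in enumerate(ordered, start=1):
        running += v
        if running / total >= target:
            return k
    return len(ordered)

# Invented spectra, for illustration only.
concentrated = [9.0, 0.6, 0.2, 0.2]  # a few PCs dominate: PCA is useful
flat = [1.0] * 10                    # variance spread evenly: no reduction

print(components_needed(concentrated))  # 2
print(components_needed(flat))          # 10
```

A flat spectrum of this kind corresponds to the behavior reported here, where almost all of the PCs are required.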

Linear discriminant analysis was also explored for rank-ordering the variables but was abandoned because the assumptions of normality and homoscedasticity of the distributions are violated in the current data.

Summary:

With almost no analysis one can conclude that an MDA detection alone corresponds to a 2% probability of tornado. Although a TDA detection corresponds to a 9% probability of tornado, a joint MDA/TDA detection raises that probability to 34%. This is a good reason for utilizing MDA and TDA *jointly*. Another good reason for utilizing MDA in addition to TDA is that the top three predictors in the joint MDA/TDA data, according to some of the methods employed herein, are all MDA variables.

The outstanding predictors for MDA, TDA, and MDA/TDA, according to the posterior probability method are: x30, x65, x64, x27, x11 (for MDA), x35, x80, x36 (for TDA), and x51, x22, x19, x20, x81 (for MDA/TDA). As for the remaining methods, the best predictors are marked in Figures 2-5. Note that the choice of the best predictor depends on the choice of the method. This is because each method captures a different aspect of "goodness."

Also note that a "good" predictor for MDA may be a "bad" predictor for TDA, or vice versa. On the other hand, some variables are "bad" for all three algorithms. A few examples are as follows: x11 is good in MDA, but bad in MDA/TDA; x81 is bad in MDA and TDA, but good in MDA/TDA; meanwhile x82 is bad in MDA, TDA, and MDA/TDA.

Appendix I:

The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of MDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the MDA for a given x. See the text for a list of the variables.

Available upon request (contact Caren Marzban).

Appendix II:

The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of TDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the TDA for a given x. See the text for a list of the variables.

Available upon request (contact Caren Marzban).

Appendix III:

The posterior probability of tornado (solid curve), P_1(x)=N_1(x) / ( N_1(x)+N_0(x) ), as a function of MDA/TDA variables. The dotted curve is N_1(x) itself, i.e. the number of tornadic circulations detected by the MDA and TDA, jointly, for a given x. See the text for a list of the variables.

Available upon request (contact Caren Marzban).


vrf 2/10/97
