(Updated 29 August 2003. Comments to Harold)
Collecting and Processing the Data
Severe Weather Reports
We have used two datasets to create the figures shown here. The first
contains reports of all kinds of severe weather (defined in the United
States as tornadoes, thunderstorm winds of at least 58 mph, or hail
of at least 3/4 inch in diameter) collected by National
Weather Service (NWS) meteorologists from all over the United States
and archived at the NWS Storm Prediction
Center. Although data of this kind have been collected since
1950, we have focused on the portion of the record since 1980. We've
limited our consideration to this period because the number of reports
has increased greatly over time (by almost two orders of magnitude since
1950), and much of this increase is due to greater effort in collecting
the data. We have had to compromise between a record that shows
as little effect of the increase in reports as possible and a record long
enough to be meaningful.
The second dataset was produced by Tom Grazulis of The
Tornado Project. It contains significant tornadoes (rated F2
or greater in damage) in the United States since 1680. Just as we have
limited our attention to the period beginning in 1980 with the NWS dataset,
we have limited our attention to 1921-1995 in the Grazulis dataset.
We know that the reports aren't always a perfect record; events are
missed and erroneous reports are collected. We have tried to focus
on aspects of the reports that we believe may be the most reliable:
the location and the date of the reports. For the NWS data, the location
is given by latitude/longitude coordinates. For the Grazulis dataset,
it is given by county. We have taken the location and mapped each report
onto a grid. (Technically, it is a Lambert conic conformal grid,
true at 30°N and 60°N.) The grid is approximately 80 km on a side,
so that the area associated with each grid point is roughly equivalent
to a circle 25 miles in radius. For simplicity and for consistency
in some of the processing that goes on later, we have used only the touchdown
location of tornadoes. Also, we have considered only whether or not a
particular kind of severe weather event occurred on a day, not how many
occurred on that day. Thus,
the probability maps and graphs that are displayed should be interpreted
as the probability of one or more of the severe weather events
occurring within 25 miles of the location. The total threat maps show the
average number of days per year with one or more of the severe weather
events occurring within 25 miles of the location.
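As a quick check of the statement that an 80 km grid box is roughly equivalent to a circle of 25-mile radius (the radius used in the definitions above), the arithmetic can be written out in a few lines of Python. This is a minimal sketch that assumes statute miles (1 mile = 1.609344 km) and treats each box as an exact 80 km square, ignoring the small distortion of the map projection.

```python
import math

KM_PER_MILE = 1.609344  # statute mile, exact by definition


def square_area_km2(side_km: float) -> float:
    """Area of one square grid box, in km^2."""
    return side_km ** 2


def circle_area_km2(radius_miles: float) -> float:
    """Area of a circle whose radius is given in statute miles, in km^2."""
    r_km = radius_miles * KM_PER_MILE
    return math.pi * r_km ** 2


box = square_area_km2(80.0)      # 6400 km^2
circle = circle_area_km2(25.0)   # about 5085 km^2
print(f"80 km box: {box:.0f} km^2")
print(f"25-mile-radius circle: {circle:.0f} km^2")
print(f"circle / box: {circle / box:.2f}")
```

The circle covers a bit less than 80% of the box, which is the size of the gap between the two areas discussed below in connection with the smoothing.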
We've used a statistical technique known as nonparametric density estimation
to produce the probabilities. In a nutshell, we smooth the reports
in time and space. (The technical details
are described below.) The idea is that a tornado occurring
on 3 May tells us something about the likelihood that one could occur on
2 May or 4 May, but not very much about how likely one is on 3 March.
Similarly, a tornado at Fort Worth, Texas, tells us something about the
probability of a tornado at Dallas, Texas, but not very much about the
probability at Chicago, Illinois. We've included information from different
time periods of the record to show how variable the reports are.
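To make the idea concrete, the Python sketch below evaluates Gaussian kernel weights using the standard deviations given in the procedure (15 days in time, 120 km in space). The weights are relative (unnormalized), the city coordinates are approximate values supplied only for illustration, and the distances are great-circle (haversine) distances on a spherical Earth rather than distances on the 80 km grid; all of these are assumptions of the sketch.

```python
import math

TIME_SD_DAYS = 15.0   # standard deviation of the time filter, from the procedure
SPACE_SD_KM = 120.0   # standard deviation of the space filter, from the procedure


def gaussian_weight(distance: float, sd: float) -> float:
    """Relative Gaussian kernel weight, exp(-d^2 / (2 sd^2))."""
    return math.exp(-distance ** 2 / (2.0 * sd ** 2))


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere (mean Earth radius assumed)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


# Time: 3 May versus 2 May (1 day apart) and 3 March (61 days apart).
w_next_day = gaussian_weight(1, TIME_SD_DAYS)
w_two_months = gaussian_weight(61, TIME_SD_DAYS)

# Space: approximate city coordinates (illustrative values, not from this page).
FORT_WORTH = (32.75, -97.33)
DALLAS = (32.78, -96.80)
CHICAGO = (41.88, -87.63)
w_dallas = gaussian_weight(great_circle_km(*FORT_WORTH, *DALLAS), SPACE_SD_KM)
w_chicago = gaussian_weight(great_circle_km(*FORT_WORTH, *CHICAGO), SPACE_SD_KM)

print(f"time:  1 day {w_next_day:.3f}, 61 days {w_two_months:.1e}")
print(f"space: Dallas {w_dallas:.2f}, Chicago {w_chicago:.1e}")
```

A report one day away, or about 50 km away, keeps most of its weight, while one two months away, or more than 1000 km away, contributes almost nothing.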
The maps are based on the reports from a particular time period. The
smoothing is intentionally heavy, so that only the strong signals remain.
It leads to a slight underestimate of the probabilities. Fortuitously, the
underestimate is about the same size as the amount by which the area of a
circle of 25-mile radius falls short of the area of an 80 km square, so the
threat calculated on the 80 km grid is close enough to the threat within a
25-mile radius that the difference is smaller than the other sources of error.
The procedure is as follows:
Reports for each day are put onto a grid of 80 km x 80 km boxes. Thanks for this
go to Mike Kay of the SPC, without whom this would have gone nowhere.
If one or more reports occur in a grid box on a given day, that box is assigned
the value "1" for that day. If no reports occur, it is assigned "0".
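One way to sketch this gridding step in Python, together with the Lambert conic conformal mapping described earlier, is shown below. It is not the SPC's code. The projection uses the standard spherical Lambert conformal conic formulas with the 30°N and 60°N standard parallels from the text; the Earth radius, projection origin, grid corner, and the (date, latitude, longitude) report format are hypothetical choices for illustration, as are the function names.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0          # spherical Earth (assumption)
STD_PARALLELS = (30.0, 60.0)      # "true at 30 and 60 N", from the text
ORIGIN = (35.0, -95.0)            # hypothetical projection origin (lat, lon)
GRID_KM = 80.0                    # grid box size, from the text
GRID_CORNER = (-2500.0, -1500.0)  # hypothetical lower-left corner (x, y), km


def _t(phi):
    """tan(pi/4 + phi/2), the recurring term in the conic formulas."""
    return math.tan(math.pi / 4 + phi / 2)


def lambert_conformal(lat, lon):
    """Spherical Lambert conformal conic projection; returns (x, y) in km."""
    p1, p2 = (math.radians(v) for v in STD_PARALLELS)
    p0, l0 = (math.radians(v) for v in ORIGIN)
    n = math.log(math.cos(p1) / math.cos(p2)) / math.log(_t(p2) / _t(p1))
    f = math.cos(p1) * _t(p1) ** n / n
    rho = EARTH_RADIUS_KM * f / _t(math.radians(lat)) ** n
    rho0 = EARTH_RADIUS_KM * f / _t(p0) ** n
    theta = n * (math.radians(lon) - l0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


def grid_index(x_km, y_km):
    """Column and row of the 80 km box containing a projected point."""
    x_min, y_min = GRID_CORNER
    return int((x_km - x_min) // GRID_KM), int((y_km - y_min) // GRID_KM)


def daily_hit_boxes(reports):
    """Map each date to the set of boxes with one or more reports.

    reports: iterable of (date, lat, lon). Only presence is kept: several
    reports in one box on one day still mark that box once (a "1").
    """
    hits = defaultdict(set)
    for day, lat, lon in reports:
        hits[day].add(grid_index(*lambert_conformal(lat, lon)))
    return hits


reports = [
    ("1999-05-03", 35.3, -97.5),   # hypothetical reports
    ("1999-05-03", 35.3, -97.5),   # same box, same day: marked once
    ("1999-05-03", 39.1, -94.6),
    ("1999-05-04", 35.3, -97.5),
]
hits = daily_hit_boxes(reports)
print({day: sorted(boxes) for day, boxes in hits.items()})
```

Storing a set of boxes per day, rather than a count, matches the yes/no treatment in the text: how many reports fell in a box on a given day is discarded.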
For each day of the year at each grid location, the raw frequency over the period
is found (the number of "1" values divided by the number of years), giving a raw annual cycle.
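The raw annual cycle can be sketched with NumPy as below. The input format (one (year, day_of_year, ix, iy) tuple per box-day with a "1", with a 0-based day of year), the 366 day slots that give 29 February a place, and the function name are all assumptions made for illustration.

```python
import numpy as np


def raw_annual_cycle(hit_days, n_years, grid_shape, n_days=366):
    """Raw frequency of a "1" for each day of the year at each grid box.

    hit_days: iterable of (year, day_of_year, ix, iy), day_of_year 0-based,
    one entry per box-day with one or more reports. Keeping 366 day slots
    so that 29 February has a place is an assumption of this sketch.
    """
    counts = np.zeros((n_days,) + tuple(grid_shape))
    for _year, doy, ix, iy in set(hit_days):  # duplicate box-days count once
        counts[doy, ix, iy] += 1
    return counts / n_years  # number of "1" values / number of years


hit_days = [
    (1990, 10, 0, 0),
    (1990, 10, 0, 0),   # duplicate box-day: still one "1"
    (1991, 10, 0, 0),
    (1991, 200, 1, 1),
]
freq = raw_annual_cycle(hit_days, n_years=4, grid_shape=(2, 2))
print(freq[10, 0, 0], freq[200, 1, 1])  # 0.5 0.25
print(freq.sum(axis=0))  # average days per year with a "1", per box
```

Summing a box's raw cycle over the year gives its average number of days per year with one or more reports, the same quantity the total threat maps display (there after smoothing).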
The raw annual cycle at each point is smoothed in time, using a Gaussian
filter with a standard deviation of 15 days.
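A minimal NumPy sketch of the time smoothing follows. It applies a normalized Gaussian kernel (standard deviation 15 days, from the text) along the day-of-year axis. Treating the annual cycle as periodic, so that late December and early January influence each other, and truncating the kernel at four standard deviations are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

SD_DAYS = 15.0  # standard deviation of the time filter, from the text


def smooth_in_time(cycle, sd_days=SD_DAYS, truncate=4.0):
    """Gaussian smoothing along axis 0 (day of year) of an annual cycle.

    The cycle is treated as periodic, so late December and early January
    influence each other; that wrap-around is an assumption of this sketch.
    """
    half = int(truncate * sd_days + 0.5)           # kernel half-width in days
    offsets = np.arange(-half, half + 1)
    weights = np.exp(-0.5 * (offsets / sd_days) ** 2)
    weights /= weights.sum()                       # normalize to sum to 1
    cycle = np.asarray(cycle, dtype=float)
    out = np.zeros_like(cycle)
    for k, w in zip(offsets, weights):
        out += w * np.roll(cycle, k, axis=0)       # out[i] += w * cycle[i - k]
    return out


spike = np.zeros(366)
spike[0] = 1.0                      # a single "1" on the first day of the year
smoothed = smooth_in_time(spike)
print(smoothed.sum())               # normalized weights preserve the total
print(smoothed[1] == smoothed[-1])  # spreads symmetrically across 1 January
```

Because the weights sum to one and the cycle wraps around, the annual total at each grid box is unchanged by this step; only its distribution through the year is smoothed.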
The smoothed time series are then smoothed in space with a 2-D Gaussian
filter (SD = 120 km in each direction).
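The space smoothing can be sketched the same way. With 80 km boxes, a 120 km standard deviation is 1.5 grid lengths, and a 2-D Gaussian with the same standard deviation in each direction can be applied as two 1-D passes. Treating points beyond the grid edge as zero is an assumption (edge handling is not described here), and the function names are hypothetical. The last function shows one way to get a total threat value: summing the smoothed daily probabilities at a box over the year gives the expected number of days per year with one or more events there.

```python
import numpy as np

GRID_KM = 80.0  # grid spacing, from the text


def smooth_in_space(field, sd_km=120.0, grid_km=GRID_KM, truncate=4.0):
    """Separable 2-D Gaussian smoothing over the last two axes of `field`.

    Treating points beyond the grid edge as zero is an assumption; the edge
    handling is not described on this page.
    """
    sigma = sd_km / grid_km                    # 120 km / 80 km = 1.5 boxes
    half = int(truncate * sigma + 0.5)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    def conv1d(v):
        # Zero-pad, then a "valid" convolution keeps the original length.
        return np.convolve(np.pad(v, half), kernel, mode="valid")

    out = np.apply_along_axis(conv1d, -1, np.asarray(field, dtype=float))
    return np.apply_along_axis(conv1d, -2, out)


def total_threat(daily_prob):
    """Sum daily probabilities over the year (axis 0): days per year."""
    return daily_prob.sum(axis=0)


daily = np.zeros((366, 31, 31))
daily[120, 15, 15] = 1.0            # one box-day far from the grid edges
smoothed = smooth_in_space(daily)
print(smoothed.shape)               # (366, 31, 31)
print(total_threat(smoothed)[15, 15])
```

Splitting the 2-D filter into two 1-D passes gives the same result as a full 2-D Gaussian kernel with equal standard deviations in each direction, at much lower cost.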