15th AMS Conference on Weather Analysis and Forecasting
Norfolk, Virginia, 19-23 August 1996, in press



Harold E. Brooks1, John V. Cortinas Jr.1,2, Robert H. Johns3

1NOAA/ERL/National Severe Storms Laboratory

Norman, Oklahoma

2Cooperative Institute for Mesoscale Meteorological Studies

Norman, Oklahoma

3NOAA/NWS/Storm Prediction Center

Norman, Oklahoma


The Storm Prediction Center (SPC) will be responsible for issuing short-term guidance products for hazardous winter weather, beginning in the winter of 1997-98. Since products of this kind have not been issued before, an experiment was conducted in the winter of 1995-96 to test procedures and products. Staff from the SPC and the Mesoscale Applications Group (MAG) of the National Severe Storms Laboratory (NSSL) issued forecasts for freezing rain, heavy snow, and blizzard conditions on 27 days from October through December 1995. Verification is an essential part of any forecasting experiment (Doswell and Flueck 1989). Here, we report on preliminary efforts to verify the forecasts from the Winter Weather Experiment (WWE).
Probabilistic forecasts were made for three winter hazards: freezing rain, snow accumulation greater than 4 inches in 12 hours, and blizzard conditions (wind gusts > 35 mph with visibility < 1/4 mile in falling or blowing snow). Forecasters wrote a discussion of the ingredients and processes that might produce the relevant hazards and drew probability contours (20%, 40%, 60%, 80%) on a map of the contiguous United States, valid from 1800 UTC to 0600 UTC the next day. Forecasts were issued at approximately 1700 UTC. Because of logistical and staffing constraints, the experiment was carried out on weekdays from 23 October to 11 December, for a total of 27 forecast days.
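The mechanics of comparing a hand-drawn contour map with point observations can be sketched in code. The following is a minimal, hypothetical illustration (the helper names and the ray-casting test are our own, not part of the WWE procedures): each contour is a polygon tagged with its probability, and a station is assigned the value of the highest contour that encloses it.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is the (lon, lat) point inside the polygon?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a ray extending to the left of the point.
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def assigned_prob(station, contours):
    """contours: list of (probability, polygon) pairs for one forecast map.
    A station receives the value of the highest contour enclosing it;
    stations outside all contours receive zero."""
    prob = 0.0
    for p, poly in contours:
        if point_in_polygon(station, poly):
            prob = max(prob, p)
    return prob
```

For example, a station at (2, 2) inside a single 20% contour drawn as the square (0,0)-(4,0)-(4,4)-(0,4) would be treated as a 20% point forecast, while a station at (5, 5) would be a 0% point forecast.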


One of the major concerns of the SPC is the development of a verification program that leads to forecast improvement. Two separate parts of the observational data base need to be considered in any such verification. The first is the development of as complete a description of all hazards as possible. This is the equivalent of the current "Storm Data" for severe thunderstorm reports. While "Storm Data" is by no means perfect, it represents the best picture available at the present time. However, it takes many months to process and, as a result, provides no immediate feedback for operational forecasters.
The second data base is a rapidly developed, "quick look" set of observations. Since not all National Weather Service (NWS) forecast offices use Local Storm Reports (LSRs) to report winter hazards, the most reliable and complete data set for rapid feedback comes from the standard observation network. This is particularly true for national forecasting centers, which will not have access to local news media and other reports that may reach the local NWS offices. The approach is fraught with difficulties, however. Events frequently occur between stations and thus can be missed completely. Hazards are not necessarily observed in a way consistent with what one would like from a forecasting perspective. For example, the observation of heavy snow (S+) depends on visibility, not snowfall rate. Accumulations are also not reported at all sites and frequently cannot be deduced for the forecast period. As a result of these restrictions, our primary emphasis in verification here is on forecasts of freezing rain (ZR, ZR-).

The forecasts are summarized in Table 1, which lists the maximum probability used for each event on each day, along with the number of hourly observations of the hazard and the number of sites at which it was observed; i.e., if one station reported ZR for 6 hours, there would be 6 observations at 1 site. Note that for the heavy snow observations, the value in the table is for observations of S+, not accumulations of more than 4 inches in 12 hours; thus, there is a mismatch between the forecasts and observations. Finally, the number of stations within each probability contour for the freezing rain forecasts is listed.

There were only three days (11%) on which no winter hazard was forecast, and none after 30 October. Only four days (15%) had no ZR, ZR-, or S+ observed, with only one coming in November. For comparison, Branick (1996) reported that, beginning in November, approximately 80% of all days have a significant winter weather event somewhere in the contiguous United States. Given the mismatch between S+ observations and heavy snow accumulations, it seems reasonable to believe that the experiment was not atypical in its number of events. Thus, SPC personnel will have to issue guidance products for winter hazards almost every day during the cold season. During late October and November, the additional threat of severe thunderstorms is likely to make the forecasters' task very difficult.


The WWE freezing rain forecasts are the most amenable to verification in a way that is relevant to the rapid feedback problem. (Verification of the experimental blizzard condition forecasts is, in a sense, trivial, since the conditions were never observed. The accumulation/S+ distinction makes the heavy snow forecasts difficult to deal with.) We treat the probability as being similar to the probability of precipitation (PoP), which has been shown to be equivalent to the expected areal coverage of precipitation (Schaefer and Livingston 1990). We assign the value of the contour line to each point inside the contour. Since we do not have information about the areal coverage, we attack the problem by counting the number of observing stations within each probability contour that reported ZR or ZR- as events, and the number within each contour that did not report ZR or ZR- during the forecast period as non-events. Using this method, we make each map of contours equivalent to a series of point probability forecasts for every point in the contiguous U.S. In total, for the 27 forecast days, the experiment produced approximately 18,250 point forecasts of freezing rain. With 44 stations reporting freezing rain, the sample climatological frequency was 0.24%. Information from Table 1 enables us to construct a contingency table of forecasts and observations (Table 2).
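Under this treatment, the bookkeeping reduces to a tally of (forecast probability, event/non-event) pairs, one per station per forecast day. A sketch of that tally and of the sample climatology quoted above (the function and data layout are illustrative assumptions, not the actual WWE procedures):

```python
def tally(point_forecasts):
    """Count forecasts and ZR/ZR- events in each probability category.

    point_forecasts: iterable of (assigned_prob, observed) pairs, one per
    station per forecast day; observed is True if ZR or ZR- was reported
    during the forecast period.  Returns {prob: (n_forecasts, n_events)}.
    """
    counts = {}
    for prob, observed in point_forecasts:
        n, events = counts.get(prob, (0, 0))
        counts[prob] = (n + 1, events + int(observed))
    return counts

# Sample climatological frequency for the experiment: 44 stations reported
# freezing rain out of approximately 18,250 point forecasts.
climo = 44 / 18250          # ~0.0024, i.e., about 0.24%
```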
As discussed by Doswell et al. (1990) and Murphy (1996), contingency tables for the forecasting of rare, severe events tend to be dominated by correct forecasts of null events. As a result, an important question is how to deal with the "easy" forecasts of non-events. One possibility, suggested by Murphy (1995), is to stratify the forecasts in a way that eliminates these "easy" forecasts. In our case, intuitively, one would like to eliminate forecasts such as those for areas where the temperature is well above freezing. While we plan to explore more sophisticated conditioning methods in the future, for now we apply a simple estimate of the number of "difficult" forecasts. The "worst" missed forecast of an event came on 8 November, when six sites reported freezing rain although none was forecast. On that day, 75 sites reported some kind of winter precipitation. Using that as a guide for the number of difficult forecast locations per day (75 sites over each of the 27 days), we get approximately 2000 "difficult" forecasts for the entire WWE. While this was one of the most widespread precipitation cases, which might lead to an overestimate of the number of difficult forecasts, there are obviously a number of locations where winter precipitation did not occur that still represent a forecast challenge. Thus, we believe this is a reasonable order of magnitude for the number of difficult forecasts. Put another way, it implies that approximately 10% of the U.S. presents a "difficult" forecast on a given day. For many winter weather events, we believe this is a conservative estimate.
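The order-of-magnitude estimate above is simple arithmetic, shown here for concreteness:

```python
# "Difficult" forecast estimate, using the 8 November case as a guide:
# ~75 winter-precipitation sites per day over the 27 forecast days.
difficult = 75 * 27             # 2025, i.e., approximately 2000
fraction = difficult / 18250    # ~0.11 of all point forecasts,
                                # i.e., roughly 10% of the U.S. per day
```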
The contingency table provides us with a complete picture of forecast performance (Murphy and Winkler 1987). In general, forecasters overforecast the probability of freezing rain: for the non-zero forecasts, the frequency of occurrence of freezing rain events was about half the forecast probability. This is particularly clear from the reliability diagram (Fig. 1) (Wilks 1995). Although "sophisticated" users of the forecast product could use that information to calibrate the forecasts for their own use (Murphy and Winkler 1987), it seems clear that, in general, it would be better to improve the quality of the forecasts. The overforecasting of rare events is similar to, but less severe than, that in the experimental forecasting of tornadoes (Doswell et al. 1996). One cause may be similar, however. In neither case did the experimental forecasters have knowledge of the appropriate climatological frequency of the event in question on the time and space scales for which they were forecasting, simply because that information does not exist. This "hole" in their background made it difficult to assess what is a relatively small or large risk of a hazard. Robbins and Cortinas (1996) report on preliminary efforts to determine the climatological frequency of freezing rain. Presumably, this information will be valuable for SPC operational forecasters.
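The reliability calculation behind Fig. 1 is the conditional frequency of events given each forecast category, p(x|f). A sketch, with event counts invented purely to illustrate the roughly two-to-one overforecasting described above (the forecast-point totals are from Table 1; the actual event counts are in Table 2):

```python
def reliability(counts):
    """counts: {forecast_prob: (n_forecasts, n_events)}.
    Returns the observed relative frequency p(x|f) per forecast category."""
    return {p: events / n for p, (n, events) in counts.items() if n > 0}

# Hypothetical counts: 170, 47, and 7 point forecasts in the 20%, 40%, and
# 60% categories (Table 1 totals), with invented event counts chosen so the
# observed frequency runs near half the forecast probability.
example = {0.20: (170, 17), 0.40: (47, 9), 0.60: (7, 2)}
curve = reliability(example)    # {0.2: 0.10, 0.4: ~0.19, 0.6: ~0.29}
```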


SPC and MAG carried out the WWE for a variety of reasons. Among these were to gain familiarity with forecasting the various phenomena and to experiment with various forecast techniques (Janish et al. 1996). This experience is essential to the development of quality guidance products for winter hazards. A significant part of learning from any forecasting effort, be it experimental or operational, is the verification of the forecasts. SPC and MAG are committed to developing as complete a verification system as possible for the operational SPC product suite. In fact, the design of the verification system plays a critical role in the design of the products. Some important lessons were learned from the WWE. In particular:

1. Adequate climatologies need to be developed for the hazards being forecast. Branick (1996) and Robbins and Cortinas (1996) report on those efforts for winter hazards.

2. Procedures to record winter weather events for rapid feedback verification need to be established. The LSR format provides an excellent opportunity to do that (Branick 1996).

3. More attention needs to be paid to writing up complete descriptions of hazardous winter events for "Storm Data". This is essential for the observational data base for the final verification product and for the improvement of the climatological data base.

4. Follow-up work to identify common factors that led to high- and low-quality forecasts is important to "close the loop" on the verification process. A systematic effort has not yet begun.

The quality of the forecasts in the WWE is not the most important issue here. At the very least, the experiment defined a baseline for future improvement. It also provided significant information on the forecasting process that will lead to changes in the kinds of products and the verification done by the SPC in an operational environment. We believe that a properly designed verification system can be of enormous benefit, both to the SPC and to forecast offices in the field that will use SPC products as guidance.


In particular, we want to thank the forecasters who participated in the WWE: Phillip Bothwell, Hugh Crowter, Dave Keller, Bob Johns, Joel Olson (SPC); John Cortinas, Charlie Crisp, Ron Holle, Paul Janish (MAG); and Mike Branick (NWSFO OUN). Gary Grice (SPC) provided support that allowed the experiment to be carried out. We received helpful comments from Barbara Brown, Chuck Doswell, Allan Murphy, and Dan Wilks about verification issues, and anticipate their further assistance in the design of the SPC verification system.


Branick, M., 1996: A climatology and proposed rating system for significant winter weather in the United States. Wea. Forecasting, submitted.
Doswell, C. A. III, R. Duncomb, H. E. Brooks, and F. H. Carr, 1996: Verification of VORTEX-94 forecasts. Preprints, 15th Conf. on Weather Analysis and Forecasting, Amer. Meteor. Soc., Norfolk, Virginia, this volume.
_____, and J. A. Flueck, 1989: Forecasting and verifying in a field research project: DOPLIGHT '87. Wea. Forecasting, 4, 97-109.
Janish, P. R., C. A. Crisp, J. V. Cortinas Jr., R. L. Holle, and R. H. Johns, 1996: Development of an ingredients based approach to forecasting hazardous winter weather in an operational environment. Preprints, 15th Conf. on Weather Analysis and Forecasting, Amer. Meteor. Soc., Norfolk, Virginia, this volume.
Murphy, A. H., 1995: A coherent method of stratification within a general framework for forecast verification. Mon. Wea. Rev., 123, 1582-1588.
_____, 1996: The Finley affair: A signal event in forecast verification. Wea. Forecasting, 11, 3-20.
_____, and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330-1338.
Robbins, C. C., and J. V. Cortinas Jr., 1996: A climatology of freezing rain in the contiguous United States: Preliminary results. Preprints, 15th Conf. on Weather Analysis and Forecasting, Amer. Meteor. Soc., Norfolk, Virginia, this volume.
Schaefer, J. T., and R. L. Livingston, 1990: Operational implications of the "Probability of Precipitation". Wea. Forecasting, 5, 354-356.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.

Table 1

            MAX PROB (%)
DATE        ZR  SNOW  BLZD   ZR OBS    S+ OBS    ZR FCST PTS
10/23/95     0    80    60   0         2 (1)     -
11/1/95      0    80     0   5 (1)     0         -
11/2/95     20    20     0   4 (2)     0         8
11/8/95      0    80     0   14 (6)    1         -
11/9/95     20    80     0   6 (3)     0         32
11/13/95     0    20     0   0         2 (1)     -
11/14/95    20    80    20   2 (1)     5 (3)     3
11/15/95     0    80     0   3 (1)     0         -
11/16/95    40    60     0   2 (2)     2 (2)     9|0
11/17/95    20    40     0   4 (3)     2 (1)     2
11/27/95    40    80    40   20 (9)    22 (8)    17|10
11/28/95    60    80    60   0         4 (3)     22|16|4
11/30/95    60    40    40   6 (6)     0         19|7|3
12/1/95     40    20     0   6 (3)     2 (1)     8|3
12/5/95      0    80    40   0         3 (2)     -
12/11/95    40    60     0   2 (2)     0         11|5
TOTAL        -     -     -   79 (44)   49 (27)   170|47|7
N(P,X>0)    14    24     8   17        14        -
Table 1: Summary of WWE forecasts. MAX PROB is the maximum probability (in per cent) contour used on a forecast. Number of hourly observations (sites during period) for each hazard is given in the ZR OBS and S+ OBS columns. (Blizzard conditions were not observed.) ZR FCST PTS is the number of observation sites in the 20%|40%|60% contours. N(P,X>0) is the number of forecast or observation days with greater than zero probability or events. P(MEAN) is the mean forecast maximum probability and P(MEAN|P>0) is the conditional mean forecast maximum probability, given that a non-zero forecast was issued.

Table 2

Forecast Probability

Table 2: Contingency table for point freezing rain forecasts. Italicized values derived from estimated "difficult" forecasts (see text). N(x) [N(f)] are number of events (forecasts). p(x) [p(f)] is unconditional probability of events (forecasts). p(x|f) is conditional probability of event, given a particular forecast.