Topic 4: Forecast "Goodness"

References (starred references are particularly important):

*Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.

Roebber, P. J., and L. F. Bosart, 1996: The complex relationship between forecast skill and forecast value: A real-world analysis. Wea. Forecasting, 11, 544-559.

The 1993 Murphy paper lays out a qualitative, philosophical foundation for forecast evaluation. It is one of the most important papers written on the subject.

Three kinds of forecast "goodness":

  1. Consistency: Are the forecasts identical to the forecaster's "true" beliefs? Two particular points arise in this regard. First, verification systems should not be set up in a way that encourages a forecaster to issue a forecast that disagrees with his or her best estimate of the weather to be expected. Second, just as forecasts should not be affected by the verification system, neither should the collection of the observations that are compared with the forecasts. Forecasters should not collect the observations used to verify their own forecasts.
  2. Quality: The correspondence of forecasts with events. This is the traditional domain of forecast verification. Quality is multifaceted, and simple, single measures are incapable of representing the full complexity of the relationship between forecasts and events. To measure quality properly, we need to consider the joint probability distribution of forecasts and events [p(f,x)]. Quality-related concepts include bias, accuracy, skill, discrimination, resolution, reliability, and sharpness.
  3. Value: The increase in utility for a forecast user as a result of using a forecast (measurable in economic terms or with some non-economic utility function). Forecasts have no intrinsic value of their own and acquire value only by being used. It is important to note that value is associated not with weather-sensitive users, but with weather-information-sensitive users.
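
As a rough illustration of what working with p(f,x) can look like, the sketch below builds an empirical joint distribution from paired probability forecasts and binary outcomes, then reads off the marginal frequency of each forecast value (related to sharpness) and the observed event frequency given each forecast value (related to reliability). The function names and the ten forecast/observation pairs are hypothetical and chosen only for illustration; they do not come from Murphy's paper.

```python
from collections import Counter


def joint_distribution(forecasts, observations):
    """Empirical joint distribution p(f, x) from paired forecasts and outcomes."""
    n = len(forecasts)
    counts = Counter(zip(forecasts, observations))
    return {pair: count / n for pair, count in counts.items()}


def forecast_summary(joint):
    """For each forecast value f, return (p(f), p(x=1 | f))."""
    p_f = {}
    p_f_and_event = {}
    for (f, x), p in joint.items():
        p_f[f] = p_f.get(f, 0.0) + p
        if x == 1:
            p_f_and_event[f] = p_f_and_event.get(f, 0.0) + p
    return {f: (p_f[f], p_f_and_event.get(f, 0.0) / p_f[f]) for f in sorted(p_f)}


# Hypothetical sample (an assumption for illustration, not real data):
# probability forecasts paired with binary outcomes (1 = event occurred).
forecasts = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
observations = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

joint = joint_distribution(forecasts, observations)
summary = forecast_summary(joint)

for f, (freq, event_rate) in summary.items():
    print(f"f={f:.1f}  p(f)={freq:.2f}  p(x=1|f)={event_rate:.2f}")
```

In this made-up sample the event occurs 25% of the time when 0.1 is forecast and 75% of the time when 0.9 is forecast, so the forecasts are not perfectly reliable. A single summary number would hide that detail, which is the point of looking at the full joint distribution.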

Relationships between kinds of goodness:

In general, the relationship between any two kinds of goodness is complex. Increasing the quality of forecasts does not necessarily increase their value. An important relationship is the one between consistency and value. Murphy shows that, for the simple "cost-loss" problem, issuing categorical forecasts is the way to maximize expected losses for the largest number of users (assumed to be "rational" and risk-neutral). Unless forecasters know the utility function of the user, they maximize the user's expected utility by issuing forecasts that represent their own true beliefs. In general, unless making a forecast for a single user, forecasters cannot know the utility functions of their users.

Consistency and Quality

This relates to the notion of "proper" scoring systems, which are those in which a forecaster receives the optimal expected score for a forecast by forecasting exactly what he or she expects to occur. Improper systems are easy to identify: if a warning forecaster is evaluated only by the probability of detection of tornadoes, then the score is maximized by issuing a tornado warning for every forecast location and time, regardless of what the forecaster actually believes will happen. As an example of calculating whether a scoring system is strictly proper, see the discussion of the family of scoring rules that includes the Brier score.
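
A numerical check of propriety can be sketched as follows. For a forecaster whose true belief is that the event occurs with probability `belief`, the code computes the expected score of each candidate forecast probability and finds the one that minimizes it. The Brier score, (f - x)^2, is compared with the absolute error, |f - x|, which is included here only as a contrasting example of an improper rule; the function names and the grid search are assumptions made for this illustration.

```python
def expected_score(score, forecast, belief):
    """Expected score if the event occurs with probability `belief`."""
    return belief * score(forecast, 1) + (1.0 - belief) * score(forecast, 0)


def brier(f, x):
    """Brier score for one forecast probability f and binary outcome x."""
    return (f - x) ** 2


def absolute_error(f, x):
    """Absolute (linear) error, used here as an example of an improper score."""
    return abs(f - x)


def best_forecast(score, belief, steps=100):
    """Forecast probability on a regular grid that minimizes the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: expected_score(score, f, belief))


belief = 0.3  # hypothetical true belief of the forecaster
best_brier = best_forecast(brier, belief)
best_abs = best_forecast(absolute_error, belief)

print("Brier score is optimized by forecasting", best_brier)
print("Absolute error is optimized by forecasting", best_abs)
```

Under the Brier score the optimal forecast coincides with the true belief (0.3), so the forecaster has no incentive to hedge. Under the absolute error the optimal forecast is 0, so a forecaster who believes the probability is 0.3 is rewarded for issuing a categorical "no" instead.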

Consistency and Value

Using an approach similar to the one we used to look at strictly proper scoring systems, we can evaluate expected expenses as a function of forecast probability vs. true belief. It can be shown (see the Murphy article) that the use of categorical forecasts increases expenses for the maximum number of users.
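
The sketch below illustrates this for the simple cost-loss problem under some stated assumptions: each user pays a cost C to protect or suffers a loss L if the event occurs while unprotected, and a risk-neutral user protects whenever the forecast probability is at least C/L. Expenses are expressed in units of L. The set of users, the value of the forecaster's belief, and the function names are hypothetical.

```python
def expected_expense(forecast, belief, cost_loss_ratio):
    """Expected expense, in units of L, for a user who protects when forecast >= C/L.

    `belief` is the forecaster's true probability, used here as the
    probability with which the event actually occurs.
    """
    if forecast >= cost_loss_ratio:
        return cost_loss_ratio  # user protects and pays C (= C/L in units of L)
    return belief               # user does not protect and loses L with probability `belief`


def users_worse_off(forecast, belief, ratios):
    """Count users whose expected expense exceeds what the true belief would give them."""
    return sum(
        expected_expense(forecast, belief, r) > expected_expense(belief, belief, r)
        for r in ratios
    )


belief = 0.25                               # hypothetical true belief
ratios = [i / 20 for i in range(1, 20)]     # hypothetical users with C/L = 0.05, ..., 0.95
categorical = 1.0 if belief >= 0.5 else 0.0  # forecast rounded to "yes" or "no"

worse_categorical = users_worse_off(categorical, belief, ratios)
worse_honest = users_worse_off(belief, belief, ratios)

print("Users worse off with the categorical forecast:", worse_categorical)
print("Users worse off with the true-belief forecast:", worse_honest)
```

With these numbers the categorical forecast is "no", so the four users whose C/L is below 0.25 fail to protect when protecting would have been cheaper in expectation. Forecasting the true belief leaves no user worse off.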

Quality and Value

The relationship between quality and value is, in general, complex. For excellent examples from prescriptive studies of model users, see Roebber and Bosart.