References (starred references are particularly important):
*Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.
Roebber, P. J., and L. F. Bosart, 1996: The complex relationship between forecast skill and forecast value: A real-world analysis. Wea. Forecasting, 11, 544-559.
The 1993 Murphy paper lays out a qualitative, philosophical foundation for forecast evaluation and is one of the most important papers written on the subject.
Murphy distinguishes three kinds of goodness: consistency (the correspondence between forecasts and the forecaster's judgments), quality (the correspondence between forecasts and observations), and value (the benefits users realize by acting on the forecasts). In general, the relationship between any two kinds of goodness is complex. Increasing the quality of forecasts does not necessarily increase their value. An important relationship is the one between consistency and value. Murphy shows that, for the simple "cost-loss" problem, issuing categorical forecasts maximizes, rather than minimizes, expected losses for the largest number of users (assumed to be "rational" and risk-neutral). Unless the forecaster knows the user's utility function, the forecaster maximizes the user's expected utility by issuing forecasts that represent his or her true beliefs. In general, unless making a forecast for a single user, forecasters cannot know the utility functions of their users.

Consistency and Quality
The link between consistency and quality is the notion of "proper" scoring systems, which are those in which a forecaster receives the optimal expected score by forecasting exactly what he or she expects to occur. Improper systems are easy to identify: if a warning forecaster is evaluated only by the probability of detection of tornadoes, then the score is maximized by issuing a tornado warning for every forecast location and time, regardless of what the forecaster actually believes will happen. As an example of calculating whether a scoring system is strictly proper, see the discussion of the family of scoring rules that includes the Brier score.

Consistency and Value
Using an approach similar to the one used for strictly proper scoring systems, we can evaluate expected expenses as a function of forecast probability versus true belief. It can be shown (see the Murphy article) that the use of categorical forecasts increases expenses for the maximum number of users.

Quality and Value
The relationship between quality and value is, in general, complex. For excellent examples from prescriptive studies of model users, see Roebber and Bosart (1996).
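
The consistency arguments above can be checked numerically. The following is a minimal Python sketch, not taken from Murphy's paper, and everything in it is an illustrative assumption: the belief p = 0.3, the linear (absolute-error) score used as an example of an improper rule, the made-up event record for the probability of detection, and a population of hypothetical cost-loss users with loss L = 1 and cost/loss ratios spread evenly between 0.01 and 0.99. Each user is assumed to protect (paying the cost) whenever the issued probability exceeds his or her cost/loss ratio.

```python
import numpy as np

p = 0.3                         # forecaster's true belief (assumed value)
f = np.linspace(0.0, 1.0, 101)  # candidate forecast probabilities to issue

# Expected scores when the event occurs with probability p (lower is better).
exp_brier = p * (f - 1.0) ** 2 + (1.0 - p) * f ** 2  # Brier score (f - o)^2
exp_linear = p * (1.0 - f) + (1.0 - p) * f           # linear score |f - o|

best_brier = f[np.argmin(exp_brier)]    # forecast with the best expected Brier score
best_linear = f[np.argmin(exp_linear)]  # forecast with the best expected linear score


def pod(warned, observed):
    """Probability of detection: hits / (hits + misses)."""
    return np.sum(warned & observed) / np.sum(observed)


# Made-up event record; warning every time never misses an event.
observed = np.array([0, 0, 1, 0, 1, 0, 0, 0], dtype=bool)
pod_always = pod(np.ones_like(observed), observed)

# Hypothetical cost-loss users with loss L = 1, so cost equals the ratio C/L.
ratios = np.linspace(0.01, 0.99, 99)


def mean_expense(forecast):
    """Mean expected expense over all users when `forecast` is issued
    but the event actually occurs with probability p."""
    protect = forecast > ratios                  # users who pay to protect
    expense = np.where(protect, ratios, p * 1.0)  # cost, or expected loss p * L
    return np.mean(expense)


mean_honest = mean_expense(p)    # issue the true belief
mean_never = mean_expense(0.0)   # categorical "no"
mean_always = mean_expense(1.0)  # categorical "yes"

print(best_brier, best_linear, pod_always)
print(mean_honest, mean_never, mean_always)
```

Under these assumptions, the expected Brier score is optimized by forecasting the true belief, while the linear score is optimized by a categorical forecast of 0, which is the signature of an improper rule. Likewise, the always-warn strategy attains a perfect probability of detection regardless of belief, and either categorical forecast raises the mean expected expense of the hypothetical users relative to issuing the true belief.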