We acknowledge that many users may be sensitive to small temperature changes near certain critical temperatures (e.g., a travel decision when precipitation is forecast and the temperature is near freezing, or load forecasting for power companies, where small improvements can save tens of thousands of dollars).
It is important to note that none of our comments about the performance of human forecasting should be interpreted in terms of performance relative to other NWS forecast offices. We believe that, in the context of a complete distributions-oriented verification program, intercomparison of office performance is a desirable thing. However, since that verification program does not exist, we cannot make those comparisons.
We note that the description of those bins, as given by the Regional Operations Manual Letter (NWS Southern Region Headquarters 1984), is ambiguous. While we have chosen to collect the forecasts in 1-5 °F, 6-10 °F, 11-15 °F, etc., bins, there is no guidance in the NWS Operations Manual as to the boundaries of the bins.
We note that the use of persistence as a baseline is one way of reducing the dimensionality of the verification problem. The observed range of high temperatures over the period was 17 °F to 103 °F. The dimensionality of the evaluation of one forecast system over that range would be (87*87) - 1 = 7568, while for the day-to-day changes, it is only (73*73) - 1 = 5328, a reduction of about 30%. Other methods of reducing the dimensionality by stratifying the results, such as departures from climatology, also exist.
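The arithmetic in this note can be checked with a short sketch. The function name and variable names are ours, introduced only for illustration; the one assumption is that the "- 1" reflects the constraint that the joint probabilities sum to one, so one entry is determined by the rest.

```python
def joint_dimensionality(n_forecast_values, n_observed_values):
    """Independent entries in a joint forecast-observation distribution.

    Assumption: one entry is lost because the joint probabilities sum to 1.
    """
    return n_forecast_values * n_observed_values - 1


# Observed highs from 17 to 103 degrees F inclusive, in 1-degree steps.
n_temps = 103 - 17 + 1
# Number of distinct day-to-day changes, taken directly from the text.
n_changes = 73

dim_raw = joint_dimensionality(n_temps, n_temps)
dim_change = joint_dimensionality(n_changes, n_changes)
reduction = 1 - dim_change / dim_raw

print(dim_raw, dim_change, round(100 * reduction))  # 7568 5328 30
```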
Note that in the tables of the conditional probability of observations given the forecasts (Table 4), comparisons between values must be done along a row, while for tables of the conditional probability of forecasts given the observations (Table 5), comparisons must be done along a column.
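The row and column rule follows from how each conditional table is normalized. The sketch below uses made-up counts and assumes a layout in which rows are forecast categories and columns are observed categories, which is consistent with the rule stated here but is not taken from Tables 4 and 5 themselves. Under that layout, p(x|f) sums to one along each row and p(f|x) sums to one down each column, so only values sharing a row (or a column, respectively) belong to the same distribution.

```python
# Hypothetical joint counts: rows = forecast categories, columns = observed categories.
counts = [
    [8, 2, 0],
    [3, 10, 3],
    [0, 2, 12],
]


def p_obs_given_fcst(counts):
    """Normalize each row by its total, giving p(x|f)."""
    return [[c / sum(row) for c in row] for row in counts]


def p_fcst_given_obs(counts):
    """Normalize each column by its total, giving p(f|x)."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[c / t for c, t in zip(row, col_totals)] for row in counts]


p_x_f = p_obs_given_fcst(counts)
p_f_x = p_fcst_given_obs(counts)

# Each row of p(x|f) and each column of p(f|x) is a complete distribution.
row_sums = [sum(row) for row in p_x_f]
col_sums = [sum(col) for col in zip(*p_f_x)]
```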
As discussed by Murphy et al. (1989), a model of the expected value of the forecast given a particular observation, E(f|x), can also be constructed. We have chosen to include only the model for E(x|f) here.
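As a rough illustration of the quantity involved, the sketch below computes an empirical E(x|f) as the mean observation for each distinct forecast value. This is only the simplest sample estimate, not the fitted model referred to in this note; the function name and the data are hypothetical. Swapping the two arguments gives the corresponding estimate of E(f|x).

```python
from collections import defaultdict


def conditional_mean(given, target):
    """Mean of `target` for each distinct value of `given`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, t in zip(given, target):
        sums[g] += t
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical forecast and observed day-to-day temperature changes (degrees F).
forecasts = [5, 5, 10, 10, 10, -5]
observations = [3, 7, 8, 12, 13, -4]

e_x_given_f = conditional_mean(forecasts, observations)  # E(x|f)
e_f_given_x = conditional_mean(observations, forecasts)  # E(f|x)
```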
As noted by Murphy (1993) and in the introduction here, forecasts take on value only by being used by someone. We use the term "value" qualitatively here, under the presumption that large (~10 °F) improvements in a temperature forecast will provide value for virtually all users.