## Assessing Probabilistic Forecasts Graphically

References

Mason, I., 1982:

Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press. (Chapter 7)

Wilson, L. J., 2000:

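| F | N | Precip | % |
|---|---|---|---|
| 0 | 175 | 3 | 1.7 |
| 5 | 1 | 0 | 0 |
| 10 | 48 | 4 | 8.3 |
| 20 | 37 | 5 | 13.5 |
| 30 | 20 | 4 | 20 |
| 40 | 10 | 4 | 40 |
| 50 | 10 | 5 | 50 |
| 60 | 13 | 5 | 38.5 |
| 70 | 13 | 7 | 53.8 |
| 80 | 3 | 3 | 100 |
| 90 | 2 | 1 | 50 |
| 100 | 6 | 3 | 50 |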

F is the forecast value in per cent, N is the number of forecasts made with that value, Precip is the number of times that precipitation occurred for that forecast and % is the frequency (in per cent) of precipitation for that forecast value.
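As a quick check on the last column, the observed frequency for each forecast value is simply Precip divided by N. The sketch below recomputes it from the counts in the table; the names `TABLE` and `observed_frequency` are illustrative and not part of the original material.

```python
# Counts from the table above: (forecast value in %, N, Precip)
TABLE = [
    (0, 175, 3), (5, 1, 0), (10, 48, 4), (20, 37, 5),
    (30, 20, 4), (40, 10, 4), (50, 10, 5), (60, 13, 5),
    (70, 13, 7), (80, 3, 3), (90, 2, 1), (100, 6, 3),
]


def observed_frequency(n, precip):
    """Percentage of the n forecasts that were followed by precipitation."""
    return 100.0 * precip / n


for f, n, precip in TABLE:
    print(f"{f:>3}% forecast: {precip}/{n} = {observed_frequency(n, precip):.1f}%")
```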

Two graphical displays of these data are useful. The first is the attributes diagram (discussed in Wilks), which focuses on the calibration-refinement decomposition of p(f,x). The second is the Relative (or Receiver) Operating Characteristics (ROC) curve, which focuses on the likelihood-base rate decomposition.
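For reference, the two decompositions of the joint distribution can be written out as follows, with f the forecast and x the observation as in the text. This is the standard form of the two factorizations; the labels in parentheses are added here for orientation.

```latex
\begin{align*}
p(f, x) &= p(x \mid f)\, p(f) && \text{(calibration-refinement)} \\
p(f, x) &= p(f \mid x)\, p(x) && \text{(likelihood-base rate)}
\end{align*}
```

The attributes diagram plots the first factor of the top line, p(x|f), against f.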

### Attributes Diagrams

The first step in creating an attributes diagram is to plot the observed frequency of the event for each forecast value. Formally, this is a plot of p(x|f) vs. f. (This step creates what is referred to as a "reliability diagram.") We can add several lines to the figure to aid interpretation.

The first is a diagonal line running from the bottom left (0,0) to the upper right (1,1, or 100%,100%). This is the line of perfect reliability. Points of the reliability curve that fall on this line are perfectly reliable (their contribution to the reliability term is zero), such as the 40% and 50% points in this case.

Second, we add horizontal and vertical lines at the sample climatology. In this case, the sample climatological frequency is 44/338 (the sum of the Precip column divided by the sum of the N column above), or 13%. The horizontal line is referred to as the no-resolution line. Points of the reliability curve that fall on this line have no resolution (their contribution to the resolution term is zero); the 20% forecast, with an observed frequency of 13.5%, lies almost exactly on it.

Third, we can draw a line halfway between the no-resolution and perfect-reliability lines. This is the "no-skill" line, because points on it contribute nothing to the Brier skill score. Points closer to the no-resolution line than to the perfect-reliability line (20%, 30%, 90%, and 100%, in this case) contribute negatively to the Brier skill score.

Finally, we often add a line, dashed in this case, to show the relative frequency of use of each forecast value. For example, the 0% forecast is used a little over 50% of the time (175 of 338 forecasts).
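The quantities behind these reference lines can be computed directly from the table. The sketch below assumes the usual reliability-resolution-uncertainty decomposition of the Brier score, with the sample climatology as the reference forecast. All variable and function names are illustrative. A forecast value counts as a negative contributor when its resolution term is smaller than its reliability term, which is the same as lying closer to the no-resolution line than to the diagonal.

```python
# Counts from the table above: (forecast value in %, N, Precip)
TABLE = [
    (0, 175, 3), (5, 1, 0), (10, 48, 4), (20, 37, 5),
    (30, 20, 4), (40, 10, 4), (50, 10, 5), (60, 13, 5),
    (70, 13, 7), (80, 3, 3), (90, 2, 1), (100, 6, 3),
]

total_n = sum(n for _, n, _ in TABLE)        # 338 forecasts in all
total_precip = sum(p for _, _, p in TABLE)   # 44 precipitation events
climatology = total_precip / total_n         # sample climatology, about 0.13


def no_skill_line(f):
    """Height of the no-skill line at forecast probability f (0 to 1):
    halfway between the diagonal (y = f) and the no-resolution line."""
    return 0.5 * (f + climatology)


reliability = 0.0   # weighted squared distance from the diagonal
resolution = 0.0    # weighted squared distance from the no-resolution line
negative = []       # forecast values that lower the skill score
usage = {}          # relative frequency of use of each forecast value

for f_pct, n, precip in TABLE:
    f = f_pct / 100
    obs = precip / n                          # observed frequency, p(x|f)
    rel_k = n * (f - obs) ** 2 / total_n
    res_k = n * (obs - climatology) ** 2 / total_n
    reliability += rel_k
    resolution += res_k
    usage[f_pct] = n / total_n
    if res_k < rel_k:
        negative.append(f_pct)

uncertainty = climatology * (1 - climatology)
bss = (resolution - reliability) / uncertainty  # skill relative to climatology

print("negative contributors:", negative)
print(f"usage of the 0% forecast: {usage[0]:.1%}")
print(f"Brier skill score vs. sample climatology: {bss:.2f}")
```

Plotting `obs` against `f`, together with the diagonal, the climatology lines, `no_skill_line`, and `usage`, would reproduce the elements of the diagram described above.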