Forward Into the Past, or How Did You Think Joe Randa Would Do?
by Harold Brooks
1) Rates of performance are the most important thing to evaluate. All of the methods give an estimate of playing time, but the crux of the forecasts is how well the rates are estimated. Implicitly, I assume that if Mark McGwire hits 60 HR in 600 PA, a forecast of 30 HR in 300 PA was a better forecast than one of 35 HR in 600 PA would have been.
2) I'm only going to worry about regular or semi-regular players. As a result, I'm limiting the evaluation only to those players who got 300 PA and were forecast to get 300 PA from all three of the methods. Note that this means I'm not evaluating first-year players or guys whose playing time was horribly missed. (That leaves me with 178 players for 1997.)
3) For consistency between the sources, I'm going to evaluate ten predictions for each player--BA, SLG, OBA [(H+BB)/(AB+BB)], OPS, the simplest version of runs created/out and the rates (per PA) of 1B, 2B, 3B, HR, and BB. The rates probably are the hardest test of any forecast system.
4) Forecasting the change from last year is at least as important as forecasting the exact magnitude of performance, if not more so. Predicting that Frank Thomas and Barry Bonds will be among the best offensive players in the league is not very challenging. Predicting that Gary Sheffield will fall off a cliff or that Sandy Alomar, Jr. will have a career year is a challenge.
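For concreteness, here's a sketch of how the ten evaluated quantities can be computed from ordinary counting stats. The player line at the bottom is hypothetical, PA is approximated as AB+BB to match the OBA definition above, and "simplest runs created" is taken as (H+BB)*TB/(AB+BB):

```python
def rates(ab, h, bb, b2, b3, hr):
    """Return the ten evaluated quantities for one player-season."""
    b1 = h - b2 - b3 - hr                  # singles
    tb = b1 + 2 * b2 + 3 * b3 + 4 * hr     # total bases
    pa = ab + bb                           # PA approximated as AB + BB
    outs = ab - h
    ba = h / ab
    slg = tb / ab
    oba = (h + bb) / pa                    # OBA as defined in point 3
    ops = oba + slg
    rc = (h + bb) * tb / pa                # simplest runs created
    per_pa = {k: v / pa for k, v in
              (("1B", b1), ("2B", b2), ("3B", b3), ("HR", hr), ("BB", bb))}
    return {"BA": ba, "SLG": slg, "OBA": oba, "OPS": ops,
            "RC/O": rc / outs, **per_pa}

# A hypothetical season: 165 hits and 70 walks in 550 AB
print(rates(ab=550, h=165, bb=70, b2=30, b3=3, hr=25))
```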
Here, I'll show only a few measures derived from the study. One that I want to do more with, but haven't yet, is a distance measure for the rates. If you square the errors of the forecasts, that gives you a 'distance' between the forecast and the observed performance. If you sum the squared errors for the five rates, that sum is the square of the 'distance' in 5-dimensional space. This is a fairly strict criterion, since if your forecast is really bad in any one of the five, you'll have a large distance. (To make things a little fairer, albeit harder still, I normalize each rate by the mean of the overall sample. Triples, for example, are so rare that no system will look awful at predicting their absolute rate, so I divide the predicted and observed rates of triples by the league-average rate and use those normalized values as the predicted variable.)
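The distance measure above can be sketched in a few lines. The sample means here are round numbers in the right neighborhood, and the forecast and observed lines are made up purely for illustration:

```python
import math

# Squared forecast errors in the five per-PA rates (1B, 2B, 3B, HR, BB),
# each rate first normalized by the sample mean of that rate so that a
# rare event like triples doesn't vanish from the metric.

def rate_distance(forecast, observed, sample_means):
    """Euclidean distance in normalized 5-D rate space."""
    total = 0.0
    for cat in ("1B", "2B", "3B", "HR", "BB"):
        f = forecast[cat] / sample_means[cat]
        o = observed[cat] / sample_means[cat]
        total += (f - o) ** 2
    return math.sqrt(total)

# Illustrative numbers only:
means = {"1B": .167, "2B": .049, "3B": .005, "HR": .030, "BB": .097}
fc    = {"1B": .170, "2B": .045, "3B": .004, "HR": .035, "BB": .090}
obs   = {"1B": .160, "2B": .055, "3B": .006, "HR": .025, "BB": .100}
print(rate_distance(fc, obs, means))
```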
Note, however, that STATS and Vlad are much better correlated with each other than either of them is correlated with the observed performance. That is not uncommon in prediction systems. For instance, numerical weather prediction models correlate better with each other than any of them do with the real atmosphere.
                                           |<----------Rates---------->|
Raw          BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
1996-1997   .494  .668  .665  .651  .639   .701  .403  .513  .810  .800
STATS-1997  .617  .757  .766  .753  .738   .696  .426  .610  .846  .804
Vlad-1997   .520  .703  .705  .703  .673   .619  .256  .558  .806  .794
S-V (1997)  .813  .882  .902  .891  .877   .774  .506  .789  .915  .910
Below, I show the correlations for year-to-year changes. The picture's pretty much the same as before, although it is important to note that many of the variables have lower correlations for the change than they had for the raw values. This indicates that a lot of the accuracy of the forecasts comes from the year-to-year consistency of players. There is one exception: doubles, where the change is better forecast than the absolute value. Whether that is important or not, I have no idea.
                                           |<----------Rates---------->|
Changes      BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .622  .649  .624  .647  .667   .515  .565  .631  .604  .395
Vlad        .479  .488  .438  .489  .452   .445  .354  .424  .449  .294
S-V         .758  .708  .697  .715  .681   .645  .560  .607  .656  .445
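A change correlation of this kind is just a plain Pearson r computed on (forecast minus last year) against (observed minus last year) across players. A minimal sketch, with illustrative batting averages rather than the study data:

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical five-player sample:
last   = [.280, .300, .265, .310, .290]   # 1996 BA
fcast  = [.285, .290, .270, .300, .295]   # forecast 1997 BA
actual = [.290, .285, .275, .305, .300]   # observed 1997 BA

fc_change  = [f - l for f, l in zip(fcast, last)]
obs_change = [a - l for a, l in zip(actual, last)]
print(pearson(fc_change, obs_change))
```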
                                           |<----------Rates---------->|
Mean         BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .274  .432  .346  .778  .209   .168  .045  .005  .029  .099
Vlad        .282  .456  .357  .813  .231   .168  .046  .005  .033  .104
1997        .277  .444  .347  .791  .218   .167  .049  .005  .030  .097

                                           |<----------Rates---------->|
St. Dev.     BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .021  .065  .033  .089  .052   .024  .008  .003  .015  .035
Vlad        .025  .078  .040  .108  .069   .025  .014  .005  .019  .043
1997        .028  .075  .039  .105  .063   .028  .012  .004  .018  .037
In the mean, STATS is a little light on everything except 1B and BB (the rounding minimizes the difference), while Vlad overforecasts home runs and walks. The differences in the standard deviation are much larger, however. STATS underestimates the variability between players. As a result, it is incapable of predicting really big seasons by anyone. To drive this point home, let's look at the high home run rate predictions for 1997 from STATS.
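One way to see why a too-small forecast standard deviation rules out big seasons: if a system regresses each player's rate toward the sample mean by a factor k, the spread of its forecasts shrinks by exactly that factor, and so does the distance of its top forecast from the pack. This is an illustration of the general effect, not a claim about how STATS actually works, and the rates are made up:

```python
import statistics

# Regressing each player's HR rate toward the sample mean by a factor k
# shrinks the forecast standard deviation by k, so no forecast can sit
# far above the rest of the field.

hr_rates = [.096, .080, .079, .063, .046, .030, .020, .012]  # hypothetical
mean = statistics.mean(hr_rates)

def regress(rate, k):
    """Pull a rate toward the sample mean; k=1 leaves it unchanged."""
    return mean + k * (rate - mean)

for k in (1.0, 0.5):
    fc = [regress(r, k) for r in hr_rates]
    print(k, round(statistics.stdev(fc), 4), round(max(fc), 3))
```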
Player      STATS  1996  Vlad  1997
Belle        .067  .068  .065  .044
Gonzalez     .066  .080  .089  .074
Bichette     .065  .046  .045  .044
Griffey      .062  .079  .080  .082
Thomas       .061  .063  .068  .055
McGwire      .060  .096  .076  .090
STATS forecast only five players to hit home runs at a rate greater than .060 per PA, with a peak of .067 for Albert Belle. This was a forecast of a huge falloff from 1996, when 22 players hit homers at greater than .060 per PA. Vlad forecast 15 players at or above .060, and 13 actually did it. Of the 23 players with the highest forecast home run rates from STATS, 19 were forecast to hit home runs less frequently than they had in 1996. As a result, it appears unlikely that STATS can provide any guidance in identifying candidates in the hunt for Roger Maris's record. This problem of pulling high performers back toward the mean is not as big for the other offensive categories, but it makes the STATS predictions less useful for home runs than for other variables.
Joe Randa
                                                |<----------Rates---------->|
Source       BA     SLG    OBA    OPS    RC/O    1B     2B     3B     HR     BB
1996        .3027  .4332  .3526  .7859  .2191  .1956  .0661  .0028  .0165  .0716
STATS       .2642  .3802  .3134  .6936  .1619  .1751  .0507  .0046  .0161  .0668
Vlad        .2864  .4850  .3283  .8132  .2231  .1522  .0804  .0043  .0326  .0587
1997        .3025  .4515  .3616  .8130  .2340  .1880  .0558  .0186  .0145  .0847
Well, STATS thought Joe would go into the tank after 1996. It's a curious prediction, given that Randa turned 27 in December of 1996, but it may be keying on his less-than-stellar 1995. Vlad, on the other hand, went for a big jump in the power numbers and a loss of walks. Except for an increase in triples, though, Randa's extra-base hits dropped off, while he posted his highest full-season walk rate since A ball back in '92.
Using the distance measure for the rates, Randa was STATS's worst forecast of the year, while Vlad missed only Fernando Vina worse. But look at that OPS forecast from Vlad! It was one of the three OPS forecasts with errors that round to .000 (Dan Wilson and Dante Bichette are the other two). The forecast overall wasn't very good, but the errors cancelled out in such a way as to make a summary measure of performance look wonderful. It's a little like my putting--you can count on me misreading the green pretty often and mis-hitting the putt pretty often. Every once in a while, though, the two cancel out, I sink a 30-footer, and people think I can putt.
I'll be back soon with a look at the great and not-so-great predictions by STATS and Vlad, in an attempt to see if there's any particular kind of player who causes the systems problems. I'll also offer some ideas on how the user can make the most out of having information from two different systems.