Forward Into the Past, or How Did You Think Joe Randa Would Do?
by Harold Brooks
1) Rates of performance are the most important thing to evaluate. All of the methods give an estimate of playing time, but the crux of the forecasts is how well the rates are estimated. Implicitly, I assume that if Mark McGwire hits 60 HR in 600 PA, a forecast of 30 HR in 300 PA was a better forecast than one of 35 HR in 600 PA would have been.
2) I'm only going to worry about regular or semi-regular players. As a result, I'm limiting the evaluation only to those players who got 300 PA and were forecast to get 300 PA from all three of the methods. Note that this means I'm not evaluating first-year players or guys whose playing time was horribly missed. (That leaves me with 178 players for 1997.)
3) For consistency between the sources, I'm going to evaluate ten predictions for each player--BA, SLG, OBA [(H+BB)/(AB+BB)], OPS, the simplest version of runs created/out and the rates (per PA) of 1B, 2B, 3B, HR, and BB. The rates probably are the hardest test of any forecast system.
4) Forecasting the change from last year is at least as important as forecasting the exact magnitude of performance, if not more so. Predicting that Frank Thomas and Barry Bonds will be among the best offensive players in the league is not very challenging. Predicting that Gary Sheffield will fall off a cliff or that Sandy Alomar, Jr. will have a career year is a challenge.
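For concreteness, here's a sketch of how the ten evaluated quantities can be computed from ordinary counting stats. The player line at the bottom is hypothetical, PA is approximated as AB+BB to match the OBA definition above, and "simplest runs created" is taken as (H+BB)*TB/(AB+BB):

```python
def rates(ab, h, bb, b2, b3, hr):
    """Return the ten evaluated quantities for one player-season."""
    b1 = h - b2 - b3 - hr                  # singles
    tb = b1 + 2 * b2 + 3 * b3 + 4 * hr     # total bases
    pa = ab + bb                           # PA approximated as AB + BB
    outs = ab - h
    ba = h / ab
    slg = tb / ab
    oba = (h + bb) / pa                    # OBA as defined in point 3
    ops = oba + slg
    rc = (h + bb) * tb / pa                # simplest runs created
    per_pa = {k: v / pa for k, v in
              (("1B", b1), ("2B", b2), ("3B", b3), ("HR", hr), ("BB", bb))}
    return {"BA": ba, "SLG": slg, "OBA": oba, "OPS": ops,
            "RC/O": rc / outs, **per_pa}

# A hypothetical season: 165 hits and 70 walks in 550 AB
print(rates(ab=550, h=165, bb=70, b2=30, b3=3, hr=25))
```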
Here, I'll show only a few measures derived from the study. One that I want to do more with, but haven't yet, is a distance measure for the rates. If you square the errors of the forecasts, that gives you a 'distance' between the forecast and the observed performance. If you sum the squared errors for the five rates, that sum is the square of the 'distance' in 5-dimensional space. This is a fairly strict criterion, since if your forecast is really bad in any one of the five, you'll have a large distance. (To make things a little fairer, albeit harder still, I normalize each rate by the mean of the overall sample. Triples, for example, are so rare that no system will look awful at predicting their absolute rate, so I divide the predicted and observed rates of triples by the league-average rate and use those normalized values as the predicted variable.)
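The distance measure above can be sketched in a few lines. The sample means here are round numbers in the right neighborhood, and the forecast and observed lines are made up purely for illustration:

```python
import math

# Squared forecast errors in the five per-PA rates (1B, 2B, 3B, HR, BB),
# each rate first normalized by the sample mean of that rate so that a
# rare event like triples doesn't vanish from the metric.

def rate_distance(forecast, observed, sample_means):
    """Euclidean distance in normalized 5-D rate space."""
    total = 0.0
    for cat in ("1B", "2B", "3B", "HR", "BB"):
        f = forecast[cat] / sample_means[cat]
        o = observed[cat] / sample_means[cat]
        total += (f - o) ** 2
    return math.sqrt(total)

# Illustrative numbers only:
means = {"1B": .167, "2B": .049, "3B": .005, "HR": .030, "BB": .097}
fc    = {"1B": .170, "2B": .045, "3B": .004, "HR": .035, "BB": .090}
obs   = {"1B": .160, "2B": .055, "3B": .006, "HR": .025, "BB": .100}
print(rate_distance(fc, obs, means))
```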
Note, however, that STATS and Vlad are much better correlated with each other than either of them is correlated with the observed performance. That is not uncommon in prediction systems. For instance, numerical weather prediction models correlate better with each other than any of them do with the real atmosphere.
                                           |<----------Rates---------->|
Raw          BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
1996-1997   .494  .668  .665  .651  .639   .701  .403  .513  .810  .800
STATS-1997  .617  .757  .766  .753  .738   .696  .426  .610  .846  .804
Vlad-1997   .520  .703  .705  .703  .673   .619  .256  .558  .806  .794
S-V (1997)  .813  .882  .902  .891  .877   .774  .506  .789  .915  .910
Below, I show the correlations for year-to-year changes. The picture's pretty much the same as before, although it is important to note that many of the variables have lower correlations for the change than they had for the raw values. This indicates that a lot of the accuracy of the forecasts comes from the year-to-year consistency of players. There is one exception: doubles, where the change is better forecast than the absolute value. Whether that is important or not, I have no idea.
                                           |<----------Rates---------->|
Changes      BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .622  .649  .624  .647  .667   .515  .565  .631  .604  .395
Vlad        .479  .488  .438  .489  .452   .445  .354  .424  .449  .294
S-V         .758  .708  .697  .715  .681   .645  .560  .607  .656  .445
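A change correlation of this kind is just a plain Pearson r computed on (forecast minus last year) against (observed minus last year) across players. A minimal sketch, with illustrative batting averages rather than the study data:

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical five-player sample:
last   = [.280, .300, .265, .310, .290]   # 1996 BA
fcast  = [.285, .290, .270, .300, .295]   # forecast 1997 BA
actual = [.290, .285, .275, .305, .300]   # observed 1997 BA

fc_change  = [f - l for f, l in zip(fcast, last)]
obs_change = [a - l for a, l in zip(actual, last)]
print(pearson(fc_change, obs_change))
```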
                                           |<----------Rates---------->|
Mean         BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .274  .432  .346  .778  .209   .168  .045  .005  .029  .099
Vlad        .282  .456  .357  .813  .231   .168  .046  .005  .033  .104
1997        .277  .444  .347  .791  .218   .167  .049  .005  .030  .097

                                           |<----------Rates---------->|
St. Dev.     BA    SLG   OBA   OPS   RC/O   1B    2B    3B    HR    BB
STATS       .021  .065  .033  .089  .052   .024  .008  .003  .015  .035
Vlad        .025  .078  .040  .108  .069   .025  .014  .005  .019  .043
1997        .028  .075  .039  .105  .063   .028  .012  .004  .018  .037
In the mean, STATS is a little light on everything except 1B and BB (the rounding minimizes the difference), while Vlad overforecasts home runs and walks. The differences in the standard deviation are much larger, however. STATS underestimates the variability between players. As a result, it is incapable of predicting really big seasons by anyone. To drive this point home, let's look at the high home run rate predictions for 1997 from STATS.
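One way to see why a too-small forecast standard deviation rules out big seasons: if a system regresses each player's rate toward the sample mean by a factor k, the spread of its forecasts shrinks by exactly that factor, and so does the distance of its top forecast from the pack. This is an illustration of the general effect, not a claim about how STATS actually works, and the rates are made up:

```python
import statistics

# Regressing each player's HR rate toward the sample mean by a factor k
# shrinks the forecast standard deviation by k, so no forecast can sit
# far above the rest of the field.

hr_rates = [.096, .080, .079, .063, .046, .030, .020, .012]  # hypothetical
mean = statistics.mean(hr_rates)

def regress(rate, k):
    """Pull a rate toward the sample mean; k=1 leaves it unchanged."""
    return mean + k * (rate - mean)

for k in (1.0, 0.5):
    fc = [regress(r, k) for r in hr_rates]
    print(k, round(statistics.stdev(fc), 4), round(max(fc), 3))
```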
Player      STATS  1996  Vlad  1997
Belle        .067  .068  .065  .044
Gonzalez     .066  .080  .089  .074
Bichette     .065  .046  .045  .044
Griffey      .062  .079  .080  .082
Thomas       .061  .063  .068  .055
McGwire      .060  .096  .076  .090
STATS forecast only five players to hit home runs at a rate greater than .060 per PA, with a peak of .067 for Albert Belle. This was a forecast of a huge falloff from 1996, when 22 players hit homers at greater than .060 per PA. Vlad forecast 15 players at or above .060, and 13 actually did it. Of the 23 players with the highest forecast home run rates from STATS, 19 were forecast to hit home runs less frequently than they had in 1996. As a result, it appears unlikely that STATS can provide any guidance in identifying candidates in the hunt for Roger Maris's record. This problem of pulling high performers back toward the mean is not as big for the other offensive categories, but it makes the STATS predictions less useful for home runs than for other variables.
Joe Randa
                                                |<----------Rates---------->|
Source       BA     SLG    OBA    OPS    RC/O    1B     2B     3B     HR     BB
1996        .3027  .4332  .3526  .7859  .2191  .1956  .0661  .0028  .0165  .0716
STATS       .2642  .3802  .3134  .6936  .1619  .1751  .0507  .0046  .0161  .0668
Vlad        .2864  .4850  .3283  .8132  .2231  .1522  .0804  .0043  .0326  .0587
1997        .3025  .4515  .3616  .8130  .2340  .1880  .0558  .0186  .0145  .0847
Well, STATS thought Joe would go into the tank after 1996. It's a curious prediction, given that Randa turned 27 in December of 1996, but it may be keying on his less-than-stellar 1995. Vlad, on the other hand, went for a big jump in the power numbers and a loss of walks. Except for an increase in triples, though, Randa's extra-base hits dropped off, while he posted his highest full-season walk rate since A ball back in '92.
Using the distance measure for the rates, Randa was STATS's worst forecast of the year, while Vlad missed only Fernando Vina worse. But look at that OPS forecast from Vlad! It was one of the three OPS forecasts with errors that round to .000 (Dan Wilson and Dante Bichette are the other two). The forecast overall wasn't very good, but the errors cancelled out in such a way as to make a summary measure of performance look wonderful. It's a little like my putting--you can count on me misreading the green pretty often and mis-hitting the putt pretty often. Every once in a while, though, the two cancel out, I sink a 30-footer, and people think I can putt.
I'll be back soon with a look at the great and not-so-great predictions by STATS and Vlad, in an attempt to see if there's any particular kind of player who causes the systems problems. I'll also offer some ideas on how the user can make the most out of having information from two different systems.