Johnny B. Goode
Controlling the run game, pitcher fielding and ERA
Run & Glove
Johnny Cueto has been mocking his peripherals ever since his big league debut. For the most part FIP serves as a terrific gauge for pitcher performance, but in 2011 Cueto made FIP look like a heart monitor trying to explain the weather. On what most consider a separate note, base runners have a healthy and robust fear of Cueto’s pickoff move, which is one of the best in the show.
FIP measures outcomes a pitcher can control (home runs, walks and strikeouts) and chalks the rest up to random variation. Studies have shown that stolen bases contribute relatively little to run creation and perhaps on that basis the ability to control the run game has generally been ignored or deemed overrated.
It is difficult, however, to ignore the six runs Cueto saved the Reds via his contributions to controlling the run game in 2012. By contrast, A.J. Burnett’s inability to control runners cost the Pirates four runs. The typical scale is that 10 runs amount to one team win – and teams will pay about $5 million per win.
Acknowledging run game control cannot fully explain how Cueto has routinely outperformed his peripherals, just as it cannot wholly explain Pittsburgh’s inability to keep pace with Cincinnati in the NL Central last season. It does, however, get us closer.
Incorporating a pitcher’s fielding ability proved comparably important in explaining and predicting performance. Here we’ll turn to Mark Buehrle, whose glove has saved four runs per year since 2004; among fellow hurlers, the fast-working lefty has been one of the decade’s most steadily superb fielders. FIP underestimated Buehrle in eight of the past nine seasons, slighting his ERA by an average of .30 per year over that span.
Numbers
The numbers indicate that a pitcher’s defense and ability to control the run game should both be considered in assessing and forecasting the pitcher’s value.
Focusing on seasons in which pitchers hurled 100 same-league (AL or NL) innings from 2003-2012 (n=1400), I ran a multiple linear regression to create a formula (“MBRA”) incorporating run control (rSB) and pitcher fielding (rPM) on top of line drive and infield fly ball percentages (credit to BABIP guru Steve Staude) and a regressed take on FIP.
MBRA = (55.25*HR + 14.05*BB – 8.57*K)/TBF – .041*rPM – .056*rSB + (5.71*LD – 8.27*IFFB)/(LD+GB+FB) + 2.34
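To make the formula concrete, here is a minimal Python sketch that evaluates it term by term. It assumes every input is a season total (LD, GB, FB and IFFB as batted-ball counts, rPM and rSB in runs); the function name, argument names and the sample stat line are hypothetical and do not describe a real pitcher-season.

```python
def mbra(hr, bb, k, tbf, rpm, rsb, ld, gb, fb, iffb):
    """Same-season ERA estimate using the MBRA coefficients quoted above.

    Assumption: all arguments are season totals, with rpm/rsb in runs saved.
    """
    batted_balls = ld + gb + fb  # denominator for the LD and IFFB terms
    fip_part = (55.25 * hr + 14.05 * bb - 8.57 * k) / tbf
    glove_and_run_game = -0.041 * rpm - 0.056 * rsb
    batted_ball_part = (5.71 * ld - 8.27 * iffb) / batted_balls
    return fip_part + glove_and_run_game + batted_ball_part + 2.34


# Hypothetical stat line, for illustration only
example = mbra(hr=20, bb=50, k=150, tbf=800, rpm=4, rsb=2,
               ld=120, gb=260, fb=200, iffb=20)
print(round(example, 2))
```

Because rPM and rSB carry negative coefficients, each additional run saved with the glove or against the run game lowers the estimate.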
Estimator | Correlation | Mean Absolute Error
MBRA | .7750 | .4570
FIP | .7647 | .4697
BERA | .7477 | .4922
tERA | .7472 | .5616
MBRAT | .7216 | .5394
xFIP | .6451 | .5649
SIERA | .6290 | .5768
MBRA is engineered to properly credit pitchers who can field and control the run game. When I subtracted MBRA from FIP to locate the pitcher-seasons that benefited most from my formula, I was encouraged to see Buehrle show up twice in the top ten and five times in the top 100 (again, that is out of 1400).
Next, I looked at seasons in which pitchers threw 100 same-league innings in consecutive seasons from 2003 to 2012 (n=791). This time I ran a regression to create a model suited to predict a pitcher’s ERA based on his previous year’s statistics.
MBRAT = (20.12*HR + 7.13*BB – 6.7*K)/TBF – .025*rPM – .034*rSB + 2.37*ZC% + 2.22
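As a rough illustration of how the forward-looking formula would be applied, the sketch below plugs one season of statistics into MBRAT to project the following year's ERA. It assumes ZC% is entered as a fraction (0.88 rather than 88), which is a guess based on the size of the coefficient and is not stated in the article; the function name and sample inputs are hypothetical.

```python
def mbrat(hr, bb, k, tbf, rpm, rsb, zc_frac):
    """Next-season ERA estimate using the MBRAT coefficients quoted above.

    Assumption: zc_frac is ZC% expressed as a fraction between 0 and 1.
    """
    fip_part = (20.12 * hr + 7.13 * bb - 6.7 * k) / tbf
    glove_and_run_game = -0.025 * rpm - 0.034 * rsb
    return fip_part + glove_and_run_game + 2.37 * zc_frac + 2.22


# Hypothetical prior-season stat line, for illustration only
projection = mbrat(hr=20, bb=50, k=150, tbf=800, rpm=4, rsb=2, zc_frac=0.88)
print(round(projection, 2))
```

The coefficients on every term are smaller than in MBRA, which is what one would expect from a formula that regresses the prior season toward the mean.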
Estimator | Correlation | Mean Absolute Error
MBRAT | .4526 | .6498
SBERA | .4398 | .6582
BERA | .4347 | .6634
xFIP | .4220 | .6803
MBRA | .4198 | .6987
FIP | .4162 | .7024
ERA | .3630 | .7920
MBRAT stands tall on the lofty pinnacle of public forward-looking ERA estimators, and if you factor in the percentage at which pitchers throw over the edge of the plate (EDGE%) its correlation jumps even higher (.4621). Unfortunately, I only have EDGE% data from 2008 to 2012 (n=362) and cannot yet justify its inclusion.
On Deck
I will create expectations for pitchers with fewer innings pitched and convert my findings to a WAR measure that may serve as a middle-ground between fWAR and rWAR. I also stumbled on a potentially significant relationship between pick-off attempts and strand rates that may work its way into future formulas.
Cleveland native, University of Minnesota Law School alum, grand slam enthusiast and the last of the hardcore troubadours.
Fascinating article, thanks for putting this together.
You need to test whether the components you are adding in (rPM, rSB, etc.) are significant. Obviously the more factors you throw in the model the higher the overall correlation. However, if they’re not adding significant predictive value then there is no reason to put them in your model. When you keep adding predictors you’ll artificially inflate the correlation without actually doing a better job of explaining or predicting. For instance, if you look at the start of your MBRAT, it looks remarkably like my pFIP equation (http://www.hardballtimes.com/main/article/more-on-standard-deviation-and-era-estimators/):
(20*HR + 10*BB – 10*K)/TBF + 4.60
And I’m sure the correlation between pFIP and MBRAT would be very close despite using way fewer factors.
You would also need to test this equation out of sample, as you’re actually explaining future runs with MBRAT rather than predicting them. If you use the entire population and do not test outside of it, you’ll risk overfitting your model.
Glenn is right. Overfit is a risk if you use an explanatory formula for prediction. I’m also cautious of using LD%, which is more volatile than GB%.
I may have come off as a little too harsh in my original comment. I really do like the idea. Cueto’s ability to keep runners from stealing is actually absurd. No one runs on him, he picks people off and when they run they get caught. I do believe he deserves credit for that in his run prevention ability.
The same can be said for Buehrle’s defense. Defensive Runs Saved for pitchers is a very real thing and they also deserve credit for it. I think MBRAT is a really good starting point; I just think the overall model could be improved if you split the sample instead of using the entirety of 2003-12.
Keep up the good work!
Hey Glenn, I appreciate the feedback and I am very fond of your work. I essentially threw the kitchen sink into MBRA/MBRAT and then stripped my findings of insignificant variables. Each variable in MBRA boasted a p-value well under .05 in explaining the current year, and the formula had the lowest AIC among its competitors. I was careful not to overfit the model.
Everything in MBRA/MBRAT should look familiar (regressed take on FIP + Steve Staude’s BABIP findings), save for the run control/fielding considerations, which were explanatory staples in each formula. For example, rSB’s p-value was .01 when used to “predict” a pitcher’s ERA based on the preceding season’s statistics.
I would like to test the equations out of sample, but I only had access to a decade of rSB/rPM data and I wanted to milk it for all it was worth. I would split the data, but I don’t think I can get enough statistical footing in a sample size of 4-5 seasons (which is why I excluded EDGE%, even though it had a .008 p-value and was outwardly as significant as HR/TBF in “predicting” the following year’s ERA).
Again, I appreciate the feedback. I am relatively new to this field and it is just as much fun as I suspected it would be.
Brilliant work Mr. Greenlee. As the poster boy for FIP overachievers, it is perhaps no coincidence that I was also one of the best fielders to ever take the mound.
Greenlee,
Again, great work. I figured that you had found p-values below .05 for the predictors that you kept; I just was not sure because you didn’t explicitly say so in the piece.
You still run the risk of an overfit even if you only have significant predictors, though. The weights for the coefficients of all of the estimators you tested (well, at least of SIERA, xFIP and FIP) were tested out of sample rather than being fit specifically for this sample, which inflates MBRAT’s correlation.
I know you would like to use as much of the sample as possible, but I can almost promise you that using 4-5 seasons will be a big enough sample to find similar weights, and that your six predictors will still be significant in those seasons. Once you do that you can test the correlation on the other four or five seasons. If MBRAT comes out ahead of SIERA, xFIP, FIP, pFIP, etc., then you’ll know that this new estimator may, in fact, be the way to go.
I really like the first effort and, like I said, a pitcher’s defense and ability to control the running game do have an effect on their ERA and future ERA.
Keep up the good work
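For readers who want to try the split-sample check described above, here is a minimal sketch using NumPy. It fits ordinary least squares on 2003-2007 and scores the fit on 2008-2012 with the same two yardsticks the article reports (correlation and mean absolute error). The data are randomly generated placeholders, not real pitcher-seasons, and the helper names, the three stand-in predictors and the 2007 cutoff are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def score(y_true, y_pred):
    """Return (Pearson correlation, mean absolute error)."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mae = np.mean(np.abs(y_true - y_pred))
    return r, mae


# Synthetic placeholder data: NOT real pitcher-seasons
rng = np.random.default_rng(0)
n = 800
season = rng.integers(2003, 2013, size=n)  # seasons 2003 through 2012
X = rng.normal(size=(n, 3))                # stand-ins for prior-year predictors
y = 4.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.7, size=n)

train = season <= 2007                     # fit the weights on the first half
test = ~train                              # hold out the second half

coef = fit_ols(X[train], y[train])
r_in, mae_in = score(y[train], predict(coef, X[train]))
r_out, mae_out = score(y[test], predict(coef, X[test]))
print(f"in-sample:     r={r_in:.3f}  MAE={mae_in:.3f}")
print(f"out-of-sample: r={r_out:.3f}  MAE={mae_out:.3f}")
```

With real data, the comparison that matters is the out-of-sample line set against the published estimators scored on the same held-out seasons.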
Very cool article, I’ve actually been hoping to see something like this for a while. Glenn, I disagree with your criticism to a certain extent. The author’s correlation will certainly be higher than the competing formulas if he incorporates their variables into his data-specific regression, which he has. Yet, so long as the author’s contributions are significant (rPM & rSB) there is added value beyond the “artificial inflation.” I think it boils down to — which time frame do you want to use to explain or predict data? If you want to use the last decade to do so, I would use the author’s formula for its advantage in considering pitcher fielding and run control. Nice work.
MBRA is named after Mark Buehrle then I take it? Very solid article. Eager to see if edge% can join pitcher-fielding and steal-prevention in future models.
I think this is a great idea. Why not just make a rSB or rPM adjustment to kwERA, FIP, xFIP, BERA or SIERA though? Is there reason to think that there’s an association between the stats in those metrics and defense or stolen base prevention such that they already take them into account in some accidental or indirect way? Maybe there is, I’m just posing the question.
Great article! Loved your analysis.
I really appreciate the feedback everyone.
Jared, I’m a big Steamers fan, thanks for the read. I would incorporate rSB and rPM into any existing metric that lacks them (unless simplicity reigns supreme). ERA estimators deal in many common ingredients, and I suspect the coefficients MBRA advances will evolve or be trumped. Any unique and material value is housed in the added ingredients; I simply preferred to select my own supporting cast (regressed FIP + select BABIP measures).
Glenn, it may help to know that I did not aim to uproot or refine the coefficients attached to familiar variables, and did not include pFIP simply because it would have been bested by artificial inflation alone. I developed MBRA and MBRAT in part to promote the inclusion of rSB and rPM (and potentially edge%) in the likes of pFIP, BERA, etc.
Thanks again for all the feedback. Below are some figures associated with the first MBRA chart in the article (2003-2012).
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)   2.337790  0.170431     13.717  < 2e-16  ***
HRpTBF       55.250151  2.111286     26.169  < 2e-16  ***
BBpTBF       14.046871  0.742663     18.914  < 2e-16  ***
kpTBF        -8.574038  0.373300    -22.968  < 2e-16  ***
rPM          -0.041436  0.006966     -5.949  3.41e-09 ***
rSB          -0.056300  0.010261     -5.487  4.85e-08 ***
LDpTOH        5.707936  0.640496      8.912  < 2e-16  ***
IFFBTOH      -8.265642  1.140658     -7.246  7.07e-13 ***
Awesome work Mr. Greenlee! Glad I stumbled upon this before my fantasy draft. This looks even better than the predictor I used last season. It just may provide me with the edge I need to defend my championship.
Wow, great writing and outstanding research! Thanks for the shout-outs, by the way.
Anyway, were you able to find many details on what goes into rSB and rPM? I’m just wondering if they actually include run data within them, which would be a bit circular (unless you’re just using them to guess the next year’s ERA).
Thank you Steve, and that is a great question. Rather than actual run data (in terms of results), the Fielding Bible uses a run expectancy matrix to determine rSB (http://www.fieldingbible.com/methodology-pitchersb.asp) and rPM (http://www.fieldingbible.com/methodology-plusminus.asp), so I don’t think it props up MBRA any more than, say, home runs allowed.
I also would not be surprised if the correlation between rSB and ERA runs deeper than rigid run expectancies. For example, if a pitcher is struggling to control runners he may be lured into throwing more fastballs with men on base, resulting in fewer double plays. This (along with other peripheral possibilities) is something I would like to examine in the future.
Very nice — great idea to include those.
Just double checking a couple of your numbers, I found that the same-year correlations for FIP averaged only 0.716, while BERA’s were 0.741 (though I didn’t go so far as to only look at same-league numbers there). FBERA was at 0.756, by the way. So your BERA sounds about right, but would you mind re-checking the 0.7647 you got for FIP?
Ya know, I was just as skeptical of FIP’s success here Steve, but I just re-crunched the numbers and was faced with the same results. The same-league requirement excluded more seasons than I first anticipated, and perhaps FIP’s coefficients struck gold (or were FF-propelled), but if you email me at gree1340@umn.edu I would be very happy to send over a spreadsheet. And FBERA belonged to a slew of formulas I would have included, but I was so impressed with your work that I recruited most of your variables and just snagged what struck me as your two flagship metrics (because with common variables “artificial inflation” was going to carry the day regardless). Granted, for the article’s purposes, FBERA probably would have made the most sense to include.
To Greenlee
Fantastic article. Definitely want to use MBRA for my fantasy league. I just thought I’d make a quick suggestion. Have you considered an interaction term between GB% and pitcher fielding? It would seem to me that the more ground balls a pitcher produces, the more likely he is to get fielding chances, and thus his fielding would increase in importance for predictions.
Thanks beckett, and I appreciate the suggestion. I was not, however, able to squeeze much out of GB%, and I’m afraid the use of a GB% * rPM interaction variable (“rpGBean” below) was no exception.
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)   2.524621  0.224417     11.250  < 2e-16  ***
HRpTBF       53.962114  2.341566     23.045  < 2e-16  ***
BBpTBF       13.956992  0.746607     18.694  < 2e-16  ***
kpTBF        -8.649608  0.377968    -22.885  < 2e-16  ***
rPMean       -0.037815  0.007286     -5.190  2.41e-07 ***
rSB          -0.057140  0.010264     -5.567  3.11e-08 ***
LDpTOH        5.280521  0.709091      7.447  1.67e-13 ***
IFFBTOH      -9.698939  1.608508     -6.030  2.10e-09 ***
GBMean       -0.502867  0.407177     -1.235  0.217
rpGBean      -0.140948  0.099912     -1.411  0.159
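For anyone curious how an interaction variable like this is typically built, here is a small NumPy sketch. The variable names in the output above hint that GB% and rPM were mean-centered before being multiplied, but that is only a guess, not something stated in this thread. The helper name and the five sample values are hypothetical placeholders.

```python
import numpy as np


def centered_interaction(gb_pct, rpm):
    """Mean-center both inputs, then multiply them to form an interaction column.

    Assumption: centering is used so the main effects stay interpretable
    at the sample average; the thread does not confirm this.
    """
    gb_centered = gb_pct - gb_pct.mean()
    rpm_centered = rpm - rpm.mean()
    return gb_centered, rpm_centered, gb_centered * rpm_centered


# Placeholder values, not real pitcher-seasons
gb_pct = np.array([0.38, 0.45, 0.52, 0.41, 0.49])
rpm = np.array([-2.0, 1.0, 4.0, 0.0, 3.0])

gb_c, rpm_c, interaction = centered_interaction(gb_pct, rpm)
extra_columns = np.column_stack([gb_c, rpm_c, interaction])
print(extra_columns.shape)
```

These three columns would then be appended to the other predictors before refitting; a large p-value on the interaction column, as in the output above, suggests it adds little.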
This is sweet. Do you think ZiPS, Steamers and PECOTA will pick up on this?
FYI, everybody, I was off in my FIP calculation, somehow, and Greenlee’s looks about right. It does make a big difference whether you’re looking at a group of each player’s individual seasons or all of those seasons averaged together, though — FIP beats BERA in the former but not the latter (FBERA is tied in the former, and even better in the latter).
You know MBRA and MBRAT are not really defense-independent like their competitors here, but I think only accounting for a pitcher’s defense makes perfect sense. I think the category should be redefined as something like “non-pitcher-defense-independent”…
Excellent stuff, glad I have been witness to the great work coming out of community research over the last couple months
Thanks Bill (but not at all Ted). And JakeZilla, I’m afraid I couldn’t tell you. Steve’s IFFB/ZC findings really took off when they made their way to the FG forefront. Projection systems strive to be as accurate as possible, so if rSB/rPM/EDGE truly have legs they should ultimately make their way there. I have no doubt that the concept will be embraced, but projection systems may wait on a fielding metric that boasts more transparency/objectivity than rSB/rPM.
And Steve — you da man. “NPDIPs” slides off the tongue quite nicely.
One more reason to steer clear of AJ Burnett. Cool idea, it definitely makes sense for pitchers to be held accountable for their defense.
Just curious, at what quantities do edge%, rPM, rSB, and ZC% stabilize? How consistent are they year to year?
Hey JDM, the year-to-year correlation was .462 for rSB and .354 for rPM in pitchers who met the threshold (100 same-league IP, n=791). I may reorganize the data to gauge stability against a multi-season backdrop as well. Year-to-year correlations for ZC% and neighboring stats (but not edge) can be found at the bottom of Steve’s first BABIP article (http://www.fangraphs.com/community/index.php/proejcting-babip-using-batted-ball-data/).