A player comes up to the plate. He’s a very good hitter; he’s hitting .300 on the year and has 40 home runs. On the mound stands a pitcher, also very good. The pitcher is a Cy Young candidate, and his ERA sits barely over 2.00. He leads the league in strikeouts and issues very few walks.
After a 10-pitch battle, the pitcher is the one to crack and the batter slaps a hanging curveball into the gap for a double. The batter has won. His batting average for the at bat is a very nice 1.000. Same for his OBP. His slugging percentage? 2.000. Fantastic. If he did this every time, he’d be MVP, no question, every year. The pitcher, meanwhile, has a WHIP for the at bat of #DIV/0!. Hasn’t even recorded a single out. His ERA is the same. He’s not doing too great. But let’s be fair. We’ll give him the benefit of the doubt, since we know he’s a good pitcher – we’ll pretend he recorded one out before this happened. Now his WHIP is 3.000. Yeesh – ugly. If he keeps pitching like this, his ERA will climb, too, since double after double after double is sure to drive every previous runner home.
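If you want to sanity-check that WHIP arithmetic, it only takes a couple of lines. This is just a minimal sketch, not part of any official stat tooling:

```python
def whip(hits, walks, outs):
    """WHIP = (hits + walks) / innings pitched, with innings = outs / 3."""
    if outs == 0:
        raise ZeroDivisionError("no outs recorded -- WHIP is undefined (#DIV/0!)")
    return (hits + walks) / (outs / 3)

# The scenario above: one double allowed, no walks, one charitable out.
print(whip(hits=1, walks=0, outs=1))  # ≈ 3.0
```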
Now, obviously, this is a bit ridiculous. Not every at bat is the same. The hitter won’t double every single at bat, and the pitcher won’t allow a double every time either. Baseball is a game of random variation, skill, luck, quality of opponents and teammates, and a whole bunch of other elements. In our scenario, all those elements came together to result in a two-bagger. But, like we said, you can’t expect that to happen every single time just because it happens once.
So… how do we predict what will happen in an at bat? Any person well-versed in baseball research knows that past performance against a specific batter or pitcher means little in terms of how the next at bat will turn out, at least not until you get a meaningful number of plate appearances – and even then it’s not the best tool.
Of course, if we knew the result of every at bat before it happened, it would take most of the fun out of watching. But we’re never going to be able to do that, and so we might as well try to predict as best we can. And so I have come up with a methodology for doing so that I think is very accurate and reliable, and this post is meant to present it to you.
To claim full credit for the inspiration behind this idea would be wrong; FanGraphs author and baseball-statistics aficionado Steve Staude wrote an article back in June 2013 aiming to predict the probability of a strikeout given both the batter’s and the pitcher’s strikeout rates, which led me to this topic. In that article he found a very consistent (and good) model that predicted strikeouts:
Expected Matchup K% = B x P / (0.84 x B x P + 0.16)
Where B = the batter’s historical K% against the handedness of the pitcher; and P = the pitcher’s historical K% against the handedness of the batter
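In code form (with B and P as decimals, e.g. 0.20 for 20%), the formula looks like this:

```python
def expected_matchup_k(b, p):
    """Steve Staude's model: Expected Matchup K% = B*P / (0.84*B*P + 0.16)."""
    return b * p / (0.84 * b * p + 0.16)

# Two 20% strikeout rates meet just a shade above 20%:
print(round(expected_matchup_k(0.20, 0.20), 4))  # 0.2066
```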
He then followed that up with another article that provided an interactive tool that you could play around with to get the expected K% for a matchup of your choosing and introduced a few new formulas (mostly suggested in the comments of his first article) to provide different perspectives. It’s all very interesting stuff.
But all that gets us is K%. Which, you know, is great, and strikeouts are probably one of the most important and indicative raw numbers to know for a matchup. But that doesn’t tell us about any other stats. So as a means of following up on what he’s done (a follow-up he mentioned in the article, but that I have not seen any evidence of) and also as a way to find the probability of each outcome for every type of matchup (a daunting task), I did my own research.
My methodology was very similar. I took all players and plate appearances from 2003-2013 (Steve’s dataset was 2002-2012; also, I got the data all from retrosheet.org via baseballheatmaps.com – both truly indispensable resources) and for each player found their K%, BB%, 1B%, 2B%, 3B%, HR%, HBP%, and BABIP during that time. This means that a player like, say, Derek Jeter will only have his 2003-2013 stats included, not any from before 2003. I further refined that by separating each player’s numbers into vs. righty and vs. lefty numbers (Steve, in another article, proved that handedness matchups were important). I did this for both batters and pitchers. Then, for each statistic, I grouped the numbers for the batters and the numbers for the pitchers, and found the percentage of plate appearances involving a batter and a pitcher with the two grouped numbers that ended in the result in question. That’s kind of a mouthful, so let me provide an example:
These are my results for strikeout percentage (numbers here are expressed as decimals out of 1, not percentages out of 100). Total means the total proportion of plate appearances with those parameters that ended in a strikeout, while batter and pitcher mean the K% of the batter and pitcher, respectively. Count(*) measures exactly how many instances of that exact matchup there were in my data. Another important point to note – this is by no means all of the combinations that exist; in fact, for strikeouts, there were over 2,000, far more than the 20 shown here. I did have to remove many of those since there were too few observations to make meaningful assumptions…
…but I was still left with a good amount of data to work with (strikeout percentage gave me just over 400 groupings, which was plenty for my task). I went through this process for each of the rate stats that I laid out above.
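The grouping step is easier to see in code. Here's a toy sketch — the bin width is a placeholder, not my actual bucketing, and the data is made up:

```python
from collections import defaultdict

def matchup_rates(plate_appearances, bin_width=0.02):
    """Each PA is (batter_rate, pitcher_rate, outcome_happened).
    Returns {(batter_bin, pitcher_bin): share of PAs ending in the outcome}."""
    groups = defaultdict(lambda: [0, 0])  # (bat_bin, pit_bin) -> [outcomes, total]
    for bat, pit, outcome in plate_appearances:
        key = (round(bat / bin_width), round(pit / bin_width))
        groups[key][0] += outcome
        groups[key][1] += 1
    return {key: n / total for key, (n, total) in groups.items()}

# Toy data: two 20%-K% vs. 20%-K% matchups, one of which ends in a strikeout.
rates = matchup_rates([(0.20, 0.20, True), (0.20, 0.20, False)])
print(rates)  # {(10, 10): 0.5}
```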
My next step was to come up with a model that fit these data – in other words, estimate the total K% from the batter and pitcher K%. I did this by running a multiple regression in R, but I encountered some problems with the linearity of the data. For example, here are the results of my regression for BB% plotted against the real values for BB%:
It looks pretty good – and the r^2 of the regression line was .9653, which is excellent – but it appears to be a little bit curved. To counter that I ran a regression with the dependent variable being the natural logarithm of the total BB%, and the independent variables being the natural logarithms of the batter’s and pitcher’s BB%. After running the regression, here is what I got:
The scatterplot is much more linear, and the r^2 increased to .988. This means that ln(total) = ln(bat)*coefficient + ln(pitch)*coefficient + intercept. So if we exponentiate both sides, we get total = e^(ln(bat)*coefficient + ln(pitch)*coefficient + intercept). This formula, obviously with different coefficients and intercepts, fits each of K%, BB%, 1B%, 3B%, HR%, and HBP% remarkably well; for some reason, both 2B% and BABIP did not need to be “linearized” like this and were fitted better by a simple regression without any logarithm doctoring.
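In other words, every fitted model has the form below. The coefficients here are placeholders, not my actual regression output:

```python
import math

def matchup_rate(bat, pit, bat_coef, pit_coef, intercept):
    """total = e^(bat_coef*ln(bat) + pit_coef*ln(pit) + intercept)."""
    return math.exp(bat_coef * math.log(bat) + pit_coef * math.log(pit) + intercept)

# With both coefficients at 1 and a zero intercept, this collapses to bat * pit:
print(matchup_rate(0.25, 0.20, 1.0, 1.0, 0.0))  # ≈ 0.05
```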
Here are the regression equations, along with the r^2, for each of the stats:
The first thing that should jump out at you (or at least one of the first) is the extremely high correlation for BABIP. It totally blew my mind to think that, given the batter’s BABIP and the pitcher’s BABIP, you can model the probability that a batted ball will fall for a hit with an r^2 of .96.
Another immediate observation: K%, BB%, and HBP% generally have higher correlations than 1B%, 2B%, 3B%, and HR%. This is likely due to the increased luck and randomness that a batted ball is subjected to; for example, a triple needs two things to happen (being put in play and landing in a spot where the batter gets exactly three bases), whereas a strikeout only needs one thing to happen – the batter needs to strike out. Overall, I was very satisfied with these results, since the correlations were higher than I expected across the board.
Now comes the good part – putting it all together. We have all the inputs we need to calculate many commonly-used batting stats: AVG, OBP, SLG, OPS, and wOBA. So once we input the batter and pitcher numbers, we should be able to calculate those stats with high accuracy. I developed a tool to do just that:
For a full explanation of the tool and how to use it, head over to my (new and shiny!) blog. I encourage you to go play around with it to see the different results.
One last thing: it is important to note that I made one big assumption in doing this research that isn’t exactly true and may throw the results off a little bit. The regressions I ran were based off of results for players over their whole career (or at least the part between 2003-2013), which isn’t a great reflector of true talent level. In the long run, I think the results still will hold because there were so many data points, but in using the interactive spreadsheet, your inputs should be whatever you think is the correct reflection of a player’s true talent level (which is why I would suggest using projection systems; I think those are the best determinations of talent), and that will almost certainly not be career numbers.
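For reference, the back end of the tool is just arithmetic on the predicted per-PA rates. A rough sketch follows — the wOBA weights are approximate 2013-era FanGraphs values, and sacrifice flies, sac bunts, and reached-on-error are ignored for simplicity:

```python
def slash_line(bb, hbp, b1, b2, b3, hr):
    """All inputs are per-PA rates as decimals; strikeouts and other outs
    are implicit. Returns (AVG, OBP, SLG, wOBA)."""
    hits = b1 + b2 + b3 + hr
    ab = 1.0 - bb - hbp  # at-bats per PA, ignoring SF/SH
    avg = hits / ab
    obp = hits + bb + hbp  # times on base per PA
    slg = (b1 + 2 * b2 + 3 * b3 + 4 * hr) / ab
    # Approximate linear weights (2013-era values, rounded):
    woba = 0.69 * bb + 0.72 * hbp + 0.89 * b1 + 1.27 * b2 + 1.62 * b3 + 2.10 * hr
    return avg, obp, slg, woba

# A hypothetical solid hitter: 8% BB, 1% HBP, and a typical spread of hits.
avg, obp, slg, woba = slash_line(bb=0.08, hbp=0.01, b1=0.16, b2=0.05, b3=0.005, hr=0.035)
print(round(avg, 3), round(obp, 3), round(slg, 3), round(woba, 3))
```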
Coming into this season, the Boston Red Sox had high hopes. Obviously, they were coming off a World Series title, and they had every reason to expect that they could contend again. Jarrod Saltalamacchia was gone, but he could be replaced by A.J. Pierzynski; the drop-off there wouldn’t be too large. Ryan Dempster was gone, but the Red Sox’s rotation of Jon Lester, John Lackey, Clay Buchholz, Jake Peavy, and Felix Doubront was what they had gone with during last year’s stretch run anyways. Jacoby Ellsbury was gone, but Jackie Bradley Jr. (and Grady Sizemore!) should have been able to play well enough to make his departure bearable. And Stephen Drew was gone, but uber-prospect Xander Bogaerts was ready to take over the Red Sox’s shortstop position and dominate the league.
Needless to say, none of those really worked out like the Red Sox and their fans had hoped or planned. Boston currently resides in the AL East cellar, all but certain to go from first to worst just the year after they had done the very opposite. And perhaps no individual part of that failure this season has been a bigger disappointment than Bogaerts. Instead of being the hitter he was supposed to be, he has struggled mightily at the plate, to the tune of a .223/.293/.333 slash line — good for a 74 wRC+ (as of September 1) and a major contributor to his negative WAR.
Where do we start in trying to assess the reasons for Bogaerts’s struggles? Well, time-wise, we can place a pretty neat cutoff point at June 4: That is when Bogaerts started to slump (I think I cursed him). For the first two months of the season, actually, Xander was quite good: he had a 140 wRC+ through April and May, and that figure would have been higher if not for a mini-slump that came towards the very beginning of the season. He was drawing walks roughly 11% of the time (above average) and striking out at a clip a shade below 22% (not much below average). And then came June. It started out OK — he went 4-for-13 in his first 3 June games. But after that, for the rest of the month, he recorded a mere 9 hits and 3 walks in 88 plate appearances. July was better, but not good: Bogaerts managed just a .228/.253/.342 line, and now through most of August he has been even worse than he was the previous two months, with a paltry .123/.195/.164 triple slash.
His wRC+, by month:
Yeesh. Not the way you want to be trending. So what happened? Well, the easy answer is to point to BABIP:
This looks right, right? His best month by wRC+ was his best month by BABIP. His worst month by wRC+ was his worst month by BABIP. And the same can be said for every month in between. But that, of course, doesn’t tell the whole story. Why is his BABIP from the first two months so much higher? What can he do to fix it? Will he fix it? Can he? Let’s explore.
A .364 BABIP like Bogaerts had in April is unsustainable. The .421 BABIP he had the following month is way too high for even the best players to keep up. So naturally, we would expect some regression from him. But his batted ball profile did suggest a decent BABIP – high line drive rate and low popup rate. The only thing overly suspect was his 17.1% infield hit rate in June. Nothing there would suggest such an outrageously high BABIP for the first two months, but nothing would suggest the low BABIPs that were to come later either. So something must have changed. What was it?
It wasn’t Bogaerts’s average flyball distance; that stayed more or less intact. But he did start hitting many fewer line drives…
…and started striking out more, which didn’t affect his BABIP directly but did have an impact on his overall hitting (somewhat astonishingly and coincidentally, his K% has been the exact same – to one decimal – each of the past 3 months):
And in the same vein, he walked much less, which helped contribute to his very low wRC+ as well:
So while it may be easy to ascribe Bogaerts’s recent struggles to his abnormally low BABIPs, there is more to the story. He simply isn’t hitting anywhere near as well as he did earlier in the season. I can think of a few potential reasons for this:
1. Pitchers are pitching to him differently, and he will have to adjust
2. He is in a prolonged slump, and will snap out of it eventually
3. He isn’t actually that good, and his first few months were just very lucky
4. He was playing third base
I think we can ignore the last two. Bogaerts, after all, was ranked a top-5 prospect coming into the season by almost anyone worth listening to, and he has hit very well in the majors before; he’s almost certainly not actually bad at hitting. As for the last one — that was a theory many people floated when Bogaerts stopped hitting well at almost the exact same time as Stephen Drew returned and kicked Bogaerts over to third. The argument was that since short was Bogaerts’s natural position, and he felt most comfortable there and could focus on his hitting, he would do better when playing there.
And that theory holds some water: this season, his wRC+ as a third baseman is 37 (in 180 PA), and as a shortstop it is 95 (in 312 PA). That is too large of a difference to dismiss offhandedly. But here’s the problem: when Drew was traded, and Bogaerts returned to shortstop, he continued to hit poorly. In fact, throughout the entire month of August, Bogaerts played shortstop, and he had a -3 wRC+. I am going to say that that theory, while compelling, doesn’t really explain Bogaerts’s struggles at all. He’d tell you that himself.
So what does? Pitchers pitching him differently? Yes, to an extent. Here is how Bogaerts has done all season long against certain pitches:
And here is how he has been pitched:
The pitches in that gif are ordered by how many runs above average Bogaerts has been against them, descending. You can see that from June 4 (the date of the start of Bogaerts’s extended slump) on, he has seen many fewer fastballs and many more sinkers and sliders than before. That could be the cause of his BABIP, strikeout, and general hitting struggles since he excels against fastballs and cannot hit sliders or sinkers (sliders more so).
But there’s only one issue: the problem isn’t that Bogaerts is getting fewer pitches he can hit, it’s that he’s not hitting the pitches he used to. Here’s Bogaerts against four-seam fastballs (from Brooks Baseball; BIP means balls in play):
He’s cut down a bit on his swings and misses, but everything else looks bad. He’s drastically decreased his line drive rate and drastically increased his popup rate. His groundball rate has gone down a bit, which can be good or bad (in this case I don’t think it’s had a huge effect on anything), and his flyball rate has gone up a lot — which could be good, but Bogaerts is averaging a mere 266.75 feet on his fly balls — 230th out of 284 qualified hitters. So how has this changed his results? Again, Bogaerts against fastballs:
Wow. That is quite the drop in production. League average wOBA against four-seamers this year is .416 (which makes you question why they are thrown so much, but that’s a different article), and so Bogaerts’s wRC+ against four-seamers, relative to the league against the same pitch, went from 113 to 61 between those two timeframes (park-unadjusted).
And look where Bogaerts is hitting balls, too. The following charts aren’t only fastballs — they show all balls put in play by him. In the beginning of the year, he was sending line drives to all fields, getting grounders through the infield, and pulling balls deep. In the second part, you see lots of shallow line drives and fly balls — in fact, in the three months covered in the second half of the gif, there are all of TWO ground balls that make it through the infield, and only one opposite-field line drive that makes it to the outfield. There are more popups, too, and the fly balls seem to be generally shallower.
Now, some of the things you’re seeing here could be a result of teams shifting on him more as the year goes on, which is why no ground balls are getting to the outfield. But more likely it is Bogaerts making weaker contact and allowing fielders to get to his ground balls; in addition, he isn’t hitting many ground balls up the middle, where you’re more likely to get hits.
Take a look at the gif above. What you’re seeing is the same thing as the last one, only with the at bat result instead of the batted ball type. In the first part of the year, you see Bogaerts getting lots of hits to all parts of the outfields, including deep balls that end up in home runs or doubles. Then, many more balls end up in the infield and most of his hits are shallow balls to the outfield.
This doesn’t look good, especially since it’s been going on for so long. I’m no expert in swing mechanics, so I can’t tell you why Bogaerts has suddenly stopped hitting everything, fastballs especially. My guess is that it’s just a long, long slump that is happening because he’s only 21 years old. I don’t think this means that we should give up on him. He has already proven that he can hit, albeit in a very small sample.
Take a look at the list of all the players who had a wRC+ below 100 in a year where they were listed as top-10 prospects by Baseball America (since 1997):
There are a lot of really good players on that list. Bogaerts is one of the worst there in terms of wRC+ that year, but he’s also younger and higher-ranked than most. That doesn’t concern me. What concerns me is that almost all of the ones on that list from the past few years haven’t succeeded: all of the ones that have are from 2009 or earlier. This is consistent with semi-recent findings by Jeff Zimmerman that the aging curve is changing: hitters don’t improve with age anymore. Further research by Brian Henry shows that players who start in the big leagues at 21 tend to stay steady with their production for a while, then decline at around 30. This does not bode well for the young Red Sox shortstop.
But who knows? If I had to guess, I would say that Bogaerts regains his stroke and starts driving the ball more. He’s too good of a hitter to be so bad against fastballs. After all, he is only 21 years old. Plus… I mean, look at that swing. Number two prospects go far. All the prospects on the list above ranked first or second had some degree of success in the majors, with the exception of Rocco Baldelli, who was good until injuries ruined his career. (Brandon Wood didn’t have enough plate appearances to qualify for the list.) If he was playing a little over his head in April and May, he’s been playing well below his feet for the past three months, and those kinds of things tend to right themselves in time.
Note: This was written before Bogaerts played today, Monday 9/1. He went 1 for 4 with a double and two strikeouts.
The plate discipline stats at FanGraphs are fantastic. Lots of stuff can be drawn from them – and the articles I’ve linked to are only scratching the surface both of what’s already been done and what we can still do with them. So many things are great about them: they’re very stable, they’re good indicators of other statistics that might be less stable, and they’re completely isolated to the batter and pitcher. The problem is, they only go back to 2002 (for the BIS ones) or 2007 (for the Pitchf/x ones). So what if we want plate discipline numbers for players from before then? How do we know how often Babe Ruth or Willie Mays or Hank Aaron swung at pitches inside the zone, or how often they made contact on pitches outside the zone?
Regressions, that’s how.
Using the Baseball Info Solutions plate discipline data (only because it goes back farther, and also has the SwStr% and F-Strike% stats), I ran a multivariate regression with R to find all the plate discipline numbers provided on FanGraphs: O-Swing%, Z-Swing%, Swing%, O-Contact%, Z-Contact%, Contact%, Zone%, F-Strike%, and SwStr%. I used the following stats as variables in the regression: BB% and K% (for obvious reasons), ISO (I figured maybe power hitters were more prone to different types of numbers), BABIP (same goes for hitters who could maintain higher BABIPs), HR% (same thinking as ISO), and OBP (combining hitting ability and plate discipline, even if somewhat crudely). My dataset was every qualified hitting season from 2002 until now. I couldn’t use any batted ball data (GB%, FB%, etc.) as a variable because we don’t have that prior to 2002 either. So that was what I had.
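I ran the regression itself in R, but the mechanics are simple enough to sketch in Python. The data below is a synthetic stand-in (I obviously can't reproduce the real dataset here, and the coefficients are made up); the point is just the ordinary-least-squares step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake dataset: predict one plate discipline stat (say, Contact%) from
# BB%, K%, ISO, BABIP, HR%, and OBP, all as decimals.
n = 500
X = rng.uniform(0.0, 0.4, size=(n, 6))
true_coefs = np.array([0.3, -1.1, -0.2, 0.1, -0.4, 0.5])  # made up
y = 0.80 + X @ true_coefs + rng.normal(0, 0.01, n)

# Append an intercept column and solve by ordinary least squares.
A = np.column_stack([X, np.ones(n)])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coefs.round(2))  # recovers the made-up coefficients; last entry ≈ 0.80
```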
Some stats worked better than others – for example, the r^2 for Contact% was an excellent 0.8089, while for Zone% it was a measly 0.1551. And of course, it’s possible that the coefficients would be different for prior eras than they are now. But, hey, what can you do. Here, first, are the r^2s for each statistic, so you know how much to trust each number:
And now for the actual coefficients:
(If you can’t see the whole table, here)
Note that for all the percentages – including the plate discipline numbers – I turned them into decimals: for example, a BB% of 12.5% will be turned into 0.125, and an O-Swing% of 20.7 will be 0.207, so if you’re calculating these on your own, keep that in mind.
There are some strange things in that table that I wouldn’t really expect. Here’s one: a higher O-Contact% leads to a much lower OBP, or maybe vice-versa*. The only logical explanation that I can offer is that balls out of the zone that are hit fall for hits less often, so BABIP and therefore OBP will each be lower. League average BABIP on balls out of the zone in 2013 (based on a quick search I did at Baseball Savant) was .243, well below the overall league average of .297. But that -1.89 coefficient still seems like too much. Some more explainable ones: HR% and Zone% are strongly inversely correlated (the more dangerous a hitter’s power, the fewer pitches they’ll see in the zone), BB% and O-Swing% are strongly inversely correlated (the fewer pitches you swing at out of the zone, the more you’ll walk), and K% and SwStr% are fairly strongly correlated (the more you swing and miss, the more you’ll strike out).
To first examine these stats a little bit more, let’s take a look at the regressed numbers for players who have played since 2002 and compare them to their real numbers. Here’s Barry Bonds’s 2002 (the asterisk means these are the regressed, not the real, numbers):
Hmmm… not off to the greatest start. Z-Contact, Zone, F-Strike, and Contact percentages were pretty good, but the rest were waaaay off. O-Swing came out negative. As good as Barry Bonds might have been, that just isn’t possible. SwStr% is also pretty off – since the BIS data started being recorded, only pure contact hitter Marco Scutaro has ever posted a swinging strike percentage that low (1.5%, in 2013), and nobody has ever been lower. Not terrible, though. How about Miguel Cabrera’s 2013 MVP season?
Hey, not bad! The O-Swing is pretty off, and the O-Contact is a little too low, but other than that they’re all fairly close to the real values. I think we’re getting somewhere here.
Now let’s look at some seasons for which we don’t have the real numbers. Ever wondered how Babe Ruth’s plate discipline was in 1927?
Not bad. We obviously can’t verify this (at least not without a lot of painstaking effort, and likely not at all) but that seems reasonable enough. Average contact rates in the zone, good swinging strike percentage, not very many swings outside the zone. How about the king of plate discipline, Ted Williams? Here are his numbers from his 1957 season, in which he had a 223 wRC+ and nearly 10 WAR:
Wow. Really, really good. That’s a crazy low O-Swing% and yet a fairly middle-of-the-pack Swing% overall, which goes exactly with what we would expect from a man with a famed, disciplined plate approach. He rarely swung and missed, making contact on nine out of ten swings and only whiffing on one out of every twenty-five pitches he saw.
I could really go on and on, but I think I’ll end by showing you the (supposed) single worst season by these regressed plate discipline numbers between 1903 and 2001. See if you can guess who it is:
This will shock you, I’m sure, but… It’s Dave Kingman.
* Most likely, high O-Contact% causes low OBP and not vice-versa. This brings us into dangerous territory, however, because we don’t want to assume that everyone with low OBP has high O-Contact%. There are other factors that go into low OBP as well, and somebody could very easily have a low O-Contact% and a low OBP. It is like this with each of the regressed stats. But this is the best I could really do.
Pitching statistics are mostly based on rates. Sure, we have innings pitched, and if you want to annoy me, you can talk about wins and losses, and of course there are the “three true outcomes” of strikeouts, walks, and home runs, plus WAR. But nobody ever looks at how many runs a pitcher was above or below average. Runs allowed isn’t all that common of a statistic; you’re more likely to see ERA or RA9. Even strikeouts and walks are often expressed as a percentage of all plate appearances, or as an amount per nine innings. The defense-independent ERA estimators like FIP and its spinoffs are rates, just like ERA. Where batters have runs-above-average and counting stats like wRAA and wRC, pitchers have nothing.
However, there has got to be some value in counting stats for pitchers. If we want to know how many more or fewer runs a team would allow by putting in an average pitcher instead of any given pitcher, a counting statistic could tell us. So I’m going to present here three different numbers: one based off of FIP, one based off of straight runs allowed, and the third based off of linear weights. Each will be in two forms – raw runs allowed and runs allowed above or below average. I’ll call them FIP-Runs and FIP-Runs Above Average (FIPRAA), wRC-Runs and wRC-Runs Above Average (wRCRAA), and, obviously, Runs and Runs Above Average (RAA). Kind of long, yeah, but I didn’t want to call the FIP one FRAA because that already exists.
All data was obtained from FanGraphs except for the singles, doubles, triples, home runs, walks, and HBP against used to calculate wRC; FanGraphs does not have some of those so I used Baseball-Reference.
This should be pretty simple. Take a pitcher’s FIP. FIP is scaled like ERA, but we want it on the RA9 scale, since we care about all the runs a pitcher allows, not just the earned ones. To do this, multiply it by a constant that changes yearly – for 2013, it was 1.08. This is the league RA9 divided by the league ERA.
Take that figure, multiply it by the number of innings they pitched, and divide by nine to get the number of runs that FIP says a pitcher should have allowed. That’s their FIP-Runs. Great. But now how do we get that to express how many runs above average they were worth?
Well, we already have FIP-, which tells us how much better a pitcher’s FIP was than league average – and it’s already park- and league-adjusted, to boot. So what I did was subtract each pitcher’s FIP- from 200 to get the inverse of their FIP- (so if a pitcher had a 90 FIP-, the inverse would be 110) and multiplied that by their FIP-Runs. That gave me the number of runs (adjusted to the park and league) that an average pitcher would give up in the same number of innings. Just subtract the pitcher’s FIP-Runs, and you have your FIPRAA. And while I was at it, I did the same thing for xFIP. You can find the numbers at the end of the article. But first, the next part:
I didn’t have to calculate these like I did with FIP-Runs because the numbers are already there – it’s just the total number of runs a pitcher allowed. I did, however, have to calculate RAA, for which I used the same method as I did with FIP-Runs: find the RA9- (this was not park- or league-adjusted because I calculated it myself), take the inverse, multiply it by the runs, and subtract the runs from that. Piece of cake. Now for the last, and hardest, part of this:
These were tricky. I had to calculate each pitcher’s wRC against by first finding their wOBA against with the raw number of singles, doubles, triples, etc. they gave up and converting that into wRC. (I’ve actually already put this in a community post in a different form). But from there, I could follow the same instructions as before: use the wRC against as runs allowed, find the wRC/9- (if you didn’t read the article I linked to earlier, wRC/9 is just wRC against scaled like RA/9), and from those two find the wRCRAA (quite a mouthful, I know).
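Pulling the FIP-based method together in code: a rough sketch, where 1.08 is the 2013 RA9-to-ERA scale factor and the (200 − FIP-) inversion is the approximation described above, not an exact adjustment:

```python
def fip_runs(fip, ip, ra9_per_era=1.08):
    """Scale FIP (an ERA-like number) to total runs over ip innings."""
    return fip * ra9_per_era * ip / 9

def fipraa(fip, ip, fip_minus, ra9_per_era=1.08):
    """Runs above average: (200 - FIP-)/100 of FIP-Runs approximates what an
    average pitcher would allow in the same innings; subtract FIP-Runs."""
    runs = fip_runs(fip, ip, ra9_per_era)
    avg_runs = (200 - fip_minus) / 100 * runs
    return avg_runs - runs

# A 200-inning pitcher with a 3.00 FIP and a 75 FIP-:
print(round(fip_runs(3.00, 200), 1), round(fipraa(3.00, 200, 75), 1))  # 72.0 18.0
```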
So, without further ado, here are the numbers (sorted by FIPRAA):
There you have it. I have to say, I am surprised a little that the very best pitchers don’t even save their team 30 runs over the course of a season compared to an average pitcher – at least if you trust these numbers. Of course, on the other end, we have pitchers costing their team 50+ runs, but I suppose it’s easier to be bad than it is to be good.
Obviously, the more you pitch, the more these numbers can go up/down, so these shouldn’t be used to draw too many conclusions – I still think the plain old rate stats are better. But this certainly is valuable if you want to know exactly how many runs a pitcher can save. For the record, I would trust the FIP-based one the most, because it is defense-independent while still being descriptive, unlike xFIP; also, it is park- and league-adjusted unlike wRCRAA and RAA. The others obviously have their uses, though. This is not a predictive stat, because it can’t predict innings pitched, but I think it does a pretty good job being a descriptive one.
You are reading this right now. That is a fact. Since you are reading this right now, many things can be reasonably inferred:
1. You probably read FanGraphs at least fairly often
2. Since you probably read FanGraphs at least fairly often, you probably know that there are a lot of differing opinions on the MVP award and that many articles here in the past week have been devoted to it.
3. You probably are quite familiar with sabermetrics
4. You probably are either a Tigers fan or think that Mike Trout should have won MVP, or both
5. You might know that Josh Donaldson got one first-place vote
6. You might even know that the first-place vote he got was by a voter from Oakland
7. You might know that Yadier Molina got two first-place votes, and they both came from voters from St. Louis
8. You might even know that one of the voters who put Molina first on his ballot put Matt Carpenter second
9. You might be wondering if there is any truth to the idea that Miguel Cabrera is much more important to his team than Mike Trout is
I have thought about many of those things myself. So, in this very long 2-part article, I am going to discuss them. Ready? Here goes:
Lots of people wanted Miguel Cabrera to win the MVP award. Some of you reading this may be shocked, but it’s actually true. One of the biggest arguments for Miguel Cabrera over Mike Trout for MVP is that Cabrera was much more important and “valuable” than Trout. Cabrera’s team made the playoffs. Trout’s team did not. Therefore anything Trout did cannot have been important. Well, let’s say it cannot have been too important. I don’t think that anybody’s claiming that Trout had zero impact on the game of baseball or the MLB standings whatsoever.
OK. That’s reasonable. There’s nothing flawed about that thinking when it’s not used as a rationale for voting Cabrera ahead of Trout for MVP. As a general idea, it makes sense: Cabrera had a bigger impact on baseball this year than Trout did. I, along with many other people in the sabermetric community, disagree that that’s a reason to vote for Cabrera, though. But the question I’m going to ask is this: did Cabrera have a bigger impact on his own team than Trout did?
WAR tells us no. Trout had 10.4 WAR, tops in MLB. Cabrera had 7.6 – a fantastic number, good for 5th in baseball and 3rd in the AL, as well as his own career high – but clearly not as high as Trout’s. Miggy’s hitting was out of this world, at least until September, and it’s pretty clear that he could have topped 8 WAR easily had he stayed healthy through the final month and been just as productive as he was from April through August. But the fact is, he did get hurt, and he did not finish with a WAR as high as Trout’s. So if each were replaced with a replacement player, the Angels would suffer more than the Tigers. Cabrera was certainly valuable – without the 7 or 8 wins he provided over a replacement player, the Tigers probably don’t win the AL Central. But take Trout out, and the Angels go from a mediocre-to-poor team to a really bad one. The Angels had 78 wins this year; without Trout, that would have been around 68 (if we trust WAR), which would have been the 6th-worst total in the league. So, by WAR, Trout meant more to his team than Cabrera did.
But WAR is not the be-all and end-all of statistics (though we may like to think it is sometimes). Let’s look at this from another angle. Here’s a theory for you: the loss of a key player probably doesn’t hurt a good team as much, because the team is already good to begin with. If a not-so-good team loses a key player, though, the remaining players aren’t as good, so they can’t carry the team very well.
How do we test this theory? Well, we have at our disposal a fairly accurate and useful tool to determine how many wins a team should get. That tool is pythagorean expectation – a way of predicting wins and losses based on runs scored and allowed. So let’s see if replacing Trout with an average player (I am using average and not replacement because all the player run values given on FanGraphs are above or below average, not replacement) is more detrimental to the Angels than replacing Cabrera with an average player is to the Tigers.
The Angels, this year, scored 733 runs and allowed 737. Using the Pythagenpat (sorry to link to BP but I had to) formula, I calculated their expected win percentage, and it came out to .497 – roughly 80.6 wins and 81.4 losses*. That’s actually significantly better than they did this year, which is good news for Angels fans. But that’s not the focus right here.
Trout, this year, added 61.1 runs above average at the plate and 8.1 on the bases, for a total of 69.2 runs of offense. He also saved 4.4 runs in the field (per UZR). So, using the Pythagenpat formula again with adjusted run values as if Trout were replaced by an average hitter and defender (663.8 runs scored and 741.4 runs allowed), I again calculated the Angels’ expected win percentage. This came out to .449 – roughly 72.7 wins and 89.3 losses, or 7.9 fewer wins than the original estimate. That’s the difference, for that specific Angels team, that Trout made. Now, keep in mind, this is above average, not above replacement, so it will be lower than WAR by a couple of wins (an average player is worth about two WAR, so wins above average will be about two less than wins above replacement). 7.9 wins is a lot. But is it more than Cabrera?
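The calculation above can be sketched in a few lines of Python. The exponent constant 0.287 is one commonly used Pythagenpat value (0.285 also appears in the literature); with the run totals from the article, the output lands within rounding of the figures quoted above.

```python
def pythagenpat_wins(rs, ra, games=162, k=0.287):
    """Pythagenpat expected wins: the pythagorean exponent scales
    with the run environment (runs per game)."""
    exponent = ((rs + ra) / games) ** k
    win_pct = rs ** exponent / (rs ** exponent + ra ** exponent)
    return games * win_pct

# 2013 Angels as they actually were:
with_trout = pythagenpat_wins(733, 737)
# Swap Trout's offense (+69.2 runs) and defense (+4.4 runs saved) for average:
without_trout = pythagenpat_wins(733 - 69.2, 737 + 4.4)
print(round(with_trout, 1), round(without_trout, 1),
      round(with_trout - without_trout, 1))  # → 80.6 72.7 7.9
```

The same function handles the Cabrera case (796 scored, 624 allowed, minus his 67.7 offensive runs and plus his 16.8 fielding runs) and any other what-if run totals.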
Let’s see. This year, the Tigers scored 796 runs and allowed 624. This gives them a pythagorean expectation (again, Pythagenpat formula) of a win percentage of .612 – roughly 99.1 wins and 62.9 losses. Again much better than what they did this year, but also not the focus of this article. Cabrera contributed 72.1 runs above average hitting and 4.4 runs below average on the bases for a total of 67.7 runs above average on offense. His defense was a terrible 16.8 runs below average.
Now take Cabrera out of the equation. With those adjusted run totals (728.3 runs scored and 607.2 runs allowed) we get a win percentage of .583 – 94.4 wins and 67.6 losses. A difference of 4.7 wins from the original.
Talk about anticlimactic. Trout completely blew Cabrera out of the water (I would say no pun intended, but that was intended). This makes sense if we think about it – a team with more runs scored will be hurt less by x fewer runs because they are losing a lower percentage of their runs. In fact, if we pretend the Angels scored 900 runs this year instead of 733, they go from a 96.5-win team with Trout to an 89.8-win team without. Obviously, they are better in both cases, but the difference Trout makes is only 6.7 wins – pretty far from the nearly 8 he makes in real life.
The thing about this statistic is that it penalizes players on good teams. Generally, statistics such as the “Win” for pitchers are frowned upon because they measure things that the pitcher can’t control – just like this one. But if we want to measure how much a team really needs a player, which is pretty much the definition of value, I think this does a pretty good job. Obviously, it isn’t perfect: the numbers that go into it, especially the baserunning and fielding ones, aren’t always completely accurate, and when looking at the team level, straight linear weights aren’t always the way to go; overall, though, this stat gives a fairly accurate picture. The numbers aren’t totally wrong.
Here’s a look at the top four vote-getters from each league by team-adjusted wins above average (I’ll call it tWAA):
This is interesting. As expected, the players on better teams have a lower tWAA than the ones on worse teams, just as we discussed earlier. One notable player is Yadier Molina, who, despite being considered one of the best catchers in the game – if not the best – has the lowest tWAA of anyone on the list. Part of this may be because he missed some time. But let’s look a little closer: if we add the 2 wins that an average player provides over a replacement-level player, we get 5.1 WAR, which isn’t so far off his 5.6 total from this year. And the Cardinals’ pythagorean expectation was 101 wins, so under this system he won’t be credited as much, because his runs aren’t as valuable to his team. Another factor is that we’re not adjusting by position here (except for the fielding part), and Molina is worth more runs over the average catcher than over the average hitter, since catchers generally aren’t as good at hitting.
But if Molina were replaced with an average catcher, I’m fairly certain the Cardinals would lose more than the 3-or-so games this number suggests. They might miss Molina’s game-calling skills – if such a thing exists – and there’s no way to quantify how much Molina has helped the Cardinals’ pitchers improve, especially since they have so many rookies. But there’s also something else, something we can quantify, even if not perfectly: pitch framing. Let’s add the 19.8 runs that Molina saved by framing (as measured by StatCorner) to his defensive runs saved. (For Molina’s defense, by the way, I used the Fielding Bible’s DRS, since there is no UZR for catchers. That may be another reason his number seems out of place: DRS and UZR don’t always agree – Trout’s 2013 UZR was 4.4, while his DRS was -9. Molina did also play 18 innings at first base, where he had a UZR of -0.2, but we’ll ignore that, since it’s such a small sample size and won’t make much of a difference.)
Here is the table with only Molina’s tWAA changed, to account for pitch framing:
Now we see Molina move up into 5th place out of 8 with a much better tWAA of 5.4 – more than 2 wins better than without the pitch framing, and about 7.4 WAR if we want to convert from wins above average to wins above replacement. Interesting. I don’t want to get into a whole argument now about whether pitch framing is accurate or actually based mostly on skill instead of luck, or whether it should be included in a catcher’s defensive numbers when we talk about their total defense. I’m just putting that data out there for you to think about.
But as I mentioned before, I used DRS for Molina and not UZR. What if we try to make this list more consistent and use DRS for everyone? (We can’t use UZR for everyone.) Let’s see:
We see Trout drop by almost a win and a half here. I don’t really trust that, though, because I don’t think Mike Trout is a significantly below-average fielder, despite what DRS tells me – DRS gave Trout a rating of 21 in 2012, so I don’t find it as trustworthy. But for the sake of consistency, I’m showing you those numbers too, with the DRS and UZR comparison so you can see why certain players lost or gained wins.
OK. So I think we have a pretty good sense of who was most valuable to their teams. But I also think we can improve this statistic a little more. Like I said earlier, the hitting number I use – wRAA – is based on the league average, not the position average. In other words, if Chris Davis is 56.3 runs better than the average hitter, but we replace him with the average first baseman, that average first baseman is already going to be a few runs better than the average player. So what if we use weighted runs above position average? wRAA is calculated by subtracting the league-average wOBA from a player’s wOBA, dividing by the wOBA scale, and multiplying by plate appearances. What I did instead was subtract the position-average wOBA from the player’s wOBA. That penalizes players at positions where the position-average wOBA is high.
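The swap described above is just a change of baseline in the wRAA formula. Here is a minimal sketch: the wOBA scale (~1.277) and league wOBA (~.314) are approximate 2013 values, and the player and position numbers are hypothetical, chosen only to show how much of a first baseman’s wRAA the position adjustment eats.

```python
WOBA_SCALE = 1.277   # approximate 2013 wOBA scale
LG_WOBA = 0.314      # approximate 2013 league-average wOBA

def wraa(woba, pa, baseline=LG_WOBA):
    """Weighted runs above a baseline wOBA (league average by default,
    position average for the adjusted version)."""
    return (woba - baseline) / WOBA_SCALE * pa

# Hypothetical slugging first baseman: .400 wOBA over 600 PA, at a
# position whose average hitter posts a .340 wOBA.
vs_league = wraa(0.400, 600)                    # vs. the average hitter
vs_position = wraa(0.400, 600, baseline=0.340)  # vs. the average 1B
print(round(vs_league, 1), round(vs_position, 1))  # → 40.4 28.2
```

Roughly a dozen runs – more than a win – vanish just from measuring against the average first baseman instead of the average hitter, which matches the Davis/Goldschmidt effect described below the table.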
Here’s your data (for the defensive numbers I used UZR because I think it was better than DRS, even though the metric wasn’t the same for everyone):
I included here both the regular and position-adjusted wRAA for all players for reference. Chris Davis and Paul Goldschmidt suffered pretty heavily – each lost over a win of production – because the average first baseman is a much better hitter than the average player. Molina got a little better, as did Carpenter, because they play positions where the average player isn’t as good offensively. Everyone else stayed almost the same, though.
I think this position-adjusted tWAA is probably the most accurate. And I would also use the number with pitch framing included for Molina. It’s up to you to decide which one you like best – if you like any of them at all. Maybe you have a better idea, in which case you should let me know in the comments.
As I mentioned in my introduction, Josh Donaldson got one first-place MVP vote – from an Oakland writer. Yadier Molina got 2 – both from St. Louis writers. Matt Carpenter got 1 second-place vote – also from a St. Louis writer. Obviously, voters have their bias when it comes to voting for MVP. But how much does that actually matter?
The way MVP voting works is that for each league, AL and NL, two sportswriters who are members of the BBWAA are chosen from each city that has a team in that league – 15 cities per league times 2 voters per city equals 30 voters for each league. That way, no single city ends up with an outsized (or undersized) share of voters who may be biased one way or another.
But is there really voter bias?
In order to answer this question, I took all players who received MVP votes this year (of which there were 49) and measured how many points each of them got per 2 voters***. Then I took the amount of points that each of them got from the voters from their chapter and found the difference. Here’s what I found:
Where points is total points received, points/2 voter is points per two voters (points/15), points from city voters is points received from the voters in the player’s city, % homer votes is the percentage of a player’s points that came from voters in his city, and homer difference is the difference between points/2 voter and points from city voters. Charts are sorted by homer difference.
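The columns described above reduce to a few lines of arithmetic. A small sketch, using a made-up player’s totals (the real ballots give each player some total across 30 voters, two of whom are from his city):

```python
def homer_stats(total_points, city_points, voters=30):
    """Compare a player's support from his own city's two voters
    to his average support per two voters league-wide.

    Returns (points per two voters, homer difference, % homer votes)."""
    points_per_2 = total_points / (voters / 2)      # e.g. points / 15
    homer_diff = city_points - points_per_2
    pct_homer = 100 * city_points / total_points if total_points else 0.0
    return points_per_2, homer_diff, pct_homer

# Hypothetical player: 150 points overall, 24 of them from his city's voters.
per2, diff, pct = homer_stats(150, 24)
print(per2, diff, pct)  # → 10.0 14.0 16.0
```

A positive homer difference means the player’s hometown voters supported him more than the league-wide average pair of voters did.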
I don’t know that there’s all that much we can draw from this. Obviously, voters are more likely to vote for players from their own city, but that’s to be expected. Voting was a little less biased in the AL: the average AL player received exactly 1 point more from his city’s voters than his two-voter league-wide average, whereas that number in the NL was 1.21. And 8.08% of all AL points came from homer votes, compared to 8.31% in the NL. If you’re wondering which cities were the most biased, here’s a look:
Where all these numbers are just the sum of the individual numbers for all players in that city.
If you’re wondering what players have benefited the most from homers in the past 2 years, check out this article by Reuben Fischer-Baum over at Deadspin’s Regressing that I found while looking up more info. He basically used the same method I did, only for 2012 as well (the first year that individual voting data was publicized).
So that’s all for this article. Hope you enjoyed.
*I’m using fractions of wins because measuring to the tenth of a win rather than the whole win gives a more accurate number for the statistic I’m introducing. Obviously a team can’t win .6 games in real life, but we aren’t concerned with how many games the team actually won, only with its runs scored and allowed.
**Carpenter spent time both at second base and third base, so I used the equation (Innings played at 3B*average wOBA for 3rd basemen + Innings played at 2B*average wOBA for 2nd basemen)/(Innings played at 3B + Innings played at 2B) to get Carpenter’s “custom” position-average wOBA. He did play some other positions too, but very few innings at each of them so I didn’t include those. It came out to about .307.
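The footnote’s blending formula is just an innings-weighted average, which can be sketched directly. The innings splits and position-average wOBAs below are hypothetical placeholders, not Carpenter’s actual numbers:

```python
def blended_position_woba(innings_by_pos):
    """Innings-weighted average of position-average wOBAs, for players
    who split time between positions.

    innings_by_pos: list of (innings_played, position_avg_woba) pairs."""
    total_innings = sum(ip for ip, _ in innings_by_pos)
    return sum(ip * woba for ip, woba in innings_by_pos) / total_innings

# Hypothetical split (real innings and position averages would come from
# the actual season data): 900 innings at 3B (.310 position-average wOBA)
# and 300 innings at 2B (.300).
blend = blended_position_woba([(900, 0.310), (300, 0.300)])
print(round(blend, 4))  # → 0.3075
```

The same function extends to three or more positions if you choose not to drop the small-sample ones.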
***Voting is as such: Each voter puts 10 people on their ballot, with the points going 14-9-8-7-6-5-4-3-2-1.
Before I explain to you what this new metric – SkaP – does, I am first going to warn you that I can’t provide you with a formula or individual statistics for it. It’s a theory right now, and something for which I need access to data I don’t have in order to find a formula.
This statistic was inspired in part by Colin Dew-Becker’s article the other day here on FanGraphs Community Research. In his article, he argued that the way a hit or out is made matters – not just the result. A single to the outfield, for example, is more likely to send a runner from first to third or from second to home than an infield single is. Likewise, a flyout is more likely to advance runners than a strikeout is.
This statistic was also inspired in part by UZR. UZR attempts to quantify runs saved defensively by a player partially by measuring if they make a play that the average fielder would not. In the FanGraphs UZR Primer, Mitchel Lichtman explains that
“With offensive linear weights, if a batted ball is a hit or an out, the credit that the batter receives is not dependent on where or how hard the ball was hit, or any other parameters.”
This means that a line drive into the gap in right-center that is a sure double, but is caught by Andrelton Simmons ranging all the way over from shortstop (OK, maybe that was an exaggeration), will only count as an out, even though in almost any other situation it would be a double. Linear-weights-based hitting statistics (and most other hitting statistics as well) are, by their nature, defense-dependent. Hitters have been shown to have much more control over their batted balls than pitchers do, which is why defense-independent statistics have so far been used mainly for pitchers – but the idea would probably be useful for hitting too, no?
Now, if we want a defense-independent, linear-weights-based hitting statistic, we can’t just build a hitting equivalent of the current model of tERA (or tRA), because that model generalizes all batted balls into categories such as grounders, line drives, and fly balls – and hitters can control where, how hard, and at what angle their batted balls are hit, at least to some extent. Instead, I would use something more like a hitting equivalent of this version of tERA I found on a baseball blog. What that article proposes is much more detailed than what we have now (by the way, tERA has been supplanted by SIERA, but it’s still an interesting theory). The idea is that instead of finding expected run and out values for grounders, line drives, and fly balls, you find the expected run value for a ball, to use their words, “with x velocity and y trajectory [that] lands at location z.” This is similar to UZR in that exact (or as close to exact as possible) batted-ball data is processed and the expected run/out values are calculated from it.
So now for the statistic: SkaP, or Skill at (the) Plate, is a number that uses all that batted-ball data to find the expected run and out values of each at-bat. It would weight the following things: home runs (although maybe a regressed version could use lgHR/FB%*FB instead), walks, strikeouts, HBP, and each ball put in play by the player. This makes it so that it is not defense-dependent, and so that Andrelton Simmons catching that sure double does not penalize the hitter. I haven’t calculated this statistic, though, so I don’t know if this would be best as a rate, counting, or plus-minus statistic (maybe all three?).
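To make the SkaP idea concrete, here is a toy sketch of the lookup at its core. The buckets and run values are invented for illustration only; a real table would be estimated from a full batted-ball database, and a real implementation would bucket continuous velocity/trajectory/location values rather than use labels.

```python
# Toy expected-run-value table keyed by (velocity, trajectory, location)
# bucket. All values below are invented placeholders, NOT real linear
# weights; a real table would be fit from batted-ball data.
EXPECTED_RUN_VALUE = {
    ("hard", "line_drive", "gap"): 0.65,        # usually a double
    ("hard", "ground_ball", "hole"): 0.35,
    ("soft", "fly_ball", "shallow"): -0.15,
    ("soft", "ground_ball", "at_fielder"): -0.25,
}

def skap_credit(velocity, trajectory, location):
    """Credit the hitter with the bucket's expected run value, regardless
    of whether the fielder actually converted the ball into an out."""
    return EXPECTED_RUN_VALUE[(velocity, trajectory, location)]

# The gap liner that Simmons runs down still credits the hitter the full
# expected value of that batted ball:
print(skap_credit("hard", "line_drive", "gap"))  # → 0.65
```

Summing these credits (plus the fixed linear weights for walks, strikeouts, HBP, and home runs) over a player’s plate appearances would give the counting version of the stat.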
There’s one catch, however: Skill at the Plate is really only a measure of skill at the plate. It doesn’t account for some batters’ ability to stretch hits or beat out infield singles. Billy Hamilton is going to be more likely to reach on an infield single than Prince Fielder, but this stat would treat them both the same, and it would not reward Hamilton’s speed for letting him reach base on what would most likely otherwise have been an out. It would be very hard to separate defense independence from batter-speed independence in a hitting statistic, though, and I’m not sure it’s possible without an extreme amount of effort. Maybe a crude solution would be to quantify a player’s speed using Spd, UBR, or BsR and fold that into the statistic somehow.
I can’t calculate this myself, as I don’t have access to Baseball Info Solutions’ data (or any other database that tracks batted balls). FanGraphs does, however, and I would love to see this looked into further.
wRC is a very useful statistic. On the team level, it can be used to predict runs scored fairly accurately (r^2 of over .9). It can also be used to measure how much a specific player has contributed to his team’s offensive production by measuring how many runs he has provided on offense. But it is rarely used for pitchers.
Pitching statistics are based not so much on linear weights and wOBA as on defense-independent stats. I think defense-independent stats are fine things to look at when evaluating players, and they can provide a lot of information about how a pitcher really performed. But while pitcher WAR is based on FIP (at least on FanGraphs), RA9-WAR is also sometimes looked at. Now, if the whole point of using linear weights for batters is to eliminate context and the production of teammates, why not do the same for pitchers? True, pitchers – especially starters – usually create the situations they pitch in, unlike hitters, who can’t control how many outs there are or who’s on base when they come up. But pitchers often aren’t consistently better or worse in certain situations, as evidenced by the instability of stats such as LOB%. So why not eliminate context from pitcher evaluations and look at how many runs they should have given up based on the hits, walks, and hit batters they allowed?
To do this, I needed to go over to Baseball-Reference, as FanGraphs doesn’t have easy-to-manipulate wOBA figures for pitchers. Baseball-Reference doesn’t have any sort of wOBA stats, but what they do have is the raw numbers needed to calculate wOBA. So I put them into Excel, and, with 50 IP as my minimum threshold, I calculated the wOBA allowed – and then converted that into wRC – for the 330 pitchers this year with at least 50 innings.
Next, I calculated wRC/9 the same way you would calculate ERA (or RA/9). This scales it very closely to ERA and RA/9 and gives us a good sense of what each number actually means. (The average wRC/9 among the pitchers I used was 3.95; their average RA/9 was 3.96.) What I found was that the extremes on both sides were far more extreme (you’ll see what I mean soon), but overall it correlated with RA/9 fairly closely (an r^2 of .803).
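The conversion chain – wOBA allowed, to wRAA, to wRC, to a per-nine-innings rate – can be sketched as below. The constants are approximate 2013 league values (wOBA scale ~1.277, league wOBA ~.314, roughly 0.11 runs per plate appearance), and the pitcher line is hypothetical, not any real 2013 pitcher.

```python
WOBA_SCALE = 1.277   # approximate 2013 wOBA scale
LG_WOBA = 0.314      # approximate 2013 league-average wOBA
LG_R_PER_PA = 0.11   # approximate league runs per plate appearance

def wrc_per_9(woba_allowed, batters_faced, ip):
    """wRC allowed by a pitcher, scaled like ERA: 9 * wRC / IP."""
    wraa = (woba_allowed - LG_WOBA) / WOBA_SCALE * batters_faced
    wrc = wraa + LG_R_PER_PA * batters_faced  # shift from 'above average' to runs
    return 9 * wrc / ip

# Hypothetical dominant reliever: .250 wOBA allowed over 300 TBF in 75 IP.
print(round(wrc_per_9(0.250, 300, 75), 2))  # → 2.16
```

Because the denominator is innings rather than batters faced, pitchers who allow lots of baserunners face more hitters per inning, which is part of why the extremes come out more extreme than ERA.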
Now, for the actual numbers:
The first thing that jumps out right away is that Koji Uehara had a wRC/9 of 0.08. In other words, if that was his ERA, he would give up one earned run in about 12 complete game starts if he were a starter, which is ridiculous. The second thing that jumps out is that most of the top performers are relievers – in fact, 12 out of the top 13 had fewer than 80 innings, with the only exception being Clayton Kershaw. Also, the worst pitchers by wRC/9 had a wRC/9 much higher than their ERA or RA/9. Pedro Hernandez, for example, had a wRC/9 of 7.68, and there were 6 pitchers over 7.00. Kershaw actually has a wRC/9 that is lower than his insane RA/9, so maybe he’s even better than his fielding-dependent stats give him credit for.
But wait! There’s more! The reason we have xFIP is that HR/FB rates are very unstable. So let’s incorporate that into our wRC/9 formula and see what happens (we’ll call this one xwRC/9):
Not a huge difference, although we do see Uehara’s number go down, which is incredible, and Tanner Roark’s – the second-best by wRC/9 – nearly double. Tyler Cloyd also becomes much worse, and is now the worst pitcher by almost half a run per nine innings. Kershaw’s number goes up considerably, so much so that his xwRC/9 is now higher than his RA/9. All in all, however, xwRC/9 actually has a weaker correlation with RA/9 (an r^2 of .638) than wRC/9 does, so it isn’t as useful.
Now, logically, the pitchers who outperformed their wRC/9 the most should have high strand (LOB) rates, and vice versa. So let’s look at the ten pitchers who most underperformed their wRC/9 and the ten who most outperformed it. First, the ones who underperformed:
We can see that everyone here – except Koji Uehara, who had the fourth-highest LOB% of all pitchers with 50 innings – is below the league average of 73.5%. Only Uehara and Joel Peralta are above 70%. Clearly, a low LOB% goes hand in hand with allowing more runs than your peripherals suggest. But what about Uehara? How did he allow all those runs (10 – yeah, not a lot, but his wRC/9 was way lower than his RA/9) despite stranding so many baserunners and allowing so few damaging hits? If you know, let me know in the comments, because I have no idea.
Now for the people who outperformed their wRC/9:
Just what you would expect: high LOB%’s from all of them (each is above the league average). Stephen Fife and Alex Sanabia are the only ones below 80%.
So what does this tell us? I think it’s a better way to evaluate pitchers than runs or earned runs allowed, since it eliminates context: a pitcher who allows a home run, then a single, then three outs is not necessarily better than one who allows a single, then a home run, then three outs, but runs allowed will tell you he is. It might not be as good an evaluator as FIP, xFIP, or SIERA, but for a fielding-dependent statistic, it might be as good as you can find.
Note: I don’t know why the pitchers with asterisks next to their names have them; I copied and pasted the stats from Baseball-Reference and didn’t bother going through and removing the asterisks.