On June 24, 2017, Joe Mauer went 2-for-2 with two walks against Corey Kluber. It was, perhaps, one of Kluber’s most impressive performances of the entire season: he allowed only one other baserunner in seven innings and struck out 13. For this performance, and many others, he won his second Cy Young last year.
But on that particular day, June 24, the numbers (my numbers, at least) say Corey Kluber had reached his peak. In the third inning, when he faced Joe Mauer, he was more difficult to reach base against than any other pitcher at any time in 2017.
I present… relOBP! (Also its cousins, relAVG and relSLG, but they can wait.)
relOBP attempts to quantify, for each play in the season, how good the batter is at reaching base and how good the opposing pitcher is at preventing that. relOBP therefore requires two numbers for each plate appearance: a pitcher-score and a batter-score. Let’s jump in.
Math follows, but not particularly hard math.
Suppose that, in a given play, the probability of a batter getting on base is given as P(reach) = o*c, where o = the effective number of opportunities the pitcher gives per plate appearance (average 1.000) and c = the batter’s rate of capitalizing on those opportunities.
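To make the model concrete, here is a minimal Python sketch of the formula. The function name `p_reach` and the numbers are hypothetical, chosen only for illustration.

```python
def p_reach(o: float, c: float) -> float:
    """P(reach) = o * c: the pitcher's effective opportunities per PA (o)
    times the batter's rate of capitalizing on them (c)."""
    return o * c


# Hypothetical .400 hitter facing a neutral pitcher (o = 1.000)
# and a tougher pitcher who allows 10% fewer effective opportunities (o = 0.900).
print(round(p_reach(1.000, 0.400), 3))  # 0.4
print(round(p_reach(0.900, 0.400), 3))  # 0.36
```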
If every pitcher always has an o-score of 1.000, then the c-score is simply the batter’s on-base percentage (or, for my purposes, his on-base percentage over the surrounding ±30 plate appearances).
But once we have an initial estimate of each batter’s c-score at each plate appearance, we can use it to estimate the opposing pitcher’s o-score at that appearance as well. How? Suppose that a pitcher has a constant o-score o over an interval of plate appearances 1…n. Then the expected number of batters to reach base is E[Total_Reach] = o*∑c_i, where c_i is the c-score of the ith batter (at this point, his on-base percentage over his recent and upcoming plate appearances). Setting the number of batters who actually reach base equal to this expectation and solving for o, the pitcher’s o-score at the center of the interval can be estimated as the number of batters who reach base divided by the sum of their c-scores.
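The first-pass c-score can be sketched as a rolling on-base percentage. This sketch assumes each plate appearance is coded as 1 (reached base) or 0, that the window includes the plate appearance itself, and that the window is simply truncated at the start and end of the season; the function name `rolling_obp` is hypothetical.

```python
from typing import List


def rolling_obp(reached: List[int], half_width: int = 30) -> List[float]:
    """First-pass c-scores (all o-scores = 1.000): for each plate appearance i,
    the batter's OBP over PAs i - half_width .. i + half_width,
    truncated at the ends of the season."""
    n = len(reached)
    scores = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = reached[lo:hi]
        scores.append(sum(window) / len(window))
    return scores


# Tiny example with a +/-1 PA window so the arithmetic is easy to follow.
print(rolling_obp([1, 0, 0, 1], half_width=1))
# [0.5, 0.3333333333333333, 0.3333333333333333, 0.5]
```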
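The pitcher-side estimate is a single ratio. Here is a sketch; the function name, the toy window of ten batters, and the zero-denominator guard are my own hypothetical additions.

```python
from typing import List


def estimate_o_score(reached: List[int], batter_c: List[float]) -> float:
    """Estimate a pitcher's o-score over a window of plate appearances.

    Setting the observed count equal to E[Total_Reach] = o * sum(c_i) and
    solving gives o = (batters who reached base) / (sum of their c-scores).
    """
    expected_vs_neutral = sum(batter_c)
    if expected_vs_neutral <= 0:
        raise ValueError("sum of c-scores must be positive")
    return sum(reached) / expected_vs_neutral


# Ten hypothetical .350 on-base hitters; 3.5 would reach against neutral
# pitching, but only 3 do, so this pitcher grades out tougher than average.
o = estimate_o_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0.350] * 10)
print(round(o, 3))  # 0.857
```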
We now have first estimates of o- and c-scores; but with better o-scores, we can calculate better c-scores, and vice versa. Thus, we can just iterate this process until o- and c-scores converge (which they do, rather rapidly).
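Putting the pieces together, here is a rough Python sketch of the alternating iteration. Several details are assumptions rather than things stated above: the batter step mirrors the pitcher step (a batter’s c-score is his times on base in the window divided by the sum of his opponents’ o-scores), windows include the current plate appearance, a window with a zero denominator falls back to 1.000 for o and 0.000 for c, and o-scores are rescaled each round to average 1.000 (with every c-score multiplied by the same factor, which leaves each o*c unchanged). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One play: (batter_id, pitcher_id, reached), reached = 1 if the batter got on base, else 0.
# Plays are assumed to be listed in chronological order.
Play = Tuple[str, str, int]


def window_ratios(indices: List[int], reached: List[int], opp_scores: List[float],
                  half_width: int, default: float) -> Dict[int, float]:
    """For each of one player's plays: (times on base in the surrounding window)
    divided by (sum of the opponents' scores over the same window)."""
    ratios = {}
    n = len(indices)
    for k, play in enumerate(indices):
        window = indices[max(0, k - half_width):min(n, k + half_width + 1)]
        on_base = sum(reached[j] for j in window)
        denom = sum(opp_scores[j] for j in window)
        ratios[play] = on_base / denom if denom > 0 else default
    return ratios


def fit_scores(plays: List[Play], batter_half: int = 30, pitcher_half: int = 50,
               iterations: int = 20) -> Tuple[List[float], List[float]]:
    """Alternately re-estimate c-scores (batters) and o-scores (pitchers) for every play."""
    if not plays:
        return [], []
    reached = [r for _, _, r in plays]
    by_batter: Dict[str, List[int]] = defaultdict(list)
    by_pitcher: Dict[str, List[int]] = defaultdict(list)
    for idx, (batter, pitcher, _) in enumerate(plays):
        by_batter[batter].append(idx)
        by_pitcher[pitcher].append(idx)

    o = [1.0] * len(plays)  # every pitcher starts neutral, so round 1 gives rolling OBP
    c = [0.0] * len(plays)
    for _ in range(iterations):
        # Batter step: c = times on base / sum of opposing o-scores in the batter's window.
        for indices in by_batter.values():
            for j, value in window_ratios(indices, reached, o, batter_half, 0.0).items():
                c[j] = value
        # Pitcher step: o = batters reaching base / sum of their c-scores in the pitcher's window.
        for indices in by_pitcher.values():
            for j, value in window_ratios(indices, reached, c, pitcher_half, 1.0).items():
                o[j] = value
        # Rescale so o averages 1.000; scaling c the other way keeps each o*c unchanged.
        mean_o = sum(o) / len(o)
        if mean_o > 0:
            o = [x / mean_o for x in o]
            c = [x * mean_o for x in c]
    return o, c


# Toy data: pitcher X retires everyone, pitcher Y lets everyone reach.
plays = [("A", "X", 0), ("A", "Y", 1), ("B", "X", 0), ("B", "Y", 1)] * 10
o, c = fit_scores(plays)
print(o[0], o[1], c[0])  # 0.0 2.0 0.5
```

In this toy example the scores settle after one round: o*c comes out to 0 against X and 1 against Y, matching what actually happened. Real data needs more rounds before the scores stop moving.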
I ran 20 iterations of this process on all plays from the 2017 regular season, calculating c-scores for on-base percentage, slugging percentage, and batting average, though as I said, I’ll focus on on-base percentage in this article. For purely arbitrary reasons, my intervals were a batter’s previous and next 30 plate appearances and a pitcher’s previous and next 50 plate appearances (when available); however, the final numbers are not especially sensitive to interval sizes.
Let’s check out some leaderboards! First, the top 10 batters by relOBP in 2017; after that, the actual top 10 by OBP for comparison.
The following table shows each player’s relOBP, computed as his average c-score weighted by how many neighboring plate appearances were available to calculate it (e.g., early- and late-season plate appearances are weighted less heavily).
| Rank | Player | relOBP | Avg. opponent o-score | PA |
|---|---|---|---|---|
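Here is one way to compute that weighted average, as a sketch. It assumes the weight for each plate appearance is the number of plate appearances in its ±30 window (counting the plate appearance itself), which shrinks near the start and end of a player’s season; the function names are hypothetical.

```python
from typing import List


def window_sizes(n_pa: int, half_width: int = 30) -> List[int]:
    """Number of plate appearances inside each PA's window, truncated at the season's ends."""
    return [min(n_pa, k + half_width + 1) - max(0, k - half_width) for k in range(n_pa)]


def rel_obp(c_scores: List[float], half_width: int = 30) -> float:
    """Season relOBP: the batter's c-scores averaged with window-size weights."""
    weights = window_sizes(len(c_scores), half_width)
    return sum(c * w for c, w in zip(c_scores, weights)) / sum(weights)


print(window_sizes(5, half_width=1))  # [2, 3, 3, 3, 2]
```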
And 2017’s actual season leaders:
While the order changes a bit, the two top 10s are mostly the same. Freddie Freeman, however, falls from 10th in OBP to 16th in relOBP and is replaced by Eric Hosmer, who was 11th in OBP.
Note, however, that not everyone faced the same quality of competition. Justin Turner faced weaker pitchers on average; he effectively had 1.017 times as many opportunities to get on base as a hitter facing neutral pitching (~9 more PA over the course of the season), while Jose Altuve faced tougher competition, effectively losing 32 plate appearances.
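The conversion from average opponent o-score to plate appearances appears to be simple arithmetic: multiply the excess (or shortfall) over 1.000 by the batter’s plate appearances. A sketch, with a hypothetical 540 PA standing in for a real PA total:

```python
def effective_pa_change(avg_opponent_o: float, pa: int) -> float:
    """Plate appearances gained (+) or lost (-) relative to facing neutral pitching (o = 1.000)."""
    return (avg_opponent_o - 1.0) * pa


# A 1.017 average opponent o-score over a hypothetical 540 PA is worth about 9 extra PA.
print(round(effective_pa_change(1.017, 540), 1))  # 9.2
```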
relOBP and Luck
relOBP lets us see (approximately) how good the opposing pitcher is in each plate appearance (of course, we’re not accounting for handedness in our simple model).
For example, here’s a season of plate appearances from Mike Trout.
When the o-score (orange) dips low, that’s a tough matchup; when it spikes, it’s an easy one. In gray, you can see Mike Trout’s actual rolling OBP, and in blue, his c-score. When the c-score is higher than the OBP, Trout was hitting better than his raw numbers suggested (given the matchups), and vice versa. You might wonder what the cumulative difference between those scores is: did he gain or lose expected times on base? On net, he was unlucky; he lost almost 6 times on base because he faced harder pitching.
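That tally can be sketched under one assumption about the bookkeeping (the exact calculation behind the numbers here may differ): sum, over every plate appearance, the expected times on base given the matchup (o*c) minus the expected times on base against neutral pitching (c). A negative total means times on base lost to tough pitching. The function name is hypothetical.

```python
from typing import List


def times_on_base_vs_neutral(o_scores: List[float], c_scores: List[float]) -> float:
    """Sum over PAs of o*c - c = c*(o - 1); negative means times on base lost to tough pitching."""
    if len(o_scores) != len(c_scores):
        raise ValueError("need one o-score and one c-score per plate appearance")
    return sum(c * (o - 1.0) for o, c in zip(o_scores, c_scores))


# Three hypothetical PAs: a tough matchup, an easy one, and a very tough one.
print(round(times_on_base_vs_neutral([0.9, 1.1, 0.8], [0.400, 0.400, 0.500]), 3))  # -0.1
```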
Trout was not the hardest hit, however. That would be Miguel Cabrera (and indeed, much of the Detroit Tigers’ lineup):
| Player | Times on base lost |
|---|---|
The Tigers were facing some tough pitching, apparently. One wonders whether some of the other fine hitters here (Altuve, Machado, Abreu) were particularly prone to facing difficult relievers, and whether this would explain their presence on the list. On the flip side, the luckiest player of 2017 in this respect was Ozzie Albies, who gained 9.7 times on base.
I have a lot more graphs, but that’s really not the point of the article: I only wish to introduce relOBP to you all, and now you’ve met it and can be friends.
I would love to hear your reaction to relOBP, to the methodology behind it, and any suggestions you might have for improving it. Also, if you would like to see my code or play with some of the data, let me know in the comments!
I'm a math/CS major at Bob Jones University in Greenville, SC.