Introducing relOBP

by cuniculus

June 14, 2018

On June 24, 2017, Joe Mauer went 2-for-2 with two walks against Corey Kluber. It was, perhaps, one of his most impressive performances of all last season; Kluber allowed only one other base runner in seven innings, and struck out 13. For this performance, and many others, Kluber won his second Cy Young Award last year.

But on that particular day – June 24 – the numbers say (my numbers say) Corey Kluber had reached his peak. In the third inning, when he faced Joe Mauer, he was more difficult to reach base against than any other pitcher, at any time, in 2017.

I present… relOBP! (Also its cousins, relAVG and relSLG, but they can wait.)

relOBP attempts to quantify how good a batter is at reaching base, and how good the opposing pitcher is at preventing that, for each play in the season. relOBP therefore requires two numbers, a pitcher-score and a batter-score. Let’s jump in.

[Editor’s note

Defining relOBP

Math follows, but not particularly hard math.

Suppose that, in a given play, the probability of a batter getting on base is given as P(reach) = o*c, where o = the effective number of opportunities the pitcher gives per plate appearance (average 1.000) and c = the batter’s rate of capitalizing on those opportunities.

If every pitcher always has o-score = 1.000, then the c-score is simply a batter’s on-base percentage (or, for my purposes, his on-base percentage in the surrounding ±30 plate appearances).

But once we have an initial estimate for each batter’s c-score by plate appearance, we can use it to estimate the opposing pitcher’s o-score at the appearance as well. How? Well, let’s suppose that a pitcher has a constant o-score o over an interval of plate appearances 1…n. Then the expected number of batters to reach base is E[Total_Reach] = o*∑c_i, where c_i is the c-score of the ith batter (at this point, his on-base percentage over his recent and future plate appearances). Thus, the pitcher’s o-score in the center of the interval can be estimated as the number of batters that do reach base, divided by the sum of their c-scores.

We now have first estimates of o- and c-scores; but with better o-scores, we can calculate better c-scores, and vice versa. Thus, we can just iterate this process until o- and c-scores converge (which they do, rather rapidly).

I did 20 iterations of this process on all plays from the regular season 2017, calculating c-scores for on-base percentage, slugging percentage, and batting average, though as I said I’ll focus on on-base percentage in this article. For purely arbitrary reasons, my intervals were a batter’s previous and next 30 plate appearances, and a pitcher’s previous and next 50 plate appearances (when available); however, the final numbers are not especially sensitive to interval sizes.
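The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the code from my repository: the function names and the input layout (a chronological list of `(batter_id, pitcher_id, reached)` tuples) are hypothetical. The c-score update (times on base divided by the sum of opposing o-scores) is assumed to mirror the o-score update, and the fallbacks for a zero denominator are arbitrary choices.

```python
from collections import defaultdict


def windowed_ratio(indices, numer, denom, half_window, default):
    """For each play in `indices` (one player's plays, in order), return
    sum(numer) / sum(denom) over that player's surrounding plays."""
    out = {}
    for pos, play in enumerate(indices):
        lo = max(0, pos - half_window)
        hi = min(len(indices), pos + half_window + 1)
        window = indices[lo:hi]
        top = sum(numer[j] for j in window)
        bottom = sum(denom[j] for j in window)
        # Fallback value is an assumption for windows with no information.
        out[play] = top / bottom if bottom > 0 else default
    return out


def estimate_scores(plays, batter_half_window=30, pitcher_half_window=50,
                    iterations=20):
    """plays: chronological list of (batter_id, pitcher_id, reached),
    where reached is 1 if the batter reached base and 0 otherwise.
    Returns per-play lists (o_scores, c_scores)."""
    by_batter, by_pitcher = defaultdict(list), defaultdict(list)
    for i, (batter, pitcher, _) in enumerate(plays):
        by_batter[batter].append(i)
        by_pitcher[pitcher].append(i)

    reached = [r for _, _, r in plays]
    o = [1.0] * len(plays)  # start with every pitcher neutral
    c = [0.0] * len(plays)

    for _ in range(iterations):
        # c-score: times on base divided by the opportunities received
        # (assumed symmetric counterpart of the o-score update).
        for idx in by_batter.values():
            for play, value in windowed_ratio(
                    idx, reached, o, batter_half_window, 0.0).items():
                c[play] = value
        # o-score: times on base allowed divided by the sum of c-scores.
        for idx in by_pitcher.values():
            for play, value in windowed_ratio(
                    idx, reached, c, pitcher_half_window, 1.0).items():
                o[play] = value
    return o, c


# Toy data: batters X and Y always reach against P1 and never against P2.
demo_plays = [("X", "P1", 1), ("Y", "P1", 1), ("X", "P1", 1), ("Y", "P1", 1),
              ("X", "P2", 0), ("Y", "P2", 0), ("X", "P2", 0), ("Y", "P2", 0)]
demo_o, demo_c = estimate_scores(demo_plays)
```

On this toy data both batters settle at a c-score of 0.5, P1’s o-score comes out at 2.0, and P2’s at 0.0, so the entire difference between the matchups is attributed to the pitchers.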

/end math

relOBP Leaders

Let’s check out some leaderboards! Consider the following table, showing the Top 10 batters by relOBP in 2017, as well as the actual Top 10.

The following table shows a player’s relOBP as his average c-score, weighted by how many adjacent plate appearances were available to calculate it (e.g. early and late season plate appearances are weighted less heavily).

RK	Player	relOBP	avg. opponent o-score	PA
1	Joey Votto	0.462	0.990	707
2	Mike Trout	0.455	0.975	507
3	Aaron Judge	0.436	0.969	678
4	Jose Altuve	0.434	0.952	662
5	Paul Goldschmidt	0.411	1.000	665
6	Justin Turner	0.408	1.017	543
7	Kris Bryant	0.407	1.010	666
8	Tommy Pham	0.405	1.003	530
9	Anthony Rendon	0.404	1.008	605
10	Eric Hosmer	0.402	0.961	671

And 2017’s actual season leaders:

RK	Player	OBP	PA
1	Joey Votto	0.454	707
2	Mike Trout	0.442	507
3	Aaron Judge	0.422	678
4	Justin Turner	0.415	543
5	Tommy Pham	0.411	530
6	Jose Altuve	0.410	662
7	Kris Bryant	0.409	665
8	Paul Goldschmidt	0.404	665
9	Anthony Rendon	0.403	605
10	Freddie Freeman	0.403	514

While the order changes a bit, the top-10 are mostly the same (Freddie Freeman falls from 10th in OBP to 16th in relOBP, however, and is replaced by Eric Hosmer, who was 11th in OBP).

Note, however, that not everyone faced the same quality of competition. Justin Turner faced weaker pitchers, on average; he effectively had 1.017 times as many opportunities to get on base as a hitter facing neutral pitching (~9 more PA over the course of the season), while Jose Altuve faced tougher competition, effectively losing 32 plate appearances.
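For anyone who wants to check that arithmetic, the effective plate appearances gained or lost are just PA × (average opponent o-score − 1). A minimal sketch using the figures from the relOBP table; the function name is purely illustrative:

```python
def effective_pa_delta(plate_appearances, avg_o_score):
    """Extra (positive) or lost (negative) effective plate appearances
    relative to facing neutral pitching (o-score = 1.000)."""
    return plate_appearances * (avg_o_score - 1.0)


turner = effective_pa_delta(543, 1.017)  # roughly 9 extra PA
altuve = effective_pa_delta(662, 0.952)  # roughly 32 PA lost
print(f"Turner: {turner:+.1f} PA, Altuve: {altuve:+.1f} PA")
```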

relOBP and Luck

relOBP lets us see (approximately) how good the opposing pitcher is in each plate appearance (of course, we’re not accounting for handedness in our simple model).

For example, here’s a season of plate appearances from Mike Trout.

When the o-score (orange) dips low, that’s a tough matchup; when it spikes, it’s an easy one. In gray, you can see Mike Trout’s actual rolling OBP, and in blue, his c-score. When the c-score is higher than the OBP, Trout was hitting better than he appeared (given the matchup); and vice-versa. You might wonder what the cumulative difference in those scores is; did he gain or lose expected times on base? On net, he was unlucky; he lost almost 6 times on base because he faced harder pitching.
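One way to read that cumulative difference is as the sum, over plate appearances, of c − o·c: the gap between the chance of reaching base against neutral pitching and against the actual matchup. The sketch below assumes that reading and, since the per-play scores are not listed here, substitutes season averages from the leaderboard for every plate appearance; the function name is hypothetical.

```python
def times_on_base_lost(c_scores, o_scores):
    """Sum over plate appearances of c * (1 - o): the expected times on
    base given up to tougher-than-neutral pitching (negative if gained)."""
    return sum(c * (1.0 - o) for c, o in zip(c_scores, o_scores))


# Season-average approximation: every PA gets the player's average scores.
trout = times_on_base_lost([0.455] * 507, [0.975] * 507)    # about 5.8
altuve = times_on_base_lost([0.434] * 662, [0.952] * 662)   # about 13.8
print(f"Trout: {trout:.1f}, Altuve: {altuve:.1f}")
```

Even with averages standing in for the per-play values, the results land close to the reported figures (almost 6 for Trout, 13.9 for Altuve).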

Trout was not the hardest hit, however. That would be Miguel Cabrera (and indeed, much of the Detroit Tigers’ lineup):

Player	Times on base lost
Miguel Cabrera	16.3
Justin Upton	14.3
Jose Altuve	13.9
Ian Kinsler	13.8
Nicholas Castellanos	13.6
Manny Machado	12.5
Melky Cabrera	12.0
Jonathan Schoop	11.8
Adam Jones	11.5
Jose Abreu	11.2

The Tigers were facing some tough pitching, apparently. One wonders if some of the other fine hitters here (Altuve; Machado; Abreu) were particularly prone to face difficult relievers, and if this would explain their presence on the list. On the flip side, the luckiest player of 2017 in this respect was Ozzie Albies, who gained 9.7 times on base.

I have a lot more graphs, but that’s really not the point of the article: I only wish to introduce relOBP to you all, and now you’ve met it and can be friends.

I would love to hear your reaction to relOBP, to the methodology behind it, and any suggestions you might have for improving it. Also, if you would like to see my code or play with some of the data, let me know in the comments!

5 Comments

tricher00

7 years ago

Hey I’d love to check out your code

cuniculus

Reply to tricher00

Thanks! Code + filtered version of 2017 data here: https://github.com/polkerty/relobp.

It’s not super heavily commented. However, to get a leaderboard, set the “OUTPUT_PLAY_STRENGTH” constant at the top to false (it defaults to true); otherwise you’ll get a CSV showing the o/c scores for each plate appearance. Also of note – while Statcast includes plaintext batters’ names, pitchers appear just as IDs, but if you google “mlb [id]” you’ll find the player of interest. And also of note – the code is set up to work nicely with generating hitting leaderboards, but not pitching leaderboards. I guess what I’m saying is it’s not professional quality haha. Again, if you have any questions, fire away.

pedeysRSox

In other words, the worse the lineup, the more their opportunities are affected.

Reply to pedeysRSox

This is a good observation! It seems difficult to remove the bias of one’s lineup when evaluating the opposing pitcher. Maybe we could look at the pitcher’s last start and next start, but then we also lose some precision. Any ideas on this front are welcome.

Also of note – the 2017 Tigers were 16/30 in MLB in OBP, so the effect isn’t *all that* strong.
