An Introduction to Calculated Runs Expectancy

by Walter King

January 12, 2015

Introductions first: my name is Walter King, and over the next few weeks I plan to share my counter to Wins Above Replacement, which I call PEACE: Player Evaluator and Calculated Expectancy. The engine behind PEACE is Calculated Runs Expectancy, which is the subject of this article.

Calculated Runs Expectancy (CRE) is an analytical model that estimates runs produced by a player, team, or league for any number of games. CRE operates under the assumption that every single play on the field is relevant to output and thus can be translated into a statistical measure.

In its general form, the Calculated Runs Expectancy formula looks like this:

CRE = (√ {[(Bases Acquired) * [(Potential Runs) * (Quantified Advancement) / (Total Opportunities)]] / Outs Made²} * (Total Opportunities) + (Hit and Run Plays) + Home Runs) / Runs Divisor, relative to the league

This formula was reached by following a particular line of reasoning, which starts with the assumption that the singular objective of baseball is to win every game (well, duh!). Winning every game mathematically requires one of two scenarios: either a team allows zero runs or it scores an infinite number of runs. Both result in one team scoring 100% of the runs, which assures 100% of the wins. Because the objective is to win the game, and the only way to win is to score more runs than the opponent, the only two ways players can contribute to winning are by scoring runs or by preventing the opponent from scoring. This sounds painfully simple, but it establishes that a metric is of limited use if it has no clear link to runs, and therefore to wins. This assumption forces us to define what makes a run in terms of statistics.

With so many different statistics describing what happens on the field, it can be tough to form a clear definition. Keep it simple, and break down what a run is in the simplest way possible: a run is scored when a player safely touches all four bases, ending at home plate. That’s it. A team must acquire at least 4 bases to score 1 run, so the first formula we can use in our analysis is Bases Acquired:

Bases Acquired = TB + BB + HBP + ROE + XI + SH + SF + SB + BT (bases taken)

This is a complete count of the individual bases a hitter acquired (ROE is times reached on error, and XI is times reached on catcher’s interference), and it is valuable information that often gets overlooked.
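To make the bookkeeping concrete, here is a minimal Python sketch of Bases Acquired. It assumes a season line is stored as a dictionary keyed by the abbreviations above; the function name `bases_acquired` and the sample numbers are hypothetical, made up for illustration rather than taken from any real player.

```python
# Minimal sketch: Bases Acquired as a plain sum of counting stats.
# Keys mirror the article's abbreviations; missing keys count as zero.

BASES_ACQUIRED_KEYS = ("TB", "BB", "HBP", "ROE", "XI", "SH", "SF", "SB", "BT")


def bases_acquired(stats: dict) -> int:
    """Return TB + BB + HBP + ROE + XI + SH + SF + SB + BT."""
    return sum(stats.get(key, 0) for key in BASES_ACQUIRED_KEYS)


# Hypothetical season line (not a real player).
season = {"TB": 280, "BB": 70, "HBP": 5, "ROE": 4, "XI": 1,
          "SH": 0, "SF": 6, "SB": 15, "BT": 20}

print(bases_acquired(season))  # 401
```

Treating missing keys as zero lets the same function work for older seasons where a component such as BT may not be recorded.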

My second definition of a run comes directly from Bill James’ Runs Created statistic: to score a run, a batter needs to first reach base and then advance around the bases until reaching home plate. This approach views offensive production as the completion of those two smaller goals. James captured these concepts in three factors, On-Base Factor, Advancement Factor, and Opportunity Factor, which combine as (On-Base × Advancement) / Opportunity to calculate runs created.

But what makes up these factors? This is where I venture slightly away from James, in an attempt to capture a more complete picture of a hitter. I’ve altered the factors a bit and given them new names:

Potential Runs = TOB (times on base) – CS – GDP – BPO (basepath outs)
Quantified Advancement = TB + SB + SH + SF + BT
Total Opportunities = PA + SB + CS + BT + BPO

With these now defined, my modified Runs Created formula looks like this:

Modified Runs Created = [(TOB – CS – GDP – BPO) * (TB + SB + SH + SF + BT)] / (PA + SB + CS + BT + BPO)
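The same three factors can be sketched in Python to show how they plug into the modified formula. This is a minimal sketch under the assumption that each stat is already available as a number in a dictionary; the function names and the sample season are hypothetical.

```python
# Minimal sketch of the modified Runs Created formula:
# (Potential Runs * Quantified Advancement) / Total Opportunities.


def potential_runs(s: dict) -> int:
    """TOB - CS - GDP - BPO."""
    return s["TOB"] - s["CS"] - s["GDP"] - s["BPO"]


def quantified_advancement(s: dict) -> int:
    """TB + SB + SH + SF + BT."""
    return s["TB"] + s["SB"] + s["SH"] + s["SF"] + s["BT"]


def total_opportunities(s: dict) -> int:
    """PA + SB + CS + BT + BPO."""
    return s["PA"] + s["SB"] + s["CS"] + s["BT"] + s["BPO"]


def modified_runs_created(s: dict) -> float:
    """(Potential Runs * Quantified Advancement) / Total Opportunities."""
    opportunities = total_opportunities(s)
    if opportunities <= 0:
        raise ValueError("Total Opportunities must be positive")
    return potential_runs(s) * quantified_advancement(s) / opportunities


# Hypothetical season line (not a real player).
season = {"TOB": 250, "CS": 5, "GDP": 12, "BPO": 3, "TB": 280, "SB": 15,
          "SH": 0, "SF": 6, "BT": 20, "PA": 650}

print(round(modified_runs_created(season), 2))  # 106.54
```

Here Potential Runs is 230, Quantified Advancement is 321, and Total Opportunities is 693, so the result is 230 × 321 / 693.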

Bases Acquired and Runs Created are counting statistics, but we want rate statistics. I believe strongly in the principles of VORP, which hold that production must always be measured relative to its cost in outs. To combine our measures of offensive production with outs made, we simply divide each by outs made, creating two “per out” statistics.

So what we have now are two different measures of a batter’s efficiency: one that calculates bases acquired per out made and another that calculates runs created per out made. By multiplying the two, we incorporate both measures of efficiency into our evaluation of hitters. Conceptually, this reconciles two different philosophies on how runs are produced. We’ll call the resulting quantity Offensive Efficiency.

Offensive Efficiency = (Bases Acquired * Runs Created) / Outs Made²
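Because Offensive Efficiency is just the product of the two per-out rates, it can be sketched as below. The inputs, including Outs Made, are assumed to be precomputed (the exact events counted as Outs Made are not spelled out here), and the numbers are hypothetical.

```python
# Minimal sketch: Offensive Efficiency as (bases per out) * (runs created per out),
# which is algebraically the same as (Bases Acquired * Runs Created) / Outs Made**2.


def offensive_efficiency(bases_acquired: float, runs_created: float,
                         outs_made: float) -> float:
    if outs_made <= 0:
        raise ValueError("Outs Made must be positive")
    bases_per_out = bases_acquired / outs_made  # a Total Average-style rate
    runs_per_out = runs_created / outs_made     # Runs Created per Out
    return bases_per_out * runs_per_out


# Hypothetical inputs: 401 bases acquired, 106.5 runs created, 430 outs made.
oe = offensive_efficiency(401, 106.5, 430)
print(round(oe, 4))  # 0.231
```

Writing it as two rates keeps the link to Total Average and Runs Created per Out visible, while the result equals the single fraction above.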

I particularly like this formula because its two key components are largely considered obsolete by modern sabermetrics. Both Total Average (bases/outs) and Runs Created date from the 1970s and are throwbacks to better uniforms and simpler ways of thinking. If you were to approach a stathead today championing Total Average or Runs Created as “the answers,” they would first dismiss you and then suggest more modern metrics. Just as sabermetrics once struggled to be accepted as a respected pursuit, modern sabermetrics now seems to scoff at the idea that older, simpler calculations can be valuable. But both Total Average and Runs Created per Out are logically sound; they break hitting down into real-life objectives that correspond to real-life results. Offensive Efficiency will tell you which batters performed most efficiently, but it is sensitive to outliers. To counter this, recall the general CRE equation:

CRE = (√ {[(Bases Acquired) * [(Potential Runs) * (Quantified Advancement) / (Total Opportunities)]] / Outs Made²} * (Total Opportunities) + (Hit and Run Plays) + Home Runs) / Runs Divisor, relative to the league

Taking the square root of Offensive Efficiency and multiplying it by Total Opportunities creates a balance between efficient and high-volume performers. The next step, inspired by Base Runs, is to add Hit and Run Plays and Home Runs to the equation, because those are instances when a run is guaranteed to score. Hit and Run Plays is my name for situational baserunning plays (found on Baseball-Reference) in which a runner advances more bases than the ball in play would suggest. For example, when a batter hits a single with a runner on first, the runner would be expected to reach second base. Reaching third or scoring, however, would indicate a skillful play (or a hit and run) by an opportunistic baserunner. Three stats make up Hit and Run Plays: 1s3/4 (reaching third or home from first on a single), 2s4 (scoring from second on a single), and 1d4 (scoring from first on a double).
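Putting the pieces together, raw CRE (everything in the formula except the Runs Divisor) can be sketched as follows. This is a minimal sketch, assuming Bases Acquired, Runs Created, Total Opportunities, and Outs Made are already computed; the dictionary keys for the three baserunning counts follow the labels used above, and all numbers are hypothetical.

```python
import math

# Minimal sketch of raw CRE (the formula before dividing by the Runs Divisor):
# sqrt(Offensive Efficiency) * Total Opportunities + Hit and Run Plays + Home Runs.


def hit_and_run_plays(s: dict) -> int:
    """1s3/4 + 2s4 + 1d4."""
    return s["1s3/4"] + s["2s4"] + s["1d4"]


def raw_cre(bases_acquired: float, runs_created: float, total_opportunities: float,
            outs_made: float, hit_and_run: float, home_runs: float) -> float:
    if outs_made <= 0:
        raise ValueError("Outs Made must be positive")
    efficiency = bases_acquired * runs_created / outs_made ** 2
    # The square root compresses differences in efficiency:
    # doubling efficiency raises its square root by only about 41%.
    return math.sqrt(efficiency) * total_opportunities + hit_and_run + home_runs


# Hypothetical inputs for a single batter season.
runners = {"1s3/4": 4, "2s4": 3, "1d4": 1}
raw = raw_cre(
    bases_acquired=401,
    runs_created=230 * 321 / 693,  # modified Runs Created
    total_opportunities=693,
    outs_made=430,
    hit_and_run=hit_and_run_plays(runners),
    home_runs=30,
)
print(round(raw, 1))
```

With these made-up inputs the raw value lands a little above 370, inside the 200 to 500 range for an individual season.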

At this point, all that’s left is the Runs Divisor. If you’re following along at home, the raw CRE (before dividing by the Runs Divisor) for an individual batter’s season would be somewhere between 200 and 500, while a team’s single season would typically fall between 2000 and 3000. The Runs Divisor is specific to each season and league (so the 2014 AL and NL each have their own divisor). It is the average of the optimal divisors, that is, the divisors that would convert each team’s raw CRE exactly into its actual runs scored in that league. Let’s use a two-team league as an example. Team A has a raw CRE of 2500 and scored 700 actual runs, so its optimal divisor is 2500 / 700 = 3.57. Team B has a raw CRE of 2250 and scored 600 runs, a divisor of 2250 / 600 = 3.75. The league’s Runs Divisor is the average of the two: 3.66. This divisor is used for every individual player in that league as well. Divisors vary from year to year, but they always remain very similar.
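The two-team example can be checked with a short sketch. The function names are hypothetical; the logic follows the description above: each team’s optimal divisor is raw CRE divided by actual runs, and the league divisor is the simple average of those.

```python
from statistics import mean

# Minimal sketch of the league Runs Divisor and its use.


def optimal_divisor(raw_cre: float, actual_runs: float) -> float:
    """The divisor that turns this team's raw CRE into its actual runs."""
    return raw_cre / actual_runs


def league_runs_divisor(teams: list[tuple[float, float]]) -> float:
    """teams: (raw CRE, actual runs scored) pairs for one league-season."""
    if not teams:
        raise ValueError("need at least one team")
    return mean(optimal_divisor(raw, runs) for raw, runs in teams)


def calculated_runs(raw_cre: float, runs_divisor: float) -> float:
    """Final CRE for a team or player: raw CRE scaled by the league divisor."""
    return raw_cre / runs_divisor


# The two-team example from the text.
league = [(2500, 700), (2250, 600)]
divisor = league_runs_divisor(league)
print(round(divisor, 2))                         # 3.66
print(round(calculated_runs(2500, divisor), 1))  # Team A's CRE with the league divisor
```

Note that this averages the per-team ratios, as described, rather than dividing total league raw CRE by total league runs (4750 / 1300 ≈ 3.65 here); the two approaches can differ slightly.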

A full list of Runs Divisors from the seasons 1975-2014 can be seen here:

The average divisor across that time span was 3.7631, with a standard deviation of just 0.0268. This provides strong evidence of the relationship between CRE and runs: the two are related in the same way across generations of ballplayers. When we graph CRE against actual runs for all 1114 teams in that time span, the results are very convincing:

CRE’s R² value is 0.9682, and its average difference between actual and calculated runs is 14.02. Compared with other run estimators, the gap is substantial:

Run Estimator (Creator)                      Average Difference (runs)    R²
Base Runs (David Smyth)                      18.77                        0.9441
Estimated Runs Produced (Paul Johnson)       18.15                        0.9480
Extrapolated Runs (Jim Furtado)              18.33                        0.9515
Runs Created (Bill James)                    20.01                        0.9383
Weighted Runs Created (Tom Tango)            19.37                        0.9443

The gap between CRE and the 5 other estimators is consistent across the entire span of 40 seasons.
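The exact method behind the average difference and R² figures is not stated. A minimal sketch, assuming the average difference is the mean absolute difference between actual and calculated runs and that R² is the squared Pearson correlation between the two (the R² of a simple linear fit); the team totals below are made up.

```python
# Minimal sketch of the two accuracy measures, under the assumptions above.


def mean_absolute_difference(actual: list[float], calculated: list[float]) -> float:
    if len(actual) != len(calculated) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - c) for a, c in zip(actual, calculated)) / len(actual)


def r_squared(actual: list[float], calculated: list[float]) -> float:
    """Squared Pearson correlation between actual and calculated runs."""
    if len(actual) != len(calculated) or len(actual) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_c = sum(calculated) / n
    cov = sum((a - mean_a) * (c - mean_c) for a, c in zip(actual, calculated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return cov ** 2 / (var_a * var_c)


# Hypothetical team totals, for illustration only.
actual_runs = [700, 600, 650, 720]
cre_runs = [690, 615, 655, 700]
print(mean_absolute_difference(actual_runs, cre_runs))  # 12.5
print(round(r_squared(actual_runs, cre_runs), 4))
```

On Python 3.10 or later, `statistics.correlation` could replace the hand-rolled Pearson calculation.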

There is a lot of new information to take in here, so feel free to comment below with any questions or feedback. Part 2 will be uploaded in a few days.

7 Comments

novaether

Do the 5 runs estimators you compared against also use annual runs divisors? If not, it seems unfair to compare against them. What’s the R^2 without normalizing with the annual runs divisors?

Walter King

Reply to novaether

This is a good point. Of the 5 I compared to, only wRC has yearly adjustments to the formula. When I applied the 40-year average runs divisor (3.7631) the R2 value was 0.9611, with an average difference between actual and calculated of 15.63 runs, which is still a significant gap above the others.

Thanks, Comcast

Who was the outlier that outperformed CRE by ~60 runs?

Walter King

Reply to Thanks, Comcast

2013 Cardinals

ndrobinson

Reply to Walter King

Does anything in particular stand out about them? I see they did have an awfully high RE24. Is that just good/lucky situational hitting?

Stanatee the Manatee

Reply to ndrobinson

That was one of the years that their batting average with RISP was crazy high, which would easily throw off run expectancy numbers.

This article covers that team’s RISP performance pretty well:

http://hardballtalk.nbcsports.com/2013/09/30/risp-cardinals-shattered-the-all-time-clutch-hitting-record/
