Automate the Strike Zone, Unleash the Offense

by foxinsox

February 19, 2015

Hello World! I’m a software developer, so automation is my way of life. It kills me to see the tedious yet important job of calling balls and strikes performed at less than 90% accuracy. Worse, catcher framing is now a thing, which is essentially baseball’s equivalent of selling the flop.

Today, I want to talk about how automating the strike zone would affect the MLB run-scoring environment. Don’t we all want to save the environment?

Let’s pretend that before the 2014 season, home plate umpires were fitted with earpieces giving them a simplified Pitch f(x) feed of balls and strikes. They heard a high beep for a strike, a low beep for a ball. They then called balls/strikes exactly as they were told, resulting in a perfect zone.

Experiment 1: Walks/Strikeouts overturned

The most damaging ball/strike errors happen when ball 4 or strike 3 is thrown but not called. Sometimes the umpire is redeemed by luck, and a walk/strikeout happens eventually anyway, but not nearly every time. Think of how many times you’ve seen a 3–0 count where a ball was called a strike, only to have the hitter swing and ground out harmlessly on the 3–1 pitch.

For these experiments, let’s look at a short description of the situation, the number of instances of that situation in 2014, and the net runs that would have been added if a perfect zone had been called.

Data courtesy of Baseball Savant; click on a situation to see the query I used.

Situation	Instances	Net Runs (Rough)
Strike 3 thrown, batter safe	146	-88
Ball 4 thrown, eventual out	691	415
Difference	545	327 (.07 team runs per game)

Are you surprised? The umpires made 545 more extra outs than extra ‘safes’. Using a rough walk minus out run differential of 0.6 runs, we see that a perfect zone would have added 0.07 runs per game. Interesting, but not huge.
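The arithmetic behind that 0.07 figure can be sketched in a few lines of Python. The counts and the 0.6-run differential come from the table and paragraph above; the number of team-games (30 teams x 162 games) is an assumption about how the per-game figure was derived, and the function name is purely illustrative.

```python
# Counts from the Experiment 1 table.
STRIKE3_THROWN_BATTER_SAFE = 146   # umpire error helped the offense
BALL4_THROWN_EVENTUAL_OUT = 691    # umpire error hurt the offense

# Rough run value of turning an out into a walk, as quoted in the post.
WALK_MINUS_OUT_RUNS = 0.6

# Assumption (not stated in the post): 30 teams x 162 games each.
TEAM_GAMES = 30 * 162


def net_runs_per_team_game(extra_outs, extra_safes, run_value, team_games):
    """Return (net events, net runs, net runs per team-game)."""
    net_events = extra_outs - extra_safes
    net_runs = net_events * run_value
    return net_events, net_runs, net_runs / team_games


net_events, net_runs, per_game = net_runs_per_team_game(
    BALL4_THROWN_EVENTUAL_OUT,
    STRIKE3_THROWN_BATTER_SAFE,
    WALK_MINUS_OUT_RUNS,
    TEAM_GAMES,
)
print(net_events, round(net_runs), round(per_game, 2))
```

Under those assumptions the result lines up with the table's Difference row: 545 net extra outs, about 327 runs, and roughly 0.07 runs per team per game.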

But think again—this effect isn’t limited to plate appearances that should have ended with a bad call. We all know that the count affects the expected run value all on its own. So let’s expand this to all ‘bad calls’ in 2014.

Experiment 2: All balls/strikes called correctly

Balls and strikes don’t obviously translate to runs. So I’ll borrow someone else’s much more careful research and use a ball minus strike run value of approximately 0.14 runs. Here’s what happens when we apply a perfect zone to all balls and strikes. Brace yourself!

Situation	Instances	Net Runs (Rough)
Strike thrown, ball called	8724	-1212
Ball thrown, strike called	40557	5633
Difference	31833	4422 (.91 runs per game per team)

Whoa. Are you kidding me? If we’d run last season with a perfect strike zone, the run environment would have gone from 4.07 runs/game to nearly 5! That would be the highest level since 2000. I know what you’re thinking: this is crazy, and probably wrong.
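The same back-of-the-envelope calculation can be repeated for Experiment 2. This is a rough sketch, not the exact method used for the table: it applies the rounded 0.14 run value quoted above and again assumes 30 x 162 team-games. With the rounded value, the totals land slightly above the table's 4,422 runs and .91 per game, presumably because the table used an unrounded run value (that explanation is itself an assumption).

```python
# Counts from the Experiment 2 table.
STRIKE_THROWN_BALL_CALLED = 8724    # error helped the offense
BALL_THROWN_STRIKE_CALLED = 40557   # error hurt the offense

# Approximate ball-minus-strike run value quoted in the post (rounded).
BALL_MINUS_STRIKE_RUNS = 0.14

# Assumption (not stated in the post): 30 teams x 162 games each.
TEAM_GAMES = 30 * 162

# 2014 run environment quoted in the post, in runs per team per game.
BASELINE_RUNS_PER_GAME = 4.07

# Net number of calls that went against the hitter.
net_calls = BALL_THROWN_STRIKE_CALLED - STRIKE_THROWN_BALL_CALLED

# Runs the offense would gain back under a perfect zone.
net_runs = net_calls * BALL_MINUS_STRIKE_RUNS
added_per_game = net_runs / TEAM_GAMES

# Hypothetical run environment with every call corrected.
perfect_zone_env = BASELINE_RUNS_PER_GAME + added_per_game

print(f"net calls against hitters: {net_calls}")
print(f"net runs: {net_runs:.0f}")
print(f"added runs per team-game: {added_per_game:.2f}")
print(f"perfect-zone run environment: {perfect_zone_env:.2f}")
```

Either way, the estimate lands just under 5 runs per game, which is the "nearly 5" figure quoted above.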

Sanity checking

I also found this result to be larger than expected, to say the least. So let’s back up, check the mirrors, and look at the frequency of called strikes vs. balls.

Called Ball	233421
Called Strike	123922
Difference	109499

There are a ton more called balls than called strikes. This makes sense because batters are more likely to swing at strikes. But the ratio of called balls to called strikes is only about 2:1, which doesn’t account for the roughly 5:1 ratio among ‘mistaken’ balls/strikes! How do we account for this?
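For readers who want the two ratios side by side, here is a minimal sketch using only the counts from the tables above. The variable names are illustrative.

```python
# Overall called pitches (Sanity checking table).
CALLED_BALLS = 233421
CALLED_STRIKES = 123922

# Mistaken calls (Experiment 2 table).
BALL_THROWN_STRIKE_CALLED = 40557
STRIKE_THROWN_BALL_CALLED = 8724

# How many called balls there are for each called strike.
called_ratio = CALLED_BALLS / CALLED_STRIKES

# How many mistaken strike calls there are for each mistaken ball call.
mistake_ratio = BALL_THROWN_STRIKE_CALLED / STRIKE_THROWN_BALL_CALLED

print(f"called balls per called strike: {called_ratio:.1f}")
print(f"mistaken strikes per mistaken ball: {mistake_ratio:.1f}")
```

The first ratio comes out a bit under 2:1 and the second a bit under 5:1. Note that they point in opposite directions: called balls outnumber called strikes overall, yet the mistakes skew heavily toward balls being called strikes.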

A possible explanation

Here we dive into speculation, but stay with me for a minute. Maybe there’s a logical explanation.

What sequence of events must occur in order for a Pitch f(x) strike to become a ball?

Pitcher throws in strike zone: ~45% (Zone %)
Hitter takes said pitch in the strike zone: ~35% (100% – Z-Swing %)
Umpire makes bad ‘ball’ call: ~10%

By this ridiculously rough method, we would expect bad ‘ball’ calls about 1.6% of the time (0.10 * 0.35 * 0.45). Compare that with the observed value of 1.2%.

Conversely, the sequence for a Pitch f(x) ball becoming a called strike is as follows:

Pitcher throws out of zone: ~55% (100% – Zone %)
Hitter takes said pitch outside the strike zone: ~70% (100% – O-Swing %)
Umpire makes bad ‘strike’ call: ~15%

We therefore expect bad ‘strike’ calls about 5.8% of the time (0.15 * 0.7 * 0.55). Again, compare that to the observed value of, wait for it, 5.7%. Boom!
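Both chains of events are just products of three rough probabilities, so they are easy to reproduce. The sketch below uses the approximate percentages listed above; the function and variable names are illustrative, and the inputs are the post's ballpark figures rather than measured values.

```python
def expected_bad_call_rate(p_location, p_take, p_umpire_error):
    """Share of all pitches that end in a given kind of bad call.

    p_location:      chance the pitch is in (or out of) the zone
    p_take:          chance the hitter takes a pitch in that location
    p_umpire_error:  chance the umpire misses a taken pitch there
    """
    return p_location * p_take * p_umpire_error


ZONE_PCT = 0.45  # approximate Zone % from the post

# Strike thrown, ball called: in zone, taken, missed by the umpire.
bad_ball_rate = expected_bad_call_rate(ZONE_PCT, 0.35, 0.10)

# Ball thrown, strike called: out of zone, taken, missed by the umpire.
bad_strike_rate = expected_bad_call_rate(1 - ZONE_PCT, 0.70, 0.15)

print(f"expected bad 'ball' calls:   {bad_ball_rate:.2%} of pitches")
print(f"expected bad 'strike' calls: {bad_strike_rate:.2%} of pitches")
print(f"ratio: {bad_strike_rate / bad_ball_rate:.1f} to 1")
```

The products work out to 1.575% and 5.775% of all pitches, in the same neighborhood as the observed 1.2% and 5.7%. The three inputs stack up in the same direction for bad strike calls (more pitches out of the zone, more of them taken, and a higher assumed miss rate), which is why the gap ends up so much wider than the 2:1 split in called pitches.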

More reasons to automate

Automatic things happen faster. As a professional automator, I guarantee this will speed up play, by more than you think. I bet the umpire thinks for about 1 second on every pitch. That’s just the obvious part.
Set the umpires free. Focusing on something as difficult as calling balls/strikes squeezes out the umpire’s attention on other important matters, such as enforcing pace of play.
Crazy cool things will happen. For example, we will finally see what happens to an insane control pitcher’s K-BB%. V-Mart might never strike out!

I welcome your comments, criticisms, or even praise 🙂

Steve Lind is a software developer and baseball lover living in San Francisco.

12 Comments

Rawson

10 years ago

If your analysis and assumptions are correct, this is incredible information. Thanks for doing it. However, it also starts the conspiracy-theory gears grinding. Why on earth would MLB tolerate such incompetence? There is no way the league office could be unaware that the umpires are missing so many calls in ways that impact the game so markedly. Why would the league tolerate performance that artificially elevated the value of pitching at the expense of hitting? I’m baffled, especially seeing it laid out this clearly.

foxinsox

10 years ago

Reply to Rawson

Thanks for your comment!

I don’t think there’s any conspiracy going on. This is probably the way things have always been, though it may be helpful to look back to 2007 to see if things have gotten worse recently. MLB is notorious for being slow to make changes, and the umpires have a tough job.

Also, it’s possible that MLB simply doesn’t realize how anti-offense bad ball/strike calls are in aggregate. Send them a letter!

Green Mountain Boy

10 years ago

I was never a fan of potentially automating ball-strike calls, but this piece is eye-opening and is starting to get me to reconsider the benefits. What would be the basic parameters of your system? And specifically, how do you propose to deal accurately with different stances (upright vs crouched) and different heights of batters (5’6″ to 6’7″)? How would you deal with pitches that nip the front corner of the plate in the zone but end up 6″ outside or in the dirt when caught?

AdamD

10 years ago

Reply to Green Mountain Boy

The Pitch f(x) system is able to handle this with the help of its operators. The operators set the height of the strike zone for each individual according to their stance, so they can stretch it for tall guys or guys that stand upright, and shrink it for short guys or crouching Tigers.

As for the second part of your question, the Pitch f(x) system doesn’t consider the strike zone a flat plane (e.g., where the ball touches the front of plate). The strike zone is a polyhedron (a pentagonal prism?), so if the ball touches that polyhedron at any point, it would be considered a strike. So that would take into account the slider you reference that touches the corner of the plate, but is caught six inches outside or in the dirt. It also allows for the high strike that crosses the front of the plate high, but dips into the back of the zone.

Peter Jensen

10 years ago

Reply to AdamD

AdamD – Wrong on both your statements. Pitch Fx setting of lower and upper bounds of the strike zone has been shown to be an extremely inconsistent process that results in as much as a 6 inch difference from game to game. Because of this, many of the studies on the Pitch Fx strike zone have substituted upper and lower bounds calculated from the batter’s height and handedness, as suggested by Mike Fast in a BP article several years ago.

Pitch Fx doesn’t physically measure the ball’s location anywhere near the strike zone. Its measurements begin 35 to 45 feet from the back of the plate and end 8 to 12 feet from the back of the plate, depending on the stadium. From this information an estimation of the ball’s path to the plate is derived. The only reported position of the ball near the plate is at the front of the plate, and that is an extrapolation based on the estimated path and not a measurement.

Ok

10 years ago

let us also build cyborgs that don’t break and watch them play instead of humans all together

foxinsox

10 years ago

Reply to Ok

Since technology removes the humanity, I assume you’d like to remove scoreboards, lights, TV, trains to go to the game, and the internet you’re currently using 😉

vonstott

10 years ago

Reply to foxinsox

It is generally advisable not to engage trolls.

Steven

10 years ago

Hitters would ostensibly be able to tighten up the location of pitches they’re looking for and might see an increase in quality of contact or power.

rory

10 years ago

I wonder if there’d be a way to cross-check the errors of the calls against the salaries of the players. My guess is that you’d also find that bad calls disproportionately favor stars over rookies.

Trolly McTrollerson

10 years ago

Microchips embedded in the lettering and knees of the uniforms broadcast signals that serve as the top and bottom of the zone while lasers project upward from home plate to create a real time grid that senses the microchipped ball as it passes through the zone.

Also, magnets or something…

Captain Tenneal

10 years ago

I used Markov chains to predict league strikeout and walk rates if every call were made perfectly. I then ran the predicted rates through the FIP formula and came to a similar conclusion, that offense would be expected to rise nearly a run per game. So it’s not crazy to think that perhaps MLB should get on this if they want people to keep watching baseball.
