Author Archive

Reworking and Improving the Outcome Machine

This post was inspired by a couple of articles that I remembered reading from Jonah Pemstein back in 2014. The intention of those posts was to predict the result of any given batter/pitcher matchup, dubbed the “Outcome Machine.” Have you ever wondered what the probability Mike Trout strikes out when he steps into the box against Justin Verlander? Of course, there are variables that are specific to any plate appearance (umpires/situation/stadium/etc.) that are harder to quantify, but it set out to predict the outcome in a vacuum. Trout vs. Verlander and nothing else (For the record, in 2020, I would estimate the answer is about 27.5%).

Being able to predict the outcomes in sports would take most of the fun out of being a spectator, sure, but I still found myself coming back to those articles. While reading and re-reading in an attempt to understand the logic and fool around with the equations, I came to a few questions of my own:

  • With all of the hubbub of juiced balls and increased launch angles, do equations that were based on data from 2003-13 still apply to the game today?
  • The regression equations were composed of the at-bat result and the stats of the batter and pitcher from the same year. This stuck out to me as an issue because it means the player’s performance later in the season, say in July, influences the prediction of an at-bat in May, and to a lesser extent, the result of that specific at-bat is already baked into that season’s performance. Shouldn’t you use data exclusively before a given at-bat to predict the outcome? Hindsight is 20/20, after all.

Eventually curiosity got the best of me and I decided to emulate the original exercise. Before I really start to nerd out on the inner workings, you can find this iteration of the Outcome Machine as a Google Sheet here. You can either select a pitcher/batter combination through the dropdown or hard key in the rates in a custom, hypothetical matchup below that. League average is set by default to projections for 2020 but can be updated as desired in the custom matchup. I would note that the preset statistics in this tool are total projections for 2020 but not broken out into L/R splits, as to my knowledge that data is currently behind a paywall. Read the rest of this entry »