Author Archive

Computer Vision and Pitch Framing

Quantifying catcher framing was a huge step for the analytical community in trying to understand the position more fully. It has allowed evaluators to put more accurate numbers on what a catcher adds to a team. It also seems to have shifted organizational focus across the league toward framing at the expense of blocking, as can be seen in the increased prevalence of catching from a knee.

Perhaps all this work will be moot if robo-umpires are ever implemented, but for now teams clearly see marginal advantages to be gained from research and development on this topic. With this in mind, quantifying a catcher’s ability to frame is only the first step in the journey. Next, we should be looking for what makes a catcher good or bad at framing in order to improve player development practices. Finding this from a statistical perspective is tricky, because other than video of the pitch, we don’t have easily accessible data on what the catcher is doing behind the plate. Teams may have more to work with, since markerless motion capture is a developing technology in this space that can record more data, but publicly, we just have video. Instead of sitting down and trying to watch thousands of pitches, as many coaches surely have, I’ll try my hand with OpenCV and TensorFlow.


Learning a Lesson From Basketball Analytics

I read an interesting article here by Brian Woolley that attempted to adjust batter performance for the quality of pitching batters face. It’s interesting because when we look at a player’s performance, we tend to assume they faced more or less the same quality of competition as everyone else, even though we know that may not be the case, especially in small samples. This is even more evident in minor league performance, where the quality of competition can vary wildly from one prospect to another. How can we discard the assumption of equal quality of competition and get a more accurate picture of a player’s performance? In basketball analytics, quality of competition is an even more pronounced issue because coaches choose which players play in specific situations, whereas in baseball the lineup card dictates when everyone bats.

There is a metric in basketball called Regularized Adjusted Plus-Minus (RAPM) that attempts to value individual players based on their contribution to the outcome of a game while accounting for the quality of their teammates and opponents on the court. The idea first appeared in the public sphere in a 2004 article by Dan T. Rosenbaum detailing Adjusted Plus-Minus (APM). You can read more about the basketball variant in the linked article, but I’ll describe how I adapted it to a baseball context.

To set up the system, I created a linear regression model that takes each player as an independent variable, every plate appearance as an observation, and the outcome of the plate appearance as the dependent variable we’re trying to predict. Specifically, a player’s independent variable is 1 if they are the batter, -1 if they are the pitcher, and 0 if they are not part of the plate appearance. Players who appear as both a pitcher and a hitter are given two independent variables so we can measure their impact on both sides of the ball separately. The outcome of the plate appearance is defined in terms of weighted On-Base Average (wOBA).
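A minimal sketch of that setup, under stated assumptions: the player names and wOBA-scale outcome values below are hypothetical toy data, and the fit uses ridge regression (the penalty commonly behind the “Regularized” in RAPM) with an unpenalized intercept and an arbitrary penalty `lam`. It is an illustration of the encoding, not the author’s actual code. Keying columns on `(player, role)` is what gives two-way players separate batting and pitching variables.

```python
import numpy as np

# Hypothetical plate appearances: (batter, pitcher, wOBA-scale value of the outcome).
# Values are illustrative only, not real league weights.
plate_appearances = [
    ("batter_A", "pitcher_X", 0.9),
    ("batter_A", "pitcher_Y", 0.0),
    ("batter_B", "pitcher_X", 0.7),
    ("batter_B", "pitcher_Y", 0.0),
    ("batter_C", "pitcher_X", 2.0),
    ("batter_C", "pitcher_Y", 0.9),
]


def build_design_matrix(pas):
    """One column per (player, role): +1 if batting, -1 if pitching, 0 if not involved.

    Keying on (player, role) gives two-way players separate batting and pitching columns.
    """
    columns = {}
    for batter, pitcher, _ in pas:
        columns.setdefault((batter, "bat"), len(columns))
        columns.setdefault((pitcher, "pit"), len(columns))
    X = np.zeros((len(pas), len(columns)))
    y = np.array([value for _, _, value in pas], dtype=float)
    for row, (batter, pitcher, _) in enumerate(pas):
        X[row, columns[(batter, "bat")]] = 1.0
        X[row, columns[(pitcher, "pit")]] = -1.0
    return X, y, columns


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    The batting columns sum to the intercept column (one batter per row), so the
    unregularized system is singular; any lam > 0 keeps it solvable.
    """
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = lam * np.eye(Xa.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept (roughly league-average wOBA)
    coef = np.linalg.solve(Xa.T @ Xa + penalty, Xa.T @ y)
    return coef[0], coef[1:]


X, y, columns = build_design_matrix(plate_appearances)
intercept, beta = fit_ridge(X, y, lam=1.0)
# Prediction is intercept + batter_coef - pitcher_coef, so a positive pitcher
# coefficient means that pitcher lowers expected wOBA.
for (player, role), idx in sorted(columns.items(), key=lambda kv: kv[1]):
    print(f"{player:10s} {role}: {beta[idx]:+.3f}")
```

In this toy example, each batter faced the same two pitchers, so the batter ranking simply follows their raw totals; the regression earns its keep when schedules are unbalanced, which is exactly the quality-of-competition problem described above.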


Quantifying Rumor Mongering in the Baseball Media Ecosystem

During what has felt like interminable scrolling of the internet this offseason, waiting for something to finally happen, it occurred to me to ask: does any of this rumor-mongering actually tell us anything? It is certainly strange that we as consumers of baseball, a modified game of tag with hitting and throwing a ball, care so much about the internal machinations of billion-dollar organizations and the personal decision-making calculus of people we will never meet. Regardless of this peculiarity, I still spend hours a week wondering whether George Springer would be willing to play for a team that doesn’t have a guaranteed home stadium for the foreseeable future and will eventually be based in a foreign country, Canada.

This interest is what feeds the North American baseball media ecosystem and employs thousands of people, from reporters to web designers, social media managers to news aggregators, and many more. Even if this content is biased or inaccurate, I wouldn’t necessarily argue that it holds no value, because the time we spend consuming it during the offseason really just satiates our longing for baseball when we can’t watch our favorite teams live. But the question remains: does this content hold any predictive value, or are we just fooling ourselves?

This article is based on data scraped from MLB Trade Rumors, the leading aggregator of baseball rumors, on December 9, 2020. I pulled the most recent 2,000 posts tagged with each team and analyzed what information we actually get from reading and discussing the rumors and reports inside the baseball media ecosystem. To begin, we can gauge the volume of rumors for each team by counting how many days back one would have to go to reach a cumulative 2,000 posts.
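The volume measure above can be sketched as follows, assuming each team’s posts have already been reduced to a list of publication dates. The function names, team names, and dates are hypothetical; only the scrape date and the 2,000-post window come from the article. Fewer days needed to reach the window means a higher rumor volume.

```python
from datetime import date

# Scrape date stated in the article.
SCRAPE_DATE = date(2020, 12, 9)


def days_to_reach(post_dates, n_posts=2000, scrape_date=SCRAPE_DATE):
    """Days back from scrape_date needed to accumulate the n_posts most recent posts.

    Returns None when fewer than n_posts dates are supplied.
    """
    recent = sorted(post_dates, reverse=True)
    if len(recent) < n_posts:
        return None
    return (scrape_date - recent[n_posts - 1]).days


def rank_by_volume(posts_by_team, n_posts=2000):
    """Return (days, team) pairs sorted from highest rumor volume (fewest days) to lowest."""
    spans = {team: days_to_reach(dates, n_posts) for team, dates in posts_by_team.items()}
    return sorted((days, team) for team, days in spans.items() if days is not None)


# Hypothetical usage with a tiny window so the example stays readable.
posts_by_team = {
    "Team A": [date(2020, 12, 9), date(2020, 12, 1), date(2020, 11, 9)],
    "Team B": [date(2020, 12, 8), date(2020, 10, 9)],
}
print(rank_by_volume(posts_by_team, n_posts=2))
```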


Leverage and Pitcher Quality Through the Eyes of Managers

Much criticism has been levied at baseball managers for their inability to see past the archetypal dominant closer who closes games in save situations. Writers in the statistical community have observed and critiqued the many flaws of the save statistic, and how it’s perceived by fans, managers, and baseball decision-makers, since at least 2008 [1]. Saves are a function of opportunity and degree of difficulty, which makes them far from the best way to get at a relief pitcher’s ability to get outs. More objective methods such as ERA and its estimators, like Fielding Independent Pitching (FIP) and Skill-Interactive Earned Run Average (SIERA), are better ways to evaluate a pitcher’s talent, and Win Probability Added (WPA) is better for measuring a pitcher’s importance to winning specific games. This criticism has clearly been heard by the people running teams in the intervening years, as can be shown by the number of pitchers getting saves on each team and the variance of each team’s save totals.

High variance in a team’s save totals means that one pitcher accumulates a lot of saves while some number of others have very few, whereas lower variance represents a more even distribution of saves among pitchers. This variance metric is strongly negatively correlated (-0.74) with the number of pitchers on a team who record a save in a given season. In other words, the more pitchers recording a save on a team, the more equitable the distribution is likely to be and the less the team insists on using its best pitcher only in save situations. Based on this analysis, the capital “C” Closer peaked somewhere between 2008 and 2011. A rather precipitous drop occurred in 2016, and the trend has continued downward to the point where last year saw the most equitable distribution of saves among teams since 1987, excluding the strike-shortened 1994 campaign.
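A minimal sketch of the two team-season quantities and their correlation, under assumptions the article does not spell out: the variance is the population variance over every pitcher on the staff (zero-save pitchers included, since otherwise a lone closer would show zero variance), and the correlation is Pearson’s r. The team-seasons below are invented toy data and will not reproduce the article’s -0.74.

```python
import numpy as np


def save_distribution_stats(team_saves):
    """Return (variance of save totals, number of pitchers with at least one save).

    team_saves: saves for every pitcher on one team-season, zeros included (assumption).
    """
    saves = np.asarray(team_saves, dtype=float)
    return saves.var(), int((saves > 0).sum())  # population variance (ddof=0)


# Hypothetical team-seasons, not real data.
team_seasons = {
    "closer_heavy": [45, 2, 1, 0, 0, 0, 0, 0],
    "committee": [12, 10, 9, 8, 5, 3, 1, 0],
    "in_between": [30, 8, 4, 2, 0, 0, 0, 0],
}

variances, n_savers = zip(*(save_distribution_stats(s) for s in team_seasons.values()))
r = np.corrcoef(variances, n_savers)[0, 1]
print(f"Pearson r between save variance and pitchers with a save: {r:.2f}")
```

Note that raw variance also grows with a team’s total saves, so a scale-free measure such as the coefficient of variation may be worth comparing if save totals differ a lot between teams.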