Archive for Research

Adjusting Batter Performance by the Quality of the Opposing Pitcher

February 12, 2021

In the 2020 season, American League MVP José Abreu faced 107 different pitchers, including the top four in Cy Young voting point totals (Shane Bieber, Trevor Bauer, Yu Darvish, and Kenta Maeda). Bauer was the only one of the four not to allow a home run to Abreu in 2020. In comparison, MVP runner-up José Ramírez faced 69 of the pitchers that Abreu faced. Third-place finisher DJ LeMahieu faced a completely different set of pitchers, not a single one overlapping with Abreu’s.

While these batters were compared by their offensive production, it appears Abreu faced more challenging pitching. Using FanGraphs’s xFIP- (for which a lower number is better) as a measure of a pitcher’s quality, Abreu was up against a 96.75 xFIP- on average while LeMahieu faced pitchers with a 105.93 mark. Both LeMahieu’s weighted on-base average (wOBA) of .429 and Abreu’s .411 were exceptional, but is the 18-point difference truly reflective of the difference between the two players’ seasons?

Overview

To answer the question, I derived a value with a similar intuition to Baseball Prospectus’s Deserved Run Average (DRA). DRA is a measure that adjusts a pitcher’s performance by the quality of the batters they are facing. This statistic also accounts for numerous context factors to attempt to better isolate the pitcher’s contribution. DRA shows that the quality of the batter can be influential in a pitcher’s performance, so it makes sense that the quality of the pitcher is influential in a batter’s performance.

As for the statistic I will be working with, I choose to refer to this as “pitcher-adjusted weighted on-base average,” or pwOBA. The intuition is simple: a batter should get credit for offensive production against challenging pitching. The formula for pwOBA is based on the formula for wOBA. With wOBA, every event has a run value (e.g., 1.979 for home runs in 2020) and a batter gets credit for these values accumulated over the course of the season. The sum of these values is then divided by (AB + BB – IBB + SF + HBP). Read the rest of this entry »
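The wOBA calculation described above can be sketched in a few lines of Python. This is only an illustration of standard wOBA, not of pwOBA, whose adjustment is not spelled out in this excerpt. Only the 1.979 home-run value comes from the text; the other weights, the `BattingLine` type, and the `woba` function name are assumptions made for the example and should be checked against FanGraphs' published constants.

```python
from dataclasses import dataclass

# 2020 run values per event. Only "HR" (1.979) is taken from the text;
# the remaining weights are assumed for illustration.
WEIGHTS_2020 = {
    "uBB": 0.699,  # unintentional walk (assumed)
    "HBP": 0.728,  # hit by pitch (assumed)
    "1B": 0.883,   # single (assumed)
    "2B": 1.238,   # double (assumed)
    "3B": 1.558,   # triple (assumed)
    "HR": 1.979,   # home run (from the text)
}


@dataclass
class BattingLine:
    """Hypothetical container for one batter's season totals."""
    ab: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    singles: int
    doubles: int
    triples: int
    hr: int


def woba(line: BattingLine, weights: dict = WEIGHTS_2020) -> float:
    """Sum the run values of each event, then divide by AB + BB - IBB + SF + HBP."""
    numerator = (
        weights["uBB"] * (line.bb - line.ibb)
        + weights["HBP"] * line.hbp
        + weights["1B"] * line.singles
        + weights["2B"] * line.doubles
        + weights["3B"] * line.triples
        + weights["HR"] * line.hr
    )
    denominator = line.ab + line.bb - line.ibb + line.sf + line.hbp
    return numerator / denominator if denominator else 0.0
```

Intentional walks are removed from both the numerator and the denominator, which is why the walk weight applies to `bb - ibb` rather than to all walks.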

PitchingBot: Using Machine Learning To Understand What Makes a Good Pitch

by Cameron Grove

February 3, 2021

People have always been looking to understand what makes a good pitch. With advances in pitch tracking technology and computing power, we can begin to use large amounts of data to answer this question more definitively. I’ve created a model called PitchingBot which uses machine learning to try and find what makes a good pitch.

Machine learning describes a general class of algorithms that are very flexible and “learn” patterns from large amounts of data. This means I don’t have to tell PitchingBot what I think a good pitch is, but instead I can give it a load of pitches (and the results of those pitches) and it will train itself to recognize a pitch that gives good results.

I intend to investigate a couple of key questions:

Does PitchingBot reach the same conclusions as conventional wisdom about what makes a good pitch?

Naively, I would expect a good pitch to have the following qualities: high velocity, plenty of movement, and good location in the corner of the strike zone. I will look at whether these are true for PitchingBot and how the definition of a good pitch changes with the ball/strike count.

Can we meaningfully compare and evaluate pitchers using PitchingBot?

Are the pitchers who are best according to PitchingBot those who get the best results? PitchingBot isn’t very useful if it does not agree with real pitcher performance. Read the rest of this entry »

Building a Hitting Prospect Projection Model

by Joshua Mould

February 1, 2021

How well do you think you can predict the future of a minor leaguer? My computer may be able to help. Towards the end of the regular season, I found the prospects page at FanGraphs and started experimenting with it. I have always had a lot of fun thinking about the future and predicting outcomes, so I decided to try to build a model to predict whether or not a prospect would make it to the majors. I had all the data I needed thanks to FanGraphs, and I had recently been looking into similar models built by others to figure out how I could accomplish this project. I realized that all these articles I was reading detailed the results of their models, but not the code and behind-the-scenes work that goes into creating them.

With that in mind, I decided to figure it out on my own. I had a good idea of what statistics I wanted to use, but there were a few issues I needed to consider before I started throwing data around:

Prospects can play multiple years at a single level.
Not all prospects play at all levels of the minor leagues.
What do I do with players who skipped levels?
How can I make this model useful and practical?

Prospects playing multiple years at a single level isn’t too difficult to deal with because I can just aggregate the stats from those seasons. The fact that not all prospects play at every level of the minor leagues before reaching the majors is tough, however, because that makes for a lot of missing data that needs to be handled before building the model. I decided to replace all the missing values with the means of the existing data, and I created variables to indicate whether or not a player’s season stats for that particular level of the minor leagues were real. To make this model useful, I would want to take out certain variables. For example, I figured I wouldn’t need or want Triple-A stats included in the model because typically once a player has reached that level of the minors, you are more interested in how well they will do in the majors. Read the rest of this entry »
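The missing-data step described above (fill gaps with the mean of the observed values and flag which values are real) might look something like the following sketch. It uses only the Python standard library; the function name `impute_with_indicator` and the column name `double_a_ops` are hypothetical, since the excerpt does not say which tools or statistics the model actually uses.

```python
from statistics import mean


def impute_with_indicator(rows, column):
    """Fill missing values in `column` with the mean of the observed values
    and add a `<column>_is_real` flag (1 = observed, 0 = imputed)."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    fill_value = mean(observed)  # raises StatisticsError if nothing was observed

    imputed = []
    for row in rows:
        is_real = row.get(column) is not None
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[column] = row[column] if is_real else fill_value
        new_row[f"{column}_is_real"] = int(is_real)
        imputed.append(new_row)
    return imputed


# Hypothetical prospects; the second one skipped Double-A entirely.
prospects = [
    {"name": "Player A", "double_a_ops": 0.800},
    {"name": "Player B", "double_a_ops": None},
    {"name": "Player C", "double_a_ops": 0.600},
]
cleaned = impute_with_indicator(prospects, "double_a_ops")
```

Keeping the indicator alongside the filled value lets a model distinguish a prospect who was genuinely average at a level from one who never played there.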

Applying Peta’s Wagering Methodology in 2020

by Bill Deere

January 21, 2021

For those unfamiliar with Joe Peta’s groundbreaking 2013 book Trading Bases, the author is a successful financial analyst and former Wall Street trader. After Peta was seriously injured in a traffic accident, his long and painful recovery included employing his professional skills to develop a baseball wagering methodology. His book is about more than that though, including observations about the 2008 economic meltdown and sports wagering writ large. Peta’s anecdotes alone make it worth the read: imagine being hit by a NYC ambulance and then being billed by the city for the ride to the hospital.

At its highest level, the Peta methodology is based on the utilization of a team’s previous season performance adjusted for cluster luck (a regression of OBP/SLG/ISO to arrive at “hits per run”) and WAR, as well as upcoming-season projected WAR. The resulting estimate of a team’s season win total is then used to identify and capitalize on inefficiencies between the model’s estimates and wagering lines.

Peta’s work produces two products: a season-long projection of wins (the long game) and the ability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field. While conceptually straightforward, it is time-consuming to operate, requiring familiarity with Excel (particularly the ability to link sheets). In lieu of Peta’s regression calculation of cluster luck, I utilized FanGraphs’ calculation of BaseRuns, convinced of its utility as a proxy after reading a 2019 article at samkonmodels.com arguing it was one of a number of comparable and readily available such calculations. Read the rest of this entry »

The 3-0 Count Dilemma

by jchalem

January 11, 2021

While it might not appear so, baseball games constantly illustrate economic thinking, particularly the mathematical framework of game theory. There are many ways game theory comes into play, but a classic example is the prisoner’s dilemma. Imagine a police officer is interrogating two suspects accused of robbing a bank together. The police officer has some evidence to put them in jail, but a confession would go a long way. Each suspect is contemplating confessing to the crime. If both suspects keep quiet, they will each receive five years in jail. If one suspect confesses and the other keeps quiet, the one who kept quiet will receive 20 years in jail while the suspect who confessed will receive just one year. If both confess, they each receive 10 years in jail. The logical choice for each suspect, the one that is best no matter what the other suspect does, is called the dominant strategy. The end result, or the combination of the suspects’ decisions, is called the Nash Equilibrium. By using game theory, we come to the conclusion that each suspect should confess to the crime, meaning they will each get 10 years in prison. I won’t go much into why this is the case, but feel free to research more about game theory and the Nash Equilibrium on your own.
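For readers who would like to check the prisoner’s dilemma result themselves, here is a minimal Python sketch that uses the jail terms given above. The function names and the brute-force approach are choices made for this illustration, not part of the original analysis. It looks for a strictly dominant strategy for each suspect and then for every pair of choices in which neither suspect can shorten their own sentence by switching.

```python
from itertools import product

QUIET, CONFESS = "quiet", "confess"
ACTIONS = (QUIET, CONFESS)

# YEARS[(a, b)] = (years for suspect A, years for suspect B); fewer is better.
YEARS = {
    (QUIET, QUIET): (5, 5),
    (QUIET, CONFESS): (20, 1),
    (CONFESS, QUIET): (1, 20),
    (CONFESS, CONFESS): (10, 10),
}


def years_for(player, own, other):
    """Jail term for `player` (0 = A, 1 = B) given their own and the other's choice."""
    profile = (own, other) if player == 0 else (other, own)
    return YEARS[profile][player]


def dominant_strategy(player):
    """Return the action that is strictly better against every opposing choice, if any."""
    for candidate in ACTIONS:
        alternatives = [a for a in ACTIONS if a != candidate]
        if all(
            years_for(player, candidate, opp) < years_for(player, alt, opp)
            for alt in alternatives
            for opp in ACTIONS
        ):
            return candidate
    return None


def nash_equilibria():
    """Return all pure-strategy profiles where neither suspect gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_is_best = all(years_for(0, a, b) <= years_for(0, alt, b) for alt in ACTIONS)
        b_is_best = all(years_for(1, b, a) <= years_for(1, alt, a) for alt in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria
```

With these payoffs, confessing gives each suspect a shorter sentence whether the other stays quiet (1 year instead of 5) or confesses (10 instead of 20), so both confessing is the only profile that survives the check.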

What does this have to do with baseball? We can think of each pitch as a game, with the pitcher and batter as the two suspects. Instead of confessing to a crime, the pitcher is contemplating throwing a ball in the strike zone while the batter is contemplating swinging. While the prisoner’s dilemma has a Nash Equilibrium, not only does a pitch to a batter not have a Nash Equilibrium, but the combination of decisions is constantly changing. If the batter’s dominant strategy is to swing, then pitchers will throw more balls outside the batter’s reach. If the pitcher’s dominant strategy is to throw a ball, then the batter will take more pitches.

We could observe this thought process for every pitch thrown. However, let’s look at one type of pitch: 3-0 counts. If you are the batter, it might seem obvious to take the pitch. The worst-case scenario is you end up with a 3-1 count. If you are the pitcher, it might seem obvious to throw an easy strike. You do not want to walk the batter, and you know the batter doesn’t want to swing and risk giving you an easy popup to get out of a good count. So I guess the batter should take every pitch and the pitcher should throw the ball right down the middle every time. Read the rest of this entry »

wOBA – xwOBA vs. Defensive Metrics

by Matt Boyd

January 6, 2021

Introduction

For quite some time, wOBA has been used as a well-known, all-around statistic for measuring the output of a hitter. wOBA doesn’t treat the many different ways of getting on base equally. Instead, it gives credit to the hitter for the value of each outcome, whether that be a single, home run, or walk. FanGraphs offers a more in-depth explanation.

With the emergence of Statcast, xwOBA has been introduced. xwOBA uses launch angle, exit velocity, and sometimes sprint speed of the batter to give an expected value of wOBA on batted balls. xwOBA can tell us at what exit velocity and launch angle the most meaningful outcomes are produced. That is important to know because we can now see if specific teams or players are underperforming, overperforming, or are performing as expected based on these two stats. wOBA and xwOBA are not perfectly correlated for hitters with at least 50 plate appearances in 2019, but they still have a very strong relationship (r = 0.918). As plate appearances increase, the two should eventually converge. How many plate appearances that takes, I don’t know.

What I want to know is what goes into a team’s defense if they are allowing a larger xwOBA than wOBA. That would mean they are taking expected hits for the opposing team and turning them into outs. I got to thinking about this idea while watching the ALCS between the Tampa Bay Rays and the Houston Astros, specifically Game 3, and I have now had time to dive deeper into my initial question. The Rays put together a beautifully played defensive game while their offense seemed to struggle outside of Randy Arozarena.

In Game 3, the Rays pitching staff combined to give up an xwOBA of 0.337, but they only allowed a combined wOBA of 0.300. Their defense saved 37 points of wOBA against the Astros, and the Rays took a commanding 3-0 series lead. Read the rest of this entry »

Quantifying Rumor Mongering in the Baseball Media Ecosystem

by Peter L'Oiseau

December 30, 2020

In what feels like interminable scrolling of the internet this offseason waiting for something to finally happen, it occurred to me to ask, does any of this rumor-mongering actually tell us anything? It is certainly strange that we as consumers of baseball, a modified game of tag with hitting and throwing a ball, care so much about the internal machinations of billion-dollar organizations and the personal decision-making calculus of people we will never meet. Regardless of this peculiarity, I myself still spend hours a week wondering if George Springer would be willing to play for a team that doesn’t have a guaranteed home stadium for the foreseeable future and will subsequently be located in a foreign country, Canada.

This interest is what feeds the North American baseball media ecosystem and employs thousands of people, from reporters to web designers, social media managers to news aggregators, and many more. I wouldn’t necessarily argue that this content holds no value if it is biased or inaccurate, because the time we spend consuming this offseason content really just satiates our longing for baseball when we can’t watch our favorite teams live. But the question remains, does this content hold any predictive value, or are we just fooling ourselves?

This article is based on data scraped from MLB Trade Rumors, the leading aggregator of rumors around baseball, on December 9, 2020. I pulled the last 2,000 posts that each team was tagged in and analyzed what information we’re actually getting from reading and discussing the rumors and reports inside the baseball media ecosystem. To begin, we can observe the volume of rumors for teams by seeing how many days one would have to go back to reach a cumulative 2,000 posts. Read the rest of this entry »

Why a World Series Appearance Might Not Save the Rays in Tampa Bay

by Michael Lortz

December 18, 2020

As Major League Baseball prepares for 2021, teams are bracing for another season of COVID-19 related financial problems. There will undoubtedly be a smaller-than-usual capacity of fans at ballparks nationwide, and depending on the municipality, there might not be fans at all. Teams are hoping 2021 is not as bad as 2020. According to an analysis by the Tampa Bay Business Journal, the New York Yankees missed over $437 million in expected income. Near the bottom of the list, the Tampa Bay Rays lost only $67 million in expected income.

But the pandemic affected the Rays in additional ways, some of which could impair the ability of the team to stay in Tampa Bay. As the Rays recently appeared in the World Series, it is important to explore how the pandemic could impact the long-term sustainability of baseball in Tampa Bay.

In 2019, the Tampa Bay Rays won 96 games and made the playoffs for the first time in six years. Their series versus the Astros was the Rays’ first postseason under Kevin Cash and their first since Joe Maddon and Andrew Friedman left the organization following the 2014 campaign. After three mediocre seasons, the Rays had increasingly improved under the radar of all but the most dedicated baseball fans. Read the rest of this entry »

How Possible Is a Five-Homer Game?

by Joe Vasile

December 11, 2020

A recent post in the Effectively Wild Facebook group sparked my curiosity. A poster named Tim wrote: “Record I’d like to see set that isn’t inconceivable: Player gets 5 HR in a single game.” That record is not inconceivable, because it has been accomplished at least five times in the minor leagues.

In fact, the professional baseball record is eight home runs in a single game, set by catcher Jay Clarke of the Corsicana Oil Cities in a 51-3 win over the Texarkana Casketmakers in a Texas League contest in 1902. The last minor leaguer to hit five homers in a single game was Dick Lane of the Muskegon Clippers in 1948.

Known Five-Homer Games

Date	Player	Team	Opponent	Outcome	League	HRs Hit
6/15/1902	Jay “Nig” Clarke	Corsicana Oil Cities	Texarkana Casketmakers	W, 51-3	Texas League	8
5/11/1923	Pete Schneider	Vernon Tigers	Salt Lake City Bees	W, 35-11	Pacific Coast League	5
5/30/1934	Lou Frierson	Paris Pirates	Jacksonville Jax	L, 17-12	West Dixie League	5
4/29/1936	Cecil Dunn	Alexandria Aces	Lake Charles Skippers	W, 28-5	Evangeline League	5
7/3/1948	Dick Lane	Muskegon Clippers	Fort Wayne Generals	W, 28-6	Central League	5

Of course, the poster was in all likelihood talking about the MLB record of four in a game, which has stood since 1894. It was a commenter on the post, though, who really piqued my interest. They simply asked: “Would a team really continue pitching to a guy who’s already had 4 HR in a game though?”

It’s a valid question to ask, and it set me down a rabbit hole of seeing just how many players had a plate appearance with four homers already in a game, and how those plate appearances went. Looking back at history isn’t necessarily the best way to predict future behavior, but it is a fun exercise if nothing else, because frankly, before conducting this research I had no idea how many players ever had a crack at a fifth home run. Read the rest of this entry »

How Much Value Is Really in the Farm System?

by Adam Daily

December 2, 2020

Everyone knows that a strong farm system is key to the long-term success of a major league organization. A strong system allows clubs to field competitive teams at affordable salaries and stay beneath the luxury tax threshold, but how much value can an organization truly expect from its farm system? How much more value do the best farm systems generate compared to the worst ones? I decided to take a closer look.

Methodology

The first thing I did was gather the player information and rankings from Baseball America’s Prospect Handbooks from 2001-14 and enter them into a database. I then found each player’s total fWAR produced over the next six seasons and added those totals together to find the value that each farm system produced. I chose six seasons to ensure that teams wouldn’t get credit for a player’s non-team-controlled years, since the value produced would not be guaranteed for the player’s current organization. This method will reduce the total value produced by players that are further away from the majors, but the purpose of this analysis is to focus on the value of the entire farm system and not an individual player’s value over the course of their career.
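The aggregation described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code behind the article: the input shapes, the function name `farm_system_value`, and the sample players are hypothetical, and the sketch assumes the six-season window begins with the Handbook year itself (for example, 2014 through 2019 for the 2014 Handbook).

```python
from collections import defaultdict

WINDOW = 6  # seasons of production credited to the farm system


def farm_system_value(prospects, war_by_season):
    """Total fWAR produced by each organization's ranked prospects.

    prospects: iterable of (handbook_year, organization, player) tuples.
    war_by_season: {player: {season: fWAR}}.
    Assumes the window starts in the Handbook year (an assumption, see above).
    """
    totals = defaultdict(float)
    for year, org, player in prospects:
        seasons = war_by_season.get(player, {})
        # Seasons with no major league appearance contribute nothing.
        window_war = sum(seasons.get(s, 0.0) for s in range(year, year + WINDOW))
        totals[(year, org)] += window_war
    return dict(totals)


# Hypothetical data for two ranked prospects in one system.
ranked = [(2014, "MIN", "Prospect X"), (2014, "MIN", "Prospect Y")]
war = {
    "Prospect X": {2015: 1.5, 2016: 2.0, 2020: 4.0},  # 2020 falls outside the window
    "Prospect Y": {},                                  # never reached the majors
}
values = farm_system_value(ranked, war)
```

Because the window is fixed, production after the sixth season is ignored, which is the behavior the paragraph above describes for players who are further from the majors when ranked.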

Let’s look at the 2014 Minnesota Twins as an example. Below is a list of the thirty players that were ranked and the amount of WAR that each player has produced by season. Read the rest of this entry »
