Projecting Risk in Major League Baseball: A Bayesian Approach

May 6, 2019

The following is an introduction to a new Bayesian projection system, which can be found here.

Introduction / Motivation

This project was partly inspired by a recent episode of the Driveline Baseball podcast which aired this offseason in which Kyle Boddy, founder of Driveline, and Mike Rathwell, CEO of Driveline, had a conversation about the topic that overwhelmed baseball for many months – whether teams should sign Manny Machado or Bryce Harper. Their initial reaction was to express disappointment that the debate had started in the first place. Machado and Harper, they argued, had almost nothing in common besides the fact they belonged to the same free agent class. They play different positions (implying different replacement levels and, thus, entirely different markets), and more importantly, have entirely different amounts of uncertainty associated with their projections. Machado has been as reliable as they come, playing premium positions (SS and 3B) and improving his defensive abilities every year. Harper, on the other hand, had just come off of what some have called one of the worst defensive seasons by an outfielder in recent history. However, he also had a 10-WAR season in 2015, a ceiling which Machado hasn’t touched. This led into a broader discussion about how to compare contracts from players with different levels of risk. Specifically, they explain how the sabermetric community’s approach to answering the question of valuing risk in Major League Baseball contracts has fallen short in three areas.

First, while many writers make note of the riskiness of certain assets, they fail to define that risk in precise terms. Most public projection systems output point estimates, and public researchers suggest that the output is at the upper limit of predictive accuracy and hence should be treated as a near certainty. It is worth noting that baseball is not the only field which has grown uncomfortable with uncertainty. Whether the decision is to buy a certain stock, hire a particular company to ship goods across the country, or choose the next president, many analysts make the mistake of assuming that a binary, discrete outcome is the result of a binary, discrete process. Instead, we posit that once we start to see the world as the outcome of several continuous, probabilistic processes, we can manipulate those processes in ways that give us an extreme competitive advantage (in baseball at the very least). Boddy explains:

“While a tweeter very fairly pointed out that sites like Baseball Prospectus and FanGraphs do mention upside and downside, rarely is it quantitatively actually approached in these articles. Rarely if ever, I should say. It’s very frustrating because just making a note that Tesla’s stock is more volatile than Microsoft’s is not enough. That wouldn’t be enough for a financial planner to be like ‘Oh, ok that’s a very deep analysis.’ It’s also not all downside, which is how a lot of these tweets [go].”

In other words, before we can describe the optimal mix of risky and safe players on a major league roster, we need accurate and reliable methods for describing that risk. There are very significant drawbacks to assuming too much downside, so carefully tracking exactly how uncertain you are of your team’s future performance as a whole is imperative. Also, as is the case with all science, precisely measuring levels of uncertainty and tracking resulting performance over time is the most reliable way to gain a deeper understanding of what exactly is uncertain about player projections and perhaps to eliminate some of that ambiguity in the future.

Judge and Altuve: A Tale of Two Strike Zone Oddities

by John Sturdivant

May 2, 2019

Aaron Judge and Jose Altuve seem like they shouldn’t coexist in Major League Baseball, but their mutual success reveals an amazing fact about the different paths a hitter can take to dominate pitchers.

Here’s a line of reasoning using the transitive property about short hitters and small strike zones:

If a player is shorter, then his shoulders are closer to his knees.
If his shoulders are closer to his knees, his strike zone should be smaller.
If his strike zone is smaller, then it should be harder for the pitcher to throw strikes.
If it is harder to throw strikes, he should get on base more.
Shorter players should have higher on-base percentages.

But this isn’t how it actually works. Why not?

Aaron Judge is tall and Jose Altuve is short. That’s analysis. They both are very productive at the plate. That’s deeper analysis. However, Judge gives pitchers a 25% larger strike zone to target due to his 6-foot-7 height, so he must be doing something different and better than Altuve to offset the larger area he gifts to pitchers with every pitch.

The Angels Are Defying the Strikeout Trends

by Kevin McAteer

April 25, 2019

While perusing the new +Stats section recently released by FanGraphs, I couldn’t help but notice at the time that three Los Angeles Angels players held the top three spots for the lowest K%+ among qualified hitters in 2019 thus far, with two more Angels joining them inside the top 30. The first two players were David Fletcher and Tommy La Stella; these two are roughly average hitters at best, but each has run a far-below-average K% thus far in his professional career, so seeing them at the top here isn’t too shocking in a small sample. The third was of course Mike Trout, who has decided he doesn’t feel like striking out anymore while still maintaining his incredible hitting prowess. Out of all position players who have been qualified hitters in both 2019 and 2018, only Matt Chapman has lowered his K%+ by more in absolute terms (Chapman’s -67 to Trout’s -62), and nobody has lowered his K%+ in percentage terms more than Trout has, as detailed in the chart below:

Overall, no other team in baseball has more than two players in the top 30 of this K%+ measure, and by simple deduction, a handful of teams do not have a single player within that cutoff. Devan Fink has also written about how the Angels are not striking out in 2019, but I was curious to see how their players stack up against other recent seasons, so I set the parameters to include all qualified seasons from this decade, and the results were surprising.

Although this is just a small sample, as mentioned earlier, it’s still noteworthy that those three players account for three of the top four qualified seasons since the beginning of this decade. I’ve also highlighted Andrelton Simmons’ 2018 season, another top-10 finish for the Angels. Although Simmons doesn’t appear in this chart for his 2019 season, he wasn’t far off, with a K%+ of 46 that ranks 12th for the season. With all of these Angels players posting such low K%+ figures, I became even more curious about how they stack up as a team historically, and whether this is an intentional approach they’re implementing.
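
As a rough illustration of the absolute-versus-percentage distinction drawn above, the sketch below assumes that K%+ is a hitter's strikeout rate indexed to the league rate, with 100 as league average. The function names and every number in it are hypothetical and are not taken from the FanGraphs leaderboard.

```python
def k_pct_plus(player_k_rate, league_k_rate):
    """Index a strikeout rate to the league rate (100 = league average)."""
    return 100 * player_k_rate / league_k_rate


def k_plus_changes(last_year, this_year):
    """Return (absolute change, percentage change) between two K%+ values."""
    absolute = this_year - last_year
    return absolute, 100 * absolute / last_year


# Hypothetical hitters: A falls from 120 to 60, B falls from 90 to 40.
print(k_plus_changes(120, 60))  # -> (-60, -50.0)
print(k_plus_changes(90, 40))   # smaller absolute drop, larger percentage drop
```

Under these made-up numbers, hitter A leads in absolute terms while hitter B leads in percentage terms, which is the same kind of split described for Chapman and Trout.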

Getting Ejected Works

by Paul Schale

April 23, 2019

Getting mad at an umpire, and then getting tossed from the game, may seem like an ineffective display of emotion since calls are never reversed after a little more yelling. But what about future calls? In order to answer this question, we need good data on a large number of adjudicated events. Close out/safe calls are fairly rare, and good data quantifying how close each play was would be difficult to collect. But the home plate umpire calls balls and strikes for every batter, and pitches at the edges of the zone provide plenty of opportunities to grow or shrink the zone slightly.

It’s difficult to measure the zone in a particular game since there aren’t enough pitches at each spot on the boundary of the zone, but by combining data from many games, we can get a clear idea of what the average zone looks like. As for quantifying the zone, it’s easy to get carried away with details (location of each side, correcting for player height, etc.), but with enough data, all of those variables should average out and we can focus on the simplest measure: zone size.

During the past four years, there have been 308 games featuring an ejection over the strike zone, containing about 47,000 pitches. Splitting by team (team with ejected player/coach/manager and opposing team) and before/after the ejection, we have groups with between 9,500 and 14,000 pitches, plenty for a good estimate of the strike zone.

The results below show two clear trends. First, one team is clearly justified in being upset, as its hitters face a larger zone. Second, umpires fix this after making an ejection, even over-correcting slightly.

Umpires are Human

We all see the humanity of umpires in their fallibility, but it shows in other ways too: the zone shrinks on 0-2 counts and expands on 3-0 ones, showing that they don’t like ending an at-bat with their own judgement call. This doesn’t mesh well with the fiery persona of the umpire and their emotive strike-three calls, but we have to remember that they are playing a part, and their main goal is to keep the game firmly in their control. We see more evidence of this here: if umpires ejected arguing players out of a sense of holy wrath, we would expect no change in the strike zone at all.

Instead, we see a clear reaction in the direction that the arguing player desires. While the data cannot point to the exact mechanism, I see two distinct explanations: signaling and aversion to conflict.

In the signaling hypothesis, players are frequently sending messages to the umpire, but the umpire weighs each message according to the cost of sending it. A few words muttered under a player’s breath cost him nothing, and so they are usually ignored. An ejection is costly, so the umpire takes that signal seriously.

The second hypothesis is a simple human aversion to being yelled at in front of a crowd of thousands. It’s not a fun experience for anyone, so they take action to avoid it happening again.

About the Models

To measure the zone, I took two approaches: k-nearest neighbors (which knows nothing about the expected shape of the strike zone) and a model based on logistic regression (which looks for a rounded rectangle). Error estimates were calculated using bootstrapped samples. Both gave similar results, and the code and data behind this post are available on Kaggle.
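
For readers who want a feel for the k-nearest-neighbor approach with bootstrapped errors, here is a minimal sketch. It is not the code posted on Kaggle: the pitch data is synthetic, the coordinates are assumed to be in feet, and the function names, grid bounds, and choice of k are all assumptions made for illustration.

```python
import heapq
import random


def knn_strike_prob(pitches, x, z, k=15):
    """Share of called strikes among the k pitches nearest to (x, z)."""
    nearest = heapq.nsmallest(
        k, pitches, key=lambda p: (p[0] - x) ** 2 + (p[1] - z) ** 2
    )
    return sum(p[2] for p in nearest) / len(nearest)


def zone_area(pitches, k=15, step=0.25):
    """Area (sq ft) of grid cells where a called strike is more likely than not."""
    cells = 0
    for i in range(int(4 / step)):        # x from -2 ft to +2 ft
        for j in range(int(5 / step)):    # z from 0 ft to 5 ft
            x = -2 + (i + 0.5) * step
            z = (j + 0.5) * step
            if knn_strike_prob(pitches, x, z, k) >= 0.5:
                cells += 1
    return cells * step * step


def bootstrap_se(pitches, n_boot=20, seed=0):
    """Standard error of the zone area from resampling pitches with replacement."""
    rng = random.Random(seed)
    areas = [
        zone_area(rng.choices(pitches, k=len(pitches))) for _ in range(n_boot)
    ]
    mean = sum(areas) / n_boot
    return (sum((a - mean) ** 2 for a in areas) / (n_boot - 1)) ** 0.5


# Synthetic taken pitches: (x, z, called_strike) with a made-up 1.66 ft x 2 ft zone.
rng = random.Random(1)
pitches = []
for _ in range(600):
    x, z = rng.uniform(-2, 2), rng.uniform(0, 5)
    pitches.append((x, z, int(abs(x) <= 0.83 and 1.5 <= z <= 3.5)))

area = zone_area(pitches)
se = bootstrap_se(pitches)
print(f"zone area: {area:.2f} sq ft (bootstrap SE {se:.2f})")
```

Running the same two functions on each group of pitches (ejected team or opponent, before or after the ejection) would give one zone size and one error estimate per group, which is the comparison the post describes.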

Evaluating Trevor Bauer’s Pitch Usage

by GPB

April 22, 2019

Trevor Bauer is a walking headline. Whether he is turning himself into some kind of pitching robot in a lab or calling out his peers for using a foreign substance to enhance their spin rate, Bauer tends to attract plenty of attention away from the field. However, Bauer’s most noteworthy accomplishments lately have occurred on the field. Last season, Bauer had more fWAR than Blake Snell, winner of the American League Cy Young Award, despite pitching fewer innings and landing on the injured list for over a month. Bauer, unsatisfied with last year’s performance, developed his previously sparingly used changeup in the offseason to complement his already ample repertoire. Taking a look at Bauer’s pitch usage this year shows a clear difference in the way he attacks righties as opposed to lefties.

Here is his pitch mix vs. righties this season:

And his usage vs. lefties:

Of course, the small sample size caveat applies at this point of the season, but Bauer has been featuring a changeup against lefties at a much higher rate than last season.

Here’s Bauer’s 2018 pitch usage against lefties:

Bauer now throws his changeup twice as often as last season against lefties, and so far the results have been good. The pitch has produced a 75% ground-ball rate when put in play, and opposing batters have only recorded a single hit off of it.

While Bauer has certainly adjusted his method of attacking lefties, an early breakdown of how he has attacked righties is even more intriguing. Here’s his 2018 pitch usage vs. right-handed batters:

Comparing Bauer’s 2018 and 2019 pitch breakdown against righties reveals a few monumental adjustments. Bauer has evidently abandoned his signature knuckle curve and replaced it with a sharp increase in the usage of a cutter. In my opinion, these adjustments were made in the name of tunneling. Sliders and cutters both have primarily sideways movement, which makes it more difficult for the batter to differentiate between them. Curveballs and changeups both tend to break downwards, causing the same confusion for batters. By pairing these pitches against righties and lefties respectively, Bauer decreases the chance that a batter can read the pitch correctly out of his hand.

Up until this point, Bauer has been sharp, using his new changeup and dedication to tunneling to strike out 32.6% of the batters he has faced through five starts, with at least seven strikeouts in each outing, and firing seven no-hit innings on April 4th against the Blue Jays. As Bauer continues to tweak his approach, perhaps he could benefit even further by lowering his fastball usage, mimicking the strategy of many pitchers before him, in order to combat hitters who sit on the pitch ready to unleash uppercut swings. Lower fastball usage, combined with more tunneling, would likely make Bauer even more unpredictable to hitters.

Time will tell if Bauer’s new strategy will be successful all year, but based on his dedication to both analytics and his craft, he seems to be on pace for another Cy Young caliber season.

Gabriel Billig is currently a student at Baruch College studying data analytics.

Are Ted Williams’ Hitting Philosophies Still Relevant Based on the Data?

by D.K. Willardson

April 19, 2019

In hindsight, it’s unfortunate that Ted Williams’ philosophies on hitting took so long to become universally accepted. His thoughts on batting were clearly ahead of their time, and it has only been in the past few years that the more prevalent “swing down” views have largely exited the baseball community.

In his book, The Science of Hitting, Williams suggested an upward swing path that aligns the bat path and pitch path for a better chance of contact – about 5 degrees for a fastball and 10-to-15 degrees for a curveball. This research note is not about the total amount of loft in the swing today — everyone knows that swing loft is greater now than in Williams’ day. However, there are some very interesting findings in the data in terms of whether players are utilizing consistent amounts of swing loft for different pitch locations, which is implied in Williams’ book.

One observation that seems to hold in many sports is that the best performers are typically out in front of the popular views of the day in terms of changing mechanics for the better. However, as we will see in the data, this does not necessarily mean that the performers consciously understand and direct these superior mechanics.

It turns out that there is a very important element that wasn’t considered by Williams in his book which the data shows the best hitters are “considering” — the amount of Vertical Bat Angle (VBA) in the swing. VBA can be defined as the amount of vertical swing tilt as viewed from the center field camera. The swings in Williams’ day as well as the illustrations in his book clearly have much less VBA than today’s hitters. While there is no broad data on VBA, a study of minor league hitters by David Fortenbaugh in 2011 showed the following averages of VBA at contact:

There is evidence which suggests that VBA goes well beyond player “style” and is more of a core swing mechanic that is associated with higher quality contact as well as superior levels of performance. Here is a chart showing VBA by playing level.

The Most- and Least-Potent Pitch Combos in 2018

by John Sturdivant

April 17, 2019

I believe that pitches aren’t thrown in a vacuum, and the effectiveness of one pitch is certainly affected by the pitches that preceded it. Thus, I wanted to identify the most- and least-potent 1-2 pitch combinations in the 2018 Major League Baseball season. To accomplish this, I built a Pitch Combo Effectiveness Tool based on all 2018 pitches thrown in the major leagues.

The approach I took was to evaluate every pitch as the second pitch in a 1-2 combo (which forces the exclusion of first pitches in an at-bat). I defined these pitch combos using the pitcher, the pitch types of both the first and second pitches (e.g. “four-seam fastball followed by a curveball”), and the pitch location change from the first to the second pitch (e.g. “the second pitch was further down and more inside than the first pitch”). I then gauged the effectiveness or value of these pitch combinations using the sum of the wOBA added for both the first and second pitches. Lastly, to ensure I was only looking at common pitch combos, I filtered the results to pitch combos observed at least 10 times in 2018.
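
The grouping described above can be sketched roughly as follows. This is not the tool itself: the field names (`pitcher`, `type`, `x`, `z`, `woba_added`) are hypothetical, the location change is simplified to an up/down and left/right label rather than inside/outside, and averaging the two-pitch sums within each combo is an assumption about how the values are aggregated.

```python
from collections import defaultdict


def location_change(first, second):
    """Label the move from the first pitch to the second, e.g. 'down-left'."""
    vertical = "up" if second["z"] > first["z"] else "down"
    horizontal = "right" if second["x"] > first["x"] else "left"
    return f"{vertical}-{horizontal}"


def combo_values(at_bats, min_count=10):
    """Average two-pitch wOBA added per combo, keeping combos seen min_count+ times."""
    values = defaultdict(list)
    for pitches in at_bats:
        # Pair each pitch with the one before it; a first pitch is never "second".
        for first, second in zip(pitches, pitches[1:]):
            key = (
                second["pitcher"],
                first["type"],
                second["type"],
                location_change(first, second),
            )
            values[key].append(first["woba_added"] + second["woba_added"])
    return {
        key: sum(v) / len(v) for key, v in values.items() if len(v) >= min_count
    }


# Hypothetical data: ten identical fastball-curveball sequences and one rare combo.
fastball = {"pitcher": "P1", "type": "FF", "x": 0.2, "z": 3.0, "woba_added": -0.02}
curve = {"pitcher": "P1", "type": "CU", "x": -0.1, "z": 1.8, "woba_added": -0.03}
at_bats = [[fastball, curve] for _ in range(10)] + [[curve, fastball]]

for combo, value in combo_values(at_bats).items():
    print(combo, round(value, 3))
```

Only the fastball-curveball combo survives the 10-observation filter here; the single curveball-fastball sequence is dropped as too rare.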

The chart showing every pitch combo is below, and you can click it to go to the full tool and results:

Most and Least Effective Pitch Combos by wOBA Added

Why There May Just Be Hope for the Miami Marlins in 2019

by Anthony Lucchese

April 16, 2019

As the 2019 season begins, Las Vegas determines the annual over/under win totals for all 30 major league teams and gives us a chance to examine intriguing over/under win lines for the upcoming season. Not surprisingly, the Miami Marlins found a spot right at the bottom of the list at over/under 63.5 wins. Will the Miami Marlins, under the ownership of Derek Jeter and the tutelage of Michael Hill, elude the worst record in baseball? Call me crazy, but there are a number of reasons why Vegas’ determination of 63.5 wins is undervaluing the Marlins.

J.T. Realmuto, a 2018 All-Star and arguably the last star on the Marlins roster, was acquired by the Philadelphia Phillies for Jorge Alfaro, Sixto Sanchez, and Will Stewart this past offseason. While Sanchez is a potential budding ace pitcher and Stewart has a real future as a middle-of-the-rotation starter, Alfaro is the most interesting addition for the 2019 season. He rates as a guy with incredible raw power when he puts the bat on the ball, with the only issue thus far in his career being that his contact percentage is quite low:

The K% is good for 245th out of 247 players (min. 350 PAs) and the BB% ranks in the 8th percentile among those same 247. His O-Swing% is second-to-last and 16% above the 2018 league average of 30.9%, and at 61% he is clearly not making enough contact. However, when Alfaro does manage to put bat on ball, the results are quite impressive:

How about a video of the swing in action? This ball, at 115 mph off the bat of Alfaro, was absolutely crushed, and I think Junichi Tazawa’s reaction says it all…

In honor of Super Ball Sunday, here's the Phillies' hardest-hit dinger of 2018

Date: April 7
Batter: Jorge Alfaro
Distance: 433 feet
Speed: 115 MPH pic.twitter.com/ClIPj3gecA

— The Good Phight (@TheGoodPhight) February 3, 2019

If Alfaro shows more patience and a better approach at the plate, the Marlins could have something special. His second-half statistics from July 2018 to September 2018 suggest that this improved approach could be on its way:

Alfaro managed to cut his K% and increase his BB%, while performing as an above-average hitter according to wRC+. He made strides at the plate by lowering his whiff percentage outside of the zone from 28% in the first half to 25% in the second half. His batted ball quality also improved against breaking pitches, which he had struggled with mightily in the first half: his xwOBA against them increased from 0.246 to 0.338, and his whiff percentage on breaking balls decreased from 34.68% to 26.52%.

An Analysis of the Relationship Between Pitcher Size and UCL Tears

by Zachary Rewolinski

April 9, 2019

A UCL tear is a death sentence for a player’s season, and it can have large repercussions for the team and league as a whole, making it crucial for front offices to understand what puts players at a heightened risk for this injury. In this research, the height, weight, age, and fastball velocity of MLB pitchers in the years 2000-17 are analyzed to determine the impact of pitcher size on UCL tear probability. The results of this study will aid executives and front offices in evaluating pitchers and their risk of needing Tommy John surgery. Moreover, these findings may aid pitchers in lowering chances for injury by guiding their offseason training goals.

1. Introduction

As Tommy John surgery and UCL tears are thrust further into the spotlight, more is revealed about possible factors and causes. In this paper, I will inspect the correlation between pitcher size (BMI) and UCL tear probability in order to determine whether the former has a statistically significant impact on the latter. The data used in this study was taken from FanGraphs, the Lahman Database, and Jon Roegele’s Tommy John Database, all of which are publicly available sources. Due to the many variables which are closely correlated with BMI and have an impact on UCL health, such as age and velocity, pitcher size was analyzed independent of these variables, which are controlled through partial correlations.

2. Analysis

2.1 BMI and Tommy John: In Aggregate

When the data set is viewed in its entirety, the results are overwhelming. The mean BMI of pitchers who have undergone Tommy John surgery is 27.09, whereas the mean BMI of pitchers who have not is 26.34. The difference between these means is statistically significant: the p-value in a two-sample t-test (the probability of seeing a difference at least this large by chance if the two groups truly had the same mean) is .000001153, far below the .05 benchmark commonly used in statistics. To test this relationship in a different way, the BMIs of the 2,383 pitchers in the data set (298 who had torn their UCL, 2,085 who had not) were split into deciles. The correlation between decile number and probability of Tommy John was .91, with a p-value of .0002556, revealing a statistically significant linear correlation between UCL tears and BMI, with higher-BMI pitchers at higher risk for Tommy John surgery. The graph of these deciles and the probability of Tommy John is shown below.
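
The two checks described above, a two-sample comparison of means and a correlation across BMI deciles, can be sketched as follows. The pitchers here are synthetic, not the study's data, and the sketch swaps in Welch's form of the t statistic with a normal approximation for the p-value (reasonable only because the groups are large); the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Two-sample t statistic with a normal-approximation two-sided p-value."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return t, 2 * (1 - NormalDist().cdf(abs(t)))


def decile_rates(bmis, torn):
    """Tommy John rate within each BMI decile, lowest decile first."""
    order = sorted(range(len(bmis)), key=lambda i: bmis[i])
    n = len(order)
    rates = []
    for d in range(10):
        members = order[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(torn[i] for i in members) / len(members))
    return rates


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )


# Synthetic pitchers (not the study's data): tear risk rises gently with BMI.
rng = random.Random(42)
bmis = [rng.gauss(26.4, 2.0) for _ in range(2383)]
torn = [int(rng.random() < min(max(0.05 + 0.02 * (b - 22), 0.0), 1.0)) for b in bmis]

t, p = welch_t_test(
    [b for b, hurt in zip(bmis, torn) if hurt],
    [b for b, hurt in zip(bmis, torn) if not hurt],
)
r = pearson_r(range(1, 11), decile_rates(bmis, torn))
print(f"t = {t:.2f}, p = {p:.2g}, decile correlation r = {r:.2f}")
```

Note that a correlation computed on ten decile averages will generally look much stronger than one computed on individual pitchers, since averaging removes most of the pitcher-to-pitcher noise.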

Is Yoan Moncada’s Breakout Coming?

by MRDXol

April 4, 2019

Yoan Moncada has frustrated talent evaluators over the past two years. He’s about as physically talented as a baseball player can be; while he was still a prospect, the team here at FanGraphs thought he merited future grades of 60 hit, 60 power, 70 speed, 50 field, and 70 throw, with an OFP of 70, good for No. 1 overall prospect status. Prospects don’t get evaluated much better than that; in fact, a 70 OVR on a position player is as good as it gets. He was the kind of prospect that could headline a trade for a top-five starting pitcher, a bona fide ace in his prime on a team-friendly contract with three years left.

Flash forward two years, about a year and a half into Moncada’s major league career, and he hasn’t performed quite as billed. Instead, in 901 career plate appearances before Opening Day 2019, he posted a 97 career wRC+ and 3.1 total fWAR, almost exactly league-average or slightly below. His defense at second base has not impressed, and so he’s being moved to the hot corner in the wake of 1) the White Sox whiffing on Manny Machado, and 2) the White Sox drafting “future Gold Glove second sacker” Nick Madrigal with the 4th overall pick in 2018. If nothing changes, he’ll be in danger of becoming a utilityman.

Moncada’s offensive struggles are a little unusual. He has two traits required to be an offensive monster, power and patience, in abundance. Last year, his average exit velo of 90.6 mph was in the 86th percentile, while his 4.12 pitches seen per PA were in the 81st percentile. However, those positive traits were offset by the modern game’s bugaboo: strikeouts. Moncada struck out in an ugly 33.4% of his PAs last year, behind only Chris Davis and Joey Gallo, and his career K rate sat at 33.6% this offseason. This is very concerning, as contact issues are a flaw that is difficult to resolve.

The profile above seems to describe a three-true-outcomes hitter like the aforementioned Gallo. Dig a little deeper, though, and you’ll find that the way Moncada struck out that often is not normal, and in a sense he doesn’t actually have contact issues, at least not 33.4% bad. He didn’t chase many pitches out of the zone last year (only 23.3%), sitting in the 87th percentile of qualified hitters. Nor does his whiff rate of 12.2% (league average in 2018 was 10.7%) jibe with that huge strikeout rate. Taken together, these numbers suggest that while Moncada’s contact ability may be somewhat below average, he limits how much he swings and misses by rarely chasing pitches out of the zone. So if Moncada doesn’t chase much, and doesn’t swing and miss that much, how is he striking out so much?
