Archive for Research

The Rise of the Hit by Pitch

With the current trend of thinking in the front offices in MLB, we are seeing many aspects of the game at all-time highs and lows. Strikeouts, home runs, and a lack of stolen bases get a lot of content created about them, but there is also something else at an all-time high: hit by pitches. We saw 1,984 hit by pitches in 2019, the most in MLB history, surpassing the previous high of 1,922 in 2018. There has not been this rate of hit by pitches per game since 1900, and baseball is very different from how it was then.



You may say that there are more pitches thrown in games than ever before, so the rate per game may be rising because of that. But when we account for that and look at the rate per pitch (which we can do from 2008 onwards), you can still see that sharp increase in the last two seasons.



What has gone on here? There should be something responsible for this increase in hit by pitches. Is it the pitchers? Do we have guys who can throw hard but have less command, so they are hitting more batters? Are they throwing inside more often? Is it the hitters? Do we have guys who are standing closer to the plate, or players who are just more willing to take the hit to get on base?

Let’s start with the pitchers. Thanks to the PITCHf/x and TrackMan data we have the location of every pitch since 2008. I will be using the Statcast zones to bucket the data. Read the rest of this entry »


Modeling Strikeout Rate with Plate Discipline Part 1: Hitters

Strikeout and walk rates are perhaps the most popular and widely used peripheral statistics, particularly for pitchers. However, with pitch level data, these statistics now have “peripherals” of their own. I was curious if I could create an accurate-yet-interpretable model using FanGraphs’ plate discipline metrics that could offer insight on what drives the differences in strikeout and walk rates between players.

While many have noted individual correlations between a single statistic and strikeout rate, I have not seen many unifying models that incorporate several plate discipline metrics. For the first part in this study, I will focus on hitter strikeout rate, but I intend on also looking at walk rate and, later on, pitchers’ strikeout and walk rates.

If you are not a fan of mathematical details, feel free to skim or skip these next few sections to get to my overall conclusions.

Methodology

[Image: Plate Discipline Flash Card, 12-29-15]

Note: I used BIS discipline statistics rather than PITCHf/x. I do not think this made a significant difference, but I think it is important to keep in mind.

FanGraphs gives us nine plate discipline statistics to work with. However, three of them can be removed, as they can be derived from the other statistics. In a regression setting, this phenomenon is called perfect multicollinearity: an explanatory variable can be expressed exactly in terms of other explanatory variables. With a high degree of multicollinearity, it can be extremely difficult to tell which particular variable is responsible for a change in the response variable, which is problematic for inference. Using some basic dimensional analysis, I found formulas for all three of these: Read the rest of this entry »
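As an illustration of how such derived statistics arise, here is a minimal Python sketch. It assumes the three redundant statistics are Swing%, Contact%, and SwStr%, and that every rate is built from the same pitch counts under the usual definitions; the excerpt does not show the author's own formulas, so these identities and the function name `derived_discipline` are assumptions.

```python
def derived_discipline(zone, o_swing, z_swing, o_contact, z_contact):
    """Derive Swing%, Contact%, and SwStr% from the other rates.

    All inputs and outputs are fractions between 0 and 1.
    Assumed definitions (not taken from the post):
      zone      = pitches in the zone / all pitches
      o_swing   = swings at pitches outside the zone / pitches outside the zone
      z_swing   = swings at pitches in the zone / pitches in the zone
      o_contact = contact on outside swings / outside swings
      z_contact = contact on zone swings / zone swings
    """
    out_swings = (1 - zone) * o_swing   # outside swings per pitch
    zone_swings = zone * z_swing        # zone swings per pitch
    swing = out_swings + zone_swings    # Swing%
    if swing == 0:
        return 0.0, 0.0, 0.0            # no swings: Contact% is undefined, report zeros
    contact = (out_swings * o_contact + zone_swings * z_contact) / swing  # Contact%
    swstr = swing * (1 - contact)       # SwStr% = swings that miss, per pitch
    return swing, contact, swstr


# Arithmetic check with made-up counts (not real player data):
# 1000 pitches, 450 in the zone; 300 zone swings (270 contact);
# 165 outside swings (99 contact).
swing, contact, swstr = derived_discipline(
    zone=450 / 1000,
    o_swing=165 / 550,
    z_swing=300 / 450,
    o_contact=99 / 165,
    z_contact=270 / 300,
)
print(swing, contact, swstr)
```

Published rates are rounded, so values rebuilt this way from a leaderboard would likely match the listed Swing%, Contact%, and SwStr% only approximately.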


Finding Ray Fagan: A Minor League Mystery

Sometimes numbers tell a story. Sometimes that story is a mystery.

I came across the Baseball-Reference page for Raymond Fagan and was stunned by what I saw. It says Fagan went 13-0 with a 1.16 ERA for the Class D Oklahoma City Senators in 1915. Now the stunning part – it says it was his only professional season. Despite those dominant results, it appears Fagan never pitched again.

What happened to Raymond Fagan? Did he suffer a career-ending injury? Did he get into legal trouble and change his name? A Google search yielded no answers. This mystery required a deeper dive. Read the rest of this entry »


All Stolen Bases Were Not Created Equal

Fielding percentage is often criticized for the selection bias introduced by a player’s range (good defenders attempt more difficult plays, leading to more errors). A similar issue of selection bias is present in stolen bases. On any given pitch, whether to attempt a steal is at the sole discretion of the runner. Naturally, the runner will only attempt a stolen base when he believes he has an advantage over the pitcher and catcher.

Ivan Rodriguez caught 46% of base-stealers throughout his career, topping out at a 60% caught stealing rate in his prime and leading the league in CS% in nine seasons. Knowing that stealing against Pudge is little more than a pipe dream for most, only the best baserunners would dare to attempt a steal. If this assumption holds, Rodriguez’s CS% would in fact be far more impressive than initially reported due to the level of competition he faced relative to a typical catcher.

To adjust for selection bias in stolen-base attempts, I developed an Elo model. For those unfamiliar, Elo ratings are a method of calculating the relative skill levels of players in zero-sum games. You might recognize Elo from chess rankings or FiveThirtyEight’s sports prediction models. These ratings can be used to directly estimate the probability of winning a match between two individuals or teams. The ratings change after each match, rewarding a win by an underdog more than a win by the favorite.
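For readers who want to see the mechanics, here is a minimal Python sketch of the standard two-player Elo update as used in chess. The K-factor of 20 and the 400-point scale are conventional defaults chosen for illustration, and the function names are hypothetical; the excerpt does not say which parameters the stolen-base model uses or how it combines runner, pitcher, and catcher ratings.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, a_won, k=20.0, scale=400.0):
    """Return the new (rating_a, rating_b) after one contest.

    The winner gains k * (1 - expected win probability), so an upset
    moves the ratings more than an expected result. The loser gives up
    exactly what the winner gains (zero-sum).
    """
    exp_a = expected_score(rating_a, rating_b, scale)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta


# Illustrative ratings only: a 1400-rated underdog beats a 1600-rated favorite.
underdog_after, favorite_after = update(1400, 1600, a_won=True)
print(underdog_after - 1400)   # points gained by the underdog for the upset

# The same favorite winning instead earns a much smaller gain.
favorite_win_after, _ = update(1600, 1400, a_won=True)
print(favorite_win_after - 1600)
```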

On a stolen-base attempt, the runner, pitcher, and catcher all play a major role in the outcome of the play. An argument could also be made for the importance of the fielder receiving the throw, especially when considering the select few who can make tags like this: Read the rest of this entry »


The Effect of Umpires on Baseball: Umpire Runs Created (uRC)

It’s a cool and breezy April afternoon down by Baltimore’s Inner Harbor, and the mid-rebuild Orioles are taking on the division-winning and record-breaking Minnesota Twins. Trying to salvage the final contest of a three-game series, the O’s — to no one’s surprise — find themselves trailing in the bottom of the ninth. But not all hope is lost. The Twins’ lead is small — two runs — and the Orioles have some of their best players due up. Out of the gate, Twins pitcher Taylor Rogers hits the first Orioles batter, Joey Rickard, in the foot. Then, after a Chris Davis lineout, Jesús Sucre resurrects the inning with a single to left that advances Rickard to third. The comeback is on.

Hanser Alberto then plunges the Orioles’ hopes back down to earth with a swinging strikeout that gives his team just one more out with which to work. But then comes Jonathan Villar, who rips a double to deep left, scoring Rickard and advancing Sucre to third. The Twins’ lead is cut in half. After an intentional walk to Trey Mancini that loads the bases, the game now rests in Pedro Severino’s hands. With two outs and the bases loaded, still down by one, Severino manages to work the count to 3-0. His team is one pitch away. The crowd is on its feet. Rogers winds and delivers his pitch. It’s outside! “Ball 4!” the commentator exclaims. The fans cheer, Severino begins to walk towards first, and the tying run starts his trot towards home. But suddenly, the umpire punches his arm through the air. He called it a strike. Severino walks back towards home plate, distraught. He pops up the very next pitch, and just like that, the game is over.

***

Using data from Baseball Savant’s pitch-by-pitch library, we can begin to understand the role that these incorrect calls play in baseball. By matching up the database’s pitch locations to the calls associated with those pitches, we can see which calls were supposedly correct, and more importantly, which were not. The results are pretty astounding. Last year, by this data, MLB umpires made a total of 33,277 incorrect calls. That’s good for 13.8 per game, or just over 1.5 per inning. While not every bad call is a comeback-killer, these mistakes have the ability to greatly alter an at-bat, a game, and maybe even a season. Read the rest of this entry »
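The matching of pitch locations to calls can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's actual rule: it treats the zone as a rectangle the width of home plate (17 inches) between each batter's `sz_bot` and `sz_top`, with field names modeled on the Statcast CSV columns `plate_x`, `plate_z`, `sz_top`, and `sz_bot` (all in feet) and the `description` values `called_strike` and `ball`. The `margin_ft` parameter is a hypothetical allowance, for example for the radius of the ball.

```python
PLATE_HALF_WIDTH_FT = 17 / 2 / 12   # home plate is 17 inches wide


def in_zone(plate_x, plate_z, sz_top, sz_bot, margin_ft=0.0):
    """True if the pitch crossed the plate inside the rectangular zone."""
    inside_horizontally = abs(plate_x) <= PLATE_HALF_WIDTH_FT + margin_ft
    inside_vertically = (sz_bot - margin_ft) <= plate_z <= (sz_top + margin_ft)
    return inside_horizontally and inside_vertically


def is_incorrect_call(call, plate_x, plate_z, sz_top, sz_bot, margin_ft=0.0):
    """True if the umpire's call disagrees with the measured location.

    Only taken pitches are judged, so `call` must be 'called_strike' or 'ball'.
    """
    if call not in ("called_strike", "ball"):
        raise ValueError("only called strikes and balls can be graded")
    should_be_strike = in_zone(plate_x, plate_z, sz_top, sz_bot, margin_ft)
    return (call == "called_strike") != should_be_strike


# Illustrative pitch: 1.2 ft off the center of the plate, called a strike.
print(is_incorrect_call("called_strike", 1.2, 2.5, sz_top=3.5, sz_bot=1.5))
```

The count of incorrect calls is sensitive to the zone definition: any margin for the ball's width or for measurement error will reclassify borderline pitches, which is one reason to read such totals as "supposedly" incorrect.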


Reframing Catcher Pop Time Grades Using Statcast Data

With the advent of Statcast, statistics like exit velocity, spin rate, and launch angle have become easily accessible to baseball fans. Catcher pop time data has also become available. However, unlike some of the other Statcast metrics, catcher pop time data has existed for much longer, with scouts measuring pop times in the minor leagues years before Statcast entered the mix.

This sounds all well and dandy, right? Well, it would be, if the Statcast numbers were consistent with scouting pop time tool grades. Baseball Prospectus, for example, calls a pop time from 1.7-1.8 a 70-grade pop time, which sounds reasonable enough without any context. However, considering the best average Statcast pop time to second base from 2015 to 2019 was J.T. Realmuto’s 1.88 (minimum 10 throws to second), something seems amiss here. I decided to take a deeper look into Statcast’s pop time data to get a better idea of what’s going on.

Read the rest of this entry »


Turning Quarterbacks Into Pitchers

Why don’t teams ever sign former quarterbacks to try and turn them into pitchers?

This thought stems from watching Patrick Mahomes and his pre-draft NFL tape and discovering that his father was a former major league pitcher. Can a quarterback’s arm strength transfer to pitching? What can be learned from football velocity to uncover a future successful pitcher?

ESPN was ramping up their coverage in the weeks leading up to the 2017 NFL Draft, and Mahomes was gaining momentum. A SportsCenter interview with the future MVP explored his multi-sport background, which caught my attention.

I was vaguely familiar with the story about Mahomes’ father reaching MLB as a pitcher. Apparently there was a time when Mahomes considered following in his father’s baseball footsteps. The interview spilled over into the prospect’s appearance in the Gruden QB Camp. He mentioned then that he was drafted by the Detroit Tigers out of high school, but because of his strong desire to play quarterback at Texas Tech, he fell to the 37th round. Scouts told him that, had his passion for football not been as strong, the top three rounds were a likely landing spot.

As the video continued, it featured highlights of in-game play and practices where Mahomes showed a dynamic skill set. He had special throwing abilities, and his baseball background and natural talent were obvious in just a few of his tosses. There were impressive clips of him throwing a football about 50 yards from his knees, and another highlighting a final pregame warmup toss and ritual: throwing the ball about 75-80 yards in the air. Read the rest of this entry »


What Actually Makes a Curveball Effective?

The other day I began pulling together Savant data to determine whether there was an ideal zone percentage for different types of curveballs (CUs) and sliders (SLs). I haven’t found much on that front yet. However, I did realize that I don’t really know what makes curveballs effective, either from a results standpoint (extra whiffs, weaker contact, etc.) or from a trait standpoint (vertical break, horizontal break, velo). I took a look at all of these factors for the curveballs in the 2019 baseball season to see if anything stuck out.

I analyzed a sample of 214 pitchers, representing everyone from 2019 who threw at least 20 innings, threw a curveball at least 10% of the time, and qualified for Savant’s pitch movement leaderboard. From this sample I pulled info on every pitcher’s spin profile, wOBA, xwOBACON, zone percentage, SwStr %, and RHB/LHB splits. I also noted the same info for the rest of their arsenal, just to have a full view. The pitchers were then bucketed in every way imaginable, with averages and standard deviations computed to see which buckets stood out. I do want to preface all my findings by saying that the sample size is not ideal: the buckets were mostly of decent size (roughly 100-plus players), but I did get granular at times (the smallest group was 48).
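The bucketing step can be sketched in a few lines of Python. This is only an illustration of the general approach (group pitchers by some trait, then compare the mean and standard deviation of a metric per group); the function name, the field names `drop_pct` and `cu_swstr`, the bucket rule, and the three records are all made-up placeholders rather than anything taken from the actual sample.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_bucket(rows, bucket_key, metric):
    """Group rows with bucket_key(row) and summarize row[metric] per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[bucket_key(row)].append(row[metric])
    return {
        bucket: {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two values; report 0.0 for a single-member bucket
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
        for bucket, values in groups.items()
    }


# Placeholder records, NOT real pitchers: drop_pct is curveball drop as a
# % difference vs. average, cu_swstr is the curveball swinging-strike rate.
rows = [
    {"name": "pitcher_a", "drop_pct": 5, "cu_swstr": 0.12},
    {"name": "pitcher_b", "drop_pct": 10, "cu_swstr": 0.14},
    {"name": "pitcher_c", "drop_pct": -3, "cu_swstr": 0.10},
]

summary = summarize_by_bucket(
    rows,
    bucket_key=lambda r: "above_avg_drop" if r["drop_pct"] > 0 else "below_avg_drop",
    metric="cu_swstr",
)
print(summary)
```

Reporting `n` next to each mean keeps the sample-size caveat visible, since a bucket of 48 pitchers will produce noisier averages than one of 100-plus.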

I am most focused on the following metrics: CU wOBA, CU xwOBACON, CU SwStr %, CU Drop & Tail (as a % difference vs. the average pitcher at similar velocity). Here are the averages across the entire sample: Read the rest of this entry »


What Is a Run Worth?

I recently began thinking about how teams can know that they are efficiently spending their money, or where teams actually get the runs that they spend all their money on. With players signing massive contracts in the 2018-19 offseason, I began to wonder if any players were really worth that much money. The process begins with one big question: What is a run worth? I quickly realized that each team theoretically needs to manufacture the same number of runs as all the other teams do if they want a better chance to make the postseason. What is different from team to team is budget. This means that a run is worth a different monetary value to each team, and that each team would be willing to pay a different amount of money for the same number of runs. The problem is that to each player, a run costs the same amount, causing Billy Beane, played by Brad Pitt in the movie Moneyball, to claim that “It’s an unfair game”.

Figuring out what each team values their runs at would enable me to evaluate how efficient the signing of certain contracts was for each team and furthermore would allow me to figure out where the most value comes from in the payroll of a team. First, I had to figure out how to convert the basic statistics of a player into the number of runs that player actually contributed to the team. I eventually came across the Estimated Runs Produced statistic from the 1985 Bill James abstract. Below is the calculation.

ERP = (2 × (TB + BB + HBP) + H + SB − 0.605 × (AB + CS + GDP − H)) × 0.16

This stat was created by Paul Johnson to be more accurate than Runs Created, and he succeeded in doing so. I then fired up R and ran some tests on team statistics to see how well it lined up with the actual number of runs that each team scored. I first graphed ERP against Runs Scored for every team dating back to the beginning of the 30-team era in MLB: Read the rest of this entry »
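The ERP calculation translates directly into code. The sketch below is written in Python rather than the R the author mentions, and the function name and the sample totals are hypothetical; the totals are round numbers chosen to check the arithmetic, not any real team's season.

```python
def estimated_runs_produced(tb, bb, hbp, h, sb, ab, cs, gdp):
    """Estimated Runs Produced from basic counting stats.

    ERP = (2 * (TB + BB + HBP) + H + SB - 0.605 * (AB + CS + GDP - H)) * 0.16
    """
    bases_and_hits = 2 * (tb + bb + hbp) + h + sb   # credit for reaching and advancing
    outs_made = ab + cs + gdp - h                   # outs charged against the offense
    return (bases_and_hits - 0.605 * outs_made) * 0.16


# Made-up team totals for an arithmetic check only.
erp = estimated_runs_produced(
    tb=2300, bb=500, hbp=60, h=1400, sb=90, ab=5500, cs=30, gdp=120
)
print(round(erp, 1))
```

Comparing this value with a team's actual runs scored, season by season, is the kind of check the paragraph above describes.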


Did the Baseballs Carry More in 2019?

As much as baseball fans would like a simple explanation for the astronomical increase in home runs in 2019, it is becoming clearer that many factors have played into the surge. Among the possible reasons are batters prioritizing hitting homers more than ever before, pitchers having difficulty gripping the seams of the baseball, and of course the famous “juiced balls.” Last month, a committee released initial results of a comprehensive study attempting to determine the driving forces behind the home run rate growth.

I am particularly interested in the idea that fly balls were supposedly carrying more in 2019. On multiple occasions throughout the year, I listened to announcers observe that outfielders seemed to be severely misjudging fly balls. For instance, the center fielder would be drifting back toward the wall, as if he had a bead on it, and the ball would end up 15 rows deep. Although this may seem like evidence for increased carry of the baseball, such observations can easily be driven by confirmation bias. There was a tendency this year to believe that every ball in the air would be a homer, so when a ball would carry a lot, it fit with expectations and the belief continued to grow. It may simply have been the case that the wind was blowing out that day, or that the batter struck the ball in a particular way, and the carry had nothing to do with the ball itself. To determine if the perception was in fact reality, I focus on the following question: Did similarly struck balls travel farther in 2019 than in previous years? Read the rest of this entry »