The MVP Batter Through the First Month

In one of the later chapters of The MVP Machine, the authors describe a working relationship between an unnamed position player and a writer at an “analytically inclined” baseball website. The player felt that his club’s advanced scouting data wasn’t granular enough and asked the writer to supplement the information he was given by the club with additional detail. The writer was eventually preparing scouting reports on the player himself, on opposing pitchers, and on the home plate umpires’ strike zones. As the writer summarized it, when evaluating his own performance the player was basically looking at three things: “Am I squaring up the ball? Am I swinging and missing? Am I swinging at strikes?”

With the first month of the season in the books, who would be some of the best performing hitters in the league according to this particular player’s criteria? Thanks to Statcast, we have the tools at our disposal to try to figure out just that. Note that the dataset I used for this exercise consisted of all qualified batters as of the morning of April 30th, 2021.

First, we need to decide which parameters to use to represent each of the three questions posed by the player. Two of the three are pretty easy. “Am I swinging and missing?” We can look up a player’s whiff percentage on Statcast. “Am I swinging at strikes?” That information is represented in a player’s chase percentage. “Am I squaring up the ball?” The natural candidates here would be, if we’re using just one number, the average exit velocity, hard hit percentage, and barrel percentage. I decided to go with the average exit velocity because it takes into account every batted ball put in play by the batter. Let me explain. Read the rest of this entry »


Expected Pitch Value

There is an index called pitch value that calculates the increase or decrease in runs scored depending on the pitch type. In this article I will look to create an environment-neutral version of pitch value.

Shortcomings of Existing Pitch Value

Pitch Value (hereafter PV) and RV use the average or sum of the changes in RE288 (run expectancy across the 288 base-out-count states). This method has the advantage of being able to measure how much a pitch actually increased or decreased the number of runs scored on that pitch. However, the metric is not stable enough to rely on within a single year, given that it depends on a relatively small number of batted balls and plate appearances.

The following compares, from one year to the next, the average delta_run_exp (RV/100) of sliders for pitchers who threw 500 or more of them in each year from 2017-20, with the data obtained from Statcast.

The correlation coefficient is 0.14, which means that there is almost no correlation. Even if a pitcher records an excellent RV/100 in one year, there is no way to know what kind of value he will record the following year. It seems that it is difficult to measure the stable value of a pitch type with the existing PV and RV.
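A year-to-year check like this one can be sketched in a few lines. This is only a minimal illustration, assuming the data have already been reduced to one row per pitcher-season with a slider count and a per-100 value; the function names and the toy numbers are hypothetical and are not actual Statcast figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def year_to_year_pairs(seasons, min_pitches=500):
    """Pair each pitcher's value in year N with his value in year N+1.

    `seasons` maps (pitcher, year) -> (pitch_count, value_per_100).
    A pair is kept only if both seasons meet the pitch threshold.
    """
    pairs = []
    for (pitcher, year), (n, value) in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if nxt is not None and n >= min_pitches and nxt[0] >= min_pitches:
            pairs.append((value, nxt[1]))
    return pairs

# Hypothetical toy data, for illustration only
toy = {
    ("A", 2017): (620, -1.2), ("A", 2018): (580, 0.4),
    ("B", 2017): (510, 0.8), ("B", 2018): (700, -0.3),
    ("C", 2017): (900, -0.5), ("C", 2018): (450, -2.0),  # 2018 is under 500
    ("D", 2018): (530, 0.1), ("D", 2019): (610, 0.6),
}
pairs = year_to_year_pairs(toy)
r = pearson([a for a, _ in pairs], [b for _, b in pairs])
```

With real data, `r` would be the kind of coefficient quoted above; the toy values here are far too few to mean anything.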

Using xwOBAvalue for Situation-Neutral Run Value and Batted Ball Evaluation

We can try to make improvements in measuring the value of pitches with a small number of at-bats or pitches in a single year.

First, we use a situation-neutral run value for events that occur rather than a change in run expectancy. For example, a home run with no runners on base and a home run with runners on base have different values in the existing RV, but the situation-neutral run value is calculated using the average run value of home runs in all situations combined. The reason for this is that it is not appropriate to evaluate the ability of a single pitch to prevent runs from being scored if it depends on the circumstances in which it is thrown.

Another correction is to use the xwOBAvalue (estimated_woba_using_speedangle in Statcast) instead of the actual batting result when a pitch is put in play. The pitcher has little control over whether a batted ball becomes a hit or an out, and results on batted balls are known to be unstable within a single year. If those results are hard to control over a full season of batted balls, the sample for a single pitch type is smaller still, so the index becomes even less stable. Therefore, for batted balls, we use the run value (xwOBAvalue) estimated from the speed and angle of the batted ball. The purpose of this is to remove the influence of defense and chance as much as possible.

In this way, we try to make the pitch value as situation-neutral as possible.

Calculating wOBA by Count

I will call this situation-neutral pitch value xPV (expected pitch value) for now.

The first step is to find the wOBA by count. Here, the wOBA by count is calculated based on “all final batting results that have passed that count.” Note that this is not the same as the batting results recorded at the time of that count.

For example, if a batter misses a strike in an 0-1 count, the count goes to 0-2, and he then strikes out on three pitches, one strikeout is recorded in the 0-1 record. But if the batter instead hits a single in that 0-2 count, a single is recorded in the 0-1 record.[1] Also note that every plate appearance passes through 0-0, so the 0-0 wOBA equals the wOBA for all plate appearances in that period.
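The "through the count" bookkeeping can be sketched as follows. This is a minimal illustration rather than the author's code: it assumes each plate appearance has already been reduced to the list of counts it passed through plus its final wOBA value and denominator, and the wOBA weights in the toy data are illustrative round numbers.

```python
from collections import defaultdict

def woba_through_count(plate_appearances):
    """Return {count: wOBA} using every PA that passed through that count.

    Each PA is (counts_seen, woba_value, woba_denom), where counts_seen
    lists the "balls-strikes" state before each pitch of the PA.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for counts_seen, woba_value, woba_denom in plate_appearances:
        # A count can repeat (two-strike fouls), so credit it once per PA.
        for count in set(counts_seen):
            num[count] += woba_value
            den[count] += woba_denom
    return {c: num[c] / den[c] for c in den if den[c] > 0}

# Hypothetical plate appearances with illustrative wOBA weights
pas = [
    (["0-0", "0-1", "0-2"], 0.0, 1),  # strikeout on three pitches
    (["0-0", "0-1", "0-2"], 0.9, 1),  # single on the 0-2 pitch
    (["0-0", "1-0"], 2.0, 1),         # home run on the 1-0 pitch
]
by_count = woba_through_count(pas)
```

In this toy set both the strikeout and the single are credited to 0-1 as well as 0-2, and the 0-0 entry equals the overall wOBA of the three plate appearances.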

Calculating the Run Value by Count

Using this wOBA by count, we can calculate the run value of each pitch. When a pitch changes the count, its RAA is calculated as:

(wOBA of the count after the pitch – wOBA of the count before the pitch) / wOBAscale (≈1.15 in Statcast csv data)

If a batted ball occurs, then this is used to calculate RAA:

(xwOBAvalue – wOBA of the count before the pitch) / wOBAscale
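As a rough sketch, the two cases can be folded into one helper. The function name and signature are hypothetical, and 1.15 is the approximate wOBAscale quoted above. The text does not spell out how strikeouts and walks are handled, so the sketch assumes the caller passes the wOBA weight of such a terminal event as `woba_after`.

```python
WOBA_SCALE = 1.15  # approximate wOBAscale quoted in the text

def pitch_run_value(woba_before, woba_after=None, xwoba_value=None,
                    scale=WOBA_SCALE):
    """Run value of one pitch, from the batter's point of view.

    woba_before: through-count wOBA of the count before the pitch.
    woba_after:  through-count wOBA of the count after the pitch
                 (assumption: for a strikeout or walk, the event's wOBA weight).
    xwoba_value: expected wOBA of the batted ball, if the pitch was put in play.
    """
    if xwoba_value is not None:
        return (xwoba_value - woba_before) / scale
    if woba_after is None:
        raise ValueError("need either woba_after or xwoba_value")
    return (woba_after - woba_before) / scale

# Illustrative numbers only
count_change = pitch_run_value(0.30, woba_after=0.25)   # strike taken
batted_ball = pitch_run_value(0.30, xwoba_value=0.99)   # hard contact
```

Because the difference is taken as "after minus before," negative values favor the pitcher and positive values favor the batter.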

Total the Values, Take the Average

The xPV is calculated by summing and averaging the RAAs calculated in this way.
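A minimal sketch of that aggregation, assuming each pitch has already been assigned a run value; the tuple layout, the function name, and the toy values are hypothetical.

```python
from collections import defaultdict

def xpv_by_pitch(pitches):
    """Aggregate per-pitch run values into total xPV and xPV per 100 pitches.

    `pitches` is an iterable of (pitcher, pitch_type, run_value) tuples.
    Returns {(pitcher, pitch_type): {"n", "xpv", "xpv_per_100"}}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # [pitch count, summed run value]
    for pitcher, pitch_type, run_value in pitches:
        entry = totals[(pitcher, pitch_type)]
        entry[0] += 1
        entry[1] += run_value
    return {
        key: {"n": n, "xpv": total, "xpv_per_100": 100 * total / n}
        for key, (n, total) in totals.items()
    }

# Hypothetical run values, for illustration only
toy = [
    ("Pitcher A", "SL", -0.04),
    ("Pitcher A", "SL", 0.10),
    ("Pitcher A", "SL", -0.03),
    ("Pitcher A", "FF", 0.02),
]
table = xpv_by_pitch(toy)
```

The per-100 figure is just the mean run value per pitch multiplied by 100, which is what makes pitchers with different pitch counts comparable.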

The advantage of this xPV is that it reduces the influence of chance as much as possible and increases the consistency of the index by giving it a situation-neutral value. The following is the year-to-year correlation of the xPV/100 (xPV per 100 pitches) of sliders for pitchers who threw at least 500 sliders from 2017-20.

The correlation coefficient was 0.49, which is a moderate correlation and much improved over the 0.14 of RV/100.

For xPV, I referred to this article.

[1] The reason why we use hitting stats through a count instead of hitting stats at that count is that we can take into account the effects of events that occur only in a particular count, and we can also evaluate pitches that are not directly related to the batting results. For a detailed explanation, snin’s article is very helpful.

I have also put the R code here.


The Cardinals Should Send the Angels A Very Large Check

Baseball’s compensation system ensures that teams have a long time before they need to pay their superstars market value. The whole structure is broken, and I’m not just talking about “the Kris Bryant problem,” when a player’s debut is deliberately delayed in order for the team to gain an extra year of control. The issues go way beyond that.

This isn’t an article presenting a solution for this issue per se, mainly because any restructuring requires flexibility and a willingness to sacrifice some current profits for the long-term welfare of the game in what is ultimately a dispute about money. Change will come, if and when it does, primarily because of the leverage one side holds over the other.

I want to talk about Albert Pujols specifically, the future first-ballot Hall of Famer who was recently released by the Los Angeles Angels. Looking back at his career, a glorious one at that, the difference between what he earned and produced with the Cardinals and with the Angels is quite staggering.

Instead of focusing on the negative and all that went wrong during his time in California, let’s look at it from a different perspective: how everything ultimately evened out for this all-time legend. Read the rest of this entry »


Your Team’s Prospects Are Probably Not Going To Work Out

Serious prospect hounds know that only about 10% of minor leaguers ever participate in a major league game. However, even the most discerning fans can be deluded into believing that their team’s farm system can overcome the odds and build a perennial contender based on their prospects alone.

I decided to investigate how much average WAR a prospect generates based on their ranking in Baseball America’s Prospect Handbook. I used a similar process in a previous article in which I calculated the amount of WAR based on the next six seasons of a player’s career since being listed (instead of when a player makes their major league debut). This means that players closer to the majors get a boost to their value, since they will have more opportunities to accumulate WAR than players in the lower minors.

Next, I grouped the players by their ordinal ranking in their organization from the 2001-2015 seasons and calculated each group’s average WAR to create the visualization below. Read the rest of this entry »
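The grouping step described above might look something like the sketch below. It is only an illustration under stated assumptions: the six-season window is taken to begin with the listing year, players who never reach the majors count as zero WAR, and the function names and toy records are hypothetical.

```python
from collections import defaultdict

def six_year_war(listed_year, war_by_season, window=6):
    """Sum WAR over `window` seasons, starting with the listing year.

    Assumption: the window starts in the listing year itself; seasons with
    no major league record contribute 0.
    """
    return sum(war_by_season.get(year, 0.0)
               for year in range(listed_year, listed_year + window))

def average_war_by_rank(prospects):
    """Average six-year WAR for each organizational ranking.

    `prospects` is an iterable of (org_rank, listed_year, {season: WAR}).
    """
    by_rank = defaultdict(list)
    for rank, listed_year, war_by_season in prospects:
        by_rank[rank].append(six_year_war(listed_year, war_by_season))
    return {rank: sum(v) / len(v) for rank, v in sorted(by_rank.items())}

# Hypothetical records, for illustration only
toy = [
    (1, 2005, {2006: 2.0, 2007: 4.0, 2012: 5.0}),  # 2012 is outside the window
    (1, 2006, {}),                                 # never reached the majors
    (2, 2005, {2009: 1.0}),
]
averages = average_war_by_rank(toy)
```

Anchoring the window to the listing year rather than the debut is what gives prospects closer to the majors the boost described above.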


Introducing pWAR: A Predictive Wins Measurement for Pitchers

When WAR was first introduced, it attempted to answer a baseball question that has existed as long as teams have played professionally: how much can one player contribute to a team’s record? It is an ongoing debate to this day, but WAR is widely recognized as an excellent measure of overall performance. WAR numbers for major league players are regularly cited and recognized by both Major League Baseball and the Elias Sports Bureau. However, there is another question behind WAR, also long predating the existence of the statistic, and it’s a question that has yet to be answered by it. When a front office is deciding whether or not (or how aggressively) to pursue a player, they’re ultimately searching for the answer to one question — how many more games will we win if we get this guy?

Enter pWAR.

Read the rest of this entry »


Introducing xxxFIP

ERA, FIP, xFIP, and beyond…

There is a wide range of pitching stats available to the discerning baseball fan. From Wins and ERA to DRA- and xBACON, there’s something for all tastes. In this post I’ll introduce a new stat, xxxFIP, which is definitely NSFW (Not Safe For Wise decision-making).

Before diving into the details of xxxFIP, let’s discuss its predecessors and what they are trying to measure. Read the rest of this entry »


Enhancing Prospect Outlooks Using Scouting Report Text

Wander Franco is the latest prospect to be discussed as a top player in the game before stepping on a major league field. Vladimir Guerrero Jr. was likely the recipient of even more hype in 2018, though he has reminded us at times that there are no automatic superstars in baseball. Franco and Guerrero Jr. share the distinction of being the only two players to be given the maximum “hit tool” score of 80 on MLB.com’s prospect rankings. Guerrero Jr. (in 2018) scored higher on “power” while Franco has the edge in running and fielding. They were both rated 70 overall and were the respective No. 1 prospects in baseball at the time.

When comparing the two players’ ratings, we might stop at this point and declare a virtual tie. The same could be said for any number of lower level prospects with similar ratings. However, there is still a significant amount of data available describing the players: the words used in the scouting reports. On MLB.com, below the numeric ratings, there is a blurb detailing the prospects’ exploits. At first glance, we might not think the text provides information that can separate players, as many of the writeups are similar in both style and substance. Yet there is a possibility that there are indicators in the text that are not obvious to a human reader (or at least a human reader with my minimal experience analyzing text).

To examine the importance of the scouting report text, I developed two models — one with the text data and one without — to predict whether a prospect has made his major league debut as of the end of the 2020 season. Both models use variables such as year, position, numerical skill ratings, etc. to account for all of the non-text information available on MLB.com. Thus, if there is a difference in model effectiveness, it will be a result of the text data adding information that is not captured by the other features. Read the rest of this entry »


Constructing the Perfect Right-Handed Pitcher

The pitching talent in the major leagues has never been as good as it is at this very moment.

Strikeout rates have risen in 13 straight seasons, hitting an all-time high of 23.4% in 2020. When pitch tracking started in 2002, the average fastball velocity was 89 mph. That figure was up to 93.1 mph in 2020. Every other pitch has followed suit, whether it’s the slider (84.1 mph in 2020), change-up (84.5 mph), or curveball (79.2 mph). Despite these massive gains in swing-and-miss stuff, walk rates haven’t gotten worse. The walk rate in 2000 (9.6%), for example, was higher than the walk rate in 2020 (9.2%).

While some of this may be due to a shift in approach from batters, mainly a wider acceptance of strikeouts and a fly-ball heavy mindset, pitchers are undoubtedly better than ever at this moment. Pitching staffs are loaded with velocity, movement, and specialization that make hitting harder than it’s ever been. When looking at the landscape of major league pitching, there are so many names and pitches to choose from. But which pitchers and pitches stand out the most? Read the rest of this entry »


Clayton Kershaw Is Breaking Barriers in Breaking Ball Usage

Knuckleballer R.A. Dickey set record marks for a starting pitcher in the 21st century in terms of breaking ball percentage, usually hovering around 50%. The most prominent other examples are:

Which Starters Have Thrown the Most Breaking Stuff?
Pitcher              Season   Breaking%   xERA
Patrick Corbin       2018     50.3        3.39
Jon Gray             2018     48.7        4.03
Madison Bumgarner    2015     48.7        2.92

But Clayton Kershaw could do something unprecedented this season. It may still be early, but this goes beyond what happened during the first few weeks of baseball.

It’s well-documented that the future Hall of Famer doesn’t have the same zip on that fastball that he used to, which was most concerning during the 2019 season where the heater topped out at 90.3 mph. Giving up a lead to the Nats in Game 5 on back-to-back homers before being knocked out of the NLDS might have been the catalyst for a change in approach during the following offseason, but that’s just pure speculation on my part.

Kershaw went to Driveline, and through a meticulous study of his mechanics was able to make some minor adjustments and give himself a bit of velocity back. For a pitcher into his thirties, that’s especially huge.

One could look to that moment and mark it as a turning point, but to discuss Kershaw’s pitch selection, we must recognize that since the development of his slider way back when in a bullpen session at Wrigley Field (and even more so since his first Cy Young campaign in 2011), the left-hander has been steadily increasing the slider’s usage at the same rate he has been decreasing his fastball usage. Just take a look at the graphic.

We are four starts into Kershaw’s 2021 season, and so far he’s maintained a steady pace toward the first season with a breaking ball rate of 60% or above.

The bulk of this change comes from Kershaw’s significant increase of sliders, specifically against right-handed hitters.

As you can see, he went from around 40% in slider percentage (40.2% and 39.8% in 2019 and 2020, respectively) to 48% over his first four starts. I know it’s early, but if you are a right-handed batter against Kershaw in 2021, you’re basically getting three fastballs out of 10 pitches. The sample size against left-handed batters is small, just a total of 81 pitches, but it follows his career trend with a less pronounced increase in breaking ball usage.

Here is a table of his pitch usage during each game this season:

Clayton Kershaw’s First Four 2021 Starts
Pitch Usage   @ Rockies   @ Athletics   vs. Nationals   @ Padres
Fastballs     28 (36%)    34 (37%)      30 (35%)        39 (40%)
Sliders       35 (45%)    44 (48%)      41 (48%)        44 (45%)
Curveballs    14 (18%)    13 (14%)      15 (17%)        15 (15%)
Breaking%     63%         62%           65%             60%
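For readers who want to check the arithmetic, here is a short sketch that recomputes the breaking ball share from the pitch counts in the table. The dictionary layout and function name are just one convenient way to hold the numbers. The Breaking% row appears to add the rounded slider and curveball percentages, so shares computed from the raw counts can differ from it by a fraction of a point.

```python
# Pitch counts copied from the table above
starts = {
    "@ Rockies":     {"fastball": 28, "slider": 35, "curveball": 14},
    "@ Athletics":   {"fastball": 34, "slider": 44, "curveball": 13},
    "vs. Nationals": {"fastball": 30, "slider": 41, "curveball": 15},
    "@ Padres":      {"fastball": 39, "slider": 44, "curveball": 15},
}

def breaking_share(counts):
    """Fraction of pitches that were sliders or curveballs."""
    total = sum(counts.values())
    return (counts["slider"] + counts["curveball"]) / total

# Per-start share, as a percentage rounded to one decimal
per_start = {game: round(100 * breaking_share(c), 1)
             for game, c in starts.items()}

# Share over all four starts combined
all_pitches = sum(sum(c.values()) for c in starts.values())
all_breaking = sum(c["slider"] + c["curveball"] for c in starts.values())
overall = all_breaking / all_pitches
```

Taken over all four starts, the counts give 221 breaking balls out of 352 pitches.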

It’s about as close to the same approach as you can get in terms of pitch selection. As the first graphic showed, somewhere around the 2018 season Kershaw began throwing the slider just as much as he threw the fastball, and this could be the year he takes it one step forward. After all, he looked pretty good striking out Fernando Tatis Jr. three times with the same pitch just one night after Tatis hit a bomb against Walker Buehler in his return from the IL.

What also helps the left-hander throw that slider so often is the variations of it. Just ask Mitch Moreland.

The first slider Kershaw threw him here went for a ball with a horizontal break of three inches, but Moreland would strike out swinging (on a slider) as the next two broke eight and seven inches, respectively.

The second time up, Moreland saw six pitches and five were sliders, including the last one to punch him out, which had 30 inches of vertical break and five horizontal.

Third time’s the charm? Not so much. Kershaw threw another couple of sliders to finish Moreland off, the first for a ball with a vertical break of 24 inches, but the second one had 30, resulting in a weak grounder to the first baseman.

We’ll see what the season delivers, but so far Kershaw has looked really good following the career trend in pitch selection that’s been a constant since 2011. How far will he go in terms of breaking ball usage? My guess is we’ve seen the limit, or roughly that, for the foreseeable future. It remains to be seen if he can sustain it at that level moving forward, but whether that 60% mark is reached or not, it’s almost a given that he’ll set a career high and also perhaps the new record for non-knuckleballers, his only obstacle being this new “cutterless” Shane Bieber, although I’m betting he’ll bring it back at some point.

Note: No homers over the first four starts is an encouraging sign for Kershaw, and if he experiences a decrease in a trend that was working against him, let’s just say the rumors of his demise might’ve been a little exaggerated. He may no longer be far and away the best pitcher in the game, but he can certainly still be a bona fide ace.

Estevão Maximo is an aspiring sportswriter from Brazil. You can find more of his writing here and here.


Rethinking How We Look at Team Defense

They say that a run prevented is as important as a run scored, and this checks out. In fact, based on the coefficient of determination (r^2) for the two variables, a run prevented has actually been more correlated with team success than a run scored. This has indeed been labeled as the “run prevention era,” and just by that measure, this would appear to be the case.

As we’ve discovered in the past, offense and pitching win championships, especially compared to defense. However, that certainly does not mean that defense does not matter. Rather, it is a small advantage that teams can leverage to continue to win on the margins. Small-market organizations such as Cleveland, the Rays, and the D-backs have all benefitted from strong defense in the past, while the Mets have been a clear example of what poor defense can do to you.

How can teams gain an edge defensively, and how much does it matter? What are the most important defensive positions, and how does that ranking differ from the traditional defensive spectrum? Should teams tailor their defense specifically to their pitching? Let us change the way we look at team defense by crunching through the numbers! Read the rest of this entry »