Enhancing Prospect Outlooks Using Scouting Report Text

Wander Franco is the latest prospect to be discussed as a top player in the game before stepping onto a major league field. Vladimir Guerrero Jr. was likely the recipient of even more hype in 2018, though he has reminded us at times that there are no automatic superstars in baseball. Franco and Guerrero Jr. share the distinction of being the only two players to be given the maximum “hit tool” score of 80 on MLB.com’s prospect rankings. Guerrero Jr. (in 2018) scored higher on “power” while Franco has the edge in running and fielding. They were both rated 70 overall and were the respective No. 1 prospects in baseball at the time.

When comparing the two players’ ratings, we might stop at this point and declare a virtual tie. The same could be said for any number of lower level prospects with similar ratings. However, there is still a significant amount of data available describing the players: the words used in the scouting reports. On MLB.com, below the numeric ratings, there is a blurb detailing the prospects’ exploits. At first glance, we might not think the text provides information that can separate players, as many of the writeups are similar in both style and substance. Yet there is a possibility that there are indicators in the text that are not obvious to a human reader (or at least a human reader with my minimal experience analyzing text).

To examine the importance of the scouting report text, I developed two models — one with the text data and one without — to predict whether a prospect has made his major league debut as of the end of the 2020 season. Both models use variables such as year, position, numerical skill ratings, etc. to account for all of the non-text information available on MLB.com. Thus, if there is a difference in model effectiveness, it will be a result of the text data adding information that is not captured by the other features. Read the rest of this entry »


Constructing the Perfect Right-Handed Pitcher

The pitching talent in the major leagues has never been as good as it is at this very moment.

Strikeout rates have risen in 13 straight seasons, hitting an all-time high of 23.4% in 2020. When pitch tracking started in 2002, the average fastball velocity was 89 mph. That figure was up to 93.1 mph in 2020. Every other pitch has followed suit, whether it’s the slider (84.1 mph in 2020), change-up (84.5 mph), or curveball (79.2 mph). Despite these massive gains in swing-and-miss stuff, walk rates haven’t gotten worse. The walk rate in 2000 (9.6%), for example, was higher than the walk rate in 2020 (9.2%).

While some of this may be due to a shift in approach from batters, mainly a wider acceptance of strikeouts and a fly-ball heavy mindset, pitchers are undoubtedly better than ever right now. Pitching staffs are loaded with velocity, movement, and specialization that make hitting harder than it’s ever been. When looking at the landscape of major league pitching, there are so many names and pitches to choose from. But which pitchers and pitches stand out the most? Read the rest of this entry »


Clayton Kershaw Is Breaking Barriers in Breaking Ball Usage

Knuckleballer R.A. Dickey set record marks for a starting pitcher in the 21st century in terms of breaking ball percentage, usually hovering around 50%. The most prominent other examples are:

Which Starters Have Thrown the Most Breaking Stuff?
Pitcher             Season   Breaking%   xERA
Patrick Corbin      2018     50.3        3.39
Jon Gray            2018     48.7        4.03
Madison Bumgarner   2015     48.7        2.92

But Clayton Kershaw could do something unprecedented this season. It may still be early, but this goes beyond what has happened during these first few weeks of baseball.

It’s well-documented that the future Hall of Famer doesn’t have the same zip on his fastball that he used to, which was most concerning during the 2019 season, when the heater topped out at 90.3 mph. Giving up a lead to the Nats in Game 5 on back-to-back homers before being knocked out of the NLDS might have been the catalyst for a change in approach during the following offseason, but that’s pure speculation on my part.

Kershaw went to Driveline and, through a meticulous study of his mechanics, was able to make some minor adjustments and give himself a bit of velocity back. For a pitcher in his thirties, that’s especially huge.

One could look to that moment and mark it as a turning point, but to discuss Kershaw’s pitch selection, we must recognize that since the development of his slider way back when in a bullpen session at Wrigley Field (and even more so since his first Cy Young campaign in 2011), the left-hander has been steadily increasing the slider’s usage at the same rate he was decreasing his fastball usage. Just take a look at the graphic.

We are four starts into Kershaw’s 2021 season, and so far he’s maintained a steady pace towards the first season of a breaking ball rate of 60% or above.

The bulk of this change comes from Kershaw’s significant increase of sliders, specifically against right-handed hitters.

As you can see, he went from the low 40s in slider percentage (40.2 and 39.8 in 2019 and 2020, respectively) to 48 over his first four starts. I know it’s early, but if you are a right-handed batter against Kershaw in 2021, you’re basically getting three fastballs out of 10 pitches. The sample size against left-handed batters is small, just a total of 81 pitches, but it follows his career trend with a not-so-accentuated increase in breaking ball usage.

Here is a table of his pitch usage during each game this season:

Clayton Kershaw’s First Four 2021 Starts
Pitch Usage   @ Rockies   @ Athletics   vs. Nationals   @ Padres
Fastballs     28 (36%)    34 (37%)      30 (35%)        39 (40%)
Sliders       35 (45%)    44 (48%)      41 (48%)        44 (45%)
Curveballs    14 (18%)    13 (14%)      15 (17%)        15 (15%)
Breaking%     63%         62%           65%             60%
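For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the breaking ball share from the pitch counts in the table. It assumes the three listed pitch types account for every pitch in each start, which the table does not state outright; the function and variable names are my own for illustration.

```python
# Pitch counts per start, copied from the table above.
# Assumption: fastballs, sliders, and curveballs are the only pitches thrown.
starts = {
    "@ Rockies": {"fastball": 28, "slider": 35, "curveball": 14},
    "@ Athletics": {"fastball": 34, "slider": 44, "curveball": 13},
    "vs. Nationals": {"fastball": 30, "slider": 41, "curveball": 15},
    "@ Padres": {"fastball": 39, "slider": 44, "curveball": 15},
}


def breaking_pct(counts):
    """Share of pitches that were sliders or curveballs, in percent."""
    total = sum(counts.values())
    breaking = counts["slider"] + counts["curveball"]
    return 100 * breaking / total


breaking_by_start = {game: breaking_pct(c) for game, c in starts.items()}

for game, pct in breaking_by_start.items():
    print(f"{game}: {pct:.1f}% breaking balls")
```

Computed straight from the counts, each figure lands within a point of the table's Breaking% row, which appears to add the already-rounded slider and curveball percentages.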

It’s about as close to the same approach as you can get in terms of pitch selection. As the first graphic showed, somewhere around the 2018 season Kershaw began throwing the slider just as much as he threw the fastball, and this could be the year he takes it one step further. After all, he looked pretty good striking out Fernando Tatis Jr. three times with the same pitch just one night after Tatis hit a bomb against Walker Buehler in his return from the IL.

What also helps the left-hander throw that slider so often is how much he varies it. Just ask Mitch Moreland.

The first slider Kershaw threw him here went for a ball with a horizontal break of three inches, but Moreland would strike out swinging (on a slider) as the next two broke eight and seven inches, respectively.

The second time up, Moreland saw six pitches and five were sliders, including the last one to punch him out, which had 30 inches of vertical break and five horizontal.

Third time’s the charm? Not so much. Kershaw threw another couple of sliders to finish Moreland off, the first for a ball with a vertical break of 24 inches, but the second one had 30, resulting in a weak grounder to the first baseman.

We’ll see what the season delivers, but so far Kershaw has looked really good following the career trend in pitch selection that’s been a constant since 2011. How far will he go in terms of breaking ball usage? My guess is we’ve seen the limit, or roughly that, for the foreseeable future. It remains to be seen if he can sustain it at that level moving forward, but whether that 60% mark is reached or not, it’s almost a given that he’ll set a career high and perhaps also the new record for non-knuckleballers. His only obstacle is this new “cutterless” Shane Bieber, although I’m betting Bieber will bring the cutter back at some point.

Note: No homers over the first four starts is an encouraging sign for Kershaw, and if he experiences a decrease in a trend that was working against him, let’s just say the rumors of his demise might’ve been a little exaggerated. He may no longer be the overwhelming best pitcher in the game, but he can certainly still be a bona fide ace.

Estevão Maximo is an aspiring sportswriter from Brazil. You can find more of his writing here and here.


Rethinking How We Look at Team Defense

They say that a run prevented is as important as a run scored, and this checks out. In fact, based on the coefficient of determination (r^2) for the two variables, a run prevented has actually been more correlated with team success than a run scored. This has indeed been labeled as the “run prevention era,” and just by that measure, this would appear to be the case.
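For readers less familiar with r^2, here is a minimal sketch of the kind of comparison described above. The team figures are made-up placeholders rather than real standings data, and `r_squared` is my own helper name; the point is only to show how the two coefficients would be computed side by side.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Hypothetical placeholder values for five imaginary teams (NOT real data).
runs_scored = [700, 750, 680, 800, 720]
runs_allowed = [650, 700, 760, 640, 720]
win_pct = [0.540, 0.530, 0.450, 0.600, 0.500]

print("r^2, runs scored vs. win%: ", round(r_squared(runs_scored, win_pct), 3))
print("r^2, runs allowed vs. win%:", round(r_squared(runs_allowed, win_pct), 3))
```

With real team-season data in place of the placeholders, whichever variable produces the larger r^2 is the one more strongly tied to winning percentage.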

As we’ve discovered in the past, offense and pitching win championships, especially compared to defense. However, that certainly does not mean that defense does not matter. Rather, it is a small advantage that teams can leverage to keep winning on the margins. Small-market organizations such as Cleveland, the Rays, and the D-backs have all benefitted from strong defense in the past, while the Mets have been a clear example of what poor defense can do to you.

How can teams gain an edge defensively, and how much does it matter? What are the most important defensive positions, and how does it vary from the defensive spectrum? Should teams tailor their defense specifically to their pitching? Let us change the way we look at team defense by crunching through the numbers! Read the rest of this entry »


Pulling a Rockies Pitching Solution Out of Thin Air

The success of the Colorado Rockies franchise has historically been impeded by air: the thin air of Coors Field and the hot air blown by higher-ups in the front office.

Due in large part to playing their home games in a comically extreme hitters’ park, the Rockies have finished 14th or worse in the National League in runs allowed per game in 21 of their 28 seasons in franchise history. Colorado has finished with a winning record five times in the past 20 seasons, and in four of those they ranked in the top 10 in the NL in RA/G. No, their run prevention as a whole has never been what you would call “good” or even “well above average,” but their only brushes with success have come at times when their pitching ventured beyond putrid.

The adverse effect of the thin atmosphere on pitching is twofold. The more apparent aspect is that it imparts less drag on a batted ball, allowing fly balls to carry further, resulting in increased slugging at Coors. Perhaps less obviously, movement of pitches due to the Magnus effect is diminished. At the risk of triggering memories of my undergraduate fluid dynamics course, the lift on a baseball (or any spinning sphere) is proportional to the density of the fluid it moves through. Thus, when a fastball is thrown at Coors Field, it has less “rise” (or, more accurately, less lift to offset gravity, so it drops more) than it would at other major league parks.
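To put a rough number on that proportionality, here is a back-of-the-envelope sketch. It assumes a simple isothermal atmosphere (density falling off exponentially with altitude), a scale height of about 8,500 m, and an altitude of roughly 1,600 m for Coors Field; it ignores temperature and humidity, so treat the result as a ballpark figure rather than a measurement.

```python
import math

SEA_LEVEL_DENSITY = 1.225   # kg/m^3, standard sea-level air density
SCALE_HEIGHT_M = 8500.0     # assumed scale height for an isothermal atmosphere
COORS_ALTITUDE_M = 1600.0   # approximate altitude of Coors Field (assumption)


def air_density(altitude_m):
    """Approximate air density at a given altitude (isothermal model)."""
    return SEA_LEVEL_DENSITY * math.exp(-altitude_m / SCALE_HEIGHT_M)


def magnus_lift_ratio(altitude_m):
    """Lift relative to sea level for the same pitch, since lift scales with density."""
    return air_density(altitude_m) / SEA_LEVEL_DENSITY


ratio = magnus_lift_ratio(COORS_ALTITUDE_M)
print(f"Magnus lift at Coors is roughly {ratio:.0%} of its sea-level value")
```

Under these assumptions, an identical pitch (same velocity, same spin) generates noticeably less Magnus force in Denver than at sea level.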

Does this mean that every pitcher will perform demonstrably worse if he takes up in-season residence in Denver? Well, yes, but actually no. Read the rest of this entry »


Jake McGee: The One-Pitch Pitcher

One of the newest members of the San Francisco Giants, lefty reliever Jake McGee, is coming off one of his best years in the major leagues throwing one pitch: a fastball. Seemingly by magic, McGee twirled a fastball 97% of the time he threw in 2020 on the way to a 2.66 ERA, 0.836 WHIP, and 11 strikeouts for every walk. I will be taking an in-depth look into McGee’s success and failure over his career, which might give better insight as to how he can continue to perform and how a major league reliever can succeed with only one pitch.

McGee was drafted in 2004 by the Tampa Bay Rays and made his major league debut with them in 2010. After his first full season in 2011, McGee posted extremely strong numbers in 2012, 2014, and 2015 with an ERA+ (it will become clear why I use ERA+) of 148 and a K/BB of 5.02 within those four seasons. After the 2015 campaign, McGee was traded along with Germán Márquez to the Colorado Rockies in exchange for Corey Dickerson and Kevin Padlo.

McGee immediately regressed in Colorado, as his ERA+ went from 163 to 103 (ERA+ adjusts for ballparks, which is particularly useful at Coors Field) and his K/BB sank from 6 to 2.38 in the transition from the Rays to the Rockies (2015-2016). Of course, some of this decline can be attributed to the difficult conditions in Colorado, but there is also additional evidence that McGee’s style of pitching contributed to his decline. Following 2016, McGee remained a strong-yet-aging reliever and was ultimately released by the Rockies in July of 2020.
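For context on why ERA+ is handy here, this is a minimal sketch of the commonly cited simplified form of the stat, 100 times league ERA divided by the pitcher's ERA, scaled by a park factor. The exact published calculation differs in its details, and the input numbers below are hypothetical, not McGee's actual figures.

```python
def era_plus(era, league_era, park_factor=1.0):
    """Simplified ERA+: 100 is league average, higher is better.

    park_factor > 1.0 means a hitter-friendly park, which boosts the result.
    """
    return 100 * (league_era / era) * park_factor


def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio."""
    return strikeouts / walks


# Hypothetical inputs: the same 4.00 ERA in a neutral park and a hitters' park.
neutral = era_plus(era=4.00, league_era=4.20, park_factor=1.00)
hitters_park = era_plus(era=4.00, league_era=4.20, park_factor=1.15)

print(f"Neutral park ERA+:  {neutral:.0f}")
print(f"Hitters' park ERA+: {hitters_park:.0f}")
```

The same raw ERA earns a higher ERA+ in a hitter-friendly park, which is why the stat is a fairer yardstick for a pitcher moving to or from Coors Field.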

Four days later, McGee signed with the Los Angeles Dodgers and proceeded to outperform even his 27-year-old self with an incredible season. McGee finished in the 99th percentile in K%, 96th in BB%, 95th in xERA, and 95th in xwOBA. So what exactly was the cause of this change and what did McGee do to get there? Read the rest of this entry »


Rearing Back: Pitchers’ Effort in Important Situations

Leading 3-1 and one out away from being a World Series champion, Los Angeles Dodgers pitcher Julio Urías faces Tampa Bay Rays infielder Willy Adames. The first two pitches of the at-bat, fastballs resulting in a swinging strike and a called strike, clock in at 94.9 mph and 94.1 mph. The at-bat (and with it the World Series) ends on the third pitch. Urías fires a third straight four-seam fastball, this time for a called strike three at 96.7 mph. This may not feel particularly fast in a day and age in which some pitchers consistently hit 100 mph, but for Urías, there was a little something extra behind that final pitch. Of the 682 four-seam fastballs that Urías threw in 2020, this pitch was the fastest. While it may have been a coincidence that his hardest-thrown pitch also came in the most important situation, I suspect the significance of the moment was a key factor.

I doubt this claim comes as much of a surprise to anyone. Most people in crucial situations will push a little harder to ensure the outcome is in their favor. To test the theory, I examined pitch velocities from the 2019 regular season. I chose 2019 rather than 2020 to ensure the situations were most similar to a normal year in case any of the irregularities of baseball during COVID influenced the data. In general, it appears that two-strike fastballs are thrown harder than fastballs in other counts. I graphed the respective densities of fastball velocities below. Read the rest of this entry »
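As a rough illustration of that comparison, here is a minimal sketch that splits fastballs into two-strike and non-two-strike counts and compares the average velocity of each group. The handful of pitches below are invented placeholders, not 2019 pitch-tracking data, and the function name is my own.

```python
from statistics import mean

# Hypothetical fastballs as (strikes in the count, velocity in mph). NOT real data.
fastballs = [
    (0, 93.8), (1, 94.1), (2, 95.0), (2, 95.4),
    (0, 93.5), (1, 94.0), (2, 94.9),
]


def mean_velo_by_two_strikes(pitches):
    """Return (mean velocity with two strikes, mean velocity in all other counts)."""
    two_strike = [velo for strikes, velo in pitches if strikes == 2]
    other = [velo for strikes, velo in pitches if strikes != 2]
    return mean(two_strike), mean(other)


two_strike_avg, other_avg = mean_velo_by_two_strikes(fastballs)
print(f"Two-strike fastballs: {two_strike_avg:.2f} mph")
print(f"All other counts:     {other_avg:.2f} mph")
```

A fuller analysis would compare the whole distributions rather than just the means, and would ideally control for the pitcher, since harder throwers may simply reach two-strike counts more often.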


Modeling the Effect of Deadening the Baseball

Much has been made of the “juiced ball era” which we currently inhabit. Decreased drag on the ball along with an increase in ball bounciness means that fly balls are carrying further, rewarding hitters with more home runs than ever before. This change has coincided with increases in strikeout rates, which can be partially explained by pitchers throwing harder but may also be due to more hitters selling out for a home run. There are now fewer balls in play than ever before, and many fans no longer enjoy this Three True Outcomes style of baseball.

Deadening the ball is a proposed solution to ballooning home run rates. Introducing a deadened ball along with measures to limit the dominance of pitchers (such as shrinking the strike zone) could increase the number of balls in play, improving the aesthetic value of baseball for many viewers as discussed on this site in a recent article. But what would baseball with a deadened ball actually look like? How much would the ball have to be deadened to return home run rates to those seen in past years? Would deadening the ball disincentivize strikeouts more strongly than the juiced ball? Which hitters would be the biggest winners and losers in a season with a deadened ball?

I aim to investigate all these questions in this article, so without further ado, let’s dive right in. Read the rest of this entry »


Pound the Knees, Steven

After the Toronto Blue Jays traded for left-handed pitcher Steven Matz, he is projected to slide into the bottom of the starting rotation and pitch about 115 innings this year. Matz’s 2020 was a year to forget — join the club, Steven — but let’s take a look at who Matz is as a pitcher and why a change in fastball location is something the Jays coaching staff might consider.

Matz pitched only about 30 innings last year, so in the interest of sample size, I will also be using statistics from 2019 and 2018. Here is what those last three seasons looked like, courtesy of Baseball Savant: Read the rest of this entry »


Extracting Luck From BABIP

Balls in play are subject to lucky bounces, bloops, and exquisite defensive plays. Are some great hitting seasons and breakout performances just a player getting lucky on more than their fair share of balls? Is there any way to tell if a player is truly lucky or good, or if his batting average on balls in play is higher than we would expect? Could building a better expected BABIP help us find over- or undervalued players?

In the hopes of better understanding players’ true abilities, I looked specifically at the correlation between BABIP and launch characteristics. A player’s BABIP viewed across a short timeframe, such as a single season, can be highly influenced by luck. BABIP doesn’t converge well over a small sample. By the law of large numbers, we know that given enough balls in play, a player’s BABIP should converge to their “true” BABIP. Fortunately, launch characteristics like exit velocity and launch angle (both vertical and horizontal) converge more quickly. My goal was to build a model for expected BABIP based on those launch characteristics that removes as much luck as possible and more closely reflects a player’s true skill.
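To make the small-sample point concrete, here is a minimal simulation sketch. It uses the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and then treats every ball in play as an independent coin flip with a fixed "true" BABIP of .300. That independence assumption, the sample sizes, and the function names are all mine for illustration, not part of the model described in this article.

```python
import random
from statistics import pstdev


def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)


def simulate_babip(true_babip, balls_in_play, rng):
    """Observed BABIP over one sample, with each ball in play an independent trial."""
    hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_babip)
    return hits / balls_in_play


def babip_spread(true_babip, balls_in_play, trials, seed=0):
    """Standard deviation of observed BABIP across many simulated samples."""
    rng = random.Random(seed)
    observed = [simulate_babip(true_babip, balls_in_play, rng) for _ in range(trials)]
    return pstdev(observed)


small = babip_spread(0.300, balls_in_play=100, trials=200)
large = babip_spread(0.300, balls_in_play=2000, trials=200)
print(f"Spread over 100 balls in play:   {small:.3f}")
print(f"Spread over 2,000 balls in play: {large:.3f}")
```

The spread of observed BABIP shrinks as the number of balls in play grows, which is the law of large numbers at work and the reason a single season can mislead.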

This project started as work I did along with Eric Langdon, Kwasi Efah, and Jordan Genovese for Safwan Wshah’s machine learning class at the University of Vermont. We were using launch characteristics (exit velocity, vertical launch angle, and derived horizontal launch angle) to predict if balls would land for hits or not. We initially tried using a support vector machine classification but found that a random forest model delivered more accurate predictions. Read the rest of this entry »