Extracting Luck From BABIP

Balls in play are subject to lucky bounces, bloops, and exquisite defensive plays. Are some great hitting seasons and breakout performances just a player getting lucky on more than their fair share of balls? Is there any way to tell if a player is truly lucky or good, or if his batting average on balls in play is higher than we would expect? Could building a better expected BABIP help us find over- or undervalued players?

In the hopes of better understanding players’ true abilities, I looked specifically at the relationship between BABIP and launch characteristics. A player’s BABIP over a short timeframe, such as a single season, can be heavily influenced by luck because BABIP converges slowly over small samples. By the law of large numbers, given enough balls in play, a player’s BABIP should converge to his “true” BABIP. Fortunately, other launch characteristics like exit velocity and launch angle (both vertical and horizontal) stabilize more quickly. My goal was to build a model for expected BABIP based on those launch characteristics that removes as much luck as possible and more closely reflects a player’s true skill.
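To see how slowly BABIP settles, here is a quick simulation of the law-of-large-numbers argument, assuming a hypothetical .300 true-talent BABIP (the rates and sample sizes are illustrative, not taken from real data):

```python
import random

def simulated_babip(true_babip: float, n_bip: int, seed: int = 0) -> float:
    """Simulate n_bip balls in play, each a hit with probability true_babip,
    and return the observed BABIP."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_bip) if rng.random() < true_babip)
    return hits / n_bip

# A season-sized sample can stray noticeably from the true rate,
# while a very large sample converges toward it.
small = simulated_babip(0.300, 400, seed=1)      # roughly one season of BIP
large = simulated_babip(0.300, 200_000, seed=1)  # many careers' worth
```

Rerunning the small-sample line with different seeds shows season-sized BABIPs scattering well away from .300, which is exactly the luck the model tries to strip out.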

This project started as work I did with Eric Langdon, Kwasi Efah, and Jordan Genovese for Safwan Wshah’s machine learning class at the University of Vermont. We used launch characteristics (exit velocity, vertical launch angle, and derived horizontal launch angle) to predict whether balls in play would land for hits. We initially tried a support vector machine classifier but found that a random forest model delivered more accurate predictions.
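As a rough illustration of that setup, here is a minimal scikit-learn sketch that trains a random forest to classify hits from launch characteristics. The data and labeling rule below are synthetic stand-ins (the real Statcast training set is not included here), so this demonstrates the modeling approach, not the reported accuracies:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for batted-ball data: exit velocity (mph),
# vertical launch angle (deg), horizontal launch angle (deg).
rng = np.random.default_rng(42)
n = 5000
X = np.column_stack([
    rng.normal(88, 12, n),    # exit velocity
    rng.normal(12, 25, n),    # vertical launch angle
    rng.uniform(-45, 45, n),  # horizontal launch angle
])
# Toy label rule: hard-hit balls at line-drive angles fall for hits.
y = ((X[:, 0] > 95) & (X[:, 1] > 8) & (X[:, 1] < 30)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # held-out classification accuracy
```

A player’s expected BABIP then falls out of the classifier: average the predicted hit probabilities (or predicted labels) over that player’s balls in play.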

After graduation, I continued the project on my own. One thing we overlooked in our initial work was that fast baserunners overperformed our model’s predictions while slower players underperformed them. To compensate, I split the model in two: one for groundballs and one for fly balls, popups, and line drives. Groundballs are naturally much more dependent on runner speed, so I added a variable for runners’ time to first base to the groundball model. The final model accuracies were 85.5% for the fly ball model and 78.0% for the groundball model, for a weighted average accuracy of 82.3%. While the model struggled on groundballs (it didn’t account for infielder positioning), overall the results seemed pretty good.
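The overall figure is just a ball-in-play-weighted average of the two sub-models’ accuracies. The counts below are illustrative (the actual fly ball/groundball split is not given above); an air-ball share of roughly 57% reproduces the 82.3% number:

```python
def weighted_accuracy(acc_fly: float, n_fly: int, acc_gb: float, n_gb: int) -> float:
    """Combine the two sub-models' accuracies, weighted by their BIP counts."""
    return (acc_fly * n_fly + acc_gb * n_gb) / (n_fly + n_gb)

# Hypothetical counts chosen so the blend matches the reported overall accuracy.
overall = weighted_accuracy(0.855, 573, 0.780, 427)  # ~0.823
```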

Comparing My Results

Statcast has a “Batting Average using Speed Angle” stat that incorporates exit velocity and vertical launch angle, which I used as a comparison point. In terms of predicting a player’s BABIP for a season from their launch characteristics, my model delivered 38% less error than Statcast’s model (using 2019 data for players with over 300 balls in play). When using a player’s 2019 data to predict their 2020 BABIP, my model performed similarly to Statcast’s model, giving average errors of 3.94% and 3.90%, respectively. Both models were better at predicting a player’s 2020 BABIP than their actual 2019 BABIP numbers, which had an average error of 4.39%. The 2020 predictions have to be taken with a grain of salt, as I expect progression and regression to occur between seasons, and neither of the two models nor the data accounts for that.
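Throughout, “average error” means the mean absolute gap between a predicted and an actual BABIP across players. A minimal sketch with made-up values (the 3.94%/3.90%/4.39% figures above come from the full player sample, not this toy data):

```python
def mean_abs_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predicted and actual BABIP across players."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Three hypothetical players' predicted vs. actual BABIPs.
err = mean_abs_error([0.310, 0.285, 0.342], [0.298, 0.301, 0.330])
```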

Cross-Validation To Verify Accuracy 

I sought to verify the accuracy of my model’s predictions using cross-validation. I split the 2019 season data into two groups, randomly assigning balls in play to Box A or Box B. This allowed me to predict a player’s BABIP in Box A using their expected BABIP, their Statcast expected BABIP, and their actual BABIP from Box B, and vice versa. Regression/progression was not an issue since I was drawing two random samples from the same population. After running the comparison on each player with at least 125 balls in play in both Box A and Box B (more than 250 BIP for 2019), my model produced an average error of just 3.2%, while Statcast had an average error of 3.5% and actual BABIP had an average error of 3.7%.
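The Box A/Box B split can be sketched as a simple random assignment of each ball in play, represented here as 1 for a hit and 0 for an out (the 300-BIP sample below is fabricated for illustration):

```python
import random

def split_bip(balls_in_play: list[int], seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly assign each ball in play to Box A or Box B with equal probability."""
    rng = random.Random(seed)
    box_a: list[int] = []
    box_b: list[int] = []
    for bip in balls_in_play:
        (box_a if rng.random() < 0.5 else box_b).append(bip)
    return box_a, box_b

# 300 balls in play with a .300 hit rate; a box's BABIP is just its mean.
bip = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 30
box_a, box_b = split_bip(bip)
babip_a = sum(box_a) / len(box_a)
babip_b = sum(box_b) / len(box_b)
```

Because both boxes are random draws from the same player-season, any gap between `babip_a` and `babip_b` is pure sampling noise, which is what makes the comparison fair.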

What’s Luck Got to Do With It?

If a player’s BABIP is higher than his expected BABIP, we can deem him lucky. Luck, as I define it, is the gap between actual and expected BABIP. The luckiest player in 2019 per my analysis was Nolan Arenado, who outperformed his expected BABIP of .325, posting an actual BABIP of .368 for the season. Unfortunately for Arenado, his luck didn’t follow him into 2020, when his expected BABIP exactly matched his actual BABIP at .277. On the flip side, 2019’s unluckiest player was Marcell Ozuna, who posted a BABIP of .314 despite an expected BABIP of .370. After signing a one-year deal with the Braves in 2020, Ozuna exploded and outperformed his expected BABIP of .429 to the tune of a .456 BABIP en route to a sixth-place finish in MVP voting.

[Leaderboard table: 2020 luckiest and unluckiest players]

I have listed the top 10 unluckiest and luckiest players from 2020 based on the difference between their expected and actual BABIPs. This year, the expected BABIPs may be especially useful given the shortened 2020 season: with a greatly truncated sample size, the law of large numbers never had a chance to take hold, so 2020’s actual BABIPs may deviate significantly from players’ true rates. The table includes players’ actual 2020 hits, balls in play, and BABIPs as well as my model’s expected BABIP (xBABIP), MLB’s Statcast Expected Batting Average on Balls in Play using SpeedAngle (mlbBABIP), and the difference between expected and actual BABIP (diff).
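Building such a leaderboard reduces to computing diff = xBABIP − BABIP for each player and sorting. The player rows below are hypothetical placeholders, not real 2020 stat lines:

```python
# Hypothetical player rows: (name, hits, balls in play, model xBABIP).
players = [
    ("Player A", 40, 110, 0.310),
    ("Player B", 35, 120, 0.340),
    ("Player C", 30, 100, 0.275),
]

rows = []
for name, hits, bip, xbabip in players:
    babip = hits / bip
    rows.append({
        "player": name,
        "BABIP": round(babip, 3),
        "xBABIP": xbabip,
        "diff": round(xbabip - babip, 3),  # positive diff = unlucky
    })

# Sort by diff descending: the largest positive gap is the unluckiest player.
leaderboard = sorted(rows, key=lambda r: r["diff"], reverse=True)
```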

Further Refining Expected BABIP

While the results are encouraging, there is still much more that can be done to improve how to predict a player’s true BABIP. I am exploring some ideas on how to do that and hope to outline them, as well as the anticipated challenges, in future writing. The code I used for this project and the final report from the machine learning class are available upon request for those interested in the gory math details. I plan to continue this research, so any suggestions via questions or comments would be greatly appreciated.

Jack Olszewski graduated from the University of Vermont and has interned as a video scout for Baseball Info Solutions, a statistician for several college baseball and hockey teams, and a data analyst for a national publisher. He is currently pursuing entry-level positions in baseball operations and can be reached via email or LinkedIn.




