Using Batted-Ball Data to Measure Hitter Performance

Imagine a batter hits a long fly ball that’s destined for the right-field seats, only for the opposing outfielder to reach over the wall and rob him of a home run. In traditional stat sheets, this is treated like any other out, and there’s no real way to distinguish it from a dribbler down the third-base line. But intuitively we know these are two very different things, and a batter who produces more of the first is going to end up more valuable than one who produces more of the second. So if we want to truly measure how well a player has performed, we need to separate the performance from the results. The best way to do that is to break each batted ball down as granularly as possible and look at the average results for similar batted balls, and today I’ll share a personal tool that does this. This work was inspired by Tony Blengino’s terrific posts on batted-ball data, and I suggest reading his introductory post for background on the theory and methodology I employ.

This tool uses the type, velocity, direction, and distance of a hitter’s batted balls to calculate an expected AVG, OBP, and SLG for him. It divides batted balls into buckets by type (GB, FB, LD, PU) and by either direction and velocity or direction and distance, then calculates the AVG and SLG of all batted balls that fall into each bucket. It then goes through all of a batter’s plate appearances and uses these reference values to calculate both the observed and expected AVG/OBP/SLG for each PA. The table below shows the top 30 hitters by expected wOBA (xwOBA) as of 5/26/2015.

Name                AB   PA Velo (MPH)    AVG    OBP    SLG   wOBA  wRAA   xAVG   xOBP   xSLG  xwOBA  xwRAA
Bryce Harper       151 191         89  0.331  0.471  0.722  0.505  29.1  0.298  0.445  0.650  0.467   23.3
Miguel Cabrera     164 195         93  0.341  0.446  0.610  0.453  21.7  0.304  0.415  0.665  0.457   22.3
Prince Fielder     182 199         93  0.363  0.417  0.571  0.425  17.7  0.349  0.404  0.640  0.443   20.5
Mike Trout         168 194         92  0.298  0.392  0.548  0.404  14.0  0.321  0.412  0.615  0.438   19.3
Anthony Rizzo      161 197         88  0.311  0.437  0.565  0.433  18.7  0.304  0.431  0.589  0.438   19.6
Ryan Braun         154 173         94  0.266  0.347  0.532  0.376   8.7  0.298  0.375  0.661  0.436   16.9
Paul Goldschmidt   160 190         93  0.338  0.442  0.631  0.459  22.0  0.290  0.402  0.615  0.433   18.1
Adrian Gonzalez    158 179         89  0.342  0.419  0.620  0.443  18.5  0.322  0.401  0.614  0.432   16.9
Todd Frazier       164 187         92  0.256  0.348  0.549  0.382  10.4  0.304  0.390  0.620  0.429   17.2
Yasmani Grandal    104 124         95  0.288  0.403  0.462  0.379   6.6  0.310  0.421  0.574  0.428   11.3
Brandon Crawford   151 170         93  0.298  0.376  0.510  0.383   9.5  0.316  0.393  0.608  0.426   15.2
Brandon Belt       139 156         93  0.302  0.378  0.496  0.379   8.2  0.316  0.391  0.606  0.424   13.8
Nelson Cruz        170 186         92  0.341  0.398  0.688  0.456  21.2  0.295  0.356  0.654  0.423   16.3
Alex Rodriguez     146 170         94  0.260  0.365  0.541  0.388  10.2  0.283  0.384  0.612  0.423   14.9
Joc Pederson       146 179         95  0.247  0.385  0.548  0.401  12.6  0.257  0.394  0.592  0.421   15.4
Mark Teixeira      147 177         87  0.231  0.362  0.551  0.390  10.9  0.281  0.402  0.560  0.414   14.2
Hanley Ramirez     158 170         94  0.259  0.312  0.468  0.336   3.2  0.318  0.366  0.590  0.406   12.6
Stephen Vogt       131 155         87  0.298  0.406  0.580  0.423  13.5  0.283  0.394  0.544  0.404   11.2
Cameron Maybin     109 126         92  0.248  0.349  0.404  0.332   2.0  0.304  0.398  0.537  0.403    9.0
Jose Bautista      133 165         92  0.211  0.364  0.444  0.353   5.4  0.252  0.397  0.530  0.401   11.5
Josh Reddick       153 170         90  0.314  0.382  0.536  0.395  11.1  0.302  0.372  0.561  0.399   11.6
Brian Dozier       174 196         90  0.247  0.332  0.494  0.355   6.6  0.284  0.365  0.572  0.399   13.4
Adam Jones         167 178         91  0.311  0.354  0.479  0.360   6.8  0.319  0.361  0.571  0.397   11.9
Freddie Freeman    169 188         92  0.302  0.372  0.485  0.372   8.9  0.304  0.375  0.553  0.397   12.6
Giancarlo Stanton  174 198         97  0.230  0.323  0.500  0.353   6.4  0.249  0.340  0.598  0.396   13.1
Matt Carpenter     165 184         91  0.321  0.391  0.582  0.416  15.0  0.293  0.366  0.557  0.394   11.9
Eric Hosmer        171 192         91  0.310  0.385  0.520  0.391  11.9  0.306  0.382  0.534  0.394   12.4
Lucas Duda         161 186         92  0.292  0.387  0.491  0.381  10.2  0.285  0.381  0.536  0.394   12.1
Mark Trumbo        144 152         93  0.264  0.303  0.507  0.345   3.9  0.298  0.335  0.600  0.394    9.8
Corey Dickerson    111 117         90  0.306  0.342  0.523  0.370   5.3  0.317  0.352  0.573  0.393    7.4
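
The per-PA step described above can be sketched in Python as follows. This is a minimal sketch, not the author’s actual spreadsheet logic, and it rests on assumptions of mine: each ball in play has already been matched to a reference bucket that supplies an expected hit value (the bucket’s AVG) and expected total bases (the bucket’s SLG), strikeouts count as at-bats with zero expected value, and walks and hit-by-pitches count fully toward OBP. The PlateAppearance type and expected_line function are hypothetical names; the two bucket values in the example come from the center-field fly-ball reference table later in this post.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PlateAppearance:
    # Hypothetical record type. outcome is "BIP" (ball in play, sac flies
    # included), "K", "BB", or "HBP".
    outcome: str
    # For balls in play: (bucket AVG, bucket SLG) from the reference data.
    bucket: Optional[Tuple[float, float]] = None


def expected_line(pas: List[PlateAppearance]) -> Tuple[float, float, float]:
    """Return (xAVG, xOBP, xSLG) for one batter.

    Assumes every PA is a strikeout, ball in play, walk, or hit-by-pitch
    (no sacrifice bunts or interference), so OBP's denominator is simply PA.
    """
    at_bats = 0
    expected_hits = 0.0         # sum of bucket AVGs
    expected_total_bases = 0.0  # sum of bucket SLGs
    free_passes = 0             # walks plus hit-by-pitches
    for pa in pas:
        if pa.outcome in ("BB", "HBP"):
            free_passes += 1
            continue
        at_bats += 1  # strikeouts and every ball in play count as at-bats
        if pa.outcome == "BIP":
            if pa.bucket is None:
                raise ValueError("ball in play without a reference bucket")
            bucket_avg, bucket_slg = pa.bucket
            expected_hits += bucket_avg
            expected_total_bases += bucket_slg
    if at_bats == 0:
        raise ValueError("no at-bats to evaluate")
    x_avg = expected_hits / at_bats
    x_obp = (expected_hits + free_passes) / len(pas)
    x_slg = expected_total_bases / at_bats
    return x_avg, x_obp, x_slg


# Two fly balls to center (100-105 MPH and 87.5-90 MPH buckets), a strikeout, a walk.
line = expected_line([
    PlateAppearance("BIP", (0.314, 0.931)),
    PlateAppearance("BIP", (0.025, 0.063)),
    PlateAppearance("K"),
    PlateAppearance("BB"),
])
```

In this example the batter has three at-bats in four PA, so xAVG is (0.314 + 0.025) / 3, xOBP is (0.339 + 1) / 4, and xSLG is (0.931 + 0.063) / 3.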

With a few exceptions, the tool uses a batted ball’s velocity and direction, rather than its distance and direction, to calculate the expected values. If the velocity of a fly ball or line drive is not available, the tool uses its distance and direction instead. If the velocity of a ground ball is not available, the tool assumes it was hit at average velocity and considers only its direction. The tool never uses distance for ground balls: their distances are calculated from where the ball was fielded, so using distance would describe what actually happened rather than what we would expect to happen. Finally, for all line drives and fly balls hit more than 375 feet, the tool uses distance and direction rather than velocity and direction. I do not have information on the hang time of batted balls, and in going through the data I found that fly balls and line drives that traveled more than 375 feet but weren’t hit very hard were being severely underrated by the tool. As an example of the underlying data, the table below shows the reference data for fly balls hit to center field.

Type  Velocity (MPH)  Direction (90 = CF)  AVG    OBP    SLG
FB    105-150         85-95                0.732  0.732  2.511
FB    100-105         85-95                0.314  0.314  0.931
FB    97.5-100        85-95                0.082  0.082  0.247
FB    95-97.5         85-95                0.023  0.023  0.047
FB    92.5-95         85-95                0.000  0.000  0.000
FB    90-92.5         85-95                0.010  0.010  0.038
FB    87.5-90         85-95                0.025  0.025  0.063
FB    85-87.5         85-95                0.000  0.000  0.000
FB    80-85           85-95                0.020  0.020  0.050
FB    75-80           85-95                0.056  0.056  0.070
FB    70-75           85-95                0.220  0.220  0.231
FB    65-70           85-95                0.583  0.583  0.590
FB    60-65           85-95                0.145  0.145  0.145
FB    55-60           85-95                0.073  0.073  0.073
FB    0-55            85-95                0.073  0.073  0.073
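
The selection rules from the paragraph before this table, plus a lookup into the table itself, can be sketched together as below. This is an illustration under assumptions, not the tool itself: the BattedBall fields and the function names are hypothetical, each velocity bucket is assumed to include its lower bound (the table does not say which bucket a ball hit exactly 95.0 MPH belongs to), and only the center-field fly-ball rows shown above are encoded.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

LONG_BALL_FEET = 375.0  # from the article: longer LD/FB are looked up by distance


@dataclass
class BattedBall:
    bb_type: str               # "GB", "FB", "LD", or "PU"
    direction: float           # 90 = center field, as in the table above
    velocity: Optional[float]  # MPH; None if not recorded
    distance: Optional[float]  # feet; None if not recorded


def reference_basis(ball: BattedBall) -> Optional[str]:
    """Which reference data to use: "velocity", "distance", or "direction"."""
    air_ball = ball.bb_type in ("FB", "LD")
    if air_ball and ball.distance is not None and ball.distance > LONG_BALL_FEET:
        return "distance"   # no hang-time data, so long air balls use distance
    if ball.velocity is not None:
        return "velocity"   # the default case
    if ball.bb_type == "GB":
        return "direction"  # grounder of unknown velocity: assume average velocity
    if air_ball and ball.distance is not None:
        return "distance"   # FB/LD with no recorded velocity
    return None             # not covered in the article (e.g. PU without velocity)


# Center-field fly-ball rows from the table above: (lower MPH bound, AVG, SLG).
# OBP equals AVG in every row, so it is left out.
CF_FLY_BALLS: List[Tuple[float, float, float]] = [
    (0.0, 0.073, 0.073), (55.0, 0.073, 0.073), (60.0, 0.145, 0.145),
    (65.0, 0.583, 0.590), (70.0, 0.220, 0.231), (75.0, 0.056, 0.070),
    (80.0, 0.020, 0.050), (85.0, 0.000, 0.000), (87.5, 0.025, 0.063),
    (90.0, 0.010, 0.038), (92.5, 0.000, 0.000), (95.0, 0.023, 0.047),
    (97.5, 0.082, 0.247), (100.0, 0.314, 0.931), (105.0, 0.732, 2.511),
]
LOWER_BOUNDS = [row[0] for row in CF_FLY_BALLS]


def cf_fly_ball_expected(velocity: float) -> Tuple[float, float]:
    """(xAVG, xSLG) for a fly ball hit in the 85-95 direction range."""
    if not 0.0 <= velocity <= 150.0:
        raise ValueError("velocity outside the table's 0-150 MPH range")
    # Index of the last bucket whose lower bound is <= velocity.
    index = bisect_right(LOWER_BOUNDS, velocity) - 1
    _, avg, slg = CF_FLY_BALLS[index]
    return avg, slg


ball = BattedBall("FB", 90.0, 101.0, 360.0)
if reference_basis(ball) == "velocity" and 85 <= ball.direction <= 95:
    print(cf_fly_ball_expected(ball.velocity))  # (0.314, 0.931)
```

Checking the 375-foot rule before the velocity rule matters: a softly hit fly ball that still carries past 375 feet is looked up by distance even when its velocity is known.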

I’m providing a link to a Google Sheets document with a leaderboard for all qualified batters, along with leaderboards broken down by batted-ball type. The document also includes a reference page showing how batted balls performed in each bucket, based on 2015 StatCast data for the velocity references and 2014-2015 MLBAM data for the distance references. The numbers on the reference page will continue to be updated as more StatCast data becomes available. Feel free to look through this section and point out any inconsistencies you see; note that all data comes from Baseball Savant.
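
Each reference value is an average over the batted balls in one bucket: AVG is hits divided by batted balls, and SLG is total bases divided by batted balls. Because every batted ball counts as an at-bat here, OBP works out equal to AVG, which matches the identical AVG and OBP columns in the center-field table above. A minimal sketch of that averaging step follows, assuming the raw rows have already been tagged with a bucket label; the build_reference function, the label format, and the sample rows are all hypothetical, not Baseball Savant output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_reference(rows: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[float, float]]:
    """Average the results in each bucket into (AVG, SLG).

    Each row is (bucket label, total bases on the play): 0 for an out
    (sac flies scored as outs, as in the article), 1 single, 2 double,
    3 triple, 4 home run.
    """
    balls = defaultdict(int)
    hits = defaultdict(int)
    total_bases = defaultdict(int)
    for bucket, bases in rows:
        balls[bucket] += 1
        if bases > 0:
            hits[bucket] += 1
        total_bases[bucket] += bases
    return {
        bucket: (hits[bucket] / n, total_bases[bucket] / n)
        for bucket, n in balls.items()
    }


# Made-up rows: four fly balls to center in one velocity bucket.
sample = [("FB 100-105 MPH, 85-95", 4), ("FB 100-105 MPH, 85-95", 0),
          ("FB 100-105 MPH, 85-95", 2), ("FB 100-105 MPH, 85-95", 0)]
print(build_reference(sample))  # {'FB 100-105 MPH, 85-95': (0.5, 1.5)}
```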

I’ve also provided a Methodology Example in the document so you can dig through the behind-the-scenes data as it’s being processed. Note that a player’s AVG here may differ slightly from his AVG elsewhere, because I treat sacrifice flies as regular outs. The “Notes” tab gives a general outline of the procedure and also contains a link to an Excel sheet that you can download to perform these calculations on your own.

https://docs.google.com/spreadsheets/d/1-XohbJlWIceDS2Rc8_7-rOxv9avU3IwMCecPkUNxlYU/edit?usp=sharing
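
To see the size of the sacrifice-fly discrepancy mentioned above, compare the official AVG, which leaves sac flies out of at-bats, with a version that counts them as outs, as this tool does. The batting_average helper and the counts below are hypothetical. The same denominator change would also apply to SLG, since SLG divides total bases by at-bats.

```python
def batting_average(hits: int, at_bats: int, sac_flies: int, count_sf_as_outs: bool) -> float:
    """Official AVG excludes sac flies from at-bats; this tool counts them as outs."""
    denominator = at_bats + sac_flies if count_sf_as_outs else at_bats
    if denominator == 0:
        raise ValueError("no at-bats")
    return hits / denominator


# Hypothetical hitter: 48 hits in 160 official at-bats, plus 4 sacrifice flies.
official = batting_average(48, 160, 4, count_sf_as_outs=False)  # 48 / 160
tool = batting_average(48, 160, 4, count_sf_as_outs=True)       # 48 / 164
print(f"official {official:.3f}, tool {tool:.3f}")  # official 0.300, tool 0.293
```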

Before I wrap up, I should also mention the tool’s limitations. It’s been noted elsewhere on FanGraphs that StatCast data isn’t always completely accurate. The tool also doesn’t currently incorporate a player’s speed in any way, so guys like Dee Gordon are going to be fairly underrated on their ground balls. I’ve been brainstorming ways to incorporate speed and am open to any input you may have. Furthermore, I’ve noticed that the batted-ball type labels the tool relies on can be pretty stingy with classifying balls as pop-ups and occasionally pretty generous with classifying them as line drives. I’ve seen some fly balls with velocities over 95 MPH that traveled only 300 feet, indicating they were hit almost straight up in the air. Unfortunately, without data on the vertical angle of the ball off the bat or on the hang time of the ball in play, this issue will be difficult to fix.
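
Until vertical-angle or hang-time data is available, one way to at least surface the suspicious cases described above is to flag fly balls hit over 95 MPH that traveled 300 feet or less, so they can be reviewed by hand. This is a hypothetical review filter using the thresholds from this paragraph, not something the tool currently does; the FlyBall type, the possible_pop_ups function, and the sample rows are made up.

```python
from typing import List, NamedTuple


class FlyBall(NamedTuple):
    batter: str
    velocity: float  # MPH off the bat
    distance: float  # feet


def possible_pop_ups(balls: List[FlyBall], min_velocity: float = 95.0,
                     max_distance: float = 300.0) -> List[FlyBall]:
    """Hard-hit fly balls that stayed short: likely hit almost straight up."""
    return [b for b in balls if b.velocity > min_velocity and b.distance <= max_distance]


sample = [
    FlyBall("Batter A", 98.0, 295.0),  # hard and short: flagged
    FlyBall("Batter B", 98.0, 380.0),  # normal long fly ball
    FlyBall("Batter C", 85.0, 280.0),  # soft and short: not flagged
]
print([b.batter for b in possible_pop_ups(sample)])  # ['Batter A']
```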

Even with these limitations, the tool works well at measuring how well hitters have been striking the ball and at identifying who has been helped or hurt by factors beyond their control. Take the time to dig through the data and the code, point out areas for improvement, and I’ll incorporate them into future versions.

Stephen Brown has been a Braves baseball fan his entire life and started BravesGeneralStore.com with a few friends after college to write about them. He can be found on Twitter @srbrown70.

Comments
tz

Steven,

Very cool stuff – can’t wait to dig into your methodology after work. I’ve been looking at the Baseball Savant batted-ball info from a different perspective (quality of contact for pitchers) and I look forward to comparing notes on how to best use this data. Like you, I want to put together a good framework first and caveat any missing or funny data in the early results.

Joshua

Love the research! All of it makes sense! Just one question though. Is it really true that the avg for a batted ball going 65-70 mph is .539?

Felix

Are the reference values for each batted ball trajectory based on standard shifts? It seems like it. If the references were based on defense shifts, this would be extremely powerful. Either way, great job.