Imagine a batter hits a long fly ball that’s destined for the right-field seats, only for the opposing outfielder to clear the wall and rob him of a home run. Traditional stat sheets treat this the same way as any other out, and there’s no real way of distinguishing it from a dribbler down the third-base line. But intuitively we know that these are two very different things, and a batter who does more of the first is going to end up being more valuable than one who does more of the second. Thus, if we want to truly measure how well a player has performed, we need to separate the performance from the results. The best way of doing that is to break down each batted ball as granularly as possible and look at the average outcome for similar batted balls, and today I’ll reveal a personal tool that does this. This work was inspired by Tony Blengino’s terrific posts on batted-ball data, and I suggest reading his introductory post as background on the theory and methodology that I employ.
This tool uses information on the type, velocity, direction, and distance of a hitter’s batted balls to calculate an expected AVG, OBP, and SLG for him. It divides batted balls into buckets based on the type (GB, FB, LD, PU) and either the direction and velocity or the direction and distance, and calculates the resulting AVG and SLG for all batted balls that meet those criteria. It then goes through all of a batter’s plate appearances and uses these data to calculate both the observed and expected AVG/OBP/SLG for each PA. The table below shows the top 30 hitters by Expected wOBA (xwOBA) as of 5/26/2015.
With a few exceptions, the tool uses the velocity and direction of a batted ball, rather than the distance and direction, to calculate the expected values. If the velocity is not available for a fly ball or a line drive, it uses the distance and the direction of the batted ball instead. If the velocity is not available for a ground ball, the tool assumes it was of average velocity and considers only the direction it was hit. It does not consider distance for ground balls, because those distances are calculated from where the ball was fielded, so using distance would describe what actually happened rather than what we expected to happen. For all line drives and fly balls hit over 375 feet, it uses distance and direction rather than velocity and direction. The reason is that I do not have information on the hang time of batted balls, and in going through the data I found that fly balls and line drives that traveled over 375 feet but weren’t hit very hard were being severely underrated by the tool. As an example of the underlying data, the table below shows the reference data for fly balls hit to center field.
| TYPE | Velocity Range (MPH) | Direction Range (90=CF) | AVG | OBP | SLG |
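The rules above for choosing which measurements drive the lookup can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the spreadsheet's actual formulas: the function name `lookup_basis` and the returned labels are made up for this example, and the handling of pop-ups is my assumption, since no special case is described for them.

```python
from typing import Optional

DISTANCE_CUTOFF_FT = 375  # fly balls and line drives beyond this use distance


def lookup_basis(ball_type: str,
                 velocity_mph: Optional[float],
                 distance_ft: Optional[float]) -> str:
    """Return which measurements would be used for the expected-value lookup.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if ball_type == "GB":
        # Ground balls never use distance, because it reflects where the
        # ball was fielded. With no velocity, assume average velocity and
        # rely on direction alone.
        if velocity_mph is None:
            return "direction-only"
        return "velocity+direction"

    if ball_type in ("FB", "LD"):
        # Long fly balls and line drives switch to distance, since hang
        # time is not available to explain soft-but-deep contact.
        if distance_ft is not None and distance_ft > DISTANCE_CUTOFF_FT:
            return "distance+direction"
        # Missing velocity also falls back to distance.
        if velocity_mph is None:
            return "distance+direction"
        return "velocity+direction"

    # Assumption: pop-ups (PU) follow the default rule, falling back to
    # distance only when velocity is missing.
    if velocity_mph is None:
        return "distance+direction"
    return "velocity+direction"


print(lookup_basis("FB", 85.0, 390.0))  # long fly ball, soft contact
print(lookup_basis("GB", None, 120.0))  # ground ball with no velocity
```

Keeping the decision in one small function makes it easy to see that ground balls can never reach the distance branch.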
I’m providing a link to a Google Sheets document with a leaderboard for all qualified batters, along with leaderboards broken down by each batted-ball type. The document also includes a reference page showing how batted balls performed in each bucket, based on 2015 StatCast data for the velocity references and 2014-2015 MLBAM data for the distance references. The numbers in the reference page will continue to be updated as more data become available from StatCast. Feel free to look through this section and point out any inconsistencies you may see, and note that all data comes from BaseballSavant.
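For readers who think better in code, here is a rough Python sketch of how a reference page like the one described above could be built and then applied to a hitter. Everything specific in it is an assumption for illustration: the bin widths (5 MPH, 10 degrees), the function names, and the four sample batted balls are invented, and the sketch covers batted balls only, so strikeouts and walks are left out.

```python
from collections import defaultdict

VELO_BIN_MPH = 5   # hypothetical bucket width for velocity
DIR_BIN_DEG = 10   # hypothetical bucket width for direction (90 = CF)


def bucket_key(ball_type, velocity_mph, direction_deg):
    """Map a batted ball to (type, velocity bin floor, direction bin floor)."""
    v_lo = int(velocity_mph // VELO_BIN_MPH) * VELO_BIN_MPH
    d_lo = int(direction_deg // DIR_BIN_DEG) * DIR_BIN_DEG
    return (ball_type, v_lo, d_lo)


def build_reference(batted_balls):
    """Compute AVG and SLG per bucket.

    batted_balls: iterable of (type, velocity, direction, total_bases),
    where total_bases is 0 for an out and 1-4 for a hit.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # balls, hits, total bases
    for ball_type, velo, direction, bases in batted_balls:
        c = counts[bucket_key(ball_type, velo, direction)]
        c[0] += 1
        c[1] += 1 if bases > 0 else 0
        c[2] += bases
    return {k: {"AVG": h / n, "SLG": tb / n}
            for k, (n, h, tb) in counts.items()}


def expected_line(batter_balls, reference):
    """Average the bucket AVG and SLG over a batter's batted balls."""
    rows = [reference[bucket_key(t, v, d)] for t, v, d, _ in batter_balls]
    n = len(rows)
    return (sum(r["AVG"] for r in rows) / n,
            sum(r["SLG"] for r in rows) / n)


# Invented sample data, not real results.
league = [
    ("FB", 97.0, 92.0, 4),  # home run
    ("FB", 96.0, 95.0, 0),  # caught
    ("GB", 82.0, 45.0, 1),  # single
    ("GB", 84.0, 47.0, 0),  # out
]
reference = build_reference(league)

# A batter who went 0-for-2 on contact similar to the league sample.
batter = [("FB", 98.0, 91.0, 0), ("GB", 83.0, 44.0, 0)]
x_avg, x_slg = expected_line(batter, reference)
print(x_avg, x_slg)
```

The batter's observed line here is .000/.000, but the expected line credits him with what similar batted balls produced on average, which is the whole point of separating performance from results.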
I’ve also provided a Methodology Example in the document so you can dig through what the behind-the-scenes data looks like as it’s being processed. Note that you may see some discrepancies between a player’s actual AVG shown here and his AVG shown elsewhere, as I treat sac flies as regular outs. The “Notes” tab gives a general outline of the procedure, and also contains a link to an Excel sheet that you can download to perform these calculations on your own.
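To see why the sac-fly treatment shifts AVG, recall that official at-bats exclude sacrifice flies, so counting them as regular outs enlarges the denominator. The short Python sketch below uses invented totals (50 hits, 180 at-bats, 5 sac flies) and a hypothetical helper name purely to show the size of the effect.

```python
def batting_average(hits, at_bats, sac_flies=0, count_sac_flies_as_outs=False):
    """AVG = hits / at-bats, optionally adding sac flies to the denominator.

    Hypothetical helper for illustration; the totals below are invented.
    """
    denominator = at_bats
    if count_sac_flies_as_outs:
        denominator += sac_flies
    return hits / denominator


official = batting_average(50, 180, sac_flies=5)
with_sf = batting_average(50, 180, sac_flies=5, count_sac_flies_as_outs=True)
print(round(official, 3), round(with_sf, 3))
```

With these made-up totals the gap is under ten points of AVG, and it grows with the number of sac flies a hitter has.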
Before I wrap up, I should also mention the limitations. It’s been noted elsewhere on FanGraphs that the StatCast data isn’t always completely accurate. Also, the tool currently doesn’t incorporate a player’s speed in any way, so guys like Dee Gordon are going to be fairly underrated in terms of their ground-ball performance. I’ve been brainstorming ways to incorporate this and am open to any input you may have. Furthermore, I’ve noticed the tool can be pretty stingy with labeling balls as pop-ups and occasionally pretty generous with labeling them as line drives. For example, some fly balls with velocities over 95 MPH traveled only 300 feet, indicating they were hit almost straight up in the air. Unfortunately, without data on the vertical angle of the ball off the bat or on the hang time of the ball in play, it will be difficult to fix this issue.
Even with these limitations, the tool works extremely well at determining how well guys have been hitting the ball and identifying who has been helped or hurt by factors beyond their control. Take the time to dig through the data and the code and point out areas for improvement, and I’ll incorporate them in future versions.
Stephen Brown has been a Braves baseball fan his entire life and started BravesGeneralStore.com with a few friends after college to write about them. He can be found on Twitter @srbrown70.