The 20-to-80 scale is one of the core tenets of baseball scouting and allows evaluators to quickly interpret a player’s skillset. Kiley McDaniel wrote an excellent series of articles back in 2014 (and provided an update this past November) explaining the scale, and while the whole series is easily worth a read, one of the key notes is as follows:
The invention of the scale is credited to Branch Rickey and whether he intended it or not, it mirrors various scientific scales. 50 is major league average, then each 10 point increment represents a standard deviation better or worse than average.
On the surface, the scale is fairly easy to understand, but it is somewhat harder to conceptualize what each grade actually looks like. For example, how frequently does a hitter with a 45 power grade hit a home run? How does a 60 run grade translate to Sprint Speed? I decided to investigate, drawing inspiration from a 2013 article by Mark Smith. The idea of an objective 20-to-80 scale, while not a new concept, is worth revisiting now because of changes in the run environment and the development of new player evaluation techniques, most notably StatCast. We’ll begin with a brief rundown on methodology before looking at each tool:
As McDaniel noted, the 20-to-80 scale largely mirrors a normal distribution, with each 10-point grade jump representing an increase of one standard deviation. Following this logic, the first step towards creating an objective scale is finding “major league average.” This in and of itself presents a bit of a challenge, as major league average differs from the average of all regular (qualified) major league players. Taking the average of only qualified major league regulars would result in a skewed distribution, while including all major leaguers would result in far too many outliers (for example, a player batting .000 or .750). Therefore, the trick lies in finding a sample that represents more than just regulars but isn’t easily skewed by small-sample variation. Many of the cutoffs selected are admittedly arbitrary, but they allow for a reasonable sample of players receiving significant time in the major leagues and are largely based on Russell Carleton’s excellent work on reliability. The cutoffs are detailed with each statistic.
Returning to the methodology, once the “major league average” was established for each parameter, the standard deviation of the sample was calculated. The average was assigned a grade of 50, and each 5-point increase on the 20-to-80 scale corresponded to adding half a standard deviation to the previous value. Likewise, each 5-point decrease corresponded to subtracting half a standard deviation from the previous value. The following chart illustrates this pattern:
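The half-standard-deviation steps described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used for the article: the function names are hypothetical, and the choice of population standard deviation, the rounding to the nearest 5, and the clamping to the 20-to-80 range are assumptions.

```python
from statistics import mean, pstdev


def grade_table(values):
    """Map each grade from 20 to 80 (in steps of 5) to a metric value."""
    avg = mean(values)
    sd = pstdev(values)  # assumption: population SD; the article does not specify
    # Grade 50 is the sample average; every 5 points is half a standard deviation.
    return {grade: avg + (grade - 50) / 10 * sd for grade in range(20, 85, 5)}


def to_grade(value, avg, sd):
    """Convert a raw value to the nearest 5-point grade, clamped to 20..80."""
    raw_grade = 50 + 10 * (value - avg) / sd
    nearest_five = 5 * round(raw_grade / 5)
    return max(20, min(80, nearest_five))
```

For a metric where a lower number is better, the sign of the step would presumably need to be flipped before assigning grades.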
All data spans 2016 to 2018 and represents cumulative averages over that span. When applicable, each tool grade also provides a representative example player, as well as the number of players in the sample that fall under that tool grade. These examples are purely illustrative and are meant to give an idea of what each tool might look like rather than to serve as hard-and-fast grades. For FanGraphs-sourced data, cumulative data was pulled directly, while cumulative three-year data from Baseball Prospectus and Baseball Savant was calculated using weighted averages. Without further ado, let’s look at the tools:
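As a brief aside on the weighted averages mentioned above, here is a minimal sketch of how per-season values could be combined into one three-year figure. The article does not say what the weights were; weighting each season by its number of opportunities (for example, batted ball events) is an assumption, and the function name is hypothetical.

```python
def weighted_average(season_values, season_weights):
    """Combine per-season values into one figure, weighted by opportunities."""
    total_weight = sum(season_weights)
    if total_weight == 0:
        raise ValueError("no opportunities recorded")
    weighted_sum = sum(v * w for v, w in zip(season_values, season_weights))
    return weighted_sum / total_weight


# Hypothetical example: average exit velocity over two seasons,
# weighted by batted ball events in each season.
combined = weighted_average([90.0, 88.0], [100, 300])
```

Weighting by opportunities keeps a short season from counting as much as a full one.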
For the hit tool, we’ll look at a number of measures, ranging from traditional to StatCast-based. In order, those parameters are strikeout rate, batting average, batting average on balls in play, expected batting average, and contact rate. Strikeout rate and contact rate grades were calculated based on all non-pitchers with at least 200 PA between 2016 and 2018 (easily clearing Carleton’s standard of 60 PA for K%), while batting average was based on players with at least 910 at-bats (matching Carleton’s standard), and expected batting average and BABIP were based on all players with at least 820 balls in play over that span. Here’s a look at the hit tool across the league:
Power can be measured in a number of ways, but we’ll look at four primary measures: home run rate, isolated power, average exit velocity, and barrels per batted ball event. Any player with at least 200 PA qualified for the sample for the ISO and HR% grades, while players with at least 100 batted ball events qualified for the StatCast metrics. It’s worth noting the absence of 20 or 25 power grade players, which is possibly indicative of the emphasis teams place on hitting for power. It is nearly impossible for a player to provide enough value elsewhere to make up for a complete lack of power, as evidenced by the lack of extremely low power hitters in the league. Power may be one of the tools McDaniel mentioned in his primer on the scouting scale that doesn’t exactly follow a normal distribution.
Speed is a simpler tool to look at than just about any other tool thanks to the development of StatCast Sprint Speed. Any player season with 50 or more max effort runs (2016-2018) was considered for the sprint speed component of the tool. It is also worth adding baserunning into the mix as McDaniel notes that “baserunning and good jumps out of the batter’s box are also folded into the run grade.” The baserunning grade is based on a sample of all players with at least 150 times on base over the three-year span and the parameter is scaled to 150 times on base for ease of comparison.
While the hit, power, and speed tools are all fairly easily measured and interpreted, the defense and throwing tools present a new set of challenges. For one, it doesn’t make sense to compare players at vastly different positions on the same scale. For this reason, I’ve broken down defense into three groups: catchers, infielders, and outfielders. Of the three groups, outfielders were the most straightforward. Outs Above Average provides a solid look at outfield defense (based on players with at least 1,900 innings in the outfield, 2016 to 2018) but is absent for other positional groups. Additionally, the throwing components of DRS and UZR are more easily separated from the fielding-based components for outfielders, allowing for a defense grade (based on range and plays made) that is separate from the throwing grade.
This is similarly possible for catcher defense, but the catcher fielding grade is complicated by the many responsibilities that come with playing behind the plate. The catcher defensive parameters include non-throwing DRS, blocking runs, and framing runs. For infielders, it’s much more difficult to separate throwing runs from fielding runs, so it is safest to simply present the two together. All Defensive Runs Saved and Ultimate Zone Rating data is from FanGraphs and includes any players with 2,000-plus innings played in the infield between 2016 and 2018. All UZR and DRS-based defensive metrics are scaled to 1,000 innings played. Catcher framing is scaled to 5,000 framing chances while blocking is scaled to 2,500 blocking chances, based on all catchers with 1,000 defensive innings over the last three seasons (as are all catching defensive metrics for the purpose of this study).
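Scaling a counting stat to a fixed number of innings or chances is simple proportional arithmetic. The sketch below shows one way to do it; the function name and the sample numbers are hypothetical and are not taken from the data behind this article.

```python
def per_n(total, opportunities, n):
    """Rescale a counting stat to a rate per `n` opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return total * n / opportunities


# Hypothetical numbers, for illustration only.
drs_per_1000 = per_n(12, 2400, 1000)       # 12 runs saved over 2,400 innings
framing_per_5000 = per_n(9, 15000, 5000)   # 9 framing runs over 15,000 chances
```

Putting every player on the same denominator lets a part-time fielder and an everyday one be compared on the same scale before grades are assigned.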
As discussed above, while there is no throwing grade for infielders, there are somewhat promising measures with which to evaluate outfielder and catcher throwing ability. Outfielder throwing is based on the arm run components of DRS and UZR, scaled to 1,000 innings. Catcher throwing includes both StatCast arm strength and caught stealing runs, scaled to 1,000 innings. Outfield metrics are based on all players with at least 2,000 defensive innings in the outfield, while catcher throwing is based on all players with at least 1,000 innings caught since 2016. It is worth noting that the somewhat high cutoff for outfielders kept a number of players who scored especially poorly in terms of throwing (Khris Davis, Derek Dietrich) from qualifying for the sample, likely because teams have simply tried to avoid putting notably poor defenders and throwers in the field. Here’s a look at the throwing grade:
It’s worth wondering whether the lack of catchers at either extreme in terms of defense and throwing is simply due to the limited sample of players or is being caused by a different phenomenon. It’s possible that the clustering effect is explained by the fact that terrible catchers either don’t remain behind the plate often or don’t make the major leagues. Whether this effect is due simply to a small sample, less than comprehensive defensive metrics, or something different, it is certainly interesting to observe compared to other positions.
For how often the 20-to-80 scale is discussed in baseball circles, it is interesting to note that it is rarely objectively analyzed to create a rough estimate of what each grade might look like at the major league level. The advent of StatCast has allowed for objective analysis of more tools than ever before and will hopefully continue to do so as the technology continues to develop. While my attempt at creating an objective view of the 20-to-80 scale is undoubtedly imperfect, the results certainly provide an interesting shorthand look at many of the measurable aspects of the scale.
Statistical data (AVG, BABIP, ISO, HR%, BsR, all DRS and UZR components, Contact%) from FanGraphs, pulled December 11, 2018. Catcher framing and blocking data taken from Baseball Prospectus, pulled on December 11, 2018. StatCast data (xBA, Exit Velocity, Sprint Speed, Barrels/BBE, OAA, Catcher Throwing Velocity) from Baseball Savant, pulled August 23, 2018. Cutoffs for metrics largely based on Russell Carleton’s work on reliability.