Archive for Research

Fastball Velocity and Its Effect on Hitters

Over the past few seasons there has been a definite trend toward harder-throwing pitchers in the big leagues. League average fastball velocity has climbed every year for several years now, led by hard-throwing reliever Aroldis Chapman. Whether that extra velocity is actually making life harder for hitters at the plate should be a significant concern for the teams investing in these hard throwers. Strikeout rates are rising rapidly, yet a home-run surge is happening at the same time. Are hitters simply swinging hard and hoping to make good contact against these faster pitches? What effect are these higher velocities having on offensive performance?

[Chart: batting average vs. pitch velocity]

Taking at-bat results from 2015-17, we can see how batting average changes with respect to pitch velocity. Batting average falls from close to .300 at pitch velocities around 90 mph to roughly .200 at velocities above 100 mph. The intuitive preconception that faster pitches are harder to hit seems to be justified by the data. Average, however, is not the be-all and end-all of hitting metrics; we can look at batting average on balls in play (BABIP) to get an idea of how hitters fare when they do make contact with faster pitches.
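
For readers curious how buckets like these might be built, here is a minimal Python sketch; the file name, column names, and event labels are assumptions about a Baseball Savant export, not the exact query behind these charts.

```python
import pandas as pd

# Hypothetical export of AB-ending pitches with 'release_speed' and 'events' columns
df = pd.read_csv("savant_ab_results_2015_2017.csv")

hits = {"single", "double", "triple", "home_run"}
outs_in_play = {"field_out", "grounded_into_double_play", "force_out",
                "double_play", "field_error", "fielders_choice"}
total_bases = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

df["velo_bucket"] = df["release_speed"].round()            # 1-mph velocity buckets
df["is_hit"] = df["events"].isin(hits)
df["is_ab"] = df["events"].isin(hits | outs_in_play | {"strikeout"})
df["in_play"] = df["events"].isin(hits | outs_in_play) & (df["events"] != "home_run")
df["tb"] = df["events"].map(total_bases).fillna(0)

ab = df[df["is_ab"]]
summary = ab.groupby("velo_bucket").apply(lambda g: pd.Series({
    "AVG": g["is_hit"].mean(),                              # hits per at-bat
    "BABIP": g.loc[g["in_play"], "is_hit"].mean(),          # hits on balls in play (HR excluded)
    "ISO": (g["tb"] - g["is_hit"]).sum() / len(g),          # extra bases per at-bat
}))
print(summary)
```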

[Chart: BABIP vs. pitch velocity]

Here we can see the opposite effect compared to AVG. BABIP tends to increase slightly as pitch velocity goes up. This tells us that the higher speeds of these pitches aren't causing batters to make less solid contact; rather, they are causing hitters to miss the ball entirely more often. In addition, the rise in BABIP at the higher end of pitch velocity suggests that when contact is made at those speeds, the ball comes off the bat faster and is therefore more likely to go for a hit. That keeps in line with what I was taught growing up: the faster a ball gets to the plate, the faster it leaves. It would also suggest, however, that a higher percentage of hits should go for home runs off Aroldis Chapman than off Bronson Arroyo. Does that happen?

[Chart: ISO vs. pitch velocity]

A look at isolated power (ISO) shows that the assumption does not hold true. While the physics may be correct in controlled lab tests, conditions are not so predictable in the real world. Clearly the decrease in solid contact at higher velocities has a major effect on power numbers. It seems that even among the balls that go for hits, a larger share end up as singles than do hits off lower-velocity pitches. This is another sign for teams with hard-throwing pitchers that the money spent is worth it over a conventional pitcher.

The numbers presented in this article help to statistically confirm what was already intuitively known. Harder-throwing pitchers are harder to hit, and when they are hit, the hits are less damaging. Perhaps the one surprising conclusion is that faster pitches do not tend to result in more extra-base hits and home runs. In fact they lead to quite a bit fewer, even when looking only at balls that fall for hits. This all translates into good news for teams such as the Yankees who have invested a good amount of money in hard-throwing pitchers. Overall, while most likely detrimental to the long-term health of many of these pitchers' arms, I predict that with data like this coming out we will continue to see a trend of arms going the way of Chapman: hard throwers who can put up a few seasons of good numbers and be replaced by another hard thrower when they get injured or lose velocity. Velocity is a simple weapon to target and deploy, and the data here shows its effectiveness. All of that combined should lead to front offices targeting these types of hurlers for years to come.


(All data comes from Statcast and Pitch F/X via Baseball Savant)


A Brief Analysis of Predictive Pitching Metrics

Pitching performance can often be pretty volatile and difficult to predict. Look at Rick Porcello’s 2017 season, for example. After turning in a Cy Young-winning season in 2016, he regressed to have a below-average ERA. His ERA ballooned from 3.15 in 2016 to 4.65 in 2017.

This is where predictive pitching metrics come in. By just looking at Porcello’s ERA from 2016 it may have been hard to predict his 2017 ERA. Thus, we should use different metrics to better predict his performance.

One popular statistic for more accurately quantifying and predicting pitching performance is FIP (Fielding Independent Pitching). FIP attempts to approximate a pitcher’s performance independent of factors which the pitcher cannot directly control himself, such as his defense’s performance. For example, a good pitcher with a weak defense can induce lots of weak contact but still give up lots of runs due to his defense’s inability to successfully field a lot of balls. Additionally, luck may play a significant factor in how many runs a pitcher concedes. A pitcher may be unlucky and give up lots of bloop hits, or weakly hit balls that land away from fielders. Thus, FIP focuses on the factors that pitchers can directly control, such as strikeouts, walks, hit batsmen, and home runs.

The formula for FIP is:

FIP = (13*HR + 3*(BB + HBP) – 2*K) / IP   +   FIP constant

where HR is home runs allowed, BB is walks allowed, HBP is hit batsmen, K is strikeouts, and IP is innings pitched. FIP is scaled to ERA (Earned Run Average) by the FIP constant, and can be read the same way as ERA (i.e., lower FIP corresponds to better performance).

FIP’s formula may look complicated, but all it does is weight certain pitching statistics per inning pitched. Because a favorable FIP is one that is lower, strikeouts are weighted negatively since they contribute to favorable pitching performance, and home runs, walks, and hit batsmen are weighted positively since they contribute to unfavorable pitching performance. Home runs are weighted the most positively (at a coefficient of 13) because they are most detrimental to pitching performance and cause the most runs to be allowed.
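
As a quick illustration, here is a minimal Python sketch of the calculation above; the FIP constant varies by league and season (roughly 3.0-3.2), so the default and the example stat line below are placeholders.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching, scaled to ERA by a league/season constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant

# Hypothetical stat line: 20 HR, 45 BB, 5 HBP, 200 K over 200 IP
print(round(fip(hr=20, bb=45, hbp=5, k=200, ip=200.0), 2))  # ~3.15
```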

Variability Between FIP and ERA

Figure 1

FIP provides an estimate of pitching performance independent of defensive performance and luck. If it is compared to ERA, the variance between the two statistics can provide an estimate of how much defensive performance or luck affects the number of runs allowed by a pitcher. FIP and ERA can be compared by creating a distribution of FIP – ERA for yearly pitching performance. In Figure 1, a distribution of FIP – ERA for all single-season starting pitching performances (minimum 162 innings) from 2011 to 2015 is created using FanGraphs' databases. The spread of this distribution is fairly symmetrical. The average FIP – ERA is 0.058 runs, meaning that qualified starting pitchers tend to have slightly higher FIPs than ERAs. The standard deviation is 0.498 runs, meaning that a typical starting pitcher's FIP – ERA differs from the average of 0.058 by about 0.498 runs. Thus, defensive performance and luck cause a starting pitcher's ERA to differ from what it would be based on fielding-independent factors by about half a run.

Figure 2

Figure 2 shows a distribution of FIP – ERA for all single-season relief pitching performances (minimum 50 innings) from 2011 to 2015. Like the distribution for starting pitchers, the spread of FIP – ERA for relief pitchers is fairly symmetrical. However, the average FIP – ERA is 0.253 runs, meaning that on average qualified relief pitchers have noticeably higher FIPs than ERAs. A possible reason for this is that relief pitchers often throw harder than starters and can induce weaker contact from hitters, allowing the defense to convert more outs on balls in play than it otherwise would. Additionally, the standard deviation is 0.734 runs, meaning that a typical relief pitcher's FIP – ERA differs from the average of 0.253 by about 0.734 runs. Thus, defensive performance and luck cause a relief pitcher's ERA to differ from what it would be based on fielding-independent factors by close to one run.
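
For readers who want to reproduce the summaries behind Figures 1 and 2, here is a rough sketch of the calculation from a FanGraphs leaderboard export; the file and column names are my assumptions.

```python
import pandas as pd

# One row per pitcher season meeting the innings minimum; assumed columns: FIP, ERA
seasons = pd.read_csv("fangraphs_pitcher_seasons_2011_2015.csv")
diff = seasons["FIP"] - seasons["ERA"]

print(diff.mean())   # ~0.06 for qualified starters, ~0.25 for 50+ IP relievers per the text
print(diff.std())    # ~0.50 for starters, ~0.73 for relievers
diff.hist(bins=30)   # the shape of the distributions shown in Figures 1 and 2
```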

Predicting Future Pitching Performance

FIP is also useful in that it can help predict future pitching performance. Since the fielding-independent statistics that FIP uses in its formula (strikeouts, home runs, walks, hit batsmen) tend to stay more constant from year to year than ERA, FIP tends to be more consistent than ERA year to year. Thus, due to its lower variability, it can be a better estimator of future pitching performance.

Figure 3

Figure 4

To determine how well ERA and FIP predict future pitching performance, the pitching statistics for the 50 pitchers who pitched at least 162 innings in both 2014 and 2015 were obtained. 2014 ERA and FIP are tested to see how well they predict 2015 ERA by looking at their correlation with 2015 ERA. This is demonstrated by Figure 3, which tests how well 2014 ERA predicts 2015 ERA. There is a moderate, positive, linear relationship with a correlation coefficient of 0.382. Thus, it can be said that 2014 ERA is a moderately accurate predictor of 2015 ERA. Figure 4 demonstrates how well 2014 FIP predicts 2015 ERA. There is also a moderate, positive, linear relationship, but the correlation coefficient is higher at 0.462. Thus, there is a stronger relationship between 2014 FIP and 2015 ERA, and it can be said that 2014 FIP is a better predictor of 2015 ERA.
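
A minimal sketch of that correlation check, assuming a merged table of the 50 pitchers with one column per stat and season (file and column names are hypothetical):

```python
import pandas as pd

both = pd.read_csv("qualified_2014_2015.csv")  # assumed columns: ERA_2014, FIP_2014, ERA_2015

print(both["ERA_2014"].corr(both["ERA_2015"]))  # ~0.38 per the article
print(both["FIP_2014"].corr(both["ERA_2015"]))  # ~0.46 per the article
```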

However, FIP is not the only fielding-independent statistic that is commonly used. xFIP is a variant of FIP that uses a pitcher’s fly ball rate instead of home runs in its formula. The logic behind this is that fly balls a pitcher gives up are a strong indicator of how many home runs a pitcher will give up in the future — an even better indicator than home runs themselves. The formula for xFIP is:

xFIP = (13*(Fly Balls * League HR/FB Rate) + 3*(BB + HBP) – 2*K) / IP + FIP constant

Figure 5

Figure 5 demonstrates the relationship between 2014 xFIP and 2015 ERA. Similar to the aforementioned relationships, there is a moderate, positive, linear relationship, but with an even higher correlation coefficient at 0.520. Thus, in comparison to ERA and FIP, xFIP is the strongest predictor for pitcher success.

Figure 6

Skill-Interactive ERA, abbreviated as SIERA, is another fielding-independent statistic. It is a variant of xFIP that accounts for various factors that make xFIP less accurate. For example, each walk given up by a pitcher is less detrimental if he generally walks few batters, and more detrimental if he generally walks many batters; SIERA takes this into account. The complete formula for SIERA is considerably more complex (see the Swartz articles in the Bibliography). Figure 6 shows the relationship between 2014 SIERA and 2015 ERA. There is a moderate, positive, linear relationship with a correlation coefficient of 0.517. This is almost the same as xFIP's correlation coefficient with 2015 ERA, which was 0.520. Overall, there is likely not a significant difference between SIERA and xFIP in predicting ERA, but this assertion could be better tested with more data.

Conclusion

What can be concluded from this piece is how much defensive performance and luck can alter a pitcher's ERA, and which statistics should be used to predict future performance for pitchers. On average, defensive performance and luck account for about half a run of variation in a starting pitcher's ERA, and about one run of variation in a relief pitcher's ERA. Additionally, the statistics that are most effective in predicting future pitching performance are xFIP and SIERA.

Acknowledgments

I want to thank my AP Statistics teacher, Ms. Rachel Congress, for teaching me a lot of the material about statistics that I applied in this paper.

Bibliography

DuPaul, Glenn. “Occam’s Razor and Pitching Statistics.” The Hardball Times. FanGraphs, 26 Sept. 2012. Web. 24 May 2016.

“Fielding Independent Pitching (FIP) Added to Baseball-Reference.com » Sports Reference.” Sports Reference RSS. Sports Reference, 17 Apr. 2014. Web. 24 May 2016.

“A Guide to Sabermetric Research.” Society for American Baseball Research. Society for American Baseball Research, n.d. Web. 24 May 2016.

McCracken, Voros. “Baseball Prospectus | Pitching and Defense.” Baseball Prospectus. N.p., 23 Jan. 2001. Web. 24 May 2016.

Petti, Bill. “How Teams Can Get the Most Out of Analytics.” The Hardball Times. FanGraphs, 27 Jan. 2015. Web. 24 May 2016.

Sawchik, Travis. Big Data Baseball: Math, Miracles, and the End of a 20-year Losing Streak. New York: Flatiron, 2015. Print.

Swartz, Matt. “New SIERA, Part Three (of Five): Differences Between XFIPs and SIERAs.” Baseball Statistics and Analysis. N.p., 20 July 2011. Web. 24 May 2016.

Swartz, Matt. “New SIERA, Part Two (of Five): Unlocking Underrated Pitching Skills.” Baseball Statistics and Analysis. N.p., 19 July 2011. Web. 24 May 2016.


On Jake Arrieta, Aaron Slegers, and Extreme Release Points

Jake Arrieta turning himself from a Baltimore castoff to a Chicago Cy Young Award winner was a fascinating thing to watch, especially considering how it happened. This wasn’t just a guy who benefited from a change of scenery. When Arrieta adopted a new look, it was much more than his jersey color that changed.

The alterations were covered in a great 2014 Jeff Sullivan article titled Building Jake Arrieta. Among the things noted in that piece was his new release point that was primarily the result of pitching from the third-base side of the rubber.

Sullivan noted changes in Arrieta’s delivery yet again this May, pointing out an even more extreme horizontal release point in a piece titled Jake Arrieta Has Not Been Good. How extreme? Well, he’s throwing like a giant. No, not the kind that play in San Francisco. Arrieta has achieved nearly the exact same release point as Minnesota Twins pitcher Aaron Slegers, who at 6-foot-10 is one of the tallest hurlers to ever grace the mound.

Among the 562 right-handed pitchers Baseball Savant has data on from 2017, only three averaged a release point of at least 6.2 feet vertically and 3.3 feet horizontally: Arrieta, Slegers, and Brewers reliever Taylor Jungmann. Jungmann only threw 0.2 innings for Milwaukee last season, so there's not much to unpack there. Below is the release point chart for Arrieta, courtesy of Baseball Savant:

And here is the chart for Slegers:

And finally, below is a graph showing how Arrieta's horizontal release point has evolved over his career. You can see the dramatic dip heading into his first full season with Chicago in 2014. Things leveled out somewhat from there through 2016, but then there's another noticeable dive last season.

Arrieta’s horizontal release point was farther toward third base than 98.6 percent of right-handed pitchers last year. It’s easy to see why a pitcher would want to create a unique look, as hitters aren’t accustomed to picking up a ball from that point, but how much does that really matter? Well, by the sound of this Francisco Cervelli quote from an MLB.com article in October 2015, I’m guessing it matters a lot.

“What makes him so tough is he throws the ball from the shortstop,” Cervelli said. “He’s supposed to throw straight. It should be illegal.”

Given Arrieta's struggles, however, you can't help but wonder if maybe he has taken this too far. He hit a career-high 10 batters and led the league in wild pitches for the second straight season. Coming into 2017, Arrieta had averaged just 6.2 H/9 and 0.5 HR/9 as a Cub. Last year, those numbers ballooned to 8.0 H/9 and 1.2 HR/9. His quality of pitch average also dipped from a score of 5.31 over his first three seasons with the Cubs to 4.98 last year.

The free agent market has been slow to get moving, but you’d have to figure things will start to pick up once the calendar turns over to 2018. It’ll be interesting to see if Arrieta’s new team tries to tweak some things with his mechanics. If nothing else, he’s shown a great openness to experiment.

Arrieta used his feet to get his arm into an angle that only a much taller pitcher should be able to achieve. Is it possible another set of eyes could get him pointed back in the right direction in 2018?

Tom Froemming is a contributor at Twins Daily and co-author of the 2018 Minnesota Twins Prospect Handbook.


On Drew Smyly, Michael Pineda, and the History of Signing Injured Free-Agent Pitchers

About 12 hours apart, news of two very similar moves broke out of Chicago and Minnesota, as the Cubs agreed to terms with Drew Smyly while the Twins signed Michael Pineda. Both pitchers inked two-year deals with $10-million guarantees and additional incentives based on innings pitched, but the two deals shared an even more important similarity: both pitchers underwent Tommy John surgery this summer and seem unlikely to contribute significantly during the 2018 campaign. Both clubs are clearly betting on a return to health and productivity in 2019 for the two still relatively young pitchers, as evidenced by the financial distribution of the contracts. Pineda is only owed $2 million for the upcoming season but will receive $8 million in 2019, while Smyly will be paid $3 million next year but will pull in $7 million the following year. Since both pitchers underwent surgery around the same time, during the middle of the summer, it seems unlikely that either will throw a pitch in the coming season.

While uncommon, these types of deals certainly aren't unprecedented. The Kansas City Royals have inked three pitchers in similar situations over the past few years, with varying degrees of success. These contracts, given to Luke Hochevar and Kris Medlen in 2015 and Mike Minor the following season, seem to represent the most relevant examples of such a deal. While Minor was non-tendered by the Braves following repeated shoulder issues, both Medlen and Hochevar had undergone Tommy John surgery the previous year. All three pitchers would appear for the Royals in the major leagues over the life of their deals, albeit with differing results. Hochevar would appear in 89 games for the Royals and accumulate only marginal value, posting a FIP around 4.00 and tallying only 0.3 WAR combined before undergoing surgery for thoracic outlet syndrome. Kansas City declined its option on Hochevar last winter; he became a free agent and sat out 2017 recovering.

Medlen would also return to pitch in 2015, making eight starts and seven relief appearances for Kansas City. He saw an uptick in walks and a downturn in strikeouts compared to his previous work, but overall pitched his way to a 4.01 ERA with similar peripherals and rang up half a win of value. 2016, however, would not be so kind to Medlen, as he was shelled to the tune of a 7.77 ERA while walking more batters than he struck out and battling a shoulder injury. He would sign a minor-league deal with the Braves after the season, but would not return to the majors. Minor, although he did not appear with the Royals in 2016 after struggling in Triple-A, marks the largest success story of the three. Over 65 relief appearances in 2017, Minor registered a 2.62 FIP and was worth 2.1 WAR out of the bullpen. He recently signed a three-year contract with the Rangers to return to a starting role.

In total, the Royals invested $25.75 million in the three pitchers and saw them accumulate a grand total of 2.9 WAR, with most of it coming from Minor. This works out to $8.88 million per win, slightly higher than the $8 million per win generally assumed for the free-agent market. Based on these three deals, it would appear that this type of signing is not a bargain, but rather an overpay on average. However, it isn't fair to make such an assumption without looking at a larger sample of data. If we define a similar deal as one in which a team signed a pitcher who was injured at the time of the signing, was expected to miss at least part of the following season, and received either a major-league deal or a two-year minor-league pact, that leaves us with 18 similar signings since 2007. One of these signings, Nate Eovaldi, has yet to return from his injury but should in 2018, so we won't include him in the sample.

These 17 signings correlate to 25 player seasons following injury, with 24 of those representing guaranteed contract years, as well as one option year (Joakim Soria, 2015). The breakdown of these player seasons by games, innings pitched, strikeouts, walks, earned runs, and WAR are presented in the table below:

         G     IP      K     BB    ER    WAR
Total    447   725.2   606   246   347   6.9
Mean     18    29      24    10    14    0.27
Median   7     20      15    6     10    0

Altogether, when on a big-league mound, the group pitched to a 4.30 ERA to go along with a 7.52 K/9 and a 3.05 BB/9, numbers not entirely dissimilar from, say, Dustin McGowan or Sal Romano in 2017. So even when healthy enough to pitch, the group put together fairly middling results, and it's also important to remember that in eight of these player seasons the pitcher didn't throw a single big-league pitch, and therefore provided no value to the club. Let's plot the distribution of value produced by WAR:

[Chart: WAR distribution for injured free-agent pitcher seasons]

That 2.1 WAR recorded by Minor last season was the highest figure of any player season in the sample, and besides Mike Pelfrey's 2013 season, no other player season really comes close. Of the 10 player seasons recorded by primarily starting pitchers, only Pelfrey's came close to average production, as every other starter either wasn't durable enough or good enough to rack up any significant value. On the relief side, Minor and 2014 Joakim Soria both excelled, but no other relief season (out of the 15 in the sample) even crossed the 0.5-win threshold. As with the Royals pitchers earlier, it is important to look at these deals from a value standpoint. We can do this by calculating $ per WAR for the whole sample to find a mean, and for each deal to find a median, and visually representing the distribution. Overall, teams invested a total of $78 million in these 25 player seasons, with $71 million coming in guaranteed money and $7 million in Joakim Soria's club option. All minor-league deals to MLB veterans were assigned a dollar value of $333,333 for ease of calculation. Bonuses and incentives were excluded from this figure, as it is very difficult to find those details of player contracts and few of these seasons would have reached such incentives. As we saw above, the sample produced a total of 6.9 WAR. This means that on average, teams paid $11.3 million per win when committing money to injured pitchers in hopes of a bounceback, well above the market rate of $8 million per win in free agency. Based on some quick calculations, teams paid that $78 million for production worth $55.2 million, a net loss of $22.8 million. Let's now look at the value gained/lost for each contract (in millions of dollars):

[Chart: value gained/lost per contract, in millions of dollars]

As you can see, only five such contracts actually generated positive value (returns above the market rate of $8 million per win), while the remaining 12 contracts provided their teams with below-market value. The mean loss per contract is $1.34 million, while the median is represented by the Phillies' $700k loss on Chad Billingsley. While neither number is outrageously high, both figures only serve to reinforce the fact that teams have generally lost more often than they have benefited from inking an injured pitcher.
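
For reference, a quick sketch of the value math quoted above, using the totals from this section:

```python
total_spend = 78.0     # $M committed across the 25 player seasons
total_war = 6.9        # combined WAR produced by the group
market_rate = 8.0      # assumed free-agent price of a win, in $M

print(round(total_spend / total_war, 1))                # ~11.3 $M per WAR actually paid
print(round(total_war * market_rate, 1))                # ~55.2 $M of production at market rate
print(round(total_war * market_rate - total_spend, 1))  # ~-22.8 $M net value
```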

None of this is necessarily to say that the Pineda, Smyly, and Eovaldi contracts are doomed or that no team should ever make this type of investment, but simply to look at how similar deals have worked out in the past. Admittedly, the sample is hardly big enough to make any sort of definitive conclusion, but the overall trend on these “bargain” signings isn’t pretty. Both Smyly and Pineda are better pitchers than most in the sample, so it is entirely possible that they (along with Eovaldi) could significantly shift the outlook on these types of deals in the future. Whether this trio of pitchers can buck the trend or will follow in the footsteps of their predecessors will certainly be an interesting, if minor (pun intended) storyline to watch over the next few seasons.

FanGraphs.com leaderboards, Baseball-Reference transaction data, and MLBReports Tommy John surgery database were all used extensively for this research.


Relationship Between OBP and Runs Scored in College Baseball

There is a segment of the population of the United States that meets the following criteria: between the ages of 18 and 21, a devout FanGraphs reader, and mesmerized by the movie "Moneyball."  I have read the book and watched the movie a number of times, and I have dedicated time to understanding the guiding principles in the book and how they relate to professional baseball.  The relationship between on-base percentage and scoring runs in Major League Baseball is well established, but has anyone ever taken the time to examine the relationship at the collegiate level?

Collegiate baseball is volatile — roster makeups change dramatically each year, no player is around for more than five years, and there are hundreds of teams competing against one another. In terms of groundbreaking sabermetric principles, this study is not intended to turn over any new stones, but rather to present information that may have been overlooked up to this point: the relationship between on-base percentage and runs scored in collegiate baseball.

To conduct this study, I compiled Southeastern Conference team statistics from the 2014-2017 seasons (Runs Scored, On-Base Percentage, Runs Against, and Opponents' On-Base Percentage).  I then performed a linear regression, fitting a line of best fit to the data.  Some team seasons were excluded because I could not access that season's data, and I removed the 2014 Auburn season on the grounds that it was an outlier affecting the output (235 runs, 0.360 OBP).  Below are the resulting predictive equation and R²:

Runs Scored = (3,537 x OBP) – 933.6791

R² = 0.722849

I am by no means a seasoned statistician, but in my interpretation of the R² value, the relationship between Runs Scored and OBP in this sample is moderately strong, with a team's OBP accounting for roughly 72.3% of the variation in Runs Scored in a season.  Simply put, OBP goes a long way toward determining the offensive potency of a team.
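
For anyone looking to reproduce this kind of fit, here is a minimal sketch using scipy; the file and column names are assumptions rather than the author's actual workflow.

```python
import pandas as pd
from scipy import stats

teams = pd.read_csv("sec_team_seasons_2014_2017.csv")  # assumed columns: OBP, R

fit = stats.linregress(teams["OBP"], teams["R"])
print(f"Runs Scored = {fit.slope:.0f} * OBP + {fit.intercept:.1f}")  # article: ~3,537 and ~-933.7
print(f"R^2 = {fit.rvalue ** 2:.3f}")                                # article: ~0.723
```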

At the professional level, the R2 is found to be around 0.90.  The competitive edge the Oakland A’s used in “Moneyball” was using this correlation to purchase the services of “undervalued” players.  But what about in college?  Colleges certainly cannot purchase their players, but the above information can be useful to college programs.

For example, the average Runs Scored per season of the sample I used was roughly 347.8.  If an SEC team wanted to set the goal of being “above average” offensively, they would be able to determine, roughly, what their target OBP should be by using the resulting predictive equation from the Linear Fit:
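
Rearranging the fitted equation and plugging in the sample average of roughly 347.8 runs gives:

Target OBP ≈ (347.8 + 933.6791) / 3,537 ≈ 0.362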

Does this mean if an SEC program produces an OBP of .362 they would score 348 runs precisely? Obviously not. Could they end up scoring exactly 348 runs? Yes, but variation exists, and statistics is the study of variation.  Here are a few seasons in which teams posted an OBP at or around 0.362, and the resulting run totals:

The average of those six seasons’ run totals was 347.5, which is pretty darn close to 348, and even closer to the average of 347.8 runs derived from the sample.

Another use for this information is lineup construction and tactical strategy in-game.  The people in charge of baseball programs do not need instruction on how to construct their roster and manage their team, but who would disagree with a strategy of maximizing your team’s ability to get on base?

The purpose of this study was to examine the relationship between On-Base Percentage and Runs Scored in college baseball, and how that relationship compares to its professional counterpart.  To conclude, the relationship between OBP and runs exists at the collegiate level, and it carries considerable weight and value for teams willing to get creative in using it.


Disclaimer: I am a beginner-level statistician, and if you have any suggestions or critiques of this article, please feel free to share them with me.

Theodore Hooper is a Student Assistant, Player Video/Scouting, for the University of Tennessee baseball program.  He can be reached at thooper3@vols.utk.edu or on LinkedIn at https://www.linkedin.com/in/theodore-hooper/


Identifying Impact Hitters: Proof of Concept

Earlier this season I set out to build a tool similar in nature to my dSCORE tool, except this one was meant to identify swing-change hitters. Along the course of its construction and early-alpha testing, it morphed into something different, and maybe something more useful. What I ended up with was a tool called cHit ("change Hit", named for swing changers, but really I was just too lazy to come up with a more apt acronym for what the tool actually does). cHit, in its current beta form, aims to identify hitters that tend to profile for "impact production" — simply defined as hitting balls hard and hitting them in the air. Other research has identified those traits as ideal for extra-base hits, so I really didn't need to reinvent the wheel. Although I'd like to pull in Statcast data in a more refined version of this tool, the simple batted-ball data offered here on FanGraphs does the trick nicely.

The inner workings of this tool take six different data points (BB%, GB%, FB%, Hard%, Soft%, Spd), compare each individual player's stat against a league midpoint for that stat, then scale it with a multiplier that normalizes each stat based on its importance to ISO. I chose ISO as it's a pretty clean catch-all for power output.
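
As a rough sketch of that scoring idea (the actual midpoints and multipliers in cHit are not published here, so the weights below are placeholders, not the real ones):

```python
import pandas as pd

# Placeholder weights; cHit's real multipliers are tuned to each stat's importance to ISO
WEIGHTS = {"BB%": 0.2, "GB%": -0.4, "FB%": 0.4, "Hard%": 0.6, "Soft%": -0.4, "Spd": 0.1}

def chit_like_score(players: pd.DataFrame) -> pd.Series:
    """Compare each stat to a league midpoint, weight it, and sum into one score."""
    inputs = players[list(WEIGHTS)]
    diffs = inputs - inputs.median()               # player vs. league midpoint, per stat
    return (diffs * pd.Series(WEIGHTS)).sum(axis=1)
```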

Now here’s the trick of this tool: it’s not going to identify “good” hitters from “bad” hitters. Quality sticks like Jean Segura, Dee Gordon, Cesar Hernandez, and others show up at the bottom of the results because their game doesn’t base itself on the long ball. They do just fine for themselves hitting softer liners or ground balls and using their legs for production. Frankly, chances are if a player at the bottom of the list has a high Speed component, they’ve got a decent chance of success despite a low cHit. Nuance needs to be accounted for by the user.

Here’s how I use it to identify swing-changers (and/or regression candidates): I pulled in data for previous years, back to 2014. I compared 2017 data to 2016 data (I’ll add in comparisons for previous years in later iterations) and simply checked to see who were cHit risers or fallers. The results were telling — players we have on record as swing changers show up with significant positive gains, and players that endured some significant regression fell.
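
A sketch of that riser/faller comparison, assuming yearly cHit outputs saved with player names as the index (file names are hypothetical):

```python
import pandas as pd

chit_2016 = pd.read_csv("chit_2016.csv", index_col="Name")["cHit Score"]
chit_2017 = pd.read_csv("chit_2017.csv", index_col="Name")["cHit Score"]

delta = (chit_2017 - chit_2016).dropna().sort_values(ascending=False)
print(delta.head(10))   # biggest gainers: candidate swing changers
print(delta.tail(10))   # biggest fallers: regression or injury candidates
```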

There’s an unintended, possible third use for this tool: identifying injured hitters. Gregory Polanco, Freddie Freeman, and Matt Holliday all suffered/played through injury this year, and they all fell precipitously in the rankings. I’ll need a larger sample size to see whether injuries and a fall in cHit are related or if that’s just noise.

Data!

cHit 2017
Name Team Age AB cHit Score BB% GB% FB% Hard% Soft% Spd ISO
Joey Gallo Rangers 23 449 27.56 14.10% 27.90% 54.20% 46.40% 14.70% 5.5 0.327
J.D. Martinez – – – 29 432 23.52 10.80% 38.30% 43.20% 49.00% 14.00% 4.7 0.387
Matt Carpenter Cardinals 31 497 22.46 17.50% 26.90% 50.80% 42.20% 12.10% 3.1 0.209
Aaron Judge Yankees 25 542 21.56 18.70% 34.90% 43.20% 45.30% 11.20% 4.8 0.343
Lucas Duda – – – 31 423 19.69 12.20% 30.30% 48.60% 42.10% 14.50% 0.5 0.279
Cody Bellinger Dodgers 21 480 19.26 11.70% 35.30% 47.10% 43.00% 14.00% 5.5 0.315
Miguel Sano Twins 24 424 17.73 11.20% 38.90% 40.50% 44.80% 13.50% 2.9 0.243
Jay Bruce – – – 30 555 16.50 9.20% 32.50% 46.70% 40.30% 11.70% 2.6 0.254
Trevor Story Rockies 24 503 16.39 8.80% 33.70% 47.90% 40.30% 14.40% 4.7 0.219
Justin Turner Dodgers 32 457 16.16 10.90% 31.40% 47.80% 38.90% 9.80% 3.3 0.208
Khris Davis Athletics 29 566 15.64 11.20% 38.40% 42.30% 42.10% 13.50% 3.4 0.281
Brandon Belt Giants 29 382 15.38 14.60% 29.70% 46.90% 38.40% 14.00% 4.2 0.228
Nick Castellanos Tigers 25 614 14.94 6.20% 37.30% 38.20% 43.40% 11.50% 4.6 0.218
Eric Thames Brewers 30 469 14.52 13.60% 38.40% 41.30% 41.50% 16.00% 4.6 0.271
Justin Upton – – – 29 557 14.43 11.70% 36.80% 43.70% 41.00% 19.80% 4 0.268
Justin Smoak Blue Jays 30 560 14.38 11.50% 34.30% 44.50% 39.40% 13.10% 1.7 0.259
Wil Myers Padres 26 567 14.32 10.80% 37.50% 42.90% 41.40% 19.50% 5.3 0.220
Paul Goldschmidt Diamondbacks 29 558 14.31 14.10% 46.30% 34.90% 44.30% 11.30% 5.6 0.265
Chris Davis Orioles 31 456 14.28 11.60% 36.70% 39.80% 41.50% 12.80% 2.7 0.208
Kyle Seager Mariners 29 578 13.57 8.90% 31.30% 51.60% 35.70% 13.10% 2.2 0.201
Nelson Cruz Mariners 36 556 13.35 10.90% 40.40% 41.80% 40.70% 14.70% 1.7 0.261
Mike Zunino Mariners 26 387 13.31 9.00% 32.00% 45.60% 38.60% 17.50% 1.9 0.258
Mike Trout Angels 25 402 13.16 18.50% 36.70% 44.90% 38.30% 19.00% 6.2 0.323
Corey Seager Dodgers 23 539 13.08 10.90% 42.10% 33.10% 44.00% 12.90% 2.7 0.184
Logan Morrison Rays 29 512 12.74 13.50% 33.30% 46.20% 37.40% 17.50% 2.4 0.270
Randal Grichuk Cardinals 25 412 12.61 5.90% 35.90% 42.70% 40.20% 18.20% 5.2 0.235
Salvador Perez Royals 27 471 12.50 3.40% 33.30% 47.00% 38.10% 16.50% 2.4 0.227
Michael Conforto Mets 24 373 12.42 13.00% 37.80% 37.80% 41.60% 20.20% 3.6 0.276
Matt Davidson White Sox 26 414 12.19 4.30% 36.20% 46.50% 38.20% 15.80% 1.8 0.232
Mike Napoli Rangers 35 425 12.15 10.10% 33.20% 52.10% 35.50% 21.90% 2.7 0.235
Miguel Cabrera Tigers 34 469 12.03 10.20% 39.80% 32.90% 42.50% 9.90% 1.1 0.149
Brandon Moss Royals 33 362 11.83 9.20% 33.10% 44.50% 37.30% 13.60% 2.3 0.221
Curtis Granderson – – – 36 449 11.69 13.50% 32.60% 48.80% 35.30% 17.60% 4.8 0.241
Ian Kinsler Tigers 35 551 11.64 9.00% 32.90% 46.50% 37.00% 18.70% 5.6 0.176
Edwin Encarnacion Indians 34 554 11.01 15.50% 37.10% 41.80% 37.60% 15.50% 2.7 0.245
Manny Machado Orioles 24 630 10.79 7.20% 42.10% 42.10% 39.50% 18.50% 3.3 0.213
Freddie Freeman Braves 27 440 10.72 12.60% 34.90% 40.60% 37.50% 12.40% 4.3 0.280
Nolan Arenado Rockies 26 606 10.60 9.10% 34.00% 44.90% 36.70% 17.60% 4.1 0.277
Anthony Rendon Nationals 27 508 10.41 13.90% 34.00% 47.20% 34.30% 13.00% 3.5 0.232
Yonder Alonso – – – 30 451 10.34 13.10% 33.90% 43.20% 36.00% 13.20% 2.4 0.235
Kyle Schwarber Cubs 24 422 10.24 12.10% 38.30% 46.50% 36.40% 21.30% 2.8 0.256
Carlos Gomez Rangers 31 368 10.19 7.30% 39.10% 40.30% 39.00% 16.50% 5 0.207
Luis Valbuena Angels 31 347 9.81 12.00% 38.40% 47.30% 35.80% 22.00% 1.3 0.233
Dexter Fowler Cardinals 31 420 9.61 12.80% 39.40% 38.20% 38.10% 12.70% 5.9 0.224
Jed Lowrie Athletics 33 567 9.40 11.30% 29.40% 43.50% 34.50% 12.10% 2.7 0.171
Giancarlo Stanton Marlins 27 597 8.96 12.30% 44.60% 39.40% 38.90% 20.80% 2.3 0.350
Jose Abreu White Sox 30 621 8.95 5.20% 45.30% 36.40% 40.50% 15.80% 4.4 0.248
Josh Donaldson Blue Jays 31 415 8.92 15.30% 41.00% 42.30% 36.30% 17.30% 1.6 0.289
Joey Votto Reds 33 559 8.87 19.00% 39.00% 38.00% 36.30% 10.40% 2.8 0.258
Victor Martinez Tigers 38 392 8.75 8.30% 42.10% 34.20% 39.90% 12.40% 0.9 0.117
Charlie Blackmon Rockies 31 644 8.63 9.00% 40.70% 37.00% 39.00% 17.10% 6.4 0.270
Mitch Moreland Red Sox 31 508 8.43 9.90% 43.40% 36.20% 38.90% 13.50% 1.7 0.197
Scott Schebler Reds 26 473 8.29 7.30% 45.60% 38.20% 39.40% 19.30% 3.9 0.252
Paul DeJong Cardinals 23 417 8.19 4.70% 33.70% 42.90% 36.40% 21.40% 2.5 0.247
Ryan Zimmerman Nationals 32 524 8.18 7.60% 46.40% 33.70% 40.50% 14.10% 2.2 0.269
Mookie Betts Red Sox 24 628 7.76 10.80% 40.40% 42.80% 35.70% 18.20% 5.5 0.194
Rougned Odor Rangers 23 607 7.61 4.90% 41.50% 42.20% 36.80% 18.50% 5.6 0.193
Francisco Lindor Indians 23 651 7.42 8.30% 39.20% 42.40% 35.20% 14.30% 5.1 0.232
Brad Miller Rays 27 338 7.39 15.50% 47.40% 36.10% 38.40% 18.10% 4.6 0.136
Daniel Murphy Nationals 32 534 6.97 8.80% 33.50% 38.90% 35.70% 16.70% 3.8 0.221
Travis Shaw Brewers 27 538 6.87 9.90% 42.50% 37.60% 37.10% 15.80% 4.5 0.240
Jake Lamb Diamondbacks 26 536 6.86 13.70% 41.10% 38.30% 35.70% 12.90% 4.4 0.239
Todd Frazier – – – 31 474 6.75 14.40% 34.20% 47.50% 32.20% 23.20% 3.1 0.215
Yasmani Grandal Dodgers 28 438 6.63 8.30% 43.50% 40.00% 36.50% 17.60% 1.1 0.212
Brian Dozier Twins 30 617 6.60 11.10% 38.40% 42.60% 34.10% 15.90% 5.2 0.227
Adam Duvall Reds 28 587 6.55 6.00% 33.20% 48.60% 31.80% 17.50% 3.9 0.232
Hunter Renfroe Padres 25 445 6.52 5.60% 37.90% 45.40% 34.60% 23.50% 3.2 0.236
Justin Bour Marlins 29 377 6.40 11.00% 43.40% 33.60% 38.80% 19.60% 1.6 0.247
Carlos Correa Astros 22 422 6.33 11.00% 47.90% 31.70% 39.50% 15.00% 3.2 0.235
Marcell Ozuna Marlins 26 613 6.09 9.40% 47.10% 33.50% 39.10% 18.30% 2.3 0.237
Domingo Santana Brewers 24 525 5.85 12.00% 44.90% 27.70% 39.70% 11.70% 4 0.227
Kris Bryant Cubs 25 549 5.83 14.30% 37.70% 42.40% 32.80% 14.80% 4.4 0.242
Gary Sanchez Yankees 24 471 5.47 7.60% 42.30% 36.60% 36.90% 18.60% 2.6 0.253
Asdrubal Cabrera Mets 31 479 5.46 9.30% 43.50% 36.20% 36.80% 17.20% 2.5 0.154
Austin Hedges Padres 24 387 5.37 5.50% 36.60% 45.70% 33.10% 22.30% 2.7 0.183
Logan Forsythe Dodgers 30 361 5.33 15.70% 44.00% 33.10% 36.60% 13.20% 2.8 0.102
Yadier Molina Cardinals 34 501 5.25 5.20% 42.20% 37.40% 36.40% 16.50% 3.9 0.166
Bryce Harper Nationals 24 420 5.07 13.80% 40.40% 37.60% 34.30% 13.30% 3.7 0.276
Neil Walker – – – 31 385 5.01 12.30% 36.20% 41.70% 32.80% 17.70% 2.8 0.174
Aaron Altherr Phillies 26 372 5.01 7.80% 43.10% 37.50% 36.40% 20.10% 5.5 0.245
Andrew McCutchen Pirates 30 570 4.90 11.20% 40.70% 37.40% 35.20% 17.50% 4.3 0.207
Eduardo Escobar Twins 28 457 4.86 6.60% 33.70% 45.30% 31.40% 16.00% 5.1 0.195
Anthony Rizzo Cubs 27 572 4.79 13.20% 40.70% 39.20% 34.40% 19.80% 4.4 0.234
Ryan Braun Brewers 33 380 4.73 8.90% 49.20% 31.90% 39.00% 19.20% 5.3 0.218
Kendrys Morales Blue Jays 34 557 4.56 7.10% 48.40% 33.20% 37.90% 15.20% 1.1 0.196
Jose Ramirez Indians 24 585 4.54 8.10% 38.90% 39.70% 34.00% 16.70% 6 0.265
Mike Moustakas Royals 28 555 4.51 5.70% 34.80% 45.70% 31.90% 21.20% 1.1 0.249
Andrew Benintendi Red Sox 22 573 4.50 10.60% 40.10% 38.40% 34.30% 16.60% 4.5 0.154
Jose Bautista Blue Jays 36 587 4.47 12.20% 37.70% 45.80% 31.40% 21.70% 3.4 0.164
Jason Castro Twins 30 356 4.36 11.10% 41.90% 33.50% 36.00% 14.00% 1.5 0.146
Albert Pujols Angels 37 593 4.12 5.80% 43.50% 38.10% 35.10% 15.90% 2.1 0.145
Hanley Ramirez Red Sox 33 496 4.04 9.20% 41.80% 37.10% 35.30% 20.00% 1.5 0.188
Tommy Joseph Phillies 25 495 3.99 6.20% 41.70% 39.00% 35.00% 20.90% 2.2 0.192
Tim Beckham – – – 27 533 3.99 6.30% 48.80% 29.50% 39.10% 15.50% 4.4 0.176
Jonathan Schoop Orioles 25 622 3.90 5.20% 41.90% 37.20% 36.10% 23.00% 2.2 0.211
George Springer Astros 27 548 3.58 10.20% 48.30% 33.80% 36.70% 17.90% 3.1 0.239
Carlos Beltran Astros 40 467 3.54 6.50% 43.10% 40.40% 33.70% 17.50% 1.8 0.152
Alex Bregman Astros 23 556 3.52 8.80% 38.40% 39.90% 33.00% 18.00% 5.9 0.191
Carlos Santana Indians 31 571 3.49 13.20% 40.80% 39.30% 33.00% 18.40% 4 0.196
Eugenio Suarez Reds 25 534 3.33 13.30% 38.90% 37.10% 33.80% 20.70% 3.1 0.200
Scooter Gennett Reds 27 461 3.29 6.00% 41.30% 37.60% 34.40% 17.20% 4.3 0.236
Mark Reynolds Rockies 33 520 3.26 11.60% 42.10% 36.30% 34.50% 19.00% 2.7 0.219
Josh Reddick Astros 30 477 3.23 8.00% 33.60% 42.30% 31.10% 17.20% 4.8 0.170
Mitch Haniger Mariners 26 369 2.97 7.60% 44.00% 36.70% 34.70% 17.70% 4.3 0.209
Ian Happ Cubs 22 364 2.92 9.40% 40.20% 39.70% 32.80% 18.70% 5.7 0.261
Josh Harrison Pirates 29 486 2.90 5.20% 36.50% 40.80% 32.40% 18.70% 4.9 0.160
Keon Broxton Brewers 27 414 2.78 8.60% 45.10% 34.60% 35.30% 17.00% 7.4 0.200
Matt Joyce Athletics 32 469 2.69 12.10% 37.80% 42.80% 30.30% 16.30% 3.2 0.230
Derek Dietrich Marlins 27 406 2.65 7.80% 36.50% 40.70% 32.10% 20.50% 3.9 0.175
Ryon Healy Athletics 25 576 2.56 3.80% 42.80% 38.20% 33.90% 16.50% 1.4 0.181
Evan Longoria Rays 31 613 2.50 6.80% 43.40% 36.80% 34.30% 18.00% 3.8 0.163
Zack Cozart Reds 31 438 2.49 12.20% 38.20% 42.30% 30.80% 19.50% 5.3 0.251
Robinson Cano Mariners 34 592 2.48 7.60% 50.00% 30.60% 36.90% 12.80% 2 0.172
Max Kepler Twins 24 511 2.39 8.30% 42.80% 39.50% 32.90% 18.70% 4.2 0.182
Steven Souza Jr. Rays 28 523 2.22 13.60% 44.60% 34.30% 34.10% 16.50% 4.8 0.220
Michael Taylor Nationals 26 399 2.17 6.70% 42.90% 36.70% 34.00% 18.10% 5.9 0.216
Yulieski Gurriel Astros 33 529 2.12 3.90% 46.20% 35.20% 35.10% 15.90% 2.8 0.187
Corey Dickerson Rays 28 588 1.24 5.60% 41.80% 35.80% 33.60% 18.70% 4 0.207
Whit Merrifield Royals 28 587 1.01 4.60% 37.70% 40.50% 30.60% 15.40% 6.7 0.172
Chris Taylor Dodgers 26 514 0.88 8.80% 41.50% 35.80% 32.40% 15.80% 6.4 0.208
A.J. Pollock Diamondbacks 29 425 0.81 7.50% 44.60% 32.10% 35.00% 19.80% 7.5 0.205
Marwin Gonzalez Astros 28 455 0.71 9.50% 43.90% 36.20% 32.70% 18.60% 3.2 0.226
Yangervis Solarte Padres 29 466 0.62 7.20% 41.60% 42.10% 31.10% 25.20% 2.4 0.161
Shin-Soo Choo Rangers 34 544 0.57 12.10% 48.80% 26.20% 36.10% 12.20% 4.7 0.162
Buster Posey Giants 30 494 0.50 10.70% 43.60% 33.00% 33.00% 14.10% 2.8 0.142
Jedd Gyorko Cardinals 28 426 0.48 9.80% 40.50% 39.30% 30.80% 19.20% 3.8 0.200
Yasiel Puig Dodgers 26 499 0.30 11.20% 48.30% 35.60% 32.90% 18.30% 4.4 0.224
Eddie Rosario Twins 25 542 0.12 5.90% 42.40% 37.40% 31.70% 16.70% 3.9 0.218
J.T. Realmuto Marlins 26 532 -0.01 6.20% 47.80% 34.30% 33.30% 14.90% 5 0.173
Jorge Bonifacio Royals 24 384 -0.20 8.30% 39.30% 34.80% 32.20% 20.20% 2.9 0.177
Gerardo Parra Rockies 30 392 -0.27 4.70% 46.80% 30.30% 34.70% 14.40% 3 0.143
Willson Contreras Cubs 25 377 -0.34 10.50% 53.30% 29.30% 35.50% 17.00% 2.4 0.223
Kole Calhoun Angels 29 569 -0.37 10.90% 43.90% 35.00% 31.80% 17.00% 3.7 0.148
Robbie Grossman Twins 27 382 -0.43 14.70% 40.70% 34.40% 30.90% 16.00% 3.5 0.134
Matt Holliday Yankees 37 373 -0.46 10.80% 47.70% 37.50% 31.80% 21.20% 2.1 0.201
Mark Trumbo Orioles 31 559 -0.47 7.00% 43.30% 40.60% 30.40% 20.90% 2.5 0.163
Stephen Piscotty Cardinals 26 341 -0.80 13.00% 49.20% 33.20% 32.70% 17.90% 2.7 0.132
Tommy Pham Cardinals 29 444 -0.86 13.40% 51.70% 26.10% 35.50% 15.40% 6 0.214
Joe Mauer Twins 34 525 -0.92 11.10% 51.50% 23.60% 36.40% 12.80% 2.4 0.112
Jackie Bradley Jr. Red Sox 27 482 -0.94 8.90% 49.00% 32.60% 33.30% 17.50% 4.5 0.158
Brandon Crawford Giants 30 518 -0.98 7.40% 46.20% 34.40% 32.60% 19.30% 2.5 0.151
Nomar Mazara Rangers 22 554 -1.13 8.90% 46.50% 34.20% 32.60% 20.90% 2.6 0.170
Ben Zobrist Cubs 36 435 -1.35 10.90% 51.10% 33.30% 32.30% 14.90% 3.6 0.143
Javier Baez Cubs 24 469 -1.36 5.90% 48.60% 36.00% 32.40% 21.30% 5.3 0.207
Jorge Polanco Twins 23 488 -1.42 7.50% 37.90% 42.80% 27.70% 19.90% 4.9 0.154
Avisail Garcia White Sox 26 518 -1.70 5.90% 52.20% 27.50% 35.30% 15.70% 4.3 0.176
Matt Kemp Braves 32 438 -1.76 5.80% 48.50% 28.20% 34.70% 17.40% 1.7 0.187
Maikel Franco Phillies 24 575 -2.04 6.60% 45.40% 36.70% 30.90% 20.80% 1.5 0.179
Nick Markakis Braves 33 593 -2.17 10.10% 48.60% 29.20% 33.10% 15.60% 1.9 0.110
Tucker Barnhart Reds 26 370 -2.46 9.90% 46.00% 27.80% 33.20% 16.50% 3.4 0.132
Trey Mancini Orioles 25 543 -2.48 5.60% 51.00% 29.70% 34.10% 19.60% 3.2 0.195
Christian Yelich Marlins 25 602 -2.51 11.50% 55.40% 25.20% 35.20% 15.90% 5.2 0.156
Lorenzo Cain Royals 31 584 -2.79 8.40% 44.40% 32.90% 31.10% 18.70% 6.5 0.140
Josh Bell Pirates 24 549 -2.87 10.60% 51.10% 31.20% 32.60% 20.60% 3.5 0.211
Jose Reyes Mets 34 501 -3.00 8.90% 37.20% 43.10% 26.70% 26.10% 7.2 0.168
Carlos Gonzalez Rockies 31 470 -3.04 10.50% 48.60% 31.70% 31.90% 20.50% 3.2 0.162
Adam Jones Orioles 31 597 -3.27 4.30% 44.80% 34.30% 30.90% 20.10% 2.7 0.181
Byron Buxton Twins 23 462 -3.57 7.40% 38.70% 38.00% 27.60% 18.20% 8.2 0.160
Kevin Kiermaier Rays 27 380 -3.81 7.40% 49.60% 32.10% 31.80% 22.00% 5.9 0.174
Chase Headley Yankees 33 512 -3.90 10.20% 43.50% 31.70% 30.00% 17.10% 4.3 0.133
Xander Bogaerts Red Sox 24 571 -4.31 8.80% 48.90% 30.50% 31.40% 19.70% 6.7 0.130
Jordy Mercer Pirates 30 502 -4.33 9.10% 48.30% 30.90% 31.00% 19.00% 2.9 0.151
Brandon Drury Diamondbacks 24 445 -4.44 5.80% 48.80% 29.40% 31.70% 16.60% 2.4 0.180
Alex Gordon Royals 33 476 -4.69 8.30% 42.60% 33.00% 29.20% 19.40% 4.3 0.107
Ben Gamel Mariners 25 509 -4.84 6.50% 44.90% 33.30% 29.40% 18.70% 4.9 0.138
Hernan Perez Brewers 26 432 -4.85 4.40% 48.30% 33.50% 30.40% 21.20% 5.3 0.155
Matt Wieters Nationals 31 422 -4.94 8.20% 42.50% 36.40% 27.40% 18.10% 2 0.118
Brett Gardner Yankees 33 594 -5.07 10.60% 44.50% 33.20% 28.80% 20.00% 6 0.163
Odubel Herrera Phillies 25 526 -5.10 5.50% 44.10% 34.70% 29.40% 24.40% 4.3 0.171
Freddy Galvis Phillies 27 608 -5.11 6.80% 36.70% 39.20% 25.50% 18.10% 5.3 0.127
Elvis Andrus Rangers 28 643 -5.13 5.50% 48.50% 31.50% 30.50% 18.70% 5.7 0.174
Danny Valencia Mariners 32 450 -5.93 8.00% 47.90% 31.00% 29.80% 20.50% 3.3 0.156
Kevin Pillar Blue Jays 28 587 -6.25 5.20% 43.10% 36.40% 27.30% 22.50% 4.4 0.148
Dansby Swanson Braves 23 488 -6.35 10.70% 47.40% 29.40% 29.30% 18.00% 3.2 0.092
Jose Altuve Astros 27 590 -6.45 8.80% 47.00% 32.70% 28.20% 19.00% 6.4 0.202
Alcides Escobar Royals 30 599 -6.47 2.40% 40.80% 37.40% 26.80% 22.80% 4.3 0.107
Andrelton Simmons Angels 27 589 -6.62 7.30% 49.50% 31.50% 29.30% 20.60% 5 0.143
Didi Gregorius Yankees 27 534 -6.91 4.40% 36.20% 43.80% 23.10% 24.40% 2.7 0.191
Ryan Goins Blue Jays 29 418 -6.94 6.80% 50.30% 34.80% 27.70% 19.60% 2.7 0.120
Gregory Polanco Pirates 25 379 -7.00 6.60% 42.20% 37.50% 25.90% 22.80% 3.7 0.140
David Peralta Diamondbacks 29 525 -7.02 7.50% 55.10% 26.50% 31.80% 21.20% 4.6 0.150
Kolten Wong Cardinals 26 354 -7.11 10.00% 48.10% 31.80% 28.20% 20.80% 5.4 0.127
Orlando Arcia Brewers 22 506 -7.74 6.60% 51.60% 28.50% 30.20% 22.90% 4.1 0.130
Martin Maldonado Angels 30 429 -7.80 3.20% 48.50% 36.60% 26.70% 21.60% 2.3 0.147
Cory Spangenberg Padres 26 444 -7.85 7.00% 49.30% 27.80% 29.20% 16.90% 5 0.137
Joe Panik Giants 26 511 -7.96 8.00% 44.00% 34.10% 26.10% 20.10% 4.2 0.133
David Freese Pirates 34 426 -8.08 11.50% 57.00% 22.60% 31.90% 19.40% 1 0.108
Melky Cabrera – – – 32 620 -8.14 5.40% 48.90% 29.00% 28.90% 19.00% 2.3 0.137
Hunter Pence Giants 34 493 -8.28 7.40% 57.20% 29.40% 29.40% 18.50% 3.6 0.126
Manuel Margot Padres 22 487 -8.30 6.60% 40.50% 36.30% 25.40% 25.90% 6.1 0.146
Trea Turner Nationals 24 412 -8.61 6.70% 51.70% 33.50% 26.70% 18.00% 8.9 0.167
Jonathan Villar Brewers 26 403 -8.85 6.90% 57.40% 21.90% 33.20% 27.00% 5.4 0.132
Starlin Castro Yankees 27 443 -9.19 4.90% 51.80% 28.00% 29.20% 21.80% 3.5 0.153
Denard Span Giants 33 497 -9.30 7.40% 45.00% 33.60% 25.10% 18.60% 5.5 0.155
Jacoby Ellsbury Yankees 33 356 -9.73 10.00% 45.90% 31.00% 26.10% 22.70% 7.7 0.138
Delino DeShields Rangers 24 376 -9.93 10.00% 45.10% 34.80% 23.90% 20.10% 7.1 0.098
Adam Frazier Pirates 25 406 -9.98 7.90% 47.90% 26.80% 27.50% 17.90% 5.7 0.123
DJ LeMahieu Rockies 28 609 -10.42 8.70% 55.60% 19.70% 30.60% 15.40% 3.9 0.099
Yolmer Sanchez White Sox 25 484 -10.53 6.60% 44.50% 33.90% 24.00% 19.30% 5.3 0.147
Jason Heyward Cubs 27 432 -10.54 8.50% 47.40% 32.70% 25.50% 25.80% 4.3 0.130
Tim Anderson White Sox 24 587 -10.66 2.10% 52.70% 28.00% 28.30% 21.30% 6.2 0.145
Jean Segura Mariners 27 524 -10.79 6.00% 54.30% 26.40% 28.30% 19.70% 5.5 0.128
Cameron Maybin – – – 30 395 -10.88 11.30% 57.70% 27.90% 27.40% 20.10% 6.9 0.137
Dustin Pedroia Red Sox 33 406 -10.90 10.60% 48.80% 28.80% 25.90% 20.10% 2.2 0.099
Jose Iglesias Tigers 27 463 -10.91 4.30% 50.40% 26.40% 28.40% 23.40% 4.2 0.114
Eric Hosmer Royals 27 603 -11.30 9.80% 55.60% 22.20% 29.50% 21.80% 3.4 0.179
Eduardo Nunez – – – 30 467 -12.27 3.70% 53.40% 29.10% 26.70% 24.50% 4.8 0.148
Jon Jay Cubs 32 379 -12.53 8.50% 47.10% 23.90% 25.30% 11.50% 5.3 0.079
Brandon Phillips – – – 36 572 -12.97 3.50% 49.50% 28.30% 25.50% 21.70% 4.1 0.131
Guillermo Heredia Mariners 26 386 -15.19 6.30% 47.40% 34.90% 20.40% 23.80% 2.2 0.088
Ender Inciarte Braves 26 662 -15.36 6.80% 47.00% 29.10% 22.10% 20.90% 5.4 0.106
Jonathan Lucroy – – – 31 423 -16.18 9.60% 53.50% 27.90% 22.30% 20.50% 3.1 0.106
Jose Peraza Reds 23 487 -16.45 3.90% 47.10% 31.30% 21.40% 26.60% 5.8 0.066
Cesar Hernandez Phillies 27 511 -18.08 10.60% 52.80% 24.60% 22.10% 23.50% 6 0.127
Billy Hamilton Reds 26 582 -21.80 7.00% 45.80% 30.60% 16.00% 25.00% 9 0.088
Dee Gordon Marlins 29 653 -28.88 3.60% 57.60% 19.60% 16.10% 24.70% 8.5 0.067

Okay, so here’s the breakdown. I pulled all 2017 hitters with 400 at-bats or more so I could capture some significant hitters that didn’t have qualifying numbers of ABs due to injury. Ball-bludgeon extraordinaire Joey Gallo is a pretty solid name to have heading up this list, as he’s pretty much the human definition of what this tool is trying to identify. JD Martinez, Aaron Judge, Cody Bellinger, Miguel Sano, Trevor Story, and Justin Turner all in the top 10 is pretty much all the proof-of-concept I needed.

Interesting notes:

Brandon Belt at 12 — Someone needs to tell the Giants to trade him to literally any other team, stat.

Giancarlo Stanton at 46 — Surprisingly, the MVP fell off from his 2016 numbers. His ground-ball and soft-contact rates rose by three or more percentage points each, shaving the equivalent off his hard-hit and fly-ball rates. His output was fueled by adding almost 200 ABs to his season — he could actually get better if he can stay healthy and add those hard flies back in!

Francisco Lindor at 58 — The interesting part of this is even though Lindor is still a decent way down the list, he actually was the biggest gainer from last season to this, adding 9.52 points to his cHit. We knew he was gunning for flies from the outset of the season, and it looks like his mission was accomplished.

Mike Moustakas at 87 — Frankly, being bookended by Jose Ramirez and Andrew Benintendi should, in a vacuum, be great company. But this is a prime example of how cHit requires users to not take the numbers at face value. Ramirez and Benintendi aren't slug-first hitters like Moose. They've got significantly better Speed scores, and they aren't as prone to soft contact. I'd be very wary of Moose regressing, as he seems to rely on sneaking some less-than-ideal homers over fences. If he goes to San Francisco I could see his value cratering (see Belt, Brandon).

Eric Hosmer at 206 — Nope, negative, pass, I’m trying to sign quality hitters here <— Suggested responses for GMs when approached this offseason by Scott Boras on behalf of Hosmer.

Final Notes:

  • Batted-ball distribution data is noticeably absent. In one of my iterations I added in those stats and found that they actually reduced the accuracy of the formula. It doesn't matter where you hit the ball, as long as you hit it hard.
  • Medium% and LD% are noisy stats. They also hurt the formula's accuracy.
  • I may look to replace BB% in future iterations. For now, though, it does a decent job of capturing plate discipline and selectivity.
  • K% doesn't seem to have much of an impact on cHit (see Gallo, Joey).
  • R-squared numbers over the last four years of data hold pretty steady between .65 and .75, which is really encouraging. Also, the bigger the pool of data per year (number of batters analyzed), the higher R-squared goes, which is ultimately the most encouraging result of this whole endeavor.

Input is greatly appreciated! I’m not a mathematician in any stretch of the imagination, so if there’s a better way of going about this I’d love to hear it. I’ll do a writeup about my swing-change findings at a later date.


Looking for Evidence of a Change to the Ball

We saw an unprecedented jump in home runs over the last few years. What made it so strange was that most of it happened after the 2015 All-Star break. There is increased awareness of launch angle and bat path, and 2015 was the first year hitters had public in-game feedback from Statcast, but you would still expect such an adjustment to take longer, especially since in-season swing changes are really hard to make — with a whole offseason to work on it, the story would have been slightly more believable.

There have been multi-factor explanations, like a great rookie class of power hitters in the second half of 2015, a changed approach, and other factors like a slightly smaller strike zone, but you would not really expect such a multi-factor cause to take effect that quickly and distinctly between two season halves. That made most sabermetric writers, including most of the FanGraphs staff, believe in a single-factor cause, most likely the ball.

There is some evidence for a changed ball, and there is also anecdotal evidence from called-up minor-league players claiming the MLB ball flies farther. However, MLB has so far rejected that idea, backing its position with the credible name of professor Alan Nathan, albeit without really publishing the data, which has only increased the suspicion.

We also did see an increase in launch angle: in the first half of 2015 the league-wide LA was 9.6 degrees, and in the second half it was 10.3, which increased slightly further in the first halves of 2016 (10.4) and 2017 (10.8). The biggest jump, however, occurred between the two halves of 2015. So were players really able to increase their LA with a single focus cue, just aiming higher after getting the first-half feedback, without much time to work on swing mechanics? These are the most talented athletes in the world, but that still sounds incredible.

But of course increased elevation alone doesn't explain the surge. The number of balls hit between 20 and 35 degrees (the usual HR range) increased from roughly 8,200 in the first half of 2015 to roughly 8,600 in the first half of 2016, but the number of HRs increased from 2,521 to 3,082. Since less than half of the fly balls between 20 and 35 degrees go out of the park (I don't have the exact number, but I estimate 30% from the numbers I have), those extra batted balls in that range don't explain 500-plus more HRs. That means that, apart from there being more fly balls, those fly balls also left the yard more often, and the league saw a jump in HR/FB rate (9.5% in 2014 and 12.8% in 2016).
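
A sketch of the counting behind those figures, assuming a Statcast export of batted balls with (hypothetical) 'launch_angle' and 'events' columns:

```python
import pandas as pd

bb = pd.read_csv("statcast_batted_balls_first_half.csv")  # hypothetical export

hr_band = bb[(bb["launch_angle"] >= 20) & (bb["launch_angle"] <= 35)]
print(len(hr_band))                                 # balls in the 20-35 degree band
print((hr_band["events"] == "home_run").mean())     # share of that band leaving the park

fly_balls = bb[bb["launch_angle"] >= 20]            # crude fly-ball definition
print((fly_balls["events"] == "home_run").mean())   # rough HR/FB rate
```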

To research that, I looked into some Statcast stats. All stats here are just first halves of the respective seasons, because the first half of 2015 was the last “normal” HR half. Also I want to lessen weather effects.

This table shows that balls between 20 and 35 degrees do indeed fly farther and also go faster off the bat.
Batted balls at 20-35 degree LA, first halves:

Year   Avg Distance (ft)   Avg EV (mph)
2015   326                 89.9
2016   331                 91.6
2017   332                 91.3

So does this jump in HR/FB prove a juiced ball? Not necessarily. To explain this, we have to get into swing mechanics. The attack angle is the direction the bat's sweet spot is traveling just before contact. Generally you can hit higher LAs (launch angles) just by hitting the bottom of the ball, but while some backspin is good, too much of it slows the ball down. Generally, the closer the LA and attack angle match, the higher the exit velo. That means players who try to swing up more might shift their highest velos to higher LAs. So while players couldn't really change their swings that fast, just the intent of a higher LA might have unconsciously produced a higher attack angle and thus more "flush-hit" fly balls.

Evidence against the ball being a factor is that average league EV is actually down a tiny bit. However, if the attack-angle theory were true, you would also expect the EV of balls between 0 and 10 degrees to drop a little, and that hasn't really happened.

Year   Avg EV (mph)   EV at 0-10 degree LA (mph)
2015   87.1           93.3
2016   87.8           93.3
2017   86.9           93.1

Another theory came from Tom Tango. He assumed that harder swinging and increased attack angles lead to higher peak EVs but also more weak mis-hits.

We do indeed see a big increase in balls hit above 105 mph, but on the other side (and there have to be more weak hits to explain why overall EV is not up) there is an increase in weakly hit balls in 2017, though not in 2016.

Year   Avg EV, balls 85+ mph   Balls under 85 mph   Balls over 105 mph
2015   96.2                    19,210               2,960
2016   96.9                    19,075               3,917
2017   96.7                    20,436               3,635

To see if there is an aerodynamic effect — one theory of the juiced ball is reduced air drag due to lower seams — I looked at the average distance of balls hit at 20-25 degree LA in different velocity buckets.

Average distance (ft) at 20-25 degree LA, by EV bucket (mph):

Year   95-100   100-105   105-110
2015   366      391       415
2016   362      387       408
2017   363      391       411

You can't really see an effect here. Balls hit at the same EV (which is measured right off the bat, before air drag has done its work) don't fly farther in 2016 or 2017 than they did in the first half of 2015. That means there likely isn't an aerodynamic effect, at least not a big one.

So the reason for the increase in HRs seems to be mostly that fly balls come off the bat faster and fly farther, for whatever reason. We don't see an across-the-board increase in EV, however, and simple explanations like a shift of max EVs to other launch angles don't seem to work either, as balls at 0-10 degrees of LA (and below minus-5, for that matter) haven't really changed in EV.

It remains mysterious what actually happened. We do know launch angles have increased some, but that doesn't explain the whole story. Then again, I couldn't find real evidence for a changed ball in Statcast either. Could a super-fast, on-the-fly adjustment by the league between season halves, based on Statcast data, really be the driving factor here?

Intellectually I really want to believe the juiced-ball theory, as it is the most elegant explanation for such a quick turnaround, but maybe it isn’t that easy.


An Exercise in Generating Similarity Scores

In the process of writing an article, one of the more frustrating things to do is generate comparisons to a given player. Whether I’m trying to figure out who most closely aligns with Rougned Odor or Miguel Sano, it’s a time-consuming and inexact process to find good comparisons. So I tried to simplify the process and make it more exact — using similarity scores.

An Introduction to Similarity Scores

The concept of a similarity score was first introduced by Bill James in his book The Politics of Glory (later republished as Whatever Happened to the Hall of Fame?) as a way of comparing players who were not in the Hall of Fame to those who were, to determine which non-HOFers deserved a spot in Cooperstown. For example, since Phil Rizzuto’s most similar players per James’ metric are not in the HOF, Rizzuto’s case for enshrinement is questionable.

James' similarity scores work like this: to compare one player to another, start at 1000 and subtract one point for every difference of 20 games played between the two players. Then subtract one point for every difference of 75 at-bats, one point for every difference of 10 runs scored, and so on.
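A rough R sketch of that idea, implementing only the first few penalty terms named above (the full method has many more, hence the "and so on"):

# `a` and `b` are rows/lists with career games (G), at-bats (AB), and runs (R).
james_similarity <- function(a, b) {
  score <- 1000
  score <- score - abs(a$G  - b$G)  / 20   # one point per 20-game difference
  score <- score - abs(a$AB - b$AB) / 75   # one point per 75-at-bat difference
  score <- score - abs(a$R  - b$R)  / 10   # one point per 10-run difference
  score
}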

James’ methodology is flawed and inexact, and he’s aware of it: “Similarity scores are a method of asking, imperfectly but at least objectively, whether two players are truly similar, or whether the distance between them is considerable” (WHHF, Chapter 7). But it doesn’t have to be perfect and exact. James is simply looking to find which players are most alike and compare their other numbers, not their similarity scores.

Yes, there are other similarity-score metrics that have built upon James’ methodology, ones that turn those similarities into projections: PECOTA, ZiPS, and KUBIAK come to mind. I’m not interested in making a clone of those because these metrics are obsessed with the accuracy of their score and spitting out a useful number. I’m more interested in the spirit of James’ metric: it doesn’t care for accuracy, only for finding similarities.

Approaching the Similarity Problem

There is a very distinct difference between what James wants to do and what I want to do, however. James is interested in result-based metrics like hits, doubles, singles, etc. I'm more interested in finding player similarities based on peripherals, specifically a batted-ball profile. Thus, I need to develop some methodology for finding players with similar batted-ball profiles.

In determining a player's batted-ball profile, I'm going to use three measures of batted-ball frequencies — launch angle, spray angle, and quality of contact. For launch angle, I will use GB%/LD%/FB%; for spray angle, I will use Pull%/Cent%/Oppo%; and for quality of contact, I will use Soft%, Med%, Hard%, and HR/FB (more on why I'm using HR/FB later).

In addition to the batted-ball profiles, I can get a complete picture of a player’s offensive profile by looking at their BB% and K%. To do this, I will create two separate similarity scores — one that measures similarity based solely upon batted balls, and another based upon batted balls and K% and BB%. All of our measures for these tendencies will come from FanGraphs.

Essentially, I want to find which player is closest to which overall in terms of ALL of the metrics that I’m using. The term “closest” is usually used to convey position, and it serves us well in describing what I want to do.

Gettin’ Geometrical

In order to find the most similar player, I’m going to treat every metric (GB%, LD%, FB%, Pull%, and so on) as an axis in a positioning system. Each player has a unique “position” along that axis based on their number in that corresponding metric. Then, I want to find the player nearest to a given player’s position within our coordinates system — that player will be the most similar to our given player.

I can visualize this up to the third dimension. Imagine that I want to find how similar Dee Gordon and Daniel Murphy are in terms of batted balls. I could first plot their LD% values and find the differences.

1-D visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

So the distance between Murphy and Gordon, based on this, is 4.8%. Next, I could introduce the second axis into our geometry, GB%.

2-D visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

The distance between the two players is given by the Pythagorean formula for distance — sqrt(ΔX^2 + ΔY^2), where X is LD% and Y is GB%. To take this visualization to a third dimension and incorporate FB%…

3-d visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

… I would add another term to the distance calculation, sqrt(ΔX^2 + ΔY^2 + ΔZ^2), and so on for each subsequent metric. You'll just have to use your imagination to plot the remaining dimensions, because Euclidean geometry can't be visualized beyond three dimensions without some really weird projections. But essentially, once I find the distance between two points in our 10- or 12-dimensional coordinate system, I have an idea of how similar the two players are. Then, if I want to find the most similar batter to Daniel Murphy, I find the distance between him and every other player in a given sample and take the smallest one.
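In R, the whole distance calculation collapses to one line; here is a minimal sketch, with the two example profiles made up purely for illustration (they are not real FanGraphs numbers).

# Each player's profile is a numeric vector of (normalized) metrics;
# similarity is just the Euclidean distance across all of them.
profile_distance <- function(p1, p2) {
  sqrt(sum((p1 - p2)^2))
}

# Illustrative, made-up three-metric profiles (LD%, GB%, FB%):
murphy <- c(LD = 24.8, GB = 38.6, FB = 36.6)
gordon <- c(LD = 20.0, GB = 58.0, FB = 22.0)
profile_distance(murphy, gordon)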

If you’ve taken a computer science course before, this problem might sound awfully familiar to you — it’s a nearest-neighbor search problem. The NNS problem is about finding the best way to determine the closest neighbor point to a given point in some space, given a set of points and their position in that space. The “naive” solution, or the brute-force solution, would be to find the distance between our player and every other player in our dataset, then sort the distances. However, there exists a more optimized solution to the NNS problem, called a k-d tree, which progressively splits our n-dimensional space into smaller and smaller subspaces and then finds the nearest neighbor. I’ll use the k-d tree approach to tackling this.
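One way to do the k-d tree search in R is through the FNN package (an assumption on my part; any nearest-neighbor library would do), with players as rows of a numeric matrix of normalized metrics and player names as rownames.

# Sketch: find the k nearest neighbors to one player via a k-d tree.
library(FNN)

nearest_players <- function(profiles, player, k = 5) {
  target <- profiles[player, , drop = FALSE]
  others <- profiles[rownames(profiles) != player, , drop = FALSE]
  nn <- get.knnx(others, target, k = k, algorithm = "kd_tree")
  data.frame(player   = rownames(others)[nn$nn.index[1, ]],
             distance = nn$nn.dist[1, ])
}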

Why It’s Important to Normalize

I used raw data values above in an example calculation of the distance between two players. However, I would caution against using raw values, because of the differing scales these numbers fall on.

Consider that in 2017, the difference between the largest LD% and smallest LD% among qualified hitters was only 14.2%. For GB%, however, that figure was 30.7%! Clearly, there is a greater spread in GB% than in LD%, so a 1-point difference in GB% is much less significant than a 1-point difference in LD%. But using raw values weights those 1-point differences the same, which means LD% effectively counts for less than GB% in the distance calculation.

To resolve this issue, I need to "normalize" the values, placing the differing sets of data on the same scale. LD% and GB% will then have roughly the same spread, but each retains its distribution, and the individual LD% and GB% values, relative to one another, remain unchanged.
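One common way to do this (an assumption here; the exact normalization isn't specified above) is the z-score: center each metric on its mean and divide by its standard deviation. Base R's scale() does exactly that, column by column.

# raw_metrics: players-by-metrics numeric matrix of FanGraphs values
normalized <- scale(raw_metrics)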

Now, here’s the really big assumption that I’m going to make. After normalizing the values, I won’t scale any particular metric further. Why? Because personally, I don’t believe that in determining similarity, a player’s LD% is any more important than the other metrics I’m measuring. This is my personal assumption, and it may not be true — there’s not really a way to tell otherwise. If I believed LD% was really important, I might apply some scaling factor and weigh it differently than the rest of the values, but I won’t, simply out of personal preference.

Putting it All Together

I’ve identified what needs to happen, now it’s just a matter of making it happen.

So, go ahead, get to work. I expect this on my desk by Monday. Snap to it!

Oh, you’re still here.

If you want to compare answers, I went ahead and wrote up an R package containing the function that performs this search (as well as a few other dog tricks). I can do this in two ways, either using solely batted-ball data or using batted-ball data with K% and BB%. For the rest of this section, I’ll use the second method.

Taking FanGraphs batted-ball data and the name of the target player, the function returns a number of players with similar batted-ball profiles, as well as a score for how similar they are to that player.

For similarity scores, use the following rule of thumb:

0-1 -> The same player having similar seasons.

1-2 -> Players that are very much alike.

2-3 -> Players who are similar in profile.

3-4 -> Players sharing some qualities, but are distinct.

4+ -> Distinct players with distinct offensive profiles.

Note that because of normalization, similarity scores can vary based on the dataset used. Similarity scores shouldn’t be used as strict numbers — their only use should be to rank players based on how similar they are to each other.

To show the tool in action, let’s get someone at random, generate similarity scores for them, and provide their comparisons.

Here’s the offensive data for Elvis Andrus in 2017, his five neighbors in 12-dimensional space (all from 2017), and their similarity scores.

Elvis Andrus Most Similar Batters (2017)

The lower the similarity score, the better, and the guy with the lowest similarity score, J.T. Realmuto, is almost a dead ringer for Andrus in terms of batted-ball data. Mercer, Gurriel, Pujols, and Cabrera aren’t too far off as well.

After extensively testing it, the tool seems to work really well in finding batters with similar profiles — Yonder Alonso is very similar to Justin Smoak, Alex Bregman is similar to Andrew McCutchen, Evan Longoria is similar to Xander Bogaerts, etc.

Keep in mind, however, that not every batter has a good comparison waiting in the wings. Consider poor, lonely Aaron Judge, whose nearest neighbor is the second-furthest away of any player in baseball in 2017: Chris Davis is closest to him, with a similarity score of 3.773. Only DJ LeMahieu had a more distant nearest neighbor (a similarity score of 3.921!).

The HR/FB Dilemma

While I’m on the subject of Aaron Judge, let’s talk really quickly about HR/FB and why it’s included in the function.

When I first implemented my search function, I designed it to only include batted-ball data and not BB%, K%, and HR/FB. I ran it on a couple players to eye-test it and make sure that it made sense. But when I ran it on Aaron Judge, something stuck out like a sore thumb.

Aaron Judge Similarity Scores

Players 2-5 I could easily see as reasonable comparisons to Judge’s batted balls. But Nick Castellanos? Nick Castellanos? The perpetual sleeper pick?

But there he was, and his batted balls were eerily similar to Judge’s.

Aaron Judge Most Similar Batters (2017)

Judge hits a few more fly balls, Castellanos hits a few more liners, but aside from that, they’re practically twins!

Except that they're not. Here's that same chart with HR/FB thrown in.

Aaron Judge Most Similar Batters (2017)

There’s one big difference between Judge and Castellanos, aside from their plate discipline — exit velocity. Judge averages 100+ MPH EV on fly balls and line drives, the highest in the majors. Castellanos posted a meek 93.2 MPH AEV on fly balls and line drives, and that’s with a juiced radar gun in Comerica Park. Indeed, after incorporating HR/FB into the equation, Castellanos drops to the 14th-most similar player to Judge.

HR/FB is often considered partly a luck stat, and sure, Judge got lucky with some of his home runs, especially with Yankee Stadium's homer-friendly dimensions. But luck can only carry you so far along the road to 50+ HR; Judge was making great contact all season, and his HR/FB reflects that.

In that vein, I feel it is necessary to include a stat with a significant randomness component, even though that is very much in contrast with the rest of the metrics used in this tool, because of the skill-based component that HR/FB also carries.

Using this Tool

If you want to use this tool, you are more than welcome to do so! The code for this tool can be found on GitHub here, along with instructions on how to download it and use it in R. I’m going to mess around with it and keep developing it and hopefully do some cool things with it, so watch this space…

Although I’ve done some bug testing (thanks, Matt!), this code is still far from perfect. I’ve done, like, zero error-catching with it. If in using it, you encounter any issues, please @ me on twitter (@John_Edwards_) and let me know so I can fix them ASAP. Feel free to @ me with any suggestions, improvements, or features as well. Otherwise, use it responsibly!


Hack Wilson: The Most Interesting Player You’ve Sorta-Kinda Heard of Before

Lewis Robert “Hack” Wilson was an outfielder for the New York Giants, Chicago Cubs, Brooklyn Dodgers, and Philadelphia Phillies in the early 20th century. Wilson was a very good ballplayer, and was enshrined in Cooperstown in 1979.

As my title suggests, you have probably heard the name Hack Wilson before, but I’m guessing you probably don’t know much about him, because his most popular claim to fame is considered by many to be irrelevant today. This claim to fame is his record-setting 191 RBI in 1930. This remains the single-season record for the stat to this day, and it’s hard to believe that anyone will come along who can break it. In that 1930 campaign, Hack also slugged 56 home runs, walked 105 times, struck out 84 times, and slashed .356/.454/.723 with a 1.177 OPS and a 177 OPS+. These were all league highs, excluding average and OBP.

That’s a great season, but it gets a whole lot more interesting when you look a little closer. 56 home runs is a lot. That mark is tied with Ken Griffey Jr.’s pair of 56-home-run campaigns for 17th-most all-time in a single season, and was the best non-Ruth mark at the time (although this would last just two years, when Jimmie Foxx hit 58 home runs in 1932).

Just hitting home runs isn't what makes Hack Wilson so interesting to me, though. It's who he was. Hack Wilson stood just 5'6", the same height as our favorite short player today, Jose Altuve. In fact, at 5'6", Altuve and Hack are both the shortest players to ever hit 20 or more home runs in a single season. Hack alone is the shortest player to ever slug 30, 40, or 50 in a single season. Hack also holds the single-season home-run record for anyone under 6'0". Hack, Mantle (5'11"), Mays (5'10"), and Prince Fielder (5'11") are the only men to hit 50 or more home runs while standing less than 6'0".

However, with that enormous home-run total comes strikeouts. You may have noticed that he struck out just 84 times in that 56 home-run season, and he even walked more than he struck out. But 84 was a lot in 1930. In fact, Hack Wilson led the league in strikeouts.

In 2017, just 25 qualified hitters struck out 84 times or fewer. Of these 25, just one (Mookie Betts) matched or exceeded Hack's 709 plate appearances. This tidbit really speaks more to the two eras in discussion, but it's interesting nonetheless.

Some other Hack Wilson fun facts:

Hack received MVP votes in five seasons. Amazingly, his monstrous 1930 season (undoubtedly his best) was not one of the five, but only because no MVP award was given out in 1930. Had it been, Wilson likely would have won in a landslide.

Despite having the single-season record for most RBI, he is tied for just the sixth-most seasons of 150 or more RBI with two, behind Lou Gehrig (7), Babe Ruth (6), Jimmie Foxx (4), Hank Greenberg (3), and Al Simmons (3), and tied with Sosa, DiMaggio, and Sam Thompson.

Despite the legendary 1930 season, Hack's career was significantly below that of a typical Hall of Famer. His Gray Ink score is 110 (the average HOFer's is 144), and his "Hall of Fame Standards" score is 39 (the average HOFer's is 50). His 38.8 career bWAR is roughly half the average for Hall of Fame center fielders (71.2).

That’s all I have on Lewis Wilson. He may still seem like a relatively mundane player, but imagine if Altuve came out in 2018 and kept up with Stanton and Judge in the home-run race. That is what Hack Wilson did in 1930, belting 56 homers as a man who stood 5’6″ tall (how can you not be romantic about baseball?).


Overcoming Imperfect Information

When a team trades a veteran for a package of prospects, only minor-league data and the keen eyes of scouts can be used to assess the likely future major-league contributions of those players. Teams have relied on the trained eyes of scouts for generations, but of course the analytics community wants its foot in the game too. Developments such as Chris Mitchell's KATOH system make some strides, as it is helpful to compare historical information. Does a prospect's rank on MLB.com's or Baseball America's top-prospect list really indicate how productive a player will be in the major leagues? Of course, baseball players are human, and production will always vary as a result of numerous factors that could change the course of someone's career. Perhaps a player meets a coach who dramatically turns his game around, or a pitcher discovers a new-found talent for an impressive curveball that jumps him from fringe prospect to MLB-ready. The dilemma of imperfect information will always be present, so teams must use the best resources available to them to tackle the problem.

To start my analysis of imperfect information, I look at the top 100 position prospects from 2009, using data from BaseballReference.com. I break the prospects into three groups based on their ranking: position players ranked 1-10, 11-20, and 21-100. I then look at the value those prospects contributed in their first six seasons in the major leagues, as well as their total contributions to date, using fWAR. I choose the first six seasons of a player's career because that is how long a player is under team control before reaching free agency. This study does not take into account any contract extensions that may have been given before a player reached free-agent eligibility. For players who have not yet been in the MLB for six full seasons, I look at their total contributions so far. The general idea for this study was inspired by a 2008 article by Victor Wang that looked at imperfect prospect information.

I convert the prospects' production into monetary value based on the relative WAR values that were commanded in the free-agent market that year, using fWAR as the best measure of total value. When teams trade for prospects, they understand that they are trading wins today for wins in the future. Since baseball is a business and teams care about their performance on the field each year, I need to account for that in my analysis. To do so, I assume that, all else equal, a win today is more valuable than a win in the future, and I apply an 8% discount rate to each prospect's WAR to create a discounted WAR value (dWAR). The exact discount rate can be debated, but 8% seems appropriate for the time frame looked at.
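As a minimal sketch of that discounting in R (with the assumption, on my part, that the first season is taken at face value and later seasons are discounted):

# Apply an 8% annual discount rate to season-by-season WAR over six years.
discounted_war <- function(war_by_year, rate = 0.08) {
  years <- seq_along(war_by_year)
  sum(war_by_year / (1 + rate)^(years - 1))
}

discounted_war(c(2.1, 3.4, 1.8, 4.0, 3.2, 2.5))   # illustrative values only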

From here, I break the prospects into a few subgroups based on their average WAR contributed over their first six seasons in the major leagues, following guidelines laid out in other studies with some slight modifications. Players with zero or negative WAR per year are labeled busts, players averaging just above 0 to 2 WAR are contributors, players with 2-4 WAR are starters, and players with 4+ WAR are stars. As described previously, I estimate each player's monetary savings to his team by taking his value based on WAR performance and comparing it to what similar production would have commanded in the free-agent market that year. There is some debate about the value of one WAR in the free-agent market, but my calculations show that about $7 million bought one WAR leading up to the 2009 season. Victor Wang suggests that the price of one WAR had roughly a 10% inflation rate from year to year. I take the present value of each player's WAR and convert it to dollars at that roughly $7 million per WAR to find the player's effective savings to his team based on his production.
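Here is a sketch of the bucketing and savings conversion, assuming a hypothetical data frame prospects with columns avg_war (WAR per year over the first six seasons) and pv_war (present value of total WAR); the $7 million per WAR is the figure quoted above for the 2009 market.

library(dplyr)

dollars_per_war <- 7   # in millions, leading up to 2009

prospects <- prospects %>%
  mutate(category = cut(avg_war,
                        breaks = c(-Inf, 0, 2, 4, Inf),
                        labels = c("Bust", "Contributor", "Starter", "Star")),
         savings_millions = pv_war * dollars_per_war)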

Position Prospects Ranked 1-10

Category       Bust     Contributor  Starters  Star
Count          1        2            5         2
Probability    10.00%   20.00%       50.00%    20.00%
Group average: 2.83 WAR/Y

Category                    Bust     Contributor  Starters  Star
WAR/Y                       0.43     1.53         2.73      5.17
Probability                 10.00%   20.00%       50.00%    20.00%
PV Savings/y (in millions)  1.88     8.46         10.91     27.98

Interestingly enough, this prospect class panned out quite well compared to some other recent classes. The only bust in terms of discounted WAR was Travis Snider of Toronto, who was ranked the sixth-best prospect in 2009 but managed only a cumulative WAR slightly above zero in his first six seasons. Though the top 10 position-player prospects from this class feature names such as Jason Heyward and Mike Moustakas, the player who contributed the most WAR over his first six seasons from the top 10 was Buster Posey of San Francisco, who posted nearly 6 WAR per year. It is important to understand that the savings a player provides his team through his production does not indicate any "deserved" salary for that player; it merely indicates the amount of money the team would have had to spend in the free-agent market to acquire that same production. The top 10 position-player prospects from this class turned out to be very productive for their respective teams, with a 70% chance of being either a starter or a star.

Position Prospects Ranked 11-20

Category       Bust     Contributor  Starters  Star
Count          5        2            1         2
Probability    50.00%   20.00%       10.00%    20.00%
Group average: 2.16 WAR/Y

Category                    Bust     Contributor  Starters  Star
WAR/Y                       0.67     1.60         3.56      5.71
Probability                 50.00%   20.00%       10.00%    20.00%
PV Savings/y (in millions)  3.21     8.36         19.10     30.90

The next group is the position players ranked 11-20. As perhaps expected, there are more busts in this group of prospects, and the variation of the small sample is spread through the rest of the categories. Giancarlo Stanton, the 16th-ranked prospect, and Andrew McCutchen, the 33rd-ranked prospect, turned out to be the two stars from the list. As the chart shows, the probability of getting a bust at this ranking level is much higher than for the 1-10 rankings. The variance does show, however, that player outcomes can also be promising at this level: there was an identical chance of a player becoming a star in this group compared to the first group, and a 50% chance of a player being at least a contributor. In total, four of the top 20 prospects from 2009 have turned out to be stars to this point in their careers, though not all have reached six full service years in the majors.

Position Prospects Ranked 21-100

Category       Bust     Contributor  Starters  Star
Count          12       7            3         1
Probability    38.71%   22.58%       9.68%     3.23%

Category                    Bust     Contributor  Starters  Star
WAR/Y                       0.35     1.46         3.22      3.87
Probability                 38.71%   22.58%       9.68%     3.23%
PV Savings/y (in millions)  1.48     7.52         17.18     20.80

The next group of charts shows the rest of the top 100 ranked position players. There is much more potential for busts in this range; however, we must keep in mind that the variance will naturally differ here because the sample is much larger than in the first two groups. Nearly 40% of position players ranked 21-100 turned out to be busts, and only Freddie Freeman of Atlanta managed to clear the 4+ dWAR-per-year threshold to qualify as a star. In fact, the most common outcome for these ranked position players was a bust. When drafting a player, a team never knows for certain what production the pick will provide in the major leagues, no matter the pick number. In addition, prospect rankings based on minor-league performance are still not a completely accurate indicator of future MLB productivity. Higher-ranked prospects in 2009 did have a higher probability of contributing more to their major-league club, though rankings are understandably volatile. A variety of factors play into the volatile nature of prospect outcomes and the prospect-risk premium. Part of the reason I chose to look only at position players is that they are traditionally safer from injury than pitchers and therefore carry slightly less of a risk premium.

Looking at the distribution of dWAR for the prospect group, it is heavily skewed, with most prospects bunched at low values and only a thin tail of stars. That is to be expected: not all prospects turn out equally strong, and in any given year only a few become very strong players while most hover around average. We also see that the interquartile range runs from about 0.5 dWAR per year to slightly above 2.5 dWAR per year, so a team could expect production in that range from a given prospect ranked 1-100, varying slightly by rank group. A useful follow-up would be a distribution chart for each rank group, but in the interest of brevity, I do not do that here.

New ways of evaluating both minor-league and amateur players to relieve some of the prospect-risk premium are useful, although risk will always be present. In the next part of this study, I will try to find statistically significant correlations between college and major-league performance in order to reduce the noise of the prospect-risk premium. One of the great things about baseball's player-development structure is that it allows players with the right work ethic and dedication, as well as those overlooked in the high rounds of the draft, to prove themselves in the minor leagues. That can seldom be said of other professional sports. The famous example is Mike Piazza, who was one of the last overall picks in his draft class and worked his way to a Hall of Fame career.

With perfect information, a plot of dWAR against prospect rank would decline smoothly, with each ranked prospect achieving a higher dWAR than the next-ranked prospect. Some may attribute the imperfect-information dilemma to drafting or to the evaluation of minor-league performance, and some may attribute it to differences in player-development systems. Some may also rationally say that both the players and the scouts are human and will not be perfect. Prospect rankings for a given year are based on several factors, including a player's proximity to contributing at the major-league level. The most talented minor-league players could sit lower in a given year's ranking because of their age or development level, which could introduce some unwanted variance into the data. Looking at just the top 100 prospects helps mitigate this problem but does not make it disappear. It is also difficult to know when teams plan to call up prospects; it depends on the needs of the team, and some players make the jump at 20 while others make it at 25 or even later.

This type of analysis could be useful for estimating the opportunity cost of a trade involving prospects, both in financial terms and in present-versus-future on-field production. A lot of factors play into the success of a prospect; when evaluating any player, things such as makeup and work ethic are just as big a factor as measurable statistics. Evaluating college and high-school players for the annual Rule 4 draft can be especially difficult because of the limited statistical information that is accessible. Team scouts work very hard to accurately evaluate the top amateur players in the United States and around the world in order to put their team in a good position for the draft. Despite the immense baseball knowledge that scouts bring to player evaluation, statistical analysis of college players is still explored and used to complement traditional scouting reports. The prospect-risk premium will always be something teams must deal with, but efficiently allocating players into a major-league pipeline is essential for every front office.

There have been a few other articles on sites such as FanGraphs and The Hardball Times on statistical analysis of college players. Cubs president Theo Epstein told writer Tom Verducci that the Cubs analytics team has developed a specific algorithm for evaluating college players. The process involved sending interns to photocopy old stat sheets on college players from before the data was recorded electronically.

Though I do not doubt the Cubs have a very accurate and useful algorithm for that purpose, it is not publicly available for review, and understandably so. However, for the several articles that tackle this question on other baseball statistics websites, I think there is some room for improvement. First, the multiple complex statistical techniques used to compare college and MLB statistics all yield about the same disappointing results, which suggests that some of the models are unnecessarily complicated. Second, though the authors may imply it by default, statistical models in no way account for the character and makeup of a college player and prospect. Even in the age of advanced analytics, the human and leadership elements of the game still hold great value, so statistical rankings should not be taken as a precise recommended draft order. In addition, they do not take into account a player's injury history and risk. Teams can increase their odds of adding a future starter or star over a player's first six seasons by drafting position players, who have historically been shown to be safer bets than pitchers due to lesser injury risk.

The model in this post attempts to find statistically significant correlations between players' college stats and their stats over their first six seasons in the MLB. Six seasons is the amount of time a team has a drafted player under control until he reaches free agency and is granted negotiating power with any team, as we've gone over. However, the relationship between college batting statistics and MLB fWAR can only go so far because of the lack of fielding and other data for college players.

The first thing I did was merge databases of Division I college players for the years 2002-2007 with their statistics for their first six years in the MLB. There is some noise in the model, since some players in my sample who were drafted in later years have not yet spent six years in the MLB, which is accounted for. I only look at the first 100 players drafted each year. I then calculate each player's college-career wOBA per the methods recommended by Victor Wang in his 2009 article on a similar topic. However, since wOBA weights are not recorded for college players, the statistic is more of an arbitrary wOBA that uses the weights from the 2013 MLB season. Since wOBA weights do not vary heavily from year to year, it will do the trick for this analysis. For MLB players, wOBA has about a 97% correlation with wRC and wRC+ (varying slightly with sample size), so I did not feel it was necessary to calculate wRC in addition to wOBA. In fact, when using ordinary least squares and multiple least squares regression techniques, calculating both would have created problems with pairwise collinearity, so it would have proved pointless. Along with an ordinary least squares regression, I also use multiple least squares and change the functional form to double logarithmic. (A future study I hope to tackle soon is to use logistic regression to calculate the odds of a college player ending up in each of the four WAR groups over his first six seasons in the majors.)
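A sketch of that arbitrary college wOBA in R, using approximate 2013 FanGraphs linear weights (the exact constants here are rounded assumptions, not the precise values used in the study):

# `d` is a data frame of college counting stats: BB, IBB, HBP, X1B, X2B, X3B, HR, AB, SF.
college_woba <- function(d) {
  num <- 0.69 * (d$BB - d$IBB) + 0.72 * d$HBP + 0.89 * d$X1B +
         1.27 * d$X2B + 1.62 * d$X3B + 2.10 * d$HR
  den <- d$AB + d$BB - d$IBB + d$SF + d$HBP
  num / den
}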

Due to limitations in the data, as well as the restriction to the top 100 picks who actually make it to the MLB, the analysis is somewhat limited, yet it still produces some valuable results. Interestingly, though perhaps unsurprisingly, my calculated wOBA for each player's college career shows a strong and statistically significant relationship with wOBA produced in the MLB. To a lesser extent, college wOBA also shows a statistically significant relationship with MLB-produced WAR, even though this study does not take into account defense, baserunning, etc. Looking at a correlation matrix, I find that college wOBA and MLB wOBA have a pairwise correlation of about 25%. The matrix shows a similar pairwise correlation of about 25% between college wOBA and MLB WAR, though at a lower level of confidence. Using ordinary least squares regression, I try different functional forms to further evaluate the strength of the relationship between college and MLB statistics.

The first model confirms a fairly strong and statistically significant relationship at the 1% level between college and MLB wOBA, with a correlation coefficient of about .25. College strikeout-to-walk ratio is also statistically significant at the 1% level, albeit without a strong correlation coefficient. Even so, looking back at the matrix indicates that players who are less prone to the strikeout in college see, on average, better success in the MLB. Interestingly enough, college wOBA and strikeout-to-walk ratio are about the only statistically significant statistics I can find by running several models with different functional forms. Per the model, we can also say that college hitters with extra-base-hit ability likely have better prospects in the majors. The R-squared for model one is about .20, which is not terrible but certainly not enough to provide a set-in-stone model. The constant in the regressions seems to capture noise that is difficult to replicate, lending insight into the extreme variance and unpredictability of the draft.

For model 2, I use a double-logarithmic functional form with a multiple least squares linear regression in order to see how MLB wOBA varies with college wOBA and strikeout-to-walk ratio. The results of this regression are slightly stronger and look a bit more promising for the conclusion that the calculated college wOBA is a strong predictor of MLB wOBA.
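In R, the two specifications amount to a pair of lm() calls; this is a sketch assuming a hypothetical data frame drafted with columns mlb_woba, college_woba, and college_k_bb (college strikeout-to-walk ratio), where the variable names are mine rather than the study's.

# Model 1: levels; Model 2: double-logarithmic functional form.
model1 <- lm(mlb_woba ~ college_woba + college_k_bb, data = drafted)
model2 <- lm(log(mlb_woba) ~ log(college_woba) + log(college_k_bb), data = drafted)
summary(model2)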

According to the results of the double-log model, a one percent increase in MLB wOBA corresponds to about a 36% increase in college wOBA, all else equal. (Since the model is in double-log form, the interpretation is done in percentages.) More simply: on average and all else equal, a player will have a one percent higher wOBA in the MLB for every 36% increase in his college wOBA relative to other players. The coefficient is significant at the one percent level. In addition, a one percent increase in MLB wOBA corresponds to about a six percent decrease in college strikeout-to-walk ratio. Again, the R-squared is about 0.20.

Perhaps the most interesting thing these regressions show is that college batting average has almost no correlation with MLB success. This may be a little misleading, because hitters who are drafted in high rounds and do well in the MLB will likely have had high college batting averages, but the regressions show that there are other things teams should look for in their draft picks besides a good average: traits such as low strikeout totals, especially relative to walks, help indicate a player's pure ability to get on base. When evaluating college players, factors such as character, work ethic, and leadership ability are just as good indicators of success for strong college ballplayers. Perhaps the linear-weights measurements used in wOBA calculations are on to something. Accurate weights obviously cannot be derived for college statistics without the proper data, but comparisons using MLB weights for college players can still be useful. In addition, it is well known that position players are traditionally safer high-round picks than pitchers due to injury risk. I would argue that strong college hitters are often the most productive top prospects, while younger pitchers who can develop in a team's player-development system can be beneficial for a strong farm system and pipeline to the major leagues. Many high-upside arms can be found coming out of high school, rather than taking power college pitchers, and arms from smaller schools are often overlooked due to the competitive environment they play in. Nevertheless, hidden and undervalued talent exists that could yield high-upside rewards for teams, both financially and on the field.