Archive for March, 2016

The Truth About Hitting the Ball Hard

I recently presented evidence that power and contact are independent skills: an increase in power does not have to come at the cost of contact. Intuition surely disagrees with these findings, and when that happens you should be skeptical. I would be skeptical.

One reason a trade-off between power and contact feels intuitive is that we are accustomed to speed-accuracy trade-offs in many everyday actions. For example, we slow down when we pour a fresh cup of coffee because going too fast is dangerous. Implicitly, we assume there is a speed-accuracy trade-off when we suggest that hitters can cut down on their swing to make more contact. Richard A. Schmidt is like the Bill James of my field (motor behaviour), and in 1979 he and his colleagues published the Theory of Accuracy for Rapid Tasks. According to Google, it has been cited over 1,200 times. While speed-accuracy trade-offs are typical for movement, the theory explains that rapid timing tasks like hitting are an exception to this rule.

The theory is a dense 46-pager full of equations, but I'll present a couple of critical graphs to illustrate its implications for hitting. First, Figure 1 presents the results of an experiment that investigated the effect of movement distance and movement time on spatial error.

Figure 1. Spatial error (We, deg) as a function of movement time (MT, ms) and movement distance (A, deg).

The results indicate that movement time, that is, movement speed, had almost no impact on spatial error. You’ll notice that the movement times tested in this experiment are conveniently reflective of a short and a long MLB swing (per Zepp).  The movement distances in the experiment were shorter than a swing, and the task far simpler, but the results are suggestive nonetheless.

A second experiment explored the effect of movement speed and distance on timing error.  Unlike the experiment above, movement speed did have an effect on timing error.  Figure 2 presents data indicating that faster movements result in significantly less timing error than slower movements, irrespective of movement distance.

Figure 2. Timing error (VEt, ms) as a function of movement time (MT, ms) and distance (A, deg).

In addition to these two examples, there is a substantial empirical and theoretical framework suggesting rapid timing tasks are exempt from a speed-accuracy trade-off.  Swinging slower does not increase a hitter’s chance to make contact.  On the basis of these data and the data I presented previously, it seems that hitters can try to hit the ball as hard as possible, within reason, without sacrificing contact or base-hit skill.

UNDERSTANDING HARD%

Power, contact, speed, and discipline account for 66% of the variance in hitting production, and power, measured by Hard%, is by far the most important skill. But what does Hard% measure, exactly? The description of Hard% can be found in the glossary here. Basically, Hard% describes the proportion of batted balls that meet an undisclosed criterion for “hardness,” which depends on hit type, hang time, landing spot, and trajectory. Importantly, Hard% does not include exit speed in its calculation.

In the plot below, Average Exit Speed for players with a minimum of 190 ABs in 2015 is plotted against their Hard%. Figure 3 makes it clear that while Hard% doesn’t directly measure exit speed, it does a good job of estimating it.

Figure 3.  Average Exit Speed and Hard%.

Given the tight relationship between Average Exit Speed and Hard%, I wondered if both measures were equally effective at predicting production.  The graphs in Figure 4 and Figure 5 present both power measures plotted against wRC+.

Figure 4. Hard% and wRC+.

Figure 5.  Average Exit Speed and wRC+.

Hard% does a better job of predicting production than Average Exit Speed, explaining about 23% more variance in wRC+. Since exit speed is a more direct measurement of power than Hard%, it follows that the non-power information folded into Hard% is relevant to production. Previous research suggests that hit type and trajectory are important to the outcome of a batted ball, and since both variables are used to calculate Hard%, it seems likely they contribute to the relationship between Hard% and wRC+.
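A minimal sketch of this comparison, assuming a plain least-squares line for each predictor: with a single predictor, the variance explained (R²) is simply the squared Pearson correlation, so only the standard library is needed. The function names are hypothetical illustrations, not the author's code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2


def compare_predictors(target, **predictors):
    """Return {name: R^2} for each candidate predictor of the same target."""
    return {name: r_squared(values, target) for name, values in predictors.items()}
```

Calling `compare_predictors(wrc_plus, hard_pct=hard_pct, avg_exit_speed=avg_ev)` with per-player columns (hypothetical variable names) returns both R² values side by side. The text does not say whether "23% more variance" is a difference in percentage points or a ratio of the two R² values; either can be read off the result.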

INTRODUCING LIFT BIAS

Trajectory is tightly linked to outcome, and hitters control only the trajectory (or angle) on which they intend to hit the ball. We have no way to measure hitters’ intentions directly. The only vertical launch-angle data I’ve been able to access are extremely limited or incomplete, so we can’t estimate hitters’ intentions from results. If we had a database of swing-plane information, we could estimate each hitter’s intentions from his average swing plane relative to the pitch, but we don’t have such a database. What we do have are data on each hitter’s average exit velocity on ground balls, as well as his average exit velocity on line drives and fly balls. If we assume that each hitter is trying to hit the ball as forcefully as possible along his intended trajectory, and further assume that over the course of a season exit velocity will be maximal around the force vector the hitter intends, then we can infer each hitter’s bias toward lower- or higher-trajectory hits by subtracting his average ground-ball velocity from his average line-drive/fly-ball velocity. The lower the resulting value, the lower the trajectory we can assume the hitter intended. I examined the relationship between AvgLD/FB – AvgGB (or Lift Bias) and Hard%, and the results are in Figure 6 below.

Figure 6.  Lift Bias and Hard%.
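The Lift Bias arithmetic can be sketched as follows, assuming record-level batted-ball data with a type label and an exit velocity for each ball. That record layout, the "GB"/"LD"/"FB" labels, and the choice to pool line drives and fly balls into a single average are all assumptions; with published per-player averages the calculation collapses to one subtraction.

```python
from statistics import mean

# Hypothetical record layout: (batted_ball_type, exit_velocity_mph).
# The "GB"/"LD"/"FB" labels are assumptions; any other type (e.g. pop-ups)
# is simply ignored by this sketch.
AIR_TYPES = {"LD", "FB"}


def lift_bias(batted_balls):
    """Lift Bias = average LD/FB exit velocity minus average GB exit velocity.

    Line drives and fly balls are pooled into one air-ball average, which is
    one reading of "AvgLD/FB" (an assumption, not a confirmed definition).
    A negative result means the hitter hit his ground balls harder.
    """
    air = [ev for kind, ev in batted_balls if kind in AIR_TYPES]
    ground = [ev for kind, ev in batted_balls if kind == "GB"]
    if not air or not ground:
        raise ValueError("need at least one air ball and one ground ball")
    return mean(air) - mean(ground)
```

For example, two grounders at 85 and 87 mph plus a 93 mph liner and a 91 mph fly ball give a Lift Bias of 6 mph.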

Almost every hitter in the sample hit the ball harder in the air than on the ground. Only Melky Cabrera, Jason Heyward, and Nick Markakis hit their ground balls harder than their line drives and fly balls in 2015. As suspected, almost every hitter appears to be trying to hit the ball in the air. There is an apparent relationship between Lift Bias and Hard%, suggesting that hitters who intend to hit the ball at a higher angle tend to record more hard hits per contact. To see whether this was because harder hitters choose to lift the ball more, I examined the relationship between Average Exit Speed and Lift Bias; the results are presented in Figure 6 below.

Figure 6.  Average Exit Speed and Lift Bias.

Surprisingly, there is practically no relationship between Average Exit Speed and Lift Bias.  This suggests that Lift Bias is associated with Hard% independent of how forcefully a hitter strikes the ball.  Since Lift Bias and Average Exit Speed are independent predictors of Hard%, I modeled the effect of both simultaneously with multiple regression.  The model explained 75% of variance in Hard% overall, and the part and partial correlations are reported in Figure 7 below.

Figure 7.  Multiple regression coefficients. 

The part correlation in Figure 7 reflects the unique variance explained by each predictor. Thus, Average Exit Speed uniquely explained 52% of the total variance in Hard%. The partial correlation describes the proportion of the remaining variance explained by one predictor after accounting for the other. Thus, after accounting for Average Exit Speed, Lift Bias explained 26% of the remaining variance in Hard%.
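For two predictors, both quantities can be computed from the three pairwise correlations using the standard textbook formulas (not necessarily the software the author used). Squaring the part correlation gives the share of total variance a predictor uniquely explains; squaring the partial correlation gives the share of the variance left over after the other predictor is accounted for. The function name is hypothetical.

```python
from math import sqrt


def part_and_partial(r_y1, r_y2, r_12):
    """Part (semipartial) and partial correlation of predictor 1 with y,
    controlling for predictor 2.

    r_y1: correlation of y with predictor 1
    r_y2: correlation of y with predictor 2
    r_12: correlation between the two predictors
    """
    shared = r_y1 - r_y2 * r_12
    # Part: only predictor 1 is purged of predictor 2; y is left intact.
    part = shared / sqrt(1 - r_12 ** 2)
    # Partial: both y and predictor 1 are purged of predictor 2.
    partial = shared / sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
    return part, partial
```

When the two predictors are nearly uncorrelated (r_12 close to 0), as Average Exit Speed and Lift Bias appear to be, the part correlation is almost identical to the predictor's simple correlation with Hard%, while the partial correlation is always at least as large in magnitude as the part correlation.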

To determine how much of the relationship between Hard% and production can be accounted for by Average Exit Speed and Lift Bias, I plotted the Hard% predicted by the regression model against wRC+. The results indicate that Average Exit Speed and Lift Bias together account for almost, but not quite, all of the relationship between Hard% and wRC+. See Figure 8 below.

Figure 8.  Predicted Hard% and wRC+.

If you compare Figure 8 and Figure 4, you can see that actual Hard% still explains more of the variance in wRC+ than predicted Hard% does, but the predicted values come close. Since Hard% is based on the result of each hit rather than a tendency to hit balls harder in the air or on the ground, it makes sense that Hard% should be more closely related to performance. It is impressive that two variables not directly measured in Hard% explain so much of its variance, as well as such a large share of its relationship to wRC+.

DOES LIFT BIAS COME WITH A TRADE-OFF?

One of the most interesting results described above is the null relationship between exit speed and Lift Bias, suggesting that an increase in Lift Bias may be beneficial regardless of power. Yet again, intuition protests: it might make sense for power hitters to try to lift the ball, but when light hitters lift the ball, the result is a fly out. Since Lift Bias is unrelated to exit speed, examining the relationship between Lift Bias and BABIP should give a hint as to whether increasing Lift Bias decreases the chances of getting at least a single.

Figure 9.  Lift Bias and Batting Average on Balls in Play (BABIP).

Lift Bias apparently has no relationship to BABIP, which seems counterintuitive. Does Lift Bias even affect batted-ball type? Not really. The relationship depicted in Figure 10 below is the strongest of all, and even then Lift Bias explains only 8% of the total variance in GB%.

Figure 10. Lift Bias and Ground Ball Rate (GB%).

The launch angle of a batted ball depends more on the offset between the ball and bat at contact than on the attack angle of the swing. Thus, perhaps it shouldn’t be too surprising that an ostensible measure of swing plane has little relationship to batted-ball distribution. While offset largely determines launch angle, swings with more positive attack angles (up to a point) are better for batted-ball distance. If Lift Bias reflects a more positive attack angle, we might expect to see a positive relationship between Lift Bias and HR/FB. In fact, as shown in Figure 11, Lift Bias accounts for 30% of the variance in home runs per fly ball.

Figure 11.  Lift Bias and Home Runs per Fly Ball (HR/FB).

Lift Bias has a strong relationship to average distance, and a smaller but still significant relationship to maximum recorded distance as well.  These data suggest that swing plane may be responsible for at least part of the observed Lift Bias, since increased Lift Bias seems to optimize batted-ball distance.

If swing plane does drive Lift Bias, one might expect a trade-off between Lift Bias and contact skill. Since pitches typically arrive on a downward angle of around 6 degrees, and attack angles exceeding 6 degrees can produce longer hits, it follows that hitters may be using a more severe uppercut than a 6-degree “level” swing to generate Lift Bias.

I used the Real Contact measure from my previous study to estimate contact skill for the hitters with data in the 2015 sample. Lift Bias was negatively associated with Real Contact, accounting for about 20% of the variance. This is the first hint of a trade-off between slugging and contact, suggesting that hitters may be using steep swing planes to generate lift. By contrast, Real Contact was unrelated to Average Exit Speed, confirming the absence of a trade-off between force and accuracy.

COMPARISON OF PLAYERS WITH MOST OR LEAST LIFT BIAS

It still seems counterintuitive that all players would benefit from a Lift Bias in the top range of the sample. Is it possible that players at either end of the Lift Bias distribution are especially powerful or light-hitting, creating the appearance of a true relationship that actually reflects selective sampling? To examine the players with the most extreme Lift Bias (or lack thereof), I formed two groups: the 50 most Lift Biased and the 50 least Lift Biased players. First, I tested for differences in the potential to generate power by comparing the two groups on maximum recorded exit speed. The group with the most Lift Bias had a mean Max Exit Speed of 111 mph, while the low Lift Bias group had a mean of 110 mph. There is little difference in power potential between the most Lift Biased players and the least.
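The extreme-groups comparison amounts to ranking players by Lift Bias and keeping only the two tails. A sketch, assuming each player is a dict with hypothetical field names such as "lift_bias", "max_ev", "hr_fb", and "wrc_plus":

```python
from statistics import mean


def extreme_groups(players, key, k=50):
    """Return (lowest_k, highest_k) players ranked by `key`.

    The middle of the distribution is deliberately left out, so the two
    groups never share a player.
    """
    if 2 * k > len(players):
        raise ValueError("groups would overlap")
    ranked = sorted(players, key=lambda p: p[key])
    return ranked[:k], ranked[-k:]


def group_means(group, fields):
    """Mean of each requested field within one group."""
    return {f: mean(p[f] for p in group) for f in fields}


# Hypothetical usage:
# low, high = extreme_groups(players, "lift_bias")
# group_means(high, ["max_ev", "hr_fb", "wrc_plus"])
```

Dropping the middle of the sample sharpens the contrast between groups, at the cost of saying nothing about players near the median.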

Next, I tested for differences in power production by comparing the groups on HR/FB.  As you can see in Figure 12, the high Lift Bias group (.167) saw their fly balls leave the park over twice as often as the low Lift Bias group (.074).

Figure 12.  Mean HR/FB for the Low Lift Bias and High Lift Bias groups. Error bars represent 95% confidence intervals.

Finally, I compared the two groups on overall production.  The high Lift Bias group had a mean wRC+ of 117, while the low Lift Bias group had a mean of 93.  The players with the largest Lift Bias are, on average, substantially better than league average.  Conversely, the players with the smallest Lift Bias are somewhat worse than the league average. Figure 13 presents the observed means with error bars representing 95% confidence intervals.

Figure 13.  Mean wRC+ for the Low Lift Bias and High Lift Bias groups. 
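The error bars can be approximated with a mean ± critical value × standard error interval. Which critical value the author used is not stated; this sketch defaults to about 2.01, close to the t critical value for 50 players (49 degrees of freedom), while 1.96 would give the large-sample normal approximation. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci(values, crit=2.01):
    """Mean with a two-sided 95% confidence interval: (low, mean, high).

    crit ~ 2.01 approximates the t critical value for n = 50; pass 1.96
    for a normal approximation. Requires at least two values.
    """
    n = len(values)
    m = mean(values)
    half_width = crit * stdev(values) / sqrt(n)  # sample SD / sqrt(n) = SE
    return m - half_width, m, m + half_width


# Hypothetical usage: mean_ci([p["wrc_plus"] for p in high_group])
```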

The players with a large Lift Bias have basically the same power potential as the players with the least bias, yet they have much more power production.  The extra power production completely accounts for the difference in overall production between the groups, which is substantial.

CONCLUSION

Over the last two articles, I have been detailing a hierarchy of measurable skills that explain the majority of variance in hitting production.  Further, I have demonstrated that there is little trade-off between skills.  Fast exit velocity does not come at the expense of contact, and Lift Bias does not come at the expense of base hits.  There does appear to be a small trade-off between Lift Bias and contact, suggesting that situational hitting could require adjusting swing plane or intended trajectory.

Power is the most important skill to production and consists of two sub-skills: hitting balls harder on average (measured by Average Exit Speed) and generating more Lift Bias (measured by subtracting AvgGB velocity from AvgLD/FB velocity). The next most important is contact skill, which was estimated by parceling the effect of Fastball% out of True Contact (a location-independent measure of contact), to provide an estimate of real contact ability independent of how a hitter is pitched. Finally, speed and discipline (represented by Spd and O-Swing%) are equally important skills, but much less important than power. Figure 14 depicts the relative importance of each skill in estimating production.

Figure 14.  The relative importance of hitting skills.

It is tempting to assume this model is causal, when in fact the data are all correlational. If the relationships were causal, the conclusions for hitting coaches would be obvious: a) Optimizing exit speed with efficient mechanics and hard work should be an ongoing goal for every player, b) Players should focus on driving the ball in the air, and the hitting coach should help his hitters optimize their Lift Bias, c) Equally important, hitters should practice their contact skills against all pitch types on a situational basis, d) Discipline, which can be trained, should get about half the attention that contact receives, and e) The league is full of underachievers, assuming Lift Bias is a learnable skill.

Science will require experimental evidence before concluding that the skill hierarchy provides a causal explanation of hitting production.  Hitters and coaches may not want to wait around.  Hey, Kevin Pillar! Give me a call…


Using Contact Rates to Evaluate Pitchers

A little over a month ago, I published this piece detailing the methods I had created as an alternative way to assess hitter performance. I highly recommend glancing at that article before reading this one; it will make a whole lot more sense. For the lazy, here is a brief primer: I focused on using rates (contact, Hard%, etc.) to create rough estimates of what would happen on any given pitch. What is the probability that Mike Trout hits a hard line drive on a pitch in the strike zone? The more a player does that, is he more likely to be a successful hitter overall? One of the advantages of this approach is that it helps to separate a hitter’s actions from his circumstances; a hard line drive is a hard line drive, but its placement will greatly affect whether or not the player reaches base. Poor defense, such as one may find in the minor leagues or college ball, therefore matters less in judging a player.

One of the questions remaining was whether or not I could apply some of these same methods to evaluating pitching. So far, the answer is a qualified yes. We already have a number of metrics to determine pitching value without regard for circumstance, but these methods still provide useful insights. Using existing metrics such as xFIP, we can determine which rate stats are strong indicators of success.

There is one result that emerged above all else: there is no such thing as a weak-contact pitcher. There is a lot of talk about pitchers “keeping the ball in the park” or “getting weak ground balls,” but this method indicates no such thing. By simply multiplying contact rate by Soft% for all 2015 qualified pitchers, thereby creating the “SoftXCont” statistic, I was able to search for any correlation between this rate and xFIP. Judge the results for yourself:

[Plot: SoftXCont vs. xFIP, 2015 qualified pitchers]

There is clearly almost no correlation. However, remember that this examines only the aggregate; perhaps some specific pitchers can leverage this so-called skill to great effect. But it appears that, at least on average, generating weak contact is a poor indicator of overall pitching success.

The opposite holds for hard contact. Pitchers who allowed less hard contact, as measured by my “HardXCont” number, posted substantially better (lower) xFIPs.

[Plot: HardXCont vs. xFIP, 2015 qualified pitchers]

The correlation is relatively strong, especially compared to the correlations seen in other baseball metrics. Clearly there is something going on here; pitchers who allow less hard contact per pitch get better results. Duh. For an even cleaner view, we can look at GoodXCont, which combines “Hard” and “Medium” contact.

[Plot: GoodXCont vs. xFIP, 2015 qualified pitchers]

That correlation is excellent and suggests that GoodXCont could be a powerful way of evaluating pitchers.

So, we see that pitchers who limit hard contact and good contact are more successful than their peers. We also see that allowing a large amount of soft contact is not indicative of overall success. The “weak contact” type pitchers (think Rick Porcello) are not necessarily succeeding thanks to any particular ability to generate soft contact; any corresponding ability comes more from being able to allow less hard contact.

For scouts, this means finding pitchers who both limit total contact and allow only poor contact. By using these metrics, rather than the outdated ERA or a radar gun, they can get a strong impression of future big-league success.

In a future piece, I plan to dive deeper into research on “soft contact” pitchers. While these initial results indicate that soft contact is not a good indicator of overall success, there is further work to be done. Stay tuned.


The Mariners are Finally Using Safeco Field Correctly

It’s no trade secret that playing to the strengths of your ballpark helps your chances to succeed. To gain an advantage, franchises can exploit, and sometimes even manipulate, their home ballpark. If you run the Astros or Reds, who play baseball in a lunchbox, you can succeed by employing otherwise-flawed home-run hitters with little regard for who gets on base ahead of them. When you play half your games in an airplane hangar, however, stubbornly attempting to put the ball over 900-foot fences is foolish. A foolish strategy common to recent Mariners teams. A foolish strategy that wasn’t working.

Mariners team stats, 2010–2015
Season   OBP    MLB Rank   SLG    MLB Rank   wOBA   MLB Rank
2015     .311   22         .411   12         .313   17
2014     .300   27         .376   21         .299   25
2013     .306   26         .390   20         .307   20
2012     .296   30         .369   30         .291   30
2011     .292   30         .348   30         .283   30
2010     .298   30         .339   30         .285   30

If you have a weak stomach, do not view the last few rows.

The Mariners wrote the Greatest Hits on failing to get on base and, not surprisingly, struggled to win games during those seasons. For years and years, the Mariners tried to succeed with players like Logan Morrison, Michael Morse, and Mark Trumbo, desperately clinging to the home run as the heralded harbinger of scoring runs. Whether this was evidence of a failing regime under general manager Jack Zduriencik remains up for debate, but the front office had seen enough. Around the same time, a wayward GM who had separated cleanly from the Mariners’ division rival, the Angels, was seeking asylum, armed with his own vision of building a team.

Strategy 1: Get on Base

Jerry Dipoto, presumably having read Moneyball, understood the value of getting baserunners, and how to get players on base.

“Command the strike zone,” Dipoto told Justin Myers and Gee Scott on their ESPN 710 Seattle radio segment. “From the top of the lineup to the bottom, we will command the strike zone.”

Dipoto began addressing the team’s glaring need for baserunners by signing catcher Chris Iannetta, who had played for Dipoto in Anaheim and had posted OBP numbers over .350 in 2011, 2013, and 2014. Dipoto found further help by trading for Adam Lind (.360 OBP in 2015) and signing free agent Norichika Aoki (.353 OBP and 6.4 K% in 2015).

None of these moves were meant to be earth-shattering, but each undoubtedly made the Mariners’ lineup better. With a solid core of Robinson Cano, Nelson Cruz, and Kyle Seager, Dipoto’s goal was to fill the remaining slots with valuable role players, each of whom is more than capable of getting on base.

Here is a table of several key Mariners offseason additions, with 2015 statistics and 2016 ZiPS projections courtesy of Dan Szymborski. Note that projections tend to be conservative, as they build in a certain amount of regression toward the mean.

Player           Season        OBP    wOBA   BB%    K%
Chris Iannetta   2015          .293   .281   12.9   26.2
Chris Iannetta   2016 (ZiPS)   .329   .306   14.0   25.8
Adam Lind        2015          .360   .351   11.5   17.5
Adam Lind        2016 (ZiPS)   .334   .315   10.1   19.5
Nori Aoki        2015          .353   .326    7.7    6.4
Nori Aoki        2016 (ZiPS)   .332   .313    7.0    7.8

Strategy 2: Prevent runs, Create runs

Dipoto, addressing the shortcomings of that revolutionary A’s season, also understood the value of defense and speed. “We see ourselves as a run-prevention club. You can create a lot of advantage playing good defense. We also see our overall team defense as our biggest area in need of improvement.”

Dipoto went primarily after well-rounded players, but several moves in particular focused on defense and speed. In November, Dipoto traded closer Tom Wilhelmsen to Texas in exchange for Leonys Martin, a light-hitting center fielder with blazing speed. Martin didn’t quite play enough innings (334) in 2015 to qualify for the CF leaderboard, but his 15.4 Ultimate Zone Rating/150 would have ranked him 5th best among MLB center fielders, just above Lorenzo Cain. Martin, by the FanGraphs arm strength statistic, also had the strongest arm of any center fielder in baseball.

In terms of speed, Martin is as fast as they come. He’s been consistently valuable on the basepaths, posting a 4.3 and 4.2 BRR in 2014 and 2013 respectively (BRR is Baseball Prospectus’s baserunning statistic, where 0 represents an average baserunner). Martin posted a lower total BRR in 2015 (1.5), mostly because his on-base percentage dropped 61 points from 2014, and he appeared at the plate 273 fewer times (generally it’s harder to be a valuable baserunner if you don’t get on base as often).

The second move was to acquire Boog Powell, a young center-field prospect, from Tampa Bay. Powell was part of a larger trade in which Seattle received starting pitcher Nate Karns and Powell, and sent Logan Morrison and shortstop Brad Miller to the Rays. We’ll talk about Karns in the last section, but Powell further embodies Dipoto’s vision of commanding the strike zone, getting on base, and playing defense.

Powell’s defensive statistics are less clear than Martin’s, since Powell has never set foot in the major leagues, but he’s consistently graded out in the minor leagues as a plus defender. Powell is 22 and serves as outfield depth should Martin fall down a well in center field.

It’s clear that Dipoto aggressively wanted to improve the outfield defense. In his wild spree of moves, he also made his infield defense better. In trading for Lind, he made first base a slightly better-defended position (Lind posted a 3.8 UZR in 2015, compared to Logan Morrison’s -2.9). Brad Miller was a plus defensive shortstop (1.1 UZR, 4.6 dWAR), but with the emergence of the talented, young Ketel Marte (1.2 UZR, 2.8 dWAR in 310 fewer innings at SS), Dipoto knew he could afford to trade Miller.

If one looks around at the Mariners in the field, Robinson Cano and Nelson Cruz are currently the only remaining defensive liabilities, and Cruz might not see much right-field time this year. Kyle Seager is a plus defender, Aoki is capable in left, and Seth Smith improved his defense dramatically last season. The team re-signed Franklin Gutierrez (3.4 UZR, 1.9 dWAR) to split right field with Smith and Cruz. At catcher, both Iannetta and Mike Zunino are among the 10 best pitch framers in baseball, saving an aggregate 26.8 runs in 2015.

The Mariners were the 5th worst defensive team in 2015, but that looks likely to improve in 2016.

Strategy 3: Taking advantage of Dinger-hitting tendencies

When you play baseball in an extreme pitcher-friendly park, in a sea-level city whose summer nights are cool and humid, home runs are a rare commodity. The Mariners understand they won’t win by hitting home runs, but they also understand that the same difficulty exists for opposing teams. Thus, the Mariners can fill their starting rotation with pitchers who have higher-than-average fly-ball rates. Here are the totals from Mariners starters in 2015. WARP is Baseball Prospectus’s cumulative wins above replacement player statistic.

Pitcher            IP      FB%    GB%    BABIP   WARP
Felix Hernandez    201.2   26.9   56.2   .288    3.3
Taijuan Walker     169.2   39.0   38.6   .291    1.8
Hisashi Iwakuma    129.2   31.1   50.3   .271    2.5
James Paxton        67.0   34.4   48.3   .289    0.0
Roenis Elias       115.1   36.4   44.2   .280    0.9

Normally we’d expect a higher GB rate to correlate with a higher BABIP, since ground balls are more likely than fly balls to find holes and become hits. Felix has the highest GB rate in that table and still maintained a better-than-average BABIP. That’s because he’s Felix Hernandez, and he’s better than you. Iwakuma, 34, also posted a ground-ball rate of 50%, and he’s never posted a BABIP above .287. A pitcher’s BABIP stabilizes only after about 2,000 balls in play, and Iwakuma is quickly approaching that mark. Walker has the highest FB rate, so it’s probably good that he pitches where he does.
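One common way to make "normalize" concrete is to shrink an observed rate toward the league average, weighting the league average as though it were extra balls in play equal to the stabilization point. This rule of thumb is an assumption here, not a method stated in the article, and the function name is hypothetical. With a stabilization point of 2,000 balls in play, a pitcher with exactly 2,000 BIP is pulled halfway to the league mean.

```python
def regress_to_mean(observed, n, league_mean, stabilization_n):
    """Shrink an observed rate (e.g. BABIP) toward the league mean.

    observed:        the pitcher's observed rate
    n:               sample size behind it (e.g. balls in play)
    league_mean:     league-average rate to regress toward (supply your own)
    stabilization_n: sample size at which observed and league mean get equal
                     weight (an assumed rule of thumb)
    """
    return (observed * n + league_mean * stabilization_n) / (n + stabilization_n)


# Hypothetical usage: regress_to_mean(career_babip, career_bip, league_babip, 2000)
```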

Before you even get beyond the innings-pitched column, however, it’s clear the Mariners were thin on reliable starting pitching depth in 2015. Of the pitchers above, only Hernandez and Walker eclipsed 130 innings, only those two and Iwakuma were worth more than 1 WARP, and Roenis Elias is now on the Red Sox. So the offseason began, and Dipoto got to work.

Earlier we mentioned Boog Powell becoming a Mariner, but he came over as the secondary piece in the deal that landed the team starting pitcher Nate Karns from Tampa Bay. Karns had a quasi-breakout season in 2015, posting a 3.67 ERA and 3.90 xFIP in 147.2 innings pitched (xFIP is a fielding-independent ERA estimator that replaces a pitcher’s actual home runs allowed with the number expected from his fly balls at a league-average HR/FB rate). This was the first full season for the 27-year-old Karns, who also had a 36.5% fly-ball rate in 2015. Of those fly balls, 12.5% went for home runs, an above-average rate for a starting pitcher. While Tropicana Field is not an especially friendly ballpark for hitters, every other park in the AL East dramatically favors home runs, and Karns’s HR rate was likely hurt by pitching frequently at parks like Yankee Stadium and Camden Yards.

Karns should be aided by the expansive parks of the American League West, where more fly balls will become outs. If Karns matches, or even exceeds, his peripherals in 2016 while maintaining his high fly-ball rate (fly-ball rate normalizes after 70 fly balls, a total Karns exceeded long ago), he should lower his home-run rate and his BABIP. Karns also has room for regression, as HR/FB doesn’t normalize until after about 500 IP.

There is a question of Karns’s durability, since he has only one major-league season with more than 100 innings pitched, but no such question exists with Dipoto’s next trade target. A month after grabbing Karns, Dipoto traded Elias and closer Carson Smith to Boston for Wade Miley, one of the most consistently durable left-handed starters in the game. Smith was a bright spot in a bad Mariners bullpen, so Dipoto had to give up some value to acquire Miley, but the GM took that risk to bolster a shaky rotation. Miley has pitched more than 190 innings in four consecutive seasons: 2015 in Boston, and the previous three in Arizona. All four seasons featured FIPs below 4.00, and in 2015 he improved across many categories, lowering his HR/9 by 0.24 despite pitching in the AL East. It’s no stretch of the imagination for Miley to improve even further in 2016, playing in front of an overhauled Mariners defense.

Miley and Karns, 2015 Statistics
Name         IP      FB%    GB%    BABIP   WARP
Nate Karns   147     36.5   41.9   .285    1.6
Wade Miley   193.2   30.5   48.8   .307    2.5

You start to see how exploiting these park advantages becomes mutually beneficial. A speedy outfield defense will turn more of Nate Karns’s fly balls into outs, and a more solid infield defense will help turn Miley’s ground-ball hits into outs as well. On the offensive side, players who don’t strike out will put the ball in play more often, and the increased speed of the lineup will turn more of those balls in play into hits, increasing the number of baserunners. If, with all of these improvements, we still believe in Nelson Cruz’s power, Kyle Seager’s upward trajectory, and continued King Felix domination, we believe in Mariners success.


The Truth About Power, Contact, and Hitting in General

The overarching purpose of this study was to identify the core skills that underlie hitting performance and investigate the extent to which hitters must choose between these skills. The article unfolds in two parts. In Part 1, I explore the ostensible trade-off between power and contact in search of the optimal approach. Then, in Part 2, I show that 66% of the variance in wRC+ can be explained by four skill indicators: power, contact, speed, and discipline. It will be revealed that increasing hard contact should be of paramount importance to hitting coaches, while contact and discipline are complementary assets.

PART ONE: IS THERE A POWER-CONTACT TRADE-OFF?

Eli Ben-Porat recently published a terrific study on the trade-off between contact ability and power, and I will be building on his findings. As such, I will use the same sample as his study, which includes all players since 2008 who have swung at 1,000 pitches or more. First, I want to explain why a trade-off between power and contact is assumed. Not only is it intuitive that a hitter chooses between swinging for the fence and putting the ball in play; there is also clearly a trade-off between these abilities among MLB hitters. Here is a plot of the relationship between SLG on Contact and Contact%.

Figure 1. Contact Rate and SLG on Contact.

There is a strong inverse relationship between power and contact, explaining 42% of total variance.  However, Ben-Porat cited evidence that power hitters tend to face tougher pitches than light hitters, a factor that is likely to affect their contact rate.  When Ben-Porat controlled for the effect of pitch location on contact rate, the relationship between contact and power dropped to an R² of 33%. Figure 2 plots the relationship between Ben-Porat’s new True Contact, a location-independent measure of contact skill, and SLG on Contact.

Figure 2. True Contact and SLG on Contact.

While controlling for location loosened the relationship between power and contact, there still appears to be a significant inverse correlation between the skills.  Is this lingering relationship due to a necessary trade-off between hitting for power and making contact? I propose not.  Instead, consider the relationship between Fastball% and SLG on Contact.

The graph in Figure 3 plots the relationship between percentage of fastballs faced and SLG on Contact.

Figure 3.  Percentage of Fastballs Faced and SLG on Contact.

Predictably, pitchers tend to throw fewer fastballs to more powerful hitters.  To partial out the effect of pitch type, I examined the relationship between regular Contact% and SLG on Contact while controlling for Fastball%.  This strategy is similar to Ben-Porat’s approach but controls for pitch type rather than location.  The results of a simultaneous multiple regression analysis indicate that when holding Fastball% constant, Contact% explains just 12% of the variance in SLG on Contact.  In other words, most of the relationship between Contact% and SLG on Contact was due to differences in the share of fastballs faced.
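One standard way to "hold Fastball% constant" between two variables is a first-order partial correlation. The post does not say whether its 12% figure is a squared partial or a squared part (semipartial) correlation, so the sketch below only illustrates the partial version; the function names and the mapping of x, y, and z to baseball stats are hypothetical, not the author's code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both.

    Hypothetical mapping: x = Contact%, y = SLG on Contact, z = Fastball%.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r_from_data(x, y, z):
    """Compute the three zero-order correlations, then the partial correlation."""
    return partial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
```

Squaring the result gives the share of y's Fastball%-adjusted variance that x explains. If z accounts for all of the overlap between x and y (that is, r_xy equals r_xz times r_yz), the partial correlation is zero.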

To do a little better, I examined the relationship between Fastball% and True Contact.  Figure 4 shows that Fastball% accounts for about a quarter of the variance in True Contact.  Understandably, as Fastball% increases so does True Contact.

Figure 4.  Relationship between True Contact and Fastball%.

While True Contact controls for the location of pitches faced, it does not account for the proportion of fastballs faced.  When the effect of Fastball% is held constant, True Contact accounts for just 9% of the variance in SLG on Contact.  I computed a new Fastball%-independent version of True Contact, called Real Contact, and plotted it against SLG on Contact in Figure 5.

Figure 5. Relationship between Real Contact and SLG on Contact.
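The post does not spell out how Real Contact was computed. One common way to make a metric "Fastball%-independent" is to keep the residuals from a least-squares regression of True Contact on Fastball%; the sketch below assumes that approach, and `residualize` is a hypothetical helper name.

```python
def residualize(y, z):
    """Residuals of y after an ordinary least-squares fit y = a + b*z.

    The residuals have mean zero and are uncorrelated with z by construction,
    so they describe the part of y that z cannot explain.
    """
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    s_zz = sum((v - mean_z) ** 2 for v in z)
    s_zy = sum((v - mean_z) * (w - mean_y) for v, w in zip(z, y))
    slope = s_zy / s_zz
    intercept = mean_y - slope * mean_z
    return [w - (intercept + slope * v) for v, w in zip(z, y)]


# Hypothetical usage with per-hitter lists aligned by player:
# real_contact = residualize(true_contact, fastball_pct)
```

Because the residuals are centered at zero, a rescaled version (for example, with the sample mean added back) would read more like a rate; whether the author did that is not stated.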

The plot resembles a shotgun distribution with only a slight relationship between power and contact left. It is possible this remaining relationship is due to what’s left of the “trade-off hypothesis.” If so, I expected to find that an approach that maximizes slugging, such as hitting fly balls and pulling the ball, would be associated with lower Real Contact scores.  Instead, FB% explained only 2.6% and Pull% only 2.4% of total variance in Real Contact.  If there is a real trade-off between contact and power, I still can’t isolate it.

Dr. Alan Nathan has demonstrated that home runs and base hits are optimized by different swing strategies.  The implication is that there is a trade-off between base hits and power. Perhaps a contact swing is a base-hit swing. I tested this notion, and Figure 6 plots the relationship.


Figure 6.  BABIP and Real Contact.

Surprisingly, contact and BABIP are unrelated.  This is a counter-intuitive null finding, like the non-association between LD% and Hard%. In this case, I think base-hit skill requires more than simply not missing the ball.

I can’t test my final explanation, but I think selective sampling could explain the remaining small association between contact and power.  Since hitters need to achieve a minimum level of success to stay in the league, it seems unlikely for hitters to lack both power and contact skills.  Further, a hitter deficient in one skill would need to make it up with the other to avoid being released.  Since I could not find evidence to support an adjustment-based trade-off between power and contact, I assume the skills are independent moving forward.

PART TWO: POWER, CONTACT, SPEED, AND DISCIPLINE

If power and contact are separate skills, how much does each contribute to a hitter’s overall production? What about speed and discipline?  To answer these questions, I conducted a multiple regression analysis with wRC+ as the dependent variable and Hard%, Real Contact, Spd, and O-Swing% included as predictors.  The predictors were chosen to reflect power, contact, speed, and discipline because they measure each construct without including outcome data that make up wRC+. A multiple regression allows us to measure the unique contribution of each predictor to wRC+ as well as the overall variance accounted for by all the predictors together.

The correlation matrix for the four predictors and the dependent variable is presented in Figure 7.  Among the predictors, only Spd and Hard% have a zero-order correlation above .20, with an R² of 11.6%.  The four skills are mostly distinct, which means the model avoids the statistical problems of multicollinearity and singularity.

Figure 7. Correlation matrix indicating zero-order correlations in the top row, 1-tailed p-values in the second row, and sample size in the third row.

The results of the multiple regression are presented in Figure 8.  Note the adjusted R² of .66, indicating that the four predictors explained 66% of total variance in wRC+.

Figure 8. Results of multiple regression.  Hard%, Real Contact, Spd, and O-Swing% predicted 66% of variance in wRC+.
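For readers unfamiliar with the adjusted statistic, the sketch below shows the textbook formulas for R² and adjusted R², where k is the number of predictors (four here) and n is the number of hitters. The sample size is reported only in the figure, so the usage values are placeholders, not the study's numbers; the function names are hypothetical.

```python
def r_squared(observed, fitted):
    """Share of variance in `observed` accounted for by the model's `fitted` values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """R-squared penalized for the number of predictors k fit on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Placeholder inputs: an R-squared of 0.5 from 101 hitters and 4 predictors
print(round(adjusted_r_squared(0.5, 101, 4), 4))  # 0.4792
```

The penalty grows with the number of predictors and shrinks as the sample gets larger, so a four-predictor model fit on many hitters loses little in the adjustment.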

The specific contribution of each measure is indicated in Figure 9.  The Part Correlation statistic describes the unique contribution (R) of each predictor to explaining wRC+. When considering all predictors together, Hard% accounts for 60% of the variance in wRC+. The remaining three skills provide only incremental value compared to hitting the ball hard.

Figure 9.  Coefficients and Correlations from multiple regression.

The Partial Correlation statistic indicates the proportion of the remaining variance explained by each predictor while controlling for the effects of the others.  In other words, when controlling for Hard%, Spd, and O-Swing%, Real Contact explains 24% of the remaining variance in wRC+.
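The difference between the part and partial statistics is easiest to see through R² values. Dropping one predictor and refitting gives a reduced-model R². The drop in R² is the squared part correlation (variance unique to that predictor, as a share of all variance in wRC+). Dividing that drop by what the other predictors leave unexplained gives the squared partial correlation. The inputs below are illustrative only, and the function name is hypothetical.

```python
def part_and_partial_squared(r2_full, r2_reduced):
    """Squared part and partial correlations for one predictor.

    r2_full: R-squared with every predictor in the model.
    r2_reduced: R-squared after dropping the predictor of interest.
    """
    part_sq = r2_full - r2_reduced            # unique share of total variance
    partial_sq = part_sq / (1 - r2_reduced)   # share of the variance the others leave unexplained
    return part_sq, partial_sq


# Illustrative numbers only (not from the study)
part_sq, partial_sq = part_and_partial_squared(0.66, 0.50)
print(round(part_sq, 2), round(partial_sq, 2))  # 0.16 0.32
```

Because the denominator (1 minus the reduced R²) is never larger than 1, the squared partial correlation can never be smaller than the squared part correlation.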

The strength of the multiple regression approach is clear when comparing the zero-order correlations to the partial and part correlations.  In every case, the part and partial correlations are larger, suggesting that each predictor benefits from the inclusion of the others in the model. Further, the relationship between each skill and wRC+ seems more intuitive when the contribution of the other skills is accounted for.  For example, Spd has a slight negative association with wRC+ on its own, but a positive relationship accounting for 11% of the remaining variance when included with the other predictors. It makes sense that speed is helpful, all else being equal.  Similarly, Real Contact and O-Swing% have larger, more intuitive relationships to wRC+ when controlling for the other predictors.

CONCLUSION

I conducted this research from a coach’s and a player’s perspective, with the goal of identifying the ideal composition of hitting skill. Previous research has already reported a strong association between Hard% and wRC+, and this study only reaffirms the contribution of Hard% to overall production.  Given the same amount of speed, discipline, and contact skill, hard-hit percentage accounts for over two-thirds of the remaining variance in a hitter’s wRC+.

A novel finding of this study is that there is little to no trade-off between power and contact ability.  Almost all of the apparent effect was due to differences in how power hitters and light hitters are pitched.  Given the same pitches, power hitters can make as much contact as light hitters. For example, Albert Pujols ranks 10th in the sample in Hard% and 15th in Real Contact.

The truth about hitting is that every hitter is swinging the bat just about as fast as they can. They are racing 95+ mph fastballs, so they don’t really have a choice.  That doesn’t leave a lot of room for a hitter to consciously swing easier.  The hitter can choose to take a “shorter” swing, but should only do so if it results in more hard contact (or the same amount of hard contact and more overall contact). Hitting the ball hard is the name of the game. Making contact, running well, and being disciplined complete the package.


Pillar, Perez, and Our Common Bond

Oftentimes, preconceived notions inhibit our understanding of the game of baseball. From archaic methods of player evaluation to cultural expectations of players of varying ethnicity, each observer’s individual paradigm dramatically alters his or her view of the game.  Case in point: what common ground could Salvador Perez and Kevin Pillar possibly share beyond their profession? Perez, who recently inked a new contract extension with the Kansas City Royals, stands a booming 6’3” and 240 lbs. Pillar measures in at a more svelte (for professional athletes, at least) 6’0” and 205 lbs. Perez signed with the Royals in 2006, at the age of 16, as an international free agent out of tumultuous Venezuela, before Pillar had even graduated from Chaminade College Prep, a private Catholic school in the San Fernando Valley. Pillar finally signed his first professional contract after being drafted by the Toronto Blue Jays in the 32nd round of the 2011 amateur draft, less than a month before Perez, two years his junior, debuted in the majors. Examined from a cultural and personality standpoint, Perez and Pillar seem to be polar opposites.

Herein lies the beauty of baseball, and sports in general: citizens from all walks of life can come together, set aside their differences, and enjoy a common passion. From the first pitch to the final out, no one differentiates between the hulking, affable Venezuelan catcher and the agile, analytical outfielder. Indeed, if you considered only their projected on-field contributions, you might find them indistinguishable.

          G    PA   AB   H    2B  3B  HR  R   RBI  BB%   K%     AVG   OBP   SLG   OPS   wOBA  Fld  WAR
Player A  126  531  504  137  25  2   17  54  67   3.6%  14.0%  .272  .301  .431  .732  .313  5.0  3.1
Player B  142  595  556  153  34  3   12  67  66   4.3%  15.4%  .275  .311  .410  .721  .312  6.5  2.8

 

Projections courtesy of FanGraphs’ Depth Charts, a combination of ZiPS and Steamer, provide our best estimate of a player’s “true talent” level. No, projections are not infallible, but for this exercise they convey more than enough. Perez and Pillar share striking similarities in their statistical profiles: solid defense up the middle, and meager walk rates offset by better-than-average (that is, low) strikeout rates; even the “old school” stats and classic triple-slash lines bear a remarkable resemblance. The summary stats further these parallels: both players project for around 3 fWAR for the upcoming season, and only one point of wOBA separates them. The only appreciable separation comes in baserunning, where Pillar’s stolen bases give his BsR a four-run edge over Perez’s. Otherwise, Pillar and Perez mirror each other in their contributions on the diamond, the only facet we should judge players by. Perhaps more compelling is how closely their overall approaches match. The table below lists each player’s plate-discipline statistics from the 2015 season, as found on FanGraphs.

          O-Swing%  Z-Swing%  Swing%  O-Contact%  Z-Contact%  Contact%  Zone%  F-Strike%  SwStr%
Player A  43.3%     68.5%     54.7%   73.5%       90.7%       83.2%     45.1%  60.6%      9.0%
Player B  40.9%     63.0%     51.4%   73.6%       90.1%       83.2%     47.3%  65.1%      8.5%

 

Both player profiles match what we should have expected given their walk and strikeout rates above: free swingers, particularly at pitches outside the zone, with an above-average ability to make contact. (Statistically speaking, among qualified batters Perez and Pillar both rank in the top quartile in Swing%, the 96th percentile in O-Swing%, and top half in Contact%). Nonetheless, the proximity of their plate-discipline statistics encapsulates how comparably Perez and Pillar approach an at-bat.

Withholding the identities behind the two stat lines* illustrates one of the charms of the game. No matter the background, personality, religion, or whatever else, we convene to cherish a game we love. With so much animosity arising over “playing the game the right way,” cultural lines often artificially divide us. We can keep drawing these superficial distinctions, or we can focus on what ultimately matters most: the product on the field and the joy it brings to our lives.

 

*For the curious (and spoilers for those who prefer the anonymity):

Player A – Salvador Perez, Player B – Kevin Pillar


Howie Kendrick Is Finished

Howie Kendrick is not the model of league-average consistency he seems at first blush.  Kendrick is basically washed up.  Last year he posted numbers that appear consistent with his performance since 2011:  BB% in the mid-5s, K% in the mid-to-high teens, BABIP over .340, ISO hanging in at .114, and 2.1 WAR.  The plate-discipline numbers look stable, but the ISO and BABIP don’t.

The ISO was propped up by a 14.1% HR/FB that he is not going to repeat.  Last year he managed only a .114 ISO despite an elite FB distance of 305 feet, 14th-best in the majors.  His FB rate was the main culprit: it has steadily declined since he arrived in the majors in 2007, bottoming out last year at 17%.  Nor is he likely to post an elite FB distance in 2016, or anything close to his 2015 mark.  He began his career in the low-to-mid 270s (feet), peaked at 285 at age 28, and had been steadily receding back to the 270s until last year’s unlikely spike at age 32.  In all likelihood the 2015 number was driven by good fortune in a very small sample of fly balls.  Expect it to be back in the low-to-mid 270s in his age-33 season.  If he hits the same number of fly balls in 2016 as he did in 2015, but his HR/FB% is cut roughly in half, he will hit 4-5 home runs.  Moreover, his 2015 hard-hit rate (29%) was in the bottom half of the league for the first time in years, and his pull rate (27%) was a career low and the third-lowest in the majors among all batters with at least 400 PA.  All of this points to an ISO below .100.
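The 4-5 home run estimate is straightforward arithmetic: with the fly-ball count held fixed, home runs scale linearly with HR/FB, so halving the rate halves the total. A minimal sketch, with a hypothetical function name and a made-up prior total of 10 home runs (not Kendrick's actual 2015 figure):

```python
def projected_hr(prior_hr, prior_hr_per_fb, new_hr_per_fb):
    """Home runs expected at a new HR/FB rate, holding the fly-ball count fixed."""
    fly_balls = prior_hr / prior_hr_per_fb  # fly balls implied by last season's totals
    return fly_balls * new_hr_per_fb


# Made-up input: 10 prior home runs at a 14.1% HR/FB, with the rate cut in half
print(round(projected_hr(10, 0.141, 0.141 / 2), 1))  # 5.0
```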

His BABIP won’t crater.  He doesn’t pop up and keeps the ball on the ground.  But his BABIP isn’t going to stay over .340 forever, and I would take the under in 2016.  Last year’s homers will be this year’s fly ball outs.  Overall, he’s not hitting the ball as hard.  Nor is he getting any faster.  And, because he can no longer pull the ball — particularly balls hit in the air — he should be getting easier to shift against.  Steamer’s projection of .324 seems about right.

Altogether, he’s looking at a .290-ish wOBA, bottom of the pile for regular second basemen.  Add in his projected league-average baserunning and defense, and he’s worth about 1 WAR.  Steamer has him at 2 WAR (based on a projected .316 wOBA); ZiPS projects 1.9 WAR (.317 wOBA); and the fans project 2.7 WAR (.322 wOBA).  These figures are double to triple what he is likely to produce.  Note, however, that the Dodgers are paying Kendrick $20 million for 2016 and 2017 combined.  Assuming $8 million per WAR, the Dodgers are valuing him at only 1.25 WAR per season.  To no one’s surprise, it seems Friedman and company have this one right.  Also, Kendrick does have a career wOBA platoon split of .325 vs. righties and .340 vs. lefties.  One way to squeeze additional value from Kendrick (and keep him healthy) at this stage of his career might be a semi-platoon with Utley, who himself sports a career platoon split and projects better against righties than Kendrick does.
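The contract math above can be written out as a one-line calculation. This sketch assumes the $20 million covers both seasons in total and uses the post's $8 million-per-WAR rate; the function name is hypothetical.

```python
def implied_war_per_season(total_salary, seasons, dollars_per_war):
    """WAR per season that a contract pays for at a given market price per win."""
    return total_salary / seasons / dollars_per_war


print(implied_war_per_season(20_000_000, 2, 8_000_000))  # 1.25
```

Trying a different market price per win only requires changing the last argument.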


The Park Effect: Ignore Minnesota’s Korean Slugger at Your Peril

The Premise: Byung-ho Park will be a very good, and potentially great, first baseman/DH as soon as this season.

The Format: A typical line of discourse between a Park believer — such as myself — and a Park-skeptic.

The First Argument: Park comes from a league with little track record of successful MLB transplants — after all, if Eric Thames can be a star, how good can the league be?

The Rebuttal: It is true that the Korean Baseball Organization (KBO) has sent very few players to the major leagues. However, consider these caveats before rendering judgement. Unlike in Japan, where baseball has ruled supreme for decades, the sport has only really taken off in Korea in the last 20 years, spurred largely by the success of Chan-ho Park in Korea and then in the majors. Now, however, the country is baseball-crazy: their national team is among the best in the world and the KBO is by far the most popular professional sports league in Korea. This dramatic rise in interest has led to a correspondingly dramatic rise in baseball infrastructure as more talent is discovered and developed from an early age. The early success of Hyun-jin Ryu and Jung-ho Kang in the United States speaks to the ability of the Korean infrastructure to develop its top-tier talents. Korean national teams regularly beat Americans and others on the international stage. The notion that Korea is not on the same level as baseball-playing nations such as Japan, Cuba, and the Dominican Republic is a farce.

The Second Argument: Park strikes out too much to be an effective major-league player.

The Rebuttal:  There are two responses to this, one league-oriented and one player-oriented. Implicit in this argument is the notion that the KBO is sufficiently worse than MLB that all numbers should be significantly adjusted to account for better pitchers in MLB. While the average KBO pitcher is undeniably worse than the average MLB pitcher, it is worth noting that Cuban League pitching is also decidedly below-average (see this piece by BA’s very talented international correspondent Ben Badler), and Cuban hitters are being snatched up like airline tickets after a decimal point error.

Second, a look at Park’s past seasons reveals an interesting shift in approach. Park’s K% in 2012 and 2013 was 19.8% and 17.2%, respectively, and his slugging percentages were .561 and .602. In 2014, his slugging percentage jumped to .686, but his K% also climbed to 24.8%. Since strikeout rate is a stat which normalizes fairly quickly — 60 PAs, according to FanGraphs — and the overall KBO strikeout rate actually declined from 2013 to 2014 (from 17.3 percent to 16.7 percent), we have to assume that Park changed something in his approach.* My conclusion, given what we know about power hitters striking out more in general, is that Park decided to trade contact for power, much like Mike Trout did before the 2014 season. This is indicative both of Park’s recognition of his strengths as a player, which speaks to his baseball intelligence and ability to learn, and also to his adaptiveness at the plate. If he is striking out too much, I am confident that he can reorient his approach and still be a highly valuable player.
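Indexing Park's strikeout rate to the league rate makes the shift easier to see, because it strips out the league-wide decline. The sketch below uses the KBO figures quoted above; the function name and the 100-equals-league-average scaling (similar in spirit to FanGraphs' plus stats) are illustrative choices, and no league rate for 2012 is given here, so that season is omitted.

```python
def k_rate_index(player_k_pct, league_k_pct):
    """Strikeout rate relative to league average, scaled so that 100 = average."""
    return 100 * player_k_pct / league_k_pct


# KBO figures quoted in the text (percentages)
for season, player, league in [(2013, 17.2, 17.3), (2014, 24.8, 16.7)]:
    print(season, round(k_rate_index(player, league), 1))
# 2013 99.4
# 2014 148.5
```

On this scale, Park struck out at about the league rate in 2013 and roughly 50% more often than the league in 2014.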

The Caveats: There is, of course, no guarantee that Park will succeed in Minnesota. MLB competition is significantly better than any other league anywhere, and there will be a learning curve for Park as he adjusts to MLB pitching. The steeper hurdle in my mind, however, is culture: American culture is very different from the Korean culture with which he is comfortable. Jung-ho Kang, thanks to no small helping of self-confidence, a good team environment, and a penchant for the dramatic, has thrived in Pittsburgh, but there is no guarantee that Park will adjust as successfully or as quickly.

The Conclusion: These caveats aside, drafting (or signing) Byung-ho Park is a risk worth taking. He will be cheap and the upside is enormous. Acquire Park with confidence; there is a good chance that in the not-so-distant future, both you and the Twins will be the proud owners of one of the best power hitters, and best bargains, in baseball.**

 

*KBO stats pulled from baseball-reference.com
**Read Dan Farnsworth’s recently published Twins prospect list for further analysis of Park


xHR%: Questing for a Formula (Part 3)

Part 3 of a series of posts regarding a new statistic, xHR%, and its obvious resultant, xHR. This article will examine formulas 2 and 3. 

As a reminder, I have attempted to create a new statistic, xHR%, from which xHR (expected home runs) can be derived. xHR% is a descriptive statistic, meaning that it calculates what should have happened in a given season. In searching for the best formula possible, I came up with three different variations, pictured below.

Today, I’m going to examine formulas 2 and 3 to measure their viability as formulas for xHR%. Hopefully the analysis will shed some light on a murky matter. Formula 2 will likely end up being the best one because it probably balances in-season performance with prior performance better than formula 3, which relies more heavily on in-season performance. As a result, formula 3 will probably correlate too closely with what actually happened (the same outcome is likely for formula 2).

Methodology

Luckily for me and the readers, the process was a simple one. Pulling data from FanGraphs player pages, ESPN’s Home Run Tracker, and various Google searches, I compiled a data set from which to proceed. From FanGraphs, I collected all information for Part Two of the formula, including plate appearances and home runs. Unfortunately, because a few of the players from the sample were rookies or had fewer than three years of major league experience, I had to use regressed minor league numbers. In some cases, where that data wasn’t available, I dug through old scouting reports to find translatable game power numbers based on scouting grades (and used a denominator of 600 plate appearances).

Then, from ESPN’s Home Run Tracker website, I obtained all relevant data for player home-run distance, average home-run distance for the player at home, and league average home-run distance. Due to my limited time, I only used players that qualified for the batting title during the 2015 season, yielding a potentially weak sample of only 130 players. Additionally, before anyone complains, please realize that the purpose of my research at this point is to obtain the most viable formula and refine it from there so that it can be applied across a wider population.

Results for Formula 2

Using Microsoft Excel, I calculated the resultant xHR% and xHR. Some key data points:

League Average HR% (actual):  3.03%

Average xHR%:  2.89%

Average Home Runs: 18.7

Expected Home Runs: 17.8

Please note that there is a significant amount of survivorship bias in this data. That is, because all of these players played enough to qualify for the batting title, they are likely significantly better than replacement level, which is why the percentages and home runs seem so high.

Correlation between xHR% and HR%: 0.974418884

R² for above: 0.949492162

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.4265261

Correlation between xHR and HR: 0.977796283

R² for above: 0.956085571

HR Standard Deviation:  10.43771886

xHR Standard Deviation: 9.474596069

Results for Formula 3

League Average HR% (actual):  3.03%

Average xHR%:  2.92%

Average Home Runs: 18.7

Expected Home Runs: 18.1

Again, note the survivorship bias that comes with having a slightly skewed sample.

Correlation between xHR% and HR%: 0.986440621

R² for above: 0.973065099

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.4615323

Correlation between xHR and HR: 0.988287804

R² for above: 0.976712783

HR Standard Deviation:  10.43771886

xHR Standard Deviation: 9.698203408

Mostly Boring Analysis

I have opted to condense the analysis into one section instead of two because it would have otherwise been repetitive and boring.

I understand that that’s a lot to process, but the data really isn’t all that dissimilar. The expected home-run percentage is slightly lower than the actual home-run percentage for both of them, but it isn’t a massive difference by any means. When prorated to a 600 plate appearance season, xHR% for formula 2 predicts that the average player in the sample would have hit 17.3 home runs, while formula 3’s xHR% expects that the average home-run total would have been 17.5. In reality the average player hit 18.2 home runs per 600 plate appearances, so both were fairly close (maybe too close).
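The per-600-PA figures are simple proration: multiply a home-run rate (as a fraction) by 600. A minimal sketch that reproduces the numbers in the paragraph above; the helper name is hypothetical. By the same logic, a player's xHR is presumably his xHR% times his actual plate appearances, though that is an assumption rather than something stated in this part.

```python
def per_600_pa(rate_pct):
    """Home runs over a 600-plate-appearance season at a rate given in percent."""
    return rate_pct / 100 * 600


for label, rate in [("Formula 2 xHR%", 2.89), ("Formula 3 xHR%", 2.92), ("Actual HR%", 3.03)]:
    print(label, round(per_600_pa(rate), 1))
# Formula 2 xHR% 17.3
# Formula 3 xHR% 17.5
# Actual HR% 18.2
```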

Both formulas had extremely high correlations, with formula 3's being only trivially higher. More importantly, formula 2 explains about 95% of the variance, while formula 3 accounts for about 97%. The difference between those is relatively unimportant because both explain a very high share of what occurred. Furthermore, p < .001 (in fact, many times lower than that), so the correlations are statistically significant.
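With a single predictor, R² is just the correlation squared, which makes the reported figures easy to check (0.974418884² ≈ 0.9495 for formula 2 and 0.986440621² ≈ 0.9731 for formula 3). The sketch below computes r, R², and standard deviations from paired lists using only the standard library. It assumes the spreadsheet used the sample standard deviation (Excel's `STDEV`), which the post does not specify, and `fit_summary` is a hypothetical name.

```python
import math
import statistics


def fit_summary(actual, expected):
    """Pearson r, R-squared, and sample standard deviations for paired data."""
    mean_a = statistics.fmean(actual)
    mean_e = statistics.fmean(expected)
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, expected))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_e = sum((e - mean_e) ** 2 for e in expected)
    r = cov / math.sqrt(ss_a * ss_e)
    return {
        "r": r,
        "r_squared": r * r,  # with one predictor, R-squared equals r squared
        "sd_actual": statistics.stdev(actual),  # sample SD, like Excel's STDEV
        "sd_expected": statistics.stdev(expected),
    }


# Check against the reported formula 2 and formula 3 correlations
print(round(0.974418884 ** 2, 4), round(0.986440621 ** 2, 4))  # 0.9495 0.9731
```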

Both formulas resulted in slightly lower standard deviations than what actually occurred, which is a recurring theme. In these formulas, the numbers have been clumped a little bit closer together and tend to underestimate rather than overestimate.

Players of Interest

Mr. Kole Calhoun – Last season he hit 26 home runs, but by both formulas he should have hit 3-4 fewer. This is likely because his only previous full season was 2014, when he hit only 17, and because I was forced to use scouting grades for his third season. The scouting grades were particularly off for Calhoun because he wasn’t even expected to be good enough for the majors, let alone become an above-average, high-value outfielder. Even though his overall offensive production declined this past season (by 20 points of wRC+), he didn’t appear to be selling out for power, as his power-profile numbers (FB%, Pull%, etc.) remained the same. Personally, I would expect him to regress next season, and I think the formula agrees with me.

Mr. Nolan Arenado – Arguably having the most unexpected offensive breakout of the season, he increased his home-run totals from 10 in 2013, to 18 in 2014, and finally to an astonishing 42 in 2015. While his totals were probably slightly Coors-inflated, they were real for the most part because his average home-run distance was excellent, in addition to the fact that 22 of his dingers came on the road. Arenado is young and likely to regress somewhat in the power department, but he is probably around to stay as a significant home-run threat. The formula was likely wrong on this one due to weighting of prior seasons, so go ahead and make the lazy Todd Helton comparison.

Mr. Carlos Gonzalez – Though Arenado’s teammate hit the most home runs of his career (40) in 2015, it isn’t clear that he was anywhere near his peak statistically. His wRC+ was six points below his career average, and he was a net below-average player. All of this leads to the conclusion that he was selling out for power, which makes sense given that he lost over fifty points of batting average and on-base percentage from his 2010-13 peak years. While a viable argument could be made that his “subpar” performance was due to injuries, a better one could be made that his home runs were in part a result of playing half his games at Coors Field, where he hit 60% of his round-trippers. The formula says he should have hit about seven fewer home runs, which may be a best-case scenario for next season given his penchant for injury. Additionally, while the Rockies are by no means full of talent, if Gonzalez continues his overall downward trend, he could get traded and lose the Coors advantage, or he could lose playing time.

Keep watch for a concluding piece in the next week. Criticism would be highly appreciated, but keep in mind that I’m still in high school and have yet to actually study statistics.


xHR%: Questing for a Formula (Part 2)

Part 2 of a series of posts regarding a new statistic, xHR%, and its obvious resultant, xHR. This article will examine formula 1. The primer, Part 1, was published March 4.

As a reminder, I have conceptualized a new statistic, xHR%, from which xHR (expected home runs) can be derived. Furthermore, xHR% is a descriptive statistic, meaning that it calculates what should have happened in a given season rather than what will happen or what actually happened. In searching for the best formula possible, I came up with three different variations, all pictured below with explanations.

HRD – Average Home Run Distance. The given player’s HRD is calculated with ESPN’s home run tracker.

AHRDH – Average Home Run Distance Home. Using only Y1 data, this is the average distance of all home runs hit at the player’s home stadium.

AHRDL – Average Home Run Distance League. Using only Y1 data, this is the average distance of all home runs hit in both the National League and the American League.

Y3HR – The number of home runs hit by the player in the oldest of the three years in the sample. Y2HR and Y1HR follow the same idea. In cases where major league data isn’t available, regressed minor league numbers will be used. If that data doesn’t exist either, then I will be very irritated and proceed to use translated scouting grades.

PA – Plate appearances

(Apologies for my rather long-winded reminder, but if you really forgot everything from Part 1, then you should really invest in some Vitamin E supplements and/or reread the first post.)

The focus formula of this post is the first one, which also happens to be the one I think will work the least well because it relies too heavily on prior seasons to provide an accurate and precise estimate of what should have happened in a given season.

Because the second piece of the formula counts only fifty percent of the results from the season being studied, it likely fails to account for the fact that breakouts occur with regularity. As a result, it probably predicts stagnation rather than progress.

Methodology

Luckily for me and the readers, the process was an incredibly simple one. Pulling data from FanGraphs player pages, ESPN’s Home Run Tracker, and various Google searches, I compiled a data set from which to proceed. From FanGraphs, I collected all information for Part Two of the formula, including plate appearances and home runs. Unfortunately, because a few of the players from the sample were rookies or had fewer than three years of major league experience, I had to use regressed minor league numbers. In some cases, where that data wasn’t available, I dug through old scouting reports to find translatable game power numbers based on scouting grades (and used a denominator of 600 plate appearances).

Then, from ESPN’s amazingly in-depth Home Run Tracker website, I obtained all relevant data for player home run distance, average home run distance for the player at home, and league average home run distance. Due to my limited time, I only used players that qualified for the batting title during the 2015 season, yielding an iffy sample of only 130 players. Additionally, before anyone complains, please realize that the purpose of my research at this point is only to obtain the most viable formula and refine it from there.

Results

Using Microsoft Excel, I calculated the resultant xHR% and xHR. Some key data points:

League Average HR% (actual):  3.03%

Average xHR%:  2.85%

Average Home Runs: 18.7

Expected Home Runs: 17.7

Please note that there is a significant amount of survivorship bias in this data. That is, because all of these players played enough to qualify for the batting title, they are likely significantly better than replacement level, which is why the percentages and home runs seem so high.

Clearly, the numbers match up fairly well: this version of the formula expects that the league should have hit home runs at a rate .18 percentage points lower, and about one fewer per player. That may look like a meaningful gap, but over the course of a 600 plate appearance season the difference is still only a little more than one home run, an acceptable distance.

Correlation between xHR% and HR%: 0.960506092

R² (xHR% vs. HR%): 0.922571953

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.3883746

Correlation between xHR and HR: 0.966224253

R² (xHR vs. HR): 0.933589307

HR Standard Deviation: 10.43771886

xHR Standard Deviation: 9.201355342

While xHR% from this formula appears to explain about 92% of the variance in HR%, correlation may not be the best way of determining whether the formula works adequately. This holds at least for xHR% versus HR%: the difference between the two is minuscule (but one that matters), and correlation cannot detect it, because it measures whether two numbers rise and fall together rather than whether they agree in level. A formula that ran low by the same amount for every player would still correlate perfectly, so correlation may not have the descriptive power I’m looking for. Nevertheless, it is important to note that the correlation is very unlikely to be a product of chance (p < .005). Unsurprisingly, the standard deviation of xHR% is smaller than that of HR% (1.39 versus 1.58, a modest difference), indicating that the formula’s estimates are clustered somewhat more tightly around the mean, which is potentially a good thing in terms of regression.
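The point that a high correlation can hide a small systematic gap can be illustrated with a toy example. This is a minimal Python sketch, not the method used for this article: the five HR% values are made up, `pearson_r`, `mean_error`, and `r_t_statistic` are hypothetical helper names, and the significance check uses the standard t test of a Pearson correlation against zero with n - 2 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_error(predicted, actual):
    """Average signed error (bias); negative means the predictions run low."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)

def r_t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up HR% values for five hitters (illustration only, not from the data set).
actual = [2.1, 3.0, 4.4, 1.5, 5.2]
# A hypothetical "formula" that is exactly 0.18 points low for every hitter.
predicted = [a - 0.18 for a in actual]

print(pearson_r(predicted, actual))     # perfect correlation (1.0, up to rounding)...
print(mean_error(predicted, actual))    # ...despite a systematic bias of about -0.18
print(r_t_statistic(0.960506092, 130))  # reported r with the 130-player sample
```

With the reported r of about 0.96 and 130 players, the t statistic comes out near 39, far beyond what p < .005 requires, so the significance claim holds comfortably. A bias measure such as `mean_error` is one way to capture the small level gap that correlation ignores.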

A better indicator of the success of the formula is the correlation between xHR and HR, a relatively high value of ≈.97. Here, presumably because the separation between home runs and expected home runs is greater, the formula ostensibly explains approximately 93% of the variance in actual home runs. The standard deviation for actual home runs is about 10.4, while for xHR it’s about 9.2, suggesting that, once both are multiplied out by plate appearances, xHR is spread out nearly as widely as HR. Ergo, it likely serves as a decent predictor of actual home runs.
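The R² values and relative spreads discussed above follow directly from the reported correlations and standard deviations. A minimal Python sketch that recomputes them, with inputs copied from the Results section (`fit_summary` is a hypothetical helper name):

```python
def fit_summary(r, sd_actual, sd_expected):
    """Return (R^2, SD ratio) for a correlation and the two standard deviations."""
    r_squared = r ** 2                  # share of variance a linear fit explains
    sd_ratio = sd_expected / sd_actual  # spread of the expectations relative to reality
    return r_squared, sd_ratio

# Figures as reported in the Results section: (r, SD of actual, SD of expected).
reported = {
    "xHR% vs. HR%": (0.960506092, 1.5769373, 1.3883746),
    "xHR vs. HR": (0.966224253, 10.43771886, 9.201355342),
}

for label, (r, sd_actual, sd_expected) in reported.items():
    r_squared, sd_ratio = fit_summary(r, sd_actual, sd_expected)
    print(f"{label}: R^2 = {r_squared:.1%}, SD ratio = {sd_ratio:.2f}")
```

This gives an R² of about 92.3% for xHR% versus HR% and about 93.4% for xHR versus HR. In both comparisons the expected values spread out roughly 88% as much as the actual ones.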

Players of Interest

Mr. Bryce Harper – There is likely no better candidate for regression according to this formula than Bryce Harper, who the formula says should have hit only 32 home runs as opposed to his actual total of 42. While he did lead his league in “Just Enough” home runs with 15, he has also always been known for prodigious power (or at least the potential for it). Furthermore, Mr. Harper dramatically changed his peripherals last season to ones more conducive to power: he increased his pull percentage from 38.9% to 45.4%, his hard-hit percentage from 32% to 40%, and his fly-ball percentage from 34.6% to 39.3%. Each of these changes on its own lends credence to the idea that Harper reshaped himself into more of a home run hitter, and taken together they strongly suggest it. His season was no fluke, and the formula certainly failed him here because it weighted prior seasons far too heavily.

Mr. Brian Dozier – No surprises here. Mr. Dozier has been trending upward for a long time, and a model that heavily weights prior performance, such as this one, punishes upticks in performance. Nevertheless, the data loosely supports the idea that Dozier should have hit 24 home runs instead of 28. While he did significantly increase his pull percentage, from 53% to an incredibly high 60%, he plays in a park that is only average in how hard it is for right-handed hitters to pull home runs. Moreover, 10 of his 28 home runs were rated as “Just Enough” home runs, and his average home run distance was 12 feet below average (admittedly not a huge number, nor a perfect way of measuring power). If I were a betting man, I’d expect him to hit 4-6 fewer home runs this coming season.

Keep watch for Part 3 in the coming days, which will detail the results of the other formulas. One thing to watch for in this series is whether a formula’s results correspond too closely to what actually happened; a formula that merely reproduces past outcomes would be useless for prediction.

Note that because I have never formally taken a statistics course, I am prone to errors in my conclusions. Please point out any such errors and make suggestions as you see fit.


ZiPS, Steamer and Fans Projections, Visualized

Steamer and ZiPS, the two main projection systems used at this site, have similar outlooks on the futures of most players. However, the two models vary widely in a few cases, and it can be confusing to figure out why.

To try to visualize exactly how ZiPS, Steamer and the FanGraphs Fan Projections looked at players, I first averaged all three systems’ 2016 predictions for each player. Then, after calculating how far each projection was from this average, I performed principal component analysis (PCA) to compare the differences in outlooks for all 284 players. (Fan projections were adjusted so that they had the same average as Steamer and ZiPS.)

I primarily looked at three predicted stats: wOBA (for general offense), Fielding (for general defense), and WAR per 600 plate appearances (for general value).
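The procedure described above (average the three systems, take each system’s deviation from that average, then run PCA) can be sketched in plain Python for one stat at a time. This is a minimal sketch under stated assumptions, not the code behind these graphs: the input layout, the function names (`projection_pca`, `top_eigen`), the use of power iteration to find the components, and the example numbers are all hypothetical. It does not apply the Fan adjustment separately, because centering each system’s deviations already removes any constant offset between systems.

```python
import math

SYSTEMS = ["ZiPS", "Steamer", "Fans"]

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_eigen(m, iters=1000):
    """Largest eigenvalue/eigenvector of a symmetric 3x3 PSD matrix via power iteration."""
    v = normalize([1.0, 0.6, 0.3])  # arbitrary start vector
    for _ in range(iters):
        v = normalize(mat_vec(m, v))
    lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return lam, v

def projection_pca(projections):
    """projections: player -> [ZiPS, Steamer, Fans] value for a single stat (e.g. wOBA)."""
    players = list(projections)
    n = len(players)
    # 1. Deviation of each system from the three-system average for that player.
    devs = []
    for p in players:
        row = projections[p]
        avg = sum(row) / 3
        devs.append([x - avg for x in row])
    # 2. Center each column; this also removes any constant offset between systems.
    means = [sum(d[j] for d in devs) / n for j in range(3)]
    centered = [[d[j] - means[j] for j in range(3)] for d in devs]
    # 3. 3x3 sample covariance matrix of the centered deviations.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(3)]
           for i in range(3)]
    # 4. First two principal components, deflating after finding the first.
    lam1, v1 = top_eigen(cov)
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(3)] for i in range(3)]
    lam2, v2 = top_eigen(deflated)
    # Player coordinates on the plot, plus one arrow (loading) per system.
    # The sign of each component is arbitrary.
    scores = {
        p: (sum(c[k] * v1[k] for k in range(3)), sum(c[k] * v2[k] for k in range(3)))
        for p, c in zip(players, centered)
    }
    loadings = {name: (v1[k], v2[k]) for k, name in enumerate(SYSTEMS)}
    return scores, loadings, (lam1, lam2)

# Made-up wOBA projections for illustration only.
example = {
    "Player A": [0.340, 0.320, 0.330],
    "Player B": [0.300, 0.330, 0.315],
    "Player C": [0.360, 0.350, 0.340],
    "Player D": [0.310, 0.325, 0.330],
    "Player E": [0.350, 0.330, 0.310],
}
scores, loadings, variances = projection_pca(example)
for name, arrow in loadings.items():
    print(name, [round(x, 3) for x in arrow])
```

Because each player’s three deviations sum to zero, the data lie in a plane, so the first two components capture all of the variance and a two-dimensional plot loses nothing. Projecting a player’s point onto a system’s arrow recovers that system’s centered deviation for the player, which is why players far out along an arrow are the ones that system likes more than the other two do.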

The results:

Projected Offense (wOBA):

(Each arrow points in the direction in which that system projects a player higher; for instance, on this graph Daniel Murphy is much better liked by Steamer than by the Fans, while Colby Rasmus is much better liked by ZiPS than by the Fans. Players toward the middle are projected similarly by all three.)

Projected Defense (Fld):


(This one is pretty crowded, but the players in the middle aren’t that interesting; it’s the ones on the outside we’re looking for.)

Projected Overall Value (WAR/600 PA):


ZiPS seems to favor lumbering home run hitters more than the other two systems do, but it’s tough to draw any hard conclusions without further analysis that eyeballing these graphs can’t provide.