Archive for Research

The Truth About Hitting the Ball Hard

March 19, 2016

I recently presented evidence that power and contact are independent skills. An increase in power does not have to come at the cost of contact. Surely intuition disagrees with these findings and when that happens you should be skeptical. I would be skeptical.

One reason a trade-off between power and contact is intuitive is that we are accustomed to speed-accuracy trade-offs for many everyday actions. For example, we slow down when we pour a fresh cup of coffee because going too fast is dangerous. Implicitly, we assume there is a speed-accuracy trade-off when we suggest that hitters can cut down on their swing to achieve more contact. Richard A. Schmidt is like the Bill James of my field — motor behaviour — and in 1979 he and his colleagues published the Theory of Accuracy for Rapid Tasks. According to Google it has been cited over 1200 times. While speed-accuracy trade-offs for movement are typical, the theory explains that rapid timing tasks like hitting are an exception to this rule.

The theory is a dense 46-pager including equations, but I’ll provide a couple of critical graphs to illustrate its implications for hitting. First, Figure 1 presents the results of an experiment that investigated the effect of movement distance and movement time on spatial error.

Figure 1. Spatial error (“W_e”–deg.) as a function of movement time (MT–msec) and movement distance (A–deg).

The results indicate that movement time, that is, movement speed, had almost no impact on spatial error. You’ll notice that the movement times tested in this experiment are conveniently reflective of a short and a long MLB swing (per Zepp). The movement distances in the experiment were shorter than a swing, and the task far simpler, but the results are suggestive nonetheless.

A second experiment explored the effect of movement speed and distance on timing error. Unlike the experiment above, movement speed did have an effect on timing error. Figure 2 presents data indicating that faster movements result in significantly less timing error than slower movements, irrespective of movement distance.

Figure 2. Timing error (VE_t–msec) as a function of movement time (MT–msec) and distance (A–deg).

In addition to these two examples, there is a substantial empirical and theoretical framework suggesting rapid timing tasks are exempt from a speed-accuracy trade-off. Swinging slower does not increase a hitter’s chance to make contact. On the basis of these data and the data I presented previously, it seems that hitters can try to hit the ball as hard as possible, within reason, without sacrificing contact or base-hit skill.

UNDERSTANDING HARD%

Power, contact, speed and discipline account for 66% of variance in hitting production. Power, measured by Hard%, is by far the most important skill. But what does Hard% measure, exactly? The description of Hard% can be found in the glossary here. Basically, Hard% describes the proportion of batted balls that meet unknown criteria for “hardness” based on hit-type, hang-time, landing-spot, and trajectory. Importantly, Hard% does not include exit speed in its calculation.

In the plot below, Average Exit Speed for players with a minimum of 190 ABs in 2015 is plotted against their Hard%. It is pretty clear from Figure 3 that while Hard% doesn’t directly measure exit speed, it does a pretty good job of estimating it.

Figure 3. Average Exit Speed and Hard%.

Given the tight relationship between Average Exit Speed and Hard%, I wondered if both measures were equally effective at predicting production. The graphs in Figure 4 and Figure 5 present both power measures plotted against wRC+.

Figure 4. Hard% and wRC+.

Figure 5. Average Exit Speed and wRC+.

Hard% does a better job of predicting production than Average Exit Speed, explaining about 23% more variance. Since exit speed is a more direct measurement of power than Hard%, it follows that the non-power-related data included in Hard% are relevant to production. Previous research suggests that hit-type and trajectory are important to the outcome of a batted ball, and since both variables are used to calculate Hard%, it seems likely they contribute to the relationship between Hard% and wRC+.
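For readers who want to reproduce this kind of comparison, here is a minimal sketch. With a single predictor, the variance explained (R²) is just the squared Pearson correlation. The numbers are made up for five hypothetical hitters, not taken from the 2015 sample, and the function names are my own. I am also assuming that "more variance" means the simple difference between the two R² values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_explained(predictor, outcome):
    """R^2 of a one-predictor linear fit, i.e. the squared correlation."""
    return pearson_r(predictor, outcome) ** 2

# Made-up values for five hypothetical hitters (NOT real 2015 data).
hard_pct = [25.0, 30.0, 33.0, 36.0, 40.0]
avg_exit_speed = [87.0, 90.5, 88.5, 91.0, 92.0]
wrc_plus = [85, 100, 108, 118, 135]

r2_hard = variance_explained(hard_pct, wrc_plus)
r2_speed = variance_explained(avg_exit_speed, wrc_plus)
print(f"Hard%: {r2_hard:.2f}  Exit speed: {r2_speed:.2f}  gap: {r2_hard - r2_speed:.2f}")
```

Swapping in the real Hard%, Average Exit Speed and wRC+ columns would give the gap between the two fits.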

INTRODUCING LIFT BIAS

Trajectory is tightly linked to outcome and hitters only control the trajectory (or angle) they intend to hit the ball on. We have no way to measure hitters’ intentions. The only data on vertical launch angle that I’ve been able to access are extremely limited, or incomplete, so we can’t estimate hitters’ intentions based on results. If we had a database of swing-plane information we could estimate each hitter’s intentions based on his average swing plane relative to the pitch, but we don’t have such a database. What we do have are data on each hitter’s average exit velocity on ground balls, as well as their average exit velocity on line drives and fly balls. If we assume that each hitter is trying to hit the ball as forcefully as possible along their intended trajectory, and further assume that over the course of a season exit velocity will be maximal around the force vector intended by the hitter, then we can infer each hitter’s bias toward lower or higher trajectory hits by subtracting their average ground-ball velocity from their average line-drive / fly-ball velocity. The lower the resultant value, the lower the trajectory we can assume the hitter intended. I examined the relationship between AvgLD/FB – AvgGB (or, Lift Bias) and Hard% and the results are in Figure 6 below.

Figure 6. Lift Bias and Hard%.
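The Lift Bias calculation described above is a single subtraction. A minimal sketch, with a hypothetical function name and made-up velocities rather than real player data:

```python
def lift_bias(avg_ld_fb_mph, avg_gb_mph):
    """Lift Bias = average LD/FB exit velocity minus average GB exit velocity."""
    return avg_ld_fb_mph - avg_gb_mph

# Hypothetical hitters with made-up velocities in mph (illustration only).
hitters = {
    "Hitter A": (95.0, 88.0),  # hits the ball harder in the air
    "Hitter B": (90.5, 91.0),  # hits the ball harder on the ground
}
biases = {name: lift_bias(ld_fb, gb) for name, (ld_fb, gb) in hitters.items()}
print(biases)
```

A positive value means the hitter's air balls come off the bat faster than his grounders. A negative value means the reverse.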

Almost every hitter in the sample hit the ball harder in the air than on the ground. Only Melky Cabrera, Jason Heyward, and Nick Markakis hit their ground balls harder than their line drives and fly balls in 2015. As suspected, almost every hitter appears to be trying to hit the ball in the air. There is an apparent relationship between Lift Bias and Hard%, suggesting that hitters who intend to hit the ball on a higher angle tend to record more hard hits per contact. To see if this was due to harder hitters choosing to lift the ball more, I examined the relationship between Average Exit Speed and Lift Bias and the results are presented in Figure 6 below.

Figure 6. Average Exit Speed and Lift Bias.

Surprisingly, there is practically no relationship between Average Exit Speed and Lift Bias. This suggests that Lift Bias is associated with Hard% independent of how forcefully a hitter strikes the ball. Since Lift Bias and Average Exit Speed are independent predictors of Hard%, I modeled the effect of both simultaneously with multiple regression. The model explained 75% of variance in Hard% overall, and the part and partial correlations are reported in Figure 7 below.

Figure 7. Multiple regression coefficients.

The part correlation value in Figure 7 indicates the unique variance explained by each predictor. Thus, Average Exit Speed explained 52% of the total variance in Hard%. The partial correlation value describes the proportion of the remaining variance explained by one predictor after accounting for the other. Thus, after accounting for Average Exit Speed, Lift Bias explained 26% of the remaining variance in Hard%.
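To make the distinction concrete, here is a sketch that computes both quantities for a two-predictor model from the three pairwise correlations, using the standard closed-form formulas. The squared part correlation is a share of the total variance, and the squared partial correlation is a share of what the other predictor leaves unexplained. The data are made up, and I am not claiming this is how the values in Figure 7 were produced.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_predictor_summary(y, x1, x2):
    """R^2 plus part (semipartial) and partial correlations for y ~ x1 + x2."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    tolerance = 1 - r_12 ** 2  # variance in one predictor not shared with the other
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / tolerance
    part_1 = (r_y1 - r_y2 * r_12) / math.sqrt(tolerance)
    part_2 = (r_y2 - r_y1 * r_12) / math.sqrt(tolerance)
    # Partial correlations rescale by what the OTHER predictor leaves unexplained.
    partial_1 = part_1 / math.sqrt(1 - r_y2 ** 2)
    partial_2 = part_2 / math.sqrt(1 - r_y1 ** 2)
    return {"r_squared": r_squared, "part_1": part_1, "part_2": part_2,
            "partial_1": partial_1, "partial_2": partial_2}

# Made-up values for six hypothetical hitters (NOT real 2015 data).
exit_speed = [88.0, 90.0, 91.0, 89.0, 92.0, 87.0]
lift_bias = [3.0, 6.0, 2.0, 7.0, 5.0, 4.0]
hard_pct = [28.0, 33.0, 31.0, 32.0, 36.0, 27.0]

summary = two_predictor_summary(hard_pct, exit_speed, lift_bias)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A useful check is that the squared part correlation for one predictor equals the overall R² minus the squared simple correlation of the other predictor.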

In order to determine how much of the relationship between Hard% and production can be accounted for by Average Exit Speed and Lift Bias, I plotted predicted Hard% against wRC+. The results indicate that Average Exit Speed and Lift Bias together account for almost, but not quite all, of the relationship between Hard% and wRC+. See Figure 8 below.

Figure 8. Predicted Hard% and wRC+.

If you compare Figure 8 and Figure 4, you can see that real Hard% still explains more of wRC+ than predicted Hard%, but the predicted values are getting close. Since Hard% is based on the result of each hit rather than a tendency to hit balls harder in the air or on the ground, it makes sense that Hard% should be more related to performance. It is impressive that two variables not directly measured in Hard% explain so much of its variance, as well as such a high percentage of its relationship to wRC+.

DOES LIFT BIAS COME WITH A TRADE-OFF?

One of the most interesting results described above is the null relationship between exit speed and Lift Bias, suggesting that an increase in Lift Bias may be beneficial regardless of power. Yet again, intuition kicks in protesting that while it might be more effective for power hitters to try to lift the ball, when light hitters lift the ball the result is a fly out. Since Lift Bias is unrelated to exit speed, examining the relationship between Lift Bias and BABIP should give a hint as to whether increasing Lift Bias decreases the chances of getting at least a single.

Figure 9. Lift Bias and Batting Average on Balls in Play (BABIP).

Lift bias apparently has no relationship to BABIP, which seems counterintuitive. Does lift bias even have an effect on batted-ball type? Not really. The relationship depicted in Figure 10 below is the strongest of all, and even then Lift Bias only explains 8% of the total variance in GB%.

Figure 10. Lift Bias and Ground Ball Rate (GB%).

The launch angle of a batted ball depends more on the offset of the ball and bat at contact than on the attack angle of the swing. Thus, perhaps it shouldn’t be too surprising that an ostensible measure of swing plane has little relationship to batted ball distribution. While offset largely determines launch angle, swings that have more positive attack angles (to a point) are more optimal for batted ball distance. If Lift Bias is based on a more positive attack angle, we might expect to see a positive relationship between Lift Bias and HR/FB. In fact, as shown in Figure 11, Lift Bias accounts for 30% of the variance in home runs per fly ball.

Figure 11. Lift Bias and Home Runs per Fly Ball (HR/FB).

Lift Bias has a strong relationship to average distance, and a smaller but still significant relationship to maximum recorded distance as well. These data suggest that swing plane may be responsible for at least part of the observed Lift Bias, since increased Lift Bias seems to optimize batted-ball distance.

If swing plane does drive Lift Bias, one might expect a trade-off between Lift Bias and contact skill. Since pitches are typically thrown on a negative angle of around 6 degrees, and attack angles exceeding 6 degrees can result in farther hits, it follows that hitters may be using a more severe uppercut than a 6 degree “level” swing to generate Lift Bias.

I used the Real Contact measure from my previous study to estimate contact skill for the hitters who have data in the 2015 sample. The results indicated that Lift Bias is negatively associated with Real Contact, accounting for about 20% of the variance. This is the first hint of the nuance between slugging and contact, suggesting that hitters may be using steep swing planes to generate lift. Conversely, Real Contact was unrelated to Average Exit Speed, confirming the absence of a trade-off between force and accuracy.

COMPARISON OF PLAYERS WITH MOST OR LEAST LIFT BIAS

It still seems counterintuitive that all players would benefit from having a lift bias in the top range of the sample. Is it possible that players at either end of the Lift Bias distribution are especially powerful or light-hitting, causing the appearance of a true relationship but reflecting only selective sampling? To examine the players with the most extreme Lift Bias (or lack thereof), I divided the sample into two groups with the 50 most Lift Biased and 50 least Lift Biased players. First, I tested for differences in the potential to generate power by comparing the two groups on maximum recorded exit speed. The group with the most Lift Bias had a mean Max Exit Speed of 111 mph, while the low Lift Bias group had a mean of 110 mph. There is little difference in power potential between the most Lift Biased players and the least.

Next, I tested for differences in power production by comparing the groups on HR/FB. As you can see in Figure 12, the high Lift Bias group (.167) saw their fly balls leave the park over twice as often as the low Lift Bias group (.074).

Figure 12. Mean HR/FB for the Low Lift Bias and High Lift Bias groups. Error bars represent 95% confidence intervals.

Finally, I compared the two groups on overall production. The high Lift Bias group had a mean wRC+ of 117, while the low Lift Bias group had a mean of 93. The players with the largest Lift Bias are, on average, substantially better than league average. Conversely, the players with the smallest Lift Bias are somewhat worse than the league average. Figure 13 presents the observed means with error bars representing 95% confidence intervals.

Figure 13. Mean wRC+ for the Low Lift Bias and High Lift Bias groups.
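The extreme-groups comparison can be sketched as follows: rank players by Lift Bias, take the k lowest and k highest, and compare group means with approximate 95% confidence intervals. The players are made up, k is 2 rather than 50 to keep the toy sample small, and the interval uses a normal approximation (z = 1.96), which is my assumption rather than a documented detail of the original analysis.

```python
import math
import statistics

def split_extremes(players, key, k):
    """Return the k lowest and k highest players ranked by `key`."""
    ranked = sorted(players, key=lambda p: p[key])
    return ranked[:k], ranked[-k:]

def mean_ci95(values):
    """Mean with an approximate 95% CI (normal approximation, z = 1.96)."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * std_err, mean + 1.96 * std_err

# Made-up players (NOT real 2015 data).
players = [
    {"name": "A", "lift_bias": 1.0, "hr_fb": 0.06},
    {"name": "B", "lift_bias": 2.0, "hr_fb": 0.08},
    {"name": "C", "lift_bias": 5.0, "hr_fb": 0.11},
    {"name": "D", "lift_bias": 8.0, "hr_fb": 0.15},
    {"name": "E", "lift_bias": 9.0, "hr_fb": 0.19},
]

low, high = split_extremes(players, "lift_bias", k=2)
low_mean, low_lo, low_hi = mean_ci95([p["hr_fb"] for p in low])
high_mean, high_lo, high_hi = mean_ci95([p["hr_fb"] for p in high])
print(f"Low Lift Bias HR/FB:  {low_mean:.3f} [{low_lo:.3f}, {low_hi:.3f}]")
print(f"High Lift Bias HR/FB: {high_mean:.3f} [{high_lo:.3f}, {high_hi:.3f}]")
```

The same two functions work for wRC+ or maximum exit speed by changing the field passed to `mean_ci95`.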

The players with a large Lift Bias have basically the same power potential as the players with the least bias, yet they have much more power production. The extra power production completely accounts for the difference in overall production between the groups, which is substantial.

CONCLUSION

Over the last two articles, I have been detailing a hierarchy of measurable skills that explain the majority of variance in hitting production. Further, I have demonstrated that there is little trade-off between skills. Fast exit velocity does not come at the expense of contact, and Lift Bias does not come at the expense of base hits. There does appear to be a small trade-off between Lift Bias and contact, suggesting that situational hitting could require adjusting swing plane or intended trajectory.

Power is the most important skill to production and comprises two sub-skills: hitting balls harder on average (measured by Average Exit Speed), and generating more Lift Bias (measured by subtracting AvgGB velocity from AvgLD/FB velocity). The next most important is contact skill, which was estimated by partialing the effect of Fastball% out of True Contact (a location-independent measure of contact), to provide an estimate of real contact ability independent of how a hitter is pitched. Finally, speed and discipline (represented by Spd and O-Swing%) are equally important skills, but much less important than power. Figure 14 depicts the relative importance of each skill in estimating production.

Figure 14. The relative importance of hitting skills.

It is tempting to assume this model is causal, when in fact the data are all correlational. If the data were causal, the conclusions for hitting coaches would be obvious: a) Optimizing exit speed with efficient mechanics and hard work should be an ongoing goal for every player, b) Players should focus on driving the ball in the air and the hitting coach should help his hitters optimize their Lift Bias, c) Equally important, hitters should practice their contact skills against all pitch types on a situational basis, d) Discipline, which can be trained, should get about half the attention that contact receives, and e) The league is full of underachievers – assuming Lift Bias is a learnable skill.

Science will require experimental evidence before concluding that the skill hierarchy provides a causal explanation of hitting production. Hitters and coaches may not want to wait around. Hey, Kevin Pillar! Give me a call…

Using Contact Rates to Evaluate Pitchers

by acastle_10

March 19, 2016

A little over a month ago, I published this piece detailing the methods that I had created to alternately assess hitter performance. I highly recommend glancing at that article before reading this one; it will make a whole lot more sense. For the lazy, here is a brief primer: I focused on using rates (contact, hard%, etc.) to create rough estimates of what would happen on any given pitch. What is the probability that Mike Trout hits a hard line drive on a pitch in the strike zone? If a player does that more often, is he more likely to be a successful hitter overall? One of the advantages of this approach is that it helps to separate the actions of a hitter from his circumstances; a hard line drive is a hard line drive, but the placement of it will greatly affect whether or not the player reaches base. Poor defense, such as one may find in the minor leagues or college ball, is made less important in judging a player.

One of the questions remaining was whether or not I could apply some of these same methods to evaluating pitching. So far, the answer is a qualified yes. We already have a number of metrics to determine pitching value without regard for circumstance, but these methods still provide useful insights. Using the existing methods, such as xFIP, we can determine which rate stats are strong indicators of success.

There is one result that emerged above all else: there is no such thing as a weak-contact pitcher. There is a significant amount of talk about pitchers “keeping the ball in the park” or “getting weak ground balls.” However, this method indicates no such thing. By simply multiplying contact rates with “Soft%” for all 2015 qualified pitchers and therefore creating the “SoftXCont” statistic, I was able to search for any correlation between this rate and xFIP. Judge the results for yourself:

[Image: SoftXCont vs. xFIP]

Clearly, almost no correlation. However, remember that this only examines the aggregate; perhaps some specific pitchers can leverage this so-called skill to great effect. But, it appears that at least on average, generating weak contact is a poor indicator of overall pitching success.

The opposite is true of hard contact. Pitchers who allowed less hard contact, as measured by my “HardXCont” number, posted substantially better (lower) xFIPs.

[Image: HardXCont vs. xFIP]

The correlation is relatively strong, especially compared to the correlations seen in other baseball metrics. Clearly there is something going on here; pitchers who allow less hard contact per pitch get better results. Duh. For an even more clean-cut view of this, we can look at GoodXCont, which uses a combination of “Hard” and “Medium” contact.

[Image: GoodXCont vs. xFIP]

That correlation is excellent, and indicates that measuring GoodXCont would be a powerful way of evaluating pitchers.
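For anyone who wants to try this on their own data, here is a rough sketch of the three rates and their correlation with xFIP. SoftXCont is Contact% times Soft%, as described above. I am assuming HardXCont is built the same way from Hard%, and that GoodXCont is Contact% times the sum of Hard% and Med%. The pitchers below are made up, and all rates are written as fractions rather than percentages.

```python
import math

def contact_quality_rates(contact, soft, med, hard):
    """Per-pitch contact-quality rates; all inputs are fractions (0 to 1)."""
    return {
        "SoftXCont": contact * soft,
        "HardXCont": contact * hard,
        "GoodXCont": contact * (hard + med),  # assumed combination of Hard and Medium
    }

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up pitchers (NOT real 2015 data).
pitchers = [
    {"contact": 0.80, "soft": 0.20, "med": 0.50, "hard": 0.30, "xfip": 4.10},
    {"contact": 0.74, "soft": 0.18, "med": 0.55, "hard": 0.27, "xfip": 3.40},
    {"contact": 0.78, "soft": 0.22, "med": 0.48, "hard": 0.30, "xfip": 3.90},
    {"contact": 0.70, "soft": 0.19, "med": 0.56, "hard": 0.25, "xfip": 3.00},
]

rates = [contact_quality_rates(p["contact"], p["soft"], p["med"], p["hard"])
         for p in pitchers]
xfips = [p["xfip"] for p in pitchers]
correlations = {name: pearson_r([r[name] for r in rates], xfips)
                for name in ("SoftXCont", "HardXCont", "GoodXCont")}
print(correlations)
```

A positive correlation here means that more of that kind of contact goes with a higher (worse) xFIP.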

So, we see that pitchers who limit hard contact and good contact are more successful than their peers. We also see that allowing a large amount of soft contact is not indicative of overall success. The “weak contact” type pitchers (think Rick Porcello) are not necessarily succeeding thanks to any particular ability to generate soft contact; any corresponding ability comes more from being able to allow less hard contact.

For scouts, this means finding pitchers who both limit total contact and allow only poor contact. By using these metrics, rather than the outdated ERA or a radar gun, they can get a strong impression of future big-league success.

In a future piece, I plan to dive deeper into research on “soft contact” pitchers. While these initial results indicate that soft contact is not a good indicator of overall success, there is further work to be done. Stay tuned.

The Mariners are Finally Using Safeco Field Correctly

by Leland

March 17, 2016

It’s no trade secret that playing to the strengths of your ballpark helps your chances to succeed. To gain an advantage, franchises can exploit, and sometimes even manipulate, their home ballpark. If you run the Astros or Reds, who play baseball in a lunchbox, you can succeed by employing otherwise-flawed home-run hitters with little regard for who gets on base ahead of them. When you play half your games in an airplane hangar, however, stubbornly attempting to put the ball over 900-foot fences is foolish. A foolish strategy common to recent Mariners teams. A foolish strategy that wasn’t working.

M’s Team Stats	OBP	ML Rank	SLG Pct.	ML Rank	wOBA	ML Rank
2015	.311	22	.411	12	.313	17
2014	.300	27	.376	21	.299	25
2013	.306	26	.390	20	.307	20
2012	.296	30	.369	30	.291	30
2011	.292	30	.348	30	.283	30
2010	.298	30	.339	30	.285	30

If you have a weak stomach, do not view the last few rows.

The Mariners wrote the Greatest Hits on failing to get on base and, not surprisingly, struggled to win games during those seasons. For years and years, the Mariners tried succeeding with players like Logan Morrison, Michael Morse, and Mark Trumbo, desperately clinging to the home run as the heralded harbinger of scoring runs. Whether this was evidence of a failing regime by general manager Jack Zduriencik remains up for debate, but the front office had seen enough. Around the same time, a wayward GM separated cleanly from the Mariners’ division-rival Angels was seeking asylum, armed with his own vision of building a team.

Strategy 1: Get on Base

Jerry Dipoto, presumably having read Moneyball, understood the value of getting baserunners, and how to get players on base.

“Command the strike zone,” Dipoto told Justin Myers and Gee Scott on their ESPN 710 Seattle radio segment. “From the top of the lineup to the bottom, we will command the strike zone.”

Dipoto began addressing the team’s glaring need for baserunners by signing catcher Chris Iannetta, who had played for Dipoto in Anaheim, and had posted OBP numbers over .350 in 2011, 2013 and 2014. Dipoto found further help by trading for Adam Lind (.360 OBP in 2015) and signing free agent Norichika Aoki (.353 OBP in 2015, 6.4 K%).

None of these moves were meant to be earth-shattering, but each undoubtedly made the Mariners lineup better. With a solid core of Robinson Cano, Nelson Cruz, and Kyle Seager, Dipoto’s goal was to fill the remaining slots with valuable role players, each of whom is more than capable of getting on base.

Here is a table of several key Mariners offseason additions, with 2015 statistics and 2016 ZiPS projections courtesy of Dan Szymborski. Note that season projections are often more conservative estimates, as they account for a certain level of player regression.

Player	Season	OBP	wOBA	BB%	K%
Chris Iannetta	2015	.293	.281	12.9	26.2
Chris Iannetta	2016 (ZiPS)	.329	.306	14.0	25.8
Adam Lind	2015	.360	.351	11.5	17.5
Adam Lind	2016 (ZiPS)	.334	.315	10.1	19.5
Nori Aoki	2015	.353	.326	7.7	6.4
Nori Aoki	2016 (ZiPS)	.332	.313	7.0	7.8

Strategy 2: Prevent runs, Create runs

Dipoto, addressing the shortcomings of that revolutionary A’s season, also understood the value of defense and speed. “We see ourselves as a run-prevention club. You can create a lot of advantage playing good defense. We also see our overall team defense as our biggest area in need of improvement.”

Dipoto went primarily after well-rounded players, but several moves in particular focused on defense and speed. In November, Dipoto traded closer Tom Wilhelmsen to Texas in exchange for Leonys Martin, a light-hitting center fielder with blazing speed. Martin didn’t quite play enough innings (334) in 2015 to qualify for the CF leaderboard, but his 15.4 Ultimate Zone Rating/150 would have ranked him fifth-best among MLB center fielders, just above Lorenzo Cain. Martin, by the FanGraphs arm strength statistic, also had the strongest arm of any center fielder in baseball.

In terms of speed, Martin is as fast as they come. He’s been consistently valuable on the basepaths, posting a 4.3 and 4.2 BRR in 2014 and 2013 respectively (BRR is Baseball Prospectus’s baserunning statistic, where 0 represents an average baserunner). Martin posted a lower total BRR in 2015 (1.5), mostly because his on-base percentage dropped 61 points from 2014, and he appeared at the plate 273 fewer times (generally it’s harder to be a valuable baserunner if you don’t get on base as often).

The second move was to acquire Boog Powell, a young center-field prospect, from Tampa Bay. Powell was part of a larger trade, wherein Seattle received starting pitcher Nate Karns and Powell, and sent Logan Morrison and shortstop Brad Miller to the Rays. We’ll talk about Karns in the last section, but Powell further embodies Dipoto’s vision of commanding the strike zone, getting on base, and playing defense.

Powell’s defensive statistics are less clear than Martin’s, since Powell has never set foot in the major leagues, but he’s consistently graded out in the minor leagues as a plus defender. Powell is 22, and serves as outfield depth should Martin fall down a well in center field.

It’s clear that Dipoto aggressively wanted to improve the outfield defense. In his wild spree of moves, he also made his infield defense better. In trading for Lind, he incrementally made first base a more well-defended position (Lind posted a 3.8 UZR in 2015, compared to Logan Morrison’s -2.9). Brad Miller was a plus defensive shortstop (1.1 UZR, 4.6 dWAR), but with the emergence of talented, young Ketel Marte (1.2 UZR, 2.8 dWAR in 310 fewer innings at SS), Dipoto knew he could afford to trade Miller.

If one looks around at the Mariners in the field, Robinson Cano and Nelson Cruz are currently the only remaining defensive liabilities, and Cruz might not see much right-field time this year. Kyle Seager is a plus defender, Aoki is capable in left, and Seth Smith improved his defense dramatically last season. The team re-signed Franklin Gutierrez (3.4 UZR, 1.9 dWAR) to split right field with Smith and Cruz. At the catcher position, both Iannetta and Mike Zunino are among the 10 best pitch framers in baseball, saving an aggregate 26.8 runs in 2015.

The Mariners were the fifth-worst defensive team in 2015, but that looks likely to improve in 2016.

Strategy 3: Taking advantage of Dinger-hitting tendencies

When you play baseball in an extreme pitcher-friendly park, in a sea-level city whose summer nights are cool and humid, home runs are a rare commodity. The Mariners understand they won’t win by hitting home runs, but they also understand that the same difficulty exists for opposing teams. Thus, the Mariners can fill their starting rotation with pitchers with higher than average fly-ball rates. Here are the totals from Mariners starters in 2015. WARP is Baseball Prospectus’s cumulative wins above replacement player statistic.

	IP	FB%	GB%	BABIP	WARP
Felix Hernandez	201.2	26.9	56.2	.288	3.3
Taijuan Walker	169.2	39.0	38.6	.291	1.8
Hisashi Iwakuma	129.2	31.1	50.3	.271	2.5
James Paxton	67.0	34.4	48.3	.289	0.0
Roenis Elias	115.1	36.4	44.2	.280	0.9

Normally we’d expect a higher GB rate to correlate with a higher BABIP, since it’s more likely for ground balls to find holes and become hits than it is for fly balls. Felix has the highest GB rate in that table, and still maintained a better-than-average BABIP. That’s because he’s Felix Hernandez, and he’s better than you. Iwakuma, 34, also posted a ground-ball rate of 50%, and he’s never posted a BABIP above .287. After 2000 balls in play, a pitcher’s BABIP will normalize, and Iwakuma is quickly approaching that. Walker has the highest FB rate, so it’s probably good that he pitches where he does.

Before you even get beyond the innings pitched column, however, it’s clear the Mariners were thin on reliable starting pitching depth in 2015. Out of the players above, only Hernandez and Walker eclipsed 130 innings, only those two and Iwakuma provided any sort of positive contribution, and Roenis Elias is now on the Red Sox. So the offseason began, and Dipoto got to work.

Earlier we mentioned Boog Powell becoming a Mariner, but he came over as a secondary piece in the trade that landed the team starting pitcher Nate Karns from Tampa Bay. Karns had a quasi-breakout season in 2015, posting a 3.67 ERA and 3.90 xFIP in 147 innings pitched (xFIP is a Fielding Independent Pitching statistic that takes fly-ball rate into account). This was the first full season for the 27-year-old Karns, who also had a 36.5% fly-ball rate in 2015. Of those fly balls, 12.5% went for home runs, an above-average rate for a starting pitcher. While Tropicana Field is not an especially friendly ballpark for hitters, every other park in the AL East dramatically favors home runs, and Karns’s HR rate was likely hurt by pitching frequently at parks like Yankee Stadium and Camden Yards.

Karns should be aided by the expansive parks of the American League West, where more fly balls will become outs. If Karns matches, or even exceeds his peripherals in 2016, while maintaining his high fly-ball rate (fly-ball rate normalizes after 70 fly balls, a total Karns exceeded long ago), he should lower his home-run rate, and his BABIP. Karns also has room for regression, as HR/FB doesn’t normalize until after about 500 IP.

There is a question of Karns’s durability, as he has only one major-league season with over 100 innings pitched, but no such question exists with Dipoto’s next trade target. A month after grabbing Karns, Dipoto traded Elias and closer Carson Smith to Boston for Wade Miley, one of the most consistently durable left-handed starters in the game. Smith was a bright spot in a bad Mariners bullpen, so Dipoto had to give up some value to acquire Miley, but the GM took that risk to bolster a shaky rotation. Miley has pitched more than 190 innings in four consecutive seasons: 2015 in Boston, and the previous three in Arizona. All of those seasons featured FIPs below 4, and he improved across many categories in 2015, lowering his home run/9 rate by .24 despite pitching in the AL East. It’s no stretch of the imagination for Miley to improve even further in 2016, playing in front of an overhauled Mariners defense.

Miley and Karns, 2015 Statistics

Name	IP	FB%	GB%	BABIP	WARP
Nate Karns	147	36.5	41.9	.285	1.6
Wade Miley	193.2	30.5	48.8	.307	2.5

You start to see how exploiting these park advantages becomes mutually beneficial. A speedy outfield defense will turn more of Nate Karns’s fly balls into outs, and a more solid infield defense will help turn Miley’s ground-ball hits into outs as well. On the offensive side, players who don’t strike out will put the ball in play more often, and the increased speed of the lineup will turn more of those balls in play into hits, increasing the number of baserunners. If, with all of these improvements, we still believe in Nelson Cruz’s power, Kyle Seager’s upward trajectory, and continued King Felix domination, we believe in Mariners success.

The Truth About Power, Contact, and Hitting in General

by Brad McKay

March 16, 2016

The overarching purpose of this study was to identify the core skills that underlie hitting performance and investigate the extent to which hitters must choose between these skills. The article unfolds in two parts. In Part 1, I explore the ostensible trade-off between power and contact in search of the optimal approach. Then in Part 2, I show that 66% of variance in wRC+ can be explained by four skill-indicators: power, contact, speed, and discipline. It will be revealed that increasing hard contact should be of paramount importance to hitting coaches, while contact and discipline are complementary assets.

PART ONE: IS THERE A POWER-CONTACT TRADE-OFF?

Eli Ben-Porat recently published a terrific study on the trade-off between contact ability and power and I will be building on his findings. As such, I will be using the same sample as his study, which includes all players since 2008 who have swung at 1000 pitches or more. First, I want to explain why it is assumed that there is a trade-off between power and contact. Not only is it intuitive that a hitter chooses between swinging for the fence and putting the ball in play — there is also clearly a trade-off between abilities among MLB hitters. Here is a plot of the relationship between SLG on Contact and Contact%.

Figure 1. Contact Rate and SLG on Contact.

There is a strong inverse relationship between power and contact, explaining 42% of total variance. However, Ben-Porat cited evidence that power hitters tend to face tougher pitches than light hitters, a factor that is likely to affect their contact rate. When Ben-Porat controlled for the effect of pitch location on contact rate, the relationship between contact and power dropped to an R² of 33%. Figure 2 plots the relationship between Ben-Porat’s new True Contact, a location-independent measure of contact skill, and SLG on Contact.

Figure 2. True Contact and SLG on Contact.

While controlling for location loosened the relationship between power and contact, there still appears to be a significant inverse correlation between the skills. Is this lingering relationship due to a necessary trade-off between hitting for power and making contact? I propose not. Instead, consider the relationship between Fastball% and SLG on Contact.

The graph in Figure 3 plots the relationship between percentage of fastballs faced and SLG on Contact.

Figure 3. Percentage of Fastballs Faced and SLG on Contact.

Predictably, pitchers tend to throw fewer fastballs to more powerful hitters. To parcel out the effect of pitch type, I examined the relationship between regular Contact% and SLG on Contact while controlling for Fastball%. This strategy is similar to Ben-Porat’s approach but controls for pitch type rather than location. The results of a simultaneous multiple regression analysis indicate that when holding Fastball% constant, Contact% explains just 12% of the variance in SLG on Contact. In other words, most of the relationship between Contact% and SLG on Contact was due to differences in the number of fastballs faced.
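To make "holding Fastball% constant" concrete, here is a minimal sketch of one common way to measure it: compare the R² of a model containing only the control variable with the R² of a model containing both predictors. The data below are synthetic stand-ins, the helper `r_squared` is a hypothetical name, and this is not necessarily the exact procedure used for the figures in this article.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

# Synthetic illustration only (NOT the article's sample): fastball share
# drives both contact rate and slugging on contact.
rng = np.random.default_rng(0)
n = 500
fastball_pct = rng.normal(0.0, 1.0, n)
contact_pct = 0.6 * fastball_pct + rng.normal(0.0, 1.0, n)
slg_on_contact = -0.7 * fastball_pct + rng.normal(0.0, 1.0, n)

r2_contact_alone = r_squared(contact_pct, slg_on_contact)
r2_fastball_only = r_squared(fastball_pct, slg_on_contact)
r2_both = r_squared(np.column_stack([fastball_pct, contact_pct]), slg_on_contact)

# Share of variance in SLG on Contact that Contact% adds once Fastball%
# is already in the model.
unique_contact = r2_both - r2_fastball_only

print(f"Contact% alone:            R^2 = {r2_contact_alone:.3f}")
print(f"Contact% beyond Fastball%: R^2 = {unique_contact:.3f}")
```

In this toy data set the two measures are related only through fastball share, so the second number shrinks toward zero even though the first looks meaningful.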

To do a little better, I examined the relationship between Fastball% and True Contact. Figure 4 shows that Fastball% accounts for about a quarter of the variance in True Contact. Understandably, as Fastball% increases so does True Contact.

Figure 4. Relationship between True Contact and Fastball%.

While True Contact controls for the location of pitches faced, it does not account for the proportion of fastballs faced. When the effect of Fastball% is held constant, True Contact accounts for just 9% of the variance in SLG on Contact. I computed a new Fastball%-independent version of True Contact, called Real Contact, and plotted it against SLG on Contact in Figure 5.

Figure 5. Relationship between Real Contact and SLG on Contact.
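One way a Fastball%-independent contact measure could be built is by residualization: regress True Contact on Fastball% and keep what the regression cannot predict. The sketch below assumes that approach; the function name `residualize` and the synthetic numbers are illustrative, and the article does not spell out the exact computation behind Real Contact.

```python
import numpy as np

def residualize(measure, covariate):
    """Return the part of `measure` not linearly predictable from `covariate`."""
    slope, intercept = np.polyfit(covariate, measure, 1)
    return measure - (slope * covariate + intercept)

# Synthetic illustration only: contact skill that rises with fastball share.
rng = np.random.default_rng(1)
fastball_pct = rng.normal(55.0, 4.0, 300)
true_contact = 0.5 * fastball_pct + rng.normal(50.0, 3.0, 300)

# Hypothetical "Real Contact": True Contact with the Fastball% trend removed.
real_contact = residualize(true_contact, fastball_pct)

corr = np.corrcoef(real_contact, fastball_pct)[0, 1]
print(f"Correlation of residualized contact with Fastball%: {corr:.6f}")
```

By construction the residualized measure is uncorrelated with Fastball%, so any relationship it still shows with SLG on Contact cannot be attributed to the linear effect of pitch mix.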

The plot resembles a shotgun distribution with only a slight relationship between power and contact left. It is possible this remaining relationship is due to what’s left of the “trade-off hypothesis.” If so, I suspected there would be evidence that an approach that maximizes slugging, such as hitting fly balls and pulling the ball, would be associated with lower Real Contact scores. Instead, FB% explained only 2.6% and Pull% only 2.4% of total variance in Real Contact. If there is a real trade-off between contact and power, I still can’t isolate it.

Dr. Alan Nathan has demonstrated that home runs and base hits are optimized by different swing strategies. The implication is that there is a trade-off between base hits and power. Perhaps a contact swing is a base-hit swing. I tested this notion, and Figure 6 plots the relationship.

Figure 6. BABIP and Real Contact.

Surprisingly, contact and BABIP are unrelated. This is a counter-intuitive null finding, like the non-association between LD% and Hard%. In this case, I think base-hit skill requires more than not-missing.

I can’t test my final explanation, but I think selective sampling could explain the remaining small association between contact and power. Since hitters need to achieve a minimum level of success to stay in the league, it seems unlikely for hitters to lack both power and contact skills. Further, a hitter deficient in one skill would need to make it up with the other to avoid being released. Since I could not find evidence to support an adjustment-based trade-off between power and contact, I assume the skills are independent moving forward.

PART TWO: POWER, CONTACT, SPEED, AND DISCIPLINE

If power and contact are separate skills, how much does each contribute to a hitter’s overall production? What about speed and discipline? To answer these questions, I conducted a multiple regression analysis with wRC+ as the dependent variable and Hard%, Real Contact, Spd, and O-Swing% included as predictors. The predictors were chosen to reflect power, contact, speed, and discipline because they measure each construct without including outcome data that make up wRC+. A multiple regression allows us to measure the unique contribution of each predictor on wRC+ as well as the overall variance accounted for by all the predictors.

The correlation matrix for the four predictors and one dependent variable is presented in Figure 7. Only Spd and Hard% have a zero-order correlation over .20, with an R² of 11.6%. The four skills are mostly unique, which means the model avoids statistical problems of multicollinearity and singularity.

Figure 7. Correlation matrix indicating zero-order correlations in the top row, 1-tailed p-values in the second row, and sample size in the third row.

The results of the multiple regression are presented in Figure 8. Note the adjusted R² of .66 indicating that the four predictors explained 66% of total variance in wRC+.

Figure 8. Results of multiple regression. Hard%, Real Contact, Spd, and O-Swing% predicted 66% of variance in wRC+.

The specific contribution of each measure is indicated in Figure 9. The Part Correlation statistic describes the unique contribution (R) of each predictor to explaining wRC+. When considering all predictors together, Hard% accounts for 60% of the variance in wRC+. The remaining three skills provide only incremental value compared to hitting the ball hard.

Figure 9. Coefficients and Correlations from multiple regression.

The Partial Correlation statistic indicates the proportion of the remaining variance explained by each predictor while controlling for the effects of the others. In other words, when controlling for Hard%, Spd, and O-Swing%, Real Contact explains 24% of the remaining variance in wRC+.

The strength of the multiple regression approach is clear when comparing the zero-order correlations to the partial and part correlations. In every case, the part and partial correlations are larger, suggesting that each predictor benefits from the inclusion of the others in the model. Further, the relationship between each skill and wRC+ seems more intuitive when the contribution of the other skills is accounted for. For example, Spd has a slight negative association with wRC+ on its own, but a positive relationship accounting for 11% of the remaining variance when included with the other predictors. It makes sense that speed is helpful, all else being equal. Similarly, Real Contact and O-Swing% have larger, more intuitive relationships to wRC+ when controlling for all predictors.
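For readers who want to see how part and partial correlations differ, here is a minimal sketch on synthetic data. Both start by removing the other predictors from the predictor of interest; the part (semi-partial) correlation then relates that residual to the raw outcome, while the partial correlation relates it to the outcome with the other predictors removed as well. Squaring the part correlation gives the share of total variance unique to a predictor, and squaring the partial gives the share of the variance the other predictors left unexplained. The function names and the simulated coefficients are assumptions for illustration, not the article's data or software output.

```python
import numpy as np

def _residuals(target, others):
    """Residuals of an OLS fit of `target` on `others` (intercept added)."""
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def part_and_partial(y, X, j):
    """Part (semi-partial) and partial correlation of column j of X with y."""
    others = np.delete(X, j, axis=1)
    x_res = _residuals(X[:, j], others)  # predictor j, other predictors removed
    y_res = _residuals(y, others)        # outcome, other predictors removed
    part = np.corrcoef(y, x_res)[0, 1]
    partial = np.corrcoef(y_res, x_res)[0, 1]
    return part, partial

# Synthetic stand-ins for Hard%, Real Contact, Spd, and O-Swing%.
rng = np.random.default_rng(2)
names = ["Hard%", "Real Contact", "Spd", "O-Swing%"]
X = rng.normal(size=(400, 4))
y = X @ np.array([0.8, 0.3, 0.2, -0.3]) + rng.normal(0.0, 0.5, 400)

results = {name: part_and_partial(y, X, j) for j, name in enumerate(names)}
for name, (part, partial) in results.items():
    print(f"{name:>12}: part = {part:+.3f}, partial = {partial:+.3f}")
```

The partial correlation is never smaller in magnitude than the part correlation, because its denominator uses only the outcome variance left over after the other predictors have had their say.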

CONCLUSION

I conducted this research from a coach and player’s perspective, with the goal of identifying the ideal composition of hitting skill. Previous research has already reported a strong association between Hard% and wRC+, and this study only reaffirms the contribution of Hard% to overall production. Given the same amount of speed, discipline, and contact skill, hard-hit percentage accounts for over two-thirds of remaining variance in a hitter’s wRC+.

A novel finding of this study is that there is little to no trade-off between power and contact ability. Almost all of the apparent effect was due to differences in how power hitters and light hitters are pitched. Given the same pitches, power hitters can make as much contact as light hitters. For example, Albert Pujols ranks 10th in the sample in Hard% and 15th in Real Contact.

The truth about hitting is that every hitter is swinging the bat just about as fast as they can. They are racing 95+, so they don’t really have a choice. That doesn’t leave a lot of room for a hitter to consciously swing easier. The hitter can choose to take a “shorter” swing, but should only do so if it results in more hard contact (or the same amount and more overall contact). Hitting the ball hard is the name of the game. Making contact, running well, and being disciplined complete the package.

xHR%: Questing for a Formula (Part 3)

by Jackson Mejia

March 8, 2016

This is Part 3 of a series of posts regarding a new statistic, xHR%, and its obvious resultant, xHR. This article will examine formulas 2 and 3.

As a reminder, I have attempted to create a new statistic, xHR%, from which xHR (expected home runs) can be derived. xHR% is a descriptive statistic, meaning that it calculates what should have happened in a given season. In searching for the best formula possible, I came up with three different variations, pictured below.

Today, I’m going to examine formulas 2 and 3 to measure their viability as formulas for xHR%. Hopefully the analysis will shine some light on a murky matter. Likely, formula 2 will end up being the best one because it probably balances in-season performance with prior performance better than formula 3, which has a heavier reliance on in-season performance. As a result, formula 3 will likely end up correlating too well with what actually happened (and the same outcome is possible for formula 2).

Methodology

Luckily for me and the readers, the process was a simple one. Pulling data from FanGraphs player pages, ESPN’s Home Run Tracker, and various Google searches, I compiled a data set from which to proceed. From FanGraphs, I collected all information for Part Two of the formula, including plate appearances and home runs. Unfortunately, because a few of the players from the sample were rookies or had fewer than three years of major league experience, I had to use regressed minor league numbers. In some cases, where that data wasn’t applicable, I dug through old scouting reports to find translatable game power numbers based on scouting grades (and used a denominator of 600 plate appearances).

Then, from ESPN’s Home Run Tracker website, I obtained all relevant data for player home-run distance, average home-run distance for the player at home, and league average home-run distance. Due to my limited time, I only used players who qualified for the batting title during the 2015 season, yielding a potentially weak sample of only 130 players. Additionally, before anyone complains, please realize that the purpose of my research at this point is to obtain the most viable formula and refine it from there so that it can be applied across a wider population.

Results for Formula 2

Using Microsoft Excel, I calculated the resultant xHR% and xHR. Some key data points:

League Average HR% (actual): 3.03%

Average xHR%: 2.89%

Average Home Runs: 18.7

Expected Home Runs: 17.8

Please note that there is a significant amount of survivorship bias in this data. That is, because all of these players played enough to qualify for the batting title, they are likely significantly better than replacement level, which is why the percentages and home runs seem so high.

Correlation between xHR% and HR%: 0.974418884

R² for above: 0.949492162

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.4265261

Correlation between xHR and HR: 0.977796283

R² for above: 0.956085571

HR Standard Deviation: 10.43771886

xHR Standard Deviation: 9.474596069

Results for Formula 3

League Average HR% (actual): 3.03%

Average xHR%: 2.92%

Average Home Runs: 18.7

Expected Home Runs: 18.1

Again, note the survivorship bias that comes with having a slightly skewed sample.

Correlation between xHR% and HR%: 0.986440621

R² for above: 0.973065099

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.4615323

Correlation between xHR and HR: 0.988287804

R² for above: 0.976712783

HR Standard Deviation: 10.43771886

xHR Standard Deviation: 9.698203408

Mostly Boring Analysis

I have opted to condense the analysis into one section instead of two because it would have otherwise been repetitive and boring.

I understand that that’s a lot to process, but the data really isn’t all that dissimilar. The expected home-run percentage is slightly lower than the actual home-run percentage for both of them, but it isn’t a massive difference by any means. When prorated to a 600 plate appearance season, xHR% for formula 2 predicts that the average player in the sample would have hit 17.3 home runs, while formula 3’s xHR% expects that the average home-run total would have been 17.5. In reality the average player hit 18.2 home runs per 600 plate appearances, so both were fairly close (maybe too close).
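The prorated totals above are simple arithmetic: a home-run rate expressed as a percentage of plate appearances, multiplied by 600. A minimal sketch using the averages reported in the results sections (the function name `prorate_hr` is just an illustrative label):

```python
def prorate_hr(hr_rate_pct, plate_appearances=600):
    """Convert a home-run rate (percent of PA) into home runs over a PA total."""
    return hr_rate_pct / 100.0 * plate_appearances

# Sample averages taken from the results listed earlier in this post.
rates = {
    "actual HR%": 3.03,
    "formula 2 xHR%": 2.89,
    "formula 3 xHR%": 2.92,
}

prorated = {label: prorate_hr(pct) for label, pct in rates.items()}
for label, hr in prorated.items():
    print(f"{label}: {hr:.1f} HR per 600 PA")
```

The gap between actual and expected is under one home run per 600 plate appearances for both formulas.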

Both formulas had incredibly high correlations, with formula 3 correlating slightly more strongly. More importantly, formula 2 explains about 95% of the variance, while formula 3 accounts for 97%. The difference between those is relatively unimportant because both explain a very high share of what occurred. Furthermore, p<.001, so the correlations are statistically significant (the actual p-values are many times lower than that).
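The "variance explained" figures come from squaring the correlation coefficients. A quick check using the xHR%-versus-HR% correlations reported in the results sections above:

```python
# Correlations between xHR% and HR% as reported in the results sections.
reported_r = {
    "formula 2": 0.974418884,
    "formula 3": 0.986440621,
}

# R^2 for a simple two-variable relationship is the square of Pearson's r.
r_squared = {name: r ** 2 for name, r in reported_r.items()}

for name, r2 in r_squared.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

The squares agree with the R² values listed in the results, roughly 95% for formula 2 and 97% for formula 3.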

Both formulas resulted in slightly lower standard deviations than what actually occurred, which is a recurring theme. In these formulas, the numbers have been clumped a little bit closer together and tend to underestimate rather than overestimate.

Players of Interest

Mr. Kole Calhoun – Last season he hit 26 home runs, but by both formulas he should have hit 3-4 fewer. Likely, this is because his only previous full season of home runs was in 2014, when he had only 17, in addition to the fact that I was forced to use scout grades for his third season. The scout grades were particularly off for Calhoun because he wasn’t even expected to be good enough for the majors, let alone be an above-average, high-value outfielder. Even though his overall offensive prowess declined slightly this past season (by 20 points of wRC+), he didn’t appear to be selling out for power, as his power profile numbers (FB%, Pull%, etc.) remained the same. Personally, I would expect him to regress next season, and I think the formula agrees with me.

Mr. Nolan Arenado – Arguably having the most unexpected offensive breakout of the season, he increased his home-run totals from 10 in 2013, to 18 in 2014, and finally to an astonishing 42 in 2015. While his totals were probably slightly Coors-inflated, they were real for the most part because his average home-run distance was excellent, in addition to the fact that 22 of his dingers came on the road. Arenado is young and likely to regress somewhat in the power department, but he is probably around to stay as a significant home-run threat. The formula was likely wrong on this one due to weighting of prior seasons, so go ahead and make the lazy Todd Helton comparison.

Mr. Carlos Gonzalez – Though Arenado’s teammate had the highest home-run total (40) of his career in 2015, it isn’t clear that he was anywhere near his peak statistically. His wRC+ was below his career average by six points, in addition to him being a net below-average player. All of this leads to the conclusion that he was selling out for power — which makes sense given that he lost over fifty points of batting average and on-base percentage from his 2010-13 peak years. While a viable argument could be made for his “subpar” performance being due to injuries, a better one could be made that his home runs were in part a result of playing half his games at Coors Field, where he hit 60% of his round-trippers. The formula says he should have hit about seven fewer home runs, which may be a best case scenario for next season given his penchant for injury. Additionally, while the Rockies are by no means full of talent, if Gonzalez continues his overall downward trend, he could get traded and lose the Coors advantage, or he could lose playing time.

Keep watch for a concluding piece in the next week. Criticism would be highly appreciated, but keep in mind that I’m still in high school and have yet to actually study statistics.

xHR%: Questing for a Formula (Part 2)

by Jackson Mejia

March 6, 2016

This is Part 2 of a series of posts regarding a new statistic, xHR%, and its obvious resultant, xHR. This article will examine formula 1. The primer, Part 1, was published March 4.

As a reminder, I have conceptualized a new statistic, xHR%, from which xHR (expected home runs) can be derived. Furthermore, xHR% is a descriptive statistic, meaning that it calculates what should have happened in a given season rather than what will happen or what actually happened. In searching for the best formula possible, I came up with three different variations, all pictured below with explanations.

HRD – Average Home Run Distance. The given player’s HRD is calculated with ESPN’s Home Run Tracker.

AHRDH – Average Home Run Distance Home. Using only Y1 data, this is the average distance of all home runs hit at the player’s home stadium.

AHRDL – Average Home Run Distance League. Using only Y1 data, this is the average distance of all home runs hit in both the National League and the American League.

Y3HR – The number of home runs hit by the player in the oldest of the three years in the sample. Y2HR and Y1HR follow the same idea. In cases where there isn’t available major league data, then regressed minor league numbers will be used. If that data doesn’t exist either, then I will be very irritated and proceed to use translated scouting grades.

PA – Plate appearances

(Apologies for my rather long-winded reminder, but if you really forgot everything from Part 1, then you should really invest in some Vitamin E supplements and/or reread the first post.)

The focus formula of this post is the first one, which also happens to be the one I think will work the least well because it relies too heavily on prior seasons to provide an accurate and precise estimate of what should have happened in a given season.

In the second piece of the formula, with only fifty percent of the results from the season being studied taken into account, it likely fails to take into account the fact that breakouts occur with regularity. As a result, it probably predicts stagnation rather than progress.

Methodology

Luckily for me and the readers, the process was an incredibly simple one. Pulling data from FanGraphs player pages, ESPN’s Home Run Tracker, and various Google searches, I compiled a data set from which to proceed. From FanGraphs, I collected all information for Part Two of the formula, including plate appearances and home runs. Unfortunately, because a few of the players from the sample were rookies or had fewer than three years of major league experience, I had to use regressed minor league numbers. In some cases, where that data wasn’t applicable, I dug through old scouting reports to find translatable game power numbers based on scouting grades (and used a denominator of 600 plate appearances).

Then, from ESPN’s amazingly in-depth Home Run Tracker website, I obtained all relevant data for player home run distance, average home run distance for the player at home, and league average home run distance. Due to my limited time, I only used players who qualified for the batting title during the 2015 season, yielding an iffy sample of only 130 players. Additionally, before anyone complains, please realize that the purpose of my research at this point is only to obtain the most viable formula and refine it from there.

Results

Using Microsoft Excel, I calculated the resultant xHR% and xHR. Some key data points:

League Average HR% (actual): 3.03%

Average xHR%: 2.85%

Average Home Runs: 18.7

Expected Home Runs: 17.7

Clearly, the numbers match up fairly well, with this version of the formula expecting that the league should have hit home runs at a clip 0.18 percentage points lower, or one fewer per player, which amounts to a significant difference. Over the course of a 600 plate appearance season, the difference between them is still only a little more than one home run, an acceptable distance.

Correlation between xHR% and HR%: 0.960506092

R² for above: 0.922571953

HR% Standard Deviation: 1.5769373

xHR% Standard Deviation: 1.3883746

Correlation between xHR and HR: 0.966224253

R² for above: 0.933589307

HR Standard Deviation: 10.43771886

xHR Standard Deviation: 9.201355342

While xHR% using this formula apparently explains about 92% of the variance, correlation may not be the best method of determining whether or not the formula works adequately. This holds at least for the correlation between xHR% and HR%, because there’s only a minuscule difference between their numbers (but one that matters), meaning it’s not a particularly explanatory method and that it may not have the descriptive power I’m looking for. Nevertheless, it is important to note that the correlation is unlikely to be a product of random chance, as p<.005. Unsurprisingly, the standard deviation for xHR% is slightly smaller than that of HR%, indicating that the data is clumped closer to the mean as a result of using this formula, a potentially good thing (in terms of regression).

A better indicator of the success of the formula is the correlation between xHR and HR, a relatively high value of ≈.97. Here, presumably because the separation between home runs and expected home runs is greater, the formula ostensibly explains approximately 93% of the variance in outcomes and resultant data. However, in this case, the standard deviation for actual home runs is about 10.4, while for xHR it’s about 9.2, suggesting that, after being multiplied out by plate appearances, xHR is spread nearly as widely as HR. Ergo, it likely serves as a decent predictor of actual home runs.

Players of Interest

Mr. Bryce Harper – It’s likely there isn’t a better candidate for regression according to this formula than Bryce Harper, who the formula says should have hit only 32 home runs as opposed to his actual total of 42. While he did lead his league in “Just Enough” home runs with 15, he’s also always been known for having prodigious power (or at least a potential for it). Furthermore, Mr. Harper dramatically changed his peripherals last season to ones more conducive to power. Suggesting this are the facts that he increased his pull percentage from 38.9% to 45.4%, his hard hit percentage from 32% to 40%, and his fly ball percentage from 34.6% to 39.3%. On their own, each of these statistics lends credence to the idea that Harper changed his profile to a more home-run-driven one, and taken together they strongly suggest it. His season was no fluke, and the formula certainly failed him here because it weighted prior seasons far too heavily.

Mr. Brian Dozier – No surprises here. Mr. Dozier has certainly been trending upward for a long time, and in a model that heavily weights prior performance such as this one, upticks in performance are punished. Nevertheless, the data vaguely supports the idea that Dozier should have hit 24 home runs instead of 28. While he did significantly increase his pull percentage to an incredibly high 60% from 53%, he did play in a stadium where it’s of average difficulty to hit pull home runs as a right-handed hitter. Moreover, 10 of his 28 home runs were rated as “Just Enough” home runs, in addition to his average home-run distance being 12 feet below average (admittedly not a huge number, nor a perfect way of measuring power). If I were a betting man, I’d expect him to hit 4-6 fewer home runs this coming season.

Keep watch for Part 3 in the coming days, which will detail the results of the other formulas. Something to watch for in this series is the issue that the results of the formula correspond too closely to what actually happened, which would render it useless as a formula.

Note that because I have never formally taken a statistics course, I am prone to errors in my conclusions. Please point out any such errors and make suggestions as you see fit.

Hardball Retrospective – The “Original” 1905 New York Giants

by DerekBain

March 5, 2016

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Accordingly, Vada Pinson is listed on the Reds roster for the duration of his career while the Red Sox declare Amos Otis and the Rockies claim Chone Figgins. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors that constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

Assessment

The 1905 New York Giants OWAR: 69.9 OWS: 348 OPW%: .634

Based on the revised standings the “Original” 1905 Giants edged the Phillies, seizing the pennant by three games. New York led the National League in OWS and posted the highest all-time OWAR.

Cy Seymour’s tremendous offensive outburst transformed the Giants’ attack. Seymour paced the circuit in seven major categories including batting average (.377), hits (219), doubles (40), triples (21), RBI (121), SLG (.559) and total bases (325). A .303 lifetime batter, Seymour never led the League in any categories during his other 15 MLB seasons. Harry H. Davis (.285/8/83) topped the home run charts in four consecutive campaigns. Danny F. Murphy ripped 34 two-base knocks and swiped 23 bags. Art Devlin pilfered a League-high 59 bases in his sophomore season. “Wee” Willie Keeler contributed 42 sacrifice hits along with a .302 BA – the twelfth of thirteen straight seasons with a batting average above the .300 mark. Keeler posted a career BA of .341 and collected at least 200 base knocks per year from 1894-1901.

Christy Mathewson ranks in the top 10 among pitchers according to Bill James in “The New Bill James Historical Baseball Abstract.” Teammates listed in the “NBJHBA” top 100 rankings include Seymour (30th-CF), Keeler (35th-RF), Murphy (51st-2B), Devlin (58th-3B) and Davis (60th-1B).

LINEUP	POS	WAR	WS
Willie Keeler	RF	2.22	19.56
Danny F. Murphy	2B	4.04	25.62
Cy Seymour	CF	10.32	40.54
Harry H. Davis	1B	4.1	26.45
Art Devlin	3B	3.74	21.67
Dave Zearfoss	C	-0.35	0.5
Charlie Babb	SS	-1.07	3.32
Ike Van Zandt	LF/RF	-1.73	3.69

BENCH	POS	WAR	WS
Moonlight Graham	RF	-0.01	0
Offa Neal	3B	-0.17	0.15

Christy Mathewson (31-9, 1.28) dominated opposition batsmen as he topped the charts in victories, ERA, shutouts (8), strikeouts (206) and WHIP (0.933). Excluding 1902, “Big Six” tallied at least 20 wins per season from 1901-1914. The Hall of Fame hurler registered a lifetime won-loss record of 373-188 with an ERA of 2.13. Red Ames whiffed 198 batters and furnished a 22-8 mark with a 2.74 ERA. Dummy Taylor fashioned a 2.66 ERA and compiled 16 victories. Hooks Wiltse contributed a 15-6 mark with a 2.47 ERA in 32 games (19 starts).

ROTATION	POS	WAR	WS
Christy Mathewson	SP	10.56	39.05
Hooks Wiltse	SP	3.56	18.38
Dummy Taylor	SP	2.04	14.76
Red Ames	SP	1.75	17.71

BULLPEN	POS	WAR	WS
Red Donahue	SP	-1.32	4.41

The “Original” 1905 New York Giants roster

NAME	POS	WAR	WS	General Manager	Scouting Director
Christy Mathewson	SP	10.56	39.05	John Brush
Cy Seymour	CF	10.32	40.54	John Brush
Harry Davis	1B	4.1	26.45	John Brush
Danny Murphy	2B	4.04	25.62	John Brush
Art Devlin	3B	3.74	21.67	John Brush
Hooks Wiltse	SP	3.56	18.38	John Brush
Willie Keeler	RF	2.22	19.56	John Brush
Dummy Taylor	SP	2.04	14.76	John Brush
Red Ames	SP	1.75	17.71	John Brush
Moonlight Graham	RF	-0.01	0	John Brush
Offa Neal	3B	-0.17	0.15	John Brush
Dave Zearfoss	C	-0.35	0.5	John Brush
Charlie Babb	SS	-1.07	3.32	John Brush
Red Donahue	SP	-1.32	4.41	John Brush
Ike Van Zandt	RF	-1.73	3.69	John Brush

Honorable Mention

The “Original” 1962 Giants OWAR: 52.6 OWS: 355 OPW%: .589

The Giants engaged in fierce late-season combat with the Braves and the Reds. “The Say Hey Kid” and his San Francisco teammates emerged with a hard-fought victory. Willie Mays (.304/49/141) supplied career-bests in runs (130) and RBI yet finished runner-up in the 1962 NL MVP balloting. The twelve-time Gold Glove Award winner retired in 1973 with 660 home runs, 2062 runs scored and 3283 base hits. Orlando “Baby Bull” Cepeda mashed 35 long balls, amassed 114 ribbies and registered 105 tallies. Felipe Alou (.316/25/98) and Leon “Daddy Wags” Wagner (.260/37/107) merited their first All-Star invitations. Seven-time Gold Glove Award winner Bill D. White swatted 20 big-flies, drove in 102 baserunners and produced a career-best .324 BA. Eddie Bressoud drilled 40 doubles while third-sacker Jim Davenport (.297/14/58) earned an All-Star nod along with the Gold Glove Award. Juan Marichal began a string of 8 consecutive All-Star appearances in ’62. The “Dominican Dandy” amassed 18 victories, completed 18 of 36 starts and compiled a 3.36 ERA.

On Deck

What Might Have Been – The “Original” 1904 Phillies

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

xHR%: Questing for a Formula (Part 1)

by Jackson Mejia

March 4, 2016

One of the most important developments in statistics — and its subordinate field, sabermetrics — is the usage of multiyear data to produce an expected outcome in a given year. It’s an old concept, one that’s been around for centuries, but it likely originated in sabermetrics circles with Bill James. In Win Shares (arguably the birth of WAR), the sabermetric response to Principia Mathematica, he details a procedure of finding park factors wherein the calculator uses a weighted average of several years of data in conjunction with league averages to find park factors for a certain ballpark.

Methods such as Mr. James’s allow the amateur sabermetrician (and even the mighty professional statistician) to determine what ought to have happened over a specific time period. Essentially, they yield a descriptive statistic. The best example of a descriptive statistic for the unlearned reader is xFIP, which basically describes what a pitcher’s fielding-independent average runs allowed would have been if the pitcher had a league-average home runs per fly ball rate.

Several statistics fluctuate greatly from year to year and are thus considered unstable. Examples include BABIP, HR/FB% for pitchers, and line-drive percentage. HR/FB% in particular is very fluid because all sorts of variables go into whether a ball leaves the park or not. For instance, on a particularly windy day, an otherwise certain dinger might end up in the glove of an expectant center fielder on the warning track instead of in the beer glass of your paunchy friend in the cheap seats. Rendered down, xFIP takes the uncontrollable out of a pitcher’s runs-allowed average.

With this, and an excellent article about xLOB% from The Hardball Times, in mind, I started developing my own statistic a few days ago. xHR%, as I dubbed it, attempts to find an expected home-run percentage, and from there one can easily find expected home runs (xHR) by multiplying xHR% by plate appearances, a more understandable idea to the casual baseball fan. In order to calculate this, I wrote several different (albeit very similar) formulas:

More likely than not, your eyes glazed over in that section, so I will explain.

HRD – Average Home Run Distance. The given player’s HRD is calculated with ESPN’s Home Run Tracker.

AHRDH – Average Home Run Distance Home. Using only Y1 data, this is the average distance of all home runs hit at the player’s home stadium.

AHRDL – Average Home Run Distance League. Using only Y1 data, this is the average distance of all home runs hit in both the National League and the American League.

Y3HR – The number of home runs hit by the player in the oldest of the three years in the sample. Y2HR and Y1HR follow the same idea. In cases where major-league data isn’t available, regressed minor-league numbers will be used. If that data doesn’t exist either, then I will be very irritated and proceed to use translated scouting grades.

PA – Plate appearances

(For the uninitiated, HR% is HR/PA)

Essentially, what I have created is a formula that describes home-run percentage. First off, I used (.5)(AHRDH) + (.5)(AHRDL) in the denominator of the first part because a player spends half his time at home and half on the road. If I were so inclined, I could factor in every single stadium that gets visited, weight the average of them, and make that the denominator, but that’s just doing way too much work for a negligible (but likely more accurate) effect. Besides, writing that out in a formula would be a disaster because then there essentially couldn’t be a formula. Furthermore, having half of the denominator come from the player’s home stadium factors in whether or not the stadium is a home-run suppressor or inducer, which helps paint a more accurate picture of the player.

Dividing the player’s average HRD by (.5)(AHRDH) + (.5)(AHRDL) allows the calculator to get a good idea of whether or not the player was “lucky” in his home runs. If his average home-run distance is less than the average of the league and his home stadium, then it follows that he is a below-average home-run hitter and his home-run totals ought to be lower.

Since the values in the numerator and the denominator will invariably end up close in value to each other, I decided that this part of the formula could be used as the coefficient (as opposed to just throwing it out) because it will change the end number only slightly. Moreover, the xCo (as I call it) acts as a rough substitute for batted-ball distance and park dimensions in order to factor those into the formula.
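As a minimal sketch of the xCo described above, the ratio can be written out directly from the variable definitions. The function name and the sample distances are hypothetical and only illustrate the arithmetic; they are not real Home Run Tracker values.

```python
def x_coefficient(hrd: float, ahrdh: float, ahrdl: float) -> float:
    """xCo: the player's average HR distance divided by an even blend of
    the home-park average (AHRDH) and the league average (AHRDL)."""
    blended_average = 0.5 * ahrdh + 0.5 * ahrdl
    return hrd / blended_average


# Hypothetical distances in feet, for illustration only.
xco = x_coefficient(hrd=405.0, ahrdh=398.0, ahrdl=400.0)
print(round(xco, 4))
```

A value slightly above 1 nudges the expected home-run rate up, and a value slightly below 1 nudges it down, which is why the coefficient changes the end number only a little.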

The second part, the meat of the formula, uses a weighted average of multiple years of home-run-percentage data to help determine what should have been the home-run percentage in year one (the year being studied). Basically, it helps to throw out any extreme outlier seasons and regress them back a little bit to prior performance without stripping out everything that happened in that season (notice that in every formula the biggest weight is given to the season studied).

At this juncture, I cannot say for certain how much weight ought to be given to prior seasons. Obviously, a player can have a meaningful and lasting breakout season, with continued success for the rest of his career, making it inaccurate to heavily weight irrelevant data from a season two years ago. On the other hand, a player can have a false breakout, making it better to include more data from previous seasons. Undoubtedly that will be the subject of future posts. At present, the formula is a developmental one that will no doubt experience heavy changes in the future.
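Because the weights are still unsettled, the sketch below takes them as a parameter. It assumes that the xCo multiplies the weighted average of HR%, which the text implies but does not state outright. The default weights (0.5, 0.3, 0.2) are placeholders chosen only so that the studied season gets the largest weight; they are not the values used in the formulas.

```python
def weighted_hr_rate(hr, pa, weights):
    """Weighted average of per-season HR% (HR/PA).

    hr, pa and weights are ordered Y1, Y2, Y3, where Y1 is the season studied.
    """
    rates = [h / p for h, p in zip(hr, pa)]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


def expected_home_runs(xco, hr, pa, weights=(0.5, 0.3, 0.2)):
    """Return (xHR%, xHR). The weights are hypothetical placeholders."""
    xhr_pct = xco * weighted_hr_rate(hr, pa, weights)
    xhr = xhr_pct * pa[0]  # xHR = xHR% times plate appearances in Y1
    return xhr_pct, xhr


# Hypothetical player: 30 HR in 600 PA (Y1), 20 in 500 (Y2), 25 in 625 (Y3).
xhr_pct, xhr = expected_home_runs(1.0, hr=(30, 20, 25), pa=(600, 500, 625))
print(round(xhr_pct, 4), round(xhr, 1))
```

Passing a different `weights` tuple is the quickest way to see how much a breakout season gets pulled back toward prior performance.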

For the interested reader, some prior iterations of the formula are below:

As a reminder, with some small addenda, here is the explanation for each variable:

HRDY3 – Average Home Run Distance Year Three (year three being the oldest of the three years in the sample). HRD is calculated with ESPN’s home run tracker. HRDY2 and HRDY1 follow the same idea.

AHRDH – Average Home Run Distance Home. Using only Y1 data, this is the average distance of all home runs hit at the player’s home stadium by any player.

AHRDL – Average Home Run Distance League. Using only Y1 data, this is the average distance of all home runs hit in both the National League and the American League.

Y3HR – The number of home runs hit by the player in the oldest of the three years in the sample. Y2HR and Y1HR follow the same idea. In cases where major-league data isn’t available, regressed minor-league numbers will be used. If that data doesn’t exist either, then I will be very irritated and proceed to use translated scouting grades.

PA – Plate appearances

(You should be initiated at this point, so figure out HR% for yourself.)

These formulas were thrown out because the xCo relied too heavily on seasons past to provide an accurate estimate. When I briefly tested them on a few players, they delivered incredibly scattered results. Furthermore, there wouldn’t be any data available for rookies to use these iterations on because there’s no such thing as a minor-league or high-school home-run tracker (and if there were I probably wouldn’t trust it). The first formulas described are overall more elegant and more accurate.

Stay tuned for Part 2, when results will be delivered instead of postulations.

Using Recent History to Analyze Dee Gordon’s Defensive Improvement

by WhyCantWeHavePeace

March 3, 2016

Dee Gordon is a polarizing player. His all-speed, no-power approach on offense has both fans and projection systems divided on what to make of his bat. Is he an elite offensive second baseman? Is he a one-hit wonder that won’t be able to repeat his numbers from 2015? Reasonable people can really disagree on Gordon’s bat.

Reasonable people can also really disagree on Dee Gordon’s defense, and that’s where I intend to focus my analysis today. Dee Gordon led all second basemen in 2015 with a 6.4 Ultimate Zone Rating (UZR), which means he was worth roughly six runs on defense compared to an average second baseman. That doesn’t sound too unreasonable, right? Here’s where things get interesting. Gordon, despite his obvious athleticism, had previously been considered a below-average defender, coming in with a -3.4 UZR at second base in 2014. He had been a massively below-average defender at shortstop (where he played a few years ago before moving to second base full-time in 2014), so there are years of data painting him as a minus defender relative to other middle infielders.

In 2015, Gordon’s advanced defensive metrics took a massive jump forward. Dee Gordon improved by exactly 10 runs according to UZR, which is roughly an entire win difference thanks to his defense. Which defender is the real Dee — the one that flailed around in 2014, or the elite defender from 2015?

Let’s find some historical comparisons, and see what they can teach us about the repeatability of Dee Gordon’s defensive statistics.

We know Dee Gordon improved 10 runs defensively at second base to become one of the best defenders in the league at the position. Let’s take a look at the past 10 years, and find all second basemen who improved by at least 10 runs in UZR from year to year and had a UZR of at least 5 in the improved year. There are 16 player seasons that fit these criteria. After excluding those who didn’t play enough innings to qualify at second, 11 player seasons remain. The numbers are presented below, along with the UZR that the player recorded the season following his improved year.

Table of Dee Gordon Comparisons

Among the second basemen in the last 10 years who made a big jump into the defensive elite, the average player lost almost nine runs of UZR in the season after the leap. The group gave back about 60% of its improvement in that following season, indicating that a big jump in UZR for a second baseman is unlikely to signal a new level of performance. Among the qualifying group, not a single second baseman improved his UZR again the following year, and only one member of the group, Placido Polanco in 2009, regressed by fewer than four runs.

However, there is a slight bright side. Only one member of the group had a UZR the year after “the leap” that was lower than his UZR before the improvement, indicating that a leap of over 10 runs of UZR means you almost certainly have improved as a defender. The improvement just isn’t nearly as large as the leap-year UZR suggests: the players kept about 40% of the gain they made in their improved year.

What does this mean for the Marlins’ speedy second baseman? While Dee Gordon’s huge jump in UZR this year means he’s almost certainly a better defender than he was two years ago, the improvement to his talent is likely only modest and not nearly what you would hope for after his great 2015 defensively. To those who pointed to Dee Gordon’s greatly improved UZR this season as a reason to believe he’s made big strides as a defender, I’ll sadly have to point out that we can expect Dee Gordon to return much closer to the mediocre defender he was in 2014 than the star he was in 2015.
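To put a rough number on that conclusion, the sketch below applies the group’s average retention of about 40% to Gordon’s own UZR figures quoted earlier (-3.4 in 2014, 6.4 in 2015). Treating the group average as Gordon’s personal expectation is an assumption made for illustration, and the function name is hypothetical.

```python
def expected_followup_uzr(before: float, leap: float, retention: float = 0.4) -> float:
    """Pre-leap UZR plus the share of the jump that is kept the next year."""
    improvement = leap - before
    return before + retention * improvement


# Gordon: -3.4 UZR before the leap, 6.4 in the leap year.
print(round(expected_followup_uzr(-3.4, 6.4), 1))
```

Under that assumption the estimate lands a little above average, closer to the 2014 figure than to the 2015 one.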

The Best Bets for Over/Under Team Win Totals

by Christopher Rinaldi

March 2, 2016

Typically, projections and conjecture about the upcoming baseball season serve the general purpose of piquing your interest. However, sometimes they are good for making money. In this instance, here are some gambles you can make based on the Atlantis Race and Sports Book.

This article was written on February 28, 2016 and the initial lines from this Fox Sports article were published on February 12, 2016.

The team win projections referenced are some basic (keyword, “basic”) projections I made for this season.

Colorado Rockies — Over 68 1/2 Wins, -110

Projected Wins: 81

The projection for the Rockies is shockingly bullish at first glance. But, take a step back and put it in context. The Rockies gave up 844 runs last year, the most in MLB. This year they are projected to surrender 757, or 87 fewer runs, an improvement of over a half-run per game.

This is not ridiculous considering what you can expect from their pitching staff. They will have a full season from a maturing Jon Gray and they bolstered their bullpen with Jason Motte, Chad Qualls, and Jake McGee. These highlights may not be awe-inspiring, but they don’t need to be. The 757 projected runs against is the worst projected runs against in the NL. The projection doesn’t signify that the Rockies are good; it signifies that they are not as bad as last year.

The Rockies offense is projected to keep chugging along, with 761 runs scored, which would be the ninth-lowest runs scored for a Rockies team from 1995–2015, and only 24 runs greater than last year’s Rockies team. It’s not all that extreme.
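One common way to turn projected runs scored and allowed into a win total is the Pythagorean expectation. Whether these projections were built that way is not stated, so the sketch below is only a sanity check on the run totals quoted above, using the classic exponent of 2.

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from the Pythagorean expectation formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Projected Rockies: 761 runs scored, 757 runs allowed.
print(round(pythagorean_wins(761, 757), 1))

# Projected improvement in runs allowed per game versus last year's 844.
print(round((844 - 757) / 162, 2))
```

A team that scores about as many runs as it allows comes out near .500, which is in line with the 81-win projection.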

You don’t need to buy into the projections to view this as a good bet. You just need to buy into the idea that the Rockies are better than they were last year (when they won 68 games). The Rockies are the best bet at the dawn of spring training.

Chicago Cubs — Over 89, -110

Projected Wins: 100

A pessimist may ask some of the following questions of the Cubs: (1) It’s the Cubs. Will they find some way to blow it?; (2) Will Jake Arrieta be able to carry over his performance of the past season and a half?; (3) Will Kris Bryant and Kyle Schwarber suffer a decline in performance now the league has had an off-season to study their strengths and weaknesses?

A pessimist would probably have more questions along these lines, but a pessimist would have more of these types of questions about other teams. So, don’t be a pessimist; play the odds, particularly if you’re betting. The odds say the Cubs are the best team in the league.

You may not want to bet on the Cubs’ projected win figure of 100, but it seems foolish to not bet on 90+ wins. Teams can be ravaged by injuries (see 2015 Washington Nationals) and teams can be ravaged by bad luck, but don’t let the world of possibilities cloud the virtue of probabilities. The probability that the Cubs win over 90 games for the second year in a row is greater than the pessimistic possibilities that may be (but probably aren’t) dancing through your head.

Los Angeles Dodgers — Over 87, -115

Projected Wins: 95

How much can one man be vilified? Snark surrounded Andrew Friedman and the Dodgers’ offseason, beginning with the departure of Zack Greinke. It continued as the Dodgers added more starting pitchers to their pitching staff than they did former general managers to their front office staff. But that’s okay. You know better, don’t you?

This writer is hard-pressed to think of a team so well-equipped to survive the maladies and booby traps that a major-league-baseball team may encounter in a trek through a 162-game season (well, all but Clayton Kershaw’s arm falling off). They have a cadre of infielders (Kendrick, Turner, Utley, Seager, Guerrero), outfielders (Puig, Pederson, Ethier, Crawford, Van Slyke, Thompson), and Enrique Hernandez is essentially baseball’s equivalent to the utility knife. As suggested in the first paragraph, the Dodgers’ positional depth may only blush when it encounters the depth of their own pitching staff.

If you doubt the Dodgers, you may be the kind of person who’d choose a wallet with a $100 bill over another with ten $20 bills. But, don’t fear if you did that, you can turn that $100 into $187 if you bet on the Dodgers to win more than 87 games this year.

If you’re still unsure, you should have chosen the wallet with ten $20 bills. You wouldn’t need to gamble at all if you did that.
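For readers unfamiliar with American odds, the $100-into-$187 figure follows from the -115 price: a negative line is the amount you must stake to win $100. The helper names below are hypothetical, and the sketch ignores pushes and any sportsbook-specific rules.

```python
def total_return(stake: float, american_odds: int) -> float:
    """Stake plus profit on a winning bet at the given American odds."""
    if american_odds < 0:
        profit = stake * 100 / -american_odds
    else:
        profit = stake * american_odds / 100
    return stake + profit


def break_even_probability(american_odds: int) -> float:
    """Win probability at which the bet has zero expected value."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


print(round(total_return(100, -115), 2))         # about 187
print(round(break_even_probability(-115), 3))
print(round(break_even_probability(-110), 3))
```

The break-even figures are the useful part for the bets in this article: at -110 or -115 you need to be right somewhat more than half the time just to come out even.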

Washington Nationals — Over 87, -115

Projected Wins: 94

I will not blame you if you begin to feel a greater degree of uncertainty at this point. The luster may have come off the Nationals last year, but don’t you believe they could be re-polished? It’s feasible the Mets and Nationals (and maybe the Marlins) take the battleground of the mid-80s to determine the NL East champion, but it’s more likely that the division winner will walk away with more than 90 wins, or the Nationals will surpass everyone at that level.

You may not want to bet on the health of Stephen Strasburg, Anthony Rendon, and Jayson Werth. Or, you may just want to bet. If the latter is the case, the Nationals are a good bet; not a sure bet. But what is a sure bet? The Nationals’ biggest offseason splash was Daniel Murphy, but their most effective offseason acquisitions likely went under the radar. They bolstered their bullpen with the additions of Shawn Kelley, Oliver Perez, Yusmeiro Petit, and Trevor Gott. They also have a farm system that can (1) patch holes this year (Lucas Giolito) and (2) be used to acquire talent to fill any other holes through trade.

Oh, and Dusty Baker is their manager. You can feel how you want about that, but that means Matt Williams isn’t their manager this year and there’s only one way to feel about that.

Kansas City Royals — Under 87, -115

Projected Wins: 74

Let’s establish two things: (1) The projected wins are low, and (2) the universe may haunt you for making this bet.

Disregard the universe for the moment. The Royals should be the favorites to win the AL Central. I don’t state that in a hypothetical way. There is no team in the AL Central that is so good that you should expect them to overcome the Royals’ Black Magic. But, for purposes of this exercise, ask the important question: Is the Royals’ Black Magic so good that it will propel them to win more than 87 games? I think not.

As with the Nationals, I wouldn’t take my last $115 and make this bet, but if you want to bet on, say, five over/under win totals for MLB teams, I would make this your fifth bet. But realize, you’re not making a bet on the performance of a baseball team; you’re making a bet on the rhythms of the universe.

If you’re hesitant to bet on the universe, here are some other reasonable (but not as reliable) choices:

6. Boston Red Sox — Over 85 1/2 Wins, -105