Peter O’Brien’s Raw Power: Estimating Batted-Ball Velocities in the Minor Leagues

On May 20th, Peter O’Brien hit a massive home run to straightaway center, clearing the 32-foot-tall batter’s eye at Arm & Hammer Park more than 400 feet from home plate.  O’Brien is currently 1 home run behind Joey Gallo in what looks to be an exciting competition for the minor league home run title.  O’Brien isn’t as highly touted a prospect as Gallo, but he still has some of the most impressive power in the minor leagues.  Reggie Jackson saw O’Brien’s home run and said it was one of the hardest hit balls in the minor leagues he had ever seen (and Reggie knows a thing or two about tape measure home runs).

How hard was that ball actually hit?  It is impossible to know exactly how hard and how far the ball was hit from the available information.  You can, however, use basic physics to make a reasonable estimate.

Below I explain the assumptions and thought process I used to arrive at an estimate of how hard the ball was hit.  If that does not interest you, just skip to the end to find out what it takes to impress Reggie Jackson.  But if you’re curious or skeptical, stick around.

OBSERVATIONS

I started off by watching the video to see what information I could gather (O’Brien’s at bat starts at the 37 second mark in the video).

TIME OF FLIGHT  From the crack of the bat to the ball leaving the park appears to take 5 seconds.  If you watched the video, you can tell this is not a perfect measurement, since the camera doesn’t track the ball very closely.  If you think you have a better estimate, let me know and I’ll rework the numbers.

LOCATION LEAVING THE PARK  The ball was hit to straightaway center.  From the park dimensions, we know that when it left the park it was 407 feet from home plate and at least 32 feet in the air to clear the batter’s eye.

ASSUMPTIONS

COEFFICIENT OF DRAG (Cd) – The Cd determines how much a ball will slow down as it moves through the air.  I chose 0.35 for the Cd because it is right in the middle of the most frequently inferred Cd values for the home runs that Alan Nathan examined in this paper.  In looking at the Cds of baseballs, Nathan showed there is reason to believe there is significant variation in Cd from one baseball to another (significant meaning greater than what can be explained by random measurement error).

ORIGIN OF BALL – I assume the ball was 3.5 feet off the ground and 2 feet in front of home plate when it was hit.  These are the standard parameters in Dr. Nathan’s trajectory calculator.  But what if the location is off by a foot?  The effect of the origin on the trajectory is purely translational: start the ball one foot higher and the whole flight path sits one foot higher; one foot lower, one foot lower.  The other observations and assumptions matter far more in determining the trajectory of the home run.

Using these assumptions and the trajectory calculator, I was able to determine the minimum speed and backspin a ball would need in order to clear the 32-foot batter’s eye 5 seconds after being hit, at a range of launch angles.  The table below shows the vertical launch angle (in degrees), the backspin (in rpm) and the speed of the batted ball (in MPH).

Vertical Launch Angle (deg) | Backspin (rpm) | Speed off Bat (MPH)
19 | 14,121 | 101
21 | 6,817 | 101.9
23 | 4,155 | 102.75
25 | 2,779 | 103.69
27 | 1,940 | 104.7
29 | 1,375 | 105.89
30 | 1,156 | 106.5
32 | 805 | 107.88
34 | 536 | 109.4
36 | 322 | 111.1
38 | 149 | 112.99
40 | 4 | 115.1

The graph below gives a more visual representation of the trajectories in the table above (with the batter’s eye added in for reference).

http://i1025.photobucket.com/albums/y314/GWR87/OBrienhomerun_zpsb1507cf4.png

Looking at the graph you will notice that all of these balls would be scraping the top of the batter’s eye.  This makes sense because the table shows the minimum velocities and back spins needed for the ball to exactly clear the batter’s eye.
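If you want to experiment with the numbers yourself, here is a minimal sketch of the kind of flight model the trajectory calculator implements.  To be clear, this is not Dr. Nathan’s actual code: the Cd of 0.35 is the assumption from above, but the lift model (one published fit, Cl = 1.5·S below a spin parameter S = Rω/v of 0.1, and 0.09 + 0.6·S above it) and the crude Euler integration are stand-ins I chose for illustration, so the output should land near, not exactly on, the table’s values.

```python
import math

G = 9.81                 # gravity, m/s^2
RHO = 1.225              # sea-level air density, kg/m^3 (assumed)
MASS = 0.145             # baseball mass, kg
RADIUS = 0.0366          # baseball radius, m
AREA = math.pi * RADIUS ** 2
CD = 0.35                # the drag coefficient assumed above
FT = 0.3048              # meters per foot

def lift_coefficient(s):
    """One published fit (Sawicki et al.); a stand-in for Nathan's lift model."""
    return 1.5 * s if s < 0.1 else 0.09 + 0.6 * s

def height_and_time_at_fence(speed_mph, angle_deg, rpm, fence_ft=407.0):
    """Euler-integrate the flight; return (height_ft, time_s) at the fence."""
    v = speed_mph * 0.44704
    ang = math.radians(angle_deg)
    vx, vy = v * math.cos(ang), v * math.sin(ang)
    x, y = 2.0 * FT, 3.5 * FT          # origin: 2 ft in front of the plate, 3.5 ft up
    omega = rpm * 2.0 * math.pi / 60.0
    t, dt = 0.0, 0.001
    while x < fence_ft * FT and y > 0.0:
        spd = math.hypot(vx, vy)
        cl = lift_coefficient(RADIUS * omega / spd)
        k = 0.5 * RHO * AREA / MASS
        # Drag opposes the velocity; backspin lift acts perpendicular to it.
        ax = -k * spd * (CD * vx + cl * vy)
        ay = -k * spd * (CD * vy - cl * vx) - G
        vx, vy = vx + ax * dt, vy + ay * dt
        x, y = x + vx * dt, y + vy * dt
        t += dt
    return y / FT, t

# The 25-degree row of the table: expect roughly 32 ft at roughly 5 s.
print(height_and_time_at_fence(103.69, 25, 2779))
```

The same function also lets you check the eliminations below: drop the 19-degree ball’s backspin to 3,500 rpm, or raise the 40-degree ball’s to 1,000 rpm, and see where the ball crosses the fence.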

What is the slowest O’Brien could have hit the ball?

If you were in a rush, a glance at the table would suggest that the slowest O’Brien could have hit the ball is 101 MPH at 19°.  But not so fast!  The amount of backspin required for the ball to travel on that trajectory is humanly impossible.

What is a reasonable backspin?

I am highly skeptical of backspin values greater than 4,000 rpm, based on Alan Nathan’s Baseball Prospectus article “How Far Did That Fly Ball Travel?”  The backspin on the home runs Nathan examined ranged from 500 to 3,500 rpm, with most falling around 2,000.  The first 3 entries in the table have backspins over 4,000 rpm and can be eliminated as possibilities.  If the ball with the 19° launch angle had only 3,500 rpm of backspin, it would have hit the batter’s eye less than 11 feet off the ground instead of clearing it.  Maybe you’re skeptical that I eliminated the 3rd entry because it’s close to the 4,000 rpm cutoff.  Think about it this way: if a player were able to hit a ball with over 4,000 rpm of backspin, he would have to be hitting at a much higher launch angle than 23° (higher launch angles generate more spin, while lower launch angles generate less).

The high launch angle trajectories with very little backspin (like the bottom three in the table) are also unlikely.  A ball hit with a 40° launch angle would almost certainly have more than 4 rpm of backspin.  If the ball hit at a 40° launch angle had 1,000 rpm of backspin (instead of 4), it would have crossed the fence 70 feet off the ground, easily clearing the 32-foot batter’s eye.

Accounting for reasonable backspin, the slowest O’Brien could have hit the ball is 103.69 MPH at 25° with 2,779 rpm of backspin.

So what do all these observations and assumptions get us?

We can say that the ball was likely hit 103.69 MPH or harder, with a launch angle of 25° or greater.  A 103.69 MPH launch velocity is not that impressive; it is essentially the league average launch velocity for a home run.  Distance-wise, how impressive a home run was it?  Unobstructed, the ball would have landed at least 440 feet from home plate (assuming the 25° scenario).  The ball probably went farther than 440 feet because it did not scrape the batter’s eye.  So how rare is a 440+ foot home run?  Last year during the regular season there were 160 home runs that went 440 feet or farther, out of 4,661 home runs total, meaning only 3.4% of all home runs were hit at least that far.

For those of you who just skipped to the end: my educated guess is that the ball went at least 440 feet and left the bat at 103.69 MPH or faster.

If you like this, you can read other articles on my blog GWRamblings, or follow me on twitter  @GWRambling

None of this would have been possible without Alan Nathan’s great work on the physics of baseball.  I used his trajectory calculator to do this, and I referenced his articles frequently to make sure I wasn’t making stupid assumptions.  The information on major league home run distances is based on hittrackeronline.com.


Old Player Premium

One of Dave Cameron’s articles a while back showed payroll allocations by age group, and over the last five years or so more money has gone to players in their prime years while less is being spent on players over 30.  That seems like a logical thing for teams to do, but the trend can only continue for so long.  Eventually a point will be reached where older players are undervalued, and it is possible we are already there.

There are several things to keep in mind when comparing these age groups, and one of the biggest is survivorship bias.  There is a natural attrition over time for players in general.  Let’s look at an example; for all of the following I will be using 2012 versus 2013 as a way to see what happens from year to year.  To look at survivorship, I took all position players in 2012 and then checked their contribution in 2013 to see how many disappeared the next year.  The players missing from 2013 could have retired, been demoted, gotten injured, etc.  I also took out a small group that played in both seasons but were basically non-factors in 2013; for example, Wilson Betemit played in both seasons, but in 2013 he had only 10 plate appearances.  The attrition rate for the age groups looks like this:

Age Group | % of 2012 Players That Did Not Contribute in 2013
18-25 | 22.2%
26-30 | 25.0%
31-35 | 29.3%
36+ | 38.9%

As you would expect, the attrition rate increases with age.  Players in their late teens and early 20s who make it to the majors are likely to be given opportunities in the near future, but as age increases, the probability of teams giving up on the player, major injury, or retirement goes up.  Players who make it from one group to the next have survived, and that is where the bias comes in.  By the time you get to the 36+ group, a significant number of the players are really good, because if they weren’t they would not have made it so far.  This ability to survive is also a reason why they should be getting a good chunk of the payroll.  As I will show, it leads to steady play, which teams should pay a premium for.

The next step is looking at performance risk among the groups.  To do this I took each group’s performance in 2012 and compared it to the group’s performance in 2013, again using only the survivors from year to year.  I looked at both wRC+ and WAR to see whether the hitting component alone and overall performance behaved differently.

Further, to calculate a risk level I looked at the standard deviations of the differences (2013 minus 2012) for each player, but those are not directly comparable across groups.  Standard deviation is higher for distributions with higher averages due to scaling.  For instance, the average 36+ player had a 95 wRC+ in 2012, more than 10 points above the average 18-to-25-year-old in the same year.  A 10% drop or increase in production is therefore a larger absolute change for the 36+ player, so that group naturally ends up with a higher standard deviation.  To take care of this, I used the standard deviation of the differences as a percentage of the group’s average 2012 production as the overall riskiness measure.
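In code, the measure is simple; here is a sketch (the player pairs below are made-up placeholders, not the actual FanGraphs data):

```python
import statistics

def group_risk(pairs):
    """Risk for one age group: the standard deviation of each survivor's
    year-over-year change, scaled by the group's average 2012 production.
    pairs: list of (stat_2012, stat_2013) tuples, e.g. wRC+ or WAR."""
    diffs = [y13 - y12 for y12, y13 in pairs]
    baseline = statistics.mean(y12 for y12, _ in pairs)
    return statistics.stdev(diffs) / baseline

# Hypothetical survivors from a 36+ group:
print(group_risk([(95, 88), (110, 104), (102, 101), (88, 95)]))
```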

Age Group | wRC+ Risk | WAR Risk
18-25 | 56.5% | 167.7%
26-30 | 48.3% | 118.9%
31-35 | 46.4% | 140.7%
36+ | 35.2% | 92.8%

Don’t compare the wRC+ figures to the WAR figures, as there are again scaling issues, but do compare across the age groups.  A one standard deviation change is largest for the youngest age group, so the younger players are the most uncertain, or most risky.  That is what we would expect, as we have all seen prospects flame out.  The middle two groups are similarly volatile, with the 31-to-35 group showing slightly lower risk in hitting for this sample and slightly higher risk in overall play according to WAR.  More years might need to be compared to see how consistent those groups are relative to each other.  The 36+ players are significantly less risky than the other ages.  If they decline by one standard deviation, it means a smaller reduction in performance: less volatile and less risky.

The only thing that really hurts the older players is the aging curve: they are more likely to see a decline in performance.  From the youngest group to the oldest, the percent of players who were worse in 2013 than in 2012 by wRC+ was 52.3%, 54.5%, 64.4%, and 63.6%, and by WAR 52.9%, 48.7%, 56.7%, and 81.8%.  So it is more likely that the older players will perform worse than the previous year, but again, a drop for them will likely be smaller due to the lower volatility, and it starts, on average, from a higher level of performance.

Older players are like the bonds in your investment portfolio: you have a pretty good idea of what they’re going to pay in the next period, with occasional defaults.  Younger players are more like growth stocks: you aren’t sure when or if they are going to pay dividends, but when they do you can make huge returns.  Investors pay a premium for bonds (accept a lower rate of return) because of their stability, and teams pay more for older players than their production alone seems to warrant for the same reason.

 photo Survivor_zpsee696878.jpg

If you go back to the payroll allocation, part of the shift is in the number of players in each group.  The 31-35 year-olds no longer get the largest chunk of payroll in part because there are more 26-to-30-year-old players.  Baseball is getting younger overall, so a larger portion of the money going to younger players is inevitable.  The 18-25 group isn’t seeing a large change in payroll allocation because those players are generally under team control, but teams are extending players at that age, with the money showing up as they move into the next couple of age groups.  Take Chris Sale, who is making $3.5 million this year on the extension he signed (he’s 25); when he is 26, 27, and 28 he will make $6 million, $9.15 million, and $12 million, respectively.

So the 36+ group, which as you can see makes up only 4.7% of the players, used to get about 20% of the total salaries paid, but now gets 15 or 16% (I don’t have Dave’s exact numbers).  Is that premium fair, four times more of the allocation than their share of the overall player pool?  That is a tough question, and one I am working on.  If anyone can give me tips on how to download lots of player game logs, that is probably what I am going to do next; I just haven’t figured out how to do it without eating up my entire life.  Being more certain about this sort of thing, and having a relative risk measure for players, could make contracts a lot easier to understand and predict.


The Tim Hudson Renaissance

As a general rule, giving multi-year contracts to 38-year-old pitchers coming off major ankle injuries is not a good idea. Yet Brian Sabean and the San Francisco Giants did just that, inking Tim Hudson to a two-year, $23M contract this off-season, and thus far have come out smelling like roses.

While Hudson has been a reliable and at times masterful starter during his long career, he is en route to his best overall year since 2003. The data further suggests that he is pitching better now than he has at any other point.

Examining Hudson’s career statistics suggests that his current pace, while not completely sustainable, is not a mirage by any means.  The one stat that jumps off the page is his BB/9, a paltry 0.77.  Of course that rate is bound to rise, but it’s certainly reasonable to expect it to stay in the low 2s.  Hudson’s career low BB/9 is 2.10, and he hasn’t had a rate above 2.91 since 2006.

This season, Hudson’s strikeout rate—5.63—is actually lower than his career rate of 6.05. But he has never been a strikeout pitcher; his highest K/9 (8.71) came in 1999, his rookie season, when he also walked 4.09 batters per nine. He hasn’t had a strikeout rate above 6.51 since 2001.

What Hudson is now doing better than at any time in his career is limiting baserunners and stranding those who do manage to reach.  His minuscule 0.88 WHIP is far below his career mark of 1.22, but it’s by no means a complete anomaly.  As recently as 2011, Hudson posted a WHIP as low as 1.14; in 2003 he posted a career best of 1.08.  While his current rate is likely to regress toward the mean, he has proven capable of keeping batters off the bases at an impressive rate.

When the WHIP does rise, it will likely be a result of an increased BB/9 and BABIP. Against Hudson in 2014, hitters have a BABIP of .243, a number well below his career mark of .278. But Hudson has posted similar rates in the past. In 2010, a year in which he pitched 228.2 innings, he held opposing hitters to a .249 BABIP. He hasn’t allowed a BABIP above .300 in a full season since 1999, though he threw just 136.1 innings that year.

Further, Hudson has stranded 80.8% of his baserunners thus far in 2014, his highest rate since 2010 (81.2%).  His groundball rate (60.7%) is a big reason why, as is his refusal to allow home runs.  His HR/9 is a measly 0.51, a number he’s bettered only twice in his career (0.38 in 2004, 0.40 in 2007).  While pitching in the friendly confines of AT&T Park has helped, his FIP- of 83 is relatively close to his career mark of 88.  In 2007, pitching half his games at Turner Field, Hudson posted a FIP- of 77.

So how is Hudson doing it? Besides the absurdly low walk rate, what has made him so effective this year?

Thus far, he is throwing his split/changeup and cutter more frequently than his career rates from 1999-2013.  His split/change, which he throws 14.60% of the time, has been especially effective this season, garnering a whiff/swing rate of 36.84%.  Before this season, the pitch had amassed a whiff/swing rate of 27.94%.  His cutter, while getting slightly fewer whiffs this season (16.67%) than in years past (17.12% from 1999-2013), is forcing more ground balls (11.26% compared to 9.05%).

Hudson’s curveball has also been a more valuable weapon this season than in the past.  While he’s throwing it at a rate almost identical to his career line, it is getting him more whiffs (17.19%) than any of his other pitches besides the split/change (20.14%).  Previously, batters whiffed at Hudson’s curve just 11.74% of the time.

When batters do put the ball in play, they aren’t hitting it very hard.  Hudson’s LD% of 15.9 is the second lowest of his career, and a decent chunk below his career mark of 18.0; in 2010, he posted a career best 13.6%.  Hudson is also throwing strikes at a higher rate than he ever has: in 2014, 68.2% of the pitches he has thrown have been strikes, compared to a career rate of 63.7%.

As amazing as Hudson has been through 10 starts this season, the data suggests that, for the most part, his rates are legitimate and sustainable. Besides the infinitesimal walk rate, which translates to a low WHIP, and improved whiff rates on two of his pitches, Hudson isn’t doing anything that he hasn’t proven able to do in the past.


Nick Markakis, What Happened?

Nick Markakis has carved out a nice major league career.  He now has the 8th most hits in Orioles history, and by season’s end he’ll likely be in sole possession of 6th place.  Markakis, now with nearly 1,500 hits at 30 years old, has a shot at gathering 2,500 hits in his career.  While hits are a counting statistic, that total would still place him in the top 100 of all time.  However, Markakis still strikes me as a player of unfulfilled potential.  In his last four seasons, Markakis has not compiled a WAR higher than his rookie mark (2.1 in 2006).  His two highest WAR seasons, far and away, came at ages 23 and 24.  In 2008, a season in which he compiled 6.1 WAR, he had the 11th highest total in all of baseball.  To peak so young is a very odd career trajectory.  While Markakis was on the path to being one of the best all-around players in baseball, he cratered early.  The loss in value, readily apparent again this season, comes down to two things: a reduction in power and a reduction in defense.

Markakis posted decent advanced defensive numbers early on.  But since 2009 he has been bad according to the metrics.  To back that up with some regular scouting: he has simply lost a step.  He lost his range at a young age and has never been able to get it back.  His arm keeps him respectable, but he has lost some of that strength as well.  He remains a below average right fielder, and it is not getting any better.

While his defense has hindered his overall value, the most critical aspect of his game to leave him at a young age was his power.  Markakis never hit many home runs, with a career high of 23, but the doubles were critical to his value.  He had four straight seasons of 43, 48, 45, and 45 doubles, all fantastic numbers.  In fact, after the 2010 season, he had a decent shot at reaching the top 10-20 on the all-time doubles list if he kept up that pace.  However, his homers and doubles fell off after 2010.  If he had maintained a 40-double, 15-20 homer pace over the course of his career, alongside his .300 batting average and decent walk rate, Markakis could have been one of the most valuable outfielders in the game.  The graph below tells the story of when he lost his power best; it shows his season-by-season ISO and SLG numbers.

NickMarkakis_PowerGraph

Looking at the graph above, one can see that Markakis was average to above average in power production for his first handful of seasons.  Starting in 2010, his power began to fall below average.  His numbers spiked in 2012, but that is his shortest season to date, so the sample size is smaller than in the surrounding years; 2012 was also still lower in both ISO and SLG than 2007 and 2008.  Since 2009, Nick Markakis has been a below average power hitter.  And his most recent season, 2013, was his worst ever, producing a paltry .085 ISO (.145 is considered average and .080 awful) and a -0.1 WAR.  But the question remains: why did he lose his power?

After watching Markakis for years and staring at hours of tape, it is hard to attribute this power reduction to mechanical issues.  Markakis has been known to change his stance and approach at the plate nearly every week.  He will lower or raise his hands, stay open or close up; he is a constant tinkerer with his mechanics at the plate.  I do not believe mechanics have anything to do with the steady power decline.  Nor is it necessarily how pitchers are pitching to Markakis.  Looking at the numbers, he is seeing a similar share of pitches in the zone, a little lower than in his early years but nothing unexpected, and his rate has in fact rebounded recently.  Furthermore, the mix of pitches he is seeing is similar to his early years.  It has not been an adjustment by pitchers.  Rather, much like with his defense, he simply lost a step earlier than most other position players do.

Look at the two heat maps below.  The first shows his power peak years (2007 to 2010) and the second shows the last two seasons (2013 to 2014).  They are ISO heat maps showing which pitches, in which locations, Markakis has been able to drive for extra bases.

Markakis2007to2010ISO
Markakis2013to2014ISO

Clearly, Nick Markakis has shown over the past two seasons that he can no longer drive the pitches for extra bases that he once could.  In particular, on pitches over the outer-middle of the plate (which, if you remember those great Markakis years, he could artfully fade right in between the center fielder and the left fielder for a double like clockwork), he has shown a clear inability to drive the ball for extra bases anymore.  The only power left in Markakis’ game comes on pitches down and in, and even then it’s limited power at best.  Basically, he can still run into a meatball, but his double-hitting days are over.  And for someone who cannot and has never been able to hit the ball out of the park readily, that leaves Markakis as basically a slap-hitting right fielder who can post some decent value at the plate, but nothing special.

The career arc is strange and unfortunate, but clear.  Markakis simply could not and cannot maintain the production of his early seasons.  His skills broke down sooner than most players’ do.  He is a nice piece, and had he kept up his early pace, he would have been a steal on his current contract.  However, unless he is brought back at a reduced price (or Peter Angelos decides that loyalty is worth $17.5 million), Orioles fans had better get used to having a new right fielder in 2015.

Article originally posted at www.Orioles-Nation.com


Satchel Paige: Baseball’s Believable Myth

One of the biggest drawbacks of statistics is how they can get in the way of our imagination.  I’ve heard stories of how Pete Rose could will his team to victory on any given day of a career that spanned 23 years.  Our stats claim that, actually, you can value his contributions at 80 wins.  Rickey Henderson’s speed was electric and unfathomable, and no one can put a number on that, we’ve heard.  FanGraphs says that, really, his baserunning was worth 142 runs.  Aroldis Chapman throws so hard, his fastball isn’t comparable to anyone else’s in baseball.  Our data suggest that last year it was 7 runs above average.

While statistics have contributed significantly more than they’ve taken from us, it is occasionally fun to ignore them and just pretend the stories we want to believe are true.  However, for a pitcher who is the focus of some of the most incredible tales in baseball history, a few stats from the end of his career are all the more reason to trust the absurd stories we have about him.

Satchel Paige pitched almost all of his professional baseball career in the Negro Leagues and barnstorming. He estimated that he played for 250 teams, though his “facts” about himself were often far from reality (for instance, he claimed that he never hit under .300, but he actually hit .097 in the majors). Baseball wasn’t integrated until Paige was 41 years old. Up until that point, he had built a legendary career that earned him the first Hall of Fame induction for any Negro Leagues player. Unfortunately, record keeping from these leagues was nearly non-existent, and almost no statistical evidence remains of his elite performances.

Stories of Paige paint a picture of arguably the most talented and entertaining pitcher to ever throw a baseball. As a teenager playing semi-pro baseball in Alabama, he supposedly got so mad at a poorly performing defense that he ordered his outfielders to sit down in the infield, where they watched him strike out the game’s last batter to complete his shutout with the bases loaded.

The greatest Negro Leagues hitter, Josh Gibson, once told Paige that he was going to hit a grand slam off of him in an upcoming game. With Gibson in the hole and one player on base, Paige intentionally walked the next two hitters, so Gibson would have an opportunity to hit a grand slam. Paige struck him out.

Joe DiMaggio called Paige the best pitcher and hardest thrower he had ever seen. Teammates claimed he could consistently throw his fastball over a gum wrapper. In his six exhibition matchups against Dizzy Dean (during two seasons in which Dean achieved a total WAR over 13), Paige won 4 games, and Dean said Paige’s fastball made his own look like a changeup.

Witnesses of Paige’s pitching would go on to tell countless other stories of his heroics, and a good number of them can’t be true. But what is possibly most remarkable is how historically effective he was when he was finally allowed to play in the majors, long after his prime.

Satchel Paige’s pitching demands were enormous, because through almost his entire career, people only paid to watch him pitch.  He would frequently throw over 100 pitches in consecutive days.  While his estimate of 2,500 games started is almost certainly exaggerated, he may very well have thrown more professional innings than anyone ever has.  He pitched professionally for 22 years before Major League teams would allow him to join a roster, and he did so with more financial incentive to pitch frequently than any reasonable person could expect.

Considering the wear and tear on his arm, expectations for his performance in his 40s, even for such a legendary pitcher, would need to be very tempered.  After all, only 67 pitchers have ever thrown even 100 innings after turning 40.

Of those 67, Paige ranks 8th in ERA- (81). Of the seven in front of him, three were knuckleball pitchers, one pitched before World War I, and one has been held out of the Hall of Fame due to steroid allegations (whether fair or not).

Over Paige’s first 4 major league seasons, 128 pitchers threw at least 300 innings.  Of those 128, Paige’s strikeout rate ranked 2nd.  At the end of that four-year stretch, he was 46, and 46-year-olds don’t strike players out.  You have to go down 20 spots to find a pitcher who was less than 10 years younger than Paige.

After Paige had been out of the majors for over a decade, the Kansas City A’s had him throw for them when he was 59 years old. He threw three scoreless innings, allowing only one runner.

It’s easy to wish we had better stats from Satchel Paige’s early career.  They could help us establish whether he really had, as he said, over 20 no-hitters.  We could definitively say whether or not he had 250 shutouts, 2,000 wins, 21 straight wins, or over 60 consecutive scoreless innings, all of which he claimed to be true.  It’s quite likely all those numbers are fabricated.  It’s possible that many of the stories about his pitching are exaggerated.

But when Satchel Paige was finally given a chance to prove himself, he blew away any realistic expectations anyone could have set for him.  No one will ever know which stories about Satchel Paige really happened, or how trustworthy people’s observations of him were.  But 25 years into his career, at an age few ever spend pitching professionally, he gave us a reason to believe them.


Performance With and Without Runners On, and Hitter Valuation

The increased prevalence of defensive shifts, as well as recent stories touting certain players as “shift-proof,” got me thinking: Is it a good thing to be shift-proof?  Is it inherently better to be a player against whom defensive shifting is less effective, or is there room for different players with different make-ups?  A downstream effect of defensive shifts is that, because teams shift less often (and less dramatically) with runners on base, we start to see differences in a hitter’s performance with runners on versus with the bases empty.  We also notice other effects of players performing differently based on the number of baserunners.  In this post we’ll look at how a few sample players show significant changes in offensive performance (often fueled by changes in BABIP) with runners on base versus with the bases empty.

Let’s take 3 players with very high similarity scores to each other: David Ortiz, Jason Giambi, and Carlos Delgado.  First, a look at their career stats:

Player | G | PA | HR | ISO | BABIP | AVG | OBP | SLG | wOBA | wRC+ | WAR
Delgado | 2035 | 8657 | 473 | 0.266 | 0.303 | 0.280 | 0.383 | 0.546 | 0.391 | 135 | 43.5
Ortiz | 2020 | 8467 | 443 | 0.261 | 0.304 | 0.286 | 0.381 | 0.548 | 0.392 | 138 | 41.7
Giambi | 2242 | 8864 | 440 | 0.241 | 0.294 | 0.277 | 0.400 | 0.518 | 0.395 | 140 | 49.3

Pretty comparable overall.  Giambi has accumulated more WAR, primarily through having a few more plate appearances, but also from having a better walk rate, which drives up his OBP, wOBA, and wRC+ significantly as well.

Now let’s look at their splits with runners on vs. bases empty:

Player | Split | G | PA | HR | HR/PA | BB% | SO% | AVG | OBP | ISO | OPS | BABIP
Delgado | Bases Empty | 1932 | 4430 | 255 | 5.8% | 11.7% | 21.4% | 0.275 | 0.374 | 0.273 | 0.922 | 0.303
Delgado | Men On | 1895 | 4227 | 218 | 5.2% | 14.0% | 18.9% | 0.286 | 0.393 | 0.258 | 0.936 | 0.304
Ortiz | Bases Empty | 1862 | 4193 | 262 | 6.2% | 11.2% | 19.1% | 0.271 | 0.356 | 0.282 | 0.908 | 0.281
Ortiz | Men On | 1851 | 4274 | 181 | 4.2% | 15.2% | 16.6% | 0.302 | 0.406 | 0.240 | 0.948 | 0.327
Giambi | Bases Empty | 1999 | 4513 | 224 | 5.0% | 13.0% | 18.1% | 0.256 | 0.367 | 0.228 | 0.851 | 0.271
Giambi | Men On | 2020 | 4351 | 216 | 5.0% | 17.8% | 17.1% | 0.302 | 0.434 | 0.255 | 0.991 | 0.320

Here we start to see a lot of divergence.  With Ortiz and Giambi, we see a large increase in BABIP when there are runners on base (and corresponding increases to AVG and OPS).  With Delgado, there is only a trivial increase in BABIP, and a much smaller increase in OPS.

Here’s the difference in BABIP and OPS each player shows in the split between {bases empty} and {runners on}:

Player | BABIP(runners on) – BABIP(empty) | OPS(runners on) – OPS(empty)
Delgado | 0.001 | 0.014
Ortiz | 0.046 | 0.040
Giambi | 0.049 | 0.140

Note that, to some extent, all hitters tend to put up better numbers with runners on due to sampling bias: in an average “runners on” situation, a batter is more likely to be facing an inferior pitcher than in an average bases-empty situation.  Delgado’s splits are in line with the league-average splits for bases empty vs. runners on; in a given league season, the league-wide runners-on-vs.-bases-empty split in BABIP tends to range from 0.000 to 0.005, and for OPS the increase ranges from 0.010 to 0.030.  Ortiz and Giambi, on the other hand, show splits well outside these ranges, which indicates that other factors are at play causing these effects.

Does this mean Ortiz and Giambi are tapping into some part of their psyche that allows them to suddenly transform into better players when runners are aboard?  Unlikely.  Ortiz and Giambi are pretty heavy pull hitters, especially judging by their ground ball spray charts, and defenses have often employed dramatic shifts against them to great effect.  However, with runners on base, these shifts tend to be less dramatic and less effective.  This is likely the primary reason for the large increases in BABIP with runners on (a 0.046 increase for Ortiz, 0.049 for Giambi).

Beyond this, although Ortiz and Giambi show similar BABIP splits, they still differ greatly from each other in their production with runners on.  Giambi’s OPS increases a whopping 140 points, while Ortiz’s increases by only 40.  This is largely due to Ortiz’s dramatic decrease in home run rate with runners on: while Ortiz’s HR% drops by nearly a third, Giambi has managed to continue hitting homers at the same rate when runners are aboard.  Do pitchers change their approach when facing Ortiz with runners on to “minimize the damage” and try to prevent him from hitting home runs?  Likewise, Ortiz (knowing that pitchers will approach him differently) may change his approach at the plate as well.  The splits for other stats seem to bear this out, as Ortiz increases his walk rate and decreases his strikeout rate; this isn’t particularly revelatory, though, and in fact these trends are present for Giambi and even Delgado as well.

This has profound implications for player valuation.  Given 3 players who put up similar aggregate numbers over the course of the season, would you rather have the player who is going to produce at roughly the same level (similar AVG / BABIP / OPS) regardless of whether there are runners on base, or the player who is going to overproduce with runners on and underproduce with bases empty?  I’d go with the latter.  I’d prefer Ortiz to Delgado.  And then, since the decrease in Ortiz’s HR% with runners on is curious (and warrants further investigation), I’d prefer Giambi to Ortiz, Giambi being the even more extreme example of increased production with runners on.

As we start to see more and more defensive shifts (and if the assumption holds that shifts cannot be employed as effectively with runners on base), more and more players will demonstrate these splits in performance.  WAR, for example, does not take this into account at all.  If a player is dramatically more productive (e.g., a 140-point increase in OPS!) with runners on, you would project his team to score more runs and win more games than if that player were replaced by one who puts up equivalent full-season numbers (and hence has the same WAR) but does not have the same splits.

It would be interesting to run some simulations (probably using Markov models) to more precisely quantify the impact a given player’s splits have on team run production; a crude sketch of the idea appears below.  The impact would likely vary by team as well (e.g., with overall team OBP).  This could be similar to the analysis comparing how 2 players with similar wRC+ but different makeup (an OBP guy versus an ISO guy) can impact expected run totals for different teams in different ways.
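As a first pass, the simulation doesn’t even need a full Markov transition matrix; a crude Monte Carlo of an inning, where one set of outcome probabilities applies with the bases empty and another with runners on, already shows the effect.  Every probability below is invented for illustration, baserunners advance exactly one base on a walk or single, and only three outcomes exist: walk/single, home run, or out.

```python
import random

def sim_inning(probs):
    """probs maps 'empty'/'on' to (P(walk or single), P(home run));
    anything else is an out. Returns runs scored in the inning."""
    outs, runs, bases = 0, 0, [False, False, False]
    while outs < 3:
        state = 'on' if any(bases) else 'empty'
        p1, phr = probs[state]
        r = random.random()
        if r < p1:                       # walk/single: everyone moves up one base
            runs += bases[2]
            bases = [True, bases[0], bases[1]]
        elif r < p1 + phr:               # home run clears the bases
            runs += 1 + sum(bases)
            bases = [False, False, False]
        else:
            outs += 1
    return runs

def runs_per_inning(probs, n=100_000):
    return sum(sim_inning(probs) for _ in range(n)) / n

flat  = {'empty': (0.26, 0.03), 'on': (0.26, 0.03)}   # Delgado-like: no split
split = {'empty': (0.23, 0.03), 'on': (0.30, 0.03)}   # Giambi-like: big split
print(runs_per_inning(flat), runs_per_inning(split))
```

A real model would distinguish singles from walks and doubles, mix a single player’s split into an otherwise league-average lineup, and track base-out states properly, but even this toy version shows how two hitters with identical aggregate rates can produce different run totals once the rates become state-dependent.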


Home-Run Environment And Win-Homer Correlation

Home runs are good, I think we can all agree on that, and in the presumably post-steroid environment they have been in decline.  Does that make the home run more or less important?  It is hard to say.  In one sense home runs are now more scarce, and you might expect home-run-hitting teams to be at a larger advantage than previously.  On the other hand, teams that don’t hit a lot of balls out of the park will not be as far behind their peers if said peers are not taking the ball yard quite so frequently.  So which is it?

FanGraphs, of course, can give the answer.  I took every team in the expansion era (1961 and on) and tracked two things year over year.  The first was how far each team was from the average home run total: the team’s home runs minus the average across all MLB teams.  From there I calculated the correlation of those differences with the wins each team accumulated that year.  Then I tracked that correlation against the overall home run environment.  To get the two on the same scale, I expressed the home run environment as a percent of the peak average home runs per team: 2000, the highest average per team, became 100%, and every other year is some percent below that, with the 2000 average as the denominator.

I did omit 1994 and 1981 because of how much those seasons were shortened by strikes; they made the overall graph harder to read.  The results look like this:

 photo HRenvironment_zps35a42fa7.jpg

And the answer is…it doesn’t matter!  Home runs are always positively correlated with wins, meaning it is never advantageous for a team to be below average when it comes to hitting home runs.  That correlation over time has a best-fit line with a near-zero slope: home runs are equally valuable, with respect to winning, in lower home run environments and in the more recent high ones.  You can also see that the correlation is rather volatile, ranging from barely positive to about .65, which is a fairly strong positive relationship.  Volatile, but never negative, so there are no years where a bunch of below-average home run hitting teams took the league by storm.
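For anyone who wants to reproduce the graph, the year-by-year calculation looks something like this.  The DataFrame and its columns (season, team, hr, wins) are hypothetical names; the underlying data could come from the FanGraphs team leaderboards.

```python
import pandas as pd

def hr_win_correlation_by_year(teams: pd.DataFrame) -> pd.DataFrame:
    """For each season: correlation of (team HR - league average HR) with
    team wins, plus the season's HR environment as a % of the peak year."""
    rows = []
    peak = teams.groupby('season')['hr'].mean().max()    # the 2000 peak
    for season, grp in teams.groupby('season'):
        if season in (1981, 1994):                       # strike-shortened, omitted
            continue
        diff = grp['hr'] - grp['hr'].mean()              # distance from league average
        rows.append({'season': season,
                     'hr_env': grp['hr'].mean() / peak,  # % of peak HR environment
                     'corr': diff.corr(grp['wins'])})    # correlation with wins
    return pd.DataFrame(rows)
```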

The home run environment last year was back down to 81.9% of the 2000 peak, and this year’s pace is a little slower than last year’s, with home runs in 2.38% of plate appearances versus 2013’s 2.52%.  That could reduce the total home runs hit by more than 8 per team for the year, though the heat of summer will probably close that gap up some.  It is likely, though, that the overall home run environment will be down to the levels we saw in 2011 and 2012, and maybe the drop-off from 2000 has flattened out.

Anyway, I know everyone hates a non-result (there are even published papers about the bias against them), but this is still interesting, at least to me.  You always want to hit home runs, we already knew that, but the value of the home run does not increase in times when they are scarce, and they don’t become even more necessary during a homer boom.  This means teams shouldn’t, for instance, overpay for a guy like Giancarlo Stanton right now because his power bat is more valuable in the current home run environment.  It means they should overpay so that their fans can enjoy the majestic blasts and feel content knowing they will be just as valuable as ever.


Foundations of Batting Analysis – Part 2: Real and Indisputable Facts

In Part 1 (http://www.fangraphs.com/community/foundations-of-batting-analysis-part-1-genesis/), we examined how the hit became the first estimate of batting effectiveness in 1867 leading to the creation of the modern batting average in 1871. In Part 2, we’ll look more closely at what the hit actually measures and the inherent flaws in its estimation.

Over the century-and-a-half since Henry Chadwick wrote “The True Test of Batting,” it has been a given that if the batter makes contact with the ball, he has only shown “effectiveness” when that contact results in a clean hit – anything else is a failure. At first glance, this may seem somewhat reasonable. The batter is being credited for making contact with the ball in such a way that it is impossible for the defense to make an out, an action that must be indicative of his skill. If the batter makes an out, or reaches base due to a defensive error that should have resulted in an out, it was due to his ineffectiveness – he failed the “test of skill.”

This is an oversimplified view of batting.

By claiming that a hit is entirely due to the success of the batter and that an out, or reach on error, is due to his failure, we make fallacious assumptions about the nature of the game. Consider all of the factors involved in a play when a batter swings away. The catcher calls for a specific pitch with varying goals in mind depending on the batter, the state of the plate appearance, and the game state. The pitcher tries to pitch the ball in a way that will accomplish the goals of the catcher.[i] The batter attempts to make contact with the ball, potentially with the intent to hit the ball into the air or on the ground, or in a specific direction. The fielders aim to use the ball to reduce the ability of the batting team to score runs, either by putting out baserunners or limiting their ability to advance bases. The baserunners react to the contact and try to safely advance on the bases without being put out. All the while, the dirt, the grass, the air, the crowd, and everything else that can have some unmeasurable effect on the outcome of the play, are acting in the background. It is misleading to suggest that when contact between the bat and ball results in a hit, it must be due to “effective batting.”

Let’s look at some examples. Here is a Stephen Drew pop up from the World Series last year:

Here is a Michael Taylor line drive from 2011:

The contact made by Taylor was certainly superior to that made by Drew, reflecting more batting effectiveness in general, but due to fielding effectiveness—and luck—Taylor’s ball resulted in an out while Drew’s resulted in a hit.

Here are three balls launched into the outfield:

In each case, the batter struck the ball in a way that could potentially benefit his team, but varying levels of performance by the fielders resulted in three different scoring outcomes: a reach on error, a hit, and an out, respectively.

Here are a pair of a groundballs:

Results so dramatically affected by luck and randomness reflect little on the part of the batter, and yet we act as if Endy Chavez was effective and Kyle Seager was ineffective.

Home runs may be considered the ultimate success of a batter, but even they may not occur simply due to batting effectiveness. Consider these three:

Does a home run reflect more batting effectiveness when it lands in front of the centerfielder, when it’s hit farther than humanly possible,[ii] or when it doesn’t technically get over the wall?

The hit, at its core, is an estimate of value. Every time the ball is put into play in fair territory, some amount of value is generated for the batter’s team. When an out is made, the team has less of an opportunity to score runs: negative value. When an out is not made, the team has a greater opportunity to score runs: positive value. Hits estimate this value by being counted when an out is not made and when certain other aspects of the play conform to accepted standards of batting effectiveness, i.e. the 11 subsections of Rule 10.05 of the Official Baseball Rules that define what are and are not base hits, as well as the eight subsections of Rule 10.12.(a) that define when to charge an error against a fielder.

Rule 10.05 includes the phrase “scorer’s judgment” four times, and seven of the 11 parts of the rule involve some form of opinion on the part of the scorer to determine whether or not to award a hit. All eight subsections of Rule 10.12.(a) that define when to charge an error against a fielder are entirely subjective. Not only is the hit as an estimate of batting effectiveness muddled by the forces in the game that are outside of the batter’s control, but the decision whether to award a hit or an error can be based on subjective opinion. Imagine you’re the official scorer; are these hits or errors?

If you agreed with the official scorer on the last play, that Ortiz reached on a defensive error, you were “wrong” according to MLB, which overturned the call and awarded Ortiz a hit retroactively (something I doubt would have occurred if Darvish had completed the no-hitter). Despite Chadwick’s claim in 1867 that “there can be no mistake about the question of a batsman’s making his first base…whether by effective batting, or by errors in the field,” uncertainty in how to designate the outcome of a play is all too common, and not a modern phenomenon.

In an article in the 6 April 1916 issue of the Sporting News, John H. Gruber explains that before scoring methods became standardized in 1880, the definition of a hit could vary wildly from scorer to scorer.

“It was evidently taken for granted that everybody knew a base hit when he saw one made…a group of ‘tight’ and another of ‘open’ scorers came into existence.

‘Tight’ were those who recognized only ‘clean’ hits, when the ball was not touched by a fielder either on the ground or in the air. Should the fielder get even the tip of his fingers on the ball, though compelled to jump into the air, no hit was registered; instead an error was charged.

The ‘open’ contingent was more liberal. To it belonged the more experienced scorers who used their judgment in deciding between a hit and an error, and always in favor of the batter. They gave the batter a hit and insisted that he was entitled to a hit if he sent a ‘hot’ ball to the short-stop or the third baseman and the ball be only partly stopped and not in time to throw it to a bag.

Some of them even advocated the ‘right field base hit,’ which at present is scored a sacrifice fly. ‘For instance,’ they said, ‘a man is on third base and the batsman, in order to insure the scoring of the run by the player on third base, hits a ball to right field in such a way that, while it insures his being put out himself, sends the base runner on third home, and scores a run. This is a play which illustrates ”playing for the side” pretty strikingly, and it seems to us that such a hit should properly come under the category of base hits.’”

While official scorers have since become more consistent in how they score a game, there will never be a time when hits will not involve a “scorer’s judgment” on some level. As Isaac Ray wrote in the North American Review in 1856, building statistics based on opinion or “shrewd conjecture” leads to “no real advance in knowledge”:

“The common fallacy that, imperfect as they are, they still constitute an approximation of the truth, and therefore are not to be despised, is founded upon a total misconception of the proper objects of statistical inquiry, as well as of the first rules of philosophical induction. Facts—real and indisputable facts—may serve as a basis for general conclusions, and the more we have of them the better; but an accumulation of errors can never lead to the development of truth. Of course we do not deny that, in a mere matter of quantity, the errors on one side generally balance the errors on the other, and thus the value of the result is not materially affected. What we object to is the attempt to give a statistical form to things more or less doubtful and subjective.”

Hits, these “approximations of the truth,” have been used as the basic measurement of success for batters for the entire history of the professional game.  However, in the 1950s, Branch Rickey, former general manager of the Brooklyn Dodgers, and Allan Roth, his statistical man-behind-the-curtain, acknowledged that a batter could provide value to his team outside of just swinging the bat.  On August 2, 1954, Life magazine printed an article titled “Goodby to Some Old Baseball Ideas” in which Rickey wrote on methods used to estimate batting effectiveness:

“…batting average is only a partial means of determining a man’s effectiveness on offense. It neglects a major factor, the base on balls, which is reflected only negatively in the batting average (by not counting it as a time at bat). Actually walks are extremely important…the ability to get on base, or On Base Average, is both vital and measurable.”

While the concept didn’t propagate widely at first, by 1984 on base average (OBA) had become one of three averages, along with batting average (BA) and slugging average (SLG), calculated by the official statisticians for the National and American Leagues. These averages are currently calculated as follows:

BA = Hits/At-Bats = H/AB

OBA = (Hits + Walks + Times Hit by Pitcher) / (At-Bats + Walks + Times Hit by Pitcher + Sacrifice Flies) = (H + BB + HBP) / (AB + BB + HBP + SF)

SLG = Total Bases on Hits / At-Bats = TB/AB
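In code, with the standard counting stats, these definitions are simply (the sample numbers in the comment are made up for illustration):

```python
def ba(h, ab):
    """Batting average: hits per at-bat."""
    return h / ab

def oba(h, bb, hbp, ab, sf):
    """On base average, as officially defined."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(tb, ab):
    """Slugging average: total bases on hits per at-bat."""
    return tb / ab

# e.g. oba(h=160, bb=70, hbp=5, ab=550, sf=5) = 235/630 = .373
```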

The addition of on base average as an official statistic was due in large part to Pete Palmer who began recording the average for the American League in 1979. Before he began tracking these figures, Palmer wrote an article published in the Baseball Research Journal in 1973 titled, “On Base Average for Players,” in which he examined the OBA of players throughout the history of the game. To open the article, he wrote:

“There are two main objectives for the hitter. The first is to not make an out and the second is to hit for distance. Long-ball hitting is normally measured by slugging average. Not making an out can be expressed in terms of on base average…”

While on base average has proven popular with modern sabermetricians, it does not actually express the rate at which a batter does not make an out, as claimed by Palmer. Rather, it reflects the rate at which a batter does not make an out when showing accepted forms of batting effectiveness; it is a modern take on batting average. The suggestion is that when a batter reaches base due to a walk or being hit by a pitch he has shown effectiveness, but when he reaches on interference, obstruction, or an error he has not.

Here are a few instances of batters reaching base without swinging.

What effectiveness did the batter show in the first three plays that he failed to show in the final play?

In the same way that there are a litany of forces in play when a batter tries to make contact with the ball, reaching base on non-swinging events requires more than just batting effectiveness.  Reaching on catcher’s interference may not require any skill on the part of the batter, but there are countless examples of batters being walked or hit by a pitch that similarly reflect no batting skill.  A batter may be intentionally walked because he is greatly skilled and the pitcher, catcher, or manager fears what he may be able to do if he makes contact, but in the actual plate appearance itself, that rationalization is inconsequential.  If we’re going to estimate the effectiveness of a batter in a plate appearance, only what occurs during the plate appearance is relevant.

Inconsistency in when we decide to reward batters for reaching base has limited our ability to accurately reflect the value produced by batters. We intentionally exclude certain results and condemn others as failures despite the batter’s team benefiting from the outcomes of these plays. Instead of restricting ourselves to counting only the value produced when the batter has shown accepted forms of effectiveness, we should aim to accurately reflect the total value that is produced due to a batter’s plate appearance. We can then judge how much of the value we think was due to effective batting and how much due to outside forces, but we need to at least set the baseline for the total value that was produced.

To accomplish this goal, I’d like to repurpose the language Palmer used to begin “On Base Average for Players”:

There are two main objectives for the batter. The first is to not make an out and the second is to advance as many bases as possible.

“Hitters” aim to “hit for distance” as it will improve their likelihood of advancing on the bases. “Batters” aim to do whatever it takes to advance on the bases. Hitting for distance may be the best way to accomplish this, in general, but batters will happily advance on an error caused by an errant throw from the shortstop, or a muffed popup in shallow right field, or a monster flyball to centerfield.

Unlike past methods that estimate batting effectiveness, there will be no exceptions or exclusions in how we reflect a batter’s rate at accomplishing these objectives. Our only limitation will be that we will restrict ourselves to those events that occur due to the action of the plate appearance. By this I mean that baserunning and fielding actions that occur following the initial result of the plate appearance are not to be considered. For instance, events like a runner advancing due to the ball being thrown to a different base, or a secondary fielding error that allows runners to advance, are to be ignored.

The basic measurement of success in this system is the reach (Re), which is credited to a batter any time he reaches first base without causing an out.[iii] A batter could receive credit for a reach in a myriad of ways: on a clean hit,[iv] a defensive error, a walk, a hit by pitch, interference, obstruction, a strikeout with a wild pitch, passed ball, or error, or even a failed fielder’s choice. The only essential element is that the batter reached first base without causing an out. The inclusion of the failed fielder’s choice may seem counterintuitive, as there is an implication that the fielder could have made an out if he had thrown the ball to first base, but “could” is opinion rearing its ugly head and this statistic is free of such bias.

The basic average resulting from this counting statistic is effective On Base Average (eOBA), which reflects the rate at which a batter reaches first base without causing an out per plate appearance.

eOBA = Reaches / Plate Appearances = Re/PA

Note that unlike the traditional on base average, all plate appearances are counted, not just at-bats, walks, times hit by the pitcher, and sacrifice flies. MLB may be of the opinion that batters shouldn’t be punished when they “play for the side” by making a sacrifice bunt, but that opinion is irrelevant for eOBA; the batter caused an out, nothing else matters.[v]

eOBA measures the rate at which batters accomplish their first main objective: not causing an out. To measure the second objective, advancing as many bases as possible, we’ll define the second basic measurement of success as total bases reached (TBR), which reflects the number of bases to which a batter advances due to a reach.[vi] So, a walk, a single, and catcher’s interference, among other things, are worth one TBR; a two-base error and a double are worth two TBR; etc.

The average resulting from TBR is effective Total Bases Average (eTBA), which reflects the average number of bases to which a batter advances per plate appearance.

eTBA = Total Bases Reached / Plate Appearances = TBR/PA

We now have ways to measure the rate at which a batter does not cause an out and how far they advance on average in a plate appearance. While these are the two main objectives for batters, it can be informative to know similar rates for when a batter attempts to make contact with the ball.

To build such averages, we need to first define a statistic that counts the number of attempts by a batter to make contact, as no such term currently exists. At-bats come close, but they have been altered to exclude certain contact events, namely sacrifices. For our purposes, it is irrelevant why a batter attempted to make contact, whether to sacrifice himself or otherwise, only that he did so. We’ll define an attempt-at-contact (AC) as any plate appearance in which the batter strikes out or puts the ball into play. The basic unit to measure success when attempting to make contact is the reach-on-contact (C), for which a batter receives credit when he reaches first base by making contact without causing an out. A strikeout where the batter reaches first base on a wild pitch, passed ball, or error counts as a reach but it does not count as a reach-on-contact, as the batter did not reach base safely by making contact.

The basic average resulting from this counting statistic is effective Batting Average (eBA), which reflects the rate at which a batter reaches first base by making contact without causing an out per attempt-at-contact.

eBA = Reaches-on-Contact / Attempts-at-Contact = C/AC

Finally, we’ll define total bases reached-on-contact (TBC) as the number of bases to which a batter advances due to a reach-on-contact. The average resulting from this is effective Slugging Average (eSLG), which reflects the average number of bases to which a batter advances per attempt-at-contact.

eSLG = Total Bases Reached-on-Contact / Attempts-at-Contact = TBC/AC
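To pin down the bookkeeping, here is a minimal sketch of all four effective averages.  The counting inputs would have to be tallied from play-by-play data under the rules laid out above, and the class and field names are mine, not an established convention.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    pa: int    # plate appearances (all of them, no exclusions)
    re: int    # reaches: reached first base without causing an out
    tbr: int   # total bases reached due to the plate appearance
    ac: int    # attempts-at-contact: strikeouts plus balls put into play
    c: int     # reaches-on-contact
    tbc: int   # total bases reached-on-contact

    @property
    def eoba(self):  # effective On Base Average
        return self.re / self.pa

    @property
    def etba(self):  # effective Total Bases Average
        return self.tbr / self.pa

    @property
    def eba(self):   # effective Batting Average
        return self.c / self.ac

    @property
    def eslg(self):  # effective Slugging Average
        return self.tbc / self.ac
```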

The two binary effective averages—eOBA and eBA—are the most basic tools we can build to describe the value produced by batters.  They answer a very simple question: was an out caused by the action of the plate appearance?  There are no assumptions made about whose effectiveness caused an out to be made or avoided; we only note that it occurred during a batter’s plate appearance.  These are “real and indisputable facts.”

The value of these statistics lies not only in their reflection of whether a batter accomplishes his first main objective, but also in their linguistic simplicity. Miguel Cabrera led qualified batters with a .442 OBA in 2013. This means that he reached base while showing batting effectiveness (i.e. through a hit, walk, or hit by pitch) in 44.2 percent of the opportunities he had to show batting effectiveness (i.e. an at-bat, a walk, a hit by pitch, or a sacrifice fly). That’s a bit of a mouthful, and somewhat convoluted. By contrast, Mike Trout led all qualified batters with a .445 eOBA in 2013, meaning he reached base without causing an out in 44.5 percent of his plate appearances. There are no exceptions that need to be acknowledged for plate appearances or times safely reaching base that aren’t counted; it’s simple and to the point.

The two weighted effective averages—eTBA and eSLG—depend on the scorer to determine which base the batter reached due to the action of the plate appearance, and thus reflect a slight level of estimation. As we want to differentiate between actions caused by a plate appearance and those caused by subsequent baserunning and fielding, it’s necessary for the scorer to make these estimations. This process at least comes with fewer difficulties, in general, than those that can arise when scoring a hit or an error. No matter what we do, official scorers will always be a necessary evil in the game of baseball.

While I won’t get into any real analysis with these statistics yet, accounting for all results can certainly have a noticeable effect on how we may perceive the value of some players. For example, an average batter last season had an OBA of .318 with an eOBA of .325. Norichika Aoki was well above average with a .356 OBA last season, but by accounting for the 16 times he reached base “inefficiently,” he produced an even more impressive .375 eOBA. While he was ranked 37th among qualified batters in OBA, in the company of players like Marco Scutaro and Jacoby Ellsbury, he was 27th among qualified batters in eOBA, between Buster Posey and Jason Kipnis; a significant jump.

In the past, we have only cared about how many total bases a batter reached when he put the ball into play, which is a disservice to those batters who are able to reach base at a high rate without swinging. Joey Votto had an eSLG of .504 last season – 26th overall among qualified batters. However, his eTBA, which accounts for the 139 total bases he reached when not making contact, was .599 – 7th among qualified batters.

This is certainly not the first time that such a method of tracking value production has been proposed, but it never seems to gain any traction. The earliest such proposal may have come in the Cincinnati Daily Enquirer on 14 August 1876, when O.P. Caylor suggested that there was a strong probability that “a different mode of scoring will be adopted by the [National] League next year”:

“Instead of the base-hit column will be the first base column, in which will be credited the times a player reached first base in each game, whether by an error, called balls, or a safe hit. The intention is to thereby encourage not only safe hitting, but also good first-base running, which has of late sadly declined. Players are too apt, under the present system of averages, to work only for base hits, and if they see they have not made one, they show an indifference about reaching first base in advance of the ball. The new system will make each member of a club play for the club, and not for his individual average.”

Of course, this new mode was not adopted. The National League did, however, count walks as hits for a single season in 1887, an experiment that was widely despised and abandoned after the season ended.

It has been 147 years since Henry Chadwick introduced the hit and began the process of estimating batting effectiveness. Maybe it’s time we accept the limitations of these estimations and start crediting batters for “reaching first base in advance of the ball” and advancing as far as possible, no matter how they do so.



[i] Whether it’s the catcher, pitcher, or manager who ultimately decides on what pitch is to be thrown is somewhat irrelevant. The goal of the pitching battery is to execute pitches that offer the greatest chance to help the pitching team, whether that’s by trying to strike out the batter, trying to induce weak or inferior contact, or trying to avoid the potential for any contact whatsoever.

[ii] Technically, it only had a true distance of 443 feet—not terribly deep in the grand pantheon of home runs—but the illusion works for me on many levels.

[iii] The fundamental principle of this system, that a reach is credited when an out doesn’t occur due to the action of the plate appearance, means that some plays that end in outs are still counted as reaches. In this way, we don’t incorrectly subtract value that was lost due to fielding and baserunning following the initial event. For instance, if a batter hits the ball cleanly into right field and safely reaches first base, but the right fielder throws out a baserunner advancing from first to third, the batter would still receive credit for a reach. Similarly, if a batter safely reaches first base but is thrown out trying to advance to second base, for consistency, this is considered a baserunning mistake and is still treated as a reach of first base.

[iv] There is one type of hit that is not counted as a reach. When a batted ball hits a baserunner, the batter receives credit for a hit while an out is recorded, presumably because it is considered an event that reflects batting effectiveness. In this system, that event is treated as an out due to the action of the plate appearance—a failure to safely reach base.

[v] Sacrifice hits may be strategically valuable events, as the value of the sacrifice could be worth more than the average expected value that the batter would create if swinging away, but they are still negative events when compared to those that don’t end in an out—a somewhat obvious point, I hope. The average sacrifice hit is significantly more valuable than the average out, which we will show more clearly in Part III, but for consistency in building these basic averages, it’s only logical to count them as what they are: outs.

[vi] There are occasionally plays where a batter hits a groundball that causes a fielder to make a bad throw to first, in which the batter is credited with a single and then an advance to second on the throwing error. As the fielding play is part of the action of the plate appearance—it occurs directly in response to the ball being put into play—the batter would be credited with two TBR for these types of events.



I’ve included links to spreadsheets containing the leaders, among qualified batters, for each effective average, as well as the batters with the largest difference between their effective and traditional averages, for comparison. Additionally, the same statistics have been generated for each team, along with the league-wide averages.

2013 – Effective Averages for Qualified Players

2013 – Largest Difference Between Effective and Traditional Averages for Qualified Players

2013 – Effective Averages for Teams and Leagues


Feasting on Garbage: Early Strength of Schedule and Team Offense

The Oakland Athletics and the Colorado Rockies have two of the most productive offenses in the league this year, both ranking among the top five teams by wRC+. By contrast, the Brewers and Cardinals have been below average so far, with a 93 and a 96 wRC+, respectively. Could the strength of these teams’ early schedules be a factor in these varying levels of production?

To evaluate this, I tabulated the actual innings pitched by opponents of the Athletics, Rockies, Brewers, and Cardinals so far in 2014, and then tabulated the anticipated innings for upcoming opponents in June, assuming 9 innings per game. (You could pick any four teams you wanted; these were the ones that interested me.) To evaluate the quality of the pitching staffs faced, I used SIERA (published here at FanGraphs) to estimate the runs each staff would have been expected to give up, on average, in light of its actual skill set. Last year, SIERA explained 63% (by r²) of the variance in runs given up by team pitching staffs, making it a good choice for this exercise. Because the pitchers faced in a game are largely outside an opposing team’s control, I used the current team-average SIERA for each pitching staff, weighted each inning faced by that value, and divided the weighted total by innings faced to get an innings-weighted aggregate SIERA for the collective opponents of each team.
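
For concreteness, here is a minimal sketch of that weighting in Python. The opponents and innings totals are invented for illustration; the actual calculation simply runs over every staff a team’s batters faced.

```python
# A sketch of the aggregation described above, with invented opponents
# and innings totals. The real calculation covers every opponent a
# team's batters actually faced, inning by inning.

def aggregate_opponent_siera(schedule):
    """schedule: list of (staff_siera, innings_faced) pairs, one per opponent."""
    total_innings = sum(innings for _, innings in schedule)
    weighted_sum = sum(siera * innings for siera, innings in schedule)
    return weighted_sum / total_innings

# Hypothetical two months of opponents:
schedule = [(3.83, 90.0),   # a below-average staff, faced for 90 innings
            (3.95, 54.0),   # a bad staff, faced for 54 innings
            (3.50, 72.0)]   # a good staff, faced for 72 innings
print(round(aggregate_opponent_siera(schedule), 2))  # 3.75
```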

Let’s start with the quality of opposing pitching each team has faced in the two months so far:

Opponent SIERA, 2014 season to date (run effects are runs scored relative to a league-average schedule):

                  Lg. Avg.  Athletics  Rockies  Brewers  Cardinals
Aggregate SIERA       3.73       3.86     3.65     3.58       3.62
Avg. run effect                    +7       -4       -8         -6

SIERA can be a difficult statistic to appreciate because its values fall in a tighter range than other pitching statistics (ERA, FIP), so small differences have a surprisingly large effect on runs allowed. Remember, though, that as with most pitching metrics, lower is better.

Let’s work from the league-average SIERA so far this year — 3.73 — to make some overall observations. First, the Rockies’ production is quite impressive, as they were facing above-average pitching yet still managed to generate a 110 wRC+. The Athletics, on the other hand, generated the same 110 wRC+ as the Rockies, but against an entirely different quality of competition. For the past two months, they’ve had the privilege of teeing off on opponents with an aggregate staff SIERA of 3.86. That is essentially like facing a team slightly worse than the Astros (3.83 SIERA) every day for two months.

Contrast that with the task faced by the Brewers and Cardinals so far. To date, the teams faced by those two clubs have posted an aggregate SIERA of 3.58 (Brewers) and 3.62 (Cardinals). On average, that’s like facing a top-10 pitching staff every day for two months. Is it all that surprising, then, that these two teams, widely thought to be above-average offensively when the season began, have struggled to live up to offensive expectations so far?

How does this difference actually affect runs scored? That is a tricky effect to isolate. Drawing a zero-intercept least-squares line, each .01 of SIERA has been worth about half a run so far in 2014. (That rate is comparable to the entire 2013 season, suggesting that this ratio stabilizes fairly quickly.) By that measure, as shown in the table above, we would expect their tough schedule to have cost the Brewers almost a win (8 runs) in runs scored relative to average so far, and almost a win and a half as compared to Oakland (a 15-run difference). The Cardinals are not far behind.
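
Here is that conversion as a short sketch, using the league-average SIERA and the team figures from the first table; the half-run-per-.01 slope is the 2014 estimate described above.

```python
# The run effects in the tables follow from the half-run-per-.01-of-SIERA
# slope cited above; the league average (3.73) and team figures come from
# the first table.

LEAGUE_AVG_SIERA = 3.73
RUNS_PER_POINT = 0.5  # approximate runs per .01 of SIERA, 2014 to date

for team, opp_siera in [("Athletics", 3.86), ("Rockies", 3.65),
                        ("Brewers", 3.58), ("Cardinals", 3.62)]:
    effect = (opp_siera - LEAGUE_AVG_SIERA) * 100 * RUNS_PER_POINT
    print(f"{team}: {effect:+.1f} runs")
# Athletics: +6.5, Rockies: -4.0, Brewers: -7.5, Cardinals: -5.5
# (the tables round these to whole runs)
```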

But that is just the average runs lost, and does not account for the outliers. It probably won’t surprise you to learn that the largest deviations (residuals, technically) from the relatively modest average tend to come from teams at the bottom half of the pitching barrel. When these teams have a bad day, they are really bad, and they are prone to getting blown out. These teams include the White Sox, the Rangers, and the Astros — teams that, as it so happens, have been well-represented on the Athletics’ schedule to date. Certainly, we should expect good teams to blow bad teams out, but when your offensive success consists substantially of beating up bad pitching, it’s hard to say how good your offense really is. The Brewers and Cardinals, on the other hand, have enjoyed healthy servings of the Braves, the Cubs, the Reds, and also each other. All of those teams are in the top half of the league by SIERA, and none of them has a tendency toward outlier scores that allow an opponent to super-size their run differential.

What’s particularly interesting, though, is that this imbalance is about to change in the month of June. Here is how it looks right now:

Opponent SIERA, June:

                  Lg. Avg.  Athletics  Rockies  Brewers  Cardinals
Aggregate SIERA       3.73       3.67     3.48     3.87       3.75
Avg. run effect                    -3      -13       +7         +1

Things project to be different this month. In June, it is the Brewers’ turn to feast on garbage pitching, as they essentially get to bat against the Astros pitching staff for the entire month (3.87 SIERA). The Cardinals aren’t quite as fortunate, although they still get to face slightly below-average pitching (akin to facing the Rays every day), whereas the Athletics at least have to face a top-half schedule by aggregate SIERA. The poor Rockies, on the other hand, fare worst of all, with a schedule that could not be more grueling: the Braves, Brewers, Cardinals, Dodgers, and Nationals, among others. If the Rockies still come out of June with an above-average wRC+, we can safely say that they are probably a true-talent, above-average ball club, at least when healthy.

The point of all this is not to say that Oakland is some kind of fluke. That team’s outsized run differential is also a credit to excellent pitching, and it is not Oakland’s fault that it was assigned what turned out to be a favorable early schedule. Still, this analysis provides yet another reason to be careful when relying upon early-season run differentials. Before you get too enamored with a team’s production to date, take a close look at the opponents it has played. You may find that a team’s seemingly extraordinary results appear less so once you properly weight the skills of the opponents who allowed those results to come about.

Follow Jonathan on Twitter @bachlaw.

Jonathan writes a weekly column about the Brewers at Disciples of Uecker. He has also published at Baseball Prospectus.  


What Data Can Tell Us About Kansas City’s Home Run Struggles

After getting out-homered 5-0 by the Angels this weekend, the Royals sit at an underwhelming 20 home runs in 49 games, good for 30th in the league and less than half of the league average of 45. Early in the season, it can be tough to tell whether underperformance in a given outcome is due to random fluctuation or an actual decline in talent. Luckily, we have a litany of data at our disposal that can help answer this question.

Since Kansas City does not have a lineup stacked with power hitters, and playing in Kauffman Stadium makes hitting home runs more difficult than in many other stadiums, it’s preferable to compare current production to a projection system instead of to the league average in order to get a sense of the scale of the Royals’ power struggles. A projection already takes both the team’s lineup and ballpark factors into account, giving us a better comparison. In the preseason, Steamer projected that the Royals would hit 126 home runs in 2014. Prorating that projection over the 49 games Kansas City has played, the team was projected to have hit 38 home runs through this point in the season. Using the linear weights from the wOBA formula, we can calculate that had the Royals hit those 38 home runs, they would own a (league-average) .317 wOBA and a wRC of 202. Instead, Kansas City has a team wOBA of .296 and a 173 wRC. In essence, the 18 missing home runs have cost the team 29 runs in total, or 2.9 WAR.
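
Here is the proration and run-cost arithmetic as a short sketch. All inputs are the figures cited above; the 10-runs-per-win conversion used for the WAR figure is the usual rule of thumb, not something taken from Steamer.

```python
# Proration and run-cost arithmetic from the paragraph above.

GAMES_PLAYED, SEASON_GAMES = 49, 162
PROJECTED_SEASON_HR, ACTUAL_HR = 126, 20

expected_hr = round(PROJECTED_SEASON_HR * GAMES_PLAYED / SEASON_GAMES)  # 38
missing_hr = expected_hr - ACTUAL_HR                                    # 18

WRC_PROJECTED, WRC_ACTUAL = 202, 173   # via the wOBA linear weights
runs_lost = WRC_PROJECTED - WRC_ACTUAL # 29
war_lost = runs_lost / 10              # ~10 runs per win (rule of thumb)

print(expected_hr, missing_hr, runs_lost, war_lost)  # 38 18 29 2.9
```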

Things should change going forward, however. Steamer posts daily updated projections that change as more historical data becomes available (i.e. as more games are played). Taking into account the abysmal start by KC, Steamer has lowered its projected year-end home run total from 126 to 102. We already know that 18 of that 24-home-run difference is historical, so the drop in home runs projected over the remaining 113 games amounts to only 6. After factoring in playing-time adjustments, Steamer has now docked Kansas City 9 home runs that were expected at the beginning of the season. Although this represents a non-trivial drop in home run rates, it is significantly less severe than the pace the Royals have set so far this season.
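
The bookkeeping in that paragraph reduces to a few subtractions; a sketch, with the playing-time adjustment noted as reported rather than derived:

```python
# Rest-of-season arithmetic from the paragraph above.
preseason_hr, updated_hr = 126, 102
already_missing = 18                     # expected-minus-actual HR to date
total_drop = preseason_hr - updated_hr   # 24 HR
future_drop = total_drop - already_missing
print(future_drop)  # 6 fewer HR projected over the remaining 113 games
# Steamer's reported playing-time adjustments push the drop in expected
# future home runs to 9; that figure is reported, not derived here.
```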

This does make some sense. Steamer has years of major-league performance data shaping its projection for each of Kansas City’s starters, and centuries of baseball data on which to base aging curves. It seems pretty unreasonable to significantly change a projection based on less than two months of data from the current season, especially given that home run rates do not stabilize for a given player until about 300 plate appearances. Eric Hosmer has the most PA on the team at 218, so it will probably be another month before we have an idea of whether or not the Royals’ power outage is anything more than random fluctuation.

Another reason to expect that this trend will not be sustained is that much of it appears to be luck-based. Over the past five years, the Royals have posted a HR/FB rate of around 8%, and the lowest they managed over a full season in that time frame was 6.9% in 2010. So far this season, Kansas City has a HR/FB rate of 4.5%. In addition, the team has hit 7 more doubles than Steamer projected for the season so far, supporting the theory that the Royals have had more than their fair share of balls land just on the wrong side of the fence. This does not account for all 18 home runs that were projected but not hit, however. Bad luck only explains so much, and the majority of KC’s offensive woes should still be credited to poor hitting.