Archive for Player Analysis

2014’s Most Average Hitter

The premise of this article is a very simple one: which hitter has been the most average in 2014? Considering this question led me through a very simple process, and to a very sad answer (I urge you not to look at the links until the end because suspense). To the leaderboards we go!

Seeing as we’re looking for the most average hitter (not considering defense), and wRC+ is a hitting statistic designed to compare hitters against the average, it seems like a natural starting place. Considering only players with wRC+ between 95 and 105 gives us a list of 24 players.

Next, let’s look at wRC+’s partner in crime: wOBA. League average for wOBA is .316, so this round we’ll be restricting our list of 24 even further, only looking at hitters with wOBA between .310 and .320. Doing so cut our list (almost) in half! We are now left with only 13 players, progress!

Now that we’ve condensed the list based on production, it’s time to look at the composition of that production. Our average player should have a BB% of about 7.9% and a K% of about 19.8%. Filtering on those two rates leaves us with the three most average hitters in the league. One of these three is not a surprise. The other two are very sad surprises.

But we want 1 average player, not 3, so to narrow it down to the end, I have included another filter for ISO, because our most average hitter should hit for an average amount of power. This final filter leaves us with the single most average player in the major leagues, and fair warning, it will sadden you:

Evan Longoria: BB%: 8.8 / K%: 18.8 / ISO: .139 / BABIP: .287 / OBP: .324 / SLG: .390 / wOBA: .312 / wRC+: 102

League Average: BB%: 7.9 / K%: 19.8 / ISO: .140 / BABIP: .301 / OBP: .319 / SLG: .396 / wOBA: .316 / wRC+: 100
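
The filtering process above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard query actually used: the wRC+ and wOBA windows come from the text, but the article doesn’t say how close BB%, K% and ISO had to be to league average, so the tolerances below (`bb_tol`, `k_tol`, `iso_tol`) are assumptions, and `Hitter` and `is_average` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    wrc_plus: int
    woba: float
    bb_pct: float  # walk rate, in percent
    k_pct: float   # strikeout rate, in percent
    iso: float


# 2014 league-average line quoted above.
LEAGUE = Hitter("League Average", 100, 0.316, 7.9, 19.8, 0.140)


def is_average(h: Hitter, bb_tol: float = 1.5, k_tol: float = 1.5, iso_tol: float = 0.010) -> bool:
    """Apply the article's filters in order. The BB%, K% and ISO tolerances are ASSUMED values."""
    return (
        95 <= h.wrc_plus <= 105                        # step 1: wRC+ window from the text
        and 0.310 <= h.woba <= 0.320                   # step 2: wOBA window from the text
        and abs(h.bb_pct - LEAGUE.bb_pct) <= bb_tol    # step 3: walk rate near average
        and abs(h.k_pct - LEAGUE.k_pct) <= k_tol       #         strikeout rate near average
        and abs(h.iso - LEAGUE.iso) <= iso_tol         # step 4: power near average
    )


longoria = Hitter("Evan Longoria", 102, 0.312, 8.8, 18.8, 0.139)
print(is_average(longoria))  # True
```

On a full leaderboard the same test becomes `[h for h in hitters if is_average(h)]`; tightening the tolerances is what whittles the list down to a single name.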

There was a time when Longoria was to baseball what Mike Trout is today (well, maybe not quite on the same level). He came up in 2008 and was the star of the Rays in their surprising march to the World Series. He showed off 100% not-fake, seemingly superhuman powers. From 2008 to 2013, Longoria’s wRC+ was 15th in baseball, in a virtual tie with David Wright (who happens to be one of the other most average players). He was also the single most valuable position player by WAR (36.1) in that time. For the first 6 years of his career, Longoria was a model of offensive consistency.

2014 has been a different story though. I’m not the first to write about Longoria’s down year, so I’ll refer you to the works of Jeff Sullivan and James Krueger. The bottom line: Longoria’s bat speed is down, which is killing his power and his ability to hit inside fastballs. This can be seen in his power numbers: a .139 ISO is a far cry from his career ISO of .225 (for reference, David DeJesus has a career ISO of .140). Longoria’s only hitting 9.7% of his fly balls for home runs, compared to 15.5% for his career.

His power hasn’t fully disappeared, but it’s nowhere close to what it was. This sort of sharp power decline often points to an injury, à la Matt Kemp (.236 ISO in 2012, .125 in 2013 following shoulder surgery). Longoria is not expected to miss much time with his latest foot injury, and, as Krueger points out, Longoria himself has attributed these struggles to mechanical issues. However, if I were a betting man (or at least old enough to legally gamble in casinos), I would put money on the Rays’ third baseman undergoing some sort of procedure over the offseason.

The good news for the Rays is this: even as a league-average hitter, Longoria is still very valuable. Dave Cameron ranked him 9th in his trade value series, no doubt in large part because of his superb defense and very team-friendly contract. Projections have Longoria finishing 2014 with 4.0 WAR. If a win costs approximately $6 million, he’s worth about $24 million in 2014 while being paid only $7.5 million. Even if Longoria remains a league-average bat with excellent defense, he will be very underpaid and very valuable. Really goes to show how great that contract was, huh?
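
The back-of-the-envelope math in that paragraph is a simple linear model: WAR times the price of a win, minus salary. A minimal sketch using only the figures quoted above (the ~$6 million per win is the article’s approximation, not a precise market rate; `surplus_value` is a hypothetical helper):

```python
def surplus_value(war: float, dollars_per_win: float, salary: float) -> float:
    """Market value of a player's WAR minus his salary (all amounts in $ millions).

    Assumes value scales linearly with WAR at a single $/win rate, as in the text.
    """
    return war * dollars_per_win - salary


market_value = 4.0 * 6.0                # 4.0 projected WAR at ~$6M per win
surplus = surplus_value(4.0, 6.0, 7.5)  # minus Longoria's $7.5M 2014 salary
print(f"${market_value:.1f}M of value, ${surplus:.1f}M of surplus")
```

By this rough accounting the Rays get about $16.5 million of surplus value from Longoria in 2014 alone.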

Even more fortunate for Rays fans: given Longoria’s career history, this drop-off in offensive production likely isn’t representative of his true-talent level. While his days as a ~135 wRC+ hitter may be behind him, 119 games is not a huge sample and Longoria is still just 28. His production will likely climb back toward his career averages (Oliver projects a 126 wRC+ for next year, which definitely passes the sniff test). The fact remains: Evan Longoria, despite being the most average hitter in baseball, is still one of the most valuable. Now we’ll just have to see what happens to that other average-hitting third baseman.


The Chicago Cubs and the Terrible, Horrible, No Good, Very Bad Junior Lake Season

The future is bright on the North Side of Chicago. As Rany Jazayerli pointed out this week on Grantland, that future is especially bright in the batter’s box. With an absolute stud to build around, arguably the best bat in the minor leagues right now, a 24-year-old shortstop who has been worth over ten wins in his career so far, and seemingly every middle infield prospect in the world, even the biggest curmudgeon on TV seems enthusiastic about the Cubs’ future. However, had this article been written a year ago today, there would be another name Cubs fans would be screaming at me for leaving off: Mr. Junior Lake. While Lake was not the biggest prospect in the Cubs’ system, excitement was high when he burst onto the scene last July. It took him only five games to tie a major league record: no player has ever had more than the 12 hits he collected through his first five games.

After six games, Lake had an article of his own on FanGraphs, (deservedly) celebrating his hot start while (prophetically) pointing out a few red flags in his game (more on that later). In those six games he slashed .519/.536/.852 with a pair of home runs and three doubles. He kept hitting well for the rest of the season, finishing with a 110 wRC+, and despite below-average defense, he was still worth over a win to the Cubs in the 64 games in which he appeared. Coming into this season, Lake was a 23-year-old with decent upside who had Cubs fans excited for the future.

Cut to: August 13, 2014.

Lake has lost his regular spot in the lineup, and no one has deserved to lose a lineup spot more this season. Lake may yet bounce back to be a decent contributor for future iterations of the Cubs (again, more on that in a little bit), but his numbers this season are incredibly poor. Check out his ranks in these essential statistics in 2014 (all ranks are counted from the bottom, so 1st means the worst in the league):

Stat | Lake in 2014 | League Rank (min. 300 PA)
OBP | .246 | 2nd
wOBA | .269 | 11th
LD% | 15.6% | 8th
wRC+ | 66 | 10th
fWAR | -0.7 | 6th

Being near the bottom of one of these statistics is hardly a death knell (check out the killer lineup that could be created with the lowest line-drive rates in the league), but if you’re at the bottom of the barrel across the board, it’s fair to say you’ve Schruted up your season in a big way.

So what has caused Lake’s collapse at the plate? The answer is pretty simple; all that’s needed is a trip to the plate discipline graphs on this very website. Once again ranking from the bottom, check out some of Lake’s plate discipline statistics:

Stat | Lake in 2014 | League Rank (min. 300 PA)
K% | 33.4% | 3rd
BB% | 3.3% | 7th
BB | 10 | T-1st
BB/K | 0.10 | 1st
O-Swing% | 43.0% | 7th
Z-Swing% | 77.2% | 7th
Swing% | 58.1% | 4th
O-Contact% | 43.3% | 5th
Z-Contact% | 74.5% | 4th
Contact% | 61.6% | 3rd
F-Strike% | 69.5% | 1st
SwStr% | 21.7% | 1st
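
For readers less familiar with the swing and contact columns, here is a minimal sketch of how they are computed from pitch-by-pitch data, following FanGraphs’ glossary definitions (O-Swing% = swings at pitches outside the zone / pitches outside the zone; O-Contact% = contact on those swings / swings outside the zone; SwStr% = swinging strikes / all pitches). The `Pitch` record and the four-pitch sample are hypothetical, and in real data the zone call depends on the data provider.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    in_zone: bool  # pitch was inside the strike zone
    swung: bool    # batter offered at the pitch
    contact: bool  # bat touched the ball (foul or in play); only meaningful when swung is True


def pct(num: int, den: int) -> float:
    """Percentage, returning 0.0 instead of dividing by zero."""
    return 100.0 * num / den if den else 0.0


def plate_discipline(pitches: list[Pitch]) -> dict[str, float]:
    zone = [p for p in pitches if p.in_zone]
    outside = [p for p in pitches if not p.in_zone]
    swings = [p for p in pitches if p.swung]
    z_swings = [p for p in zone if p.swung]
    o_swings = [p for p in outside if p.swung]
    whiffs = [p for p in swings if not p.contact]  # swinging strikes
    return {
        "O-Swing%": pct(len(o_swings), len(outside)),
        "Z-Swing%": pct(len(z_swings), len(zone)),
        "Swing%": pct(len(swings), len(pitches)),
        "O-Contact%": pct(sum(p.contact for p in o_swings), len(o_swings)),
        "Z-Contact%": pct(sum(p.contact for p in z_swings), len(z_swings)),
        "Contact%": pct(sum(p.contact for p in swings), len(swings)),
        "SwStr%": pct(len(whiffs), len(pitches)),
    }


# Hypothetical sample: a contact swing in the zone, a take in the zone,
# a whiff outside the zone, and a take outside the zone.
sample = [
    Pitch(True, True, True),
    Pitch(True, False, False),
    Pitch(False, True, False),
    Pitch(False, False, False),
]
print(plate_discipline(sample))
```

Note that Swing% and SwStr% share the same denominator (all pitches), which is why a hitter who swings a lot and misses a lot, like Lake, sits near the bottom of both.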

As someone who is neither here nor there on the Cubs (aside from a long-term bet in which I have them making the playoffs before the Astros), I think it’s fair to look at this man who has somehow combined a Pablo Sandoval-esque lack of patience at the plate with a Chris Carter-esque lack of contact on those reckless swings, and simply say, “I’m not even mad, that’s amazing.”

That combination isn’t easy to pull off. Among the thirty hitters with the highest swing rates (min. 300 PA), only one other (Mike Zunino) has a contact rate in the sixties, and Zunino both swings less often and makes contact more often than Lake. Not to mention that Zunino’s defense has actually made him worth two wins for the Mariners this season, while only George Springer has committed more errors in the outfield than Lake in 2014.

Lake’s Brooks Baseball profile is maybe the most depressing thing in Chicago since this guy.

His performance against fastballs is described as, “an exceptionally aggressive approach at the plate (-0.19 c) with a disastrously high likelihood to swing and miss (30% whiff/swing),” against breaking pitches, “an exceptionally aggressive approach at the plate (-0.32 c) with a disastrously high likelihood to swing and miss (50% whiff/swing),” and against offspeed pitches, yeah, “an exceptionally aggressive approach at the plate (-0.65 c) with a disastrously high likelihood to swing and miss (55% whiff/swing).”

The man clearly needs a private lesson with Wade Boggs, although that might not even be enough. Interestingly enough, in his aforementioned piece on Lake last year, FanGraphs’ Bradley Woodrum spotted a couple of potential flaws that Lake would have to fix. Woodrum mentions Lake’s lack of plate discipline in the minor leagues, but he also touches on two other drawbacks to Lake’s game: his extremely “loud” swing, and his struggles with low sliders.

As for the “loud” swing, scouting players’ swings is not my specialty, but his swing does seem a little toned down in 2014. Check out the gif used in last year’s FanGraphs piece showing Lake’s bat going crazy as the pitch comes in. Now here’s his toned-down approach from the middle of this season. Well, toned down until he swings and misses, at least:

While his hands are still moving, they don’t seem to be doing so with the same reckless abandon as last year. That would seem to be a good sign, one that Lake is willing to tinker with his swing to get better results. As I said though, I am far from a scout, and would be curious to hear feedback on what others think about his swing.

His struggles with sliders have only been exacerbated in 2014: he has produced the fourth-lowest value against sliders this season, at -7.4 runs. And since pitch values are cumulative and the three hitters below him all have more than 100 more plate appearances than Lake in 2014, it’s fair to say Lake has struggled as much as any hitter in baseball against the slider this year.
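
The point about cumulative values can be made concrete by converting a run total into a rate, which is what FanGraphs’ per-100-pitch columns (such as wSL/C) do. In this minimal sketch only Lake’s -7.4 comes from the text; every slider count, and the comparison hitter, is a hypothetical placeholder chosen to show the effect.

```python
def runs_per_100(runs: float, pitches_seen: int) -> float:
    """Turn a cumulative pitch-type run value into a rate: runs per 100 pitches of that type."""
    if pitches_seen <= 0:
        raise ValueError("pitches_seen must be positive")
    return 100.0 * runs / pitches_seen


# -7.4 is Lake's slider value from the text; the slider counts are HYPOTHETICAL.
lake_rate = runs_per_100(-7.4, 370)
# A hypothetical hitter with 100+ more PA: a lower cumulative total, spread over more sliders.
other_rate = runs_per_100(-8.0, 520)

print(f"Lake: {lake_rate:.2f} runs/100 sliders, other: {other_rate:.2f}")
```

With these made-up counts the hitter with the worse cumulative total actually has the better rate, which is the reason a cumulative ranking can understate how badly a part-time player has fared.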

With Arismendy Alcantara having made a far smoother transition to the outfield (believe it or not, Lake was yet another middle infield prospect originally), and Jorge Soler and Kris Bryant due to be called up in the not-too-distant future, one has to wonder whether Lake’s time with the Cubs is through. The best option likely would have been to send him down to Triple-A about a month ago, since sitting on the bench in the major leagues will neither help his confidence nor give him the regular at-bats he needs to work on his swing.

There is some evidence that Lake far prefers left field to center field, as he has slashed .312/.333/.561 in his 44 games in left, but this looks more like noise than signal. However, given that the Cubs have no real motivation to win at this point, their best option may be to put Chris Coghlan, their current left fielder and a useful piece for a contender like the Oakland A’s, on waivers and see what they have with Lake in left field for the remainder of the season. With a month and a half of season left, the Cubs could see whether those splits really are statistically significant, and if they are, the Cubs could have yet another piece of their future lineup in place. And if not, there are plenty of reinforcements on the way.


Biogenesis Players: Then vs. Now

Watching all the noise Nelson Cruz has made this year, along with a recent report by Buster Olney stating, “The average distance of the fly balls pulled by Ryan Braun this season is down 42 feet, from 302 to 260…”, inspired me to look up the numbers for the players suspended in the Biogenesis case. The big four suspended were Alex Rodriguez, Ryan Braun, Nelson Cruz, and Jhonny Peralta. Other position players involved and suspended were Everth Cabrera, Jesus Montero, Francisco Cervelli, and Jordany Valdespin.

This article will focus on the big four, with the exception of A-Rod, who has been suspended all season. Obviously, this is a small sample, so take heed. I will be making a couple of assumptions. The main one is that these players had been using steroids for at least 3 years (2010-2012) before being caught and suspended. The other is that enough time has passed for the effects of the steroids to have worn off, so their bodies and abilities are back to a more natural state.

Ryan Braun | 2010 | 2011 | 2012 | 2014 | 2014 ZiPS (U)
HR/FB | 14.00% | 18.80% | 22.80% | 15.10% | n/a
SLG | .501 | .597 | .595 | .496 | .505
ISO | .197 | .265 | .276 | .211 | .231
wRC+ | 134 | 171 | 160 | 129 | 133
Off | 32.5 | 58.8 | 52.0 | 12.5 | 21.0
True Distance (ft) | 408.2 | 406.7 | 406.9 | 387.9 | n/a
Average Speed Off Bat (mph) | 105.1 | 104.7 | 104.2 | 102.1 | n/a

Nelson Cruz | 2010 | 2011 | 2012 | 2014 | 2014 ZiPS (U)
HR/FB | 15.20% | 18.70% | 13.10% | 20.00% | n/a
SLG | .576 | .509 | .460 | .513 | .505
ISO | .258 | .246 | .200 | .253 | .246
wRC+ | 147 | 116 | 105 | 130 | 127
Off | 26.6 | 7.7 | 0.8 | 14.9 | 18.5
True Distance (ft) | 405.2 | 411.6 | 418.6 | 398.9 | n/a
Average Speed Off Bat (mph) | 105.2 | 106.4 | 106.8 | 104.2 | n/a

Jhonny Peralta | 2010 | 2011 | 2012 | 2014 | 2014 ZiPS (U)
HR/FB | 7.50% | 10.80% | 8.30% | 12.50% | n/a
SLG | .392 | .478 | .384 | .447 | .441
ISO | .143 | .179 | .145 | .187 | .180
wRC+ | 91 | 122 | 85 | 122 | 120
Off | -12.7 | 11.2 | -13.8 | 8.4 | 10.3
True Distance (ft) | 392.5 | 388.4 | 391.9 | 397.0 | n/a
Average Speed Off Bat (mph) | 101.2 | 102.3 | 101.7 | 102.8 | n/a

The main thing that jumps out at you is that Cruz and Peralta are putting up some of the best numbers of their careers, while Braun is having his worst season of the four above. Peralta and Cruz both have their highest HR/FB rates of the four seasons shown; Peralta’s ISO is also his best of the four, and Cruz’s trails only his 2010 mark. Braun’s HR/FB rate and ISO, meanwhile, are well below his 2011 and 2012 levels. Looking at wRC+ and Off, Peralta is having his 4th-best season ever, Cruz his 2nd-best ever, and Braun the worst season of his career to date (with the possible exception of 2008).

Using ESPN’s hittrackeronline.com, I looked up each player’s average True Distance on home runs this year, as well as the average speed off the bat of those home runs. Ryan Braun has lost 3 mph, which has coincided with a loss of about 20 feet on his home runs. Nelson Cruz has lost about 2.6 mph and 20 feet off his home runs from his peak of the four years. Jhonny Peralta, on the other hand, is showing his best numbers this year.
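
Those distance and speed comparisons can be checked directly against the HitTracker rows in the tables. A small sketch; “peak” here means the best 2010-2012 value for each measure, taken separately, which is my reading of the text rather than a HitTracker definition.

```python
# (average true distance in ft, average speed off bat in mph), copied from the tables above.
hit_tracker = {
    "Ryan Braun":     {2010: (408.2, 105.1), 2011: (406.7, 104.7), 2012: (406.9, 104.2), 2014: (387.9, 102.1)},
    "Nelson Cruz":    {2010: (405.2, 105.2), 2011: (411.6, 106.4), 2012: (418.6, 106.8), 2014: (398.9, 104.2)},
    "Jhonny Peralta": {2010: (392.5, 101.2), 2011: (388.4, 102.3), 2012: (391.9, 101.7), 2014: (397.0, 102.8)},
}


def change_from_peak(seasons: dict[int, tuple[float, float]]) -> tuple[float, float]:
    """2014 value minus the best 2010-2012 value, for distance and exit speed separately."""
    earlier = [v for year, v in seasons.items() if year < 2014]
    best_distance = max(d for d, _ in earlier)
    best_speed = max(s for _, s in earlier)
    distance_2014, speed_2014 = seasons[2014]
    return round(distance_2014 - best_distance, 1), round(speed_2014 - best_speed, 1)


for name, seasons in hit_tracker.items():
    print(name, change_from_peak(seasons))
```

Braun comes out at -20.3 ft and -3.0 mph, Cruz at -19.7 ft and -2.6 mph, and Peralta is up on both measures.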

So what does all this mean? In summary, I believe the main takeaway is that each player who used steroids should be assessed on a case-by-case basis. Every player is affected differently, and we cannot group all steroid users together. Taking the statistics above as evidence, after their Biogenesis suspensions, 2 players are having among the best seasons of their careers while another is having his worst. In addition, the best all-around athlete and youngest of the 3 (and therefore closest to his prime), Ryan Braun, is the one struggling most! Whether it is the Hall of Fame vote or evaluating the future value of perceived steroid users, we can’t lump them all into the same group and assume they will automatically decline. Yes, using steroids is absolutely cheating; however, that doesn’t necessarily mean those players wouldn’t have been just as productive had they chosen legal supplements or nothing at all.


Xander Bogaerts’ Rookie Struggles

Coming into the year, the Boston Red Sox were riding high after the 2013 title in which they’d gone from worst to first. Just about everyone with a worthwhile opinion thought they’d at least contend for the playoffs again this year, and it wasn’t uncommon to see people picking them to repeat in 2014.

One of the few questions people did have about the team was how would they integrate their two young players, Jackie Bradley, Jr. and Xander Bogaerts, in their first full seasons as starters. Of these two players, Bradley was the one that people seemed most concerned about. This made sense, since he was less regarded as a prospect than Bogaerts (number 2 overall on most prospect top 100 lists). But while Bradley has been a complete zero with the stick (57 wRC+), his defense has carried him to 1.5 fWAR so far this season. Bogaerts, on the other hand, has a wRC+ of 82, which combined with mediocre defense has left him hovering around replacement level.

Now, there’s no doubt that people are disappointed by Bogaerts’ season, and they have every right to be. Bogaerts was hyped as the rare prospect with superior skills and a significant amount of polish, and he showed why when he played like a veteran down the stretch in last year’s playoffs. Nobody was expecting him to replicate Mike Trout’s rookie season, but a league-average regular was probably a reasonable expectation. Obviously, Bogaerts has underperformed relative to that standard.

Guys like Trout, Yasiel Puig and Manny Machado have essentially ruined the expectations we now put on players going through their first full seasons. Do you know how many batting-title-qualified rookies have had an OPS lower than Bogaerts’ current .650? 311! And of those 311, 283 were older than Bogaerts’ current 21 years of age. Bogaerts is struggling, but that’s what rookies do. There’s no greater jump in professional baseball than the one to the majors.

Bogaerts is actually hitting pretty well against fastballs and changeups. The crux of his issues this year has been breaking balls, and there’s really no way to sugarcoat it: he’s been terrible against any and all spin, hitting just .143 and slugging .167. Unfortunately, opposing pitchers have noticed, and Bogaerts has only seen more breaking balls as the season has progressed.

[Chart: frequency of breaking balls thrown to Bogaerts over the course of the 2014 season]

As the rate of breaking balls has gone up against Bogaerts, his numbers have gone down. The Red Sox shortstop was actually a well-above-average hitter heading into June (119 wRC+ in March/April, 149 wRC+ in May), but then everything fell apart. Bogaerts posted an almost unthinkable .426 OPS in June, less than half of the .897 OPS he posted the month before. He followed that up with a much improved but still terrible July (.595 OPS) and has continued to struggle in August.

Bogaerts’ struggles with breaking balls coincide with the part of his game that has perhaps regressed the most as his season has progressed: his plate discipline. After working 25 walks through the end of May, Bogaerts has been told to take his base just seven times since. A large part of that has been the decline of his ability to discriminate between a breaking ball thrown for a strike, and one thrown for a ball.

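[Chart: Bogaerts’ plate discipline by pitch type (fastballs, changeups, breaking balls) over the course of the 2014 season]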

As you can see in the graph above, Bogaerts has stayed fairly steady against fastballs and changeups, but his ability to recognize breaking balls has completely melted away. As for why this has happened, that’s difficult to say. Maybe Bogaerts has always struggled against breaking pitches. But the most likely answer is that he’s a rookie struggling to adjust against pitchers capable of taking advantage of his weaknesses. Nevertheless, it’s at least been a prolonged slump, and one that Red Sox fans have to hope isn’t a glimpse into continual struggles for their youngest player.

Then, putting aside things that we can actually measure, there’s the possibility that Bogaerts is simply in his own head right now. As a ballplayer, he’d probably tell you he’s trying to do too much. There’s certainly something to that side of the argument. It can’t be easy to fail so spectacularly after being hyped as the next face of one of the most prestigious franchises in the game.

There’s also an argument to be made that some responsibility for Bogaerts’ struggles can be laid at the feet of his manager, John Farrell. There have been rumors that Farrell was the person in the organization pushing hardest for the Red Sox to re-sign Stephen Drew, which they ultimately did in late May. Drew, who at that point had never played any position but shortstop in his big league career, forced Bogaerts over to third base, the position he played down the stretch of the 2013 title run. Bogaerts expressed some disappointment at the time, and an argument can be made that the team’s decision to re-sign Drew shook his confidence. Before Drew joined the lineup on June 2nd, Bogaerts was batting .296/.389/.427. Since then, he’s hit .169/.201/.279. You might say that those dates are arbitrary and the timing coincidental, and you can make of them what you wish. I will say that confidence is a huge part of succeeding in this game, and it should not be overlooked.

Overall, Bogaerts probably won’t look like he belongs back in AA forever, though we may have to wait until 2015 to see the player we were all hoping for. We got that player in the first couple of months of this season, but pitchers’ adjustments, along with Bogaerts’ failure to adjust to those adjustments, have torpedoed what was initially a very promising rookie year. That said, young players with Bogaerts’ pedigree and polish often turn into solid players at the very least, and I’m still as excited as ever about his career going forward. He’ll figure it out.


Does Troy Tulowitzki Suffer Without Carlos Gonzalez?

Does Troy Tulowitzki suffer without Carlos Gonzalez in the lineup?

Several weeks ago, much as with my last article on rookie first- and second-half splits, my attention was caught when a podcast personality contended that Troy Tulowitzki, before his most recent bout with the injury bug, had performed poorly because Carlos Gonzalez had been out of the lineup.

The pundit grabbed the lowest-hanging fruit he could find to create a narrative, and a dogmatic one at that, for why the Colorado Rockies slugger had not lived up to his pre-All-Star-break numbers.

******* *******’s (I’d prefer the article to be more about Tulowitzki and Gonzalez than about the podcast member) argument was that without Carlos Gonzalez in the lineup, pitchers could approach Tulowitzki without fear and throw him fewer strikes, and that is why his hitting has declined.

While this pundit surmised that Troy Tulowitzki’s performance declines when Carlos Gonzalez is out of the lineup, the numbers tell a much different story.

We will look at the more direct numbers in a moment, but the idea that Tulowitzki plays worse without Gonzalez is essentially lineup protection at a micro level. Numerous studies have debunked the idea of lineup protection, and, to my knowledge, none have proved its existence.

[Table: Tulowitzki’s batting statistics in games with and without Carlos Gonzalez in the lineup, 2010 to present]

The research looked at all games from 2010 (Carlos Gonzalez’s first complete season) to today.

The results paint a much lighter picture than the Guernica that ******* ******* painted.

In games where Tulo has played without Cargo, he has had a higher AVG, OBP, OPS, and BB%. One would expect Tulowitzki to perform about as usual without Carlos Gonzalez in the lineup, and it is hard to believe that Tulo actually plays better because Gonzalez is out. That leaves the usual explanation for out-of-the-ordinary performances over a small number of at-bats: noise.

These results should be used for descriptive, not predictive, purposes. Troy Tulowitzki has had only 479 plate appearances without Carlos Gonzalez, far from a large enough sample to be deemed reliable.
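
To put a rough number on how noisy 479 plate appearances are, here is a minimal sketch of a normal-approximation confidence interval for OBP. It treats every plate appearance as an independent on-base trial with the same probability, which is a simplification, and the .370 OBP plugged in is a hypothetical value, not Tulowitzki’s actual split.

```python
import math


def obp_interval(obp: float, pa: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a true OBP via the normal approximation to the binomial.

    Simplification: each plate appearance is an independent trial with the same on-base probability.
    """
    se = math.sqrt(obp * (1.0 - obp) / pa)
    return obp - z * se, obp + z * se


# 479 PA is from the text; the .370 OBP is a HYPOTHETICAL placeholder.
low, high = obp_interval(0.370, 479)
print(f"{low:.3f} to {high:.3f}")
```

With those inputs the interval runs from about .327 to .413, a spread of more than 80 points of OBP, which is why a split of this size says little about true talent.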

That said, given Tulowitzki’s recent remarks, it seems more likely than not that we will soon get a large enough sample of Tulo in another uniform to see whether this trend continues.

While Tulo has played worse of late and is now hurt, we might expect that it is because he was unlikely to sustain his first-half performance, not because of Cargo’s presence or absence in the lineup. Over the first half of the season, Tulowitzki posted the 15th-best OPS in any half-season since 2010.

Tulo’s latest play suggests regression to the mean, and while we are powerless to know exactly why regression happens, some pundits claim to know the reason (e.g., Tulo plays worse without Carlos Gonzalez), when really their specious statement is noise with a coat of eloquent words painted on it.

When the next “expert” tells you that Tulo has performed poorly because “he wants out of Colorado” or “he wants to be traded,” you’ll know to be more skeptical and not passively agree.

If he gets healthy at some point this season, we should expect Tulowitzki to perform close to his projections in all areas for the rest of the year, and it will be with or without Carlos Gonzalez, not because of him.


Manny Machado and Selective Aggression

In the offseason I wrote about how Manny Machado’s 2013 second-half struggles lay in his inability to select pitches he could hit. Essentially, his innate ability to get bat to ball, combined with a poor sense of which pitches he could drive, led him to swing at and make contact with pitches he could not barrel up. This led to an increase in fly balls (especially infield fly balls), which indicate poor contact. Machado was swinging at junk, and this caused his batting average and extra-base hit production to plummet in the second half of 2013.

Fast forward to now: Machado has hit .301/.340/.494 over the last two months after a cold start coming off knee surgery. He has a .193 ISO during that period, along with a 24.3% line drive rate. He has certainly been barreling up the ball for the last two months and has been one of the only Orioles hitters doing so since the All-Star break. So I wanted to see whether Machado has changed his approach in any meaningful way and learned to be selectively aggressive. That is, while he is never going to be an on-base machine, he can be patient enough to wait for pitches he knows he can hit well rather than making contact on pitches he cannot.

Interestingly, his basic 2014 plate discipline numbers reveal little to no positive change from 2013. He is swinging more at pitches both in and out of the zone: he has an O-Swing% of 32.8%, a Z-Swing% of 68.6%, and an overall Swing% of 49.9%, all two to three percentage points higher than last year. He is also making less contact on pitches in and out of the zone: his O-Contact% is 63.5% and his Z-Contact% is 85.4%, with an overall Contact% of 77.9%, all three to four points lower than last season. If Manny were swinging at and barreling up better pitches, his overall swing rate might stay the same, but he should not be swinging at pitches out of the zone more often. His contact rate would also be higher, especially on pitches in the zone, which it is not. On top of that, his swinging strike rate is up and the share of pitches he sees in the zone is down. So if those numbers don’t explain why Manny has been more successful this year, then either there is another reason or it has simply been luck so far.

A quick look at some other figures tells a slightly different story. His walk rate is nearly two points higher this season, and his pitches per plate appearance are up from 3.53 P/PA to 3.68 P/PA. While that may not seem like a huge increase, it adds up: Manny had 710 plate appearances last season, and at this season’s rate he would have seen 106.5 more pitches. So his rising strikeout rate is not surprising, since he is simply seeing more pitches. He also is not striking out at an absurdly high rate to begin with, only slightly above the 2014 league average. Basically, Machado is seeing more pitches this year than last, and this has led to both a higher walk rate and a higher strikeout rate. This, however, does not quite prove the theory of selective aggression I am proposing.
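
The 106.5-pitch figure is just the change in P/PA multiplied by last season’s plate appearances. A minimal sketch (`extra_pitches` is a hypothetical helper name; the inputs are the numbers quoted above):

```python
def extra_pitches(pa: int, old_rate: float, new_rate: float) -> float:
    """Additional pitches seen over `pa` plate appearances if P/PA rises from old_rate to new_rate."""
    return pa * (new_rate - old_rate)


# 710 PA in 2013; P/PA of 3.53 in 2013 versus 3.68 in 2014.
print(round(extra_pitches(710, 3.53, 3.68), 1))  # 106.5
```

Spread over a season, that works out to roughly one extra pitch every six or seven plate appearances, enough to nudge both walk and strikeout rates upward.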

Heat maps let us put this theory to a real test. Manny may be swinging slightly more and making a little less contact, but what matters here is whether he is swinging at good pitches for him to hit, which his recent numbers and line drive rate suggest he is. Below are two heat maps: the first shows Manny’s swing rates by pitch location for the whole 2013 season, and the second shows his 2014 swing rates by pitch location to date.
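
For anyone curious how a swing-rate heat map is built, here is a minimal sketch: bin each pitch by location into a grid, then divide swings by pitches in each cell. The zone boundaries are ASSUMPTIONS (a fixed 17-inch-wide zone, about ±0.71 ft, from 1.5 to 3.5 ft high, split into thirds, with one outside bin on each side), the location fields are named after PITCHf/x’s `px`/`pz`, and published heat maps may use different zones or smoothing. Swapping the numerator (contact on swings, or hits per at-bat) gives the contact-rate and batting-average maps.

```python
from collections import defaultdict
from collections.abc import Iterable

# ASSUMED bin edges in feet: horizontal (px) and vertical (pz) zone thirds.
X_EDGES = [-0.71, -0.24, 0.24, 0.71]
Z_EDGES = [1.5, 2.17, 2.83, 3.5]


def bin_index(value: float, edges: list[float]) -> int:
    """0 = below/left of the zone, 1-3 = zone thirds, 4 = above/right of the zone."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def swing_rate_grid(pitches: Iterable[tuple[float, float, bool]]) -> dict[tuple[int, int], float]:
    """pitches: (px, pz, swung) tuples. Returns {(column, row): swing %} for each occupied cell."""
    seen: dict[tuple[int, int], int] = defaultdict(int)
    swings: dict[tuple[int, int], int] = defaultdict(int)
    for px, pz, swung in pitches:
        cell = (bin_index(px, X_EDGES), bin_index(pz, Z_EDGES))
        seen[cell] += 1
        swings[cell] += int(swung)
    return {cell: 100.0 * swings[cell] / seen[cell] for cell in seen}


# Hypothetical pitches: two down the middle (one swung at), one low and far outside (taken).
grid = swing_rate_grid([(0.0, 2.5, True), (0.0, 2.5, False), (1.2, 1.0, False)])
print(grid)
```

Comparing two such grids cell by cell, 2013 against 2014, is essentially what the side-by-side heat maps do visually.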

[Heat maps: Manny Machado’s swing rate by pitch location, 2013 and 2014 to date]

Of note, so far in 2014 Manny Machado has swung much less at pitches out of the zone that are down and away, down and in, and up and in. He has also focused on swinging at pitches that are middle-in, middle, down, middle-up, and even up and away. This allows him to extend his arms and drive the ball, especially the other way. In 2013, Manny focused much more on pitches in the middle of the plate and up and in. Swings at pitches up and in especially, along with the other problem areas, zapped his ability to make solid contact and sent his 2013 offensive production into a nosedive.

Next up in what is increasingly turning into a slideshow are contact rate heat maps for 2013 (first picture) and 2014 to date (second picture).

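[Heat map: Manny Machado’s contact rate by pitch location, 2013]

[Heat map: Manny Machado’s contact rate by pitch location, 2014 to date]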

Manny seemed to make much more contact on pitches inside and outside the strike zone in 2013. In particular, he made contact at a much higher rate on pitches up and in, down and in, and up and away. These are pitches Manny simply cannot drive well, so when he makes contact with them they are most likely to become outs, which in turn led to his struggles in the second half of 2013. In 2014, his contact rates are much more concentrated within the strike zone, specifically middle-in, down, and up. He is still making plenty of contact on pitches too far up and away and down and away, but at a much lower rate than he was in 2013. This minimizes the bad contact and allows him to see more pitches he can hit hard.

To bring it all home, below are two more heat maps. These heat maps are Manny’s batting average by pitch location again for the entirety of the 2013 season and the 2014 season to date. Again, the first one will be 2013 and the second one will be 2014.

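[Heat map: Manny Machado’s batting average by pitch location, 2013]

[Heat map: Manny Machado’s batting average by pitch location, 2014 to date]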

These heat maps reveal more about how Manny’s approach at the plate has transformed. In 2013, his averages were decently high all around the plate and even out of the zone. However, that is not necessarily a great thing: Manny was swinging at pitches wherever they were, and his average was not great in many of those locations. Fast forward to 2014, and his hitting zones are much more concentrated and come with higher batting averages. In the middle-in section, Manny is hitting .242 and swinging 82% of the time, tied for the highest swing rate of any spot on his 2014 swing map. He is swinging at and driving pitches that are middle-in, up, and down. He can drive up-and-away pitches by extending his arms, and he can keep his arms tight to either pull the middle pitches or inside-out them to center or right field. These are the locations Manny can hit, and hit hard, and he is swinging at those pitches more than he did in 2013.

The adjustments to Machado’s plate discipline give him a selective aggression that makes him a better hitter. As stated before, he is unlikely to become an on-base machine, but Manny has shown that he can hit doubles and home runs. If he maintains a higher average and his selectively aggressive eye at the plate, he can continue to be an All-Star-level player for the Orioles. Time will tell how pitchers adjust to him and how he adjusts back, but this year’s developments provide a great picture of Machado’s ability to adapt and thrive.

This post was originally posted to www.Orioles-Nation.com on 8/8/2014


Using xBABIP to Examine the Offensive End of the Mets’ Shortstop Dilemma

It’s no secret that a vast majority of Mets fans want Wilmer Flores to be playing shortstop every day. It’s also no secret that manager Terry Collins has some strange infatuation with Ruben Tejada, opting again and again to give him starts at shortstop.

Although Collins hasn’t given the media any clear reasoning as to why this is, there are a few reasons we can speculate. The biggest one is defense — Ruben Tejada has made major strides at shortstop this season, posting the highest DRS of his career. Flores, on the other hand, is a second baseman, and even his defense at second is questionable — he really profiles more as a corner infielder. However, with the other three infield positions being blocked by Daniel Murphy, David Wright, and the new-and-improved Lucas Duda, Ruben Tejada is the odd man out.

The other side of the coin is the one I’m going to be focusing on: offense. When Tejada started getting regular playing time as a 21-year-old in 2011, he showed some legitimate offensive potential, hitting line drives at an extremely impressive 28.1% rate (which would have ranked second among qualified batters), good for .287/.345/.345 in 877 PAs between 2011 and 2012. Then, in 2013, he came to spring training out of shape, hit .202, got sent down, got hurt a couple of times, and basically threw yet another monkey wrench into the Mets’ rebuild. The job became his to lose in 2014, and he’s hit a measly .228/.348/.280, with the OBP inflated by the number of intentional walks he has received in the 8 hole. His 0.4 fWAR this season cancels out his -0.4 last season, making him a perfect replacement-level player.

Meanwhile, Wilmer Flores has been a top offensive prospect in the Mets system since he was signed out of Venezuela as a 16-year-old in 2007. His numbers finally started to reflect his talent in 2012, when he hit .300/.349/.479 between high A and AA. In 2013 he exploded in AAA, and over the past two seasons he has hit .321/.360/.543 with 28 home runs and 47 doubles in exactly 162 games. Sure, he plays in Vegas, one of the most hitter-friendly parks in AAA, but these are still numbers that demand attention, attention that he hasn’t yet seemed to receive from Terry Collins. Despite Tejada’s offensive struggles, he has started 86 games at short this season, compared with Flores’ 20. One reason some Mets fans point to is that Flores has yet to produce at the major league level, hitting only .220/.254/.304 in his 201 big league plate appearances. But is that slash line an accurate reflection of his talent? And, for that matter, is Tejada’s?

For this mini-evaluation, we’ll use slash12’s xBABIP formula. It’s not a perfect system, but it will give us a good estimate of what these players’ slash lines should look like (or at least their average and OBP).

After inserting Ruben Tejada’s batted ball profile, we get an xBABIP for 2014 of .329, much higher than his actual BABIP of .288. We can then plug that back into the BABIP formula to determine how many hits he theoretically should have. Since the formula is (H-HR)/(AB-HR-K+SF), we can plug in everything except hits to get (H-2)/(289-2-65+0)=.329, simplify that to (H-2)/222=.329, and multiply both sides by 222 to get H-2≈73. Adding back his two home runs, we can conclude that Ruben Tejada should have about 75 hits on the year, instead of the 66 he has. That would make his batting average .260 and, holding his walks and hit-by-pitches fixed, his OBP about .374 (although keep in mind that this is inflated by the 10 intentional walks he’s received while hitting eighth in the order; if we removed those, his OBP would drop to about .355).

Now, doing the same for Wilmer Flores is slightly trickier, as we have a much smaller sample of batted ball data to work with. In the interest of accuracy, we’ll use his career profile, so we can at least get a sample of 201 PAs instead of his 100 this year. Plugging his batted ball profile into the xBABIP calculator, we get a result of .333, compared to his actual career BABIP of .268. Doing the same backwards math we did with Tejada brings his expected career batting average up to .272 and his career OBP up to .304.

Now, these are only two stats, and they only tell us so much. Flores seems to be a better hitter, but his career 4.5% BB rate falls well short of Tejada’s. There isn’t a formula out there for expected slugging percentage, at least as far as I know, so we can’t really determine what that would be (and, consequently, what their OPS would be). We could assume the same ISO, which would not be entirely accurate, but it would give an expected SLG/OPS of .311/.686 for Tejada and .355/.659 for Flores. Still, I think it’s clear, both from my biased perspective as a Mets fan and my objective perspective as a baseball fan, that Flores has the brighter future offensively, but it’s up to the Mets to decide how to capitalize on it.
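The backwards BABIP arithmetic and the constant-ISO assumption can be sketched in a few lines of Python. This is a minimal sketch, not slash12’s calculator: it takes an xBABIP value as given, and the function names are hypothetical. Note that inverting the formula yields hits on balls in play, so the home runs have to be added back to get total hits.

```python
def expected_hits(xbabip: float, ab: int, hr: int, k: int, sf: int) -> float:
    """Solve BABIP = (H - HR) / (AB - HR - K + SF) for H, given a BABIP value."""
    balls_in_play = ab - hr - k + sf
    # Hits on balls in play, plus the home runs that BABIP excludes
    return xbabip * balls_in_play + hr


def expected_avg_slg(expected_h: float, ab: int, actual_h: int, actual_slg: float):
    """Expected AVG and SLG, assuming isolated power (SLG - AVG) is unchanged."""
    iso = actual_slg - actual_h / ab
    avg = round(expected_h) / ab  # round to a whole number of hits
    return avg, avg + iso


# Ruben Tejada's 2014 inputs as quoted in the text
h = expected_hits(0.329, ab=289, hr=2, k=65, sf=0)
avg, slg = expected_avg_slg(h, ab=289, actual_h=66, actual_slg=0.280)
print(f"expected hits {h:.1f}, AVG {avg:.3f}, SLG {slg:.3f}")
```

Holding ISO fixed is the simplification the text itself flags as not entirely accurate; the sketch only makes the arithmetic explicit.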


Foundations of Batting Analysis: Part 4 — Storytelling with Context

Examining the foundations of batting analysis began in Part 1 with an historical examination of the earliest statistics designed to examine the performance of batters. In Part 2, I presented a new method for calculating basic averages reflecting the “real and indisputable” rate at which batters reached base. In Part 3, I examined the development of run estimation techniques over the last century, culminating with the linear weights system. I will employ that system now as I reconstruct run estimation from the bottom up.

We use statistics in baseball to tell stories. Statistics describe the action of the game or the performance of players over a period of time. Statistics inform us of how much value a player provided or how much skill a player showed in comparison to other players. To tell such stories successfully, we must understand how the statistics we use are constructed and what they actually represent.

A single, for instance, seems simple enough at first glance. However, there are details in its definition that we sometimes gloss over. In general, a single is any event in which the batter puts the ball into play without causing an out, while showing an accepted form of batting effectiveness (reaching on a hit), and ultimately advancing to first base due to the primary action of the event (before any secondary fielding errors or advancement on throws to other bases). Though this is specific in many regards, it is still quite a broad definition for a batting event. The event could occur in any inning, following any number of outs, and with any number of runners on the bases. The ball could be hit in any direction, with any speed and trajectory, and result in any number of baserunners advancing any number of bases.

These kinds of details form the contextual backdrop that characterizes all batting events. When we construct a statistic to evaluate these events, we choose what level of contextual detail we want to consider. These choices define our analysis and are critical in developing the story we want to tell. For instance, most statistics built to measure batting effectiveness—from the simple counting statistics like hits and walks, to advanced run estimators like Batter Runs or weighted On Base Average (wOBA)—are constructed to be independent of the “situational context” in which the events occur. That is, it doesn’t matter when during the game a hit is made or if there are any outs or any runners on the bases at the time it happens. As George Lindsey noted in 1963, “the measure of the batting effectiveness of an individual should not depend on the situations that faced him when he came to the plate.”

Situational context is the most commonly cited form of contextual detail. When a statistic is described as “context neutral,” the context being removed is very often the one describing the out/base state before and after the event and the inning in which it occurred. However, there are other contextual details that characterize the circumstances and conditions in which batting events occur that also tend to be removed from consideration when analyzing their value. Historically, where the ball was hit, as well as the speed and trajectory with which it reached that location, have also not been considered when judging the effectiveness of batters. This has partly been due to the complexity of tracking such things, especially in the century of baseball recordkeeping before the advent of computers. Also, most historical batting analyses focus exclusively on the outcome for the batter, independent of the effect on other baserunners. If the batter hits the ball four feet or 400 feet but still only reaches first base, there is no difference in the personal outcome that he achieved.

If the value of a hit was limited to only how far the batter advances, then there would be no need to consider the “batted-ball context,” but as F.C. Lane observed in 1916, part of the value of making a hit is in the effect on the “runner who may already be upon the bases.” By removing the batted-ball context when considering types of events in which the ball is put into play, we’re assuming that a four-foot single and 400-foot single have the same general effect on other baserunners. For some analyses, this level of contextual detail describing an event may be irrelevant or insignificant, but for others—particularly when estimating run production—such a level of detail is paramount.

Let’s employ the linear weights method for estimating run production, but allow the estimation to vary from one completely independent of any contextual detail to one as detailed as we can make it. In this way, we’ll be able to observe how various details impact our valuation of events. Also, in situations where we are only given a limited amount of information about batting events, it will allow us to make cursory estimations of how much they caused their team’s run expectancy to change.

To begin, let’s define the run-scoring environment for 2013.[i] While we have focused on context concerning how events transpired on the field, the run scoring environment is another kind of contextual detail that characterizes how we evaluate those events. The exact same event in 2013 may not have caused the same change in run expectancy as it would have in 2000 when runs were scored at a different rate. We will define the run scoring environment for 2013 as the average number of runs that scored in an inning following a plate appearance in each of the 24 out/base states – a 2013-specific form of George Lindsey’s run expectancy matrix:

Base State 0 OUT 1 OUT 2 OUT
0   0.47   0.24   0.09
1   0.82   0.50   0.21
2   1.09   0.62   0.30
3   1.30   0.92   0.34
1-2   1.39   0.84   0.41
1-3   1.80   1.11   0.46
2-3   2.00   1.39   0.56
1-2-3   2.21   1.57   0.71

While we will focus on examining various levels of contextual detail concerning the events themselves, the run-scoring environment can also be varied based on contextual details concerning the scoring of runs. The matrix we will employ, as defined by Lindsey, reflects the average number of runs scored across the entire league. If we wanted, we could differentiate environments by league or park, among other things, to try and reflect a more specific estimate of the number of runs produced. As the work I’m going to present is meant to provide a general framework for run estimation, and these adjustments are not trivial, I’m going to stick with the basic model provided by Lindsey.

With Lindsey’s tool, we can define a pair of statistics for general analysis of run production. Expected Runs (xR) reflect the estimated change in a team’s run expectancy caused by a batter’s plate appearances independent of the situational context in which they occur. A batter’s expected Run Average (xRA) is the rate per plate appearance at which he produces xR.

xRA = Expected Runs / Plate Appearances = xR / PA

xR and xRA create a framework for estimating situation-neutral run production. Based on the contextual specificity that is used to describe the action of a plate appearance, xR and xRA will yield various estimations. The base case for calculating expected runs, xR0, is calculated independently of any contextual detail, considering only that a plate appearance occurred. By definition, an average plate appearance will cause no change in a team’s run expectancy. Consequently, no matter a player’s total number of plate appearances, his xR0 and, by extension, his xRA0, will be 0.0.

This is completely uninformative of course, as base cases often are. So let’s add our first layer of contextual specificity by noting whether an out occurred due to the action of the plate appearance. This is the most significant contextual detail that we consider when evaluating batting events – it is the only factor that determines whether a plate appearance increases or decreases a team’s run expectancy. In 2013, 67.5 percent of all plate appearances resulted in at least one out occurring. On average, those events caused a team’s run expectancy to decrease by .252 runs. The 32.5 percent of plate appearances in which an out did not occur caused a team’s run expectancy to increase by .524 runs on average. We’ll define xR1 as the estimated change in run expectancy based exclusively on whether the batter reached base without causing an out; xRA1 is the rate at which a batter produced xR1 per plate appearance.
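A minimal Python sketch of xR1 and xRA1, using the two 2013 run values quoted above (the constant and function names are hypothetical). The last lines check the property noted earlier: a league-average mix of 32.5 percent reaches and 67.5 percent outs nets out to roughly zero runs, since .325 × .524 − .675 × .252 ≈ 0.

```python
# Average 2013 run values quoted in the text (constant names are hypothetical)
REACH_VALUE = 0.524   # plate appearance in which no out occurred
OUT_VALUE = -0.252    # plate appearance in which at least one out occurred


def xr1(reaches: int, outs: int) -> float:
    """Expected runs using only whether each plate appearance avoided an out."""
    return reaches * REACH_VALUE + outs * OUT_VALUE


def xra1(reaches: int, outs: int) -> float:
    """xR1 per plate appearance."""
    return xr1(reaches, outs) / (reaches + outs)


# A league-average mix of outcomes should net out to roughly zero runs
league_avg = 0.325 * REACH_VALUE + 0.675 * OUT_VALUE
print(f"{league_avg:+.4f}")
```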

You’ll notice that the components that construct xRA1 can only take on two values—.524 and -.252—in the same way that the components that construct effective On Base Average (eOBA) (as defined in Part 2) can only take on two values—1 and 0. These statistics—xRA1 and eOBA—have a direct linear correlation:

xRA1 = .524 × eOBA - .252 × (1 - eOBA) = .776 × eOBA - .252

In effect, xRA1 is a weighted version of eOBA, incorporating the same contextual details but on a different scale. This estimation provides us with an association between reaching base safely and producing runs. However, the lack of detail would suggest that all players who reach base at the same rate produce the same value, which is oversimplified. It’s why you wouldn’t just use eOBA, or eBA, or any other basic statistic that reflects the rate at which a batter reaches base, when judging the performance of a batter. Let’s add another layer of contextual detail to account for the different kinds of value a batter provides when he reaches base.
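Because every plate appearance either avoids an out or does not, the share of plate appearances ending in an out is 1 − eOBA, which is what makes xRA1 a straight-line function of eOBA with slope .524 + .252 = .776 and intercept −.252. The sketch below (function name hypothetical; it treats eOBA simply as no-out plate appearances divided by all plate appearances) checks that identity for a made-up batter.

```python
REACH_VALUE = 0.524
OUT_VALUE = -0.252


def xra1_from_eoba(eoba: float) -> float:
    # .524 * eOBA - .252 * (1 - eOBA) simplifies to .776 * eOBA - .252
    return (REACH_VALUE - OUT_VALUE) * eoba + OUT_VALUE


# Hypothetical batter: 150 plate appearances without an out, out of 450
reaches, pa = 150, 450
direct = (reaches * REACH_VALUE + (pa - reaches) * OUT_VALUE) / pa
assert abs(direct - xra1_from_eoba(reaches / pa)) < 1e-12
print(f"xRA1 = {direct:.4f}")
```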

xR2 will represent the estimated change in run expectancy based on whether the batter safely reached base and the number of bases to which he advanced due to the action of the plate appearance; xRA2 will be the rate at which a batter produces xR2 per plate appearance. While xR1 and xRA1 were built with just two components to estimate run production, xR2 and xRA2 require five components: one to define the value of an out, and four to define the value of safely reaching each base.

In 2013, a batter safely reaching first base during a plate appearance caused an average increase of .389 runs to his team’s run expectancy. Reaching second base was worth .748 runs, third base was worth 1.026 runs, and reaching home was worth 1.377 runs on average. Where xRA1 provided a run estimation analog to eOBA, xRA2 is built with very similar components to effective Total Bases Average (eTBA), though the two are not quite directly linearly correlated.

The reason xRA2 and eTBA do not correlate with each other perfectly, like xRA1 and eOBA, is because the way in which a batter advances bases is significant in determining how valuable his plate appearances were. Consider two players that each had two plate appearances: Player A hit a home run and made an out, Player B reached second base twice. Their eTBA would be identical—2.000—as they each reached four bases in two plate appearances. However, from the run values associated with reaching those bases, Player A would record 1.125 xR2 from his home run and out, while Player B would record 1.496 xR2 from the two plate appearances leaving him on second base. Consequently, Player A would have produced a lower xRA2 (.5625) than Player B (.7480), despite their having the same eTBA. These effects tend to average out over a large enough sample of plate appearances, but they will still cause variations in xRA2 among players with the same eTBA.
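The Player A and Player B comparison can be reproduced with the five 2013 run values for xR2. In this sketch (names and encoding hypothetical) each plate appearance is recorded as the base the batter reached, with 0 standing for an out, and eTBA is simplified to bases reached per plate appearance.

```python
# 2013 average run values from the text, keyed by the base the batter reached;
# 0 stands for a plate appearance that ended in an out (hypothetical encoding).
BASE_VALUES = {0: -0.252, 1: 0.389, 2: 0.748, 3: 1.026, 4: 1.377}


def xr2(outcomes: list[int]) -> float:
    return sum(BASE_VALUES[b] for b in outcomes)


def xra2(outcomes: list[int]) -> float:
    return xr2(outcomes) / len(outcomes)


def etba(outcomes: list[int]) -> float:
    # Simplified: bases reached per plate appearance
    return sum(outcomes) / len(outcomes)


player_a = [4, 0]  # home run, then an out
player_b = [2, 2]  # reached second base twice
for name, pas in (("A", player_a), ("B", player_b)):
    print(name, f"eTBA {etba(pas):.3f}", f"xR2 {xr2(pas):.3f}", f"xRA2 {xra2(pas):.4f}")
```

Both players share an eTBA of 2.000, but the run values attached to each base are not proportional to the number of bases, which is why their xRA2 figures diverge.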

As stated in Part 2, the two main objectives of batters are to not cause an out and to advance as many bases as possible. If the only value that batters produced came from accomplishing these objectives, then we would be done – xR2 and xRA2 would reflect the perfect estimations of situation-neutral run production. As I hope is clear, though, the value of a batting event is dependent not only on the outcome for the batter but on the impact the event had on all other runners on base at the time it occurred. Different types of events that result in the batter reaching the same base can have different average effects on other baserunners. For instance, a single and a walk both leave the batter on first base, but the former creates the opportunity for baserunners to advance further on average than the latter. To address this, the next layer of contextual detail will bring the official scorer into the fray. xR3 will represent the estimated change in run expectancy produced during a batter’s plate appearance based on:

(1)    whether the batter safely reached base,

(2)    the number of bases, if any, to which the batter advanced due to the action of the plate appearance, and

(3)    the type of event, as defined by the official scorer, that caused him to reach base or make an out.

xRA3 will, as always, be the rate at which a batter produces xR3 per plate appearance.

Each of the run estimators that were examined in Part 3, from F.C. Lane’s methods through wOBA, is a subset of this level of xR. Expected runs incorporate estimations of the value produced during every event in which the batter was involved, including those which may be considered “unskilled.” The run estimators examined in Part 3 consider only those events that reflected a batter’s “effectiveness,” and either disregard the “ineffective” events or treat them as failures. xR3 provides the total value produced by a batter, independent of the effectiveness he showed while producing it, based solely on how the official scorer defines the events. Consequently, some events, like strikeouts, sacrifice bunts, reaches on catcher’s interference, and failed fielder’s choices, among other more obscure occurrences, are examined independently in xR3. From the two components of xR1 and the five of xR2, we build xR3 with 18 components: five types of outs and 13 types of reaches.

To help illustrate how xR has progressed from level to level, here is a chart reflecting the run values for 2013 as estimated by xR based on the contextual detail provided thus far.

xR Progression

Beyond any consideration of skilled or unskilled production, xR3 is the level at which most run estimators are constructed. It incorporates events that are well defined in the Official Rules of the game, and have been for at least the last few decades, and in some cases for over a century. While we still define most of a batter’s production by his accomplishing these events, we live in an era where we can differentiate between events on the field in more specific ways. Not all singles are identical events. We weaken our estimation of run production if we don’t account for the different kinds of singles, among other events, that can occur. xR3 brought the official scorer into action; xR4 will do the same with the stat stringer.

While the scorer is concerned with the result of an event, a stringer pays attention to the action in between the results. Stringers chart the type, speed, and location of every pitch, and note the batted ball type (bunt, groundball, line drive, flyball, pop up)[ii] and the location to which the ball travels when put into play. While we don’t have this data as far back in time as we have result data, we do have decades’ worth of information concerning these details. By differentiating events based on these details, we will begin to unravel the “batted-ball context.” Ideally, we would know every detail of the flight of the ball, and use this to group together the most similar possible types of events for comparison.[iii] At present, we’re limited to what the scorers and stringers provide, but that’s still quite a lot of information.

xR4 will represent the estimated change in run expectancy produced during a batter’s plate appearance based on:

(1)    whether the batter safely reached base,

(2)    the number of bases, if any, to which the batter advanced due to the action of the plate appearance,

(3)    the type of event, as defined by the official scorer, that caused him to reach base or make an out,

(4)    the type of batted ball, if there was one, as defined by the stat stringer, that resulted from the plate appearance,

(5)    the direction in which the ball travelled, and

(6)    whether the ball was fielded in the infield or outfield.

xRA4 will be the rate at which a batter produces xR4 per plate appearance.

There are 18 components in xR3, which describe the assorted types of general events a batter can create. When you add in these details concerning the batted-ball context, the number of components increases to 145 for xR4. With such specific details being considered, we can no longer rely on a single season of data to accurately describe the average situation in which each type of event occurs; the sample sizes for some events are just too small. To address this, evaluating events for xR4 requires two steps. The first is to gather a large sample of each event to build an accurate picture of its relative frequency in each out/base state. I’ve done this using the ten seasons ending with the season in which the estimations are being made. Once this step is complete, the run-scoring environment of the season being analyzed is applied to these frequencies, in the same way it is when looking at single-season frequencies for basic events.
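The two-step procedure can be sketched as follows, under loudly stated assumptions: the transition counts below are invented for illustration rather than drawn from the 2004 to 2013 data, only a few cells of the 2013 run expectancy matrix are included, and the function name is hypothetical. Step one tallies how often a particular batted-ball event moved the game from one out/base state to another; step two prices each transition with the current season’s matrix and takes the frequency-weighted average.

```python
# Subset of the 2013 run expectancy matrix from the text, keyed (base state, outs)
RE_2013 = {
    ("0", 0): 0.47, ("1", 0): 0.82, ("1-2", 0): 1.39,
    ("1", 1): 0.50, ("2", 1): 0.62,
}

# Step 1 (hypothetical counts): how often one batted-ball variety of single
# produced each transition over a multi-season sample.
# Key: (start state, end state, runs scored on the play) -> occurrences
transitions = {
    (("0", 0), ("1", 0), 0): 60,
    (("1", 0), ("1-2", 0), 0): 30,
    (("2", 1), ("1", 1), 1): 10,
}


def event_value(transitions, re_matrix):
    """Step 2: frequency-weighted average change in run expectancy."""
    total = sum(transitions.values())
    weighted = sum(
        n * (re_matrix[end] - re_matrix[start] + runs)
        for (start, end, runs), n in transitions.items()
    )
    return weighted / total


print(f"{event_value(transitions, RE_2013):.3f}")
```

Keeping the frequencies and the matrix separate is what lets a large multi-season sample set the frequencies while a single season supplies the run values.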

For instance, the single, which is traditionally treated as just one type of event, is broken into 24 parts based on the contextual details listed above. By observing the rate at which each of these 24 variations of singles occurred in each out/base state from 2004 through 2013, and applying the 2013 run-scoring environment, we get the following breakdown for the estimated value of singles in 2013:

Single Left Center Right   All
Bunt, Infield .418   .451  .436 .427
Groundball, Infield .358   .361  .384 .363
Pop Up, Infield .391   .359  .398 .369
Line Drive, Infield .343   .369  .441 .369
Groundball, Outfield .463   .464  .499 .474
Pop Up, Outfield .483   .480  .498 .488
Line Drive, Outfield .444   .463  .471 .460
Flyball, Outfield .481   .479  .490 .482

This process is repeated for every type of batting event in which the ball is put into play. One of the ways we can use this information is to consider the run value based not on the result of the event, but on the batted-ball context that describes the event. Here are those values in the 2013 run-scoring environment:

Popups Groundballs Fly Balls Line Drives All Swinging BIP
All Outs -.261 -.257 -.226 -.257 -.249
Infield Out -.260 -.257 n/a -.297 -.260
Outfield Out -.269 n/a -.226 -.233 -.229
Left Out -.262 -.260 -.230 -.251 -.253
Center Out -.262 -.281 -.223 -.257 -.257
Right Out -.260 -.229 -.227 -.262 -.237
All Reaches   .514   .468 1.108   .571   .629
Infield Reach   .436   .381 n/a   .390   .382
Outfield Reach   .517   .503 1.108   .572   .659
Left Reach   .516   .463 1.172   .577   .632
Center Reach   .535   .443 1.006   .546   .593
Right Reach   .483   .510 1.166   .593   .672
All Infield -.257 -.199 n/a -.267 -.211
All Outfield -.003   .503   .093   .402   .262
All Left -.219 -.058   .161   .332   .054
All Center -.205 -.078   .030   .312   .030
All Right -.191 -.069   .123   .326   .045
All -.207 -.068   .093   .323   .042

Similarly, we can break down each player’s xR4 by the value produced on each type of batted ball. Here are graphs for xR4 produced on each of the four types of batted balls resulting from a swing, with respect to the number of batted balls of that type hit by the player. For simplicity, from this point on, when I drop the subscript from a batter’s expected run total, I’m referring to xR4.

Line drives are the best result for a batter. The first objective of batters is to reach base safely, and they did that on 67.0 percent of line drives last season. No batter who hit at least eight line drives in 2013 caused a net decrease in his team’s run expectancy during those events. For most batters, hitting the ball into the outfield in the air is the ideal way to produce value, as fly balls tend to create a positive change in a team’s run expectancy. However, fly balls have the most variance of any of the batted ball types, and there are certainly batters who hurt their teams more when hitting the ball at a high launch angle than a low one. Here are the players who produced the lowest xRA on fly balls last season (minimum 50 fly balls):

Lowest xRA on Fly Balls, MLB – 2013
 (minimum 50 fly balls)
Pete Kozma, StL -.1626
Ruben Tejada, NYM -.1546
Cliff Pennington, Ari -.1513
Andres Torres, SF -.1465
Placido Polanco, Mia -.1224

For each of these batters, groundballs and line drives were far better results on average.

xRA by Batted Ball Type – 2013
Player FB GB LD
Pete Kozma, StL -.1626 -.0738 .2496
Ruben Tejada, NYM -.1546 -.0961 .1227
Cliff Pennington, Ari -.1513 -.0421 .3907
Andres Torres, SF -.1465 -.0155 .4269
Placido Polanco, Mia -.1224 -.0981 .1889

While groundballs may be a preferable result for some batters when compared to fly balls, they are still effectively batting failures for the team. Of the 840 batters in 2013 who hit at least one groundball, only 44 produced a net positive change in their team’s run expectancy on those groundballs. Of those 44 players, only 11 hit more than 10 groundballs, and only two (Mike Trout and Juan Francisco) hit at least 100. Here are the players with the highest xRA on groundballs in 2013 who hit at least 100 groundballs:

Highest xRA on Groundballs, MLB – 2013
 (minimum 100 groundballs)
Mike Trout, LAA   .0187
Juan Francisco, Atl-Mil   .0123
Brandon Barnes, Hou -.0076
Andrew McCutchen, Pit -.0081
Marlon Byrd, NYM-Pit -.0093

xR4 allows us to tell the most detailed story concerning the type of value a batter produced, independent of the situational context at the time the plate appearance occurred. Because we gradually added layers of detail to our estimation, we can compare how each level of expected runs correlates to this most detailed level. In this way, we can judge how much information each level provides with respect to our most detailed estimation. Here is a graph that charts a batter’s xR4 with respect to his xR1, xR2, and xR3 estimations:

The line that cuts through the data reflects the xR4 values charted against themselves. For each xRn, we can calculate how well it correlates with xR4 and, consequently, how much of xR4 it can explain. Remember that we have already shown that xR1 has a direct linear correlation with eOBA and xR2 has a very high, though not quite direct, correlation with eTBA. For the xR1 values, we observe a correlation, r, with xR4 of .912, and an r² of .832, meaning that knowing the rate at which a batter reaches base explains over four-fifths of our estimation of xR4. For the xR2 values, r² increases to .986; for the xR3 values, r² increases slightly higher to .990.[iv]
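For readers who want to reproduce this kind of comparison, here is a minimal sketch of the correlation calculation, assuming r is the ordinary Pearson correlation coefficient and r² is simply its square (the function name is hypothetical, and no player data is included).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Ordinary Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# r for xR1 vs. xR4 as reported in the text; r squared is the share explained
r = 0.912
print(f"r^2 = {r ** 2:.3f}")
```

In practice `xs` would hold each batter’s xR1 (or xR2, xR3) and `ys` the same batters’ xR4.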

The takeaway from this is that when considering the whole population of players, there is little difference in a run estimator that considers the batted-ball context and one that does not; you can still explain 99 percent of the value estimated by xR4 by stopping at xR3. In fact, if all you know is the rate at which a batter accomplishes his two main objectives—reaching base and advancing as far as possible—you can explain well over 90 percent of the value estimated by xR4. However, on an individual level, there is enough variation that observing the batted-ball context can be beneficial. Here are the five players with the largest positive and negative differences between their xR3 and xR4 estimations:

Largest Increase from xR3 to xR4, MLB – 2013
Player xR3 xR4 Diff
David Ortiz, Bos 44.1 48.2 +4.1
Kyle Seager, Sea 11.8 15.9 +4.1
Chris Davis, Bal 57.2 61.0 +3.8
Matt Carpenter, StL 36.6 40.3 +3.7
Freddie Freeman, Atl 38.6 41.9 +3.3

 

Largest Decrease from xR3 to xR4, MLB – 2013
Player    xR3    xR4 Diff
Adeiny Hechavarria, Mia -27.2 -32.9 -5.7
Jean Segura, Mil     9.7     4.2 -5.5
Jose Iglesias, Bos-Det     4.5    -0.1 -4.7
Elvis Andrus, Tex   -8.6  -12.9 -4.3
Alexei Ramirez, CWS   -1.9    -5.8 -3.9

These changes are not massive, and these are the extreme cases for 2013, but they are certainly large enough that ignoring them will weaken specific analyses of batting production. Incorporating batted ball details into our analysis adds a significant layer of complexity to our calculation, but it must be considered if we want to tell the most accurate story of the value a batter produced.

If this work seems at all familiar, you may have read this article that I wrote last year on a statistic that I called Offensive Value Added (OVA). For all intents and purposes, OVA and xR are identical. I decided that the name change to xR would help me differentiate estimations more simply, as I could avoid naming four separate statistics for each level of contextual detail, but there was also a secondary reason for changing the presentation of the data. OVAr was the rate statistic associated with OVA, and it was scaled to look like a batting average, much in the same way that wOBA is scaled to look like an on base average. At the time, I chose to do this to make it easier to appreciate how a batter performed, since many baseball enthusiasts are comfortable interpreting the relative significance of a batting average.

After thinking on the subject, though, I decided that I prefer statistics that actually “mean” something to those that give a general, unit-less rating. For instance, try to explain what wOBA actually reflects. It starts as a run estimator, but then it’s transformed into a number that looks like a statistic with specific units (OBA), while not actually using those units. Once that transformation occurs, it no longer reflects anything specific and only serves as a way to rate batters. The same principle applies to other statistics as well, most notably OPS, which is arguably the most meaningless of all baseball statistics, perhaps of all statistics ever (don’t get me started).

xR and xRA estimate the change in a team’s run expectancy caused by a batter’s plate appearances. They are measured in runs and runs per plate appearance, respectively. xRA may not look like a number you’ve seen before, and generally needs to be written out to four decimal places instead of three, unlike basic averages, but it’s linguistically very simple to use and understand. I’d rather sacrifice the comfort of having a statistic merely look familiar and instead have it actually reflect something tangible. This doesn’t take away from the value of a statistic like wOBA, which is a great run estimator no matter what scale it is on; a lack of meaning certainly does not imply a lack of value. Introducing an unscaled run average, xRA, will hopefully create a different perspective on how to talk about batting production.

There is one final expected run estimation that I want to consider that could easily cover an entire new part on its own, but I’ll limit myself to just a few paragraphs. The xR estimations we have built have been constructed independent of the situational context at the time of the batter’s plate appearance. Since we want to cover the entire spectrum of context-neutral run estimation to context-specific run estimation, we will conclude by considering xRs, which is an estimate of the change in a team’s run expectancy based on the out/base state before and after the action of the plate appearance. This is very nearly the same thing as RE24 but it only considers runs produced due to the primary action of plate appearances and not baserunning events.

In many respects, xRs is the simplest run estimator to construct of all that we have built thus far. There are only three pieces of information you need to know in a given plate appearance to construct xRs: the run-scoring environment, the out/base state at the start of the action of the plate appearance, and the out/base state at the end of the action of the plate appearance. Next time you go to a baseball game, bring along a copy of a run expectancy matrix, like the one provided earlier. On a scorecard, at the start of every plate appearance, take note of the value assigned to the out/base state, making adjustments if any runners move while the batter is still in the batter's box. Once the plate appearance is over, note the value of the new out/base state, separating out any advancement on secondary fielding errors or throws to other bases. Subtract the first value from the second, add any RBIs on the play, and write the result in the box associated with the batter's plate appearance; you just calculated xRs. Do this for a whole game, and you will have a picture of the total value produced by every batter based on the out/base state context in which they performed.
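The scorecard procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the `(outs, runners)` state encoding, and the run expectancy numbers are all hypothetical placeholders (they are not the matrix from this series). A three-out state is treated as worth zero runs, and the per-PA figure uses the same runs-per-plate-appearance scaling the article describes for xRA, printed to four decimal places.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical encoding of a base-out state: (outs, runners), where runners is a
# 3-character string for first/second/third, e.g. (0, "1__") = no outs, man on first.
State = Tuple[int, str]


def state_value(re_matrix: Dict[State, float], state: State) -> float:
    """Run expectancy of a base-out state; any 3-out state ends the inning and is worth 0."""
    outs, _runners = state
    if outs >= 3:
        return 0.0
    return re_matrix[state]


def xrs_for_pa(re_matrix: Dict[State, float], start: State, end: State, rbi: int) -> float:
    """xRs for one plate appearance: RE(end) - RE(start) + RBIs on the play.

    `start` should already reflect runner movement while the batter was in the box;
    `end` should exclude advancement on secondary errors or throws to other bases.
    """
    return state_value(re_matrix, end) - state_value(re_matrix, start) + rbi


def xrs_total_and_rate(
    re_matrix: Dict[State, float], pas: Iterable[Tuple[State, State, int]]
) -> Tuple[float, float]:
    """Sum xRs over a batter's plate appearances and also return runs per PA."""
    values = [xrs_for_pa(re_matrix, start, end, rbi) for start, end, rbi in pas]
    if not values:
        return 0.0, 0.0
    total = sum(values)
    return total, total / len(values)


# Illustrative placeholder values only, NOT a real run expectancy matrix.
PLACEHOLDER_RE = {
    (0, "___"): 0.50,
    (0, "1__"): 0.90,
    (1, "___"): 0.25,
    (1, "1__"): 0.50,
}

# Three separate plate appearances by one batter: (start state, end state, RBIs).
game = [
    ((0, "___"), (0, "1__"), 0),  # single leading off an inning
    ((0, "1__"), (1, "1__"), 0),  # strikeout, runner stays at first
    ((1, "___"), (1, "___"), 1),  # solo home run
]
total, per_pa = xrs_total_and_rate(PLACEHOLDER_RE, game)
print(f"xRs total: {total:.4f}, per PA: {per_pa:.4f}")
```

With a real run expectancy matrix for the season's run environment swapped in for `PLACEHOLDER_RE`, the same three inputs per plate appearance are all the calculation needs.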

The effective averages and expected run estimations provide a foundation on which batting analysis can be performed. They combine both "real and indisputable facts" with detailed estimations of the runs produced in every event in which a batter participates. Any story that aims to describe the value that a batter provides to his team must consider these statistics, as they are the only ones which account for all value produced. 147 years ago, Henry Chadwick suggested that batters should be judged on whether they passed a "test of skill." I think they should be judged on whether they passed a "test of value."

Thanks to Benjamin H Byron for editorial assistance, as well as the staff at the Library of Congress for assistance in locating original copies of the 19th century newspaper articles included in Part 1.

Here is data on eOBA, eTBA, and each level of xR and xRA estimation, for each batter in 2013.

Bibliography


 

[i] I'll be focusing on 2013 because the full season is complete. All the work described here could easily be applied to 2014, or any other season; I just don't want to use incomplete information.

[ii] While these terms are used a lot, there aren’t any specific definitions commonly accepted that differentiate each type of batted ball. For terms used so commonly, it doesn’t make much sense to me that they are not well defined. It won’t apply to the data used in this research, but here is my attempt at defining them.

A bunt is a batted ball not swung at but intentionally met with the bat. A groundball is a batted ball swung at that lands anywhere between home plate and the outer edge of the infield dirt; if it instead makes contact with a fielder in the air, it would be classified as a line drive. A line drive is a batted ball swung at that leaves the bat at an angle of at most 20° above parallel to the ground (the launch angle), and either lands in the outfield or makes contact with any fielder before landing (generally through a catch, but sometimes a deflection). A fly ball is a batted ball swung at, with a launch angle between 20° and 60° above parallel (not inclusive), that either lands in the outfield or is caught in the air by a player in the outfield. A popup is a batted ball swung at that either (a) leaves the bat at an angle of 60° or greater above parallel and lands or is caught in the air in the outfield, or (b) leaves the bat at an angle greater than 30° and lands or is caught in the air in the infield.

This would result in some balls being classified differently than they currently are, and not just because differentiating between a line drive and a fly ball is somewhat difficult with just a pair of eyes. If the defense were to play an infield shift, and the batter were to hit a line drive into the outfield grass into that shift, subsequently being thrown out at first base, it would likely be called a groundout by current standards. Batted balls should not be defined based on defensive success or failure, but by the general path which they take when leaving the bat. It may be unusual to credit a batter with making a line out despite the ball hitting the ground, but it more accurately reflects the type of ball put into play by the batter.

I don’t know that these are the “correct” ways to group together these events, but as we now are using technology that tracks the flight of the baseball from the moment it is released by the pitcher through the end of the play, we should probably have better definitions for types of batted balls than those currently provided by MLB. I don’t expect a human stringer to be able to differentiate between a ball hit with a 15° launch angle or a 25° launch angle, but that doesn’t mean we shouldn’t have some standard definition for which they should aim.
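To show how mechanically these proposed definitions could be applied to tracking data, here is a minimal Python sketch. The function name and its four inputs are hypothetical: it assumes we know whether the batter swung, the launch angle, the zone ("infield" or "outfield") where the ball first lands or is first touched by a fielder, and whether that first touch came in the air. It follows the footnote's wording literally and returns `None` for combinations the definitions do not cover, such as a ball hit at 20° to 30° that is caught by an infielder.

```python
from typing import Optional


def classify_batted_ball(
    swung: bool, launch_angle: float, zone: str, touched_in_air: bool
) -> Optional[str]:
    """Classify a batted ball using the definitions proposed in this footnote.

    swung: False for a ball intentionally met with the bat without a swing.
    launch_angle: degrees above parallel to the ground.
    zone: "infield" or "outfield", where the ball first lands or is first
        touched by a fielder (the infield ends at the outer edge of the dirt).
    touched_in_air: True if a fielder touched the ball before it landed.
    Returns None for combinations the definitions do not cover.
    """
    if zone not in ("infield", "outfield"):
        raise ValueError(f"unknown zone: {zone!r}")
    if not swung:
        return "bunt"
    if zone == "outfield":
        if launch_angle >= 60:
            return "popup"       # popup rule (a)
        if launch_angle > 20:
            return "fly ball"    # strictly between 20 and 60 degrees
        return "line drive"      # 20 degrees or less, reaching the outfield
    # The ball first lands or is first touched in the infield.
    if launch_angle > 30:
        return "popup"           # popup rule (b)
    if touched_in_air:
        # Caught or deflected in the air at 20 degrees or less is a line drive;
        # 20 to 30 degrees caught in the infield is not covered by the definitions.
        return "line drive" if launch_angle <= 20 else None
    return "groundball"          # landed on the infield dirt first


# The shift example: a line drive onto the outfield grass that still becomes an out.
print(classify_batted_ball(swung=True, launch_angle=12, zone="outfield", touched_in_air=False))
```

Note that the outcome of the play (out or hit) never enters the function, which is the point of the footnote: the ball's path, not defensive success, determines its type.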

[iii] In theory, xR5 would attempt to consider details that are even more specific, perhaps the initial velocity of the ball off the bat, the launch angle, and whatever other information can be gleaned from technology like HIT F/X. The xR framework leaves room to consider any further amount of detail that a researcher wants to consider.

[iv] Though not charted here, the r² value based on the correlation between wRAA, the "counting" version of wOBA, and xR4 is .984. As wRAA is nearly identical to xR3 but excludes a few of the more rare events from its calculation, it's not surprising that the r² value between wRAA and xR4 is just slightly smaller than the r² between xR3 and xR4.


Leadoff Rating 2.0

It feels icky to create a statistical formula based on what “feels right”.

Last month, I introduced a stat called Leadoff Rating, or LOR. The idea was that most systems to identify great leadoff hitters tab players like Ted Williams and Mickey Mantle, who would always hit closer to the middle of the order. I wanted to distinguish players specially suited to batting leadoff. The formula was simple: OBP minus ISO. By subtracting isolated power, we identified players who get on base a lot but aren’t true sluggers. It’s an easy calculation, and it produced fairly reasonable results. Two particular things bothered me:
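A minimal Python sketch of the original formula, assuming ISO is computed the usual way as slugging percentage minus batting average. The function names are illustrative, and the example plugs in Craig Gentry's 2014 slash line (.264/.326/.299) as quoted in this post.

```python
def iso(avg: float, slg: float) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def leadoff_rating_v1(obp: float, iso_value: float) -> float:
    """Original Leadoff Rating (LOR): OBP minus ISO."""
    return obp - iso_value


# Craig Gentry, 2014 to date: .264/.326/.299, so ISO is about .035.
gentry_iso = iso(avg=0.264, slg=0.299)
print(f"ISO {gentry_iso:.3f}, LOR {leadoff_rating_v1(0.326, gentry_iso):.3f}")
```

Because the whole ISO term is subtracted, a hitter with almost no extra-base power gets nearly his full OBP as a rating, which is the first problem listed below.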

1. Bad hitters occasionally had good leadoff ratings because of their very low ISO.

2. Rickey Henderson ranked 45th.

We know that leadoff is one of the two or three most important positions in the batting order. However little impact lineup construction has on winning percentage, leadoff hitters are important. But LOR saw high OBP and low ISO as equally meaningful, so players with no power sometimes rated as desirable leadoff hitters. That seemed like something to correct.

Rickey Henderson is generally recognized as the greatest leadoff man of all time. LOR did not show this, for two main reasons. One was that the formula did not include baserunning. The other was that the all-time list slanted heavily towards Deadball players. Before Babe Ruth, everyone had low isolated power. Ty Cobb was a terrific power hitter who led the AL in slugging eight times, yet Cobb's career ISO (.146) is basically the same as Rickey's (.140). Henderson only ranked among the top 10 in slugging twice. The game has changed.

Based on the feedback of FanGraphs readers and on my own muddlings, I’ve reworked the leadoff rating formula. The new system is more complicated — it’s annoying to do without a spreadsheet — and it’s kind of haphazard. OBP – ISO was a nice system because of its simplicity. With the updated formula, I’m guessing, choosing numbers that seem right. If someone better than I am at math would care to suggest revisions, please do so. I am fully prepared to give this stat away to smart people.

The formula I’m using now is — wait. There’s another calculation I abandoned, but it’s important for explaining how we arrived at the current iteration, and that middle step looked like this: OBP – ( .75 * ISO ) + ( ( .005 * BsR ) / ( PA / 600 ) )

On-base percentage is the heart of leadoff rating. A good hitter, and especially a good leadoff hitter, must get on base. But I only subtracted 3/4 of ISO, because (1) low ISO is not as important as high OBP, and (2) the original formula was probably a little too hard on doubles hitters. Guys like Rickey and Tim Raines ranked too low because they had more power than players like Jason Kendall and Ozzie Smith.

Commenter foxinsox suggested adding (Constant * BsR) to the calculation, which was a fine idea I should have seen earlier. The hitch was turning BsR into a rate stat.  By using BsR/PA or BsR/G, we can incorporate that element smoothly.

When I ran the numbers, the historical lists looked great (Rickey Henderson in the top 10!), but for active players, there were hits and misses. Elvis Andrus came back as the ideal leadoff hitter in 2013, and Craig Gentry (.264/.326/.299) ran away with 2014 to date. Even with the adjustments, LOR rewarded low ISO. While a .250 ISO isn't really the right fit for the top of the batting order, neither is a sub-.050 ISO. We don't want a guy who only hits singles; we just don't want a cleanup hitter. Looking at the historical lists, I found that most of the top players had an ISO right around .100, so I created a Goldilocks formula, preferring a minimal absolute difference from .100 ISO. Rather than simply treating low ISO as desirable, we're looking for the sweet spot between singles and slugging. The new formula is:

OBP – ( .75 * | .100 – ISO | ) + ( ( .005 * BsR ) / ( PA / 600 ) )

That’s on-base percentage, minus 3/4 of the absolute difference between ISO and .100, plus .005 times BsR per 600 plate appearances. Now very low isolated power is punished just as much as very high ISO.
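To see why the absolute-value version fixes the low-ISO problem, here is a minimal Python sketch of both the abandoned middle step and LOR 2.0. The function names are illustrative. The comparison uses Gentry's .326 OBP and .035 ISO from this post alongside a hypothetical hitter with the same OBP and a .165 ISO; BsR is set to 0 and PA to 600 purely to isolate the ISO term, and those are not anyone's actual figures.

```python
def bsr_per_600(bsr: float, pa: int) -> float:
    """Scale BsR to a rate: baserunning runs per 600 plate appearances."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return bsr / (pa / 600)


def leadoff_rating_intermediate(obp: float, iso: float, bsr: float, pa: int) -> float:
    """Abandoned middle step: OBP - .75*ISO + .005*BsR per 600 PA (still rewards low ISO)."""
    return obp - 0.75 * iso + 0.005 * bsr_per_600(bsr, pa)


def leadoff_rating_v2(
    obp: float, iso: float, bsr: float, pa: int, target_iso: float = 0.100
) -> float:
    """LOR 2.0: OBP - .75*|.100 - ISO| + .005*BsR per 600 PA."""
    return obp - 0.75 * abs(target_iso - iso) + 0.005 * bsr_per_600(bsr, pa)


# BsR = 0 and PA = 600 are placeholders chosen only to isolate the ISO term.
for label, iso in (("Gentry-like, ISO .035", 0.035), ("hypothetical, ISO .165", 0.165)):
    print(
        label,
        f"intermediate={leadoff_rating_intermediate(0.326, iso, 0.0, 600):.3f}",
        f"v2={leadoff_rating_v2(0.326, iso, 0.0, 600):.3f}",
    )
```

With baserunning held at zero, the intermediate formula favors the .035 ISO hitter by a wide margin, while LOR 2.0 scores both hitters identically because each sits .065 away from the .100 target.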

Hopefully you want to see some lists. I'll show you five: the all-time list, the post-Jackie Robinson list, the leaders for the 2013 season, 2014 to date (through July 31), and 2014 rest-of-season projections (ZiPS). We'll also look at the 2014 leaders (both to date and projected) for every team in the major leagues.


Dallas Keuchel: Pitching to Strengths

Platoon splits have become a major part of baseball today. The Athletics have ridden a platoon of Jaso and Norris to production from their catcher position. Many left-handed starters have had success against righties, and many have struggled. Over the course of his career, Cole Hamels has had more success against RHB than LHB (.294 vs. .301 wOBA). Hamels's best pitch, his changeup, has helped him neutralize RHB throughout his career.

So often, when a LHP struggles against RHBs, the common fix is to throw the changeup more often or to improve it. However, for some pitchers this model does not work. Dallas Keuchel, who used a changeup as his primary offspeed pitch against RHBs early in his career, still struggled against righties. As shown in this article, from the start of his career until May 31st of this season, Keuchel was one of the worst starting pitchers against RHB. However, breaking this down season by season shows that Keuchel's numbers have actually improved as his career has progressed.

Keuchel vs. RHB   2012   2013   2014
wOBA              .365   .363   .313
K%                10.3   15.0   16.3
HR/9              1.51   1.11   0.40

The key to Keuchel’s increased success against opposite-handed hitters seems to be found in his pitch selection.

Pitch usage (%)   2012   2013   2014
FT (two-seam)     36.0   31.2   38.5
SL (slider)        0.3   13.2   18.6
CH (changeup)     20.4   16.5   19.2
FF (four-seam)    19.9   27.3   15.9
FC (cutter)       11.4    5.7    7.7
CU (curveball)    12.0    6.1    0.1

Keuchel has been an often-discussed topic on this site this season. The key to his success this season has been his increased use of his rapidly improving slider, which was covered by Eno Sarris here. As Sarris states, the slider will allow Keuchel to have increased success against lefties. However, Keuchel's splits this season show that he has improved his numbers against righties significantly. According to PITCHf/x data, Keuchel has used the slider significantly more against righties this season. He has done this at the expense of his four-seam fastball, opting to throw more two-seamers and sliders.
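As a quick arithmetic check on the pitch-selection table, here is a small Python sketch that recomputes the percentage-point change in each pitch's usage between two seasons, using only the values in that table. The names `USAGE` and `usage_shift` are illustrative. It also verifies that each season's usage sums to roughly 100%.

```python
from typing import Dict, Tuple

SEASONS = (2012, 2013, 2014)

# Pitch usage percentages from the table in this post (one value per season).
USAGE: Dict[str, Tuple[float, float, float]] = {
    "FT": (36.0, 31.2, 38.5),  # two-seam fastball
    "SL": (0.3, 13.2, 18.6),   # slider
    "CH": (20.4, 16.5, 19.2),  # changeup
    "FF": (19.9, 27.3, 15.9),  # four-seam fastball
    "FC": (11.4, 5.7, 7.7),    # cutter
    "CU": (12.0, 6.1, 0.1),    # curveball
}


def usage_shift(
    usage: Dict[str, Tuple[float, ...]], start_year: int, end_year: int
) -> Dict[str, float]:
    """Percentage-point change in each pitch's usage between two seasons, largest gain first."""
    i, j = SEASONS.index(start_year), SEASONS.index(end_year)
    shifts = {pitch: round(vals[j] - vals[i], 1) for pitch, vals in usage.items()}
    return dict(sorted(shifts.items(), key=lambda kv: kv[1], reverse=True))


for year in SEASONS:
    total = sum(vals[SEASONS.index(year)] for vals in USAGE.values())
    print(year, round(total, 1))  # each season should sum to about 100%

print(usage_shift(USAGE, 2013, 2014))
```

From 2013 to 2014 the two-seamer (+7.3 points) and slider (+5.4) show the largest gains and the four-seamer (-11.4) the largest drop, consistent with the shift described above.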

While he is still using the changeup at around his career averages, his heavy increase in sliders is the biggest difference in his way of attacking hitters. As his numbers for the season show, he is limiting home runs and striking out the highest percentage of right-handed hitters of his career. This has also led to a significant improvement in his wOBA allowed. This season against righties, the slider has produced a better-than-MLB-average whiff rate (18% vs. roughly 13%).

Keuchel provides a blueprint for other left-handed starters who struggle against righties. Contrary to the typical belief that pitchers must develop their changeup in order to improve against opposite-handed batters, Keuchel has begun using his best offspeed pitch, the slider, more often as a putaway pitch against them. Keuchel has become the poster boy for pitching to strengths, riding his sinking two-seamer and slider to a breakout season while significantly improving his platoon splits.

Another pitcher who was mentioned as the worst in the league against righties was Eric Stults. Stults, a lefty like Keuchel, features both a slider and a changeup. Also like Keuchel, Stults's best offspeed pitch according to pitch values is his slider. However, against RHB he has used the changeup more than twice as often as the slider since 2007 (25.6% vs. 10.8%). If Stults followed in Keuchel's footsteps and began to use his best pitch more against opposite-handed hitters, it could help him shrink his platoon split and become a better all-around starting pitcher.