Exploring Three True Outcome Quality

INTRODUCTION & EXPLORING THE QUESTION

So there’s been a lot of attention paid to Three True Outcome guys recently. The subject was touched upon in a recent article by Craig Edwards, as well as in this community blog by Brian Reiff. These articles brought attention to guys who are notable for putting 7 of the 9 defensive players to sleep. However, what caught my attention the most was a comment on Craig’s piece by “steex” who proposed a hypothesis about these sluggers:

I think this makes selecting TTO players strictly by the numbers difficult. For me, the spirit of TTO is a player that does enough good (HR+BB) to balance out for a lot of bad (K). Harper and Votto don’t really fit that definition in the intended way, but rather show up on the list because they do SO much good (HR+BB) that their total HR+BB+K makes the cut despite having not as much of the bad (K).

I wonder if a better list of players comparable to one another would be obtained by first sorting by TTO%, then subdividing that by the percentage that Ks represent from the TTO events (i.e., K/[HR+BB+K]). That provides a lot of separation between guys like Harper, Votto, and Goldschmidt who have strikeouts represent less than 50% of their TTO events and guys like Carter, LaRoche, and Belt who have strikeouts as more than 65% of their TTO events.

This was also supported by follow-up comments describing how readers split these players into two groups: those who strike out at a higher clip, and those whose BB% and HR% compensate for a reduced K%. My goal was to figure out whether the quality of the Three True Outcomes differed between the high-K% players and the low-K% players, beyond the walks and strikeouts themselves.

 

THE PROCESS

First, let me define how I picked out my sample and how I classified the players into two groups, and then I’ll begin to discuss the details of the study. I pulled all the data from 2010-2014 for player-seasons that qualified for the batting title (minimum 502 plate appearances). This gave me a sample of 723 player-seasons (where a single player may be listed as a qualifier separately for up to five seasons). Across these 723 player-seasons, I set the Three True Outcome bar at 40%. Why 40%? Well, the average (weighted by PA) was 29% Three True Outcome (I’ll abbreviate to TTO from now on), with a standard deviation of approximately 8%. So that would make 40% TTO somewhere around 1.5 standard deviations above the mean, which seemed like a reasonable line to draw in the sand.
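As a rough check on where the 40% bar sits, here is a minimal Python sketch of the arithmetic. The function name `tto_pct` and the sample stat line are hypothetical, made up for illustration; the 29% mean and 8% standard deviation are the rounded figures quoted above, so the result is only approximate.

```python
def tto_pct(hr, bb, k, pa):
    """Share of plate appearances ending in a home run, walk, or strikeout."""
    return (hr + bb + k) / pa

# Rounded figures quoted in the text (PA-weighted mean and SD of TTO%).
MEAN_TTO = 0.29
SD_TTO = 0.08
CUTOFF = 0.40

# How many standard deviations above the mean the 40% bar sits.
z_cutoff = (CUTOFF - MEAN_TTO) / SD_TTO
print(f"cutoff z-score: {z_cutoff:.2f}")

# Hypothetical stat line (not a real player-season): 30 HR, 80 BB, 170 K in 650 PA.
example = tto_pct(hr=30, bb=80, k=170, pa=650)
print(f"example TTO%: {example:.1%}, qualifies: {example >= CUTOFF}")
```

With the rounded inputs, the bar lands about 1.4 standard deviations above the mean, in the same neighborhood as the figure quoted above; unrounded inputs would shift it slightly.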

That leaves a sample of 52 player-seasons (7.2% of the qualifiers). From here, I had to draw a new line, and I wanted to go by “steex”’s suggestion of using strikeouts as a proportion of all TTO events as the barrier. The key was getting a decent number of player-seasons on either side. I started off with 50% (using the formula K/[HR+BB+K]), but that would have left me with only two player-seasons in the low-K group (2011 Bautista, 2013 Votto, for those who are curious). I bumped it up continually until I reached a 60% ratio, which seemed to be reasonable. That placed 11 player-seasons in the low-K TTO group (which will be referred to as TTO-L) and 41 player-seasons in the high-K TTO group (which will be shown as TTO-H).
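The two-step split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `k_share` and `classify` and the three stat lines are hypothetical, not real player-seasons; the 40% and 60% cutoffs are the ones from the text.

```python
def k_share(hr, bb, k):
    """Strikeouts as a fraction of all TTO events: K / (HR + BB + K)."""
    return k / (hr + bb + k)

def classify(hr, bb, k, pa, tto_cutoff=0.40, k_cutoff=0.60):
    """Return 'TTO-H', 'TTO-L', or None if the season misses the TTO bar."""
    if (hr + bb + k) / pa < tto_cutoff:
        return None
    return "TTO-H" if k_share(hr, bb, k) > k_cutoff else "TTO-L"

# Hypothetical stat lines (HR, BB, K, PA), invented for illustration only.
seasons = {
    "walk-heavy slugger": (35, 110, 120, 650),
    "strikeout-heavy slugger": (30, 60, 190, 620),
    "contact hitter": (10, 40, 70, 640),
}

labels = {name: classify(*line) for name, line in seasons.items()}
for name, label in labels.items():
    print(name, "->", label)
```

A season exactly at 60% falls into TTO-L here, matching the "greater than 60% is TTO-H" rule used in this study.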

The whole TTO population is now divided into two groups, TTO-L (with 11 player-seasons) and TTO-H (with 41 player-seasons). Now what? I was truly curious about how these two groups differed in their hitting abilities. It seems fairly obvious that those who have a lower K% and higher BB% will have higher (better) wOBAs, wRC+s, and the like (just due to trading strikeouts for walks). As Craig showed in his article, the average TTO player is an above-average hitter, which makes sense given the typically lumbering stature and penchant for not being great at defense: those who aren’t above-average hitters and are bad at defense usually find themselves riding minor league buses around the country. But I’m not trying to compare TTO hitters to non-TTO hitters; rather, I’m comparing the two groups based on TTO quality.

 

BATTED BALL DISTRIBUTIONS

I decided to compare them using statistics that might reveal differences between good and bad hitters. I looked at batted ball distributions to start. I compiled the GB%, LD%, FB% and IFFB%, as well as the PULL%, CENTER% and OPPO% from the leaderboards (plus HR/FB for good measure), and computed the mean, standard deviation, and p-value based on a two-tailed t-test. The results are in TABLE ONE:

 

Legend: statistical significance is flagged at p < 0.1, p < 0.05, and p < 0.01.

 

TABLE ONE: Batted Ball Distributions

            TTO-H                TTO-L                t-test
Measure     mean-H    StDev-H    mean-L    StDev-L    p-val
COUNT       41 player-seasons    11 player-seasons
GB%         38.1%     5.5%       38.9%     3.9%       0.654
LD%         19.6%     3.0%       19.0%     3.5%       0.572
FB%         42.3%     5.3%       42.2%     5.5%       0.956
IFFB%       8.3%      4.1%       9.8%      5.2%       0.314
Pull%       43.9%     5.1%       45.8%     7.7%       0.332
Cent%       33.6%     3.7%       31.4%     2.5%       0.070
Oppo%       22.6%     3.7%       22.9%     6.9%       0.846
HR/FB       19.4%     4.9%       19.4%     4.0%       1.000

 

Interestingly enough, the batted ball distributions are very similar between the two groups. The groups are pretty much interchangeable, with the only thing close to statistically significant being the percentage of balls hit to center field. However, when looking at that in the bigger picture of pull/center/opposite, the numbers are nearly identical. So far, the two groups are relatively indistinguishable from one another.
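For anyone who wants to sanity-check a row, a two-tailed t-test can be run straight from the summary statistics in the table. The sketch below assumes the pooled-variance (equal-variance) form of the test, since the variant actually used is not stated here, and it uses only the standard library, integrating the t density numerically rather than calling a stats package. The helper names are hypothetical, and the inputs are the rounded Cent% values from TABLE ONE, so the output will not match the table to every decimal.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)

# Cent% row of TABLE ONE: TTO-H (n=41) vs. TTO-L (n=11), values in percent.
t_stat, df = pooled_t(33.6, 3.7, 41, 31.4, 2.5, 11)
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.3f}")
```

On these rounded inputs the sketch should land close to the 0.070 listed for Cent%. A Welch (unequal-variance) test would give a different answer when the two standard deviations differ a lot, so the choice of variant matters for rows like Swing%.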

 

BATTED BALL AUTHORITY

At this point, my mind went in another direction: do TTO-L players strike the ball better than their TTO-H counterparts? If you’ve got a good eye and can take a walk more easily, then you’re probably able to see the ball better, and therefore are able to drive the ball harder. So, even though it may not have manifested itself in the GB/LD/FB numbers, perhaps these “elite” players in the low-K group have better pop. To evaluate this, I pulled the HARD%, MED%, and SOFT% of balls by each group, along with BABIP for good measure, summarized in TABLE TWO:

 

TABLE TWO: Batted Ball Authority

            TTO-H                TTO-L                t-test
Measure     mean-H    StDev-H    mean-L    StDev-L    p-val
COUNT       41 player-seasons    11 player-seasons
Soft%       15.5%     3.5%       16.0%     3.4%       0.925
Med%        48.6%     4.2%       46.6%     4.9%       0.182
Hard%       35.9%     4.0%       37.5%     3.4%       0.231
BABIP       0.297     0.034      0.307     0.043      0.417

 

Again, a little surprising to me. There’s no statistically significant difference between the low-K guys and the high-K guys in terms of batted ball authority. Each group hits roughly the same, with the low-K guys trading a few medium-hit balls for some hard-hit ones (albeit not enough to differentiate the groups). Striking the ball harder would manifest itself in a higher BABIP, and BABIP comes out roughly even. One note: BABIP is better controlled here than in most hitter studies, because TTO players typically have similar builds and are not inflating their BABIP by beating out infield hits (neither group would have a distinct advantage).

 

BATTING SELECTIVITY & CONTACT RATES

So where do these two groups separate? Something has to cause the disparity between the groups and show a differential in ability. And that something is their selectivity at the plate, which only makes sense. Players who draw walks are those who lay off bad pitches out of the zone, and those who strike out typically struggle to tell strikes from balls, or lack the ability to make contact when they swing (usually not both, or else they wouldn’t be in the majors). The data is summarized below in TABLE THREE:

 

TABLE THREE: Batting Selectivity & Contact

            TTO-H                TTO-L                t-test
Measure     mean-H    StDev-H    mean-L    StDev-L    p-val
COUNT       41 player-seasons    11 player-seasons
Z-Swing%    68.0%     5.0%       65.9%     4.2%       0.208
O-Swing%    29.5%     5.2%       25.5%     3.3%       0.020
Swing%      46.1%     4.1%       42.5%     2.1%       0.007
O-Contact%  53.9%     5.3%       57.5%     6.7%       0.065
Z-Contact%  79.4%     3.6%       83.3%     3.3%       0.002
Contact%    70.1%     3.6%       74.3%     4.6%       0.002
SwStr%      13.6%     2.4%       10.7%     2.3%       0.001

 

 

Here’s all that red you’ve been waiting for. Starting with the first three rows, there’s a statistically significant difference between the two groups in swinging at pitches outside the zone (O-Swing%, p < 0.05) and in swinging overall (Swing%, p < 0.01), which goes to show that the TTO-L group is more selective than the TTO-H group. In rows four to six, we see that for swings on pitches both in and out of the zone, the TTO-L group makes contact more often, with in-the-zone contact being statistically significant at the p < 0.01 level. To summarize this table, the TTO-L hitters don’t swing as often, but when they do they are better at making contact with the pitch than the TTO-H batters.

 

GROUP SUMMARY

The final table, TABLE FOUR, summarizes the groups for anybody who was curious. 

TABLE FOUR: Group Summary

            TTO-H                TTO-L                t-test
Measure     mean-H    StDev-H    mean-L    StDev-L    p-val
COUNT       41 player-seasons    11 player-seasons
HR%         4.8%      1.3%       4.9%      1.2%       0.819
K%          29.2%     2.8%       24.0%     3.5%       0.000
BB%         11.0%     2.0%       15.2%     2.5%       0.000
wOBA        0.341     0.031      0.379     0.034      0.001
TTO%        45.0%     4.3%       44.0%     2.8%       0.470

 

Obviously, the K-rates and BB-rates above are statistically significant, which only makes sense because that’s how we divided the groups; that difference was built in by construction. And, of course, you’ll generally have a better wOBA if you walk more and strike out less, because each walk carries a linear weight of roughly 0.7 in wOBA while a strikeout counts for nothing.

 

SUMMARIZING THE FINDINGS

Of the 723 qualified player-seasons between 2010 and 2014, inclusive, 52 were deemed to be Three True Outcome seasons (with at least 40% of plate appearances ending in a BB, K, or HR). From there, the group was subdivided in two by the share of K’s among TTO events (with [K%/TTO%] > 60% as TTO-H, and [K%/TTO%] <= 60% as TTO-L).

The groups were compared against one another on Batted Ball Distributions, Batted Ball Authority, and Batting Selectivity & Contact. The vast majority of the statistically significant differences between the groups appeared in the third table, with the TTO-L group displaying a better eye for strikes, while also contacting the ball better when they decided to swing. Perhaps the most interesting finding of the study was that this increased contact did not manage to create better authority when hitting the ball, nor did it change the batted ball distribution significantly. Just because the TTO-L group made contact more often on their swings did not mean they were able to drive the ball better than the TTO-H players.

Just a quick thank you to end this, to the FG community comments that inspire people to write things like this and make my last college summer a little more (less?) exciting.


Three Undervalued Hitters to Help Down the Stretch

We’re officially in the dog days of summer, which means a few things of note: NFL is almost upon us; the fantasy baseball playoffs have begun for many; and finally, whether you’re in a roto league without playoffs or otherwise, you’re still looking to find value on your waiver wire.

I define value as something like: Players who produce counting stats (and/or average), who, for whatever reason, have low ownership rates and thus can be found on waivers for free, or in my case, for a few FAAB dollars (of which, I have zero remaining). The players I’m referring to are generally valuable in deeper mixed leagues or NL- or AL-only formats, but some, like Dexter Fowler, whom I’ve written about in the past, can offer solid numbers for leagues of any size/format.

I’ve recently written about guys like David Peralta, Fowler, and Jung-Ho Kang, and my advice on these players remains the same as it’s always been: pick them up ASAP. Their low ownership rates on ESPN continue to leave me flummoxed; e.g., David Peralta, with his .294 average, 48 R, 13 HR, 66 RBI, and 5 SB, is owned in just 70% of ESPN leagues. Go figure. Better yet: Go pick him up.

Here are a few more hitters I like who can help you down the stretch:

Yangervis Solarte: Solarte hit his tenth home run on August 21, his third in as many games. A switch-hitter, Solarte has multi-position eligibility (1B/2B/3B) and is owned in just 34% of ESPN leagues. With a triple-slash line of .269/.325/.425, Solarte has 47 R, 10 HR, and 49 RBI. Those stats play in most leagues, and while he is a bit streaky and on a power surge in August, his ambidexterity keeps him in the Friars’ lineup on a near-daily basis. Solarte has solid on-base skills (29:46 BB/K), decent power, and an above-league-average batting average, and the vast majority of his AB’s come in the leadoff or 2-hole in the lineup (110 and 142 AB, respectively).

That said, hitting in front of a hot Matt Kemp and a hopefully-getting-hot Justin Upton should help keep his run totals healthy, and he’s showing some nice HR power in August. His .283 BABIP is in line with career norms, so I don’t expect much regression in terms of batting average; if anything, that number seems somewhat low for a player who runs well, but ZiPS projects a BABIP of .280 the rest of the way. At any rate, you could certainly do a lot worse than Solarte, a player who might be finding his stride in the second half.

Colby Rasmus: In short, Rasmus is who he is: He hits for power and not much else. His power, particularly against righties, is the real deal: Rasmus owns a .451 slugging percentage and a solid .222 ISO in 2015 (with a career-norm .297 BABIP); his 17 HR and .750 OPS suggest he can help in AL-only or deeper mixed-leagues.

Owned in just 6.5% of ESPN leagues, Rasmus has 44 R, 17 HR, 44 RBI, and 2 SB to his credit (along with an unsightly .228 batting average), with the two most recent of his 17 Colby Jacks courtesy of Detroit lefty Matt Boyd. While he does sit against most LHP, Rasmus’ OPS against lefties in 2015 is a respectable .815 across 80 AB’s (compared to a .726 OPS vs. RHP over 244 AB). That said, you will see him in the lineup against a few soft-throwing lefties, but that will likely stop when Springer returns.

For perspective, consider Brandon Moss relative to Rasmus:

Moss is batting .211 with 38 R, 15 HR, and 51 RBI. He was recently ranked OF number 52 and 49 by two CBS analysts, whereas Rasmus is ranked 63 and 88. Although Rasmus’ power is less proven than that of Moss, Moss has been miserable since June and Rasmus has been steady, if unspectacular, effectively all season. But despite Rasmus having hit more HR, and being projected to hit just 3 fewer HR than Moss the rest of the way (8 HR projected for Moss ROS seems totally absurd, incidentally), Moss is owned in roughly 8 times as many leagues as Rasmus. In short: Colby is either massively under-owned, or Moss is hugely overvalued; or, I guess, both.

ZiPS has another 5 HR and 13 RBI projected for Rasmus rest of season, but those numbers seem a bit soft in the absence of Springer for a player hitting at Minute Maid Park. Rasmus won’t win a batting title anytime soon, but his solid OPS vs. lefties this year (an outlier, to be sure) and strong defense at all three OF positions keep him in the lineup on a near-daily basis, especially given the recent, albeit short-term, demotion of Preston Tucker. Colby has been in a funk since his 2-HR game on 8/16, but like most power hitters, Rasmus is prone to streaks; my advice to you is exactly the same advice I took myself: pick him up and enjoy the HR power, but don’t expect him to suddenly become Bryce Harper.

Asdrubal Cabrera: Arguably the hottest hitter in baseball since he returned from the DL on July 28, Cabrera is hitting .404 with an OPS of 1.078 since the All-Star break. Those are not typos, though his numbers are propped up by a massively inflated BABIP. Also since the break, Cabby has 20 runs, 4 HR, 13 RBI, and 2 SB across 89 AB’s. He’s on fire, no two ways about it.

What we’re seeing here, I think, are two things: 1) a player out-of-his-mind hot and 2) a solid veteran hitter with proven, decent power regressing to the mean. Currently batting .264 with 49 R, 9 HR, 35 RBI, and 5 SB (.730 OPS), Cabrera has hit at least 14 home runs every season since 2011 (career high of 25), and he’s on pace for roughly 12 this year. A career .267 hitter, Cabrera was miserable in April, May, and some of June, and while he’s running an unsustainable BABIP of .320, he was certainly due for a few bloopers to drop.

With dual 2B/SS eligibility, Cabrera has seen his ownership rate on ESPN spike from sub-20% in mid-August to 39% at the time of this writing. If you’re looking for help at a very weak SS position, or a possible Howie Kendrick replacement, Cabrera can certainly help you out; and because he’s a switch-hitter, you’ll find him in the 5- or 6-hole in the Rays’ lineup on a daily basis.


The Evan Gattis Triples Game

There are 13 qualified hitters in baseball with at least six triples. Twelve of the 13 have at least five SB, and the average among those 12 players is 18 steals. Among the league leaders in triples stands one man who defies the common narrative that triples hitters are speedy. He’s known as “El Oso Blanco,” which translates to “The White Bear” for non-Spanish-speaking readers, and with him listed at a whopping 6’4”, 260 lbs, it’s easy to see why they call him that. His story is one of modern-day folklore, and it’s fitting that his wandering days would eventually lead him to an Astros squad that has taken the American League West by surprise. Evan Gattis has as many stolen bases as he has batting gloves, or as many as he appears to have, which is zero; if you’ve witnessed him hit at all, one of the first things you notice is that he does not wear batting gloves. Yet there his name is, one triple ahead of the likes of Adam Eaton and David Peralta: Evan Gattis, with nine triples, the man in sole possession of second place for the most triples in Major League Baseball.

Consider this: he had 1 triple in his first 783 PA (or even 1 in his first 928, if we want to include all of his career PA up to May 21st, 2015, the date of his first triple this year), and that one triple was hit into Triples Alley at AT&T Park in San Francisco on May 13th, 2014 (no, this was not a Friday the 13th). Triples Alley is aptly named for the relatively high volume of balls hit there that result in triples. So that was Gattis’ one and only, and yet he’s hit 9 in his following 446 plate appearances (or even scarier, 9 in 301 PA). Before delving too much into this, I thought, “Conditions for an Evan Gattis triple would have to be perfect. I bet at least 6 of these triples are due to Tal’s Hill,” which is the 90-foot-wide, 30-degree incline that extends the area of balls in play about 34 feet beyond where the fence would normally end at Minute Maid Park. It is a whopping 436 feet to the wall at the top of Tal’s Hill. However, a quick peek at Gattis’ home/away splits reveals that he has just 5 triples at home and 4 on the road.
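To put the before-and-after in the same units, here is a quick Python sketch of the PA-per-triple arithmetic. The helper name `pa_per_triple` is made up for illustration; the PA and triple counts are the ones quoted above.

```python
def pa_per_triple(pa, triples):
    """Plate appearances per triple over a stretch of a career."""
    return pa / triples

# (PA, triples) for the two ways the text splits Gattis' career.
windows = {
    "first 783 PA": (783, 1),
    "following 446 PA": (446, 9),
    "first 928 PA": (928, 1),
    "following 301 PA": (301, 9),
}

rates = {label: pa_per_triple(pa, n) for label, (pa, n) in windows.items()}
for label, rate in rates.items():
    print(f"{label}: one triple every {rate:.1f} PA")

# Both splits should add up to the same career PA total.
career_pa_a = 783 + 446
career_pa_b = 928 + 301
print(career_pa_a == career_pa_b)
```

Either way you slice it, that is a jump from one triple in the better part of a thousand plate appearances to one every few dozen.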

Well then he must have hit his triples in “triple-friendly” parks; below is a table showing where he has hit his 9 triples this year:

STADIUM (triples hit)     3B FACTOR
AT&T Park (1)             1.211
Minute Maid Park (5)      1.549
Kauffman Stadium (1)      1.240
Comerica Park (2)         1.465

Okay, that was predictable and makes a lot of sense to me.  Now here is a spray chart that shows his hit types (if you don’t read keys, the red dots are the triples):

chart (3)

*There is a sneaky red dot signifying a triple hiding behind a home run dot in left-center, just to the right of the leftmost red dot*

Looking at the plotting of the red dots and considering which stadiums he hit his triples in is where I got the idea for this article, and I will now switch to writing in the present tense to portray the spontaneity I felt when I first started writing. Considering the factors, I get the feeling that I can guess which stadium each of his triples was hit in: an exhibition of frivolity to be sure, but this is just the kind of thing that we’re looking for while we’re at work, trying to look busy, isn’t it? If you wanna play, keep reading and guess along. I am going to take a liberty and use the pronoun “we” instead of “I” so this feels more like a group effort. And I also have a disclaimer: if you continue reading, you are assuming the risk that this could be a jarringly disjointed, moderately sarcastic, and GIF-cluttered article. It is.

The Evan Gattis Triples Game

Let’s consider my first hypothesis – that Tal’s Hill is responsible for a majority of these triples.  Looking at the red dots it looks like 3 of them may have very well landed there.  In order to kind of stick with my original idea, we’ll take the five most centrally located red dots and say that those are the triples he hit at home.

chart home

For reference into this reasoning, here’s the stadium layout of Minute Maid Park (all ballpark layouts are courtesy of Clem’s Baseball).  Note the massive depth of center field.

MinuteMaidPark

Using FanGraphs’ Game Logs I’ll pinpoint the dates of his 5 home triples and then plug those dates into Gattis’ spray chart over at BrooksBaseball.

1st Triple at home; 3rd Triple of Season: 06/28 vs NYY

triple1

That ball is not hit to Tal’s Hill, but it is one of his 5 most centrally hit triples of 2015, so that’s 1/1 if you’re scoring at home.

Now here’s the GIF – and here’s where I have to pause and give credit to another article.  When I started to write this post I hadn’t planned on including so much media, but as the post evolved it really did call for GIFs of these triples.  When I searched ‘Evan Gattis triples’ on google, the first link that popped up is this SB Nation post by Murphy Powell, and it’s the source for 6 of the 8 GIFs here and is, by all accounts, VERY similar and a much better article than mine, so check it out.  Any other GIFs were created using Baseball Savant media and makeagif.com.

gattis_3.0

“ARGH!” That’s the sound of Michael Pineda groaning as he grimaces and falls onto bended knee while telepathically willing the ball to stay in the park, which it does, barely. Pineda is groaning because that was not a quality slider. This information could probably be an entirely new post altogether, but I did warn you about this post being disjointed, so let’s take a quick detour.

This triple took place at the end of June – a table tracking velo and movement of Michael Pineda’s sliders shows that Pineda was throwing sliders of a lesser quality during this period.

Date(s)                   Velo    x-movement   v-movement   BAA
Pitch to Gattis (06/28)   87.9    2.15         1.25         1.000 (obviously)
April 2015                84.08   4.54         -0.30        .208
May 2015                  85.76   4.00         -0.41        .191
June 2015                 87.12   2.47         0.02         .250
July 2015                 87.10   1.34         0.46         .231

Whether it has been a conscious decision to throw his slider harder or it is a product of his ailing elbow, the results have not been so good.

Anyways, at this point, three triples into the season – and 3 in his last 36 games – Gattis’ reputation as a triples machine is really starting to build momentum (I warned you about the sarcasm, too) and as soon as the ball bounces away from Brett Gardner and is left to be retrieved by a scurrying Garrett Jones, Gattis is off to the races.

2nd Triple at home; 4th Triple of the season: 06/30 vs KCR

triple2

Bingo! This is a Tal’s Hill special and would be a home run at 29 other ball parks.

gattis_4.0

Lorenzo Cain, who has to at least be in the conversation for the smoothest looking active baseball player, is rendered looking like a reckless drunkard, smashing head-first into the wall and then toppling over on to his side after heaving the ball in towards a cut-off man from his knee.  Nonetheless, Gattis has his 4th triple of the year and we are 2 for 2.

3rd Triple at Home; 5th Triple of the season: 07/17 vs TEX

triple3

That one is not quite as impressive as the last one in terms of distance, but he laid into this one pretty good, too.

gattis_real_5.0

This hit scoots up on to Tal’s Hill after it nicks off Leonys Martin’s glove and then bounces off the wall – are you already missing the antics that Tal’s Hill won’t be causing in 2016?  The main thing here is that we are now 3 for 3 in this game.  I knew this would be easy.

4th Triple at Home; 7th Triple of the year: 07/28 vs LAA

triple4

So we’re wrong on this one and that brings our tally to 3 for 4 – and I’ll take most of the responsibility for the ones we get wrong – my bad.  “My bad” suffices when a player makes an errant pass out of bounds in a professional basketball game, so it should be enough here, too.

gattis_5.0 (1)

This one hit just under the yellow line against the Papa John’s sign, and it had to careen off the wall in such a way that it caused the ball to bounce into another empty center field where Shane Victorino finally picks it up and hurls it in just in time for Gattis to pull in to third base with a stand up triple.

5th Triple at Home; 9th Triple of the year: 08/14 vs DET

plot_hc_spray

This is technically another one of the 5 most centrally located triples so we are 4 out of 5.

Gattis Triple 5 Gif

 

 

 

The ball comes off the bat hard enough (99.3 mph) and then takes a generously frictional hop and loses speed as it trickles up against the wall in the deepest part of right center field at Minute Maid.  I don’t care if even the great Roberto Clemente was in right field, that is a long relay throw and there is plenty of time for Evan Gattis to lock down his 9th triple of the season.  Gattis is immediately pulled from the game as he is probably completely out of juice at this point in the season, but fans rejoice over his exploits and even Evan Gattis can’t believe his recent output of triples:

7fx3An

 

 

 

So we are hitting .800 after the home stand, but now let’s take on the triples hit away from home.  Here are the triples that we have left to identify:

chart (3)

The media, for whatever reason, has started to get smaller, so I will point out the locations of the triples: there is one to deep, left center; one to deep center, one to right-center, and one down the right field line.

For reference, here are the stadium layouts for Comerica (where he’s hit 2 triples), AT&T Park, and Kauffman Stadium.

Comerica

ballpark

AT&T

triple7

Kauffman Stadium: has the largest outfield in major league baseball as measured by total square feet.

KauffmanStadium

Let’s start with the one triple hit to deep center that did not take place at Minute Maid and say that one took place at Comerica Park, since, like Minute Maid, Comerica has a cavernous center field.

1st Triple of the Year: 05/21 vs DET @ Comerica

triple6

Huzzah! That was kind of obvious and maybe shouldn’t have elicited a Tobias Funke jubilation, but the fact that we’re five for six does.

gattis_half_1.0

Let’s jump ahead to what should be considered the other obvious pick, his triple hit at AT&T Park. There’s a triple that was hit to right-center, and we’ll say this triple was a throwback piece, inspired by his first triple in the bigs, in that it was hit to Triples Alley.

8th Triple of the Year: 08/11 vs SFG @ AT&T Park

triple8

This one is wrong and that stings because I felt like this one would’ve been obvious.

qzp-eX

I’m not sure how much of the ball Gregor Blanco gets when he leaps (he may have ultimately sandwiched the ball between his back and the wall), but it looks like he prevented an Evan Gattis HR; he still couldn’t prevent yet another Evan Gattis triple. We’re 5 out of 7.

So of the two triples left, there is one that goes to deep right-center, and one that scurries down a right-field line.  The ballparks left are Kauffman and Comerica.

We’ll play the odds and guess that the one down the right-field line is hit at Kauffman Stadium because it would make sense for the one to right-center to have ended up in that little enclave at Comerica.

6th Triple of the Season: 07/26 vs KCR @ Kauffman Stadium

Oddly enough there is no data for this on Brooks Baseball and there is also no GIF for this triple; Who’s padding the stats?? At least that builds some suspense…

2nd Triple of the Season: 05/24 vs DET @ Comerica

triple9

Wrong – which also makes us wrong on the triple hit at Kauffman so we miss the final 2 – “my bad”.

gattis_2.0

It looks like Rajai Davis was positioned towards the gap and therefore had to hunt this ball down while El Oso Blanco set the base paths aflame.

So our (my) final score is 5/9, which is good but not great considering my prediction of 100% accuracy. While I’m completely aware of the vast, expansive magnitude of my ignorance, I really did believe I could pick out where each of these 9 triples happened…it’s probably this same hubris that causes me to lose $3 daily over at DraftKings.

Trying to elicit some meaning out of this article would be contrived, so I’ll just say (tongue-in-cheek-ly) that Gattis is likely to experience some regression to the mean (whatever that mean is in regards to triples). I can’t imagine a reality where Evan Gattis highlights aren’t home runs and instead continue to center on him tearing around the basepaths, his massive, rippling thighs simultaneously inspiring awe, terror, and a few chuckles among his teammates, but what do I know? The last time I tried to predict something about Evan Gattis, I was only 55.6% right.


Analytics Are Good, But Psychometrics Can Make Them Great

This is not about a relief pitcher resting horizontally on a comfy couch as he spills his deepest darkest secrets to a furrowed, bearded psychologist, nor is this about prescribing medication to a team’s severely depressed kicker who just missed the game-winner. We’re talking about sports psychology, but not the kind of stereotypical psychology you’re used to. Instead, we’re talking about psychometrics – how to measure the ways that a player’s psyche (thoughts, feelings, opinions) relates to the most important thing imaginable for sport teams: performance.

Seeing is believing

Counting the yards that a running back gains after contact or the runs prevented by pitching independent of defense are advanced numerical methods of breaking down a player’s performance. Most of the traditional analytics work the same way; a player’s previous performance is charted, observed, and dissected to make a projection about how that player will perform in the future. A team’s forecasted performance is usually the sum of the individual players’ projected performances. This is (generally) the state of analytics in a nutshell.

Not only have analytics shown that previous performance predicts some level of future performance; that relationship also just makes sense. A player hitting a 3-point shot, scoring pad-side against the goalie, or hitting a home run is visible to everyone; it’s what makes sports, sports. You know that Mike Trout is a good baseball player because you can see his performance. You can see him make ridiculous plays in the outfield and then watch him hit a home run into a fishing net in the center-field bleachers. You can check the box score the next day and you can see the numbers immediately reflect his awesomeness. You can visit FanGraphs and read about a sabermetric stat that further corroborates Trout’s awesomeness, and then you can use that same stat to find out about another, more obscure player’s performance and realize he’s kind of awesome as well. Analytics makes sense because most of it is overtly visible, above the surface, leaving everything else that can’t be seen as “intangible.”

What lies beneath

Even if analysts were to measure more “intangible” characteristics, like a player’s leadership, grit, or mental toughness, these don’t seem to offer the same numerical accessibility as traditional performance metrics, nor do they seem relatable to future performance. However, with carefully designed tools, psychometrics can not only measure these “intangible” characteristics, but can help predict future performance in the same way as traditional analytics. Ideally, psychometrics from players and teams can help complement the performance analytics that are now readily being used.

In fact, measurement of the human mind and behavior isn’t anything new: over 100 years of psychological research has shown that the human psyche is quantifiable in the same way that previous performance is quantifiable. Psychologists have measured and quantified aggression across different cultures[1], charismatic leadership in managers[2], intrinsic motivation in children[3], and team cohesion within collegiate and recreational sports teams[4]. What’s more, these numbers can even fit nicely into the same models, projections, and predictions that have been used with traditional analytics. Yet despite the depth and breadth of this research, professional sports teams have been slow to tap into this area of study, which pundits pooh-pooh as “intangibles” and which remains unseen and unrecognized by professional sport team brass.

You won’t know unless you try

If the results of these measurements help to win more games, what do teams have to lose? Teams should not fear the minuscule amount of time that their players would spend filling out a carefully designed survey if it means understanding more about them – and, ultimately, understanding more about their team. Teams should not fear the analysis of dugout, sideline, team bus, or hotel conversations between players, all of which include rich amounts of data that can help to explain the relationships between players. Teams should not fear the measurement of a player’s comments, quotes, tweets, or posts; their spoken or written words might reveal hidden emotions or intentions. The analytics movement is far from over, and if teams are looking for more numerical insights, they need look no further than psychometrics.

 

[1] Ramirez, J.M., Fujihara, T., & Van Goozen, S. (2001). Cultural and Gender Differences in Anger and Aggression: A comparison between Japanese, Dutch, and Spanish students. Journal of Social Psychology, 141, 119-121.

[2] Conger, J.A., Kanungo, R.N., & Menon, S.T. (2000). Charismatic leadership and follower effects. Journal of Organizational Behavior, 21, 747-767.

[3] Marinak, B.A., & Gambrell, L.B. (2008). Intrinsic motivation and rewards: What sustains young children’s engagement with text? Literacy Research and Instruction, 47(1), 9-26.

[4] Carron, A.V., Colman, M.M., Wheeler, J., & Stevens, D. (2002). Cohesion and performance in sport: A meta-analysis. Journal of Sport and Exercise Psychology, 24, 168-188.

 


Does the Home Run Derby Affect Batted Ball Distribution?

Last week on RotoGraphs’ The Sleeper and the Bust podcast, Eno and Paul briefly discussed the possibility that Todd Frazier’s second-half swoon in 2014 and again here in 2015 might have something to do with his participation in the Home Run Derby. While debunking the Derby Curse has been a popular topic of many data-driven pieces in recent years, research has largely focused on outcomes, such as changes in first- and second-half OPS and HR% for participants. Eno considered that the effects of the Derby might reveal themselves in other, more subtle manifestations like batted ball data. Looks like he was onto something.

Most of the research on the subject that I’ve read takes a binary approach to participation – comparing splits of those who participated to those of players who didn’t. However, the Derby Curse’s narrative is that dozens of max-effort and mostly pull-side swings ruin a player’s 2nd half approach at the plate. So why would Bret Boone’s 2003 zero-homer first round exit lead to a 6% decrease in HR/FB rate in the 2nd half? After all, his *cough* economical Derby performance required he take only the minimum number of swings possible. Could it be plausible that changes in batted ball distribution are correlated with Derby performance rather than mere participation?

To find out, I exported the 1st and 2nd half Batted Ball data from the FanGraphs leaderboards for all Derby participants dating back to 2002, the earliest that batted ball data is available. I then added a column for home runs hit by each participant and regressed changes in batted ball rates for each BIP type against the number of home runs hit in each Derby performance.
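As a rough illustration of the regression step described above, here is a minimal sketch of a simple least-squares fit of a change in a batted ball rate against Derby home runs. The function name `simple_ols` and the five data rows are made up for illustration and are not the FanGraphs export used in this article; the sketch also stops at the slope, intercept, and R2 and does not compute a p-value.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical rows (NOT real data): Derby HR hit, and the change in
# Oppo% from the first half to the second half, in percentage points.
derby_hr = [0, 5, 10, 15, 20]
delta_oppo = [1.0, 0.0, -1.0, -1.0, -2.0]

slope, intercept, r2 = simple_ols(derby_hr, delta_oppo)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

The same function could be run once per batted ball type (Pull%, Cent%, Oppo%, Soft%, Med%, Hard%), which is one plausible reading of "regressed changes in batted ball rates for each BIP type."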

In doing so I found three relationships at or near statistical significance: ΔOppo%, ΔMed%, and ΔHard%, with the first two negatively correlated with HR hit and the last positively correlated. (ΔHard%, at p = 0.054, falls just short of the conventional 0.05 threshold.)

          Coeff      R2       p-value
ΔOppo%    -0.10531   0.06042  0.01149
ΔMed%     -0.11301   0.04041  0.03977
ΔHard%     0.10106   0.03567  0.05365

As one might expect when running only simple regressions, the R2 values are low, indicating that other factors explain the majority of the variance. And I’m not sure that even a great performance at the Derby, one that requires those repeated max-effort and mostly pull-side swings, has that significant an effect on rest-of-season batted ball data. That said, a participant who hit 20 HR could expect on average to see a decrease of about 2 percentage points in both Oppo% and Med% and an increase of about 2 percentage points in Hard%.
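The 20-homer figure follows directly from multiplying each slope in the table by 20. A quick sketch of that arithmetic is below; it assumes the intercepts (which are not reported here) can be ignored, so it shows only the change attributable to the home run total.

```python
# Slopes from the regression table, in percentage points per Derby HR.
coefficients = {"Oppo%": -0.10531, "Med%": -0.11301, "Hard%": 0.10106}

derby_hr = 20

# Expected change for a 20-HR Derby, ignoring the (unreported) intercepts.
expected_change = {
    stat: round(slope * derby_hr, 2) for stat, slope in coefficients.items()
}

for stat, change in expected_change.items():
    print(f"{stat}: {change:+.2f} percentage points")
```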

It’s interesting that, if anything, the data suggests that a better Derby performance correlates with an increased Hard% in the 2nd half, although it seems to come at the expense of Med%, not Soft%. Nevertheless, an increase in hard-hit balls runs contrary to the notion that success at the Derby leads to a second-half swoon.

Derby HR vs. Change in Med%

Derby HR vs. Change in Hard%

And while there’s no statistically significant increase in Pull%, it’s worth noting that the opposite hit type, Oppo%, decreases for those who do well at the Derby. Is that because players think more about pulling the ball and favor the inside pitch post-Derby or is there some temporary loss of skill in going the other way? Perhaps looking at how Pull/Center/Oppo distributions and heatmaps change in the weeks following the Derby might shed more light on that.

Derby HR vs. Change in Oppo%

So while we may not necessarily have proved or disproved the existence of a Derby Curse, we at least discovered that an exciting Derby performance is, if anything, more likely to precede an increase in the amount of hard contact a participant makes in the second half. Unfortunately for Bret Boone, this news may have come 12 years too late.


A Theory and A Challenge

I love this site. It covers the full spectrum of baseball, from classical scouting all the way to the most esoteric of baseball analysis. At times I envy the analytical abilities of our writers, as well as their access to granular data that I likely lack the technical competence to gather. Today, I would like to propose a theory, as well as a challenge to the numerous writers on this site to put the theory to the test. It is also likely that this has been proposed and answered before, in which case, point me in that direction please.

THE THEORY:

We can measure command by compiling a pitcher’s xISO and xBABIP based solely on where he locates his pitches, in the context of each hitter’s preference for location. In other words, the ability to “pitch to the corners” is only valuable if one is pitching to corners that the hitter can’t get to, which is batter-specific. An 80-command pitcher will be able to minimize the xISO of his pitches simply by pitching to “cold” areas of the hitter’s strike zone.

There are a few ways to approach this (I’m sure more than three, but I digress). The first question is what sample size to use to estimate the player’s preference within the strike zone. Evidence suggests certain players make rapid adjustments (Trout), which would indicate that a small sample size would be ideal, whereas other players exhibit strong long-term tendencies (Dozier? just a guess, not founded in data), which would indicate that a large sample size would be ideal.

The second axis would be to evaluate a player’s effective strike zone, i.e. if we looked at the hitter’s swing probabilities, what type of strike zone would we construct, given only data concerning the hitter’s propensity to swing. We could then tease out whether the pitcher is maximizing the player’s effective strike zone (pitchers only throwing balls to Vladdy Guerrero comes to mind). This analysis may be redundant, as this can probably be captured if we are able to incorporate the third axis:

What are the thresholds for considering a pitch well-located? That is, if a pitcher throws a ball way outside but the hitter swings, then it is a well-placed pitch; so at what swing probability does a ball become a well-commanded pitch?
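To make the core idea concrete, here is one possible reading of it as a toy calculation: score each pitch by the hitter’s ISO in the zone where it was thrown, then average. Everything in the sketch is hypothetical, including the zone names, the ISO values, the function name `location_xiso`, and the choice of a plain average; it ignores swing probability and the sample-size question entirely.

```python
def location_xiso(pitch_zones, hitter_iso_by_zone):
    """Average the hitter's zone-level ISO over the zones a pitcher hit."""
    values = [hitter_iso_by_zone[zone] for zone in pitch_zones]
    return sum(values) / len(values)


# Hypothetical hitter profile: ISO by strike-zone region (made-up numbers).
hitter = {"up_in": 0.250, "middle": 0.300, "low_away": 0.080}

# Hypothetical pitch locations against that hitter.
pitcher_a = ["low_away", "low_away", "low_away", "middle"]  # lives in the cold zone
pitcher_b = ["middle", "middle", "up_in", "low_away"]       # lives in the hot zones

xiso_a = location_xiso(pitcher_a, hitter)
xiso_b = location_xiso(pitcher_b, hitter)
print(f"pitcher A: {xiso_a:.3f}  pitcher B: {xiso_b:.3f}")
```

Under this toy scoring, the pitcher who works the hitter’s cold zone ends up with the lower location-based xISO, which is the behavior a command metric of this kind would be trying to reward.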

THE CHALLENGE

Test it! (or show me where this has already been fully fleshed out.) I’ve always wondered if there was a way to build up a command ERA to see if a pitcher is able to put it where hitters have to swing but don’t want to and I look forward to reading about it.


Examining Three True Outcome Percentage

Take a look at Chris Davis’s stat line in August: 11 games, 45 PA, 14 Ks, 7 BBs, 6 HRs. Nothing really jumps out; it’s pretty typical for Chris Davis. Looking deeper though, this selection of plate appearances is actually quite remarkable. 27 out of the 45, or 60% of them, ended with a strikeout, walk, or home run, known as the “three true outcomes,” in which the ball does not end up in play.
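For anyone who wants to compute this for other stat lines, TTO% is just strikeouts plus walks plus home runs, divided by plate appearances. A minimal sketch using the Davis line above (the function name `tto_pct` is only illustrative):

```python
def tto_pct(strikeouts, walks, home_runs, plate_appearances):
    """Share of plate appearances ending in a strikeout, walk, or home run."""
    return (strikeouts + walks + home_runs) / plate_appearances


# Chris Davis in August, from the stat line above: 14 K, 7 BB, 6 HR in 45 PA.
davis_august = tto_pct(strikeouts=14, walks=7, home_runs=6, plate_appearances=45)
print(f"{davis_august:.1%}")
```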

As Baseball Prospectus explains in its definition of TTO, the statistic actually gained relevance with the introduction of DIPS, FIP, and other pitching estimators that ignored the outcomes of balls in play. While still not commonly used, it’s certainly interesting to take a look at once in a while to see what players are taking luck into their own hands.

Chris Davis is actually not the most extreme three true outcome player. Despite his 60% TTO August, his season-long percentage through August 13 stands at 48.9%, good for 5th in baseball among those who have at least 300 plate appearances. The rest of the top-10 leaderboard features both good names and bad. On the good side, we have Giancarlo Stanton, the only player to feature a HR% over 8% (his is 8.5%, and he actually leads second-place Nelson Cruz by 1.4 percentage points). Other names you might associate with quality players are Bryce Harper, Joc Pederson, and George Springer, all of whom have a K% under 30% and a HR% of over 4%. The players who might not be as happy to be on this list include the aforementioned Chris Davis, Chris Carter, Steven Souza, Kris Bryant, and Colby Rasmus, who all feature a K% of 31% or higher. Mike Zunino, who comes in at 10th, sports a walk rate and home run rate of just 5.6% and 2.8%, respectively, but more than makes up for it with a 34.2% strikeout rate, second only to Souza.

Now that we’re done with the fun facts, let’s get into what it really means. TTO players are swing-for-the-fence players, those who aim to hit the ball over the wall every time they make contact. This is the cause behind their multitude of strikeouts. It also accounts for their walks, with the reasoning that pitchers are simply afraid to throw them hittable pitches.

The real question becomes “Are these TTO players valuable?” Looking at a graph comparing TTO% to wRC+ over the past 15 years, there is little correlation. If anything, it seems slightly more productive to be a TTO player, mainly because of the home runs and walks. The relationship is weak, though, as many bad players have a high TTO% and vice versa.

If we split it up into its parts, we might get a better view. League average TTO% has risen over the last decade, from 27.3% in 2005 to 30.3% this year (with a high of 30.5% in 2012).

We know the overall percentage has risen, but what’s driving it? If you’ve been following baseball, you know that the quality of pitchers has improved in recent years. Predictably, this has led to a decrease in walk rate and home run rate.

 

If two of the three components of TTO% have decreased but TTO% has still increased, the change in the third must be drastic. That is exactly the case. While BB% and HR% have fallen by approximately a combined 1 percentage point over the past 10 years, league-wide K% has risen by 4 percentage points.
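Because TTO% is the sum of K%, BB%, and HR%, the change in TTO% has to equal the sum of the changes in its parts. A quick consistency check using the rounded league figures quoted in this article:

```python
# League-average TTO% quoted above, in percent.
tto_2005 = 27.3
tto_this_year = 30.3

# Approximate component changes quoted above, in percentage points.
delta_bb_plus_hr = -1.0
delta_k = 4.0

delta_tto = round(tto_this_year - tto_2005, 1)
implied_delta_tto = delta_k + delta_bb_plus_hr

print(delta_tto, implied_delta_tto)
```

The two numbers agree, so the rise in strikeouts accounts for the entire increase in TTO% and then some.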

What this means is that nowadays, if you are a TTO player, it’s likely much of that is coming from your strikeouts. In fact, out of the top-25 TTO% players with at least 200 PAs, only Paul Goldschmidt has a K% under 20%. Does this make high TTO% players bad? As I said before, there really isn’t a correlation. You’ll see players like Bryce Harper and Mike Trout with a high TTO%, while Buster Posey has one of the lowest because of his low K%.

The reality is, there are many different kinds of players. Some have adopted this TTO mentality, but others have stayed with a more conservative contact-focused approach. Without further information, it’s difficult to say which strategy is better. As a fan of statistics, I prefer the TTO players because it’s much easier to predict their performance. I don’t think they care much about that though.

Also, if you were curious, here’s a list of the top TTO% players with 200 PAs, created using FanGraphs data through August 13.


Don’t Sleep On These Post Hypers

NL West Edition

We’ve all been there and done that: our dynasty/keeper league(s) haven’t gone as planned. Perhaps you went for it in the offseason, ditched your prospects for grizzled productive vets, and it all went south from there. No matter your story, the rebuild can be difficult in the sense of valuing the players you want. You could fall into the “shiny new toy trap” and end up with a bust or broken player (envision a Joc Pederson type in an AVG league instead of OBP). In this upcoming series, I will be highlighting players based on positions and pointing out whether I’d go for them in separate leagues (NL/AL only) or mixed.

So without further ado, here’s the first segment.


BABIP Aging Curves

At age 35, Albert Pujols is having somewhat of a resurgent season. Many wrote him off last year after he posted his second straight, for him, subpar season. This year, though, he has hit 30 home runs through 108 games with ZiPS projecting him to get to 40 on the season. But there remain two big differences between 2015 and prime Pujols. One, he is walking less, at 7.5% vs. his career average of 11.8%. And two, his BABIP is a minuscule .228, continuing a declining trend:

Pujols BABIP

It certainly makes sense that with a loss of footspeed, BABIP would decline as well. After doing a quick mental recall, I decided to look up Mo Vaughn as another power hitter who seemingly lost it overnight. And sure enough, he experienced a big BABIP decline late in his career as well:

Vaughn BABIP

He still put up a .314 BABIP in his last full season, but it was a step change from the average .365 (!!!) BABIP he put up from 25-30.

So, is this a larger trend that we should be paying attention to? Or are Pujols and Vaughn just confirmation bias? Thanks to FanGraphs’ excellently downloadable data, I expanded the dataset to include every season and every player. Grouping by age reveals:

BABIP by Age

Well, seemingly a lot of nothing. The BABIP for all 20-year-olds in that time was .301, while the BABIP for all 39-year-olds was .295. That is a decline, but with a p-value of 0.7 it is not statistically significant. So that’s disappointing for my thesis, but encouraging for all the old folks out there! Back to the drawing board.

Pujols and Vaughn were big, hulking guys. Maybe when they lost a step, it was a step that they could less afford to lose and the impact on their BABIP of a marginal slowing down was magnified. So what if we restrict the group to only power hitters? For this, I defined power hitters as players with career ISOs over .200. The results appear to support my hypothesis better:
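For readers who want to try the same grouping, here is a minimal sketch. It uses the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and the career ISO cutoff of .200 from the definition above. The field names, the sample rows, and the choice to pool hits and balls in play within each age (rather than averaging player-level BABIPs) are assumptions for illustration, not necessarily how the charts in this article were built.

```python
from collections import defaultdict


def babip_by_age(seasons, min_career_iso=0.200):
    """Pooled BABIP by age for hitters with career ISO above the cutoff."""
    totals = defaultdict(lambda: [0, 0])  # age -> [hits in play, balls in play]
    for s in seasons:
        if s["career_iso"] <= min_career_iso:
            continue  # not a power hitter under this definition
        totals[s["age"]][0] += s["h"] - s["hr"]
        totals[s["age"]][1] += s["ab"] - s["k"] - s["hr"] + s["sf"]
    return {age: hits / bip for age, (hits, bip) in sorted(totals.items())}


# Hypothetical player-seasons (made-up numbers, not FanGraphs data).
seasons = [
    {"age": 22, "career_iso": 0.230, "h": 160, "hr": 30, "ab": 550, "k": 120, "sf": 5},
    {"age": 36, "career_iso": 0.230, "h": 130, "hr": 25, "ab": 520, "k": 100, "sf": 5},
    {"age": 22, "career_iso": 0.120, "h": 150, "hr": 5, "ab": 560, "k": 70, "sf": 4},
]

power_babip = babip_by_age(seasons)
for age, value in power_babip.items():
    print(age, f"{value:.3f}")
```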

BABIP by Age, Power Hitters

This is plotted on the same scale as the previous chart so we can appreciate the relative differences. For this sample, the BABIP for power hitters declined from .313 at age 22 to .296 at age 36. Interestingly enough, power hitters had higher BABIPs earlier in their careers than the general population (including the power hitters), which then dip lower than the general population later in their careers. Apparently hitting the ball hard does have some benefits.

This time, the science backs up the hypothesis! My engineering professors would be so proud. With a p-value of 0.0165, the difference in BABIP between a 36 year old power hitter and a 22 year old power hitter is statistically significant. Pujols and Vaughn were indeed the victims of a real trend.
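The test behind that p-value is not spelled out here, so as an illustration only, the sketch below shows one common way to compare two BABIPs: a two-sided, two-proportion z-test on hits and balls in play. The ball-in-play counts are made up (10,000 per group), which means the resulting p-value will not match the 0.0165 reported above; it only shows the mechanics.

```python
from math import erfc, sqrt


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the normal
    return z, p_value


# Hypothetical counts chosen to give BABIPs of .313 (age 22) and .296 (age 36).
z_stat, p_val = two_proportion_z_test(hits_a=3130, n_a=10000, hits_b=2960, n_b=10000)
print(f"z={z_stat:.2f} p={p_val:.4f}")
```

With fewer balls in play in each group, the same .017 gap would produce a larger p-value, which is why the sample behind each age bucket matters as much as the gap itself.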

There could be a number of factors behind this. The first one I highlighted is the loss of footspeed. Second, it could just be that as you get older you don’t hit the ball as hard. Looking at exit velocity or ISO by age would help us judge that. Finally, age and a loss of bat speed or reflexes could change a hitter’s batted ball profile in a way that leads to fewer balls falling for hits. It would make sense that as his bat speed slowed, Pujols tried to hit more fly balls to recover some of the home run power. That is the next thing I will look at.


An Overview of Prospect Production by Minor League Plate Appearances

Prospects are the lifeblood of any baseball organization. They have the ability to provide large amounts of value for their team while making a fraction of what they could earn on the open market. This provides a huge competitive advantage for teams that have a superior player development system. Every organization has a different plan for their prospects and the purpose of this research was to attempt to determine which development plan yields the most production in a team’s cost controlled years for each group of players.

The Data

The first step in gathering the data was to find every hitter that debuted from 1995-2009. I stopped at 2009 because players who debuted by then have completed most of their cost-controlled years. I chose to start in 1995 because it gave me a big sample size and I got to avoid the strike year of 1994. Next, I omitted anyone who debuted at the age of 29 or older. I did this because players that are over 28 are usually not considered prospects and their clubs would not consider them to be future building blocks for their organization.

The final step was to eliminate anyone who did not exceed their rookie limits. I decided to omit these players, because any player that cannot amass 130 at bats in their career was probably never considered a serious prospect. If they were, at least one team would have given them more opportunities to earn a starting job.

Methodology

To determine a player’s production during his cost controlled years, I found when every player exceeded their rookie status and added the next five years of WAR to their total. If the player had previous major league experience prior to the season they lost their rookie status, I included those numbers as well. For a player’s minor league plate appearances total, I included all of their plate appearances from the start of their professional career up to and including the year they lost their rookie status.

I then broke up the data by player groups. I split up the data by players who attended college, American born players that did not attend college and international born players that did not attend college. Throughout the rest of this article, I will simply refer to these groups as college players, high school players and international players.

Next, I partitioned the data by minor league plate appearances. I decided to split the plate appearances into groups of 500. I chose this amount of plate appearances, because it is a nice proxy for a full season of production and it splits the data into a fairly even distribution of players among the groups.
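The grouping described in this section can be sketched in a few lines: assign each player to a 500-PA bucket, then take the median WAR within each combination of player group and bucket (the median is the summary statistic used in the tables). The field names, the sample players, and the exact bucket boundaries (1-500, 501-1000, and so on) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def pa_bucket(minor_league_pa, width=500):
    """Return a label such as '1001-1500' for a minor league PA total."""
    lower = ((max(minor_league_pa, 1) - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def median_war_by_group(players, width=500):
    """Median cost-controlled WAR for each (player group, PA bucket) pair."""
    groups = defaultdict(list)
    for p in players:
        groups[(p["group"], pa_bucket(p["minor_pa"], width))].append(p["war"])
    return {key: median(wars) for key, wars in groups.items()}


# Hypothetical players (made-up numbers).
players = [
    {"group": "college", "minor_pa": 1200, "war": 0.5},
    {"group": "college", "minor_pa": 1450, "war": 2.0},
    {"group": "college", "minor_pa": 1100, "war": 12.0},
    {"group": "high school", "minor_pa": 2300, "war": 3.0},
]

war_table = median_war_by_group(players)
for key, value in sorted(war_table.items()):
    print(key, value)
```

In the made-up college bucket, one 12.0 WAR player pulls the mean up to about 4.8 while the median stays at 2.0, which is the kind of right skew that makes the median the safer summary.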

Overall Performance

I’ll start by giving a simple overview of total player production over their cost controlled years. The table below shows the median WAR for each grouping. I decided to use median instead of average throughout this article, because the WAR measurement is right skewed instead of normally distributed.

Median WAR for All Players

College Observations

As you can see in the table above, college players need the fewest plate appearances to produce a high level of WAR, but there is a sharp decline in production when a college player amasses over 2500 plate appearances. It makes sense that this player group is the quickest to develop, because they have had several more years of amateur competition to help hone their skills for professional baseball. This should create a smoother transition period for these players and reduce the number of plate appearances needed to become a valued member of the major league club.

High School Observations

Unlike their college counterparts, American high school players take an extra 500 plate appearances before they reach their peak value of 15.4 WAR. However, high school players also have a wider range of success than either college or international players, and they produce more than both groups. This result may seem counter-intuitive, since it is commonly accepted that high school players are riskier prospects than college players. It is important to remember that this process does not account for all of the high school prospects that never receive an at bat in the majors. We therefore create a selection bias where we only look at the players that were good enough to make it to the majors in the first place. This means that if a high school player is good enough to make it to the majors, he’s probably going to be a productive major leaguer.

International Observations

The international player group offers the least amount of production. I believe there are several factors that contribute to this result. One of the main factors could be that many of these players have not played as much organized baseball as their counterparts. I also think that there could potentially be a language barrier issue that makes it more difficult for an organization to teach foreign players as opposed to their English speaking teammates. Of course that conclusion is just pure speculation on my part, but I believe that it is a reasonable assumption to make.

Total Player Summary

As the table above shows, the longer a prospect is in the minor leagues, the less chance they have of making an impact in the major leagues. This makes sense, because if a prospect is outperforming everyone in the minor leagues, they will be called up much sooner to help the major league club than everyone else. This leads me to believe that this table may not be the most informative for every minor leaguer. Perhaps, if we segment the data between Baseball America’s top 100 prospects and every other prospect, we will get a more accurate depiction of minor league development. It is essential to remember that the more we split the data, the less accurate our individual values may be. Therefore, we should not take the numerical value of WAR for each grouping too seriously. It is more important to take an overall view of the values in the tables below before drawing any conclusions about player development.

Median WAR for Top 100 Prospects

Top 100 Prospects Summary

Yet again, we see that college players develop the quickest and that high school players take a little longer to develop. College players also have a quick drop in production after 1000 plate appearances, but they still yield the highest production of the three groups. International prospects are a bit of a mystery here. There does not seem to be a pattern in their production. I assume this is because there are major differences in baseball development between South American, Japanese, and Canadian prospects, along with those from any other nation you can think of. In the future I may revisit this issue, but for now I’ll have to make do with what I have.

Median WAR for Non-Top 100 Prospects

Non-Top 100 Prospects Summary

As expected, we see a dramatic drop in overall WAR across the board. This means that Baseball America is usually correct when identifying the most impactful future major league players. Kudos to you, Baseball America. We also observe that these groups of players develop a bit more slowly than their more heralded counterparts. These college players continue to peak early, but they are still 500 plate appearances behind the top prospects in development. High school players now take even longer to develop, with a peak of 2.8 WAR in the 2001-2500 plate appearances group as opposed to 15.4 WAR in the 1001-1500 plate appearances group for the top high school prospects. International players are much more consistent in this table than the previous one. Unfortunately, they also have the worst total median WAR of 0.1.

Conclusions

So let’s do a quick recap. Usually the less time a player spends in the minors, the more productive they will be in the majors. High school prospects offer the most production, while international prospects offer the least production and college prospects fall somewhere in between. We also observed that college prospects develop the quickest, high school prospects develop a little slower and international prospects are a bit of a mixed bag. I attributed this to simply combining all foreign-born players into one group instead of splitting them by nation or continent. I hope this article has been informative and that it provides some guidance on when teams should consider calling up their most prized assets.