
Ichiro Might Have Been Able to Be a Power Hitter

Earlier this month, Eno Sarris posted an article called “Could Ichiro Have Been a Power Hitter?,” which began with a launch angle and exit velocity analysis of Ichiro himself, and developed into a wider examination which led to the interesting proposition that “players may have their own ideal launch angles based on where their own exit velocity peaks.”  In this article, I’ll look at a larger sample of players whose fly-ball rates increased from 2015 to 2016 and see if their peak exit velocity range changed or stayed constant.  First I’ll re-examine Elvis Andrus, then I’ll look at Jake Lamb, Xander Bogaerts and Salvador Perez.

Elvis Andrus

As mentioned by Eno, Andrus’ average launch angle went from 8.1 in 2015 to 8.6 in 2016, but his fly-ball rate actually decreased.  It seems like he started the change in 2015, but was only able to translate it into results (a 112 wRC+) in 2016.  Regardless, let’s look at the data again, and see what we can find.

Instead of just qualitatively looking at the distribution and giving an approximate range of maximum exit velocity, I split the data set into launch angle buckets, and found the bucket with the highest median exit velocity.  For example, if I set the bucket size at 5 degrees and applied it to Elvis Andrus in 2015, I got a range (-2°, 3°) (I’ll omit the degree symbol from now on).  If I set the size at 10 degrees, I got a range (-2, 8).  For the rest of the article, I’ll keep it set at a range of 5 degrees.
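The bucketing procedure described above can be sketched in Python roughly as follows. This is only an illustration, not the code used for the article: the function name `peak_ev_range`, the one-degree sliding step, the half-open window, the `min_count` cutoff, and the use of the mean to break ties between equal medians are all assumptions.

```python
import math
from statistics import mean, median


def peak_ev_range(batted_balls, width=5, min_count=10):
    """Return the (low, high) launch-angle window with the highest median exit velocity.

    batted_balls: iterable of (launch_angle_deg, exit_velocity_mph) pairs.
    width: window size in degrees (5 in the article).
    min_count: assumed minimum number of batted balls a window needs to count.
    Returns None if no window has enough batted balls.
    """
    balls = [(float(angle), float(ev)) for angle, ev in batted_balls]
    if not balls:
        return None

    first = math.floor(min(angle for angle, _ in balls))
    last = math.ceil(max(angle for angle, _ in balls))

    best_key = None  # (median EV, mean EV); the mean only breaks ties (assumption)
    best_low = None
    for low in range(first, last + 1):
        # Assumed half-open window: low <= launch angle < low + width.
        evs = [ev for angle, ev in balls if low <= angle < low + width]
        if len(evs) < min_count:
            continue
        key = (median(evs), mean(evs))
        if best_key is None or key > best_key:
            best_key, best_low = key, low

    if best_low is None:
        return None
    return (best_low, best_low + width)
```

Calling `peak_ev_range(pairs, width=10)` on the same data would give the 10-degree version of the range. A `min_count` cutoff is one way to keep nearly empty windows at the extremes from winning on one or two batted balls.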

The peak range for Andrus’ 2016 was (-3, 2).

Using the method outlined, the peak range for 2015 was (-2, 3), and for 2016 it was (-3, 2), so Andrus’ peak exit velocity range did not change much from 2015 to 2016, just as Eno pointed out, and as we can see with the two years overlaid.

Jake Lamb

Comparing 2015 and 2016, Jake Lamb raised his average exit velocity from 89.7 to 91.3 MPH, and his fly-ball rate from 32.4% to 36.7%.  His adjustments were chronicled by August Fagerstrom during his breakout (http://www.fangraphs.com/blogs/jake-lambs-revamped-swing-made-him-an-all-star-snub/).

The peak 5 degree range for Jake Lamb’s 2015 was (3, 8).

The peak 5 degree range for Lamb’s 2016 was (15, 20)!

Unlike Andrus, Jake Lamb’s peak exit velocity range increased along with his launch angle distribution!  This seems to be the kind of effective swing change that players attempting to join the fly-ball revolution strive for.  Lamb managed to revamp his swing to not only elevate the ball more, but to hit the ball harder at high launch angles, and actually increase the angle at which he hit the ball the hardest.  However, as the next two cases show, this is far from a guaranteed outcome.

Salvador Perez

Perez’s peak 2015 range: (9, 14).

Perez’s peak 2016 range: (0, 5).

From 2015 to 2016, Perez increased his fly-ball rate from 37.4% to 47.1%, and increased his average exit velocity from 87.3 to 88.8 miles per hour.  He also increased his average launch angle from 13.7° to 19.1°.  But curiously, his peak exit velocity range actually went down from (9, 14) to (0, 5)!  When I saw this, I thought I’d have to change my methods, because it didn’t make sense to me at first.  But if you look at Perez’s exit velocity vs. launch angle graphs for 2015 and 2016, these ranges actually seem to qualitatively fit.  Somehow, the Royals backstop managed to hit the ball harder and higher, but become more effective at lower launch angles.  This could be a rising tide lifts all ships situation, whereby his swing adjustments let him hit tough low pitches hard at lower angles, or it could just be a sample size issue.  By splitting the data set into buckets, the sample size gets dangerously small, and prone to strange results.  But I think the results fit the picture, and either Sal Perez needed to hit more balls for us to get reliable results, or he just had a strange batted-ball distribution.  We have a similar, more extreme situation with Xander Bogaerts next.

Xander Bogaerts

Bogaerts’ peak 2015 range: (5, 10).

Bogaerts’ peak 2016 range: (-6, -1).

Bogaerts, like the other three players here, hit the ball harder in 2016 than in 2015.  He raised his fly-ball rate and his average launch angle, and was rewarded with a 113 wRC+, a slight improvement on his 109 wRC+ from 2015.  But his peak exit velocity range for 2016 was, like Perez’s, lower than in 2015.  His plots suggest that he hit his ground balls harder in 2016, while the exit velocity of his line drives and fly balls changed less.  I’m not sure what else to say about Xander, other than that he’s kind of a weird player, as already noted by Dave Cameron (http://www.fangraphs.com/blogs/xander-bogaerts-is-a-very-weird-good-player/).

Summary

The following table summarizes the findings for each player.

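| Player | Avg EV (mph) 2015 | Avg EV (mph) 2016 | Fly Ball % 2015 | Fly Ball % 2016 | Avg Launch Angle 2015 | Avg Launch Angle 2016 | Peak EV range 2015 | Peak EV range 2016 | wRC+ 2015 | wRC+ 2016 |
|---|---|---|---|---|---|---|---|---|---|---|
| Elvis Andrus | 85.2 | 86.9 | 31.8% | 28.5% | 8.1 | 8.4 | (-2, 3) | (-3, 2) | 78 | 112 |
| Jake Lamb | 89.7 | 91.3 | 32.4% | 36.7% | 11.4 | 10.4 | (3, 8) | (15, 20) | 91 | 114 |
| Salvador Perez | 87.3 | 88.8 | 37.4% | 47.1% | 13.7 | 19.1 | (9, 14) | (0, 5) | 86 | 88 |
| Xander Bogaerts | 87.6 | 88.8 | 25.8% | 34.9% | 6.6 | 11.3 | (5, 10) | (-6, -1) | 109 | 113 |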

It seems like Andrus improved by simply hitting the ball harder and staying within his peak exit velocity range of launch angles (which fits Eno’s hypothesis), whereas Jake Lamb improved by hitting the ball harder, raising his average launch angle, and shifting his peak exit velocity range (which runs contrary to Eno’s hypothesis).  Perez and Bogaerts didn’t really improve, and their Statcast data yielded some strange results, which suggests that this method is far from foolproof, and that there may have been better choices of players to investigate.

Many thanks to Eno for the inspiration for this article, and to Baseball Savant for all of the Statcast data.


A Model of Streakiness Using Markov Chains

In the modern MLB, the record for the longest losing streak sits at 23 games, set by the 1961 Philadelphia Phillies, while the longest winning streak sits at 21 games, set by the 1935 Chicago Cubs.  In recent memory, the 2002 Oakland Athletics come to mind, with their Moneyball-spurred 20-gamer, taking them from 68-51 to 88-51 and first in their division.  Winning streaks captivate a fan base, and attract league-wide attention, but little is understood about their nature.  How much luck is involved?  Are certain teams or players more inclined to be streaky?  Are teams really more likely to win their next game if they’ve already won a few in a row?  In this piece, I’ll outline a simple model for what legitimate team-level streakiness might look like, and see if any interesting behaviour arises.  I was able to do this after reading the section on Markov Chains in Linear Algebra by Friedberg, Insel and Spence.

The Model

This model only requires two inputs: the probability of a team winning a game given that they won the previous game (hereafter P(W|W)), and the probability of a team losing a game given that they lost the previous game (hereafter P(L|L)).  Admittedly, this assumes ballplayers have very short memories.  The first thing we need to generate is what’s called a transition matrix:

$$A = \begin{pmatrix} P(W|W) & P(W|L) \\ P(L|W) & P(L|L) \end{pmatrix}$$

The first row contains the probabilities that a team will win a game based on what happened in the previous game, and the second row contains the probabilities of losing.  Notice that the entries of each column sum to 1, so we can rewrite this as

$$A = \begin{pmatrix} P(W|W) & 1 - P(L|L) \\ 1 - P(W|W) & P(L|L) \end{pmatrix}$$

Without going into too much detail, all we need to do is multiply matrix A with itself a lot, and find the limit as we do this infinitely many times.  This will give us another matrix which will contain two identical columns, each of which will correspond to the long-term probabilities of winning given a team’s P(W|W) and P(L|L) values.

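For example, if our team has P(W|W) = 0.6 and P(L|L) = 0.5, we’ll have

$$A = \begin{pmatrix} 0.6 & 0.5 \\ 0.4 & 0.5 \end{pmatrix},$$

and the limit of A^m as m goes to infinity is

$$\lim_{m \to \infty} A^m \approx \begin{pmatrix} 0.556 & 0.556 \\ 0.444 & 0.444 \end{pmatrix}.$$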
So our long-term probability of winning will be around 0.56.  Over the course of a full season, then, this team would expect to win around 90 games.
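The "multiply matrix A with itself a lot" step can be checked numerically with a short Python sketch. The helper names (`mat_mul`, `transition_matrix`, `long_run_matrix`) and the choice of 200 multiplications are assumptions made for illustration. The approach also assumes the powers actually settle down, which they do not in the edge cases where P(W|W) and P(L|L) are both 0 or both 1.

```python
def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def transition_matrix(p_ww, p_ll):
    """Build A with columns 'previous game was a win' and 'previous game was a loss'."""
    return [
        [p_ww, 1.0 - p_ll],   # probability of winning the next game
        [1.0 - p_ww, p_ll],   # probability of losing the next game
    ]


def long_run_matrix(p_ww, p_ll, steps=200):
    """Approximate the limit of A^m by multiplying A together `steps` times."""
    a = transition_matrix(p_ww, p_ll)
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix
    for _ in range(steps):
        result = mat_mul(a, result)
    return result


limit = long_run_matrix(0.6, 0.5)
p_win = limit[0][0]            # top row holds the long-term P(W)
expected_wins = 162 * p_win    # expected wins over a full season
```

Both columns of `limit` should agree to many decimal places, which is the "two identical columns" behaviour described above.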

Now we can examine various cases.  It may not be surprising to find that if we have P(W|W) + P(W|L)  = 1, we’ll have a long-term probability P(W) = P(L) = 0.5.  That is, no matter how streaky a team is, if their probabilities of winning after a win and after a loss sum to 1, their expected win total over a 162-game season is 81.  But what if we look at a given long-term probability P(W), and see what conditional probabilities P(W|W) and P(L|L) give us P(W)?  In the table below, pay special attention to the boxes with P(W) values of 0.5, 0.667 (our incredible team) and 0.333 (our really really bad team).

| P(L\|L) \ P(W\|W) | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.500 | 0.526 | 0.556 | 0.588 | 0.625 | 0.667 | 0.714 | 0.769 | 0.833 | 0.909 |
| 0.1 | 0.474 | 0.500 | 0.529 | 0.562 | 0.600 |  |  | 0.750 | 0.818 | 0.900 |
| 0.2 | 0.444 | 0.471 | 0.500 | 0.533 | 0.571 | 0.615 | 0.667 | 0.727 | 0.800 | 0.889 |
| 0.3 | 0.412 | 0.438 | 0.467 | 0.500 | 0.538 | 0.583 | 0.636 | 0.700 | 0.778 | 0.875 |
| 0.4 | 0.375 | 0.400 | 0.429 |  | 0.500 |  | 0.600 | 0.667 | 0.750 | 0.857 |
| 0.5 | 0.333 | 0.357 | 0.385 | 0.417 | 0.455 | 0.500 | 0.556 | 0.625 | 0.714 | 0.833 |
| 0.6 | 0.286 | 0.308 | 0.333 | 0.364 | 0.400 | 0.444 | 0.500 | 0.571 | 0.667 | 0.800 |
| 0.7 | 0.231 | 0.250 | 0.273 | 0.300 | 0.333 | 0.375 | 0.429 | 0.500 | 0.600 | 0.750 |
| 0.8 | 0.167 | 0.182 | 0.200 | 0.222 | 0.250 | 0.286 | 0.333 | 0.400 | 0.500 | 0.667 |
| 0.9 | 0.091 | 0.100 | 0.111 | 0.125 | 0.143 | 0.167 | 0.200 | 0.250 | 0.333 | 0.500 |

(Pardon the gaps in the table — my code had a bug that made it output zeros for those parameters, and I didn’t feel like the specific numbers were integral to this article so I didn’t calculate them manually.)
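For anyone who wants the full grid, the long-term P(W) of this two-state model also has a closed form: solving P(W) = P(W|W)·P(W) + (1 − P(L|L))·(1 − P(W)) gives P(W) = (1 − P(L|L)) / (2 − P(W|W) − P(L|L)). The sketch below is not the code that produced the table above; the function name `long_run_win_prob` and the use of exact fractions (to sidestep floating-point surprises) are assumptions.

```python
from fractions import Fraction


def long_run_win_prob(p_ww, p_ll):
    """Long-term P(W) = (1 - P(L|L)) / (2 - P(W|W) - P(L|L)), computed exactly."""
    p_ww, p_ll = Fraction(p_ww), Fraction(p_ll)
    denom = 2 - p_ww - p_ll
    if denom == 0:
        # P(W|W) = P(L|L) = 1: the team repeats its first result forever.
        raise ValueError("no unique long-term P(W) when both inputs equal 1")
    return (1 - p_ll) / denom


# Same grid as the table: rows are P(L|L), columns are P(W|W).
grid = [Fraction(i, 10) for i in range(10)]
table = {
    (p_ll, p_ww): long_run_win_prob(p_ww, p_ll)
    for p_ll in grid
    for p_ww in grid
}

for p_ll in grid:
    row = " ".join(f"{float(table[(p_ll, p_ww)]):.3f}" for p_ww in grid)
    print(f"{float(p_ll):.1f} {row}")
```

Under this formula, the four blank cells would come out to roughly 0.643 and 0.692 (P(L|L) = 0.1 with P(W|W) = 0.5 and 0.6) and 0.462 and 0.545 (P(L|L) = 0.4 with P(W|W) = 0.3 and 0.5).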

For P(W) = 0.5, we notice a straight line down the diagonal – which makes sense, given that we know P(W|W) + P(W|L) = 1 for these entries.  For P(W) = 0.667 and P(W) = 0.333, we have the following pairs of P(W|W) and P(L|L):

P(W) = 0.667 — (P(W|W), P(L|L)) = (0.5, 0) or (0.6, 0.2) or (0.7, 0.4) or (0.8, 0.6) or (0.9, 0.8)

P(W) = 0.333 — (P(W|W), P(L|L)) = (0, 0.5) or (0.2, 0.6) or (0.4, 0.7) or (0.6, 0.8) or (0.8, 0.9)

So our two-thirds winning team could just never lose two games in a row and play at a .500 clip in games following a win.  Or they could lose a full 80% of their games after a loss, but be just a little bit better at 90% in games after they win!  How could a team that never loses two games in a row be the same as a team that is so prone to prolonged losing streaks?  It’s because we selected this team for its high winning percentage, so even though P(W|W) and P(W|L) actually sum to less in this case (1.1 instead of 1.5), the fact that this team wins more games than it loses means it’ll have more opportunities to go on winning streaks than losing streaks.

Likewise, our losing team could never win two games in a row but play at .500 in games following a loss, or they could be the streaky team who wins 80% of games following a win but loses 90% of games following a loss.
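One way to see why such different teams share a winning percentage is to look at average streak lengths. In this model a winning streak continues with probability P(W|W) after each win, so its expected length is 1 / (1 − P(W|W)), and likewise 1 / (1 − P(L|L)) for losing streaks. Since winning and losing streaks alternate, the long-term P(W) is just the share of games spent in winning streaks. The following is a small sketch of that bookkeeping; the function name `expected_streaks` is made up for this illustration.

```python
from fractions import Fraction


def expected_streaks(p_ww, p_ll):
    """Return (mean winning streak, mean losing streak, long-term P(W))."""
    p_ww, p_ll = Fraction(p_ww), Fraction(p_ll)
    win_streak = 1 / (1 - p_ww)    # mean length of a run of wins
    loss_streak = 1 / (1 - p_ll)   # mean length of a run of losses
    p_win = win_streak / (win_streak + loss_streak)
    return win_streak, loss_streak, p_win


steady_team = expected_streaks("0.5", "0")     # never loses two in a row
streaky_team = expected_streaks("0.9", "0.8")  # prone to long runs both ways
```

The steady team averages two-game winning streaks and one-game losing streaks, while the streaky team averages ten-game winning streaks and five-game losing streaks. Both spend two-thirds of their games winning.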

These scenarios are illustrated below.  The cyan dots correspond to the following pairs of points (P(L|L), P(W)) from top left in a clockwise direction: (0,0.667), (0.8, 0.667), (0.9, 0.333), (0.5, 0.333).  These are exactly the scenarios discussed above.

 

[Figure: long-term P(W) plotted against P(L|L), with the four scenarios above marked as cyan dots.]

 

These observations indicate a more general property, which will sound trivial once we put it in everyday baseball terms.  If your long-term P(W) is above 0.5, and you have to choose between two ways of improving your club – you can improve your performance after wins, or you can improve your performance after losses – you should choose to improve your performance after wins.  And if your long-term P(W) is below 0.5, you should choose to improve your performance after losses (up until you become an above-average team through your improvements, of course).  In other words, if you expect to win 90 games (and hence lose 72), you want to improve your performance in the 89 or 90 games following your wins rather than in the 71 or 72 games following losses.
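This property can be made precise with a short calculation. The shorthand a = P(W|W) and c = P(W|L) = 1 − P(L|L) is introduced here only for readability; the starting formula is the long-term P(W) of the two-state model.

$$P(W) = \frac{c}{1 - a + c}, \qquad \frac{\partial P(W)}{\partial a} = \frac{c}{(1 - a + c)^2}, \qquad \frac{\partial P(W)}{\partial c} = \frac{1 - a}{(1 - a + c)^2}$$

$$\frac{\partial P(W) / \partial a}{\partial P(W) / \partial c} = \frac{c}{1 - a} = \frac{P(W)}{P(L)}$$

So a small improvement after wins is worth more than the same improvement after losses exactly when P(W) > P(L), and the ratio of the two benefits is the team's ratio of wins to losses. For the 90-win team, that ratio is 90/72 = 1.25.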

Conclusions, Future Steps

I don’t have anything groundbreaking to say about this experiment.  It’s obviously an extremely simplified model of what real streakiness would look like – in the real world, the talent of your starting pitcher matters, your performance in more than just the immediately preceding game matters, as well as numerous other factors that I didn’t account for.  However, I feel comfortable making one tentative conclusion: that the importance of the ace of a playoff contender being a “streak stopper” (i.e. one who can stop losing streaks) may be overstated, simply because the marginal benefit from such a trait is smaller than the marginal benefit from being a “streak continuer.”  I have never heard of an ace referred to as a “streak continuer,” even though this model indicates that on a good team, this is more beneficial than being a “streak stopper”.

I don’t think it’s worth examining historical win-loss data to compare with this model, as this was not intended to be an accurate representation of what actually happens; rather, it was meant as a fun mathematical exploration of Markov chains applied to baseball.

Thank you for reading!  Questions, comments, and criticisms are welcome.