Archive for Research

Who Are the Top “Pound-for-Pound” Power Hitters?

We all know that Aaron Judge hit for more power this year than Jose Altuve. But, whose power was more impressive? Aaron Judge, who is 6’7 and 282 pounds, has a considerable size advantage over Jose Altuve, at 5’6 and 164 pounds. Perhaps Altuve is actually a better power hitter for his size than is Judge. Let’s expand this idea to the entire league: who is the pound-for-pound top power hitter?

Role of Height and Weight in Batter Power

Using simultaneous linear regression, I estimated the effects of two physical characteristics — height and weight — on batter power. Measures of batter height and weight were taken from MLB.com. For batter power, I used Isolated Power.

As shown in the figures below, weight and height have positive relationships with power.

Height and Weight

Weight has a stronger relationship with power than height, though it is difficult to see in the figures alone. (It’s also not intuitively clear exactly how height affects power.) In subsequent analyses, I consider both weight and height.

Who are the top pound-for-pound power hitters?

Using the model, one can predict a batter’s expected power (based on height and weight) and compare it to their actual power.
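A minimal sketch of that comparison, under stated assumptions: the five rows below are made-up placeholder values rather than real player data, the function name `pound_for_pound` is hypothetical, and an ordinary least-squares fit stands in for whatever regression software was actually used. It fits ISO on height and weight, then ranks hitters by the residual (actual power minus expected power).

```python
import numpy as np

# Placeholder data (NOT real measurements): height in inches, weight in pounds, ISO.
names = ["Hitter A", "Hitter B", "Hitter C", "Hitter D", "Hitter E"]
height = np.array([79.0, 66.0, 71.0, 74.0, 72.0])
weight = np.array([282.0, 164.0, 195.0, 220.0, 205.0])
iso = np.array([0.340, 0.200, 0.280, 0.180, 0.150])

def pound_for_pound(height, weight, iso):
    """Fit ISO ~ intercept + height + weight by OLS; return (expected ISO, residual)."""
    X = np.column_stack([np.ones_like(height), height, weight])
    coef, *_ = np.linalg.lstsq(X, iso, rcond=None)
    expected = X @ coef
    return expected, iso - expected

expected, residual = pound_for_pound(height, weight, iso)

# Largest positive residual first: the hitter who most outperforms his size.
order = np.argsort(-residual)
ranking = [names[i] for i in order]
```

A positive residual means the hitter showed more power than his height and weight would predict, which is the sense of "pound-for-pound" used here.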

Who are the top pound-for-pound power hitters? See below for the results.

Top 10 hitters

Khris Davis, formerly the #9 top power hitter, emerges as the #1 pound-for-pound power hitter in baseball. In 2017, Davis, who is three inches and over 30 pounds below average for a Major League hitter, hit a remarkable 43 home runs, with an ISO of .281. Nolan Arenado and Josh Donaldson made similar jumps in the rankings, from #7 to #2, and #10 to #3, respectively.

Notable power hitters have fallen slightly on this list, though they remain in the top 10. For example, Aaron Judge fell from the top spot to #8, while Giancarlo Stanton dropped three spots (#2 to #5). It is important to note here that these power hitters are still impressive – they continue to hold spots in the top 10, regardless of their size.

Biggest improvements in rankings

Which players showed the most improvement in the list? Below are results from the top 50 players on the list.

Top 3 improved rank players

Andrew Benintendi showed the largest increase in rankings (from 184 to 43). Jose Altuve nearly broke into the top 10, jumping from 132 to 12. Lastly, Eddie Rosario improved 68 spots (100 to 32). Altuve, in particular, has recently shown increases in power (from .146 to .194 to .202 in 2015-2017); as a result, his pound-for-pound status may continually increase in upcoming years.

Who was more impressive?

To reference the initial question in this article: was Jose Altuve’s or Aaron Judge’s power more impressive? Results from the above analyses were compiled from 2015 to 2017 seasons. To compare Altuve and Judge’s recent season, take a look below.

Altuve vs Judge

Aaron Judge tops Jose Altuve in the pound-for-pound hitter rankings – by a very thin margin – in 2017. Judge’s power performance exceeded expectations (as predicted by his height and weight) to a slightly higher degree than Altuve.

Full Rankings

If you want to see the full list of hitters for this dataset, including the worst pound-for-pound power hitters (poor Jason Heyward!), click here.

Analysis



Second Half Fly Ball Escalators – Part 1

The fly-ball revolution is upon us.  We all know this; it’s been happening since the second half of 2015 and has continued through 2017.  This doesn’t seem to be a fluke or blip on the radar.  Until MLB changes the ball or does something to shift favor to the pitchers, fly balls aren’t going away.  The ratings are up and there’s a great young crop of major league players who play with a ton of passion and they are embracing this revolution.

First, let’s start with the parameters I set for this statistical analysis.  It’s easier to see which hitters change their approach year to year, but I wanted to focus on players who increased their fly balls in the 2nd half of 2017.  I split the data between the 1st half and the 2nd half of 2017 with a minimum of 200 PA in each half.  I was only going to include hitters who increased their fly-ball rates by 4% or more between the 1st half and 2nd half, but that would have excluded Byron Buxton (2.4% increase) and Giancarlo Stanton (3.4%).  I want to talk about both of them, so I went a little lenient to include those two.
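The filter described above can be sketched as follows. This is an illustration only: the `HalfSplit` record, the function name, and the sample rows are hypothetical, and the sketch assumes the 4% threshold means percentage points of FB% rather than a relative change.

```python
from dataclasses import dataclass

@dataclass
class HalfSplit:
    name: str
    pa_first: int      # plate appearances, 1st half
    pa_second: int     # plate appearances, 2nd half
    fb_first: float    # fly-ball rate in percent, 1st half
    fb_second: float   # fly-ball rate in percent, 2nd half

def fly_ball_escalators(splits, min_pa=200, min_increase=4.0):
    """Return (name, FB% increase) for hitters who clear both thresholds."""
    selected = []
    for s in splits:
        if s.pa_first < min_pa or s.pa_second < min_pa:
            continue  # not enough playing time in one of the halves
        # Round to one decimal so float noise does not affect the cutoff.
        delta = round(s.fb_second - s.fb_first, 1)
        if delta >= min_increase:
            selected.append((s.name, delta))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

# Placeholder rows, not real 2017 splits.
sample = [
    HalfSplit("Hitter A", 300, 280, 35.0, 41.5),
    HalfSplit("Hitter B", 150, 260, 30.0, 40.0),  # fails the PA minimum
    HalfSplit("Hitter C", 250, 240, 38.0, 40.4),  # only a 2.4-point increase
]
strict = fly_ball_escalators(sample)
lenient = fly_ball_escalators(sample, min_increase=2.4)
```

Lowering `min_increase` is the "lenient" version of the cutoff: it is what lets a hitter with a 2.4-point increase into the group.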

Now that I have my crop of fly-ball escalators, I also included Infield Fly%, BABIP, HR/FB, and Hard Hit%.  I wanted to see how the increase in fly balls affected these statistics and whether or not the changes make sense or if luck played a role (I mean, it’s baseball, luck is always involved).  Keep in mind, not everyone is benefiting from hitting more fly balls.  Here’s the table of players I believe should benefit in 2018 from the increased fly balls if their approach remains the same, via Google Docs.

Eugenio Suarez

Suarez had a nice little breakout year in 2017 with a wRC+ of 117.  In the 2nd half of 2017 he significantly increased his FB% while decreasing his IFFB%.  That’s huge because, of course, infield fly balls are essentially automatic outs.  He did all that while increasing his LD% and hard hit%!  This to me looks like a conscious change for Suarez coming into 2018.  His overall numbers look pretty good in 2017 with a triple slash of .260/.367/.434 with 26 HRs (career high), and he’ll be entering his age-26 season.  All that being said, I think there’s still upside there.  Here is his slash for the 2nd half of 2017: .268/.378/.490 with a wRC+ of 126!  For reference, here are a few players with similar wRC+ in 2017: Gary Sanchez (130), Nolan Arenado (129), Domingo Santana (126), Chris Taylor! (126) (more on him later), and Brian Dozier (124).  You get the idea.  But can Suarez do it for a full season?  If he does, we are looking at a 30-100 player in 2018 hitting 4th or 5th behind Joey Votto and Adam Duvall.  In my opinion, he’s a better hitter than Duvall and should be slotted behind Votto.

Of this group of 2nd half fly-ball surgers, Suarez is one of the more intriguing for fantasy purposes.  Suarez is and has been the starting 3rd baseman for the Reds, but now that Zack Cozart is gone, he’s also one of only two players on the roster who have logged significant time at SS within the last three seasons (the other being Jose Peraza).  Nick Senzel, who finished the season in AAA, is knocking on the door and 3rd base is his main position, but they are giving him reps at 2nd (which should tell you they like Suarez at 3rd).  This creates a logjam at 2nd with Scooter Gennett but still doesn’t solve the shallow SS position.  Maybe the Reds address it or maybe Suarez plays some shortstop and on those days, Senzel moves to 3rd.  If this happens and Suarez gains SS eligibility, he could be a top 8-10 shortstop right behind Corey Seager.

Manuel Margot

Coming into 2017, Margot was a consensus top 50 prospect and was ranked 24th overall by Baseball America.  Eric Longenhagen of FanGraphs graded him at a 70 speed score out of a possible 80. So far, it checks out per Baseball Savant, as he ranks 8th in average sprint speed in all of baseball.  Something else you may notice on Margot’s FanGraphs page is the potential for a 55 raw power grade.  You can’t totally ignore the 40 game power grade, but these are the types of guys who have proved to benefit the most from the “juiced ball.”  Keep in mind that Margot played all of 2017 at age 22.  This kid is still learning the game and developing power.

That being said, his batted-ball profile leaves a lot to be desired.  He made a lot of soft contact and, of course, not a whole lot of hard contact.  However, based on the 1st half / 2nd half splits, he made adjustments with not only more fly balls and line drives but harder contact.  That’s a good sign, yet his BABIP dropped in the 2nd half.  Sure, a speedster like Margot can benefit from weakly-hit ground balls (part of the reason Billy Hamilton doesn’t hit below the Mendoza line), but the increase in line drives should certainly have increased his BABIP.  The point is, even with the slight improvement in wRC+ between the 1st and 2nd halves, he was still unlucky.

I expect Margot to continue to make improvements with the bat in 2018.  I don’t expect him to reach the 55 raw power grade, but he’s moving in the right direction.  I also expect him to improve on the bases and utilize his speed a little more while he’s still at his peak (as far as speed is concerned).  There’s an intriguing window with young players who possess speed and untapped raw power where the speed is still at (or near) its peak and the raw power begins to materialize.  Margot will be approaching that window in 2018 at age 23, so you need to jump in now before he’s fully reached that window and becomes a premier power/speed threat that is so rare in fantasy baseball these days.  Jump in now while his ADP is around 200 and you could be rewarded with around 15-18 HRs and 20+ steals in 2018.  His upside could be somewhere around Mookie Betts’ 2017 without the runs and RBI numbers.  Will he ever reach those heights?  I can’t say for sure, but it’s intriguing.  In keeper/dynasty leagues, he’s a great asset to have at his current value.

Logan Forsythe

Forsythe was hampered by injuries in 2017; he broke his toe in April of 2017 and only appeared in 119 games.  In those games he had 439 PA, and hit .224 with six HRs and three steals.  Woof.  Why is he a thing for fantasy baseball in 2018 at age 31?  Well, first the Dodgers traded Jose De Leon to the Rays for him last off-season and exercised his option for 2018. With Utley now gone, second base is his to keep or lose.  So playing time is there unless they sign another 2nd baseman this off-season.  On the plus side, he walked at a career high 15.7% clip and had some big at-bats in the post-season, carrying at least some momentum into 2018.

You would expect Forsythe’s numbers to improve in the second half due to the toe injury in April, and the numbers in the 2nd half look awfully good.  Yes, his line drive rate did drop by 2.8%, but the net positive on FB% + LD% is 12.6% and his hard-hit rate increased by 10.9% in the 2nd half!  That massive BABIP drop of 0.082 seems way out of whack to me.  That’s the reason he hit .201 in the 2nd half.  Now, I’m not saying he’s going to go nuts, but he also cut his SwStr% to 6.6% and his O-Swing% to a career-low 18.7%.  So there are a lot of potential positives with Forsythe in both the average and power departments, based on my research.  I expect the K% to go back down to about 20%, the BABIP to go up about .020 points, and the HR/FB% to be back in the double digits.  His value is going to depend on playing time.  If he platoons, he’s an NL-only bat.  If he doesn’t and gets, say, 550 PA, he could go something like .258/.339 with 14 HRs and seven steals, becoming a solid deep-league MI.

Jacoby Ellsbury

Over the last year or so I had left Jacoby Ellsbury for dead until this research piece.  All of his batted-ball data in the second half of 2017 point to improved results. While his 2nd half 107 wRC+ was an improvement on his 95 wRC+ in the 1st half, I’d argue he was extremely unlucky and it should have been much higher.

Let’s look at the positives: his K% dropped, BB% went up, FB% went up, IFFB% went down, and hard hit% went up.  So then why did his BABIP, HR/FB, and BA (albeit minimally) all go down?  I don’t know.  How’s that for an answer?  In my opinion, it can be chalked up to straight-up bad luck.

Since the Yankees are clearly moving in another direction, Ellsbury may not have a starting spot with Judge, Gardner, and now Hicks listed as starters, with Clint Frazier ready to be a full-time major-league starter when healthy.  The best chance for Ellsbury is to be traded where he can start.  Of course with his huge contract, that could prove to be difficult.  Hypothetically, though, if it happens, he’s good for 20+ steals; he was 22-for-25 last year so his speed is still there, and steals are becoming more and more infrequent.  For fantasy in 2018, he could be a solid 4th or 5th outfielder, going .270 and 10-20 next year.


The Problem With the Shift

The concept of “the shift” has become more widely used throughout major-league baseball. While some teams shift more than most, others are shifted against more than most. The Shift Era is still relatively new as teams dive deeper and deeper into the analytical realm to increase winning percentage. However, is using the shift actually effective?

I believe that there are certainly situations where the shift should be utilized. Players such as David Ortiz, Albert Pujols, Brian McCann, etc. generally are the style of players to shift against. Older players generally rely more on pulling the ball because they are able to generate more power. These styles of pull-only hitters are usually prime targets for shifting against. My question is, why haven’t these players adapted their swing against the shift?

When learning swing mechanics, you’re taught to square up the baseball and drive the ball where it’s pitched. When shifting, pitchers are forced to make very selective pitches to avoid batters driving the ball the other way through the shift. This is hard for pitchers because it takes away some of their effectiveness. Hitters are beginning to find ways to beat the shift and steal easy hits. If a batter is in a shift situation, they can essentially eliminate pitches towards the outside half of the plate. Knowing the pitcher’s pitch arsenal, the batter can then be selective in his approach. Depending on the count, the batter can determine the next pitch, whether it’s offspeed or a fastball. Obviously a tailing fastball in on the hands is hard not to roll over into the shift, but that’s just good pitching.

Batters are finally beginning to grasp that they can beat the shift by simply laying a bunt down the line. Or, they can create longer bat lag with their hands, letting the ball travel deeper in the zone and taking it to the opposite field. The best hitters in baseball are those who can hit to all areas of the field. Charlie Blackmon was shifted against 121 times this year; he hit .412 against the shift. Why in the world would teams shift against him 121 times? Kris Bryant was shifted against 210 times; he hit .364. Players like this who are able to adapt their swing progressions at the plate should not be shifted against this often. Teams are simply giving them easy hits, which lead to runs. The whole point of the shift is to avoid baserunners, right?

Again, there are some batters against whom shifting works. Brian McCann was shifted against 248 times and still hit .243 against the shift, which is still pretty good considering it’s towards the bottom of the league. Lucas Duda was shifted against 241 times, hitting .243; still not terrible. Again, there are situations you can get away with shifting. The only time teams should shift should be with no runners on, strict pull hitters, and with a pitcher who’s comfortable with pitching inside.

When teams shift with runners on, I believe it’s a terrible strategy. It’s considerably difficult turning a routine double play with players out of their positions. Also, it’s difficult to catch runners stealing when you have a third baseman trying to find the bag and make the tag. Players like Dustin Pedroia have taken advantage of teams using the shift with runners on to take the extra base with the third baseman out of position. Players are beginning to find holes in the shift and are taking advantage, leading to runs.

When shifting, I believe the best option is to leave the shortstop between 2nd and 3rd, the second baseman shaded up the middle towards the bag, and the third baseman moving into right field between 1st and 2nd. With the third baseman in this position, he can create the same angle to 1st as when he’s at 3rd. This way players are in more comfortable standard positions, keeping the double play a more viable option. Shifting works in certain situations, but teams need to be more careful as hitters begin to adapt their approaches and steal easy hits, using the shift against the enemy.


Cody Bellinger’s Ability to Be Great

Cody Bellinger was called up by the Dodgers to the big leagues on April 25th of this year. Coming in at only 21 years of age, Bellinger was looking to make a name for himself. Toward the beginning of the season he would split starts between left field and first base. Eventually Adrian Gonzalez would go down to injury, giving Bellinger the opportunity to be an everyday first baseman. Bellinger rose to the occasion, cementing himself in the history books, as he will be the National League Rookie of the Year. Not only will he win this award, but he helped bring his team to the World Series. Before Bellinger’s arrival, the Dodgers had won 9 of their first 20 games. The Dodgers would go on to win 104 of their 162 games.

During the course of the season, Bellinger put up incredible numbers. He played in 132 games throughout the year, driving in 97 runs, scoring 87 times, and belting an astonishing 39 home runs, finishing only behind the powerful Giancarlo Stanton (with 59). Bellinger had a respectable .267 batting average while maintaining a .352 on-base percentage and .581 slugging percentage. He was a force at the plate, putting fear into the eyes of many pitchers. Although he didn’t walk so much — only 11.7% of the time — he still managed to have a wOBA of .380, staying in the top 30 for the MLB. On average, he would draw a walk for about every two strikeouts; not the best, but still better than most players belting over 30 homers. His plate discipline was above average for power hitters throughout the season, but come postseason, this would all change.

Throughout much of the postseason, most people were reflecting on Aaron Judge’s struggles, after he had a historic season at the plate. Judge broke the record for strikeouts in a postseason, only for Bellinger to then beat this unfavorable record with 29. Through Bellinger’s 15 postseason games, he would belt three home runs, driving in nine runs and scoring 10 times while walking only three times. Most of this production came during the NLDS and NLCS. His wOBA would fall to .295, with a .219 batting average, walking 4.5% of the time, while striking out in an astounding 43.3% of his plate appearances. In fact, 17 of his 29 strikeouts came in the World Series alone. Bellinger would struggle immensely at the plate throughout the World Series, with the exceptions of Games 4 and 5.

During the series, the Astros pitching staff would focus on beating Bellinger in on the hands with curveballs falling out of the zone, and with fastballs tailing up and away. Amazingly, Bellinger during the regular season only chased pitches out of the zone 29.7% of the time. This would change immensely as the Astros pitching staff’s effective deception would often pull Bellinger’s bat out of the zone.

In Game 4, Bellinger would face Astros pitcher Charlie Morton in the top of the 5th with no outs in a 1-2 count. Bellinger’s stance is in a more upright position with his bat also in a vertical position. This makes creating torque through his hands a little more awkward, as he rolls his hands into a hitting position. When this curveball begins to spin further in on his hands, it becomes too difficult to bring his hands in further, leading to this awful swing and follow-through shown. His approach on this pitch looks as if he’s trying to hit the ball 500 feet over the right-field wall; not an optimal mindset in a 1-2 count when you know the curveball is coming. His head was nowhere near the zone; he may as well have swung with his eyes closed. This is the position we often saw Bellinger in throughout the World Series when thrown an inside curveball. However, Bellinger would use this at-bat for his next plate appearance.

Now we see later in the game Bellinger is in a 1-1 count facing Morton in the top of the 7th. He knows he’s going to see a curveball in on his hands and adjusts accordingly. His body is in a lower position with his bat in a more angled approach, with his hands staying back, anticipating curveball, looking to stay in on the ball with his hands and drive it to right field. Bellinger manages to fight this pitch off, fouling it back, showing his adjustment helped. His follow-through is also in a significantly better position, with his head staying back looking at the ball, and his body stays in a more balanced stance. This approach, showing that he’s able to make even a small adjustment to making contact with the low and in curveball, led pitchers to start targeting the outside upper half of the zone with the fastball again.

Here we see in Game 4, Bellinger faces Astros pitcher Charlie Morton with a 1-1 count and 0 outs in the top of the 5th. Bellinger’s body is not in an effective hitting position for hitting this outside fastball. His body is falling out away from the zone, his pivot foot is not providing any power, and his hands reach out from his body too far. Bellinger would acknowledge this issue and had this to say before Game 4:

“I hit every ball in BP today to the left side of the infield,” Bellinger said. “I’ve never done that before in my life. Usually I try to lift. I needed to make an adjustment and saw some results today. I’m pulling off everything. Usually in BP I just try to lift, have fun in BP. But today I tried to make an adjustment. I needed to make an adjustment, and so I decided I’m hitting every ball to left field today.”

This is exactly what Bellinger would do.

In the top of the 9th in Game 4 with a 1-0 count and no outs, Bellinger faces Astros closer Ken Giles with runners on. Bellinger has his eyes locked in on the ball as he’s seen this pitch before. He’s using his approach from batting practice earlier to drill this ball into the gap. He keeps his body in an athletic hitting position, keeping his hands in and generating all his power through his lower half, creating torque through his strong hands. We see him drive this ball into the left-center gap, keeping his eyes on the ball the whole way and maintaining a strong follow-through. Bellinger did exactly what he said he would do and helped his team win this game. He would then carry on this adjustment into Game 5, showing people why he will be this year’s NL RoY.

Although Bellinger would fall into his old habits in Games 6 and 7, his ability to recognize where the problem is and the ability he has to adjust is what makes him an effective hitter. Through this, Bellinger will only continue to become better and will continue to become one of the most feared hitters in the league this next season. At only 22 years old now, Bellinger will become the next big star in this great sport we call Baseball.


Thinking Like an MLB MVP Voter

Photo: Yi-Chin Lee/Houston Chronicle

Baseball season is coming to a close and the Baseball Writers’ Association of America (BBWAA) will soon unveil its votes for AL and NL MVP. The much-anticipated vote is consistently under the public microscope, and in recent years has drawn criticism for neglecting a clear winner *cough* Mike Trout *cough*. This being one of the closest all-around races in years, voters certainly have some tough decisions to make. This might be the first year since 2012 where it’s not wrong to pick someone other than Mike Trout for AL MVP.

Of course, wrong is subjective. The whole MVP vote is subjective. Voter guidelines are vague and leave much room for interpretation. The rules on the BBWAA website read:

There is no clear-cut definition of what Most Valuable means. It is up to the individual voter to decide who was the Most Valuable Player in each league to his team. The MVP need not come from a division winner or other playoff qualifier. The rules of the voting remain the same as they were written on the first ballot in 1931:

1.  Actual value of a player to his team, that is, strength of offense and defense.

2.  Number of games played.

3.  General character, disposition, loyalty and effort.

4.  Former winners are eligible.

5.  Members of the committee may vote for more than one member of a team.

It won’t do any good for me to saturate the web with another opinion piece on who deserves to win. It won’t change the vote, and I don’t think I could choose. My goal is rather to illustrate how BBWAA voters have interpreted these rules over time. Have modern sabermetrics driven any shifts in voter consideration? Do voters actually consider team success? Do voters unconsciously vote for players with a better second half?

I thought the best (and most entertaining) way to answer these questions would be to create a model that would act as an MVP voter bot. Let’s call the voter bot Jarvis. Jarvis is a follower.

  1. Jarvis votes with all the other voters.
  2. It detects when the other voters start changing their voting behavior.
  3. It evaluates how fast the voters are changing behavior and at what speed it should start considering specific factors more heavily.
  4. It learns by predicting the vote in subsequent years.

I created two different sides to Jarvis: one that is skilled at predicting the winners, and one that is skilled at ordering the players in the top 3 and top 5 of total votes. The name Jarvis just gives some personality to the model in the background: a combination of the fused lasso and linear programming. And it also saves me some keystrokes. If you are interested in the specifics, skip to the end, but for those of you who’ve already had enough math, I will spare you the lecture.

Jarvis needs historical data from which to learn. I concentrated on the past several decades of MVP votes, spanning 1974 to 2016 (1974 was the first year FanGraphs provided specific data splits I needed). I considered both performance stats and figures that served as a proxy for anecdotal reasons voters may value specific players (e.g., played on a playoff-bound team). For all performance-based stats, I adjusted each relative to league average (if it wasn’t already) to enable comparison across years (skip to adjustments here).  Below are some stats that appeared in the final model.

Position player specific stats: AVG, OBP, HR, R, RBI

Starting pitcher (SP) specific stats: ERA, K, WHIP, Wins (W)

Relief pitcher (RP) specific stats: ERA, K, WHIP, Saves (SV)

Other statistics for both position players and pitchers:

Wins Above Replacement (WAR) – Average of FanGraphs and Baseball Reference WAR

Clutch – FanGraphs’ measure of how well a player performs in high-leverage situations

2nd Half Production – Percent of positive FanGraphs WAR in 2nd half of season

Team Win % – Player’s team winning percentage

Playoff Berth – Player’s team reaches the postseason

Visualizing the way Jarvis considers different factors (i.e. how the model’s weights change) over time for position players reveals trends in voter behavior.

Immediately obvious is the recent dominance of WAR. As WAR becomes socialized and accepted, it seems voters are increasingly factoring WAR into their voting decisions. What I’ll call the WAR era started in 2013 with Andrew McCutchen leading the Pirates to their first winning season since the early 90s. He dominated Paul Goldschmidt in the NL race despite having 15 fewer bombs, 41 fewer RBI, and a lower SLG and OPS. While Trout got snubbed once or twice since 2013, depending on how you see it, his monstrous WAR totals in ’14 and ’16 were not overlooked.

As voters have recognized the value of WAR, they have slowly discounted R and RBI, acknowledging the somewhat circumstantial nature of the two stats. The “No Context” era from ’74 to ’88 can be characterized perfectly by the 1985 AL MVP vote. George Brett (8.3 WAR), Rickey Henderson (9.8), and Wade Boggs (9.0) were all beaten out by Don Mattingly (6.3), likely because of his gaudy 145 RBI total.

Per the voting rules, winners don’t need to come from playoff-bound teams, yet this topic always surfaces during the MVP discussion. Postseason certainly factored in when Miggy beat out Mike Trout two years in a row, starting in 2012. See that playoff-berth bump in 2012 on the graph below? Yeah, that’s Mike Trout. What the model doesn’t consider, however, are the storylines, the character, pre-season expectations: all the details that are difficult for a bot to quantify. For example, I’ve seen a couple of arguments for Paul Goldschmidt as the front-runner to win NL MVP after leading a Diamondbacks team with low expectations to the playoffs. I’ll admit, sometimes the storylines matter, and in a year with such a close NL MVP race, it could push any one player to the top.

What can I say about AVG and HR? AVG is a useless stat by itself when it comes to assessing player value, but it’s ingrained in everyone’s mind. It’s the one stat everyone knows. Hasn’t everyone used the analogy about batting .300 at least once? Home runs…they are sexy. Let’s leave it at that.  Seems like these are always on the minds of MVP voters and that is not likely to change any time soon.

I’m sure some of you are already thinking, “What about pitchers!?” Don’t worry, I haven’t forgotten, although it seems MVP voters have. Only three SP and three RP have won the MVP award since 1974, and pitchers account for only about 7.5% of all top-5 finishers. As you can see in the factor-weight graph below, their sparsity in the historical data results in little influence on the model; voter opinions don’t change often, and their raw weights tend to be lower than those of position players. Overall, it seems as though wins continue to dominate the SP discussion, along with ERA and team success. While I would expect saves to have some influence, voters tend to be swayed by recency bias and clutch performance along with WHIP and WAR.

What would an MVP article be without a prediction? Using the model geared to predict the winners, here are your 2017 MLB MVPs:

AL MVP: Jose Altuve    Runner Up: Aaron Judge

NL MVP: Joey Votto   Runner Up: Charlie Blackmon

Here are the results from the model tuned to return the best top-3 and top-5 finisher order:

It’s apparent that I adjusted rate and counting stats for league and not park effects given both Rockies place in the top 2. Certainly, if voters are sensitive to park effects, Stanton and Turner get big bumps, and Rockies players likely don’t have a chance. Larry Walker was the only Colorado player to win the MVP since their inception in 1993, but in a close 2017 race it might make the difference.

Continue reading below for the complete methodology and check out the code on GitHub.

A previous version of this article was published at sharpestats.com.


Statistical Adjustments

Note: lgStat = league (AL/NL) average for that stat, qStat = league average for qualified players, none of the adjusted stats are park adjusted

There were two different adjustments needed for position player rate stats and count stats.

Rate stats: AVG, OBP

Rate stat adjustment:  AVG+ =  AVG/lgAVG

Count stats: HR, R, RBI

Count stat adjustment:  HR Above Average =  PA*(HR/PA – lgHR/PA)

There were three different adjustments needed for starting pitcher (SP) and relief pitcher (RP) rate stats and count stats.

Rate stats: ERA, WHIP

Rate stat adjustment:  ERA+ =  ERA/lgERA  

Count stats I: K

Count stat I adjustment:  K Above Average =  IP*(K/IP – lgK/IP)

Count stats II: Wins (W), Saves (SV)

Count stat II adjustment:  Wins Above Average = GS*(W/GS – qW/GS)
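The adjustments above are all instances of two small formulas, which can be sketched as follows. The function names and the sample numbers are hypothetical; only the formulas themselves come from the text. Note that the rate adjustment follows the ratio as written here (stat divided by league stat), which is not the same convention as the published ERA+ statistic.

```python
def rate_plus(stat, lg_stat):
    """Rate stat relative to league average, e.g. AVG+ = AVG / lgAVG."""
    return stat / lg_stat

def count_above_average(count, opportunities, baseline_rate):
    """Count stat above average: opportunities * (count/opportunities - baseline_rate).

    baseline_rate is the comparison rate per opportunity, e.g. lgHR/PA for
    hitters, lgK/IP for strikeouts, or qW/GS for starter wins.
    """
    return opportunities * (count / opportunities - baseline_rate)

# Made-up inputs for illustration only.
avg_plus = rate_plus(0.300, 0.250)                   # 20% better than league
hr_above_avg = count_above_average(30, 600, 0.03)    # HR, PA, lgHR/PA
k_above_avg = count_above_average(220, 200.0, 0.9)   # K, IP, lgK/IP
```

Using opportunities (PA, IP, or GS) as the multiplier keeps the result in the stat's own units: a hitter with 30 HR in 600 PA against a league rate of 0.03 HR/PA comes out about 12 home runs above average.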


Fused Lasso Linear Program

I combined two different approaches to create a model I thought would work best for the purpose of predicting winners and illustrating change in voter opinions over time. Stephen Ockerman and Matthew Nabity’s approach to predicting Cy Young winners was the inspiration for my framework for scoring and ordering players. A player’s score is the dot product of the weights (consideration by the voters) and the player’s stats.

The constraints in the optimization require the scores of the first place player to be higher than the second place, and so on and so on. This approach, however, doesn’t allow for violation of constraints. I add an error term for violation of these constraints, and minimize the amount by which they are violated.
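The scoring and violation terms described above can be sketched in a few lines of Python. This is only an illustration of the idea: the weights, the stat vectors, and the function names are hypothetical, and the real model chooses the weights through the optimization rather than taking them as given.

```python
def score(weights, stats):
    """A player's score: the dot product of voter weights and player stats."""
    return sum(w * x for w, x in zip(weights, stats))


def ordering_violations(weights, ranked_players):
    """Amount by which each adjacent pair in the actual vote order is violated.

    ranked_players holds stat vectors in finishing order (first place first).
    A pair contributes 0 when the higher finisher also has the higher score.
    """
    scores = [score(weights, player) for player in ranked_players]
    return [max(0.0, lower - higher) for higher, lower in zip(scores, scores[1:])]


# Hypothetical weights and scaled stats for the top three finishers.
weights = [0.6, 0.4]
ranked_players = [[2.0, 1.0], [1.0, 3.0], [0.5, 0.5]]

violations = ordering_violations(weights, ranked_players)
total_error = sum(violations)
print(violations, total_error)
```

Here the runner-up outscores the winner under these weights, so the first pair produces a positive error while the second pair produces none.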

Instead of constraining the weights to sum to 1, I applied concepts from Robert Tibshirani’s fused lasso, which simultaneously applies shrinkage penalties to the absolute values of the weights themselves and to the differences between weights for the same stat in consecutive years. This accomplishes two things: 1) it performs variable selection on statistics within years, helping combat collinearity between some performance statistics, and 2) it ensures that weights don’t change too quickly by overreacting to a single vote in one year.

However, this approach and formulation cannot be solved by traditional linear optimization methods since absolute value functions are non-linear. The optimization can be reformulated as follows:
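One standard way to linearize absolute-value penalties, which may differ in detail from the formulation used here, is to introduce auxiliary variables that bound each absolute value from above. In the sketch below, x_{t,i} is the stat vector of the i-th place finisher in year t, w_t is that year's weight vector, e_{t,i} is the ordering error, and u and d are the auxiliary variables. The margin epsilon > 0 is an assumption added so that the all-zero weight vector is not trivially optimal.

```latex
\begin{aligned}
\min_{w,\,e,\,u,\,d}\quad
  & \sum_{t}\sum_{i} e_{t,i}
    \;+\; \lambda_1 \sum_{t}\sum_{j} u_{t,j}
    \;+\; \lambda_2 \sum_{t \ge 2}\sum_{j} d_{t,j} \\
\text{s.t.}\quad
  & w_t^{\top} x_{t,i} - w_t^{\top} x_{t,i+1} + e_{t,i} \;\ge\; \epsilon,
    \qquad e_{t,i} \ge 0, \\
  & -u_{t,j} \;\le\; w_{t,j} \;\le\; u_{t,j}, \\
  & -d_{t,j} \;\le\; w_{t,j} - w_{t-1,j} \;\le\; d_{t,j}.
\end{aligned}
```

At the optimum, u_{t,j} equals |w_{t,j}| and d_{t,j} equals |w_{t,j} - w_{t-1,j}|, so the objective matches the fused lasso penalty while every constraint stays linear.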

To select the lambda parameters, I trained the model using the first 10 seasons of scaled data, increasing the training set by one season each time, and tested with the subsequent year’s vote. After the in-season statistical adjustments, I scaled the stats by the mean and standard deviation of the training data to enable comparison across coefficients. All position player stats were replaced with 0 for pitchers and vice versa.
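The expanding training window and the training-set scaling can be sketched as follows. The season range is hypothetical, and the use of the population standard deviation is an assumption; the original code may scale differently.

```python
from statistics import mean, pstdev


def expanding_window_splits(seasons, initial_train=10):
    """Yield (training seasons, test season) pairs, growing by one season each time."""
    ordered = sorted(seasons)
    for end in range(initial_train, len(ordered)):
        yield ordered[:end], ordered[end]


def standardize(train_values, values):
    """Scale values by the mean and standard deviation of the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical run of 13 seasons: three train/test splits.
splits = list(expanding_window_splits(range(1990, 2003)))
for train, test in splits:
    print(f"train {train[0]}-{train[-1]} -> test {test}")

scaled = standardize([1.0, 2.0, 3.0], [2.0, 3.0])
```

Scaling with training-set statistics only keeps information from the test season out of the fitted weights.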

References:

1. Ockerman, Stephen and Nabity, Matthew (2014) “Predicting the Cy Young Award Winner,” PURE Insights: Vol. 3, Article 9.

2. R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society Series B, 67(1):91–108, 2005.

 


What if a Team Bullpens an Entire Season?

We saw the Yankees basically bullpen the AL wild-card game. Sure, it was by accident, but their bullpen pitched 8.2 innings. And they did it well. This made me think about whether a team could put together a pitching staff that is almost completely used for bullpenning for the entire season.

To see if this would be possible, we will look at the Yankees since they are the team most closely equipped for it already. In the wild-card game, they essentially used four relief pitchers (let’s not count the one out Luis Severino had). Chad Green, David Robertson, Tommy Kahnle, and Aroldis Chapman combined for 8.2 innings and one earned run. Clearly, if a team could do this all the time, they would. In that game they did not use other relievers Dellin Betances and Adam Warren, as well as regular starting pitchers Jordan Montgomery and Jaime Garcia, who would have been available that night.

Since we now know what happened in that bullpen game, can we find out if it is possible to do it over a full season? First off, an MLB roster comprises 25 men for any given game, plus an additional 15 who can be called up if needed. An AL team can get by with 12 position players: one for every starting position (including DH) plus a fourth outfielder, utility infielder, and backup catcher. Let’s say a team’s backups can field multiple positions, like many can. We can get rid of the everyday DH and use one of the backups or starters in that role for a needed day off. That leaves us with 11 position players and room for 14 pitchers.

Many of the Yankees’ own relievers can go multiple innings. Among those pitchers are Chad Green, David Robertson, Tommy Kahnle, Adam Warren, and occasionally Aroldis Chapman and Dellin Betances. Each is effective in his own right. The problem we have to face is the amount of rest needed for these pitchers. The four from the wild-card game each pitched with two days of rest, so we’ll set that as a benchmark. I also don’t want to assume a team needs five pitchers each game like they did in the wild card.

I don’t want to completely get rid of the starting pitcher. It would be dumb to just throw away what Luis Severino and other starters bring to that team. Instead, I want to put a hard limit on how much they pitch each game and how often they pitch. Theoretically, a team could go with a three-game cycle of pitchers. Games are played almost every day during the season, so the two days of rest benchmark will be used here. If we are using four pitchers per game every three games, we need 12 pitchers.

Game 1 | Game 2 | Game 3
L. Severino | M. Tanaka | S. Gray
C. Green | A. Warren | D. Robertson
T. Kahnle | D. Betances | C. Shreve
A. Chapman | J. Holder | G. Gallegos

I didn’t build this with any set rationale; these are just the best options the Yankees would have, in my view. There are many other options available for them, and some may be even better. But if this is the set of pitchers being used, that leaves two extra spots among our 14 available pitchers. Those two spots can go to guys who can pitch multiple innings when a game goes to extras, or to a guy who can cover an inning or two in case one of the above gets into trouble.

If a team were to go by this set of pitchers, each regular starting pitcher would throw three innings in each of 54 games, or 162 innings over a season. That would be a pretty normal workload for a starting pitcher, and in some cases a much lighter one: Severino pitched 193 innings himself. The relievers, however, would see a pretty big bump in action. At two innings in each of 54 games, they would pitch 108 innings in a season, more than any of the relievers above did last year. However, some of those pitchers were starters to begin their careers. Green, Warren, Betances, and Holder have each pitched more than 108 innings in a season. Now, that could be a reason for their increased effectiveness as relievers, but they would still only be pitching two innings in a game, not five or six.
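The workload arithmetic behind these figures can be checked with a short Python sketch. It assumes a three-inning "starter" followed by three relievers splitting the remaining six innings of a nine-inning game, and it ignores extra innings, rainouts, and off days.

```python
GAMES_PER_SEASON = 162
CYCLE_LENGTH = 3          # each pitcher appears once every three games
INNINGS_PER_GAME = 9
STARTER_INNINGS = 3       # assumption: the three-inning "starter"
RELIEVERS_PER_GAME = 3
PITCHER_ROSTER_SPOTS = 14

appearances = GAMES_PER_SEASON // CYCLE_LENGTH
starter_innings = appearances * STARTER_INNINGS
reliever_innings_per_game = (INNINGS_PER_GAME - STARTER_INNINGS) / RELIEVERS_PER_GAME
reliever_innings = appearances * reliever_innings_per_game
pitchers_needed = CYCLE_LENGTH * (1 + RELIEVERS_PER_GAME)
spare_spots = PITCHER_ROSTER_SPOTS - pitchers_needed

print(appearances, starter_innings, reliever_innings, pitchers_needed, spare_spots)
```

Changing STARTER_INNINGS or RELIEVERS_PER_GAME shows how quickly the reliever workload moves when the cycle is adjusted.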

It is possible to ask these relievers to stretch their arms out to be able to throw that many innings in a season. Relievers do transition to starting, and this wouldn’t be quite the workload that move requires. If a pitcher needs a break during a cycle through this set of pitchers, that could be what the additional two pitchers on the roster are for, or some of the 40-man pitchers could be called up to give a guy a break. They could also call up an actual starter from the minors to take over for four or five innings after the three-inning “starter” in this example. My point here is that if the relievers get tired over the course of a season, there are ways to give them breaks. Plus, the Yankees have so many resources and available pitchers that they are well equipped to provide those breaks.

If the Yankees wanted to, they could keep Severino, Tanaka, Gray, Green, Warren, Robertson, Kahnle, Betances, and Chapman all on the roster for the whole season. That makes up 3/4 of the necessary pitchers. Shreve, Holder, and Gallegos could each be cycled up and down from AAA with other pitchers like Ben Heller, Domingo German, etc. in order to give breaks to the core nine pitchers. Another solution is to go out and get more relievers who can pitch multiple innings on a regular basis. They certainly have the prospects to do that. Pitchers like Brad Hand, Yusmeiro Petit, and Mike Minor each pitched over 77 innings and were very effective doing so.

Clearly there is much more that would be needed to make this a reality, and I don’t have the resources to know if it is even possible. Maybe these guys simply couldn’t pitch that many innings over a full season, or they would lose too much velocity or break on their pitches from fatigue. But I saw David Robertson pitch 3.1 masterful innings in the wild-card game and pitch another 1.2 innings three days later. Obviously that is only two outings, but he was nevertheless effective in doing it, and I believe if any team could make this happen, it would be the Yankees.


The Effect of Rest Days on Starting Pitcher Performance

Since the dawn of baseball, fans and coaches alike have debated whether or not pitch count and days of rest affect a pitcher’s health status and performance. This ongoing discussion has led to a close examination of how to best manage the health status of a pitcher. Should you give your starting pitcher that extra day of rest or can you pitch him in the big game today? The question of how to manage your starting pitcher can make or break a season, and, therefore, certainly merits the amount of attention and debate it has received.

Major League Baseball’s adjustment to the age of big data has reshaped the way in which we view these age-old debates. Nowadays, there are public databases that allow hobbyists and students of the game to query their own data and investigate their own theories. Baseball Savant and Baseball Reference are the two main public databases in use, and are the two databases that will be utilized for this study. The data being queried is rest days and runs scored per inning pitched for starting pitchers in Major League Baseball in the last five full seasons.

Problem Definition

In this study, I will look at the effect that the number of days of rest has on the performance and health of a starting pitcher in Major League Baseball. More specifically, I will investigate whether or not fewer rest days are correlated with poor performance and poor health status. Not only does this study have the potential to save millions of dollars for the baseball industry, but it could also provide starting pitchers with more knowledge on how rest days between starts affect their health and performance. The metric “Runs Scored per Inning Pitched” will be evaluated to determine performance. Although there is a significant amount of noise (i.e. many factors contribute to the outcome) in the runs-scored metric, it seems like the best way to determine a pitcher’s performance on a game-by-game basis. Ultimately, the number of runs scored is the difference between winning and losing, and therefore should be the main criterion used to judge the performance of a starting pitcher.

Results

I determined that there is a significant difference between a pitcher’s performances on a specific number of rest days versus the others. However, there is no significant difference in starting a pitcher on “short rest” (1-3 days) versus “normal rest” (4-6 days) versus “extended rest” (7+ days).

This is an extremely important result considering that starting pitchers are usually employed on three, four, or five days of rest. Currently, starting pitchers are believed to perform at the highest level, without the added possibility of injury, on this amount of “normal rest.” However, while this study finds a correlation between the specific number of rest days and a pitcher’s performance, it shows no significant difference in starting your pitcher on short rest vs. normal rest vs. extended rest.

This study shows that each of those extra off days could not only make a significant difference in pitching performance but also could make a difference in health status for pitchers. There is a fine line between getting the most out of your starting pitcher, and overusing him.

Data Analysis and Tests

In order to determine whether runs scored per inning pitched differs significantly across the number of rest days, a non-parametric ANOVA test is needed. The results are as follows:

Reject H0 at alpha = .05: the Runs Scored per Inning Pitched rate is significantly different for at least one of the numbers of rest days.

However, we want to know if having your starting pitcher pitch on “short rest” is significantly different than having your starting pitcher on “normal rest.” In order to do this, the data was split into number of days of rest 1-3 and days of rest 4-6. Zero days of rest was eliminated, as these numbers typically only apply to relief pitchers. Then, a non-parametric rank sum test was conducted to determine if performance on “short rest” is significantly different than performance on “normal rest.” The results are as follows:

Do not reject H0 at alpha = .05: there is no significant difference in the Runs Scored per Inning Pitched rate between pitchers on “short rest” and “normal rest.”

Last, “extended rest” was looked at to determine if runs scored per inning pitched was significantly different than “short rest” and “normal rest.” “Extended rest” includes all rest days of 7 and over. The results are as follows:

Do not reject H0 at alpha = .05: the Runs Scored per Inning Pitched rate is not significantly different among short rest, normal rest, and extended rest.
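The specific tests are not named above, so the following is only a sketch of one common choice for a non-parametric ANOVA: the Kruskal-Wallis H statistic, computed here in plain Python over the three rest buckets. The sample data are hypothetical, no tie correction is applied, and the p-value line relies on the large-sample chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2).

```python
import math


def rest_bucket(rest_days):
    """Map days of rest to the buckets used in the study; zero days is dropped."""
    if 1 <= rest_days <= 3:
        return "short"
    if 4 <= rest_days <= 6:
        return "normal"
    if rest_days >= 7:
        return "extended"
    return None


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of samples."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical (rest days, runs per inning pitched) pairs, one per start.
starts = [(3, 0.50), (4, 0.44), (5, 0.40), (5, 0.61),
          (8, 0.38), (2, 0.72), (7, 0.55), (0, 1.00)]
groups = {"short": [], "normal": [], "extended": []}
for rest_days, rate in starts:
    label = rest_bucket(rest_days)
    if label is not None:
        groups[label].append(rate)

h_stat = kruskal_wallis_h(list(groups.values()))
p_value = math.exp(-h_stat / 2.0)  # chi-square survival function, 2 df
print(round(h_stat, 3), round(p_value, 3))
```

With real data, H0 would be rejected at alpha = .05 only when the p-value falls below .05; with samples this small the approximation is rough and serves only to show the mechanics.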

Recommendations

The first recommendation I would make is to look specifically at starting pitchers coming off the disabled list. Starting pitchers can also be skipped in a rotation when a team has an off day, which puts much more time between starts.

If possible, data that tracks rest time between a pitcher’s starts down to the hour would be ideal. This could provide more insight into the effect of rest on the performance of starting pitchers by treating rest as a continuous variable instead of treating all rest days equally.

Another recommendation for the study would be to use a different metric for performance. Finding a public database that included days of rest data for each start was tough, and finding one that had days of rest data for each start along with the metrics that were sought after was even tougher. Ideally, an advanced statistic like FIP or weighted On-Base Average would be used, but these metrics are very difficult to calculate for over 1300 data points.

As long as there are starting rotations in baseball, the question of how off-days affect the performance and health of starting pitchers will be studied. Another potential study would be to look at the pitch count of starting pitchers. This could have a similar effect as rest days when looking at performance. With the recommendations made in this study, a future study to determine if performance is affected by pitch count and days of rest would be extremely beneficial.


Dryness in Paradise: On Humidors in Spring Training

Spring-training games in the Cactus League are a unique joy, especially for baseball fans (like me) who hail from colder climes. Unlike the Grapefruit League, which features stadiums separated by hundreds of miles of humid Florida air, the Cactus League consists of a compact cluster of stadiums bathed in sunshine and desert-dry air. Spectators and players alike can enjoy the spring conditions (and for some, including myself and Carson Cistulli, Barrio Queen guacamole and sangria) in the Valley of the Sun for weeks before teams return to their home stadiums across the country in late March.

Figure 0: Your author enjoying the 82-degree sunshine (and probably a juicy IPA, not pictured) at Hohokam Stadium, March 2017

Some teams will return to relatively warm and dry climates (Arizona Diamondbacks, who have to trudge the 20 freeway miles to Chase Field), but others will return to retractable domes (Seattle Mariners) or cold conditions where snowed-out games are certainly not out of the question (Cleveland). Given that the point of spring training is to get players ready for 81 games at their home ballpark, are two months of baseball in dry, sunny paradise the best way to prepare players for opening day at home? Short of building exact climate-controlled replicas of Kauffman Stadium and Wrigley Field in the Phoenix Metro, how could teams better prepare their players for the start of the season at their own home ballpark? Enter an unlikely hero, the great “Rocky Mountain equalizer”: the humidor.

Figure 1: Climatology of Phoenix, AZ (Feb-Mar) and the home locations (ICAO Airport codes) of the 15 Cactus League teams (Apr-May)

Just by eyeballing the graphs in Figure 1, without wading into the different lines and the specific airports (some lines switch to larger airports with RH), it is clear that no stadium’s meteorological conditions are close to those in the Phoenix area. With the exception of the Rangers, no team plays in a stadium with an average May high temperature greater than the average March high temperature in Arizona. And only the “high desert” of Colorado comes close in RH to the dry air of Arizona in March. Clearly, the opening day meteorological conditions will be significantly different from those Cactus League players see during spring training (Figure 2).

Figure 2: Changes in climate between April (major airport nearest home stadium) and March (PHX), with larger markers indicating larger temperature differences (dotted markers indicate increased T) and blue markers indicating more humid conditions (orange being drier)

This drastic change in temperature and humidity (Figure 2) is likely to have a major impact on how the ball plays once teams leave Arizona. Like many baseball physics researchers before me, I will once again heavily rely on the work previously done by Dr. Alan Nathan to inform my physical exploration herein. As shown in Nathan, et al. (2011), the two crucial meteorological factors of temperature (T) and relative humidity (RH) have a strong impact on both aerodynamic factors (such as drag) AND contact factors (such as coefficient of restitution, COR) that determine how far a batted ball travels. Rather than run afoul of the copyright of the American Journal of Physics by reproducing the figures here, I highly encourage you to check out Figures 2-4 in Nathan, et al. (2011) to see these relationships.

Equation Block 1: Calculating the effect of COR changes on “effective” exit velocity of a batted ball

The eternally relevant Baseball Trajectory Calculator developed by Alan Nathan has the ability to adjust aerodynamic factors associated with stadium altitude, barometric pressure, temperature, and relative humidity. Combined with the equations from Block 1 above, the changes in COR as a result of meteorological changes can be simply approximated in the Nathan Calculator as a manual change in the rebound (exit) velocity of the ball off the bat.
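As a rough sketch of how a COR change could be turned into a manual exit-velocity change, the Python below uses the standard ball-bat collision relations from Nathan's published work: a collision efficiency q = (e - r) / (1 + r) and an exit velocity of q * v_pitch + (1 + q) * v_bat. Whether Equation Block 1 takes exactly this form is an assumption, and the recoil factor, COR values, and speeds are hypothetical inputs rather than numbers from this analysis.

```python
def collision_efficiency(cor, recoil=0.25):
    """Collision efficiency q for a given COR e and bat recoil factor r."""
    return (cor - recoil) / (1.0 + recoil)


def exit_velocity(pitch_speed, bat_speed, cor, recoil=0.25):
    """Batted-ball speed from pitch speed, bat speed, and COR (all speeds in mph)."""
    q = collision_efficiency(cor, recoil)
    return q * pitch_speed + (1.0 + q) * bat_speed


def effective_exit_velocity(base_ev, pitch_speed, base_cor, new_cor, recoil=0.25):
    """Exit velocity after a COR change, holding pitch speed and bat speed fixed.

    The bat speed is backed out of the baseline exit velocity first.
    """
    q0 = collision_efficiency(base_cor, recoil)
    bat_speed = (base_ev - q0 * pitch_speed) / (1.0 + q0)
    return exit_velocity(pitch_speed, bat_speed, new_cor, recoil)


# Hypothetical case: a 100 mph batted ball, with the COR lowered from 0.50 to 0.48.
unchanged = effective_exit_velocity(100.0, 90.0, 0.50, 0.50)
softened = effective_exit_velocity(100.0, 90.0, 0.50, 0.48)
print(round(unchanged, 2), round(softened, 2))
```

The softened value is the number one would type into the calculator's exit-velocity field in place of the default.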

Great, simply smash aerodynamic and COR changes together and we’re in business, right? Well, almost…it seems every baseball physics article could have all the baseball-specific details stripped out and what would remain is a meditation on linearity and covariance. This example is no different. While we might expect meteorologically-induced aerodynamic and contact factors to vary independently, in real on-the-field situations, balls will be affected by not only their current conditions but also their recent history of past conditions. Absent experimental data on the time scale of such internal ball changes, we can still get a general sense of what could happen when multiple changes overlap. Let’s dive into some colorful 3-D contour plots of results using the default batted ball parameters of the Trajectory Calculator (100 mph pitch, 100 mph exit velocity, 30 degree launch angle) and see what happens!

Figure 3: Effects of meteorological T and RH on fly ball distance, including COR effects equal to ambient conditions (as if balls were kept in the same conditions)

 

We aren’t too far afield from the basic variables one can change in the Nathan Calculator, so the results from Figure 3 aren’t terribly surprising. Baseballs travel further through warm and dry air. In addition, dry/warm baseballs are bouncier than cold/wet baseballs. It’s unlikely that equipment managers are keeping baseballs outside, so they probably aren’t going to actually experience changes in COR associated with extreme conditions due to the time necessary for water vapor to diffuse into the guts of the baseballs and soften them. But absent a sense of how equipment managers store baseballs, let’s explore the possible impact that a spring training humidor could have.

Figure 4: Effects of humidor-like T and RH on fly-ball distance, with aerodynamic effects equal to PHX March average but COR changing with humidor conditions

Figure 4 shows what would happen if we changed the internal ball T and RH but continued to play in the average Phoenix-area meteorological conditions in March. The weakness of the temperature effect compared to the strength of the humidity effect can be predicted with the slope of each experiment in Nathan, et al. (2011). It’s unlikely, though, that T and RH both have, when combined, a linear effect on COR. For example, it’s unclear whether this linear model captures the hot/wet and cold/dry combinations correctly. This indicates the need to inspect the covarying relationship between T and RH on COR (and therefore, fly-ball distance) more deeply than the simple linear combination I used in this model.

Table 1: Monthly climate, elevation, default fly ball distance using the Nathan Calculator and monthly climate, and scale factors for conversion of March fly ball distance (at PHX) to April fly ball distance (at home).

With the data from Figures 3-4, we can figure out an appropriate scaling factor (Table 1) to translate the dimensions of each team’s spring training stadium and compare them to the dimensions of their home stadium (Figure 5).
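A minimal sketch of the scaling step, assuming the factor is simply the ratio of the default fly-ball distance at home in April to the default distance at PHX in March. The distances and fence dimensions below are hypothetical placeholders, not values from Table 1.

```python
def scale_factor(home_distance_ft, phx_distance_ft):
    """Ratio converting a March fly-ball distance at PHX to an April distance at home."""
    return home_distance_ft / phx_distance_ft


def effective_dimensions(fence_distances_ft, factor):
    """Scale each spring training fence distance to its home-climate equivalent."""
    return {spot: distance * factor for spot, distance in fence_distances_ft.items()}


# Hypothetical default fly ball: 400 ft in Phoenix in March, 380 ft at home in April.
factor = scale_factor(380.0, 400.0)

# Hypothetical spring training fence distances in feet.
spring_park = {"left line": 350.0, "center": 400.0, "right line": 350.0}
scaled_park = effective_dimensions(spring_park, factor)
print(round(factor, 3), {k: round(v, 1) for k, v in scaled_park.items()})
```

A factor below 1 means the spring training park plays smaller than its physical dimensions suggest: a ball that clears a 400 ft fence in Phoenix would only have carried about 380 ft at home.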

Figure 5: Surprise Stadium (KC) and Scottsdale Stadium (SF) scaled to April climatology in KC and SF (no humidor)

After comparing the “effective dimensions” of the Cactus League stadiums to the home stadiums of each team, one can’t help but wonder if the teams had a hand in the way the stadiums in Arizona were constructed. Some teams, such as the Royals, share a stadium with another team (Texas Rangers); therefore, this clearly can’t explain all of the similarities between stadium shapes.

Figure 5 shows that in Arizona during the month of March, the spring training stadiums play much “smaller” compared to other stadiums than their physical dimensions might indicate. By slightly lowering the COR of the ball by using a humidor, teams could cause their spring training stadiums to play with effective dimensions approximately equal to those of their home stadiums. If the Royals were to store their spring training baseballs in a humidor at approximately 70% RH, the differences between the distance up the lines (longer at Surprise than Kauffman) and the distance to straightaway center (shorter at Surprise than Kauffman) would yield around the same “effective surface area” of the scaled outfield.

This analysis, much like my earlier piece on fly-ball precession, neglects many physical variables that would impact the actual games being played. In this example, I have neglected the effects of wind and day-to-day changes in barometric pressure. Prevailing winds due to stadium orientation and location would make this experiment much more realistic. For variations in pressure due to synoptic weather systems (cold fronts, warm fronts, etc.), however, “averages” over an entire month inform us less in terms of the baseline environments of each stadium than monthly averages of temperature and relative humidity. The model also assumes that the balls are essentially stored in temperatures and humidities equal to the ambient conditions in the home stadiums; equipment managers likely store them in some indoor location, but it’s unclear whether they are treated to the exquisite RH control seen with the humidor at Coors Field. Such confounding factors will be explored in future follow-ups to this piece.

In addition to physical assumptions made here, it’s quite possible that baseball operations departments in teams have goals in spring training other than closely approximating the hitting conditions in their home stadiums. But if they want to see who will have power that plays well in their home stadium, the humble humidor could play a key role in moderating the enhanced fly-ball distance that comes naturally with the warm, dry spring air of paradise (Cactus League baseball, that is).


Can Wobble Rob(ble) Hitters? Fly Ball Distance and Baseball Precession

In the chase to break the story of the “smoking gun” behind the recent surge in MLB home runs, many a gallon of digital ink hath been spilt exploring possible modifications to the MLB balls, home-run-optimized swing paths, and even climate change. In my field of Earth Science (atmospheric chemistry, to be more exact), it’s rare that a trend in observations can be easily attributed to a single causal factor. Air quality in a city is driven by emissions of pollutants, wind conditions, humidity, solar radiation, and more; this typically leads to a jumble of coupled differential equations, each with a different capacity to impact overall air quality. To my untrained eye, agnostic to the contents of the confidential research commissioned by MLB and others, this problem is no different: a complex mixture of factors, some compounding each other and some canceling others, is likely fueling the recent home-run spike.

This article will examine the potential for a change in the MLB ball minimally explored thus far: reduction of precession due to decreased internal mass anisotropy. What a mouthful! “Precession” and “anisotropy” don’t have the same ring as “juiced ball” or “seam height” (though they may be on par with “coefficient of restitution”). But these words can be replaced with a more familiar (though funny-sounding) word: wobble. This wobble can occur for many reasons, but the most probable explanation in baseball is that the internal baseball guts are slightly shifted from the center of the ball. This could be due to manufacturing imperfection, or in the course of a game, contact-induced deformation of the ball.

Precession, in general, occurs when the rotational axis of an object changes its own orientation, whether due to an external torque (such as gravity) or due to changes in the moment of inertia of the rotating object (torque-free). Consider a spinning top: the top spins about its own axis (symmetrically spinning about the “stem” of the top) while the rotational axis itself (as visualized by the movement of the stem) can trace out a coherent pattern. If imparted with the same initial “amount” of spin in different ways, the total angular momentum (from both rotation and precession) of the top will be the same whether it’s spinning straight-up or precessing (wobbling) in an elliptical path.

Figure 0: Perhaps the most hotly debated spinning top in the world

As with other potential explanations relating to a physical change in the ball, a change in mass distribution could have occurred unintentionally due to routine improvements in manufacturing processes. By getting the center of mass (approximately, the cork core of the baseball) closer to the exact geometric center of the ball, backspin originally “lost” to precession (in the form of wobble-inducing sidespin) could remain as backspin while conserving total angular momentum; increased backspin has been shown to increase the “carry” of a fly ball, therefore increasing the distance (potentially extending warning-track shots over the fence). A deeper discussion of angular momentum can be found in any mechanics textbook or online resource (such as MIT OCW handouts), but the key takeaway when considering a particular batted fly ball is that productive backspin gets converted to non-productive precession (roughly approximated as sidespin in one axis) when mass is not isotropically (uniformly from the center in every direction) distributed. This imparts a torque-free precession on the spinning ball, causing the rotational axis to trace out a coherent shape.

Precession in baseball has not been deeply studied; in fact, when explicitly mentioned in seminal baseball physics resources, it is noted as a potential factor that will be ignored to simplify the set of physical equations. Together, dear reader, we shall peek behind the anisotropic veil and explore how precession might impact fly-ball distance, and by extension, home-run rates.

***

For those of us with some experience throwing a football, even just in the park, we can picture the ideal “backyard Super Bowl” pass: a tight spiral that neatly falls into the outstretched hands of the intended receiver. The difficulty of executing such a perfect throw is evident in the number of nicknames for imperfect throws that wobble (precess) on their way up the field short of their intended target (see “throwing ducks” re: Peyton Manning). In football, the wobbly precession of a ball in flight is typically blamed on the passer or credited to a defender for deflecting it (or in some cases, allegedly, a camera fly wire). It’s not as easy to imagine such behavior in baseball: even in slow-motion video shots of fly balls, the net spin of the ball is dominated by backspin. In addition, the nearly-spherical shape of a spinning baseball has significantly different aerodynamics than the tapered ellipsoid used in football. However, even a small amount of precession has the potential to shave yards off the distance of a football pass; therefore, impacts of precession are certainly worth exploring in the game of baseball.

As a sometimes-teacher (I have taught two laboratory classes at MIT), I strongly believe in the power of simple physical models to qualitatively inform trends in the not-so-simple real world. Therefore, for the first step of exploring the effect of ball precession in the game of baseball, I have turned to the wonderful Trajectory Calculator developed by Dr. Alan Nathan. The Calculator numerically solves the trajectory of a batted ball by computing key physical properties in discrete time steps. While many physical attributes of the ball are calculated in the various colored fields, any of them can be overwritten with custom values.

Figure 1: Fly Ball Distance with Nathan Trajectory Calculator defaults, conversion of backspin to sidespin

In Figure 1, I use the Trajectory Calculator to explore the effect of sidespin conversion on a single fly ball with the same initial contact conditions as the default (100mph exit velocity, 30-degree launch angle, default meteorological conditions), with the total spin set to 240 radians per second. Backspin is not converted to sidespin in a one-to-one fashion: because of the Pythagorean relationship between these factors, total spin is equal to the square root of the sum of the squares of sidespin and backspin. Therefore, to conserve angular momentum, a 10% reduction in backspin (216 rad/s) yields 104.6 rad/s of sidespin, which together lead to a ~1% decrease in fly ball distance from 385.3 ft to 381.3 ft.
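The spin arithmetic in this example can be reproduced directly. The sketch below uses only the numbers quoted above; the two distances are the calculator outputs reported in the text, not values computed here.

```python
import math

total_spin = 240.0                 # rad/s, held constant
backspin = 0.9 * total_spin        # a 10% reduction in backspin

# Pythagorean relationship: total^2 = backspin^2 + sidespin^2
sidespin = math.sqrt(total_spin ** 2 - backspin ** 2)

# Distances reported from the Trajectory Calculator, in feet.
distance_pure_backspin = 385.3
distance_with_sidespin = 381.3
pct_drop = (distance_pure_backspin - distance_with_sidespin) / distance_pure_backspin * 100

print(round(backspin, 1), round(sidespin, 1), round(pct_drop, 2))
```

Note how nonlinear the conversion is: giving up only 10% of the backspin produces sidespin equal to roughly 44% of the total spin.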

With all of the assumptions made here, notably that introduction of precession can be simulated as pure conversion to sidespin to conserve angular momentum, the effect of precession on the flight path is clear but rather modest in this simple approach. However, the Calculator results show that by reducing the “wobble” in a ball’s trajectory, it will carry further. A league-wide reduction in precession would mean that balls would, on average, travel further, leading to an uptick in home runs. If decreased precession would also decrease the effective drag the ball experiences in flight, the effect of increased fly-ball distance could be even further enhanced.

A more realistic exploration of precession will require further modification to the modeling tools at hand. Following Brancazio (1987), which studied the effects of precession on the trajectory of a football, and additional follow-on work, a precession-only physical model can be developed to explore more complex aspects of the problem posed here. Elements of this precession-only model can be fed back into the Nathan Trajectory Calculator, but without a full understanding of some unconstrained physical constants and mechanical aspects of the pitch-contact-trajectory sequence, a tidy figure in the style of Figure 1 will be difficult to produce.

As I mentioned above, I find simple models to be effective tools for teaching concepts. Therefore, let’s consider a “perfect” baseball to be a completely uniform, isotropic sphere, as in Figure 2. This perfect ball is axially symmetric and should not have any precession in its trajectory due to changes in its moment of inertia (I). Now, let’s add a small “spot mass” (one that doesn’t add roughness to the surface) on the surface of the ball along the axis of rotation corresponding to pure backspin (the x-axis here). This ball with a spot mass should approximately represent an otherwise-perfect sphere whose center of mass is slightly shifted in the x-direction.

Figure 2: (A) real baseball, (B) perfect sphere, (C) sphere with a point mass at the surface, and (D) sphere with slightly offset center of mass approximately equivalent to (C)

If the model ball has a mass m1 that is isotropically distributed through the entire sphere, and a point mass with mass m2 that is located on the surface along the x-axis, the moment of inertia can be calculated in each direction, summing the contributions from the bulk mass m1 and the point mass m2 (Figure 3).

Figure 3: Moments of inertia for isotropic ball (mass m1) with a point mass (m2) at the surface
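For readers without the figure in front of them, here is a minimal Python sketch of that sum, assuming the standard textbook results: (2/5)mr^2 for a uniform solid sphere and mr^2 for a point mass at distance r from an axis. The function name, the 9.125-inch circumference, and the 1% value for m2 are my own illustrative assumptions, and the moments are taken about the geometric center of the sphere rather than the slightly shifted center of mass.

```python
import math

OZ_TO_KG = 0.028349523125
IN_TO_M = 0.0254


def moments_of_inertia(m1, m2, r):
    """Return (I_x, I_yz) about axes through the sphere's geometric center.

    m1: mass spread uniformly through a solid sphere of radius r
    m2: point mass on the surface, where the x-axis exits the ball
    """
    i_sphere = 0.4 * m1 * r**2   # uniform solid sphere: (2/5) m r^2
    i_x = i_sphere               # the point mass lies on the x-axis, so it adds nothing
    i_yz = i_sphere + m2 * r**2  # it lies a distance r from the y- and z-axes
    return i_x, i_yz


m_total = 5.125 * OZ_TO_KG                 # ball mass quoted in the text
radius = 9.125 * IN_TO_M / (2 * math.pi)   # ASSUMED circumference of 9.125 in
m2 = 0.01 * m_total                        # HYPOTHETICAL point mass: 1% of the ball
m1 = m_total - m2                          # the rest stays uniformly distributed

i_x, i_yz = moments_of_inertia(m1, m2, radius)
ratio = i_yz / i_x
print(f"I_x  = {i_x:.3e} kg m^2")
print(f"I_yz = {i_yz:.3e} kg m^2")
print(f"I_yz / I_x = {ratio:.4f}")
```

Under these assumptions the ratio I_yz / I_x reduces to 1 + 2.5 (m2 / m1), so it depends only on how the mass is split and not on the ball's radius.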

Of course, the mass of a real baseball isn’t isotropically distributed, and there is no such thing as a “point mass” in reality; however, by exploring different combinations of m1 and m2 that sum to the mass of an actual MLB baseball (5.125 oz, as used in the Nathan Trajectory Calculator), the ball can be distorted in a controlled manner to explore the effects on precession and fly-ball distance. Using a set of equations derived from Brancazio (1987) Equation #7, the initial backspin of a ball (omega_x0) can be calculated from three inputs: the initial total spin (omega); the variable B, the “spin-to-wobble” ratio giving the number of revolutions about the x-axis per precession-induced “wobble,” which is a function of the moments of inertia I_x and I_yz; and the angle of precession, which is built into the variable C. Here theta is the angle between the x-axis and the vector of angular momentum when precessing, similar to the angle between a table and the “stem” of a spinning top.

Equation Block 1: Derivations from Brancazio (1987) used in a simple model of baseball precession

The limitation of this approach is that in order to explore the theta-m2 phase space, we must prescribe a priori an angle theta at which the precession occurs. By instead solving for theta from equation 5 above (Figure 4), we can get a sense of the possible values for theta by prescribing the fraction of omega that is converted to precession (the variable A, a mixture of omega_y and omega_z, also called “effective sidespin”).

Figure 4: Contour plot of theta (degrees) with respect to ranges of m2 and variable A (effective total sidespin)

Figure 4 shows that angles between 0 and 6 degrees are reasonable for the conditions explored using the approach from Brancazio (1987) as translated to baseball. So let’s turn to equation 6, using a range of angles from 0 to 6 degrees, to explore the effects of precession on backspin omega_x (Figure 5).

Figure 5: Contour plots of backspin (omega_x) and effective sidespin (variable A) with respect to m2 (as % of m) and theta (degrees)

Great, the effect of a point mass along the x-axis of the ball can be quantified in this model! The effect is modest, but has the potential to slightly decrease fly-ball distance relative to an identically struck isotropic ball. But there is one major limitation to the model as currently shown: when the angle theta is chosen a priori, the model has no capacity to correct to a more physically stable angle. In fact, along the entire x-axis of the plots in Figure 5, where m2 = 0, the ball should be completely isotropic and therefore no precession would occur; a small initial theta would likely be damped out over a small number of time steps. In addition, the contours of constant omega_x in Figure 5a curve in the opposite sense from what might be expected: increasing m2 should lead to more pronounced precession. On the other hand, this very simple model does not take into account the possible effects of torque-induced precession caused by gravity (extending the effect of mass anisotropy alone), nor does it account for additional drag on a precessing ball. More study is needed to determine whether precession can have a considerable impact on fly-ball distance; however, unlike the sometimes-empty calls for “further exploration” of minimally promising leads in academic journal articles, I intend to carry out that investigation.

All of these limitations stem from the fact that, without outside data to constrain the physics of precession as it applies to baseball, the problem we are trying to solve with this simple model is ill-posed: there is no unique solution for a given set of initial conditions. Luckily for us, we live in the Statcast age, where the position, velocity, and spin of the baseball are all continuously measured (if not fully publicly available). In addition to the benefits gained from Statcast data, this problem can be further constrained by experimental data on MLB balls. Finally, an opportunity to put my skills as an experiment-first, computational-modeling-second scientist to use! Stay tuned to these pages for follow-up experiments and data analysis in this vein.

The conspiratorial allure of an intentional ball modification directly induced by Commissioner Rob Manfred is visible in online comment sections far and wide; however, many of the most credible explanations for ball changes involve no intent on the Commissioner’s part and may simply accompany improvements in ball-manufacturing processes. In any case, there are likely multiple facets to the current home-run surge. Ball trajectory effects due to precession have traditionally been ignored to simplify the problem at hand; this initial exploration shows that, given the difficulty of the problem, that was likely a good trade-off for the data available in the past. In the future, however, past work in diverse areas from planetary dynamics to the mechanics of other sports can be used alongside new and emerging data streams to help determine the impact of precession on fly-ball distance.

 

Python code used to generate Figures 4-5 can be found at https://github.com/mcclellm/baseball-fg

Special thanks to Prof. Peko Hosoi (MIT) and Dr. Alan Nathan for providing feedback on early versions of this idea, which was born on a scrap of paper at Saberseminar 2017.


Jonathan Lucroy, the Rockie, Is Baseball’s Best Contact Hitter

It’s no secret that Jonathan Lucroy is having a subpar season.

The two-time NL All-Star was projected to be a top-three catcher in 2017.  Before the start of the season, Steamer pegged his value at 3.6 wins above replacement, while ZiPS had him at 3.2.  Instead, his .242/.297/.338 line and 66 wRC+ in 306 plate appearances as a member of the Texas Rangers produced 0.2 WAR.  No one really expected that.

Lucroy was eventually traded to the Colorado Rockies.  The Rockies, who had the worst catching tandem in baseball, instantly viewed Lucroy as an upgrade, while many other playoff-bound teams would have viewed him as a liability.  With the hitter-friendly environment of Coors Field and poor pitching staffs among the San Francisco Giants and San Diego Padres, the team figured that Lucroy would return to his All-Star form once again.  Although he has not returned to being the power threat that he once was, he has changed his game ever so slightly, such that he might have become the game’s best-hitting catcher.

His basic stat line is not reflective of his plate discipline as a member of the Rockies.  His slash line has gone back up to near his career average (.279/.384/.377), but what is most impressive about him is his actual hitting ability.  Always a good contact hitter, he has changed his game to be more selective, make more contact, and put the ball in play.  His 92 percent contact rate ranks first in baseball since the trade, and his 88 percent contact rate on pitches outside the strike zone also ranks first.  The result: a high walk rate (12.3 percent) and few strikeouts (6.3 percent of plate appearances).  All of this while swinging at fewer pitches outside the strike zone (18.6 percent) and swinging less in general (38 percent).  You may be asking, “Why isn’t he leading the league in hitting with numbers like that?”  Well, the answer is rather simple.

While he is making more contact than anyone in baseball, most of his balls in play are hit right at the defense.  This season, he is hitting more ground balls than ever before.  As a Rockie, 50 percent of the balls he has put in play have been ground balls, well above his career average of 42.8 percent.  As a result, he has hit fewer fly balls (28.7 percent), which has led to fewer home runs (3.2 percent HR/FB).  This explains his lack of power this year.

He has hit the ball in the wrong place more this season than in any other.  For his career, Lucroy has had a tendency to drive the ball up the middle, and that has not changed much this season, but he has hit the ball softer than in any previous season.  His average exit velocity (85.0 miles per hour) is more in line with middle infielders and outfielders than catchers.  In fact, he has the fourth-slowest average exit velocity among all qualified catchers.  His average exit velocity last season was 87.6 miles per hour, and it was 88.6 in 2015.  Without the wheels of a speedy outfielder or infielder capable of beating out a ground ball (or at the very least forcing the defense to rush the throw), a ground ball for Lucroy is as good as an out.  As the saying goes, “baseball is a game of inches,” but it’s a game of miles per hour, too.

Fewer ground balls are going through the holes in the infield, and fewer ground balls are becoming hits.  His batting average on balls in play as a Rockie is similar to his career average (.308 as a Rockie and .306 for his career), but his RBBIP (the percentage of balls in play that go for a hit or an error) is .318.  While it is above league average, it is well below his RBBIP numbers from both of his All-Star seasons and from 2012, when he hit .320.  Has Lucroy been entirely unlucky with his balls in play?  No; pitchers have pitched to him largely down and away, which has resulted in a horrible contact percentage on those pitches, and he has also regressed slightly in every season since 2015.  But if Lucroy can keep his contact percentage up, hit fewer ground balls, and stay selective at the plate, then he could be one of the best-hitting catchers in the game again.