Archive for Research

The Evolution of Stealing Bases at the College Level

Since the end of the BESR era, there has been a downward trend in runs per game, home runs per game, and stolen bases per game in college baseball. Since the introduction of flat-seam balls, home runs per game and runs per game have been on an upward trend. Neither of these rule changes would seem to have any impact on stolen bases per game, and why would they? Analytics suggests that stealing bases is not worth the risk. I still believe there is value in stealing bases in today’s game, and that its decline has hurt teams’ performance, especially squads that are at a disadvantage to Power 5 Conference teams. Programs such as Wright State, UCF, UConn, and Campbell are able to stay competitive year after year by implementing the run game in their offense.

In 2018, 38 of the top 50 teams in stolen bases had a record above .500, while 38 of the bottom 50 teams in stolen bases had a record below .500. Of the 35 non-Power 5 teams in the 2018 NCAA Tournament, 14 ranked in the top 50 in stolen bases.



Does Warm Weather Create Better Players?

My high-school-aged son sits at home yet again. Why? Because another of his baseball games has been canceled due to the wet and cold Ohio spring, and my thoughts turn again to our days playing baseball in Florida. Before we moved to this less-agreeable northern climate, it was a rarity to have a game canceled due to weather. Not only that, but games were scheduled year-round, which of course meant more baseball on the calendar. This situation reminded me of the familiar equation known to baseball fans:

Good weather leads to more playing.
More playing means better players.

But is this true? After all, it’s well known that the best player in baseball, Mike Trout, is from cold-weather New Jersey. Many quickly point to the fact that California, Texas, and Florida are at the top of the list for states with the most MLB draftees, but they’re also the three most populous states. Perhaps proportionally they don’t stack up to colder states after all.

I decided to look at the data from the last two drafts — 2017 and 2018 — to see if there is a relationship between a state’s average temperature and how well its players do in the draft. Do warmer-weather states really produce more MLB draftees than average?

To do this, I first gathered population data for each state to determine what percentage of the overall US population it contains. Then I did the same for each state’s share of MLB draftees. Finally, I compared those two figures and determined the percentage difference between the state’s population proportion and its draft proportion. I call this figure the “Draft Difference”.

For example, let’s say State X makes up 10% of the US population, but State X’s draft class makes up only 8% of the overall class. Its Draft Difference is calculated as:

(Draft-Population)/Population = Draft Difference

In this case,

(8-10)/10 = -.20 = -20%

A state with 10% of the US population should, all things being equal, contribute 10% of all players in an MLB draft. But, in this case, State X did 20% worse than should be expected just from its population size.
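The Draft Difference calculation can be sketched in a few lines of Python. This is only an illustration of the formula above, not the code behind the article; the function names and the dictionary inputs (state name mapped to population or to number of draftees) are assumptions made for the example.

```python
def draft_difference(population_share, draft_share):
    """Return (Draft - Population) / Population as a fraction (-0.20 means -20%)."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return (draft_share - population_share) / population_share


def draft_differences(state_population, state_draftees):
    """Compute the Draft Difference for every state.

    Both arguments are hypothetical dicts keyed by state name; a state with
    no draftees may simply be missing from state_draftees.
    """
    total_population = sum(state_population.values())
    total_draftees = sum(state_draftees.values())
    result = {}
    for state, population in state_population.items():
        population_share = population / total_population
        draft_share = state_draftees.get(state, 0) / total_draftees
        result[state] = draft_difference(population_share, draft_share)
    return result


# State X from the example: 10% of the population, 8% of the draft class.
print(f"{draft_difference(10, 8):+.0%}")  # prints -20%
```

Because the formula is a ratio, it does not matter whether the shares are passed as percentages (10 and 8) or as fractions (0.10 and 0.08), as long as both use the same scale.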


Pitch Selection and the 3 Pitch Paths Tool

Pitch selection is like Cold War game theory.

The pitcher/catcher (battery) and the hitter are trying to balance a guessing game of what their counterpart is thinking with their own capabilities to develop a decision or expectation about the next pitch thrown.

The battery is trying to strike the delicate balance of a pitch that will result in a strike or an out (usually by being put into play) and give the hitter the least opportunity to get on base. The hitter is trying to anticipate that decision to maximize their ability to react successfully. This becomes circular, since the hitter’s ability to anticipate correctly improves their ability to get on base, which changes the calculus and pitch decision for the battery, which changes the hitter’s ability to anticipate correctly. It’s just like the nuclear stand-off of the Cold War: a low-and-inside slider hit into the gap, or a Soviet Sarmak from Siberia shot down by Star Wars lasers. Same thing, right?

Pitcher: I should throw this.

Hitter: I will anticipate this.

Pitcher: Then I should throw that.

But it’s not – because baseball is fun and the Cold War was humans (not) trying to murder each other by the millions. Instead let’s say pitch selection is just like keeping secrets from your Friends:

Given this stand-off of anticipation, the battery can take one of two approaches:

1.) Complete randomness, or…

2.) Sequencing pitches that build on each other to keep the hitter off balance.

This is the old pitching-coach speak of “changing the hitter’s eye level, keeping him on his heels, and mixing speeds.”


A Peek into the Astros’ Secret Sauce for Pitching

The Franklin Institute is a science and research museum located in Philadelphia, Pennsylvania. Among its many draws are a giant heart you can walk through, the SportsZone where you can sprint the 40-yard dash and compare your time to professional athletes, and a Changing Earth exhibit made entirely of sustainable materials that focuses on the ways the planet has transformed over time. Through all of that, plus rotating feature exhibits, it’s easy to lose sight of a tried and true experiment: The Ruler Drop Test.

If you never performed the experiment in middle school, the Ruler Drop Test is exactly as it sounds. Take a ruler — or, in the case of the Franklin Institute, a yardstick — and hold it vertically between your index finger and thumb on your dominant hand, about one-fourth from the bottom. Then release it and see where you can catch it. The shorter the distance between where you let go and where you catch it, the faster your reactions are. Science!

It’s a simple experiment, but it is illustrative. And because it’s centered on vertical drop and expectations, it could help us understand how the Houston Astros have used advanced technology and data to tweak pitchers’ repertoires to reach new levels of success.


Introducing WPA-Win: A Better Pitcher Decision Statistic

Baseball fans have seen it time and again: a starting pitcher will twirl a masterpiece, but because his team doesn’t score, he’ll be tagged with a loss. Or a reliever will come into a game, pitch to one or two batters, and end up with the win.

The vagaries of assigning wins and losses to pitchers are a well-known irritant to serious baseball fans (though perhaps not to old-timers like Bob Costas or John Smoltz). Here is the pitching decision statistic explained:

The winning pitcher is defined as the pitcher who last pitched prior to the half-inning when the winning team took the lead for the last time.

The losing pitcher is the pitcher who allows the go-ahead run to reach base for a lead that the winning team never relinquishes.

Often timing — particularly the timing of a team’s offense — affects the statistic more than a pitcher’s actual contribution to his team’s win or loss. In other words, the decision frequently fails to reflect which pitcher made the biggest difference for the winning team (or was most detrimental for the losing team). In these cases, it simply tags the pitcher lucky or unlucky enough to pitch at a certain time in the game.

In an effort to create a more accurate stat to reflect a pitcher’s contribution to his team’s win or loss, I’d like to propose new stats, which I’ll call the “WPA-Win” and “WPA-Loss.” Let’s start with the WPA-Win:

The “WPA-Win” is given to the pitcher on the winning team with the highest WPA for that game.

I’ll address how to calculate the “WPA-Loss” (which is more complicated) later in the article. For now, we’ll just assume it goes to the pitcher on the losing team with the lowest WPA.
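As a rough sketch of the rule as stated so far, the decisions for a single game could be assigned like this. The function name, the tuple layout, and the sample pitchers and WPA values are all made up for illustration, and the WPA-Loss here uses only the simplified "lowest WPA on the losing team" assumption, not the fuller method the article goes on to describe.

```python
def wpa_decisions(pitcher_wpa, winning_team):
    """Return (wpa_win_pitcher, wpa_loss_pitcher) for one game.

    pitcher_wpa is a hypothetical list of (pitcher, team, wpa) tuples covering
    every pitcher who appeared. If two pitchers share the same WPA, the one
    listed first gets the decision (an arbitrary choice for this sketch).
    """
    winners = [p for p in pitcher_wpa if p[1] == winning_team]
    losers = [p for p in pitcher_wpa if p[1] != winning_team]
    if not winners or not losers:
        raise ValueError("need at least one pitcher from each team")
    wpa_win = max(winners, key=lambda p: p[2])[0]   # highest WPA on the winning team
    wpa_loss = min(losers, key=lambda p: p[2])[0]   # lowest WPA on the losing team
    return wpa_win, wpa_loss


# Invented game: the home team wins.
game = [
    ("Home starter", "HOME", 0.35),
    ("Home closer", "HOME", 0.08),
    ("Away starter", "AWAY", 0.10),
    ("Away reliever", "AWAY", -0.42),
]
print(wpa_decisions(game, "HOME"))  # ('Home starter', 'Away reliever')
```

In this invented game the away starter pitched well enough to post a positive WPA, so the WPA-Loss lands on the reliever who actually did the damage, regardless of who was on the mound when the lead changed hands.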


You Wouldn’t Have Noticed If MLB Had Ties in 2018

There are a few articles, including one by Travis Sawchik, arguing that tie games might not be as bad for baseball as you think. The truth is that not only would ties have had no impact on who reached the postseason in 2018, but they would have shaved four minutes off the average game time.

Using regular expressions to parse box score data from Retrosheet, I’ve looked at how the 2018 season would’ve been different without extra innings. Here’s a look at the postseason standings as they were compared to how they would’ve looked with ties (scored 3 points for a W, 1 point for a T, and 0 for an L):

With ties, the 2018 postseason still has the same cast of characters, although the Dodgers and the Rockies would have swapped places in the NL West, causing the Dodgers to go to the Wild Card game.

That’s only looking at 2018. When examining the past five seasons, I found that the postseason implications of tie games would be pretty minimal.

In the plot below, each point represents one team’s season. The X-axis is the number of games that would end in ties and the Y-axis is the number of places a team would’ve moved in their division.

For simplicity, I’m defining postseason implications (PS Implications) as a team gaining or losing a division title or either of the two Wild Card spots under the scoring system described above.
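The 3-1-0 scoring system can be sketched as follows. This is not the parsing code used for the article: it assumes each game has already been reduced to two lists of runs per inning (a hypothetical format, not Retrosheet’s own), and it treats any game level after nine innings as a tie.

```python
REGULATION_INNINGS = 9
POINTS = {"W": 3, "T": 1, "L": 0}


def regulation_result(away_innings, home_innings):
    """Return 'W', 'L', or 'T' from the home team's point of view.

    Only the first nine innings count, so extra-inning runs are ignored.
    The home list may be one inning short when the home team did not bat
    in the bottom of the ninth.
    """
    away = sum(away_innings[:REGULATION_INNINGS])
    home = sum(home_innings[:REGULATION_INNINGS])
    if home > away:
        return "W"
    if home < away:
        return "L"
    return "T"


def standings_points(results):
    """Total points for a list of 'W'/'T'/'L' results."""
    return sum(POINTS[r] for r in results)


# A 10-inning road win becomes a tie once extra innings are dropped.
print(regulation_result([0, 0, 1, 0, 0, 0, 0, 0, 0, 2],
                        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # T
print(standings_points(["W", "W", "T", "L"]))  # 7
```

Rain-shortened games and other edge cases are ignored here; a fuller treatment would need to decide how those fit the scoring system.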



Shifting Expectation: Analysis of the Shift in 2018

The infield shift is a much-maligned defensive strategy, derided as one of the worst analytics-based changes to baseball. Multiple times each season there will be some conversation about banning the shift, and each time pros, ex-players/managers, commentators, and analysts will chip in with their two cents. But for now, the shift is here, and it is as popular (with the fielding teams) as it has ever been. In 2018, just under 26% of all pitches were thrown with some form of infield shift in place; 22% of at-bats were shifted for their entirety, and 30% had at least one shifted pitch.

As you can see, left-handed hitters are far more likely to be shifted than their right-handed counterparts, with 46% of left-handed ABs seeing a shifted pitch versus 19% for righties. This makes rudimentary sense as the shifted players for a left-hander are closer to first base, so they have a greater chance of impacting the play to first and therefore stopping a potential single.

I took players who had 100-plus at-bats in 2018 against both a shifted and a non-shifted infield, then compared the outcomes. There were 132 such players, and their combined number of at-bats was 72,389 (39% of the season’s total). I have split these up into four categories based on the handedness of the batter and the pitcher.



The Reds May Have Andrew Miller 2.0

Andrew Miller has an undeniably nasty slider. As a Red Sox fan, I remember it far too well from the 2016 postseason. Big Papi’s farewell tour didn’t seem all that fair when you consider the way the Red Sox ran into the buzz-saw that was Miller and the Cleveland Indians. Sure, I’m grateful for Miller helping the 2013 version of the Red Sox win a third world title since 2004, but come on Andrew, you had to ruin Papi’s goodbye?

With Miller’s recent signing with the St. Louis Cardinals, I found myself exploring his FanGraphs page. I stumbled upon this article, Andrew Miller on the Evolution of his Slider, and I instantly began to wonder if other pitchers had similar experiences developing their sliders in the 2018 season. The first step in this analysis was to evaluate the evolution of Miller’s slider.

What jumps off the page is the change in velocity. Miller’s slider gained 4.6 mph from 2011 to 2012, then another 3 mph from 2012 to 2013. This in large part had to do with Miller moving from a starting role to a relief role during his time with the Red Sox. Even so, an increase in velocity that drastic shows not only a pitcher’s willingness to adapt, but also his ability to adapt. By observing Miller’s slider splits, we see that ability to adapt almost immediately.



Advocating For A Different Type of Swing Change

When Statcast was launched, we were graced with incredible new stats such as Exit Velocity and Launch Angle, which revolutionized how we evaluate hitting. This new information confirmed the obvious, such as that Giancarlo Stanton hits missiles, but it also gave us a new breed of hitter. Daniel Murphy, Justin Turner, J.D. Martinez, and others looked at the data and made adjustments that started maximizing their power outputs. The standard evaluation method has become to look at EVs mixed with LAs to determine who is one tweak away from stardom. Hitting is a complex beast, with pitchers throwing 95-plus with nasty hooks to go with shifting defenses. Ultimately, a hitter is looking to produce solid contact regardless of where the ball goes. The goal of this analysis is to identify hitters who have an inefficient spray chart and see how they could optimize their profile by hitting more balls in a different direction to maximize production. Luckily, with Statcast, we can now try to find these answers.

To do this analysis, I used Baseball Savant to gather 2018 Exit Velocity and xwOBA to Pull Side, Straight Away, and Oppo Side for all hitters with at least 50 plate appearances. I then used FanGraphs to pull the 2018 data for Pull%, Mid%, and Oppo% to discern how often a hitter attacks each field. I used 50 PAs as a filter since this is about where exit velocities become stable, and it helps weed out pitchers and other noise. This does create gaps in the data because some players didn’t register 50 PAs in a given batted-ball direction. This dataset gives us the ability to look at how hard a hitter hits the ball to a field, what their expected damage (xwOBA) to that field was, and how often they went that way.

The first category I looked at was players who could use the opposite field more often. To do this, I looked at players who had an above-average Oppo Side xwOBA and a below-average Oppo%. I used exit velocities to each field as a proxy to justify the directional swing change.
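The opposite-field filter described above can be sketched like this. The field names, the sample hitters, and their numbers are invented for illustration, and "average" is assumed here to mean the simple mean across the hitters in the sample, which may not be the baseline the article actually used.

```python
from statistics import mean


def oppo_candidates(hitters):
    """Return names of hitters with above-average Oppo xwOBA and below-average Oppo%.

    hitters is a hypothetical list of dicts with the keys
    'name', 'oppo_xwoba', and 'oppo_pct'.
    """
    avg_xwoba = mean(h["oppo_xwoba"] for h in hitters)
    avg_pct = mean(h["oppo_pct"] for h in hitters)
    return [
        h["name"]
        for h in hitters
        if h["oppo_xwoba"] > avg_xwoba and h["oppo_pct"] < avg_pct
    ]


# Invented sample, not real 2018 data.
sample = [
    {"name": "Hitter A", "oppo_xwoba": 0.400, "oppo_pct": 20.0},
    {"name": "Hitter B", "oppo_xwoba": 0.300, "oppo_pct": 30.0},
    {"name": "Hitter C", "oppo_xwoba": 0.340, "oppo_pct": 28.0},
    {"name": "Hitter D", "oppo_xwoba": 0.420, "oppo_pct": 35.0},
]
print(oppo_candidates(sample))  # ['Hitter A']
```

Hitter D does more damage the other way than anyone in the sample but already goes there often, so only Hitter A is flagged as someone who might benefit from using the opposite field more.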


Created Statistic: Run Value

With so many complex statistics out there, I wondered if there was an easier way to project winning percentage or runs, a way that is simple yet more complex than Bill James’ classic Pythagorean Win Expectancy. To create a statistic like that, I would have to create one comprehensive stat for offense and one for pitching. Ultimately, I came up with the following and named them “Run Value” and “Pitching Run Value,” respectively.

RVal = ((TB + BB - SO) / 4) + RBI + HR

PRVal = (((H + BB - SO) / 4) + HR) x FIP

These two metrics are used for teams. In the batting RVal formula, the higher the better. I tried to get down to the pure number of runs that a player or team produces by using the very relaxed definition of a run being four bases. In the pitching PRVal formula, the lower the better. I did something very similar to the batting stat by trying to get the pure run total. I then put the two stats into the win expectancy formula:

RVALWinExp = RVal^1.83 / ( RVal^1.83 + PRVal^1.83)
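The three formulas above translate directly into Python. This is a sketch for illustration, not the R program mentioned below; the function names, the guard against non-positive inputs, and the sample season totals are assumptions.

```python
def run_value(tb, bb, so, rbi, hr):
    """RVal: offensive Run Value from team batting totals."""
    return (tb + bb - so) / 4 + rbi + hr


def pitching_run_value(h, bb, so, hr, fip):
    """PRVal: Pitching Run Value from team pitching totals allowed."""
    return ((h + bb - so) / 4 + hr) * fip


def rval_win_expectancy(rval, prval, exponent=1.83):
    """Pythagorean-style win expectancy using RVal and PRVal."""
    if rval <= 0 or prval <= 0:
        # A fractional power of a negative number is not a real number,
        # so this sketch refuses non-positive inputs.
        raise ValueError("rval and prval must be positive")
    return rval**exponent / (rval**exponent + prval**exponent)


# Invented season totals, not a real team.
rval = run_value(tb=2400, bb=500, so=1300, rbi=700, hr=200)        # 1300.0
prval = pitching_run_value(h=1400, bb=500, so=1400, hr=180, fip=4.0)  # 1220.0
print(round(rval_win_expectancy(rval, prval), 3))
```

As with the standard Pythagorean formula, equal RVal and PRVal give an expectancy of exactly .500, and the exponent controls how quickly the expectancy moves away from .500 as the two diverge.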

I then ran a program in R to see how closely this stat correlates to actual team win percentage for all teams from the 1998 season through the 2018 season. In addition, I tested to see how Bill James’ win expectancy formula correlates to team win percentage over the same period of time. The results are below.