Author Archive

Help With the Physics Behind PITCHf/x

I’ve been digging into the PITCHf/x data over the past few weeks and stumbled across something I can’t quite figure out. When I first started using the data, I didn’t realize that px and pz are where PITCHf/x maps the final location of the ball; undeterred, I turned to Google to jog my memory on the basic physics formulae that relate time to initial velocity, final velocity, distance and constant acceleration.

Step 1 was to calculate the final velocity of every pitch after it travels the 50 feet from y0 to home plate (y = 0). The formula is simple: SQRT(vy0^2-2*50*ay), i.e. initial velocity squared less 2 * acceleration * distance, with y0 being 50 feet from home plate.

Step 2 was to calculate time based on initial velocity and final velocity. I cross-checked my numbers against Start_Speed and End_Speed (which don’t match up to vy0 for some reason) and got basically the same number.

Step 3 was to calculate xFinal based on Time, ax and x0 (ditto for zFinal). Strangely, my zFinal was a little lower (about .17 feet) than the PITCHf/x pz value and .015 more to the right than the px value. That might mean that they are measuring z and x 50 feet from release point, rather than at home plate.
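The three steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the calculation PITCHf/x itself performs: the function name `plate_crossing` and the `y_end` parameter are made up for this example, the acceleration is treated as constant, and the position formula includes the initial velocity terms (vx0, vz0), which the description above does not mention. Leaving that term out, or measuring at a different distance than the one the px and pz values refer to, are both things worth ruling out before suspecting the data.

```python
import math


def plate_crossing(x0, z0, vx0, vy0, vz0, ax, ay, az, y0=50.0, y_end=0.0):
    """Constant-acceleration estimate of where a pitch is when it reaches y_end.

    Assumes PITCHf/x sign conventions: y shrinks toward the plate, so vy0 is
    negative and ay (drag) is positive. y_end is a hypothetical parameter that
    lets you try a plane other than y = 0.
    """
    distance = y0 - y_end
    # Step 1: velocity toward the plate at y_end, from v^2 = v0^2 - 2*a*d.
    vy_end = -math.sqrt(vy0 ** 2 - 2.0 * ay * distance)
    # Step 2: flight time from the change in velocity, t = (v - v0) / a.
    t = (vy_end - vy0) / ay
    # Step 3: position from x = x0 + vx0*t + 0.5*ax*t^2 (same form for z).
    x_end = x0 + vx0 * t + 0.5 * ax * t ** 2
    z_end = z0 + vz0 * t + 0.5 * az * t ** 2
    return t, x_end, z_end
```

Calling it once with `y_end=0.0` and once with a slightly larger `y_end` shows how sensitive the final height is to where the measurement plane sits.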

I need to know if (a) my math is wrong, (b) pz and px are wrong, or (c) ax and az are wrong.

Any help would be appreciated!


Give Me a Rise

It is well established that having more rise on your four-seam fastball is a good thing. The question then becomes: can we identify the optimal amount of rise compared to the league-average fastball? For the purposes of this analysis, we will look at the swinging-strike rate on all four-seam fastballs thrown in regular-season action since the dawn of the PITCHf/x era.

We in the sabermetrically-inclined community tend to pooh-pooh popular baseball concepts, particularly ones where the science, on the surface, doesn’t appear to jibe with the age-old baseball wisdom. Don’t worry, this is not a DIPS discussion, nor a discussion on a pitcher’s ability to manage contact. I bring up this concept in relation to the term “late life,” as in movement later in the pitch’s trajectory. Physics tells us that the ball will have a very predictable trajectory from the moment it leaves the pitcher’s hand until it reaches the front of the plate. That, however, is merely half the story. There are two important points I want to bring up:

  1. Batters cannot compute vertical trajectory explicitly; they essentially tap into a huge vault of experience telling them how far a pitch will drop based on their experience with pitches of similar velocity.
  2. A hitter’s swing is largely ballistic (very difficult to change mid-swing) and takes about 0.18 seconds to execute. That means that a hitter has roughly 0.2 seconds post-release of the ball to gather information and form an educated guess as to where the ball will end up.

Based on these assumptions, I computed late movement in both the vertical and horizontal directions. I then compared this to the expected vertical movement based on the velocity (more velocity, less drop, obviously). This to me is the optimal way to look at movement, since hitters presumably cannot gather any more information after that point. A great hitter may be able to factor in their knowledge of the pitcher’s ability to get rise on the fastball, but they are fighting their memories of all the other fastballs they’ve seen, so it is more difficult than you would think.
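One way to read "late movement" is the change in height between the 0.2-second decision point and the moment the ball reaches the plate, compared against the average change for fastballs of similar velocity. The sketch below takes that reading; it is an assumption about the method, not a record of the exact calculation used here. The function names are hypothetical, acceleration is treated as constant, and rise is reported as positive, which is the opposite sign from the graphs (where -.6 means 0.6 feet more rise).

```python
def height_at(t, z0, vz0, az):
    """Height of the ball t seconds after release, constant acceleration."""
    return z0 + vz0 * t + 0.5 * az * t ** 2


def late_vertical_change(z0, vz0, az, t_plate, t_decision=0.2):
    """Change in height from the decision point until the ball reaches the plate.

    Negative values mean the ball dropped over that window.
    """
    return height_at(t_plate, z0, vz0, az) - height_at(t_decision, z0, vz0, az)


def rise_vs_average(pitch_change, average_change):
    """Positive result: the pitch dropped less than the comparison average."""
    return pitch_change - average_change
```

Here `average_change` would be the mean of `late_vertical_change` over all four-seamers in the same velocity bucket, which is what makes the number a comparison against what the hitter has learned to expect.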

Which brings us to a very interesting graph: The height and colours in the histogram reflect the magnitude of the swinging-strike rates, shown in sequential order of velocity. If you scroll all the way to the bottom, you’ll see that the center of the histogram is somewhere around -.6, or 0.6 feet more rise than the average four-seam fastball when looking at the pitch 0.2 seconds after release until it crosses home plate.

We see a very clear normal curve, with more “normal” at higher n. Thus we can now compute the value of rise in a four-seam fastball, as distributed by a normal curve centered around 0.6 feet above the mean drop. Not really a stats guy, so not sure how to do that exactly. What I find interesting is that the 7 inches or so of rise is pretty consistent across the velocity spectrum. I’m not sure why it peaks at this point, though I would surmise that it’s probably the sweet spot where the hitter feels like they can make contact, but can’t, as opposed to extreme rise which would freeze the hitter.

This leads us to our last graph (warning: this one scrolls for a while). You’ll see the same graph as above, but you’ll see Whiff%, GB% and HR% stacked one on top of the other.

This actually paints a very intuitive picture. If there is more rise than average, you’ll get swinging strikes. If it drops more than average, you’ll get groundballs, and if it drops about what you’d expect, you’ll get some groundballs, but also homers. Ignore the SSS noise with homers at the higher velocities. Again, what is interesting with the GB% and Whiff% histograms is how consistent they are irrespective of velocity. So… if velocity doesn’t impact this analysis, let’s collapse it all into one final graph:

Paints a very clear picture: if your four-seam fastball isn’t getting at least 5 inches of late rise, you are going to be giving up a lot of homers. Note that swing% (swings/total pitches) is normally distributed around a mean of .2 feet of rise and appears to track pretty closely to HR%, implying that hard contact is not affected within 1 standard deviation.

Looking forward to the feedback.


Vertical Command – Or Lack Thereof

I read a great book by Mike Stadler called The Psychology of Baseball. In it he explains that it is far more difficult for humans to control where a ball ends up vertically (due to the need for advanced spatial reasoning) than horizontally. You can find his discussion starting on page 86.

I’m going to show you three pictures which will illustrate this quite well. The data includes all pitches thrown in regular-season games since 2010. The first is a heat map of sorts which maps vertical distance from the center of the zone (from the PITCHf/x fields sz_top and sz_bottom) on the y axis and velocity on the x axis. What we see quite clearly is that it is *much* better to throw a four-seam fastball up in the zone than down in the zone, almost irrespective of velocity. In fact, a 92 MPH four-seam fastball thrown 0.8 feet above the center of the zone will get about 13% swings and misses; a 98 MPH four-seam fastball thrown below the center of the zone will get 12% swings and misses. Behold the graph, from a fan:

Four-Seam Fastball, Depth x Velocity

The question then becomes, if a pitcher throws the ball up in the zone, how will the probability of a HR change? This brings us to picture #2, where we have the same x and y axes (apparently that’s the plural of axis, thanks Google), but instead we have HR% (# of HRs/Total Pitches). I’ve removed pitches of 99+ MPH from the graph as they were displaying SSS noise.

HR% by Depth and Velocity

So interestingly, if you look at the totals on the right, they show that HRs are NOT hit on high fastballs, but rather on fastballs closer to the heart of the zone (vertically). In fact (and a story for another day) there is a 97% R-squared between distance from the center of the zone and HR%. As an aside, this also reproduces other research which indicates that faster fastballs yield fewer home runs. The trend is also quite linear (I don’t have a computed R-squared for that, but that’s old news anyway).

Now, if you are far more likely to get a swinging strike and you aren’t putting yourself at risk for a home run by throwing up in the zone, then a distribution of four-seam fastballs should show a higher proportion of four-seamers up in the zone, ideally right at the top, 0.8 to 1.0 feet above the center of the zone, where whiffs are plentiful and HRs are scarce. Beware SSS in some of the higher velocities, but note that a 95 MPH fastball only .4 feet above the center of the zone will yield more HRs than an 88 MPH fastball thrown at the top of the zone (the 95 MPH fastball will still yield more whiffs, but it just goes to show how important command is). This is what we actually see:

A nearly uniform distribution across all velocities, slightly skewed to below the center of the zone. I’m not ready to conclude that pitchers are not capable of pitching up in the zone with four-seam fastballs; it may just be old-school “pitch down in the zone” thinking. I still find it astonishing how consistent the data is across the velocity spectrum. It almost appears to me that if a pitcher can simply pitch higher in the zone with a four-seam fastball, they can make their stuff play up a lot, sort of like MadBum:

Still not pitching at the top end of the zone, but definitely skewed higher, with his distribution centered around .3 feet above the heart of the zone.


GB% by Pitch Type and Location

Red = High GB% rate (ground balls / total pitches)
Yellow = Medium ; Green = Low

The size of the circle also represents the magnitude.

Numbers are in feet, with -X being inside (handedness neutral) and Z being height in feet above the center of the strike zone (as per the PITCHf/x strike zone top and bottom). The X is flipped for left-handed batters. After I’ve published a few of these, I’ll work on publishing a version to Tableau Public, though I’m not sure how it will perform given the huge underlying data set.
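For anyone wanting to reproduce the axes, the coordinate transform described above might look something like this. It is a minimal sketch under assumptions: the function name is made up, `stand` is taken to be "L" or "R" for the batter's side, and px is assumed to be measured from the catcher's view, so that negative values are inside to a right-handed batter.

```python
def zone_relative_location(px, pz, sz_top, sz_bottom, stand):
    """Return (x, z): handedness-neutral horizontal location and height
    above the center of the batter's strike zone, both in feet."""
    # Center of the zone is the midpoint of the PITCHf/x top and bottom.
    zone_center = (sz_top + sz_bottom) / 2.0
    z = pz - zone_center
    # Flip X for left-handed batters so that negative always means inside.
    x = -px if stand == "L" else px
    return x, z
```

Binning the resulting (x, z) pairs into a grid and counting ground balls over total pitches per cell would then give the circles in the charts.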

Some observations:

1) The cutter, which appeared to have two hot zones for swings and misses, appears to have only one hot zone for groundballs, of about .5 feet to 1 foot below the center of the zone and between .4 feet away and .4 feet in from the center of the plate. In the previous post we saw that as you went farther away from the plate horizontally and about .5 foot lower, you get swinging strikes.

2) Changeups down and away get groundballs. They also get swings and misses. Groundbreaking stuff here…

3) Two-seamers and sinkers have a very large area that gets groundballs (another shocker), though what surprises me is how high it starts (almost at the center of the plate). It makes me wonder if I need to double-check my methodology. As you get lower in the zone, you get fewer swings and more takes, so the GB% goes down dramatically.

4) Curveballs only get groundballs if they are in the strike zone when crossing the plate (down and away). If you bury it, you basically trade the GB for a swing and a miss. I’m thinking I need to rebuild this chart with fewer grids, but a bunch of pie charts, to somehow visualize how results morph based on location.

Finally figured out how to get PITCHf/x data into Tableau (used Alteryx to scrape MLB) — having lots of fun and appreciate the feedback!


How to Get a Swinging Strike by Pitch Type and Location

Red = High swinging-strike rate (swing and a miss / total pitches)
Yellow = Medium ; Green = Low

The size of the circle also represents how high the whiff rate is.

Numbers are in feet, with -X being inside (handedness neutral) and Z being height in feet above the center of the strike zone (as per the PITCHf/x strike zone top and bottom).

Some observations (and probably repetition of prior research):

1) Four-seam fastballs are great between 0.8 and 1.4 feet above the middle of the zone and between -.5 and .5 feet across the plate (i.e., if you want to get a swing and a miss on a four-seamer, throw it high and right down the middle). I will have similar views for GB% and HR% soon.

2) Sliders, changeups and curveballs all need to be thrown low in the zone; doesn’t appear to matter inside or outside, though changeups need to be around the plate (or they don’t get swings).

3) There is almost nowhere you can throw a two-seamer to get swings and misses, though down and in and basically high appear to be the best places to get strikes.

 

More to come if you think this is interesting!


Towards an Inside Edge Runs Saved

There is a treasure trove of data sitting on FanGraphs which to my (limited) knowledge is little used. These data are the Inside Edge fielding stats. We have UZR and DRS, but no IERS (Inside Edge Runs Saved), despite the general availability of the data.

UZR essentially guesses, based on batted-ball profile, what the probability is that a play will be made, which given the lack of true batted-ball data will take time to stabilize. Inside Edge has the benefit of stacking each play in a probability bucket. Here is a list of the probabilities by position, probability bucket and year for which we have IE data:

I then took each player’s stats and, based on their position and year, computed the expected number of plays they should have made and compared that to the actual number of plays they made. In other words, a RF in 2014 should make 86% of his plays in the 60-90% range, so if he had 100 plays there and made 92, he made 6 extra plays. Here’s what the 2015 top 30 looks like through that lens:
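The arithmetic behind "extra plays" is simple enough to show directly. This is a small sketch, with a made-up function name and input layout: each bucket is a tuple of chances, plays made, and the league conversion rate for that position, year and bucket.

```python
def extra_plays(buckets):
    """Sum of (plays made - expected plays) across probability buckets.

    buckets: iterable of (chances, plays_made, league_rate) tuples, where
    league_rate is the share of such plays an average fielder converts.
    """
    return sum(made - chances * rate for chances, made, rate in buckets)
```

For the right fielder in the example, `extra_plays([(100, 92, 0.86)])` works out to 92 minus the 86 expected plays, i.e. 6 extra plays. A full season would simply pass in one tuple per bucket.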

Note that IE seems to like Arenado and Longoria a lot more than DRS does; however, the list is pretty consistent with DRS, especially with Simmons and Hechavarria in the 2/3 spots. I didn’t control for team bias in the results, so it may be favouring certain teams (Jays players seem to be getting a large boost: see Martin, Revere, Pillar, Tulo and Donaldson all in the top 30). Go Jays Go! Yankees Suck!

The next step is a little less mathematical, in that I attempt to ascribe an average run value saved based on position. Based on linear weights, a single is worth roughly .5 runs, a double roughly .75 and a triple roughly 1. Thus, a catcher or pitcher can save at most .5 runs on each play they make. Second basemen and shortstops will save .5 on most plays, but will get a bump when they convert a double play. A third baseman will prevent some doubles as well as convert some double plays. Outfielders will be preventing some mix of singles, doubles and triples (and the occasional home run). So, based just on my gut feelings on the matter, I ascribed the following run values to each position:

C/P: .5 Runs Saved

1B/2B/SS: .6 Runs Saved

3B: .65 Runs Saved

OF: .75 Runs Saved
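Turning extra plays into estimated runs saved is then a lookup and a multiplication. The sketch below uses the gut-feel values listed above; the dictionary name, the function name, and the choice to spell out LF/CF/RF separately are illustrative assumptions.

```python
# Gut-feel run value of one extra play, by position (values from the list above).
RUNS_PER_PLAY = {
    "C": 0.5, "P": 0.5,
    "1B": 0.6, "2B": 0.6, "SS": 0.6,
    "3B": 0.65,
    "LF": 0.75, "CF": 0.75, "RF": 0.75,
}


def runs_saved(position, plays_above_expected):
    """Estimated runs saved: extra plays times the positional run value."""
    return RUNS_PER_PLAY[position] * plays_above_expected
```

So a right fielder who made 6 extra plays would be credited with `runs_saved("RF", 6)`, or 4.5 runs, under these weights.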

Based on these values (estimated runs saved), these are the top fielders (catchers excluded) from 2012-2015 and 2015, respectively:

And the Worst (2012-2015 followed by 2015):

 


A Theory and A Challenge

I love this site. It covers the full spectrum of baseball, from classical scouting all the way to the most esoteric of baseball analysis. At times I envy the analytical abilities of our writers, as well as their access to granular data that I likely lack the technical competence to gather. Today, I would like to propose a theory, as well as a challenge to the numerous writers on this site to put the theory to the test. It is also likely that this has been proposed before and answered before, in which case, point me in that direction please.

THE THEORY:

We can measure command by compiling a pitcher’s xISO and xBABIP based solely on where they locate their pitches, in the context of the hitter’s location preferences. In other words, the ability to “pitch to the corners” is only valuable if one is pitching to corners that the hitter can’t get to, which is batter-specific. An 80-command pitcher will be able to minimize the xISO of his pitches, simply by pitching to “cold” areas of the hitter’s strike zone.

There are a few ways to approach this (I’m sure more than three, but I digress). The first question is what sample size to use to estimate the player’s preference within the strike zone. Evidence suggests certain players make rapid adjustments (Trout), which would indicate a SSS would be ideal, whereas other players exhibit strong long-term tendencies (Dozier? just a guess, not founded in data) that would indicate a LSS would be ideal.

The second axis would be to evaluate a player’s effective strike zone, i.e. if we looked at the hitter’s swing probabilities, what type of strike zone would we construct, given only data concerning the hitter’s propensity to swing. We could then tease out whether the pitcher is maximizing the player’s effective strike zone (pitchers only throwing balls to Vladdy Guerrero comes to mind). This analysis may be redundant, as this can probably be captured if we are able to incorporate the third axis:

What are the thresholds for considering a pitch well-located? I.e. if a pitcher throws a ball way outside, but the hitter swings, then this is a well-placed pitch, thus at what probability of swing% is a ball a well-commanded pitch?

THE CHALLENGE

Test it! (or show me where this has already been fully fleshed out.) I’ve always wondered if there was a way to build up a command ERA to see if a pitcher is able to put it where hitters have to swing but don’t want to and I look forward to reading about it.