FIP, xFIP, SIERA are all very good ERA estimators, and their predictability is well documented. It is well known that SIERA is the best ERA estimator over samples that occur from season to season, followed very close by xFIP, with FIP lagging behind. FIP is best at showing actual performance though, because is uses all real events (K, BB, HR). Skill is commonly best attributed to either xFIP or SIERA. ERA is also well known to be the worst metric at predicting future performance, unless the sample size is very large <500IP with the pitcher remaining in the same or a very similar pitching environment.
FIP, xFIP, and SIERA are supposed to be Defense Independent Metrics, and they are. Well, they are independent of field defense, but there is one small error in the claim of defense independent. K’s and BB’s are not completely independent of defense. Catcher pitch framing plays a role in K’s and BB’s. Catchers can be good or bad at changing balls into strikes and this affects K’s and BB’s. Umpire randomness and umpire bias also play a role in K’s and BB’s. It is unknown how much of getting umpires to call more strikes is a skill for a pitcher or not. Some pitchers are consistent at getting more strike calls (Buehrle, Janssen) or less strike calls (Dickey, Delabar), but for most pitchers it is very random (especially in small sample sizes). For example Jason Grilli was in the top 5% in 2013 but was in bottom 10% in 2012.
I wanted to come up with another ERA estimator that eliminates catcher framing, umpire randomness and bias, and eliminates defense. I took the sample of pitchers who have pitched at least 200IP since 2008 (N=410) and analyze how different statistics that meet this criteria affect ERA-. I used ERA- since it takes out park factors and adjusts for the changes in the league from year to year. I looked at the plate discipline pitchf/x numbers (O-Swing, Z-Swing, O-Contact, Z-Contact, Swing, Contact, Zone, SwStr), the six different results based off plate discipline (zone or o-zone, swing or looking, contact or miss for ZSC%, ZSM%, ZL%, OSC%, OSM%, OL%), and batted ball profiles (GB%, LD%, FB%, IFFB%). *Please note that all plate discipline data is PitchF/X data, not the the other plate discipline on FanGraphs, this is important as the values differ*
The stats with very little to absolutely no correlation (R^2<0.01) were: Z-Swing%, Zone%, OSC%, ZSC%, ZL% (was a bit surprised as this would/should be looking strike%), GB%, and FB%. These guys are obviously a no-no to include in my estimator.
The stats with little correlation (R^2<0.1) were: Swing%, LD%, and IFFB%. I shouldn’t use these either.
O-Contact% (0.17), Z-Contact%, (.302), Contact% (.319), OSM% (0.206), and ZSM% (.248) are all obviously directly related to SwStr%. SwStr% had the highest correlation (.345) out of any of these stats. There is obviously no need to include all of the sub stats when I can just use SwStr%. SwStr% will be used in my metric.
OL% (0.105) is an obvious component of O-Swing% (0.192). O-Swing had the second highest correlation of the metrics (other than the components of SwStr%). I will use it as well. The theory behind using O-Swing% is that when the batter doesn’t swing it should almost always be a ball (which is bad), but when the batter swings, there are a two outcomes, a swing and miss (which is a for sure strike) or contact. Intuitively, you could say that contact on pitches outside the zone is not as harmful to pitchers as pitches inside the zone, as the batter should get worse contact. This is partially supported in the lower R^2 for O-Contact% to Z-Contact%. It is more harmful for a pitcher to have a batter make contact on a pitch in the zone, than a pitch out of the zone. This is why O-Swing is important and I will use it.
Using just SwStr% and O-Swing%, I came up with a formula to estimate (with the help of Excel) ERA-. I ran this formula through different samples and different tests, but it just didn’t come up with the results I was looking for. The standard deviation was way too small compared to the other estimators, and the root mean square error was just not good enough for predicting future ERA-.
I did not expect/want this estimator to be more predictive than xFIP or SIERA. This is because xFIP and SIERA have more environmental impacts in them that remain fairly constant. K% is always a better predictor of future K% than any xK% that you can come up with. Same with BB% Why? Probably because the environment of catcher framing, and umpire bias remain somewhat constant. Also (just speculation) pitchers who have good control can throw a pitch well out of the zone when they are ahead in the count, just to try and get the batter to swing or to “set-up” a pitch. They would get minus points for this from O-Swing, depending on how far the pitch is off the plate, but it may not affect their K% or BB% if they come back and still strike out the batter.
So I didn’t expect my statistic to be more predictive, but the standard deviation coupled with not that great of RMSE (was still better than ERA and FIP with a min of 40IP), caused me to be unhappy with my stat.
I then started to think about if there were any stats that were only dependent on the reaction between batter an pitcher that are skill based that FanGraphs does not have readily available? I started thinking about foul balls and wondered if foul ball rates were skill based and if they were related to ERA-. I then calculated the number of foul balls that each pitcher had induced. To find this I subtracted BIP (balls in play or FB+GB+LD+BU+IFFB) from contacts (Contact%*Swing%*Pitches). This gave me the number of fouls. I then calculated the rates of fouls/pitch and foul/contacts and compared these to ERA-. Foul/Contact or what I’m calling Foul%, had an R^2 of .239. That’s 2nd to only SwStr%. This got me excited, but I needed to know if Foul% is skill based and see what else it correlates with.
This article from 2008 gave me some insight into Foul%. Foul% correlates well to K% (obviously) and to BB% (negative relationship), since a foul is a strike. Foul% had some correlation to SwStr%, this is good as it means pitchers who are good at getting whiffs are also usually good at getting fouls. Foul% also had some correlation to FB% and GB%. The more fouls you give up, the more fly balls you give up (and less GB). This doesn’t matter however, as GB% and FB% had no correlation to ERA-. Foul% is also fairly repeatable year to year as evidenced in the article, so it is a skill. I will come up with a new estimator that includes Foul% as well.
I decided to use O-Looking% instead of O-Swing%, just to get a value that has a positive relationship to ERA (more O-looking means higher ERA), because SwStr% and O-Swing are negatively related. O-Looking is just the opposite of O-Swing and is calculated as (1 – O-Swing%).
The formula that Excel and I came up with is this: (I am calling the metric TIPS, for True Independent Pitching Skill)
TIPS = 6.5*O-Looking(PitchF/x)% – 9.5*SwStr% – 5.25*Foul% + C
C is a constant that changes from year to year to adjust to the ERA scale (to make an average TIPS = average ERA). For 2013 this constant was 2.68.
I converted this to TIPS- to better analyze the statistic. FIP, xFIP, and SIERA were also converted to FIP-, xFIP-, and SIERA-. I took all pitchers’ seasons from 2008-2013 to analyze. The sample varied in IP from 0.1 IP to 253 IP. I found the following season’s ERA- for each pitcher if they pitched more than 20 IP the next year and eliminated any huge outliers. Here were the results with no min IP. RMSE is root mean square error (smaller is better), AVG is the average difference (smaller is better), R^2 is self explanatory (larger is better), and SD is the standard deviation.
Wow TIPS- beats everyone! But why? Most likely because I have included small samples and TIPS- is based off per pitch, as opposed to per batter (SIERA) or per inning (xFIP and FIP). There are far more pitches than AB or IP so TIPS will stabilize very fast. Let’s eliminate small sample sizes and look again.
|Min 40 IP|
|Min 100 IP|
Now, TIPS is beaten out by xFIP and SIERA, but beats ERA and and is close to FIP (wins in RMSE, loses in R^2). This is what I expected, as I explained earlier K% and BB% are always better at predicting future K% and BB% and they are included in SIERA and xFIP. SIERA and xFIP take more concrete events (K, BB, GB) than TIPS. I didn’t want to beat these estimators, but instead wanted a estimator that is independent of everything except for pitcher-batter reaction.
TIPS won when there was no IP limit, so it obviously is the best to use in smaller sample sizes, but when is it better than xFIP and SIERA, and where does it start falling behind? I plotted the RMSE for my entire sample at each IP. Theoretically these should be an inverse relationship. After 150 IP it gets a bit iffy, as most of my sample is less than 100 IP. I’m more interested in IP under 100 anyhow.
Orange is TIPS, Blue is ERA, Red is FIP, Green is xFIP, and Purple is SIERA. If you can’t see xFIP, it’s because it is directly underneath SIERA (they are almost identical). This is roughly what the graph should look like to 100 IP:
Looking at the graph, at what IPs is TIPS better than predicting future ERA than xFIP and SIERA? It appears to be from 0 IP to around 70 IP.
Here is the graph for 1/RMSE (higher R^2). Higher number is better. This is the most accurate graph as the relationship should be inverse.
The 70-80 IP mark is clear here as well.
I’m not suggesting my estimator is better than xFIP or SIERA, it isn’t in samples over 75 IP, but I think it is, and can be, a very powerful tool. Most bullpen pitchers stay under 75 IP in a season. This means that my unnamed estimator would be very useful for bullpen arms in predicting future ERA. I also believe and feel that my estimator is a very good indicator of the raw skill of a pitcher. It would probably be even more predictive if we had robo-umps that eliminated umpire bias and randomness and pitch framing.
2013 TIPS Leaders with 100+IP
And Leaders from 40IP to 100IP