Combining Arsenal Scores and Stuff to Evaluate Pitcher Performance

Introduction

The Arsenal score is a metric that examines how effective a pitch currently is, or how effective it could be. It is compiled from z-scores (a statistical measure of how many standard deviations a value sits above or below the mean) of ground-ball and swinging-strike rates (Sarris, 2016). Eno Sarris put this metric together to see which players might be on the verge of a breakout, should they resolve control issues, improve their fitness, and last longer in games. Sarris has used the Arsenal score to rank pitchers from the 2015 season, proposing that pitchers like Chad Bettis, Rich Hill, and Raisel Iglesias are on the verge of a breakout.
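As a rough illustration of the z-score idea, here is a minimal Python sketch. The rates are made up, the use of the population standard deviation is an assumption, and combining the two z-scores as an unweighted sum is also an assumption rather than a description of how Sarris builds the score.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / standard deviation for each x in values."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]

# Hypothetical rates for one pitch type thrown by five pitchers.
ground_ball_rate = [0.40, 0.45, 0.50, 0.55, 0.60]
swinging_strike_rate = [0.08, 0.10, 0.09, 0.14, 0.09]

gb_z = z_scores(ground_ball_rate)
whiff_z = z_scores(swinging_strike_rate)

# Assumed combination rule: an unweighted sum of the two z-scores.
pitch_scores = [g + w for g, w in zip(gb_z, whiff_z)]
```

A pitch with a positive score is above the sample average on the two rates combined; a negative score is below it.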

My colleague Dan and I built the Stuff metric for two reasons. The first, which is yet to be completed, was to look at how a pitcher’s stuff could influence their risk of injury. The second was similar to the motivation behind the Arsenal score: finding players who have electric “stuff” yet are a mere tweak away from major-league success. The Stuff metric is developed in a similar fashion to the Arsenal score: we look at the z-scores of a pitcher’s velocity, change of velocity, velocity of breaking pitches, and amount of break (Sonne & Mulla, 2015). However, unlike the Arsenal score, Stuff gives no indication of how these pitches are influencing the hitter, that is, whether they are causing swings and misses or inducing ground balls. In a sense, this is a weakness of the Stuff metric compared to the Arsenal score, but it could possibly be put to use sooner than the Arsenal score as minor-league parks install PITCHf/x systems and other tools for measuring pitch movement and velocity. Using the Stuff metric, we’ve proposed possible 2016 breakout pitchers like Chris Bassitt and Mike Foltynewicz.

These two metrics try to get at similar answers but go about it in different ways. For this analysis, I wanted to see how they could be combined to predict pitcher success.

Methods

I used the Stuff metric calculated for 2015 pitchers (found here) and the Arsenal scores for pitchers in 2015 (found here). In both evaluations, a pitch had to be thrown 100 times to be eligible for further analysis. In total, 138 different pitchers were included in this analysis. To see how each pitching metric (Arsenal score and Stuff) performed on its own, I calculated the R² between the metric and ERA, xFIP, K/9, and WAR. These result values were obtained from FanGraphs. To see how the combined metrics predicted pitcher performance, I used multiple regression analysis and developed a separate equation for each of the FanGraphs result values, using the sum of Arsenal scores and the Stuff value as inputs.
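For readers who want to try something similar, the two-input regression can be sketched in plain Python. This is a minimal sketch, assuming an ordinary least-squares fit with an intercept; the function names are hypothetical, and the data are randomly generated stand-ins, not the 2015 values.

```python
import random
from statistics import mean

def fit_two_inputs(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero only if the inputs are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data, NOT the 2015 values used in the article.
random.seed(0)
n = 138
arsenal = [random.gauss(0.0, 3.0) for _ in range(n)]
stuff = [random.gauss(0.0, 1.0) for _ in range(n)]
xfip = [4.0 - 0.10 * a - 0.15 * s + random.gauss(0.0, 0.4)
        for a, s in zip(arsenal, stuff)]

b0, b1, b2 = fit_two_inputs(arsenal, stuff, xfip)
predicted = [b0 + b1 * a + b2 * s for a, s in zip(arsenal, stuff)]
r2_combined = r_squared(xfip, predicted)
```

Fitting one such equation per outcome (ERA, xFIP, K/9, WAR) would mirror the structure described above, with the caveat that the real inputs and fitting software may differ.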

For further analysis of the combined-metric model, the difference between predicted and actual values was calculated for ERA, xFIP, and K/9. WAR was excluded from this step to allow an equal comparison between pitchers who appeared in different numbers of games.
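The exact formula for the difference is not spelled out here, but the values listed in Tables 2 to 4 appear consistent with a relative difference, (predicted − actual) / predicted. The short sketch below works under that assumption; the function name is hypothetical, and the inputs are the predicted and actual ERA values listed in Table 2.

```python
def relative_difference(predicted, actual):
    """(predicted - actual) / predicted, as a fraction of the predicted value."""
    return (predicted - actual) / predicted

# Predicted and actual 2015 ERA as listed in Table 2.
capuano = relative_difference(4.44, 7.97)  # negative: predicted ERA below actual
greinke = relative_difference(3.45, 1.66)  # positive: predicted ERA above actual

print(round(capuano, 2), round(greinke, 2))
```

Rounded to two decimals, these reproduce the ERA Difference entries for Chris Capuano and Zack Greinke.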

Results

Model Performance

In general, the Arsenal score was a better predictor of pitcher performance than Stuff. Arsenal scores had higher R² values when predicting xFIP, WAR, and K/9, while Stuff had a slightly higher R² value for ERA (Table 1). The new combined model was a better predictor than either metric alone, with the greatest improvement seen for WAR (explained variance rose by 11 percentage points over the better single input, from 0.33 to 0.44).

The combined Arsenal-Stuff model performed best when predicting xFIP, accounting for 46% of the variance in xFIP. Predicted vs. actual values for all result variables can be found in Figure 1.

Table 1. R² values between the input variables (Stuff and Arsenal Score) and the result values (ERA, K/9, WAR, and xFIP). R² values are also presented for the combined model, which uses both Arsenal Score and Stuff as inputs.

| Input | ERA | K/9 | WAR | xFIP |
|---|---|---|---|---|
| Stuff | 0.14 | 0.17 | 0.27 | 0.13 |
| Sum Arsenal | 0.12 | 0.37 | 0.33 | 0.44 |
| Combined Model | 0.19 | 0.41 | 0.44 | 0.46 |
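The gain from combining the two metrics can be read straight off Table 1. The sketch below simply subtracts the better single-input R² from the combined-model R² for each outcome; the dictionary layout and variable names are illustrative choices.

```python
# R^2 values copied from Table 1.
r2 = {
    "ERA":  {"stuff": 0.14, "arsenal": 0.12, "combined": 0.19},
    "K/9":  {"stuff": 0.17, "arsenal": 0.37, "combined": 0.41},
    "WAR":  {"stuff": 0.27, "arsenal": 0.33, "combined": 0.44},
    "xFIP": {"stuff": 0.13, "arsenal": 0.44, "combined": 0.46},
}

# Gain of the combined model over the better single-input model, in R^2 units.
gain = {
    outcome: round(v["combined"] - max(v["stuff"], v["arsenal"]), 2)
    for outcome, v in r2.items()
}
largest = max(gain, key=gain.get)

print(gain, largest)
```

The gains come out to 0.05 for ERA, 0.04 for K/9, 0.11 for WAR, and 0.02 for xFIP, so WAR benefits most from using both inputs.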

Figure 1. Relationships between predicted K/9, ERA, WAR, and xFIP and actual values. All predicted values are determined from a model that uses both Arsenal scores and the Stuff metric.

Player Identification

As a post-hoc analysis, I calculated the difference between predicted values and actual values. For ERA and xFIP, a negative value indicates that the player’s predicted ERA or xFIP was lower than their actual result, which could indicate that the player may perform better in 2016. A positive value may indicate that the pitcher will not have results as favourable in 2016. The interpretation is reversed for K/9, where positive values indicate that the pitcher should be expected to strike out more batters in 2016.
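The sign convention above is easy to get backwards, so here is a small sketch that encodes it. The function name and labels are hypothetical; the logic only restates the reading given in the paragraph.

```python
# For ERA and xFIP lower is better; for K/9 higher is better.
LOWER_IS_BETTER = {"ERA": True, "xFIP": True, "K/9": False}

def outlook(metric, difference):
    """Label a predicted-minus-actual difference using the reading above."""
    if difference == 0:
        return "in line with prediction"
    predicted_is_lower = difference < 0
    if predicted_is_lower == LOWER_IS_BETTER[metric]:
        return "room for improvement"
    return "due for regression"

print(outlook("ERA", -0.80))  # predicted ERA below actual
print(outlook("K/9", -0.62))  # predicted K/9 below actual
```

A negative ERA difference and a positive K/9 difference both land in the "room for improvement" group, which matches how the tables are organised.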

Table 2. The top 10 and bottom 10 predicted ERA errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

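| Group | Rank | Pitcher | ERA Difference | Predicted ERA | ERA | Arsenal Score | Stuff |
|---|---|---|---|---|---|---|---|
| Room for Improvement | 1 | Chris Capuano | -0.80 | 4.44 | 7.97 | 0.19 | -0.62 |
| Room for Improvement | 2 | Bud Norris | -0.74 | 3.85 | 6.72 | 1.15 | 0.81 |
| Room for Improvement | 3 | Keyvius Sampson | -0.67 | 3.92 | 6.54 | 0.11 | 0.89 |
| Room for Improvement | 4 | Hector Noesi | -0.61 | 4.28 | 6.89 | -2.06 | 0.41 |
| Room for Improvement | 5 | Carlos Carrasco | -0.48 | 2.45 | 3.63 | 14.33 | 1.43 |
| Room for Improvement | 6 | David Hale | -0.47 | 4.15 | 6.09 | 2.36 | -0.35 |
| Room for Improvement | 7 | Archie Bradley | -0.46 | 3.97 | 5.80 | 1.51 | 0.38 |
| Room for Improvement | 8 | Matt Garza | -0.45 | 3.88 | 5.63 | -0.92 | 1.25 |
| Room for Improvement | 9 | Matt Moore | -0.38 | 3.92 | 5.43 | 0.90 | 0.66 |
| Room for Improvement | 10 | Michael Lorenzen | -0.38 | 3.90 | 5.40 | -0.59 | 1.10 |
| Due for Regression | 121 | Jerad Eickhoff | 0.29 | 3.76 | 2.65 | 2.05 | 0.85 |
| Due for Regression | 122 | Josh Tomlin | 0.31 | 4.36 | 3.02 | 0.90 | -0.58 |
| Due for Regression | 123 | Jake Arrieta | 0.31 | 2.56 | 1.77 | 7.22 | 2.95 |
| Due for Regression | 124 | Jaime Garcia | 0.33 | 3.63 | 2.43 | 4.14 | 0.67 |
| Due for Regression | 125 | David Price | 0.34 | 3.70 | 2.45 | 1.61 | 1.11 |
| Due for Regression | 126 | Dallas Keuchel | 0.34 | 3.76 | 2.48 | 6.04 | -0.19 |
| Due for Regression | 127 | Brandon Morrow | 0.36 | 4.28 | 2.73 | -1.89 | 0.37 |
| Due for Regression | 128 | John Lackey | 0.38 | 4.46 | 2.77 | -2.30 | -0.04 |
| Due for Regression | 129 | Steven Matz | 0.44 | 4.02 | 2.27 | 1.02 | 0.36 |
| Due for Regression | 130 | Zack Greinke | 0.52 | 3.45 | 1.66 | 3.04 | 1.48 |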

Table 3. The top 10 and bottom 10 predicted xFIP errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

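| Group | Rank | Pitcher | xFIP Difference | Predicted xFIP | xFIP | Arsenal Score | Stuff |
|---|---|---|---|---|---|---|---|
| Room for Improvement | 1 | Allen Webster | -0.40 | 4.30 | 6.02 | -0.95 | -0.95 |
| Room for Improvement | 2 | Archie Bradley | -0.34 | 3.85 | 5.15 | 1.51 | 0.38 |
| Room for Improvement | 3 | Henry Owens | -0.33 | 3.77 | 5.01 | 1.93 | 0.62 |
| Room for Improvement | 4 | Carlos Carrasco | -0.32 | 2.02 | 2.66 | 14.33 | 1.43 |
| Room for Improvement | 5 | Hector Noesi | -0.30 | 4.33 | 5.61 | -2.06 | 0.41 |
| Room for Improvement | 6 | Jarred Cosart | -0.25 | 3.57 | 4.46 | 3.15 | 0.99 |
| Room for Improvement | 7 | Keyvius Sampson | -0.24 | 3.99 | 4.97 | 0.11 | 0.89 |
| Room for Improvement | 8 | Garrett Richards | -0.24 | 3.06 | 3.80 | 6.44 | 1.69 |
| Room for Improvement | 9 | Matt Moore | -0.23 | 3.91 | 4.81 | 0.90 | 0.66 |
| Room for Improvement | 10 | Chi Chi Gonzalez | -0.21 | 4.36 | 5.26 | -1.98 | 0.00 |
| Due for Regression | 121 | Chris Sale | 0.15 | 3.08 | 2.60 | 6.49 | 1.49 |
| Due for Regression | 122 | Joe Blanton | 0.16 | 3.56 | 3.01 | 3.99 | -0.15 |
| Due for Regression | 123 | Jose Quintana | 0.16 | 4.18 | 3.51 | -0.91 | 0.33 |
| Due for Regression | 124 | Dallas Keuchel | 0.16 | 3.29 | 2.75 | 6.04 | -0.19 |
| Due for Regression | 125 | Tyler Duffey | 0.16 | 4.35 | 3.64 | -2.35 | 0.56 |
| Due for Regression | 126 | Clay Buchholz | 0.17 | 3.98 | 3.30 | 0.40 | 0.57 |
| Due for Regression | 127 | Brett Anderson | 0.18 | 4.29 | 3.51 | -2.10 | 0.92 |
| Due for Regression | 128 | Jose Fernandez | 0.19 | 3.24 | 2.62 | 5.38 | 1.33 |
| Due for Regression | 129 | Michael Pineda | 0.19 | 3.65 | 2.95 | 3.07 | 0.26 |
| Due for Regression | 130 | Stephen Strasburg | 0.20 | 3.35 | 2.69 | 4.40 | 1.61 |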

Table 4. The top 10 and bottom 10 predicted K/9 errors. The top 10 represents pitchers who can be expected to have better results in 2016, with the bottom 10 predicted to perform with less success in 2016.

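| Group | Rank | Pitcher | K/9 Difference | Predicted K/9 | K/9 | Arsenal Score | Stuff |
|---|---|---|---|---|---|---|---|
| Room for Improvement | 1 | Tyler Wilson | 0.52 | 6.76 | 3.25 | -0.76 | -0.55 |
| Room for Improvement | 2 | Chi Chi Gonzalez | 0.39 | 6.61 | 4.03 | -1.98 | 0.00 |
| Room for Improvement | 3 | Jose Urena | 0.39 | 6.70 | 4.09 | -1.99 | 0.24 |
| Room for Improvement | 4 | Cody Anderson | 0.38 | 7.01 | 4.34 | -0.47 | -0.12 |
| Room for Improvement | 5 | Scott Feldman | 0.36 | 7.91 | 5.07 | 1.52 | 0.71 |
| Room for Improvement | 6 | Jarred Cosart | 0.29 | 8.49 | 6.07 | 3.15 | 0.99 |
| Room for Improvement | 7 | Aaron Sanchez | 0.26 | 8.09 | 5.95 | 1.25 | 1.37 |
| Room for Improvement | 8 | Archie Bradley | 0.25 | 7.78 | 5.80 | 1.51 | 0.38 |
| Room for Improvement | 9 | Kyle Ryan | 0.25 | 6.39 | 4.79 | -0.85 | -1.42 |
| Room for Improvement | 10 | Allen Webster | 0.25 | 6.54 | 4.94 | -0.95 | -0.95 |
| Due for Regression | 121 | Stephen Strasburg | -0.20 | 9.10 | 10.96 | 4.40 | 1.61 |
| Due for Regression | 122 | Chris Archer | -0.21 | 8.83 | 10.70 | 3.77 | 1.39 |
| Due for Regression | 123 | Tyler Duffey | -0.22 | 6.72 | 8.22 | -2.35 | 0.56 |
| Due for Regression | 124 | Chris Sale | -0.22 | 9.66 | 11.82 | 6.49 | 1.49 |
| Due for Regression | 125 | Ian Kennedy | -0.23 | 7.55 | 9.30 | 0.18 | 0.79 |
| Due for Regression | 126 | Vincent Velasquez | -0.24 | 7.55 | 9.38 | -0.11 | 1.00 |
| Due for Regression | 127 | Nate Karns | -0.27 | 7.01 | 8.88 | -1.35 | 0.54 |
| Due for Regression | 128 | Lance Lynn | -0.28 | 6.70 | 8.57 | -2.27 | 0.45 |
| Due for Regression | 129 | Drew Smyly | -0.34 | 7.75 | 10.40 | 2.16 | -0.17 |
| Due for Regression | 130 | John Lamb | -0.62 | 6.49 | 10.51 | -2.09 | -0.24 |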

Discussion

This new model, which incorporates both the Stuff metric and the Arsenal score, improves predictions of ERA, xFIP, K/9, and WAR. By combining the two metrics, the model captures both the action of a pitch and the pitcher’s ability to induce swings and misses and ground balls.

Examining the player rankings to determine which pitchers are under-performing or over-performing relative to the new model’s predictions turns up some interesting names. Carlos Carrasco appears to be due for improvement based on both ERA and xFIP. Matt Moore is slowly returning from injury, but could see improvements in 2016 based on his Stuff and Arsenal scores.

While pitchers like Zack Greinke, David Price, and Dallas Keuchel appear on the list of pitchers who could see regression in 2016, this is due more to the fact that they had otherworldly, perhaps outlier, seasons than it is a commentary on them pitching above their ability. Zack Greinke has gone on the record saying that his 2015 season was an outlier, and that he “may not actually be that good” (Rodgers, 2016). For Blue Jays fans, it is exciting to see that Aaron Sanchez’s stuff predicts a better K/9 next season, though it remains to be seen whether he will pitch as a starter or reliever.

This model, much like the previous evaluations of Stuff and Arsenal scores, does not factor in control, deception, or pitch sequencing. While model performance is strong, more than 50% of the variance remains unexplained even for the best-predicted outcome, xFIP (R² of 0.46). Pitching is complicated, and to achieve better predictions, models will need to grow increasingly complicated.

Conclusion

The combined Stuff/Arsenal score model improves predictions of ERA, xFIP, K/9, and WAR over either metric on its own. This model was used to identify possible candidates for improvement and regression in the 2016 season. Future work should include more complicated measures to account for control, deception, and additional game factors.

References

Rodgers, J., 2016. Zack Greinke on furthering his 2015 domination: ‘I’m probably not that good’. Retrieved from: http://www.sportingnews.com/mlb-news/4695603-zack-greinke-stats-diamondbacks-projection-cy-young-chances, on February 21, 2016.

Sarris, E., 2016. The Change: Arsenal Scores. Retrieved from: http://www.fangraphs.com/fantasy/the-change-arsenal-scores/, on February 2, 2016.

Sonne, M.W., and Mulla, D., 2015. Revisiting the “Stuff” Metric. Retrieved from http://www.mikesonne.ca/baseball/22/, on December 21, 2016.

Additional Information

Difference between predicted and actual values – all pitchers included in the analysis.





Ergonomist (CCPE) and Injury Prevention researcher. I like science and baseball - the order depends on the day. Twitter: @DrMikeSonne
