2015 Fantasy Bust: Johnny Cueto

I was planning on covering several overvalued starting pitchers in this next article, but after analyzing Reds ace Johnny Cueto, I realized I might have enough material to fill an encyclopedia.


The Effectiveness of the Speed and Movement of a Four-Seam Fastball

Introduction

A few weeks ago I posted a proposal for a regression analysis for an econometrics class I am taking, with the promise that I would post the full analysis when it was complete. Well, it's been completed, and here is the full analysis, as promised. It's a lot of words, so if you don't care much for how a Probit model works or how to perform a t-test, I will go ahead and tell my findings now. I found that the speed of a four-seam fastball does help determine the outcome of the pitch: the faster the pitch, the lower the quality of contact. I also found that the movement of a four-seam fastball is not statistically significant at the 5 percent level: the data do not show a significant difference in outcome between a four-seam fastball with more movement and one with less. This could be because a four-seam fastball just doesn't move that much relative to other pitches, though I'm not sure. Also, the model I created has very low goodness-of-fit measures, which means the speed and movement of a four-seam fastball play only a small part in determining the outcome of the pitch. This makes sense: baseball is a complicated game, and a lot of variables go into determining an outcome. Without adding even more words to this post, below is the paper in its entirety.

It could easily be said that Major League Baseball is in an arms race. Teams have been putting greater emphasis on finding and developing pitchers who can throw a baseball faster than their peers. Indeed, the average velocity of a fastball has increased every year from 2004 to 2013, with a slight downtick in 2014. From 1990 to 1999, 37 pitchers threw 25 percent or more of their fastballs at 95 MPH or faster; in 2013, 149 pitchers did so. From 2003 to 2008, seven pitchers threw a fastball 100 MPH or faster 20 or more times in a season; from 2009 to 2013, 38 pitchers did so. Teams are looking for flame-throwers because they believe the faster a ball travels toward home plate, the harder it is for a hitter to make the type of contact that results in a hit. On the other hand, other, less emphasized factors, such as the amount of movement on a fastball, may also play a role. When a pitcher throws a fastball, it moves, and just as some pitchers can throw a fastball with more velocity than others, some can throw one with more movement. The relationship between movement and contact should mirror the one between velocity and contact: the more movement there is, the harder it is to make good contact.

Given this assumed relationship between velocity, movement, and outcome, I would like to answer the following questions: Is it more difficult to hit a fast four-seam fastball than a slower one? And is a four-seam fastball more difficult to hit the more movement it has? My hypothesis is therefore twofold: a faster pitch will be more difficult to hit than a slower pitch, and the more movement a pitch has, the harder it will be to hit. A pitch counts as difficult to hit if the batter swings and either fails to make contact, resulting in a swinging strike, or makes poor contact that puts the ball in play for an out.

The body of this paper is organized into six sections: the economic model, the econometric model, the data, the procedures of estimation and inference, the empirical results, and the conclusion. The economic model section explains the composition of the independent variables, the dependent variable, and the error term. It also explains the assumptions and provides a general framework for the type of model required for the estimation. The econometric model section lays out the functional form of the economic model by formalizing the variables and creating the equations; it also establishes a method to test the statistical significance of the independent variables. The data section explains how the data was gathered, any issues that had to be resolved, and any hesitations about the quality of the data. The procedures of estimation and inference section describes the tools, software, and specific models chosen to derive the results, why they were chosen, and the characteristics of the model. The empirical results section reports the means of the independent variables, the discrete response profile for the outcomes, the parameter estimates, the interval estimates, the values of the test statistics, and the goodness-of-fit measures; it also puts the parameters into the equations. Finally, the conclusion section analyzes the implications of the empirical results and offers possible explanations for them.

The Economic Model

Independent Variables

A pitcher can throw many types of pitches. He can try to deceive the batter by throwing a pitch with a lot of movement, such as a curveball or slider, or a pitch that is slower than it looks when it leaves his hand, such as a change-up. But no pitcher tries to deceive a hitter with a four-seam fastball; he is simply trying to throw it as hard and as accurately as he can. And this is what teams are searching for: the maximum velocity of a pitcher's four-seam fastball, and the higher the velocity, the better. Even though a pitcher is not trying to induce movement when he throws a four-seam fastball, the ball still moves horizontally and vertically, which can affect the outcome of the pitch, just as velocity can. This means there will be two independent variables: velocity, measured in MPH, and total movement, the sum of the absolute horizontal and vertical movement, measured in inches.

Dependent Variables

The dependent variable is the per-pitch outcome, restricted to outcomes in which the batter attempts to hit the pitch by swinging his bat; this excludes pitches the umpire calls a strike or a ball. These two outcomes are excluded because the batter did not swing, so any effect of the pitch's speed or movement on avoiding contact or inducing poor contact cannot be discerned.

In addition, because the outcomes are per-pitch, walks and strikeouts are excluded as separate outcomes; they are already accounted for at the pitch level. A walk ends on a pitch the batter did not swing at, so that pitch is excluded. A strikeout ends either on a swinging strike, which is already captured by the swinging-strike outcome, or on a called third strike, which is excluded because the batter did not swing his bat.

The included outcomes are: swinging strike, foul ball, ground-out, pop-out, fly-out, line-out, single, double, triple, and home run. The difference between a pop-out and a fly-out is who catches the ball: if an infielder catches a ball in the air, it is a pop-out; if an outfielder catches it, it is a fly-out. Several types of outs are distinguished because each type of out can indicate what type of contact was made. Poor contact is assumed to produce a ground-out or a pop-out, solid contact that still results in an out is assumed to produce a line-out or a fly-out, and contact that does not result in an out is assumed to be good.

From a pitcher's perspective, the outcomes rank from most to least desirable as follows: swinging strike, pop-out, ground-out, fly-out, line-out, foul, single, double, triple, and home run. This ranking also reflects a continuous spectrum of contact from softest to hardest. An argument can be made that a swinging strike does not belong on the spectrum because no contact was made. But the absence of contact can be treated as the lowest quality of contact, which places it at the lowest point on the contact spectrum.

Error Term

The error term will capture the sequencing of the previous pitches, the count, the base-out state, the location of the pitch, and the quality of the defense.

Each pitch will be context neutral; the pitches that preceded it will not be accounted for. This can affect the outcome of the pitch because the absolute speed of the pitch may not matter as much if the previous pitches that a batter has seen in an at bat have been much slower than the four-seam fastball.

The count of the at-bat can affect the outcome of the pitch because batters know that in some counts pitchers are more likely to throw a four-seam fastball; a batter anticipating the four-seam fastball has an advantage. The base-out state can affect the outcome because it can dictate which pitch a pitcher is more likely to throw. The location can also affect the outcome because some locations are harder for a batter to reach with his bat when he swings. Pitchers also generally know of locations where most hitters of a given handedness have difficulty hitting a four-seam fastball, and in those locations the outcome is less sensitive to speed and movement.

The quality of the defense can affect the outcome of the pitch as well because it can turn hits into outs, if the defense is good, or it can turn outs into hits, if the defense is poor. This can cause the ranking of outcomes to be less predictive of the type of contact made for each outcome. For example, a ground ball that gets past an infielder is a single. But the contact made was the type of contact consistent with the contact for a ground-out, not a single. Since a ground-out is ranked third and a single is ranked seventh, the difference in quality of contact between the two outcomes is substantial.

Estimation Methods

Since the dependent variable can take only one of ten possible values, the relationship between the independent and dependent variables is not linear, and Ordinary Least Squares would not be appropriate for our purposes. The best type of model to predict one of the possible outcomes of a pitch, given values of velocity and movement, is a Limited Dependent Variable model. A Limited Dependent Variable model is used when the dependent variable is restricted to a limited set of values; here those values can also be ranked in a meaningful way, so the estimation method must account for both the restriction and the ranking. This model was chosen because the possible outcomes are restricted and discrete, since each pitch can result in only one of ten outcomes, and because the outcomes are ordered by their value to the pitcher. The ranking must also be accounted for because velocity and movement are assumed to influence which type of outcome occurs.

Since the outcomes are also ranked by type of contact, an outcome occurs only if the contact made is greater than the contact required for the outcome just below it on the contact spectrum and less than the contact required for the outcome just above it. For example, if the contact made was greater than the contact required for a ground-out but less than the contact required for a line-out, the outcome would most likely be a fly-out.

This type of reasoning implies interval estimates will need to be created for each outcome. Each interval estimate will have a lower limit and an upper limit; if the value the model calculates, given an initial value of velocity and movement, lies between the upper and lower limit, then the outcome the interval estimate represents will be the outcome to most likely occur.
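The interval rule described above amounts to locating a value among sorted cut points. The Python sketch below is illustrative only: the cut points are placeholders rather than estimates, and `outcome_for` is a hypothetical helper, not part of the SAS output. A value that lands exactly on a cut point is assigned to the higher category, a boundary choice made only for this sketch.

```python
from bisect import bisect_right

# Outcomes ordered by their assigned value (1 = home run ... 10 = swinging strike).
OUTCOMES = ["Home Run", "Triple", "Double", "Single", "Foul",
            "Line Out", "Fly Out", "Ground Out", "Pop Out", "Swinging Strike"]

def outcome_for(value: float, thresholds: list[float]) -> str:
    """Return the outcome whose interval contains `value`.

    `thresholds` holds the 9 increasing cut points between adjacent outcomes.
    A value below the first cut point maps to OUTCOMES[0]; a value above the
    last cut point maps to OUTCOMES[-1].
    """
    if len(thresholds) != len(OUTCOMES) - 1:
        raise ValueError("need one cut point between each pair of adjacent outcomes")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("cut points must be strictly increasing")
    # bisect_right counts how many cut points lie at or below `value`.
    return OUTCOMES[bisect_right(thresholds, value)]

# Placeholder cut points 1.0, 2.0, ..., 9.0 (illustrative, not estimates).
cuts = [float(k) for k in range(1, 10)]
print(outcome_for(0.5, cuts))  # Home Run
print(outcome_for(4.5, cuts))  # Foul
print(outcome_for(9.5, cuts))  # Swinging Strike
```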

The Econometric Model

Regression Equation   

Formalizing the independent variables, dependent variables, and error term results in the following equations:

$$ O_i = \beta_1 + \beta_2 V + \beta_3 M + \varepsilon \tag{1} $$

where

$$ \varepsilon \sim N(0,\, \sigma^2), \quad \sigma^2 = 1 \tag{2} $$

The left side of equation 1 is the dependent variable, outcome, and the subscript i represents the type of outcome. The right side of equation 1 has two parts: a structural component and a random component. The structural component contains the independent variables, where $\beta_1$ is the intercept, $\beta_2$ is the coefficient for velocity, V is velocity in MPH, $\beta_3$ is the coefficient for movement, and M is total movement in inches, the sum of the absolute horizontal and vertical movement. The random component is the error term, ε, which captures everything the variables in the model cannot explain. The error term is assumed to have a standard normal distribution, as indicated by equation 2.

Interval Estimates

An outcome is predicted when the value of equation 1 lies between the limits of its two neighbors: above the upper limit of the adjacent outcome that requires harder contact, and below the lower limit of the adjacent outcome that requires softer contact. In terms of quality of contact: if the contact a given velocity and movement are likely to induce is softer than the contact required for the neighboring outcome with harder contact, but harder than the contact required for the neighboring outcome with softer contact, the result is the outcome between them. This means equation 1 can be used to create an interval estimate for a particular outcome:

$$ L_f < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_c \;\Rightarrow\; O_i \tag{3} $$

$L_f$ is the upper limit for outcome f and $-L_c$ is the lower limit for outcome c; the minus sign marks a lower limit, not a negative number. Outcome f is the adjacent outcome that requires harder contact than outcome i, and outcome c is the adjacent outcome that requires softer contact. With that, interval estimates can be created for all of the outcomes and written as:

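$$ O_{SS} \text{ if } O_i > L_{PO} \tag{4} $$

$$ O_{PO} \text{ if } -L_{SS} > O_i > L_{GO} \tag{5} $$

$$ O_{GO} \text{ if } -L_{PO} > O_i > L_{FO} \tag{6} $$

$$ O_{FO} \text{ if } -L_{GO} > O_i > L_{LO} \tag{7} $$

$$ O_{LO} \text{ if } -L_{FO} > O_i > L_{FL} \tag{8} $$

$$ O_{FL} \text{ if } -L_{LO} > O_i > L_{SG} \tag{9} $$

$$ O_{SG} \text{ if } -L_{FL} > O_i > L_{DB} \tag{10} $$

$$ O_{DB} \text{ if } -L_{SG} > O_i > L_{TP} \tag{11} $$

$$ O_{TP} \text{ if } -L_{DB} > O_i > L_{HR} \tag{12} $$

$$ O_{HR} \text{ if } -L_{TP} > O_i \tag{13} $$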

To make sense of equations 4 through 13, the outcomes have been assigned the categorical values and subscripts in Table 1.

Table 1: Categorical Values & Subscripts

Outcome          Value  Subscript
Swinging Strike  10     SS
Pop Out          9      PO
Ground Out       8      GO
Fly Out          7      FO
Line Out         6      LO
Foul             5      FL
Single           4      SG
Double           3      DB
Triple           2      TP
Home Run         1      HR

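Using equations 3 and 4 through 13, the interval estimates can be derived for each outcome:

$$ L_{PO} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon \;\Rightarrow\; O_{SS} \tag{13} $$

$$ L_{GO} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{SS} \;\Rightarrow\; O_{PO} \tag{14} $$

$$ L_{FO} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{PO} \;\Rightarrow\; O_{GO} \tag{15} $$

$$ L_{LO} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{GO} \;\Rightarrow\; O_{FO} \tag{16} $$

$$ L_{FL} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{FO} \;\Rightarrow\; O_{LO} \tag{17} $$

$$ L_{SG} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{LO} \;\Rightarrow\; O_{FL} \tag{18} $$

$$ L_{DB} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{FL} \;\Rightarrow\; O_{SG} \tag{19} $$

$$ L_{TP} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{SG} \;\Rightarrow\; O_{DB} \tag{20} $$

$$ L_{HR} < \beta_1 + \beta_2 V + \beta_3 M + \varepsilon < -L_{DB} \;\Rightarrow\; O_{TP} \tag{21} $$

$$ -L_{TP} > \beta_1 + \beta_2 V + \beta_3 M + \varepsilon \;\Rightarrow\; O_{HR} \tag{22} $$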

Hypothesis Testing

Once the coefficient estimates are reported, their significance can be tested. To do this, a null and an alternative hypothesis were created for each coefficient:

$$ H_0: \beta_k = 0, \quad k = 2, 3 \tag{23} $$

$$ H_1: \beta_k \neq 0, \quad k = 2, 3 \tag{24} $$

Equation 23 is the null hypothesis: the coefficient for velocity, or for movement, equals 0, which means that changing that variable does not change the predicted outcome or quality of contact. Equation 24 is the alternative hypothesis: the coefficient is not equal to 0, which means the variable does influence the outcome and quality of contact. The next step in hypothesis testing is calculating a test statistic. Since the error terms are assumed to have a standard normal distribution and to be homoscedastic (all of the error terms have the same variance), the t-test will be used. The next step is to establish a rejection region. Because the alternative hypothesis is "not equal to," a two-tailed test is needed. The acceptance region of the test is:

$$ t_{(\alpha/2,\,N-3)} < t < t_{(1-\alpha/2,\,N-3)} \tag{25} $$

where α is the significance level, N is the number of observations, and $N-3$ is the degrees of freedom; 3 is subtracted because three degrees of freedom are used by the two coefficients and the intercept. The rejection region has two parts: the area to the left of $t_{(\alpha/2,\,N-3)}$ in the lower tail and the area to the right of $t_{(1-\alpha/2,\,N-3)}$ in the upper tail. The null hypothesis can therefore be rejected in two cases: if t is less than $t_{(\alpha/2,\,N-3)}$, or if t is greater than $t_{(1-\alpha/2,\,N-3)}$. If either is true, the test statistic lies beyond a critical value in one of the rejection regions, so the null hypothesis is rejected and the alternative hypothesis is accepted. If neither is true, the test statistic lies between the two critical values in the acceptance region, so the null hypothesis cannot be rejected and the coefficient being tested could be 0, meaning it is not statistically significant.
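The two-tailed decision rule can be sketched in Python. As an assumption for this sketch, the critical value is taken from the standard normal distribution (`statistics.NormalDist`) rather than the t distribution; with roughly 100,000 degrees of freedom the two are practically indistinguishable. The function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical(alpha: float) -> float:
    """Upper critical value for a two-tailed test at significance level alpha.

    Uses the standard normal quantile as a large-sample stand-in for the
    t quantile t(1 - alpha/2, N - 3).
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(t_stat: float, alpha: float = 0.05) -> bool:
    """Reject H0: beta = 0 when t falls in either tail's rejection region."""
    crit = two_tailed_critical(alpha)
    return t_stat < -crit or t_stat > crit

print(round(two_tailed_critical(0.05), 3))               # 1.96
print(reject_null(2.5), reject_null(-2.5), reject_null(1.0))  # True True False
```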

Data

The data was collected from www.BaseballSavant.com. This website maintains the PITCH f/x database, which contains data on every pitch thrown in the 2008 through 2014 seasons, recorded by high-speed cameras located in every Major League ballpark. Since the data from 2008 and 2009 has some classification issues, those years are excluded; the data sets therefore cover the 2010 to 2014 seasons. Each data set has approximately 21,000 observations, so the five data sets total approximately 105,000 observations.

The website allows many types of filters to be used when searching for data; the filters used here are pitch type, pitch result, batted ball result, and at-bat result. The pitch-result filters do not include the type of outcome that results when the ball is put in play, so the at-bat-result filters had to be used to get those outcomes. This pulled in data that was supposed to be excluded. For example, a four-seam fastball the batter did not swing at should be excluded, but if it was thrown during an at-bat that ended with one of the selected at-bat results, it was included in the data set. All lines of data with this issue had to be removed from the data sets.
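The cleanup amounts to filtering at the pitch level rather than the at-bat level. The sketch below is hypothetical: the field names `pitch_type` and `description`, the four-seam code `"FF"`, and the set of swing descriptions are assumptions modeled loosely on a pitch-level export and should be checked against the columns an actual download contains.

```python
# Hypothetical description values that indicate the batter swung.
SWING_DESCRIPTIONS = {"swinging_strike", "foul", "hit_into_play"}

def keep_swung_fastballs(rows: list[dict]) -> list[dict]:
    """Keep four-seam fastballs (assumed code "FF") on which the batter swung."""
    return [
        r for r in rows
        if r["pitch_type"] == "FF" and r["description"] in SWING_DESCRIPTIONS
    ]

sample = [
    {"pitch_type": "FF", "description": "called_strike"},    # no swing: dropped
    {"pitch_type": "FF", "description": "foul"},             # swing: kept
    {"pitch_type": "SL", "description": "swinging_strike"},  # not a four-seamer: dropped
    {"pitch_type": "FF", "description": "hit_into_play"},    # swing: kept
    {"pitch_type": "FF", "description": "ball"},             # no swing: dropped
]
kept = keep_swung_fastballs(sample)
print(len(kept))  # 2
```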

Also, the movement data came in two components, horizontal movement and vertical movement, each of which could be positive or negative. From the catcher's perspective, horizontal movement is positive if the pitch moves toward the right side of home plate and negative if it moves toward the left side. Vertical movement is positive if the pitch drops less than it would from gravity alone and negative if it drops more. If one component was positive and the other negative, simply adding them would cause them to partially cancel and understate total movement. To prevent this, the absolute value of each component was taken before adding them together.
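The adjustment is a one-line calculation. A minimal sketch, with a hypothetical function name and made-up movement values, showing why the absolute values matter:

```python
def total_movement(horizontal_in: float, vertical_in: float) -> float:
    """Total movement in inches: the sum of absolute horizontal and vertical movement.

    Taking absolute values first keeps a negative component from cancelling
    part of a positive one.
    """
    return abs(horizontal_in) + abs(vertical_in)

print(total_movement(-6.5, 9.0))  # 15.5
print(-6.5 + 9.0)                 # 2.5, what a plain sum would give
```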

Since a Limited Dependent Variable model is being used, a new variable had to be created. This variable captures the ranking of each outcome by assigning a numerical value to each type of outcome. Since each outcome was ranked from least to most desirable from the perspective of the pitcher, the least desirable outcome, a home run, was assigned the value of one, and the most desirable outcome, a swinging strike, was assigned the value of ten. Also, a variable had to be created indicating the year from which the data originated. Since there are five years’ worth of data, the variable could take on one of five possible values—1 through 5. This was done because all of the data was combined when put into the program. Having a variable indicating year allowed for a dummy variable to be created in the program so different data sets could be created and regressions could be run on each data set, and then all the data sets combined.
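A minimal sketch of the two created variables, assuming outcomes are stored as plain text labels and seasons as integers; the dictionary and function names are hypothetical:

```python
# Value assigned to each outcome: 1 = least desirable for the pitcher, 10 = most.
OUTCOME_VALUE = {
    "Home Run": 1, "Triple": 2, "Double": 3, "Single": 4, "Foul": 5,
    "Line Out": 6, "Fly Out": 7, "Ground Out": 8, "Pop Out": 9,
    "Swinging Strike": 10,
}

def year_index(season: int, first_season: int = 2010) -> int:
    """Map seasons 2010-2014 to the year variable 1-5."""
    if not first_season <= season <= first_season + 4:
        raise ValueError(f"season {season} is outside the five-season window")
    return season - first_season + 1

rows = [("Pop Out", 2010), ("Double", 2014)]
coded = [(OUTCOME_VALUE[outcome], year_index(season)) for outcome, season in rows]
print(coded)  # [(9, 1), (3, 5)]

# Subsetting by the year variable, e.g. keeping only the 2014 season (year 5).
only_2014 = [row for row in coded if row[1] == 5]
```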

Procedures of Estimation and Inference

The program used to run the regression was SAS, version 9.3. The MEANS procedure was used to compute the mean, standard deviation, and minimum and maximum values of the independent variables. The QLIM procedure was used to estimate the intercept, coefficients, and interval estimates. The QLIM procedure fits Limited Dependent Variable models and can use either a Binary Probit or Logit model or an Ordinal Probit or Logit model. The Binary Probit or Logit model is used when the dependent variable takes only one of two values; since the dependent variable here has ten possible values, the Binary model was not appropriate. The Ordinal Probit or Logit model allows the dependent variable to take more than two values that can be ranked in ascending or descending order, which was most appropriate for our purposes. The difference between the two is that the Ordinal Logit model assumes the error term has a standard logistic distribution, while the Ordinal Probit model assumes it has a standard normal distribution. The error terms can be assumed to have a standard normal distribution if the dependent variable is influenced by an unobserved continuous variable whose possible values are infinite, even if those values are bounded between a minimum and a maximum.

The outcome of a pitch can be thought of as a proxy for quality of contact: the softer the contact, the better the outcome for the pitcher, and vice versa. Even though the model has ten categorical ordinal outcomes, which by definition are not continuous, they indirectly measure a single underlying variable: quality of contact. Quality of contact can be thought of as continuous, a spectrum of infinite possibilities bounded between no contact and perfect contact. Even though perfect contact is a nebulous concept, it still acts as a boundary that cannot be surpassed. Quality of contact therefore meets the criteria for error terms with a standard normal distribution, which makes the Ordinal Probit model the most appropriate for our purposes.

The purpose of the Ordinal Probit model is to estimate the probability an observation will fall into one of the categorical outcomes. The central idea behind the Ordinal Probit model is there is an unobserved continuous variable underlying the dependent variable, which influences the ordering of the dependent variable. The unobserved continuous variable is quality of contact, which is assumed to determine the outcome, and it is assumed velocity and movement of a pitch influence quality of contact.

The Ordinal Probit model estimates threshold values that partition the unobserved continuous variable into a series of regions, each corresponding to one of the ordinal categories. Adjacent thresholds form intervals, and each interval corresponds to the range of contact that produces a particular type of outcome. Quality of contact lies on a continuous spectrum from no contact to perfect contact, and each outcome occupies a region along that spectrum bounded by two thresholds. If the value of the unobserved variable rises past an outcome's upper threshold, the predicted outcome becomes the adjacent outcome with the next-higher value (softer contact); if it falls below the outcome's lower threshold, the predicted outcome becomes the adjacent outcome with the next-lower value (harder contact).
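Under the standard normal error assumption, the probability of each category is the normal probability mass between its two thresholds: P(category k) = Φ(upper_k − index) − Φ(lower_k − index). A short Python sketch of that formula, using placeholder thresholds and a placeholder index rather than the estimates in this paper (the function name is hypothetical):

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def category_probabilities(index: float, thresholds: list[float]) -> list[float]:
    """Ordered-probit probability of each category, lowest category first.

    `index` is the structural part (intercept + coefficients * regressors) and
    `thresholds` are the increasing cut points. The lowest category has no
    lower cut point and the highest has no upper one.
    """
    cdf_at_cuts = [PHI(t - index) for t in thresholds]
    bounds = [0.0] + cdf_at_cuts + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

cuts = [-1.0, 0.0, 1.0]  # placeholder thresholds for four categories
probs = category_probabilities(0.3, cuts)
print([round(p, 3) for p in probs])
print(round(sum(probs), 12))  # 1.0

# A higher index moves probability toward the higher categories.
print(category_probabilities(2.0, cuts)[-1] > probs[-1])  # True
```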

The Ordinal Probit model relaxes the constraint that the effect of the independent variables is constant across different predicted values of the dependent variable. The cumulative probabilities follow an S-shaped curve, the standard normal cumulative distribution function. In each tail of the curve the probability responds slowly to changes in the independent variables, and toward the middle of the curve it responds faster. This implies that changes in velocity and movement cause relatively large changes in a probability that is near .5 and relatively small changes in a probability that is near 0 or 1.
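The S-shaped response can be checked numerically with the standard normal CDF. This sketch, with illustrative starting points, compares the change in probability produced by the same 0.1 shift near the middle of the curve and in a tail (the function name is hypothetical):

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the S-shaped curve

def change_in_probability(start: float, shift: float) -> float:
    """Change in PHI(z) when z moves from `start` to `start + shift`."""
    return PHI(start + shift) - PHI(start)

near_middle = change_in_probability(0.0, 0.1)  # PHI(0) = 0.5
in_tail = change_in_probability(2.5, 0.1)      # PHI(2.5) is about 0.994
print(round(near_middle, 4), round(in_tail, 4))
```

The same shift moves the probability many times more near .5 than in the tail.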

This cascading effect on outcome probabilities has an intuitive explanation. As velocity and movement change, the value of the unobserved variable moves along the contact spectrum and probability shifts between outcomes: the outcomes nearest the current most likely outcome gain or lose probability the fastest, and outcomes further away respond less. In the limit, the outcome at one end of the scale approaches a probability of 1 and every other outcome approaches 0.

For example, a home run and a swinging strike sit at opposite ends of the contact spectrum. If the probability of a swinging strike is close to 1, the probability of every other outcome, including a home run, is close to 0. As velocity and movement change in the direction that induces harder contact, the probabilities of the outcomes between swinging strike and home run increase, with pop-out, the outcome immediately below swinging strike, increasing the most, ground-out, the next outcome down, increasing the second most, and so on, with home run increasing the least. As velocity and movement keep changing and contact continues to harden, probability keeps shifting down the scale until, eventually, the allocation is reversed: the probability of a home run approaches 1 and the probability of a swinging strike approaches 0.

Empirical Results

Discrete Response Profile & Means            

Table 2 is the discrete response profile for seasons 2010 to 2014. It reports the frequency of each outcome and the percent the frequency represents of all the outcomes.

Index  Outcome          Frequency  % of Total
1      Home Run                 6       0.01%
2      Triple                 196       0.18%
3      Double               2,252       2.09%
4      Single               8,112       7.52%
5      Foul                50,835      47.12%
6      Line Out             2,891       2.68%
7      Fly Out             10,435       9.67%
8      Ground Out          12,198      11.31%
9      Pop Out              4,072       3.77%
10     Swinging Strike     16,881      15.65%

Table 3 contains the number of observations for each variable, along with the mean, standard deviation, and minimum and maximum values for the 2010 to 2014 seasons.

Variable        N        Mean        Std Dev    Min   Max
Velocity (MPH)  107,880  91.9668966  2.9209241  78    104.1
Movement (in.)  107,880  13.1161436  3.2585848  0.29  44.41

Parameter Estimates

Table 4 contains the parameter estimates for the 2010 to 2014 seasons: the estimate, standard error, t value, and p value for each parameter. The standard error measures the precision of the estimate as a representation of the population value. The t and p values test for statistical significance, and both are computed assuming the null hypothesis, that the parameter equals 0, is true. The t value is the estimate divided by its standard error; the larger its absolute value, the stronger the evidence against the null hypothesis. The p value is the probability of obtaining a t value at least as extreme as the one observed if the null hypothesis were true; the lower the p value, the stronger the evidence that the parameter is different from 0.

Parameter  Estimate    S.E.      t Value  Pr > |t|
Intercept   2.175107   0.134481   16.17   < .0001
Velocity    0.017939   0.001114   16.10   < .0001
Movement   -0.001804   0.000998   -1.81   0.0706
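The t values and p values in Table 4 follow from the estimates and standard errors. A Python sketch reproducing them, assuming a two-sided test and using the standard normal distribution as an approximation to the t distribution at this sample size; the helper name is hypothetical, and SAS's own computation may differ in the last digit:

```python
from statistics import NormalDist

def t_and_p(estimate: float, std_err: float) -> tuple[float, float]:
    """Return estimate / std_err and its two-sided p-value.

    The p-value uses the standard normal distribution as a large-sample
    approximation to the t distribution.
    """
    t = estimate / std_err
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Estimates and standard errors as reported in Table 4.
t_velocity, p_velocity = t_and_p(0.017939, 0.001114)
t_movement, p_movement = t_and_p(-0.001804, 0.000998)

print(round(t_velocity, 2), round(t_movement, 2))  # 16.1 -1.81
print(p_velocity < 0.0001, round(p_movement, 3))
```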

Hypothesis Testing

Since the estimates and t values have been reported, the significance of each coefficient can be tested. Using the null and alternative hypotheses from equations 23 and 24 and a 5 percent significance level, equation 25 can be evaluated for each coefficient:

$$ t_{(0.025,\,107{,}877)} = -1.960 < 16.10 \nless 1.960 = t_{(0.975,\,107{,}877)} \tag{26} $$

$$ t_{(0.025,\,107{,}877)} = -1.960 < -1.81 < 1.960 = t_{(0.975,\,107{,}877)} \tag{27} $$

Equation 26 is the test for velocity. Since -1.960 is less than 16.10 and 16.10 is greater than 1.960, the test statistic for velocity lies to the right of the upper critical value, in the rejection region. The null hypothesis is therefore rejected at the 5 percent level: velocity is statistically different from 0 and influences the quality of contact and the outcome of the pitch, holding movement constant.

Equation 27 is the test for movement. Since -1.81 is greater than -1.960 and less than 1.960, the test statistic for movement lies between the two critical values, in the acceptance region, so the null hypothesis cannot be rejected. At the 5 percent level, movement is therefore not statistically different from 0: holding velocity constant, the data show no significant effect of movement on the quality of contact or the outcome of the pitch. On that basis, movement is removed from equations 1, 3, and 13 through 22.

Regression Equation

Since estimates for the parameters have been calculated and their significance has been determined, the values can be plugged into equation 1 to get:

$$O_i = 2.175107 + 0.017939\,V + \varepsilon \qquad (26)$$

Movement has been removed because its estimate is not significantly different from 0. The error term remains unknown because its precise value cannot be determined in a limited dependent variable model; it takes on a range of values depending on the values of the independent variables and the upper and lower limits of the outcome.

Interval Estimates

Table 5 contains the interval estimates for the 2010 to 2014 seasons for each type of outcome. It gives the lower and upper limit of each outcome's interval, the standard error, t value, and p value of each estimated limit, and the upper limit minus the lower limit, which gives the width of the interval.

Table 5: Interval estimates, 2010-2014 seasons

Outcome           Lower Limit   Upper Limit   Upper – Lower
Home Run          n/a           0.900493      n/a
Triple            0.900493      1.798627      0.898134
Double            1.798627      2.175107      0.376480
Single            2.175107      2.506700      0.331593
Foul              2.506700      3.975195      1.468495
Line Out          3.975195      4.043806      0.068611
Fly Out           4.043806      4.304515      0.260709
Ground Out        4.304515      4.664113      0.359598
Pop Out           4.664113      4.811196      0.147083
Swinging Strike   4.811196      n/a           n/a

Limit      S.E.       t Value   Pr > |t|
0.900493   0.086475   10.41     < .0001
1.798627   0.088013   20.44     < .0001
2.506700   0.088104   28.45     < .0001
3.975195   0.088125   45.11     < .0001
4.043806   0.088165   45.87     < .0001
4.304515   0.088180   48.81     < .0001
4.664113   0.088204   52.88     < .0001
4.811196   0.088235   54.53     < .0001

The limit at 2.175107 is the intercept; its standard error appears in Table 4.

Movement can be removed from equations 13 through 22, and the values from Tables 4 and 5 can be plugged into the equations to get:

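$$O_{SS}:\quad 4.811196 < 2.175107 + 0.017939\,V + \varepsilon \qquad (27)$$

$$O_{PO}:\quad 4.664113 < 2.175107 + 0.017939\,V + \varepsilon < 4.811196 \qquad (28)$$

$$O_{GO}:\quad 4.304515 < 2.175107 + 0.017939\,V + \varepsilon < 4.664113 \qquad (29)$$

$$O_{FO}:\quad 4.043806 < 2.175107 + 0.017939\,V + \varepsilon < 4.304515 \qquad (30)$$

$$O_{LO}:\quad 3.975195 < 2.175107 + 0.017939\,V + \varepsilon < 4.043806 \qquad (31)$$

$$O_{FL}:\quad 2.506700 < 2.175107 + 0.017939\,V + \varepsilon < 3.975195 \qquad (32)$$

$$O_{SG}:\quad 2.175107 < 2.175107 + 0.017939\,V + \varepsilon < 2.506700 \qquad (33)$$

$$O_{DB}:\quad 1.798627 < 2.175107 + 0.017939\,V + \varepsilon < 2.175107 \qquad (34)$$

$$O_{TP}:\quad 0.900493 < 2.175107 + 0.017939\,V + \varepsilon < 1.798627 \qquad (35)$$

$$O_{HR}:\quad 2.175107 + 0.017939\,V + \varepsilon < 0.900493 \qquad (36)$$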
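The interval rule in equations 27 through 36 amounts to a lookup: compute the index, then find which pair of limits it falls between. Below is a minimal Python sketch using the Table 4 and Table 5 values; the function names are hypothetical, and the error term is left as an optional input (defaulting to 0) because its actual value is unknown.

```python
from bisect import bisect_right

# Table 4 estimates, with movement dropped as in equation 26.
INTERCEPT = 2.175107
BETA_VELOCITY = 0.017939

# Table 5 limits in increasing order, and the outcome occupying each of the
# ten intervals they create (equations 27-36), from home run up to swinging strike.
LIMITS = [0.900493, 1.798627, 2.175107, 2.506700, 3.975195,
          4.043806, 4.304515, 4.664113, 4.811196]
OUTCOMES = ["Home Run", "Triple", "Double", "Single", "Foul",
            "Line Out", "Fly Out", "Ground Out", "Pop Out", "Swinging Strike"]


def index_value(velocity_mph: float, error: float = 0.0) -> float:
    """Index 2.175107 + 0.017939*V + error; the true error is unknown, so it defaults to 0."""
    return INTERCEPT + BETA_VELOCITY * velocity_mph + error


def predicted_outcome(velocity_mph: float, error: float = 0.0) -> str:
    """Return the outcome whose interval contains the index value."""
    # bisect_right counts how many limits lie at or below the index,
    # which is exactly the position of the matching outcome.
    return OUTCOMES[bisect_right(LIMITS, index_value(velocity_mph, error))]


if __name__ == "__main__":
    print(predicted_outcome(91.96))             # mean velocity, error ignored
    print(predicted_outcome(91.96, error=1.0))  # same pitch, large positive error
```

With the error ignored, a 91.96 MPH fastball lands in the foul interval; adding an error of 1.0 pushes the same pitch past 4.811196 into the swinging-strike interval, which gives a sense of how much work the error term does in this model.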

Goodness of Fit Measures

Goodness of fit measures describe how well a model fits the observations, typically by summarizing the discrepancy between the observed values and the values the model expects. Since a linear regression model was not used, the usual measures, such as the coefficient of determination (R²), do not apply. Table 6 contains the measures reported for the data from the 2010-2014 seasons.

Table 6: Goodness of fit measures, 2010-2014 seasons

Measure                    Value
Likelihood Ratio (R)       259.29
Upper Bound of R (U)       350699
Aldrich-Nelson             0.0024
Cragg-Uhler 1              0.0024
Cragg-Uhler 2              0.0025
Estrella                   0.0024
Adjusted Estrella          0.0022
McFadden’s LRI             0.0007
Veall-Zimmerman            0.0031
McKelvey-Zavoina           0.0027

The most useful of these measures is McFadden’s LRI because it is analogous to R². It is bounded between 0 and 1 and can, in theory, equal 1, meaning the model fits the data perfectly, although most models that fit well fall in the range of .2 to .4 (vii). At 0.0007, this model is far below that range. All of the other measures except the Likelihood Ratio (R) and the Upper Bound of R (U) are similar to McFadden’s LRI: they are attempts to approximate R².
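Several of the Table 6 measures can be recomputed from R and U. McFadden’s LRI is 1 - ln L / ln L0, and since U = -2 ln L0 and R = 2(ln L - ln L0), it simplifies to R / U. The Aldrich-Nelson, Cragg-Uhler 1 and Veall-Zimmerman measures also need the number of observations N, which this section does not state; the value below is an assumption inferred from the 107,877 degrees of freedom used in the t tests. A minimal Python sketch:

```python
import math

# Values reported in Table 6.
R = 259.29      # likelihood ratio statistic, 2 * (lnL - lnL0)
U = 350699.0    # upper bound of R, -2 * lnL0

# ASSUMPTION: the number of observations is not given in this section;
# roughly 107,880 is inferred from the 107,877 degrees of freedom in the t tests.
N = 107_880


def mcfadden_lri(r: float, u: float) -> float:
    """1 - lnL / lnL0, which simplifies to R / U."""
    return r / u


def aldrich_nelson(r: float, n: int) -> float:
    """R / (R + N)."""
    return r / (r + n)


def cragg_uhler_1(r: float, n: int) -> float:
    """1 - (L0 / L)^(2/N), which equals 1 - exp(-R / N)."""
    return 1 - math.exp(-r / n)


def veall_zimmerman(r: float, u: float, n: int) -> float:
    """Aldrich-Nelson rescaled by (U + N) / U so its maximum possible value is 1."""
    return aldrich_nelson(r, n) * (u + n) / u


if __name__ == "__main__":
    print(f"McFadden's LRI   {mcfadden_lri(R, U):.4f}")
    print(f"Aldrich-Nelson   {aldrich_nelson(R, N):.4f}")
    print(f"Cragg-Uhler 1    {cragg_uhler_1(R, N):.4f}")
    print(f"Veall-Zimmerman  {veall_zimmerman(R, U, N):.4f}")
```

With these inputs the four values round to 0.0007, 0.0024, 0.0024 and 0.0031, matching Table 6, and small changes to the assumed N do not affect the rounded results.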

Conclusion

Since the estimated coefficient for velocity is positive, and higher values of the index correspond to outcomes that favor the pitcher (equations 27 through 36 run from the home run at the bottom to the swinging strike at the top), the greater the velocity, the lower the quality of contact and the more likely a desirable outcome for the pitcher. This supports the first part of the hypothesis. But the estimate for movement was not significantly different from 0, which does not support the second part of the hypothesis. One possible explanation is that a pitcher is not trying to induce movement when he throws a four-seam fastball, and the movement that does occur is small compared with pitches on which a pitcher is trying to induce movement. Indeed, a four-seam fastball spins backwards, which keeps the ball relatively straight and limits its movement. This relatively small amount of movement may not do much to deceive a hitter and cause him to either swing and miss or make poor contact. It would be interesting to see whether the amount of movement on pitches designed to move leads to lower quality of contact.

According to Table 2, it appears to be difficult for a pitcher to get a hitter to swing and miss at a four-seam fastball. Hitters make contact 84.35 percent of the time and swing and miss 15.65 percent of the time. It also appears to be difficult for a hitter to make the type of contact required to avoid an out: only 9.8 percent of the outcomes resulted in a hit. When a hitter does make an out, the contact is mostly poor: 54.97 percent of the outs are ground-outs and pop-outs, while the outs requiring more solid contact, line-outs and fly-outs, make up the remaining 45.03 percent. The most frequent outcome is a foul. Fouls can be good for a pitcher if they result in strikes, but a foul only results in a strike when the count has fewer than two strikes. With two strikes, a foul is good for the hitter because he gets to see another pitch.

Since the foul interval is by far the widest (1.468495 in Table 5) and the intercept (2.175107) is the lower limit of the narrow single interval just below it, even a modest velocity pushes the predicted index past the single and into the foul interval, so the model predicts the foul as the most likely outcome. This makes sense because the foul was the outcome that occurred most often by a wide margin. But given the ambiguity of the foul in terms of value to the pitcher and hitter, and of the quality of contact required to produce one, any positive analysis of this outcome will be ambiguous. The only statement that can be made about its value is that it shifts between pitcher and hitter depending on the count.

Since the goodness of fit measures are very low, the model is not a good fit for the data. This result does not mean velocity has no predictive value. Rather, it means there are other variables influencing the quality of contact and the outcome of the pitch that are not included in the model. In some ways, this makes sense: baseball is a complicated game, and the outcome of a four-seam fastball depends on much more than velocity and movement. Factors such as the location of the pitch, the sequencing of previous pitches, the handedness of the pitcher and batter, the base/out state, and the count play a large part in determining the outcome of the pitch. Had some of these variables been included in the model, its predictive power and goodness of fit would most likely have increased.

Taking the average fastball velocity from Table 3, 91.96 MPH, plugging it into equation 26, and ignoring the error term gives a value of 3.82, which falls in the interval for a foul, as expected. But for velocity alone to produce a swinging strike, the pitch would need to travel around 147 MPH, or 19 standard deviations above the mean. This doesn’t fit reality: no pitcher will ever throw a pitch at 147 MPH, and plenty of hitters swing through four-seam fastballs with velocity around the mean. Velocity is not the only determinant; it has only a small influence over the outcome of the pitch. This supports the conclusion that the model does not fit the data very well and that the error term is probably large relative to the contribution of velocity.
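The 147 MPH figure comes from solving equation 26 for V with the error term set to 0 and the index set equal to the swinging-strike limit. The Python sketch below repeats that calculation for a few of the Table 5 limits; the names are illustrative. It also shows why the foul dominates the predictions: the index crosses from the single into the foul interval at a very low velocity and does not leave it until roughly 100 MPH.

```python
# Table 4 estimates (movement dropped, as in equation 26).
INTERCEPT = 2.175107
BETA_VELOCITY = 0.017939

# A few of the Table 5 limits, labeled by the outcomes on either side.
BOUNDARIES = {
    "Single -> Foul": 2.506700,
    "Foul -> Line Out": 3.975195,
    "Pop Out -> Swinging Strike": 4.811196,
}


def velocity_to_reach(limit: float) -> float:
    """Solve 2.175107 + 0.017939 * V = limit for V (MPH), ignoring the error term."""
    return (limit - INTERCEPT) / BETA_VELOCITY


if __name__ == "__main__":
    for label, limit in BOUNDARIES.items():
        print(f"{label:<28} {velocity_to_reach(limit):6.1f} MPH")
```

With these values the crossings come out near 18.5, 100.3 and 146.9 MPH.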

In the extremely competitive environment of Major League Baseball, where teams seek out the smallest advantage over their competitors, it makes sense to put a greater emphasis on velocity: it does influence the likelihood of outcomes favorable to the pitcher. Therefore, the trend is likely to continue, and velocity will probably keep increasing.


Trouble With the Aging Curve

Ever since I became enamored with the baseball statistical community, I’ve tried to gather as much information as I could. I registered on several websites dedicated to the analysis of baseball statistics, such as baseballprospectus.com, FanGraphs.com and HardballTimes.com. I read every book and article I could get my hands on and even tried my hand at producing my own research and analysis in order to achieve two goals in my life: 1. Publish my research and become a savvy baseball analytical mind; and 2. Work within a baseball organization.

My first basic analysis came in the form of three-year projections that I used to try my hand at fantasy baseball. I’m proud to say that my first dip into the analytical waters was fruitful, as my projections helped me win my league three times in five attempts[1]. But after many years of keeping my projections and questions to myself, I’ve finally felt compelled to start more serious research and publish my questions and results online to share with people interested in these topics. So, without further ado, I give you my first serious publication.

***

Readers will often find that writers, commentators and analysts value players most highly before they reach their age-30 season. Once players pass this mark, the thinking goes, they begin to gradually decline: their production falters, they become prone to getting injured more than once in the same season, and their speed begins to abandon them. In other words, the shine begins to disappear and is replaced by a shell of the player we, the fans, and managers valued. Furthermore, I’ve often read that players peak at age 27, the season in which a player gives his (all-time) best performance before beginning the slow decline into retirement.

Now, I have two problems with this:

  1. What stats determine that a player’s best season is his age 27 season?
  2. Does this peak age season vary for every position or are all players subjected to the same aging curve?

To answer the first question, I used player statistics from 1960 through 2013 and looked specifically at power numbers: slugging percentage (SLG), isolated power (ISO) and on-base plus slugging (OPS)[2]. I then calculated each player’s age as of June 30th of each season and took this to be his age-season. Once I had this, I ran histograms to determine the lowest performance, highest performance, mean, and first and third quartiles.
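The age-season rule described above (a player’s age on June 30th of the season), together with the 20-to-35 window used below, can be sketched in Python as follows. The function names and the example birth dates are hypothetical.

```python
from datetime import date


def age_season(birth_date: date, season: int) -> int:
    """Age on June 30 of `season`, used as the player's age-season."""
    age = season - birth_date.year
    # If the birthday falls after June 30, it has not happened yet by the cutoff.
    if (birth_date.month, birth_date.day) > (6, 30):
        age -= 1
    return age


def in_study_range(birth_date: date, season: int) -> bool:
    """True if the age-season falls in the 20-35 window used in this analysis."""
    return 20 <= age_season(birth_date, season) <= 35


if __name__ == "__main__":
    # Hypothetical birth dates, one on each side of the June 30 cutoff.
    print(age_season(date(1986, 6, 30), 2013))
    print(age_season(date(1986, 7, 1), 2013))
```

A player born on June 30th counts as having already had his birthday, so the first example is an age-27 season and the second an age-26 season.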

For this analysis, I only used data for players who were between 20 and 35 years of age in a given season. What I found, starting with SLG, was that players, power-wise, don’t peak at 27 but in their early 30s. A player’s SLG increases gradually as he gets older until he reaches his age 31-32 season. The mean SLG is 0.437 at age 27 and 0.447 at age 32: ten points (.010) higher, an increase of 2.3%.

So, SLG-wise, a player performs better past his 30th birthday. But maybe I am biased. Maybe if I check ISO, I will find different results.

What I found were very similar results. A player’s mean isolated power, again, didn’t peak at age 27, where it was 0.159. ISO also didn’t peak in the age-32 season but a year earlier, in the age-31 season, when it was 0.167; the next season it began to decline, to 0.165. ISO increases by 5.0% over those five seasons.

Finally, I took a look at OPS to see if I could find a similar pattern. Again, players’ mean OPS peaks in their age-32 season, going from 0.784 at age 27 to 0.801 at 32. It’s not much of an increase (2.2%), but it’s something.
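The percentage increases quoted for SLG, ISO and OPS are simple relative changes from the age-27 mean to the mean at each stat’s peak. A quick Python check using the values reported above (the names are illustrative):

```python
# Mean values reported above: (age 27, peak season).
MEANS = {
    "SLG": (0.437, 0.447),  # peak at age 32
    "ISO": (0.159, 0.167),  # peak at age 31
    "OPS": (0.784, 0.801),  # peak at age 32
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100 * (after - before) / before


if __name__ == "__main__":
    for stat, (age27, peak) in MEANS.items():
        print(f"{stat}: {age27:.3f} -> {peak:.3f}  (+{pct_change(age27, peak):.1f}%)")
```

This prints increases of 2.3%, 5.0% and 2.2%, matching the figures in the text.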

What I can determine, then, is that a player’s power keeps developing after he turns 27 and gradually increases until about age 31 or 32. After this, his power performance begins to decline, though not by much.

Another thing I concluded from looking at these three histograms is that, even though there are gradual increases every season, player performance, power-wise, is fairly consistent from one season to the next. Save for the early seasons (21-25, when a player is still developing), there are no surprising jumps in power[3] from one age to the next. Therefore, though we might prefer younger players for cost-control reasons, when we need power production we can’t fully disregard an older player’s power performance. Chances are he will still produce at a similar level.

***

Having checked how power changes as a player ages, I come to my second question: does the aging curve differ across positions? In football (soccer, for Americans) there are four major positions: striker, midfielder, defender and goalkeeper. Through statistical analysis by Arsenal F.C.’s data department, Arsene Wenger, Arsenal’s manager, found that a player’s decline varies with the position he plays on the field. That is to say, a striker will age differently than a goalkeeper, and a defender will age differently from both.

And, as we all know, different positions take a different toll on a player’s body. Catchers become more fatigued as a season rolls on than players at any other position; shortstops, too, play a demanding position that requires more physical effort. We also expect different results from each of the three outfield positions. So it would be natural for players at different positions to age differently on the power curve[4].

What I found was that my thoughts were correct: position on the diamond does affect a player’s power performance, but not by much. These are the results, based on the mean:

Position       Peak Age   SLG (mean, at peak age)
Catcher        33         0.413
First Base     31         0.451
Second Base    35         0.390
Third Base     34         0.417
Shortstop      35         0.389
Left Field     32         0.441
Center Field   32         0.433
Right Field    32         0.447


As the data show, first basemen are usually the first position players to peak, at 31. After them, the three outfield positions peak at age 32, and catchers follow at 33. Finally, third basemen peak at 34, and middle infielders produce more power at 35 than in any previous year.

What we can conclude from this table is the following: because first base demands power more than defense, players there tend to flex their muscles earlier, whilst primarily defensive positions such as catcher, second base and shortstop develop more power later in their careers than when they start off. Outfielders, on the other hand, tend to produce power throughout their careers.

The position that does surprise me is the hot corner. I would have expected third basemen to peak earlier in their careers because most players at the position are power hitters. Then again, there are many good defensive third basemen who aren’t big power players (I’m looking at you, Juan Uribe).

***

After reviewing all the numbers, I can safely conclude that power doesn’t decline as early as is commonly assumed. On the contrary, it keeps increasing, though not by very much, until about age 31 or 32. Furthermore, the gradual increase in power at the plate varies by position, much like a football (soccer) player’s performance varies according to his position. Therefore, though we may like young players for their hustle, cost control and energy, it doesn’t hurt to carry a few veterans in the lineup, if not to mentor the young ones, then to provide some pop.


[1] A small sample size, I admit, but nevertheless, a positive achievement as it encouraged me to delve deeper into baseball analytics.

[2] I didn’t look at OBP as I believe that this stat has more to do with a player’s ability at identifying pitch types, though in retrospect, this can also become better as a player ages and gains more experience.

[3] Though there are many outliers as you can see.

[4] I have charts and charts of histograms for each position measuring SLG, ISO and OPS, but I left them out because I don’t want to oversaturate the article with information.


Big Winners of the Offseason So Far: AL

As all of baseball convened in San Diego this past week, there were a lot of holes to fill. There are some teams that have been very active in free agency and trades over the past weeks and this article means to look at three teams in the American League that have enhanced their rosters over that span of time.

These teams did not make the playoffs in 2014, and they added players who may make them playoff caliber teams in 2015.

CHICAGO WHITE SOX
2014 Regular Season Record (73-89)

There has been a lot of pressure on the White Sox to build a winner as the Detroit Tigers and Kansas City Royals have made the World Series in the past three years and the Cleveland Indians made the playoffs in 2013. The White Sox made a couple big splashes this offseason to boost their profile in the AL Central.

A bit before the Winter Meetings, they inked Adam LaRoche to bolster their weak lineup and provide left-handed power to match Jose Dariel Abreu’s right-handed power in the middle of the lineup. LaRoche has averaged 27 home runs per 162 games in his career and twice in the past three years has had an OPS over .800. LaRoche may not be an All-Star caliber player, but, other than an awful 2011, LaRoche has consistently been a strong performer with an OPS+ of 114 for his career.

The White Sox have an ace in Chris Sale, with a 9.8 K/9 and a 2.76 ERA since entering the league in 2010. He had a 2.17 ERA over 26 starts last season, but the Sox needed a second top pitcher to complement Sale in the rotation. They got one by trading prospect Marcus Semien, along with other minor league prospects, for Jeff Samardzija. The 29-year-old veteran has struck out 200 or more batters in each of the past two years and posted a sub-3.00 ERA last season. His ERA went up and his strikeouts went down when he moved from the Cubs in the National League to the Athletics in the American League, but his WHIP dropped sharply to below 1.00, and he struck out 99 while walking only 12. The White Sox now have two top-25 starters coming into the 2015 season, as Sale will be a top-5 starter and Samardzija will comfortably sit in the 22-24 range.

The White Sox needed help in the bullpen, as Zach Putnam or Jake Petricka was set to be the closer for 2015, so they dipped into their pockets and signed two former All-Stars to multi-year contracts. Zach Duke signed a bit before the Winter Meetings; the former All-Star starter has a 2.20 ERA in his last 88 appearances, and the White Sox needed a left-handed relief option, as the entire bullpen was right-handed before they signed him. The big splash for the White Sox, though, was signing former Yankees All-Star closer David Robertson. Since 2011, Robertson has a 12.3 K/9, and from 2011 to 2013 he had no higher than a 2.67 ERA. He has only 46 saves in his MLB career, as he was the setup man for Mariano Rivera coming into 2014. But Robertson had 39 saves last year and has seen his BB/9 go from 4.7 in 2011 to a 2.8 average from 2012 to 2014. Duke will provide the left-handed relief help the White Sox lacked, and Robertson will be the All-Star caliber closer the White Sox have been without since Bobby Jenks left.


TORONTO BLUE JAYS
2014 Regular Season Record (83-79)

The Blue Jays play in the most active division and have been active in the market. They signed a Gold Glove caliber catcher, an MVP candidate at third base, and freed up space on the roster for a top prospect.

Russell Martin is a highly underrated player who is very strong in intangibles, like pitch blocking and elite game calling, and he will bring his veteran experience to Toronto. Martin’s game calling is well known, and his catching abilities should enhance the entire Blue Jays staff: he led a Pirates staff with top-five ERAs in each season to back-to-back playoff appearances. Martin may never again steal double-digit bases, as he did each season from 2006 to 2009, but he had a .832 OPS last year and hit 39 home runs in his two previous seasons in the AL East, both with the Yankees. His .402 OBP in 2014 may overstate his abilities, as he had a .332 OBP in the previous five seasons, but as a top-of-the-lineup hitter with three MVP candidates behind him, he should score well over 45 runs. Martin may be in a lineup with MVP caliber talent, but he could end up being the most vital piece of a Blue Jays playoff run.

Josh Donaldson is the newest MVP candidate in the Blue Jays lineup, adding to the already formidable combination of Edwin Encarnacion and Jose Bautista. The Blue Jays had to trade three prospects and starting third baseman Brett Lawrie to get Donaldson, but Donaldson is well worth the investment. He has been the starting third baseman for the Athletics for two years and over that time he hit 53 home runs and was a top-10 MVP finisher in both 2013 and 2014. Donaldson broke out in 2013 with a .883 OPS and 64 XBH and had a bit of a letdown in 2014; he still finished with 29 home runs and 98 RBI in 2014, even though he struck out 20 more times and saw his OPS drop to .798.  There are not many power hitting third basemen in baseball and the Blue Jays are fortunate to have Donaldson, a top five 3B option.

Two of the Blue Jays’ needs this offseason were filling the outfield gap left by free agents Melky Cabrera and Colby Rasmus and finding a place in the rotation for top prospect Daniel Norris. By trading fifth starter J.A. Happ for Michael Saunders and allowing Norris to slide into the rotation, they filled both gaps. Norris is the #25 ranked prospect according to MLB.com, with a 2.53 ERA last year and a 10.7 K/9 over his three minor league seasons. He may struggle a bit early in the season, but he could have an impact similar to that of 2014 rookie star Marcus Stroman, with his power fastball and strong slider/changeup combination. Norris may not make a huge impact at the start of the season, but he could be an impact player later on.

Saunders was a bit undervalued in Seattle but has a very interesting profile. He slots into the bottom of the projected Blue Jays lineup and has a slightly better profile than the man he is replacing, Colby Rasmus. Saunders is a very good defensive outfielder, but he has also had two seasons with more than 10 home runs and steals, along with three consecutive seasons with an OPS above league average. In 2012, the only season in which Saunders had 500 or more at bats, he posted 19 home runs and 21 steals; his OBP has risen from .306 in 2012 to .341 in 2014, so there is potential for Saunders to be even better with more opportunity in Toronto. Saunders was obtained for a very movable piece in Happ; if the Blue Jays can fill a major need in the outfield by giving up only a fifth starter, that would be a huge victory for them.


BOSTON RED SOX
2014 Regular Season Record (71-91)

The 2013 champion Red Sox bore no resemblance to the 2014 team that finished last in the AL East. As a financial juggernaut, the Red Sox were able to flex their muscles, adding two former All-Stars and then trading for two All-Star pitchers in San Diego.

Pablo Sandoval has been an instrumental part of three Giants World Series titles and, after the disappointment of Will Middlebrooks, will bring his talents to the Red Sox in 2015. Much has been written about Sandoval’s streaky play and free-swinging ways, but he is a .294 hitter over his seven MLB seasons and has averaged 44 extra base hits over the past four seasons. The switch-hitting Sandoval should get a serious boost from the left side by hitting doubles off the Green Monster; this is a needed boost, as Sandoval has not had 30 or more doubles in a season since back-to-back 30-double seasons in 2009 and 2010. Only once in his career has Sandoval had more than 80 RBI, and only twice has he hit 20 or more home runs; his value comes from his postseason experience, and he is a top-15 3B in a weak 3B crop.

Rick Porcello was a top prospect coming through the Tigers system but never broke through as a stable pitching option until his 15-win 2014 season, in which he had a 3.43 ERA. The Red Sox need a lot of pitching help, as they finished 10th in the AL in ERA, and Porcello’s ground ball tendencies may fit them well. Xander Bogaerts will be more prepared at shortstop this season, and Dustin Pedroia’s defense up the middle will suit Porcello’s skills. Porcello is coming off his first 200-inning season and has seen his WHIP go from 1.41 in his first four seasons to 1.25 in the last two. His K:BB ratio has risen above 3 as well, and he is only 26 years old going into his seventh MLB season. That experience should serve him well in the grinder that is the AL East. Porcello has a career FIP 30 points lower than his career ERA, which shows the talent is there; look for him to break through as an All-Star caliber pitcher this year.

Hanley Ramirez was the top hitter available and has been one of the most polarizing players of the past five seasons. Coming into 2010, he was the top fantasy baseball prospect, but he saw his OPS go from .853 in 2010 to .742 combined in 2011 and 2012; he then posted a .907 OPS in 2013 and 2014, including a white-hot 1.040 OPS in 88 games in 2013. Ramirez has twice been a 50+ SB player and led the NL in batting average in 2009, so the talent is there. But Ramirez has averaged only 121 games played since 2010 and has had two seasons in which he played fewer than 100 games.

Ramirez will also move to left field this season, which should be a very interesting move for fantasy purposes; had he stayed at third, or even shortstop, he might have been a third-round pick, but as an outfielder his value is very questionable. There is a chance that Ramirez suffers less wear and tear in the outfield and becomes a top-10 hitter again, but a .282/.358/.467 slash line from an outfielder is not worthy of a top-10 OF spot. A lot will be expected of Ramirez, and this may be the season he plays 150 games of All-Star caliber baseball in the outfield, regaining his reputation as an MVP candidate.


2015 Fantasy Sleepers: Starting Pitching

The key to winning at fantasy baseball is finding players who will outperform their draft position.  This will be the first of a series of articles addressing undervalued and overvalued players that you should be targeting in your draft.



Using Gifs to Visualize Curveballs on the Scouting Scale

After reading Kiley McDaniel’s articles on explaining the scouting scale, I thought that I would take a different approach in trying to explain it–namely the approach of gifs.  Although the scale is normally reserved for players who haven’t lost their rookie status in the majors, perhaps a visualization can better illustrate what “major league average,” “plus,” or “below average” looks like.

To show the differences between curveballs at different points on the scouting scale, major league curveballs must first be graded. To do this, I pulled up Baseball Prospectus’ pitch f/x leaderboard and set it to filter for pitchers who threw at least 100 curveballs in 2014. Two things shaped my methodology: the way pitch movement is measured, and the fact that lefties’ curveballs appeared to register less horizontal movement than righties’.

I split righties and lefties into separate groups for analysis. Pitch movement was recorded as the inches a pitch broke beyond the pitch with the least break. Total movement was computed as the square root of the sum of the squared horizontal and vertical movement (c² = a² + b²). Z scores were calculated for each curveball’s velocity and movement and then added, with movement weighted 1.5 times as heavily as velocity, to form a grade. The grades were then converted back into Z scores and rescaled so that the center of the scale is 50 and each standard deviation is worth 10 points. Finally, the righties and lefties were combined to make a final scouting scale.
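The grading steps above can be sketched in Python for a single handedness group; combining righties and lefties then amounts to concatenating the two groups’ grade lists. Several details are assumptions not spelled out in the text: the offset from the least-breaking pitch is applied separately to the horizontal and vertical components, Z scores use the group mean and population standard deviation, and the final 50/10 scale is centered on the mean of the grades. The function names and the data in the example are made up.

```python
import math
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize with the group mean and population SD (assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def scouting_grades(h_break: list[float], v_break: list[float],
                    velocity: list[float], movement_weight: float = 1.5) -> list[float]:
    """Grade curveballs within one handedness group on a 50-centered, 10-per-SD scale."""
    # Inches of break beyond the pitch with the least break (assumed to be
    # applied separately to the horizontal and vertical components).
    h_min, v_min = min(h_break), min(v_break)
    total = [math.hypot(h - h_min, v - v_min) for h, v in zip(h_break, v_break)]
    # Weighted sum of Z scores: movement counts 1.5 times as much as velocity.
    raw = [movement_weight * zm + zv
           for zm, zv in zip(z_scores(total), z_scores(velocity))]
    # Re-standardize and rescale so 50 is the center and one SD is 10 points.
    return [50 + 10 * z for z in z_scores(raw)]


if __name__ == "__main__":
    # Made-up break (inches) and velocity (MPH) for three hypothetical curveballs.
    grades = scouting_grades([4.0, 6.0, 8.0], [8.0, 10.0, 12.0], [80.0, 78.0, 76.0])
    print([round(g, 1) for g in grades])
```

In the made-up example, the curveball with the most break grades highest even though it is the slowest, because movement carries 1.5 times the weight of velocity. The sketch raises a `ZeroDivisionError` if every pitch in a group has identical values, since the standard deviation is then 0.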

Name / Throws / Total Movement Z Score / Velocity Z Score / Scouting Grade

Garrett Richards R 2.081 0.289 73.4
Drew Pomeranz L 0.882 1.398 69.0
Tyler Skaggs L 1.757 0.047 68.8
Blaine Hardy L 1.356 0.410 67.1
Gio Gonzalez L 1.406 0.317 66.9
Alex Cobb R 0.907 0.948 66.0
Roenis Elias L 0.949 0.771 65.3
Sonny Gray R 0.589 1.187 64.4
Jarred Cosart R 1.125 0.289 63.8
Felix Hernandez R 0.812 0.712 63.5
Jake Arrieta R 0.972 0.428 63.2
Charlie Morton R 1.191 0.066 63.0
Carlos Torres R 0.973 0.350 62.7
Craig Kimbrel R -0.474 2.430 62.1
Sean Marshall L 1.795 -0.970 62.0
Yoervis Medina R -0.471 2.330 61.5
Joe Kelly R 0.852 0.341 61.4
Mark Melancon R 0.312 1.116 61.2
Stephen Strasburg R 0.633 0.612 61.0
Justin Grimm R 0.404 0.915 60.8
Robbie Erlin L 1.711 -1.022 60.7
Jamey Wright R 0.958 0.057 60.6
Adam Wainwright R 1.731 -1.106 60.6
Wandy Rodriguez L 1.401 -0.639 60.1
Kevin Jepsen R -0.315 1.861 59.9
Jeremy Hellickson R 1.367 -0.664 59.9
Clay Buchholz R 1.007 -0.166 59.6
Scott Atchison R 0.429 0.683 59.5
Will Harris R 0.538 0.518 59.5
Michael Bolsinger R 0.524 0.521 59.3
John Axford R 0.857 0.015 59.3
David Robertson R -0.245 1.619 59.0
Jeremy Affeldt L 1.197 -0.497 59.0
Ian Kennedy R 0.943 -0.186 58.8
Yu Darvish R 0.941 -0.182 58.8
Eric Surkamp L 0.624 0.320 58.7
Nick Masset R 0.680 0.182 58.6
Tom Koehler R 0.551 0.360 58.5
Marcus Stroman R -0.189 1.455 58.4
Zack Wheeler R 0.604 0.263 58.4
Jose Fernandez R -0.255 1.532 58.3
Scott Downs L 1.405 -1.064 57.2
Edinson Volquez R 0.240 0.609 57.1
Felix Doubront L 1.093 -0.629 56.9
Juan Gutierrez R 0.110 0.744 56.7
Jesse Chavez R 1.076 -0.731 56.5
Jeremy Jeffress R 0.264 0.476 56.4
Danny Duffy L 0.446 0.272 56.4
Cody Allen R -1.180 2.630 56.4
Mike Leake R 0.210 0.515 56.2
Dellin Betances R -0.552 1.635 56.0
Trevor Bauer R 0.416 0.147 55.8
Jose Veras R 0.924 -0.638 55.6
Adam Warren R -0.220 1.022 55.2
Clayton Kershaw L 1.119 -0.906 55.2
Chris Tillman R 0.987 -0.828 55.0
Cole Hamels L 0.142 0.517 54.9
Miles Mikolas R 1.158 -1.093 54.9
Jeff Locke L 0.268 0.317 54.9
Gerrit Cole R -0.845 1.884 54.7
Josh Fields R 0.233 0.257 54.7
Nick Tepesch R 0.370 0.018 54.4
Brandon Workman R 0.743 -0.544 54.4
Carlos Carrasco R -0.275 0.954 54.2
Cesar Ramos L 1.941 -2.290 54.2
Tom Wilhelmsen R 0.309 0.034 53.9
Jenrry Mejia R 0.117 0.308 53.9
Trevor Cahill R 0.202 0.153 53.7
Odrisamer Despaigne R 0.813 -0.770 53.6
Collin McHugh R 1.350 -1.635 53.2
Jesse Hahn R 1.150 -1.345 53.2
Phil Hughes R 0.573 -0.502 53.0
Samuel Deduno R -0.389 0.906 52.8
Brett Cecil L -1.373 2.483 52.8
Ian Krol L -0.094 0.555 52.7
Wade Davis R -1.260 2.184 52.6
Jordan Zimmermann R -0.034 0.308 52.3
Andre Rienzo R 0.052 0.170 52.3
Junichi Tazawa R 0.731 -0.857 52.2
Jason Hammel R 0.477 -0.512 52.0
Wesley Wright L -0.302 0.764 52.0
Mike Fiers R 1.376 -1.890 51.8
Nathan Eovaldi R 0.542 -0.644 51.8
Alex Wood L -0.385 0.851 51.7
Jake Buchanan R 0.570 -0.709 51.6
Dillon Gee R 0.926 -1.261 51.5
Trevor May R 0.298 -0.328 51.4
David Buchanan R 0.247 -0.263 51.3
Hyun-jin Ryu L 1.074 -1.395 51.3
Rick Porcello R 0.171 -0.189 51.1
Yordano Ventura R -1.045 1.603 50.9
Anthony Ranaudo R 0.136 -0.195 50.7
Aaron Loup L -0.110 0.282 50.6
Brad Hand L -0.491 0.851 50.6
Brandon McCarthy R -0.757 1.116 50.5
Will Smith L -0.282 0.526 50.5
Vidal Nuno L 0.094 -0.043 50.5
Justin Verlander R -0.301 0.415 50.4
Tommy Hunter R -1.053 1.539 50.4
Santiago Casilla R -0.726 1.038 50.3
Brad Peacock R 0.257 -0.447 50.2
Madison Bumgarner L 0.013 0.043 50.2
Fernando Abad L -0.223 0.397 50.2
C.J. Wilson L 0.077 -0.066 50.1
Francisco Rodriguez R 0.334 -0.586 50.1
A.J. Burnett R -0.847 1.167 49.9
Erik Bedard L 0.572 -0.842 49.9
Tanner Roark R 0.889 -1.455 49.8
Kevin Quackenbush R 0.264 -0.531 49.7
Casey Janssen R 0.767 -1.287 49.7
Yovani Gallardo R -0.366 0.376 49.5
Matt Cain R -0.007 -0.173 49.4
Cory Rasmus R 0.332 -0.683 49.4
J.P. Howell L -0.488 0.652 49.2
Shelby Miller R 0.046 -0.328 48.9
Joba Chamberlain R -0.417 0.331 48.7
Mike Minor L -1.028 1.373 48.6
Lance Lynn R -0.505 0.441 48.5
Josh Beckett R 0.835 -1.587 48.4
Daisuke Matsuzaka R 0.510 -1.109 48.3
Chase Anderson R -0.043 -0.318 48.1
Jorge De La Rosa L 0.397 -0.845 48.0
Danny Farquhar R 0.214 -0.725 47.9
Nick Martinez R 0.107 -0.599 47.7
Jerry Blevins L 0.348 -0.825 47.6
Matt Garza R 0.444 -1.141 47.5
Franklin Morales L 0.340 -0.874 47.2
Craig Stammen R -0.732 0.563 47.1
Javy Guerra R -0.179 -0.266 47.1
Scott Feldman R 0.326 -1.041 46.9
Anthony Varvaro R -0.810 0.638 46.8
Hector Noesi R -0.910 0.741 46.5
Miguel Gonzalez R -0.144 -0.428 46.3
John Lackey R -0.520 0.128 46.3
Kevin Correia R -0.427 -0.015 46.3
Kyle Kendrick R -0.313 -0.186 46.3
Tyler Thornburg R -0.364 -0.118 46.2
Colby Lewis R -0.216 -0.344 46.2
Donn Roach R 0.008 -0.683 46.2
Tim Lincecum R 0.226 -1.022 46.1
Chris Capuano L -0.013 -0.507 46.0
Josh Tomlin R -0.062 -0.618 45.9
J.A. Happ L -0.568 0.275 45.7
James Paxton L -1.519 1.643 45.3
Vance Worley R -0.360 -0.289 45.1
J.J. Hoover R 0.093 -0.973 45.1
Tim Hudson R 0.007 -0.854 45.0
Wei-Yin Chen L 0.065 -0.809 44.7
Marco Estrada R -0.419 -0.286 44.5
Kyle Lohse R 0.091 -1.109 44.1
Vic Black R -1.482 1.248 44.1
Gavin Floyd R -1.285 0.951 44.1
Bruce Chen L 0.405 -1.421 44.0
Phil Coke L -1.232 1.003 43.8
Jon Lester L -0.249 -0.478 43.7
Paul Maholm L 0.340 -1.492 42.8
Jose Quintana L -1.420 1.134 42.7
David Phelps R -1.211 0.622 42.7
Zach Duke L -0.491 -0.288 42.5
Brett Oberholtzer L -1.197 0.751 42.4
Homer Bailey R -1.197 0.538 42.2
Jon Niese L -0.078 -0.954 42.2
Alfredo Simon R -0.751 -0.179 41.9
Jim Johnson R -1.275 0.605 41.9
Zack Greinke R 0.308 -1.903 41.0
Scott Carroll R -0.685 -0.431 40.9
Jason Vargas L -0.469 -0.562 40.8
Travis Wood L 0.088 -1.443 40.5
Heath Bell R -1.696 0.948 40.0
David Price L -1.564 0.954 39.9
Doug Fister R 0.071 -1.726 39.8
Jordan Lyles R -1.731 0.961 39.7
Michael Wacha R -0.378 -1.112 39.4
Masahiro Tanaka R -0.204 -1.413 39.2
Joel Peralta R -1.008 -0.208 39.2
James Shields R -1.514 0.518 38.9
Jake Odorizzi R 0.579 -2.759 38.0
Andrew Heaney L -1.515 0.568 37.7
Max Scherzer R -1.307 -0.163 36.5
Anibal Sanchez R -1.676 0.357 36.2
Julio Teheran R -0.450 -1.539 35.9
Scott Kazmir L -1.244 -0.124 35.7
Yusmeiro Petit R -1.253 -0.402 35.4
Mat Latos R -1.151 -0.586 35.2
Jacob deGrom R -1.879 0.473 35.0
Tommy Milone L -0.975 -0.626 35.0
Joe Nathan R -2.412 1.271 35.0
Edwin Jackson R -1.821 0.370 34.9
Matt Shoemaker R -1.126 -0.802 34.0
Drew Smyly L -1.762 0.330 33.4
Dan Haren R -1.545 -0.292 33.2
Hector Santiago L -1.609 0.069 33.2
Grant Balfour R -2.631 1.274 32.8
Ryan Vogelsong R -1.631 -0.405 31.6
Mark Buehrle L -0.593 -1.691 31.5
Erasmo Ramirez R -2.280 0.528 31.3
Jeremy Guthrie R -1.384 -0.854 31.1
Jacob Turner R -2.033 0.102 31.0
Hiroki Kuroda R -1.637 -0.528 30.7
Johnny Cueto R -2.591 0.880 30.6
Jake Peavy R -2.414 0.573 30.3
Josh Collmenter R -0.839 -1.884 29.7
Sam LeCure R -1.215 -1.461 28.7
Bronson Arroyo R -1.320 -1.416 28.0
Aaron Harang R -1.461 -1.319 27.2
John Danks L -1.548 -1.051 25.9
Eric Stults L -0.448 -2.837 24.9
Fernando Salas R -3.656 1.613 24.8
Carlos Villanueva R -2.102 -0.718 24.8
Jered Weaver R -0.725 -2.840 24.4

For our first gifs, we’ll look at two of the top curveballs on the by-the-numbers scouting scale. At a slightly above-average velocity of 79.7 mph and with the largest amount of break above the baseline (sorry, Fernando Salas), Garrett Richards’ curveball is a sight to behold and grades out as a 73:

Next, we’ll look at Tyler Skaggs’ curveball, which features great horizontal and vertical movement while maintaining average velocity. It grades out as a 69:

With the plus-plus-type (70) curveballs out of the way, we’ll take a look at some plus (60) curveballs. Wandy Rodriguez fits this category. His curveball has similar, if slightly less, movement than Skaggs’, but its lower velocity makes it a lesser pitch. Notice how it has defined break but looks a bit loopy because of its velocity:

With Kevin Jepsen, we see a curveball that grades similarly but looks much different from Wandy’s. Although its break looks sharp because of its velocity, its total movement is slightly below average. This pitch may actually be a slider, but Jepsen’s breaking pitches tend to run together, so consider it a representative picture of his curveball.

Next, we’ll look at an average (50 on the scale) curveball.  Erik Bedard gets slightly above average break, but with below average velocity.  It has solid two-plane break, but it doesn’t look very sharp:

Next, we’ll look at a below-average (40 on the scale) curveball. Jordan Lyles has good velocity on his curveball, averaging 81.75 mph, but he also averages nearly two standard deviations less break than an average curveball in this sample. The following is a tough angle from which to see the break, but the pitch lacks the sharp break of a better curveball.

Finally, we’ll look at Jered Weaver’s well-below-average (24 on the by-the-numbers scale!) curveball. His curveball was the slowest in the sample and also featured below-average break (much of the perceived break comes from its low velocity). It’s still an effective pitch for him, but that is probably more a result of his height and delivery than of the curveball itself. With just about any other pitcher, it would likely be ineffective.


Seth Smith Would Be Great Fit for Mariners

With Wil Myers headed to San Diego, and only a clean bill of health keeping Matt Kemp from joining him (which may or may not magically appear if LA is willing to pay a larger portion of the $105 million remaining on Kemp’s contract…), the San Diego Padres find themselves with an opportunity to move 2014 right fielder Seth Smith. Kemp’s days as an everyday center fielder have long passed, and Myers is likely better suited for a corner outfield spot as well. That leaves nowhere to play Smith, so the Padres could deal him to acquire help at a position other than corner outfield.

The biggest issue with Smith has always been his splits. No matter how you slice it, Smith should be relegated to a platoon role.

Smith’s splits:

        vs. RHP   vs. LHP
wRC+    123       63
wOBA    .362      .274
ISO     .204      .109

A team that has been in the hunt for a corner outfielder this offseason is the Seattle Mariners. So far, Seattle has added Nelson Cruz to serve as the everyday DH, and over the past month the Mariners have been linked to names like Melky Cabrera, Dayan Viciedo, Justin Upton, and Kemp. The big hang-up with Kemp, as well as Upton, was that the other teams were trying to pry away young pitchers like Taijuan Walker and James Paxton. Fortunately for the M’s, a player like Seth Smith would not cost them these young arms.

With Kemp and even more cash expected to head to San Diego, Upton will likely become the top corner outfielder available this offseason. Upton has just one year remaining on his current contract and will make nearly $15 million in 2015. Meanwhile, Smith will make nearly $13 million over the next two years, and his contract contains a club option for 2017 as well. We all know that Justin Upton is a better baseball player than Seth Smith, but is one year of Upton (and no Walker or Paxton for the next six years) better than two years of Smith and Justin Ruggiano (with Walker and Paxton still in Seattle for the next six years)?

Earlier this week, the Mariners acquired Ruggiano from the Cubs for minor league reliever Matt Brazis. Ruggiano is two years removed from a career year in Miami, in which he posted 2.6 WAR in just 91 games. Even though he totaled just 1.3 WAR over the last two seasons combined, he still profiles as an excellent platoon player. In 2013, Ruggiano posted a 130 wRC+ and a .362 wOBA vs. LHP. His 2014 numbers were nearly identical: a 129 wRC+ and a .362 wOBA vs. LHP. Take a look at the career numbers below; for Smith and Ruggiano, these are their career platoon splits (Smith’s vs. RHP, Ruggiano’s vs. LHP):

        Upton   Smith   Ruggiano
wRC+    121     123     128
wOBA    .359    .362    .360
ISO     .202    .204    .241

Whether the Mariners add Justin Upton or Seth Smith, or go with the Brad Miller/Ruggiano platoon that has been rumored, they will get solid production from right field. Smith would cost them some value, and a Smith/Ruggiano platoon may not be the sexiest option, but Smith would not cost them Walker or Paxton.
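To put rough numbers on that comparison, here is a back-of-the-envelope sketch that blends the two platoon halves by plate-appearance share. It treats Smith’s line in the table as his career mark against right-handers (it matches his RHP split above) and Ruggiano’s as his career mark against left-handers. The 70/30 split of right- and left-handed plate appearances is an assumed placeholder, not a figure from this post, and the function name is illustrative.

```python
# Career wRC+ figures from the table above.
UPTON_WRC_PLUS = 121     # Upton, career overall
SMITH_VS_RHP = 123       # Smith, career vs. right-handed pitching
RUGGIANO_VS_LHP = 128    # Ruggiano, treated here as his career mark vs. left-handed pitching


def platoon_wrc_plus(vs_rhp: float, vs_lhp: float, rhp_share: float) -> float:
    """Blend two platoon halves by the share of PAs taken against right-handers.

    A linear blend of a rate stat is only a rough approximation, and it takes
    each player's favorable split at face value (no regression).
    """
    if not 0.0 <= rhp_share <= 1.0:
        raise ValueError("rhp_share must be between 0 and 1")
    return rhp_share * vs_rhp + (1.0 - rhp_share) * vs_lhp


# Assumed placeholder: roughly 70% of right-field PAs come against right-handers.
blend = platoon_wrc_plus(SMITH_VS_RHP, RUGGIANO_VS_LHP, rhp_share=0.70)
print(f"Smith/Ruggiano platoon: {blend:.1f} wRC+ (Upton: {UPTON_WRC_PLUS})")
```

Under those assumptions the blend lands around 124.5, a touch above Upton’s 121, though it ignores any regression of either player’s favorable split.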


Under the Radar: John Mayberry

Amidst the expensive December fireworks being set off by Andrew Friedman and Theo Epstein, the cash-strapped New York Mets quietly took another step towards correcting a major 2014 deficiency with the addition of John Mayberry for $1.45 million.

Setting aside the historically bad hitting of their pitching staff (which started the season with a major league record 0-for-64), the often-maligned Mets lineup actually generated a respectable 104 wRC+ against right-handed pitching in 2014, good enough for 5th best in the National League.

Their offense vs. left-handed pitching was another story, however: an 89 wRC+ (14th in the NL) and 22 HR (worst in MLB) left the Mets scrapping for runs in the late innings against deep, lefty-heavy bullpens. Leading the struggles vs. lefties were Eric Young Jr. (84 PA, 60 wRC+), Lucas Duda (125 PA, 54 wRC+), and Chris Young (83 PA, 51 wRC+).

To prepare for my first FanGraphs Community article, I studied Mayberry a little more closely. As a fan who has watched plenty of NL East action over the years, I was well aware of Mayberry’s established platoon splits. What I wasn’t aware of was how much he improved in 2014.

John Mayberry Splits vs LH Pitching

2011 – 6.7 BB%, 15.0 K%, 0.44 BB/K, .288 ISO, .306 BABIP, 157 wRC+
2012 – 5.6 BB%, 17.8 K%, 0.32 BB/K, .223 ISO, .289 BABIP, 116 wRC+
2013 – 7.4 BB%, 15.7 K%, 0.47 BB/K, .220 ISO, .244 BABIP, 106 wRC+
2014 – 13.4 BB%, 12.2 K%, 1.10 BB/K, .329 ISO, .214 BABIP, 151 wRC+

After three seasons with respectable peripherals, Mayberry took his platoon game to another level in 2014, posting a career-best walk rate, strikeout rate, BB/K, and ISO despite an inexplicable .214 BABIP. Over 534 career plate appearances against LHP, Mayberry carries a .269/.324/.533 line with 30 HR and a 130 wRC+.

Add the aggressively acquired Michael Cuddyer (career 132 wRC+ vs. LHP), and the Mets are now positioned to be significantly stronger vs. left-handed pitching without making headlines or gutting their very deep farm system.


The Ballad of Brett Lawrie

He’s not a good enough 3B and he doesn’t hit well enough to play at any of the easier defensive spots.

1261 PA, .273/.348/.450, 102 OPS+

“He” is Edwin Encarnacion, then of the Cincinnati Reds, and those are his stats through his age 24 season (2005-2007). Just three years into his major league career, Encarnacion had yet to reach 600 PA in any one season, and questions were already being raised about his viability as an everyday player. The quote above comes from here, and, to be fair, it represented the judgment of only some E5 observers. But despite having the opportunity to act out one of the best baseball revenge fantasies ever, Encarnacion never fully put those doubts to rest while he was with Cincinnati. After what seemed to be a possible breakout season at age 25 in 2008, E5’s power disappeared the following year, and the disgusted Reds shipped him midseason to America’s Hat in exchange for the Ghost of Rolen Past, who gave Cincinnati the final 3.5 seasons of his career, 1.5 of which were useful.

North of the border, Encarnacion’s power returned in 2010 even as his OBP continued to regress; his overall production rebounded to its 2008 level (109 OPS+ and wRC+). He continued to maneuver around third base as though it were a point singularity, however, so in 2011 the Blue Jays began transitioning him to 1B/DH, giving him 92 starts in those slots as opposed to just 30 at third. The results at the plate were encouraging: his average and OBP made substantial gains without giving away too much power. Then, in 2012, Encarnacion finally went off, commencing a three-year tear during which his OPS has never been below .900 and his worst HR total was 34. Today, Encarnacion can hit in the middle of any major league lineup. If Alex Anthopoulos is working for the MLB Network in 2016, it won’t be Encarnacion’s fault.

***

He started off cold as ice … before getting hurt sliding into second base.

1361 PA, .250/.331/.415, 97 OPS+

“He” is Alex Gordon, and those are his stats through his age 24 season (2007-2009). Just three years into his major league career, Gordon’s stellar offensive production in college already seemed a distant memory, and questions were being raised about his viability as an everyday player. The quote above comes from here, and while that writer was ready to give up on Gordon, to be fair, many others were calling on the Royals to remain patient with the second overall pick in the 2005 draft. In 2009 Gordon’s power, never substantial in the majors to that point, really began circling the drain, and after he got off to an anemic .685 OPS start in April 2010, the disgusted Royals demoted him and banished him to left. He would never appear at third base again.

He would, however, rediscover his stroke. Gordon hammered 16 homers in just 321 PAs at Omaha, good for a steak-sized 164 wRC+. He returned to The Show on July 23, and while the remainder of his 2010 season did little to quiet his critics (he finished with a wRC+ of 85, actually two points worse than the previous year), in 2011 he began a four-year rampage, headlined by 96 doubles during 2011-2012, as well as ironman durability. Since opening day 2011, Gordon is third in the majors in plate appearances, behind only Ian Kinsler and Elvis Andrus. Gordon’s durability and on-base skills have made him a key offensive cog in the Royals’ somewhat surprising resurgence. He’ll never have to go back to Omaha unless he has relatives there.

***

He’s been injured numerous times, suspended and has underperformed with the bat once pitchers began taking advantage of his lack of patience at the plate.

1431 PA, .265/.323/.426, 104 OPS+

“He” is Brett Lawrie, and those are his stats through his age 24 season (2011-2014). Just four years into his career, his blistering first 171 plate appearances have disappeared in the rearview mirror, and questions are being raised about his viability as an everyday player. The quote above comes from here, and was written before Lawrie strained his oblique (once again) last year, ending his season and, as it turned out, his tenure with the Blue Jays.

Lawrie plays baseball as though he’s being chased by an enraged Sumatran tiger. In his first brief season in the bigs (when, incidentally, he replaced Encarnacion as the Jays’ starting third baseman) this paid off with a .293/.373/.580 slash line. Since then, however, he has been unable to translate all that energy into baseball achievement. That kind of intensity can wear thin unless it’s backed by production, and Lawrie’s rate stats have generally gone backwards since 2011. He’s dealt with a fractured finger, repeated oblique injuries, and the aftermath of a bad slide into second, among other ailments. Lawrie blames the turf at Rogers Centre, but the turf lawyered up, and Lawrie’s case proved at best inconclusive.

As the career paths of Encarnacion and Gordon suggest, one way to resuscitate Lawrie’s bat might be to move him to the left end of the defensive spectrum. That won’t happen in Oakland; after the Donaldson trade the A’s third base depth chart (Renato Nunez aside) looks like the Fallujah skyline. Billy Beane has little incentive to try Lawrie anywhere other than third. And maybe it will work. The hopeful comp here might be Gary Gaetti, whose stat line through age 24 (1981-1983) looks similar to those of the other guys in this post, viz:

1241 PA, .237/.293/.428, 94 OPS+

But Gaetti was already a superior defender, and he became a durable, full-time starter at age 23. His plate appearances are light only because he had just a shot glass of coffee in 1981.

Gaetti was good. Real good: a 42 WAR career during which he amassed 2,280 hits and 360 homers while adding defensive value almost right up to the end. It’s certainly worth Beane’s time to see if he has that kind of player on his roster, especially since it appears that the A’s are going to be running a talent show rather than a pennant race next season. My guess is that if Lawrie develops at third, he’ll have slightly more bat and slightly less glove than Gaetti, though to be fair, both men had exactly the same career minor league OPS (.851).

It’s less clear whether participating in this particular rat race is the best outcome for Lawrie. Like E5 and Gordon, he might be better served by moving to a safer corner where he can concentrate on developing his offensive skills without placing his body’s soft tissue in excessive danger. I’m sure if you asked him he’d say he wants to stay at third. My guess is that Encarnacion and Gordon once thought that way too, and Lawrie’s career to this point looks more like theirs than Gaetti’s.

Brett Lawrie, like Repo Man, is always intense. I have a hard time not rooting for him; he attacks his job with an explosive, exuberant passion that would get me (and probably you) fired. I want him to succeed. I’m not at all sure he will.


Adjusting to the New Reality

The level of offense in baseball has been dropping for some time now. In the 1980s and into the early 1990s, teams scored around 4.3 runs/game (with the exception of 1987, when offense jumped up to 4.7 runs/game for one year, then went right back down in 1988). Offense started to rise in 1993 and first jumped over 5 runs/game in 1996. Run-scoring peaked at 5.1 runs/game in 2000, then leveled off to around 4.8 runs/game through 2007. Since 2008, offense has gone down steadily, with 2014 seeing an average of 4.1 runs/game. You have to go back to 1981 to find fewer runs per game in baseball (4.0 runs/game).

This has implications in the world of fantasy baseball. Consider the table below, which shows the league-wide ERA in Major League Baseball by year, going back to 2001:

YEAR ERA
2001 4.42
2002 4.28
2003 4.40
2004 4.46
2005 4.29
2006 4.53
2007 4.47
2008 4.32
2009 4.32
2010 4.08
2011 3.94
2012 4.01
2013 3.87
2014 3.74

Some attribute the lower level of offense to PED testing, some to a bigger strike zone, and some to the increasing number of relievers throwing 95+ for an inning or two. Whatever the reason, this is the new reality, and new realities can be hard to adjust to.

Let’s look at the numbers shown above in more detail.

Over the stretch from 2001 to 2009, MLB had an average ERA of 4.39. Over the three-year stretch from 2010 to 2012, ERA dropped to 4.01. Each of the last two years has seen another big drop, from 4.01 in 2012 to 3.87 in 2013 and 3.74 in 2014.
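The period figures above can be reproduced with a quick calculation. This is a minimal sketch that takes an unweighted mean of the season ERAs from the table; that happens to match the numbers quoted, though an innings-weighted average across seasons could differ slightly. The function name is mine, for illustration.

```python
# League-wide MLB ERA by season, from the table above.
ERA_BY_YEAR = {
    2001: 4.42, 2002: 4.28, 2003: 4.40, 2004: 4.46, 2005: 4.29,
    2006: 4.53, 2007: 4.47, 2008: 4.32, 2009: 4.32, 2010: 4.08,
    2011: 3.94, 2012: 4.01, 2013: 3.87, 2014: 3.74,
}


def mean_era(first: int, last: int) -> float:
    """Unweighted mean of the season ERAs from first through last, inclusive."""
    eras = [ERA_BY_YEAR[year] for year in range(first, last + 1)]
    return sum(eras) / len(eras)


print(f"2001-2009: {mean_era(2001, 2009):.2f}")  # 2001-2009: 4.39
print(f"2010-2012: {mean_era(2010, 2012):.2f}")  # 2010-2012: 4.01
```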

This has repercussions in fantasy baseball. With ERA dropping quickly, we need to reevaluate the pitchers we take on draft day and during the season.

Let’s go back to 2009, when MLB had an ERA of 4.32. The top 60 starting pitchers in ERA (minimum of 160 IP) combined for an ERA of 3.54. The median ERA for this top 60 was 3.77. There were 11 pitchers with an ERA under 3.00.

Fast forward to 2014. Last year, MLB had an ERA of 3.74. The top 60 starting pitchers in ERA (minimum of 160 IP) combined for an ERA of 3.14. The median ERA for this group was 3.33. There were 22 pitchers with an ERA under 3.00.

                       2009   2014
ERA in MLB             4.32   3.74
ERA of Top 60          3.54   3.13
Median ERA of Top 60   3.77   3.33
Pitchers under 3.00    11     22

In 2009, the median guy in the top 60 was someone like John Danks (3.77) or Jarrod Washburn (3.78). Last year, the median guys in the top 60 were Jose Quintana (3.32) and Chris Archer (3.33). [Caveat: I know ERA isn’t the only way to judge a pitcher in fantasy baseball. I’m keeping it simple.]

Six years ago, when you were scouring the waiver wire, a pitcher with a 4.00 ERA was a potential pick-up. These days, you don’t want to look at that guy; he’ll just hurt your team. This may seem obvious, but it really does require a change in mindset when you’re looking to improve your team. What we once thought was good is no longer good.

One side effect of a big drop in the run environment is that projection systems have difficulty keeping up. The 2010 season offers a stark example. If a pitcher had league-average ERAs in 2007 (4.47), 2008 (4.32), and 2009 (4.32), a simple 3/2/1 weighted average of his three seasons (counting the most recent season three times, the middle season twice, and the oldest once) would project an ERA of 4.35 for 2010. League-wide, though, ERA dropped from 4.32 in 2009 to 4.08 in 2010. Most projection systems produce ERAs in line with the previous few seasons’ run environment, so in this case their projections would have been well above the actual 2010 ERAs (unless a system could anticipate such a drop in offense).

Let’s do the same for more recent seasons. A pitcher with league-average ERAs in 2011 (3.94), 2012 (4.01), and 2013 (3.87) gets a 2014 projection of 3.93 from a simple 3/2/1 weighted average. The actual ERA in MLB in 2014 was 3.74, so pitchers as a group would be forecast with ERAs about 0.20 too high because the drop in offense was so drastic.
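Here is a minimal sketch of the 3/2/1 weighting used in the two examples above, assuming the usual convention that the most recent season gets weight 3, the one before it 2, and the oldest 1 (that convention reproduces the 4.35 and 3.93 figures). The function name is illustrative.

```python
def weighted_321(oldest: float, middle: float, latest: float) -> float:
    """3/2/1 weighted average: latest season x3, middle season x2, oldest x1."""
    return (3 * latest + 2 * middle + 1 * oldest) / 6


# League-average ERAs from the table above (oldest season first).
proj_2010 = weighted_321(4.47, 4.32, 4.32)  # built from 2007, 2008, 2009
proj_2014 = weighted_321(3.94, 4.01, 3.87)  # built from 2011, 2012, 2013

for season, proj, actual in [(2010, proj_2010, 4.08), (2014, proj_2014, 3.74)]:
    # A positive miss means the projection expected more offense than occurred.
    print(f"{season}: projected {proj:.3f}, actual {actual:.2f}, miss {proj - actual:+.3f}")
```

Both misses come out positive (roughly 0.27 and 0.19), which is the point: weighting past seasons bakes the old run environment into the forecast.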

With this in mind, I looked at last year’s projections from four systems: Steamer, ZiPS, Davenport, and Oliver. I included every pitcher who was projected by all four systems and pitched 30 or more innings in 2014. There were 326 pitchers in this group, and they finished 2014 with a combined ERA of 3.58. Here is how each system forecast these players prior to the 2014 season:

2014 SEASON
Actual ERA 3.58
Davenport projection 3.76
Oliver projection 3.81
ZiPS projection 3.90
Steamer projection 3.91

Looking at the table, it would be easy to conclude that Davenport and Oliver had the best projections, since they came closest to the actual ERA of this group of pitchers. That conclusion doesn’t follow. What is true is that Davenport best anticipated the run environment. If you are trying to assess which system better projected individual players, you would first want to adjust them all to the actual run environment, then compare projected ERA to actual ERA for each individual pitcher.
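The adjust-then-compare idea can be sketched as follows. The exact adjustment behind the table isn’t spelled out, so this assumes the simplest version: shift every projection by one constant (consistent with the later advice to adjust all pitchers “by the amount the projections are high or low”), with group ERA computed from innings-weighted earned runs. The class and function names and the three pitchers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    ip: float             # actual innings pitched
    actual_era: float
    projected_era: float


def group_era(eras: list[float], ips: list[float]) -> float:
    """Combined ERA of a group: 9 * total ER / total IP, where ER = ERA * IP / 9."""
    return sum(era * ip for era, ip in zip(eras, ips)) / sum(ips)


def adjusted_avg_abs_diff(pitchers: list[PitcherSeason]) -> tuple[float, float]:
    """Remove a system's run-environment bias, then score individual accuracy.

    Returns (shift, average absolute difference after the shift).
    """
    ips = [p.ip for p in pitchers]
    actual = group_era([p.actual_era for p in pitchers], ips)
    projected = group_era([p.projected_era for p in pitchers], ips)
    shift = projected - actual  # > 0 means the system expected too much offense
    diffs = [abs((p.projected_era - shift) - p.actual_era) for p in pitchers]
    return shift, sum(diffs) / len(diffs)


# Hypothetical three-pitcher group, for illustration only.
sample = [
    PitcherSeason("Pitcher A", 200.0, 3.00, 3.40),
    PitcherSeason("Pitcher B", 150.0, 4.20, 4.30),
    PitcherSeason("Pitcher C", 60.0, 2.50, 3.20),
]
shift, avg_abs = adjusted_avg_abs_diff(sample)
print(f"shift {shift:+.3f}, adjusted avg abs diff {avg_abs:.3f}")
```

A system that is off by the same amount for every pitcher scores a perfect zero here, which is exactly why this measure separates "read the run environment" from "ranked the pitchers well."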

In the case of the 326 pitchers used above, the table below shows the average absolute difference in actual ERA and projected ERA for each individual pitcher, using projections adjusted to the run environment of this group of pitchers.

Adjusted Projections
System AvgAbsDiff
Steamer 0.85
Davenport 0.86
Oliver 0.88
ZiPS 0.90

Seen this way, the projection systems were very close on this group of 326 pitchers: Davenport and Oliver fall in the middle of the pack, and Steamer moves from the bottom to the top.

What does this mean for 2015? If you’re the type of fantasy baseball player who likes to create your own projections by combining projections from other sources, you will first want to know what level of offense (ERA, in this example) those projections are expecting. If you think 2015 will be much like 2014 (3.74 league-wide ERA) but the projections expect a much higher or lower ERA, you should adjust all pitchers by the amount the projections are high or low. Once the projections have been adjusted, you can combine them.
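A minimal sketch of the adjust-then-combine step, under a few assumptions of my own: each system is shifted by a single constant so that its projected group ERA matches the ERA you expect, the group ERA is weighted by projected innings, and the adjusted systems are then averaged with equal weight. The target should describe the same pool of pitchers the projections cover; for the 326-pitcher group discussed here, expecting a repeat of 2014 would mean a target of 3.58 rather than the league-wide 3.74. All names and numbers in the example are hypothetical.

```python
def rebase(projected_eras: list[float], target_era: float, weights: list[float]) -> list[float]:
    """Shift every projected ERA by one constant so the weighted group ERA equals target_era."""
    group = sum(e * w for e, w in zip(projected_eras, weights)) / sum(weights)
    shift = target_era - group
    return [e + shift for e in projected_eras]


def combine_systems(systems: list[list[float]], target_era: float,
                    weights: list[float]) -> list[float]:
    """Rebase each system to the same run environment, then average pitcher by pitcher."""
    rebased = [rebase(eras, target_era, weights) for eras in systems]
    return [sum(vals) / len(vals) for vals in zip(*rebased)]


# Hypothetical example: two pitchers, two systems, projected innings as weights.
innings = [180.0, 60.0]
system_a = [3.50, 4.20]  # expects a little less offense than system_b
system_b = [3.80, 4.40]
blend = combine_systems([system_a, system_b], target_era=3.58, weights=innings)
print([round(era, 2) for era in blend])
```

Because every system is rebased before averaging, a system that simply expects more offense no longer drags the blend upward; only its views on pitchers relative to one another carry through.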

As an example, I took those same 326 pitchers from above and compared their actual combined ERA from 2014 to their 2015 Steamer projections. This group of pitchers had a combined ERA of 3.58 in 2014. Steamer is projecting them to have a 3.84 ERA in 2015. The difference is 0.26 in ERA. I don’t know the run environment Steamer is basing its projections on, but this suggests that it’s higher than what we saw in 2014.

Based on the disclaimer that accompanies each team’s ZiPS projections, we know that ZiPS is projecting based on the AL having an ERA of 3.93 and the NL having an ERA of 3.75. This would be a slight increase from the 2014 season (AL: 3.82 ERA, NL: 3.66 ERA) and is, essentially, a 3/2/1 weighted average from 2012, 2013, and 2014.

I looked at the starting rotations for the five teams we have ZiPS projections for so far. ZiPS projects these 25 pitchers to throw 3985 innings with a 3.73 ERA. Steamer projects the same 25 pitchers to throw 4039 innings with a 3.98 ERA, 0.25 higher. Steamer projects higher ERAs for 23 of these 25 pitchers. This is a small sample, but it appears you will want to adjust the Steamer pitching projections down if you do any sort of combining of projections in your fantasy baseball prep.

In addition, if you’re in a keeper league and have access to last year’s data for your league, you may want to project your keepers and potential additions for 2015 and compare your team projections to last year’s stat categories. This way, you will have an idea of how competitive your team will be. For example, I’m in an 18-team, 25-man roster league. We have nine starters on offense, four starting pitchers, and two relievers in our active lineups, and a 10-man bench that can be made up of players from any position. Teams in this league averaged around 1000 innings last season, so when I create projections, I can plug in the stats for my keepers and potential additions to see how my team looks for the upcoming season. In order to compare my projected 2015 team to 2014 stat categories, I want my projections to be adjusted to the level of offense of 2014 (in this case, ERA).
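To compare a projected staff against last year’s standings in this way, the staff’s ERA has to be built from total innings and earned runs rather than by averaging individual ERAs. The sketch below assumes each keeper or target already has a projected IP and an ERA adjusted to the 2014 level of offense, as recommended above; the roster and last year’s team ERAs are hypothetical placeholders, not data from my league, and the function names are illustrative.

```python
def staff_era(staff: list[tuple[float, float]]) -> float:
    """Team ERA from (projected IP, projected ERA) pairs: 9 * total ER / total IP."""
    total_ip = sum(ip for ip, _ in staff)
    total_er = sum(ip * era / 9 for ip, era in staff)  # ER implied by each ERA
    return 9 * total_er / total_ip


def rank_in_category(value: float, last_year: list[float], lower_is_better: bool = True) -> int:
    """Where value would have finished among last year's team totals (1 = best)."""
    if lower_is_better:
        ahead = sum(1 for v in last_year if v < value)
    else:
        ahead = sum(1 for v in last_year if v > value)
    return ahead + 1


# Hypothetical staff totaling 1000 IP (starters, relievers, and bench arms).
my_staff = [(200, 3.20), (190, 3.60), (180, 3.90), (170, 3.50),
            (70, 2.80), (65, 3.10), (125, 4.00)]
# Hypothetical 2014 team ERAs from the league, for illustration only.
last_year_eras = [3.21, 3.35, 3.48, 3.55, 3.62, 3.70, 3.81, 3.95]

era = staff_era(my_staff)
print(f"Projected staff ERA {era:.2f}, would have ranked #{rank_in_category(era, last_year_eras)}")
```

With these made-up inputs the staff projects to a 3.52 ERA, which would have ranked 4th against the placeholder standings; the comparison only makes sense if the projections and last year’s numbers share the same run environment.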

Offense in baseball has been dropping for a few years now. Successful fantasy players will have to adjust to this new reality when doing their pre-season prep work, on draft day, and when adding players from the waiver wire.