Archive for March, 2015

Jacob deGrom Fearless Forecast

Matt Harvey is getting all the hype these days, touching 99 mph on the gun, throwing nasty 84 mph curves, and looking healthy. I think he will have an excellent year. For some reason, though, the world at large is still underrating Jacob deGrom.

First off, I recommend you read this FanGraphs article from midsummer, detailing the changes he made to his pitching mechanics to make this “rags to riches” leap into the upper echelon.

I’ve been notoriously high on deGrom since I first watched him pitch. I wrote about him on Reddit back in July 2014, and I’ll update the numbers I used below:

He’s been excellent, and not in any flukey kind of way. deGrom’s pitch types and peripherals support the idea that what he did last year is VERY REAL.

Let me reiterate last year’s line: 140 IP (178.1 IP of usage), 9.2 K/9, 2.69 ERA, 1.14 WHIP, 2.67 FIP, 3.03 xFIP, 3.19 SIERA. Those are top-20 numbers. And unlike phenoms that regress with time (see Jesse Hahn in 2014), deGrom only got BETTER as the innings racked up.

That is what we love to see — for three reasons:

(1) His body can withstand the rigors of a 200 IP season,

(2) He IMPROVED, rather than regressing, and

(3) Hey, for those of us in H2H leagues, we want our guy pitching well for the fantasy playoffs!

His control improved over time, and his strikeouts increased along with it. As of my last post, he had an 8.8 K/9 and 2.7 K/BB. He ended the year with a 9.3 K/9 and 3.4 K/BB. We love to see improvement in both respects. Keep the walks down and the strikeouts up, and success often follows!
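For anyone who wants to check rate stats like these by hand, here is a minimal sketch. The function names are my own, and the sample inputs are simply the 210 IP and 234 strikeouts from the forecast later in this post, not deGrom's 2014 totals.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings such as '178.1' (178 and 1/3) to a float."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def k_per_9(strikeouts: int, ip: float) -> float:
    """Strikeouts per nine innings."""
    return 9 * strikeouts / ip


def k_per_bb(strikeouts: int, walks: int) -> float:
    """Strikeout-to-walk ratio."""
    return strikeouts / walks


# Forecast line from this post: 234 K in 210 IP
forecast_k9 = k_per_9(234, innings("210"))
print(round(forecast_k9, 1))
```

Note that the ".1" and ".2" in an innings total mean thirds of an inning, which is why the conversion helper is there.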

He’s generating a lot of swinging strikes. For reference, the league average sw/str% is approximately 8.6%.

Jacob deGrom has an overall 11.9 sw/str%, which is well above league average. Looking at Pitch F/X data, his slider (12.4 sw/str%, 46/370 pitches), changeup (20.2%, 55/272), both fastballs (10.8%, 108/1000), and curveball (16.0%, 34/212) are all above-average, strikeout-quality pitches.

deGrom essentially features a five-pitch arsenal. Of 2,225 MLB pitches thrown:

44.9% (1000/2225) Fastballs averaging 93.5 mph. Max velocity 97.3 mph.

16.5% (368/2225) 2-Seam Fastballs averaging 93.2 mph. Max velocity 97.4 mph.

16.6% (370/2225) Sliders averaging 86.8 mph. Max velocity 91.3 mph (adding mph to his slider is a huge part of his success).

12.2% (272/2225) Changeups averaging 83.9 mph.

9.5% (212/2225) Curveballs averaging 79.3 mph.

3 Cutters, which is not really a pitch he uses.
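The usage shares and swinging-strike rates above are simple ratios of the pitch counts. A quick sketch that reproduces them from the counts quoted in this post (the dictionary layout and names are just my own illustration):

```python
# Pitch counts as listed above (2,225 total pitches)
pitch_counts = {
    "fastball": 1000,
    "2-seam fastball": 368,
    "slider": 370,
    "changeup": 272,
    "curveball": 212,
    "cutter": 3,
}

# Swinging strikes per pitch type, from the sw/str% paragraph above
swinging_strikes = {"slider": 46, "changeup": 55, "curveball": 34}

total = sum(pitch_counts.values())

# Share of all pitches thrown, in percent
usage = {name: round(100 * n / total, 1) for name, n in pitch_counts.items()}

# Swinging strikes divided by pitches of that type, in percent
whiff_rate = {
    name: round(100 * whiffs / pitch_counts[name], 1)
    for name, whiffs in swinging_strikes.items()
}

for name in pitch_counts:
    print(f"{name}: {usage[name]}% usage")
```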

deGrom has a diverse arsenal of pitches, with some legitimate velocity differentials, and a good fastball, topping out at 97+ mph. He has about 7 mph of separation between his fastballs and slider, 10 mph between fastballs and changeup, and 14+ mph between fastballs and curveball. There are 22.5 mph between the high end of his fastball and the low end of his curve.

Essentially, deGrom is legit. His peripherals and Pitch F/X data don’t really suggest that he’s due for any significant regression. Citi Field is still an excellent pitcher’s park, despite the fact that the fences were recently moved in (by 3 to 11 feet). I don’t think it will make a significant difference; maybe a home run or two leaves the park that wouldn’t have before.

It’s worth noting that his top speeds increased late in the year: he logged his fastest fastball in the second half of the season. Again, I love a pitcher that doesn’t fatigue.

Concerns: He had Tommy John surgery in 2010, but it seems he has worked his way back from that. Sophomore slump or hitters figuring him out are worth considering. And of course, a couple fly ball outs might turn into home runs.

Fearless prediction: 32 games, 210 IP, 2.80 ERA, 1.05 WHIP, 234 Ks (10 K/9), and deGrom finally gains some respect within the fantasy baseball community as a top-15 fantasy pitcher. Bold prediction aside, I think he’s being criminally underrated in fantasy drafts, with his ADP of 112 in Yahoo leagues.

112! At that price, go ahead and reach.


Drafting an Injured Hunter Pence

Hunter Pence has been one of baseball’s most durable players since his first full-time season in 2008. Over the last seven years, Pence has never played fewer than 154 games, and he’s coming off a three-year stretch of 160, 162, and 162 games. He is the active leader in consecutive games played, with 382.

Unfortunately, that streak will end when the Giants open their regular season on April 6th in Arizona. Pence was hit by a pitch in a spring training game on Thursday and will be out six-to-eight weeks with a broken arm. Of course, in the real world, the important thing for Pence and the Giants is that he heals quickly and gets back on the field as soon as possible. In the fantasy baseball world, it’s natural to wonder how the injury affects his value on draft day.

One of the reasons Pence has been valuable in fantasy baseball over the years is his durability. He has played almost every day for the last seven seasons, and this has allowed him to accumulate counting stats even if his rate stats are not elite. He’s not a 30-homer guy, rarely a 20-steals guy, and has only hit over .300 once since 2008. He’ll generally score 80 to 90 runs and drive in around 90. He has scored 100 or more runs once and driven in 100 or more runs once.

Consider his average season since 2008:

159 G, 671 PA, 172 H, 88 R, 24 HR, 89 RBI, 13 SB, .280 AVG

That’s solid across-the-board production but without any of the big, round numbers that are so exciting to see (100 runs, 30 homers, 100 RBI, 20 steals, .300 average). An interesting comparison is Carlos Gonzalez. Gonzalez is an elite player, when healthy. When he’s in the lineup, he’s a top 5 guy. Unfortunately, Gonzalez is often not healthy.

Consider the average season for Carlos Gonzalez since 2008:

109 G, 444 PA, 118 H, 69 R, 19 HR, 65 RBI, 16 SB, .294 AVG

Now let’s look at both Pence and Gonzalez since 2008, per 162 games played:

162 G, 686 PA, 176 H,   90 R, 25 HR, 91 RBI, 14 SB, .280 AVG—Hunter Pence

162 G, 662 PA, 177 H, 103 R, 29 HR, 98 RBI, 24 SB, .294 AVG—Carlos Gonzalez

Given the same amount of playing time, Carlos Gonzalez beats Hunter Pence across the board. Gonzalez is the guy that you can dream on to achieve the big, round numbers mentioned above. In the real world, though, despite his inferior statistics on a per-plate appearance basis, Pence has been the more valuable fantasy outfielder because of his durability.
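The per-162 lines above are just each player's production scaled up to a full schedule. A rough sketch of that scaling is below. One caveat: it starts from the rounded average-season lines quoted in this post rather than career totals, so its output can differ from the table above by a unit or two.

```python
def per_162(season: dict) -> dict:
    """Scale counting stats to a 162-game pace; leave AVG (a rate stat) alone."""
    factor = 162 / season["G"]
    return {
        stat: (value if stat == "AVG" else round(value * factor))
        for stat, value in season.items()
    }


# Average seasons since 2008, as quoted above
pence = {"G": 159, "PA": 671, "H": 172, "R": 88, "HR": 24, "RBI": 89, "SB": 13, "AVG": 0.280}
gonzalez = {"G": 109, "PA": 444, "H": 118, "R": 69, "HR": 19, "RBI": 65, "SB": 16, "AVG": 0.294}

pence_162 = per_162(pence)
gonzalez_162 = per_162(gonzalez)
print(pence_162)
print(gonzalez_162)
```

The scaling factor is what makes the durability point: Pence's is barely above 1.0, while Gonzalez needs nearly a 50% boost to reach the same number of games.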

So, what about 2015? How much does Pence’s broken arm affect his fantasy value?

I created dollar values using composite projections from Fantasy411.com (a combination of 12 sources). These projections are based on a 12-team league with 21 players, including 9 active hitters (no MI or CI), 7 active pitchers (2 SP, 2 RP, 3 P), and 5 bench spots. There were 63 outfielders projected for positive value (a little more than 5 per team). Using these projections, a healthy Hunter Pence is projected for the following stats:

644 PA, 159 H, 81 R, 20 HR, 82 RBI, 12 SB, .270 AVG

This puts him #12 among outfielders, but a dollar more in value would move him as high as ninth and a dollar less would drop him to 15th, so you could say he’s in the 9-15 range when it comes to outfielders. Others in that same range based on these projections are Ryan Braun, Jacoby Ellsbury, Corey Dickerson, Matt Kemp, Justin Upton, and Matt Holliday. With these stats (per this set of projections), Pence would be a late fourth-round pick.

Healthy Hunter Pence

$21

#12 OF (range is from 9 to 15)

Late 4th round

Comparable to: Ryan Braun, Corey Dickerson, Matt Kemp, Justin Upton

This year we know Pence will miss some time. The initial estimates say six to eight weeks until he’s ready to play. Pence seems to me to be the type of guy who will do whatever he can to get back on the field as soon as possible. In fact, I can’t imagine Pence could sit still for five minutes, let alone an entire baseball game. He’s probably going to drive his teammates crazy.

So let’s say Pence misses the month of April. That leaves him with five months of playing time out of six. Some simple math (5/6 ≈ 0.83) suggests the injured Pence will get about 83% of the playing time a healthy Pence would get, so we’ll pro-rate his projection above to 83%:

535 PA, 132 H, 68 R, 17 HR, 68 RBI, 10 SB, .270 AVG—83% of the season

Losing a month of playing time drops Pence’s value into the mid-30s among outfielders, around such players as Brandon Moss, Denard Span, Marcell Ozuna, and Alex Rios.
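The pro-rating step is easy to reproduce. This is a minimal sketch using the healthy projection quoted above and the same 0.83 share; the function name is my own. Because the inputs here are the rounded projection, a stat can land one off from the line above (runs come out at 67 rather than 68, presumably because the underlying composite projection carries decimals).

```python
HEALTHY_PENCE = {"PA": 644, "H": 159, "R": 81, "HR": 20, "RBI": 82, "SB": 12}


def prorate(line: dict, share: float) -> dict:
    """Scale counting stats by the share of the season played."""
    return {stat: round(value * share) for stat, value in line.items()}


# Five of six months, rounded to 0.83 as in the text
injured_pence = prorate(HEALTHY_PENCE, 0.83)
print(injured_pence)
```

Batting average is left out on purpose: it is a rate stat, so missing time does not change the projected .270.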

Injured Hunter Pence (missing one month of the season)

$9

#36 OF (range is from 33 to 39)

Early 13th round

Comparable to: Brandon Moss, Denard Span, Marcell Ozuna, Alex Rios

But wait, there’s more! We know Pence will miss time. It could be a couple weeks, it could be a month, it could be a month-and-a-half. We also know that we can replace him for that time, so we can factor in his replacement to get a better value for Pence. If you drop him all the way down to 83% of his projected stats, he drops too far on your cheat sheet and you’ll never acquire him.

Let’s factor in the value of a replacement outfielder for the time Pence is going stir-crazy on the Giants’ bench. Based on the composite projections from Fantasy411, the first five replacement outfielders are Michael Saunders, Michael Morse, Curtis Granderson, Angel Pagan, and Dexter Fowler. If you combine the stats for these five players and pro-rate them to one month’s worth of playing time, you get the following:

87 PA, 20 H, 11 R, 2 HR, 9 RBI, 2 SB, .258 AVG—Pence one-month replacement

Add this to our “83% of the season” numbers for Pence from above:

535 PA, 132 H, 68 R, 17 HR, 68 RBI, 10 SB, .270 AVG—83% of the season

And we get:

622 PA, 152 H, 78 R, 19 HR, 77 RBI, 12 SB, .268 AVG—Pence + Replacement

This batting line moves Pence back up the rankings. He becomes the #20 outfielder, in the range of Alex Gordon, Nelson Cruz, and Jason Heyward.
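Counting stats in the combined line simply add, but the batting average has to be weighted by at-bats. The projections quoted here don't list at-bats, so this sketch backs them out from hits and average, which is an approximation on my part:

```python
def implied_at_bats(hits: int, avg: float) -> float:
    """Approximate at-bats from hits and batting average (AVG = H / AB)."""
    return hits / avg


# Lines quoted above
pence_pa, pence_hits, pence_avg = 535, 132, 0.270   # 83% of the season
repl_pa, repl_hits, repl_avg = 87, 20, 0.258        # one-month replacement

combined_pa = pence_pa + repl_pa
combined_hits = pence_hits + repl_hits
combined_ab = implied_at_bats(pence_hits, pence_avg) + implied_at_bats(repl_hits, repl_avg)
combined_avg = combined_hits / combined_ab

print(combined_pa, combined_hits, round(combined_avg, 3))
```

Simply averaging .270 and .258 would overweight the replacement's month; weighting by at-bats keeps the combined average close to Pence's own.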

Injured Hunter Pence + Replacement Player for One Month

$17

#20 OF (range is from 18 to 23)

6th round

Comparable to: Alex Gordon, Nelson Cruz, Jason Heyward

Of course, your numbers may vary, but the process is the important part. A healthy Hunter Pence is a late 4th-round pick. An injured Hunter Pence with no replacement is an early 13th round pick. An injured Hunter Pence with a replacement player for one month is a mid 6th round pick.

The recent injury to Hunter Pence hurts his value, but he could still be someone to target if other owners shy away from him and he’s still around in the 7th round or later.


Don’t Hate Dee Because He’s Beautiful

I have every reason to hate Dee Gordon.

Prior to the 2012 season, I found myself struggling to figure out who would get the final keeper slot in a longtime, highly competitive fantasy league I played in. It came down to two players: Mike Trout and Dee Gordon. They both would have cost me the same, but Gordon was coming off a rookie campaign where he batted .304 with 24 steals in a minuscule 224 at-bats. Trout, on the other hand, was heading into 2012 with what seemed to me like a more clouded future. He had just posted a pedestrian .671 OPS with a 22.2 K% the year prior, albeit as a 19-year-old. He was also blocked in LF at the time by the great Bobby Abreu, and was looking at possibly another year of seasoning in the minors. In the end I chose Gordon, and the rest is terrible, nightmare-inducing history.

So how strange that I find myself here now, defending Dee Gordon, the very man who hoodwinked me into choosing him over Mike mother-flippin’ Trout.

Ironically, I think the hate for Gordon has gone a bit too far this year. It’s odd to think that there’s any hate for a guy coming off a season where he led all of baseball in steals while also posting a top-25 batting average of .289. But some people seem awfully down on the guy coming into 2015. Perhaps they too were burned by his 2011 breakout, and refuse to make the same mistake twice. Though I can’t fault them if that is the case, there is reason to believe that Dee Gordon’s days of breaking our hearts are over.

Gordon's Batted Ball Percentages 2014

The first thing to point out is his batted-ball rates. As the graph illustrates, there weren’t any earth-shattering changes occurring here. It is worth noting, however, that Gordon set a career high in groundball percentage and a career low in fly-ball percentage. And if you’re willing to consider 2013 an aberration like I am (he only managed 106 plate appearances that year), he has actually been gradually trending in the right direction with both his fly-ball and groundball percentages while maintaining a fairly steady line-drive rate. Spikes in groundball percentage are rarely considered ideal, but when a player has the elite speed Gordon does, the odds of turning a weak dribbler or a grounder towards the hole into a hit get a very favorable bump.

Which brings me to perhaps the most eyebrow-raising aspect of Gordon’s 2014 season: his bunt-hit percentage (BUH%). After averaging a 28.5 BUH% over the prior three seasons, Gordon posted a ridiculous 42.6 BUH% in 2014. To put that number into perspective, here’s how it stacked up against the league’s other elite speedsters:

2014 BUH% Among Elite Speedsters

Bunting for hits is a skill. The fact that his success rate rose by about 14 percentage points last year (from 28.5% to 42.6%) tells me that he worked on and dramatically improved this skill. Perhaps more importantly, though, it tells me that he’s keenly aware of how dangerous a weapon this skill can be for him when used effectively. When paired with his declining fly-ball rates (and especially his new career-low IFFB% of 8%, down from 13.2%), the numbers start to paint the picture of a player who may have finally begun to consciously tailor his plate approach to his strengths.
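It helps to separate the change in percentage points from the relative change, since the two are easy to mix up. A small sketch using the two BUH% figures quoted above (the variable names are mine):

```python
prior_buh = 28.5    # average BUH% over the prior three seasons
buh_2014 = 42.6     # BUH% in 2014

# Absolute change, in percentage points
point_change = buh_2014 - prior_buh

# Relative change, as a percent of the prior rate
relative_change = 100 * (buh_2014 - prior_buh) / prior_buh

print(round(point_change, 1), round(relative_change, 1))
```

In other words, the jump is modest-sounding in points but amounts to roughly half again as many successful bunts per attempt.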

While I will never forgive Dee Gordon for what he did to me, I do see reasons to be optimistic about his 2015 season. Should his elite ability to bunt for hits carry over into this season, his .346 BABIP shouldn’t see as much regression as people seem to think, and another year of plus average and a stolen-base crown seems well within his reach.


Is It Time to Re-Evaluate the Value of the Walk?

One of the founding notions of sabermetrics has been its emphasis on the walk. Before sabermetrics, in the dark ages, people hardly paid attention to the walk. Teams would pay players based on their batting average, HR, and RBIs, and no one really put much stock in the “scrappy” player who would draw walks and get on base. Sabermetrics essentially started around the mid-1900s, and one of its founding principles was that the walk was badly undervalued. Now the walk is deemed an extremely valuable tool, and organizations will often pay heavily for someone with a good walk rate. But what if the value of the walk is dropping? What if a walk in today’s game is not nearly as valuable as it used to be? Baseball, you see, is a living organism and is prone to change; just because something was valuable in the past doesn’t mean it’s valuable in the present. We constantly need to adjust our valuation of certain strategies and skills in order to stay ahead of the game.

This all started when I looked at the correlation between pitches per plate appearance (Pit/PA) and runs scored per game (R/G) for 2014 and found that there was no real correlation (you can find the article here). I therefore decided to expand the data pool and look through a twenty-year span to examine whether 2014 was an anomaly, part of a consistent trend, or a sign that Pit/PA never really had any correlation with R/G.

So what I did was calculate the correlation coefficient of Pit/PA and R/G for each individual year, dating all the way back to 1994. If you don’t know what a correlation coefficient is, or what counts as a strong or weak one, I explain it in my previous article. The data I found had a high level of variance. I did, however, label two points: the largest correlation coefficient in the last twenty years and the smallest. Why? Because although there is large variation in the data from year to year, and it wouldn’t be unreasonable to expect Pit/PA to have a much higher correlation with R/G in 2015, the data still displays a downward trend.
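For readers who want to repeat the year-by-year calculation, here is a rough sketch of one way to do it: a plain Pearson correlation computed separately for each season. The rows below are made-up placeholder values purely to show the shape of the input, not real Baseball Reference data, and the function names are my own.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_correlations(rows):
    """rows: iterable of (year, pit_per_pa, runs_per_game), one per team-season."""
    by_year = defaultdict(lambda: ([], []))
    for year, pit_pa, r_g in rows:
        by_year[year][0].append(pit_pa)
        by_year[year][1].append(r_g)
    return {year: pearson_r(xs, ys) for year, (xs, ys) in sorted(by_year.items())}


# Placeholder rows (NOT real data), four "teams" per season
rows = [
    (1994, 3.60, 4.5), (1994, 3.70, 4.9), (1994, 3.80, 5.3), (1994, 3.75, 5.0),
    (2014, 3.75, 4.2), (2014, 3.85, 3.9), (2014, 3.80, 4.4), (2014, 3.90, 4.1),
]
correlations = yearly_correlations(rows)
for year, r in correlations.items():
    print(year, round(r, 3))
```

Swapping the second column for BB% or the third for OBP gives the same per-season calculation for the other pairs of stats discussed in this article.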

Chart: yearly correlation coefficient between Pit/PA and R/G, 1994–2014

1994 had the highest correlation, while 2014 had the lowest. At this point you’ve probably noticed the variation and the downward trend. Essentially, what this tells us is that Pit/PA’s correlation with R/G is basically unpredictable. If your team, for example, sees a lot of pitches, it doesn’t mean that it will have a good offense. In fact, if someone says that a team sees a lot of pitches and that it’s a good thing, well, he’s probably just blurting crap out. This is not to suggest that the individual is wrong; rather, it suggests that seeing pitches doesn’t have a consistent correlation with runs scored. It is therefore difficult, or at least impractical, to come to any conclusion from this data set.

Now, what follows is an examination of similar, and stronger, trends in the data. Oh, and I almost forgot: you’re probably also wondering about the base on balls, and what the point of that introduction was. Well, after I looked at the correlation between Pit/PA and R/G, I took a look at the correlation between BB% and R/G for 2014.

Chart: BB% and R/G, 2014

This basically shows no distinct correlation between BB% and R/G in 2014. Then I calculated the correlation coefficient to get an exact number, and got R=0.0908. Essentially this displays that there was no correlation between BB% and R/G in 2014.

I therefore ran the numbers again, for 20 years, to see if this was just an abnormality in the data. I also wanted to get a sense of whether there was a specific trend.

 

Chart: yearly correlation coefficient between BB% and R/G over the 20-year span

For this chart I decided to display all the data points, to give you an idea of what the correlations looked like. The two that I really want you to focus on, however, are the 2012 correlation (R=0.083) and the 2014 correlation (R=0.0908). Both of these years show a significant drop-off in the correlation between BB% and R/G. Previously, there was always a positive correlation between the two variables, at times even a strong one. In 2012 and 2014, however, there was essentially no correlation between BB% and R/G.

So what does this mean? Why the sudden drop in data correlation and will it continue? I also found it odd that in 2013, the correlation went all the way back up to R=0.4749, which is not the strongest correlation, but still a good one.
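One way to gauge how much a single-season correlation can bounce around is a Fisher z confidence interval. The sketch below is only a rough check under assumptions I am adding: that each season's correlation is computed across about 30 team-level observations, and that the usual normal approximation is reasonable.

```python
from math import atanh, sqrt, tanh

N_TEAMS = 30  # assumption: one observation per team in a season


def r_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """Approximate 95% interval for a Pearson r via the Fisher z-transform."""
    z = atanh(r)
    standard_error = 1 / sqrt(n - 3)
    return tanh(z - z_crit * standard_error), tanh(z + z_crit * standard_error)


ci_2014 = r_confidence_interval(0.0908, N_TEAMS)   # R quoted above for 2014
ci_2013 = r_confidence_interval(0.4749, N_TEAMS)   # R quoted above for 2013

print([round(x, 2) for x in ci_2014])
print([round(x, 2) for x in ci_2013])
```

Under those assumptions the two intervals are wide and overlap, which fits the large year-to-year swings in the chart and is a reason to be cautious about reading too much into any single season.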

First, however, before we try to answer the two questions I’ve asked, let’s look at another set of correlation data: the correlation between BB% and OBP. Why? Well, my hypothesis was that if the correlation between BB% and OBP is getting smaller, then naturally the correlation between BB% and R/G would get smaller as well.

Chart: yearly correlation coefficient between BB% and OBP

As you might be able to tell, the correlation between BB% and OBP shows similar, although less drastic, results to the correlation between BB% and R/G. Again, the part of the graph you should focus on is the two outlier data points: 2012 (R=0.2317) and 2014 (R=0.3570). This gives us some explanation for the two outlier data points in the previous graph.

Essentially, what one needs to understand from this is that, since BB% is becoming less correlated with OBP, it’s going to have a lesser correlation with R/G, because the primary value of a BB (though obviously not the only one) is the effect it has on OBP. Generally, through the 20 years of data, there has been a strong correlation between BB% and OBP. The exceptions are 2012 and 2014, where the correlation is weaker, although still positive.

So now we need to understand this: if the walk has a small correlation with OBP, then its value will be significantly affected. The problem is trying to figure out why there was a sudden drop in its correlation with OBP in 2012 and 2014. My first hypothesis was that it had something to do with the overall BB% of the league.

Chart: league-wide BB% by year

In hindsight this was probably a simplistic hypothesis. At this point you’ve probably figured out that this was not the answer. Yes, the overall BB% is trending down, just like the previous charts, but the difference is that it doesn’t have the outliers of 2012 and 2014. (I included this to dispel a possible easy assumption to the answer.)

There are in fact several possibilities for the drop in correlation between BB% and OBP. Perhaps it’s the shift, perhaps it’s the low run environment, perhaps it’s the sharp rise in strikeouts. I think another interesting element to look at is how hitters are doing later in the count. Considering the rise in strikeouts, it’s probably not unreasonable to assume that hitters are performing worse than ever when hitting with two strikes, although this of course is just a hypothesis. The answer to that question is for another study, on another day. What is certain, however, is that this upcoming season will be a fascinating data point. Will the correlations keep getting smaller, or are these two data points truly just abnormalities? In any case, I think it’s important to consider this: baseball is an ever-changing game, and just because something has value one year doesn’t mean it has value the next. Teams need to keep changing and mixing their strategies in order to stay ahead in this wacky game.

Finally, something to note: these data sets are not meant to arrive at any conclusion. I have not arrived at any conclusions about baseball through this data. What it does is raise more questions for further, more detailed and elaborate studies. For example, it would be interesting to look at Pit/PA from a pitcher’s point of view, although I’m not sure that would give us different results. These data sets are also general; they give us a general idea of the situation. Perhaps there are specific teams or players that thrive on seeing a lot of pitches or that do translate a high number of BBs into runs. Also, and this might be the most important element to note, correlation does not always imply causation. For example, pop flies may have a positive correlation with Pit/PA, but that doesn’t mean that pop flies cause a higher Pit/PA. What correlations can do, however, is point us in the right direction for finding the causation. They are a way of advancing toward more elaborate and specific research.

So I conclude with a question: now that you’ve digested all this data, is it time to re-evaluate the value of a walk?

 

All data courtesy of baseball reference.


Brandon Inge, Superstar


How many wins is chemistry worth? Do nice guys really finish last?

As a Pirates fan since birth, I’ve grown used to my baseball fandom engendering a sense of sympathy in others. Born in 1989, I came of baseball-loving age in the mid-nineties, immediately following the halcyon Bonds/Bonilla/Van Slyke & co. days and immediately preceding the less-halcyon days of the Aramis Ramirez-for-Bobby Hill trade, “Operation Shutdown,” the expansion-drafting of Joe Randa, Pat Meares’ general existence, the Moskos pick, the Matt Morris trade . . . (list of soul-crushingly depressing baseball stories truncated for reader’s mental health).

And yet I remained faithful, despite having no conscious memory of a Pirates team being anything other than heartbreakingly awful. I’ve since likened this experience, in conversations with friends, to Linus sitting in the pumpkin patch each year, waiting for the Great Pumpkin to appear. It sometimes seemed that the Great Pumpkin would never come.

It’s ironic, then, that in the year that finally saw the Great Pumpkin arrive in Pittsburgh (2013), the same city also witnessed the end of the career of one Charles Brandon Inge.

Inge, nicknamed ‘Cringe’ by some of the crueler Pittsburgh faithful for his anemic .181/.204/.238 batting line during the 2013 campaign, was at that point in his thirteenth season as one of baseball’s premier utility men, playing every position on the diamond during his career. During his peak, he was a slick-fielding third baseman who also clubbed 27 HRs en route to a 4.1 fWAR season in 2006. But by 2013, Inge was 36 and on his way out of the league. Signed before the season to provide depth behind Pedro Alvarez and Neil Walker, Inge’s poor performance eventually led to his unceremonious release by the Pirates at the end of July.

And yet, this article has less to do with Inge’s on-field merits (which, as the previous paragraph suggests, were both significant and significantly variable), and more to do with Inge’s impact off the field. Inge won the 2010 Marvin Miller Man of the Year Award, given to the player whose “performance and contributions to his community inspire others to higher levels of achievement,” for his work with C.S. Mott Children’s Hospital. A frequent visitor to C.S. Mott, Inge also donated $100,000 for a new infusion center to treat pediatric cancer and twice hit home runs for young cancer patients. Dude’s a nice guy.

Perhaps more relevant, though, is pitcher and noted stathead Brandon McCarthy’s statement that Inge and fellow veteran Jonny Gomes had been worth twenty-four wins to the 2012 Athletics through chemistry alone. Normative ethics aside, it’s impossible to measure the moral character of a man—but we can measure, or at least attempt to quantify, the impact he has on his teammates.

Intrigued, I set out to determine whether Inge, patron saint of chemistry and all-around good guy, really made such a gigantic difference to his teammates’ performance. Mine is not the first investigation into this topic—Baseball Prospectus’ Russell A. Carleton examined the same issue in March of 2013, and there have been numerous attempts to place a valuation on chemistry over the years. But as you’ll see, there are some methodological differences to our approaches, and the differences expose some interesting conclusions.

Methodology

There is no ironclad way to assess Inge’s potential effect on his teammates, short of cloning entire teams of players, randomly assigning Brandon Inges to some of them, and having them play a large number of seasons.

In order to determine Inge’s value as accurately as possible, I can’t simply measure his teammates’ performance, because I’d just be concluding that Inge played with good or bad teammates. Instead, I need to develop a counterfactual, or a method of estimating how we could’ve reasonably expected Inge’s teammates to play in his absence. Fortunately, an excellent one already exists: a ZiPS projection. ZiPS, to my knowledge, does not have a ‘played with Brandon Inge’ variable, so it should be unbiased. Carleton instead used an AR(1) covariance matrix to try to adjust for player talent, but given that ZiPS explicitly incorporates past performance with a view to projecting, as accurately as possible, how a player will perform in the upcoming season, I believe it is a suitable tool.

I chose wOBA as the dependent variable for our study—while Carleton looked at multiple indicators (BB%, K%, etc), one, all-encompassing measure of players’ offensive performance seems best suited to answering the question, “Do players perform better with Brandon Inge on their team?”

In order to develop the requisite dataset for this analysis, I downloaded every player-season since 2006[1] from FanGraphs’ leaderboards and filtered the data to include only those players who amassed at least 200 plate appearances. This yielded 3130 player-seasons. Next, I created a binary variable called ‘IngeTeammate,’ with a value of ‘1’ if the player was on Inge’s team during the given season (and not Inge himself), and ‘0’ if he wasn’t. For the 2012 season, the only one in which Inge played for multiple teams, I counted Inge as having played for the Athletics, with whom he spent the majority of the season.
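The filtering and flagging step can be sketched roughly as follows. This is not the exact spreadsheet work described above, just an illustration: the list of Inge's team-seasons is an assumption pieced together from the teams named in this article and should be checked against a real source, and two of the sample rows are hypothetical.

```python
MIN_PA = 200

# Assumed (year, team) pairs for Inge within the 2006+ window; verify before use.
INGE_TEAM_SEASONS = {(year, "Tigers") for year in range(2006, 2012)}
INGE_TEAM_SEASONS |= {(2012, "Athletics"), (2013, "Pirates")}


def tag_inge_teammates(rows):
    """Drop seasons under MIN_PA and add a 0/1 IngeTeammate flag."""
    tagged = []
    for row in rows:
        if row["PA"] < MIN_PA:
            continue
        on_inge_team = (row["Year"], row["Team"]) in INGE_TEAM_SEASONS
        is_inge = row["Name"] == "Brandon Inge"
        tagged.append({**row, "IngeTeammate": int(on_inge_team and not is_inge)})
    return tagged


sample = [
    {"Year": 2010, "Name": "Will Rhymes", "Team": "Tigers", "PA": 213},
    {"Year": 2010, "Name": "Jose Bautista", "Team": "Blue Jays", "PA": 683},
    {"Year": 2010, "Name": "Brandon Inge", "Team": "Tigers", "PA": 500},  # PA is a placeholder
    {"Year": 2010, "Name": "Bench Player", "Team": "Tigers", "PA": 150},  # hypothetical row
]
tagged = tag_inge_teammates(sample)
for row in tagged:
    print(row["Name"], row["IngeTeammate"])
```

Matching on (year, team) pairs is the simplest approach, but it would also flag players who joined the team after Inge left mid-season, so a more careful version might compare roster dates.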

The next part was a bit tricky—bringing in the ZiPS projections. The latest years, the ones for which ZiPS has been featured on FG, were easy—data was readily available, wOBA already calculated, and records already associated with a player id. But wading deeper into the past unearthed some issues—in order to match records, I had to manually match player names (including the two Chris Carters, and, apparently, two Abraham Nunezes . . . Nunezii . . . who knows?) and hand-calculate ZiPS-projected wOBA for older player-seasons using the weights provided on the FanGraphs Guts page. One potential issue with some of the oldest data is the lack of projections for things like intentional walks and sacrifice flies.
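For reference, hand-calculating wOBA from a projected stat line looks roughly like this. The linear weights below are illustrative round numbers only, since the real values change every season and should come from the Guts page, and the projected line is hypothetical. Where a projection lacks intentional walks or sacrifice flies, the sketch treats them as zero, which is my assumption about the simplest workaround.

```python
# Illustrative weights only; look up the actual season-specific values.
WEIGHTS = {"uBB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}


def woba(line: dict, weights: dict = WEIGHTS) -> float:
    """wOBA = weighted offensive events / (AB + uBB + SF + HBP).

    uBB is unintentional walks (BB minus IBB). Missing fields count as zero.
    """
    numerator = sum(weight * line.get(event, 0) for event, weight in weights.items())
    denominator = line["AB"] + line.get("uBB", 0) + line.get("SF", 0) + line.get("HBP", 0)
    return numerator / denominator


# Hypothetical projection with no IBB or SF columns
projection = {"AB": 500, "uBB": 50, "HBP": 5, "1B": 90, "2B": 28, "3B": 2, "HR": 18}
print(round(woba(projection), 3))
```

Treating every walk as unintentional slightly inflates the result for hitters who draw many intentional walks, which is the kind of small bias the missing columns can introduce.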

However, forging through all of the record-matching and manual wOBA-calculating eventually yielded ZiPS wOBA projections matched to 3088 of the 3130 player-seasons. Of the 42 unmatched seasons, only one was an Inge teammate (2010 Brennan Boesch). 81 of the 3088 matched seasons were Inge teammates. So unless you think ZiPS would have pegged Boesch, a relatively unknown 25-year-old at the time, for a significantly better performance than the .322 wOBA he posted in 2010, the unmatched records probably didn’t have a huge effect.

What we’re left with is data that look like this:

Year  Name               Team       Age  PA   IngeTeammate  ZiPS wOBA  wOBAdiff  wOBA
2010  Jose Bautista      Blue Jays  29   683  0             0.322      0.100     0.422
2010  Jim Thome          Twins      39   340  0             0.343      0.096     0.439
2010  Wilson Betemit     Royals     28   315  0             0.302      0.084     0.386
2010  Josh Hamilton      Rangers    29   571  0             0.365      0.080     0.445
2010  Chris Johnson      Astros     25   362  0             0.286      0.067     0.353
2010  Carlos Gonzalez    Rockies    24   636  0             0.350      0.063     0.413
2010  Justin Morneau     Twins      29   348  0             0.387      0.061     0.448
2010  Paul Konerko       White Sox  34   631  0             0.361      0.056     0.417
2010  Joey Votto         Reds       26   648  0             0.383      0.055     0.438
2010  Danny Valencia     Twins      25   322  0             0.299      0.052     0.351
2010  Giancarlo Stanton  Marlins    20   396  0             0.305      0.051     0.356
2010  Miguel Cairo       Reds       36   226  0             0.288      0.051     0.339
2010  Will Rhymes        Tigers     27   213  1             0.288      0.050     0.338
2010  Tyler Colvin       Cubs       24   395  0             0.301      0.050     0.351
2010  Michael Morse      Nationals  28   293  0             0.328      0.049     0.377
2010  Adrian Beltre      Red Sox    31   641  0             0.343      0.048     0.391
2010  Ryan Hanigan       Reds       29   243  0             0.321      0.048     0.369
2010  Yorvit Torrealba   Padres     31   363  0             0.279      0.044     0.323
2010  Matt Joyce         Rays       25   261  0             0.321      0.043     0.364
2010  Aubrey Huff        Giants     33   668  0             0.344      0.043     0.387
2010  Drew Stubbs        Reds       25   583  0             0.295      0.043     0.338
2010  Andres Torres      Giants     32   570  0             0.316      0.042     0.358
2010  Corey Patterson    Orioles    30   341  0             0.274      0.042     0.316
2010  Austin Jackson     Tigers     23   675  1             0.288      0.041     0.329
2010  Brett Gardner      Yankees    26   569  0             0.306      0.040     0.346
2010  Colby Rasmus       Cardinals  23   534  0             0.329      0.040     0.369
2010  Andruw Jones       White Sox  33   328  0             0.323      0.039     0.362

In the above table, wOBAdiff refers to the amount by which the player outperformed his ZiPS wOBA projection. A negative number would indicate that a player underperformed his projection. So Jose Bautista outperformed his 2010 projection by .100—multiplying by 1000 tells us that this was 100 points of wOBA. It was good to be Joey Bats in 2010.
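To make the wOBAdiff column and the two kinds of averages used in the results concrete, here is a small sketch using three rows from the table above. It is only an illustration of the arithmetic on a tiny subset, not a reproduction of the full dataset.

```python
# (name, PA, ZiPS-projected wOBA, actual wOBA) taken from the table above
rows = [
    ("Jose Bautista", 683, 0.322, 0.422),
    ("Will Rhymes", 213, 0.288, 0.338),
    ("Austin Jackson", 675, 0.288, 0.329),
]


def woba_diff_points(projected: float, actual: float) -> int:
    """Actual minus projected wOBA, expressed in points (0.001 = 1 point)."""
    return round((actual - projected) * 1000)


diffs = [woba_diff_points(projected, actual) for _, _, projected, actual in rows]
plate_appearances = [pa for _, pa, _, _ in rows]

# Unweighted: every player-season counts equally
unweighted_mean = sum(diffs) / len(diffs)

# Weighted: each player-season counts in proportion to its plate appearances
weighted_mean = sum(d * pa for d, pa in zip(diffs, plate_appearances)) / sum(plate_appearances)

print(diffs, round(unweighted_mean, 2), round(weighted_mean, 2))
```

The weighted mean leans toward the full-time players, which is why the weighted and unweighted figures can differ when part-timers beat or miss their projections by more.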

Results

If we look at the mean wOBA deviation (in terms of points of wOBA) Inge teammates and non-teammates experienced from their ZiPS projections, we see the following results:

Group         Player-Seasons  Total PA   Mean Weighted Diff. (wOBA pts)  Mean Unweighted Diff. (wOBA pts)
Non-Teammate  3007            1,378,732  -3.09                           -4.62
Teammate      81              37,965     4.30                            4.24

In other words, if we weight by plate appearances, Inge teammates outperformed their ZiPS projections by an average of about 4.30 points of wOBA. All other players underperformed their projections by an average of about 3.09 points. That might not seem like a lot, but if you were to apply that 7.4-point wOBA difference to an average-hitting team over a 6000 PA team-season, that’s roughly 34 runs, or about 3.4 wins. Which is, you know, quite a bit. The unweighted gap is even larger, suggesting that players with fewer PA have outperformed their projections even more when teamed with Inge.
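The conversion from wOBA points to runs and wins can be sketched like this. The wOBA scale of 1.30 and the 10 runs per win are assumptions on my part: the scale varies a little by season, and 10 runs per win is simply the ratio implied by the 34 runs and 3.4 wins above.

```python
WOBA_SCALE = 1.30    # assumption; the actual scale differs slightly each season
RUNS_PER_WIN = 10    # assumption implied by 34 runs being about 3.4 wins


def extra_runs(woba_points: float, plate_appearances: int = 6000,
               scale: float = WOBA_SCALE) -> float:
    """Runs above average = (wOBA gap / wOBA scale) * PA."""
    return woba_points / 1000 / scale * plate_appearances


# Gap between teammates (+4.30) and non-teammates (-3.09), in points of wOBA
gap_points = 4.30 - (-3.09)
runs = extra_runs(gap_points)
wins = runs / RUNS_PER_WIN

print(round(gap_points, 2), round(runs, 1), round(wins, 1))
```

Because the gap is divided by the wOBA scale before being multiplied by plate appearances, a slightly different scale shifts the run estimate by a run or two, so the figure is best read as a ballpark.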

If we simply run a regression including the independent variables IngeTeammate (binary) and age and the dependent variable wOBAdiff (unweighted), we can express the story another way:

wOBAdiff = 0.0127064 + (IngeTeammate * 0.0090544) + (age * -0.0005993)

I included age as a control because ZiPS projections, as you can see from the model above, tended to slightly overproject older players in comparison to younger players, and therefore I needed to consider the possibility that Inge simply benefitted from playing only with young players (he didn’t).

Note that in the model above, 0.001 corresponds to one point of wOBA (i.e. a hitter moving from .323 to .324 would have gained a point of wOBA). The r-squared of the model is absurdly low (0.006), but that’s to be expected: after all, I’m not trying to assert that Brandon Inge is responsible for all or even a significant part of the variation between MLB players’ expected and actual performance. More importantly, the variable ‘IngeTeammate’ is statistically significant at the 98.4% confidence level (p ≈ 0.016).
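To make the coefficients concrete, here is a small sketch that plugs ages into the fitted model. The coefficients are copied from the equation above; the function name and the example ages are illustrative only.

```python
# Coefficients copied from the fitted model in the text.
INTERCEPT = 0.0127064
INGE_TEAMMATE_COEF = 0.0090544
AGE_COEF = -0.0005993


def predicted_woba_diff(age, inge_teammate):
    """Predicted (actual wOBA - ZiPS wOBA) for a hitter of the given age."""
    return INTERCEPT + INGE_TEAMMATE_COEF * int(inge_teammate) + AGE_COEF * age


# Example ages are arbitrary; results are printed in points of wOBA (x1000).
for age in (25, 35):
    with_inge = predicted_woba_diff(age, True) * 1000
    without_inge = predicted_woba_diff(age, False) * 1000
    print(f"age {age}: teammate {with_inge:+.1f} pts, non-teammate {without_inge:+.1f} pts")
```

At any age the gap between the two lines is the IngeTeammate coefficient, about 9 points of wOBA, while the age term moves both predictions down by roughly 0.6 points per year.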

Considering the possible influence of aging is interesting, as the Inge difference is even more pronounced among younger players, or those whom he allegedly mentored while playing with the A’s. If we filter the data above to include only players 27 and younger, the table looks like this:

Group Player-Seasons Total PA Mean Weighted Diff. (wOBA pts) Mean Unweighted Diff. (wOBA pts)
Non-Teammate 1241 568,944 -0.50 -2.09
Teammate 30 14,298 16.58 17.27

We’re starting to run into some serious sample size issues that make me uncomfortable drawing any particularly bold conclusions, but young players who play with Inge have done really, really well, collectively knocking the snot out of their ZiPS projections. There are problems with extrapolating this to a 6000 PA team-season, given that presumably an entire team won’t be composed of young players, but if one did so the result would be a ridiculous 78.6 runs of additional value.

The table below lists every 27-and-under player season for which the player was an Inge teammate:

Year Name Team Age PA ZiPSwOBA wOBAdiff wOBA
2008 Matt Joyce Tigers 23 277 0.275 0.084 0.359
2011 Alex Avila Tigers 24 551 0.308 0.076 0.384
2013 Jordy Mercer Pirates 26 365 0.282 0.051 0.333
2010 Will Rhymes Tigers 27 213 0.288 0.050 0.338
2012 Chris Carter Athletics 25 260 0.319 0.050 0.369
2011 Brennan Boesch Tigers 26 472 0.300 0.048 0.348
2007 Curtis Granderson Tigers 26 676 0.344 0.044 0.388
2010 Austin Jackson Tigers 23 675 0.288 0.041 0.329
2012 Yoenis Cespedes Athletics 26 540 0.328 0.040 0.368
2010 Miguel Cabrera Tigers 27 648 0.399 0.032 0.431
2013 Jose Tabata Pirates 24 341 0.308 0.032 0.340
2012 Josh Reddick Athletics 25 673 0.296 0.030 0.326
2013 Andrew McCutchen Pirates 26 674 0.365 0.028 0.393
2013 Starling Marte Pirates 24 566 0.317 0.027 0.344
2006 Omar Infante Tigers 24 245 0.306 0.016 0.322
2008 Curtis Granderson Tigers 27 629 0.358 0.015 0.373
2009 Clete Thomas Tigers 25 310 0.302 0.015 0.317
2012 Josh Donaldson Athletics 26 294 0.286 0.014 0.300
2011 Andy Dirks Tigers 25 235 0.297 0.011 0.308
2013 Neil Walker Pirates 27 551 0.328 0.005 0.333
2013 Pedro Alvarez Pirates 26 614 0.327 0.003 0.330
2006 Curtis Granderson Tigers 25 679 0.335 0.000 0.335
2009 Miguel Cabrera Tigers 26 685 0.407 -0.005 0.402
2010 Alex Avila Tigers 23 333 0.306 -0.007 0.299
2011 Austin Jackson Tigers 24 668 0.315 -0.010 0.305
2012 Jemile Weeks Athletics 25 511 0.304 -0.028 0.276
2012 Derek Norris Athletics 23 232 0.304 -0.029 0.275
2006 Chris Shelton Tigers 26 412 0.380 -0.033 0.347
2013 Travis Snider Pirates 25 285 0.310 -0.039 0.271
2008 Miguel Cabrera Tigers 25 684 0.419 -0.043 0.376
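
The two summary figures for the 27-and-under teammates can be recomputed from the rows listed above. This is a minimal sketch using only the PA and wOBAdiff columns. Because the table shows wOBAdiff rounded to three decimals, the PA-weighted result lands within a few hundredths of a point of the published 16.58 rather than matching it exactly.

```python
# (PA, wOBAdiff) pairs copied from the 27-and-under Inge-teammate table.
ROWS = [
    (277, 0.084), (551, 0.076), (365, 0.051), (213, 0.050), (260, 0.050),
    (472, 0.048), (676, 0.044), (675, 0.041), (540, 0.040), (648, 0.032),
    (341, 0.032), (673, 0.030), (674, 0.028), (566, 0.027), (245, 0.016),
    (629, 0.015), (310, 0.015), (294, 0.014), (235, 0.011), (551, 0.005),
    (614, 0.003), (679, 0.000), (685, -0.005), (333, -0.007), (668, -0.010),
    (511, -0.028), (232, -0.029), (412, -0.033), (285, -0.039), (684, -0.043),
]

total_pa = sum(pa for pa, _ in ROWS)

# Unweighted: every player-season counts equally.
unweighted_pts = 1000 * sum(diff for _, diff in ROWS) / len(ROWS)

# Weighted: each player-season counts in proportion to its plate appearances.
weighted_pts = 1000 * sum(pa * diff for pa, diff in ROWS) / total_pa

print(f"player-seasons: {len(ROWS)}, total PA: {total_pa}")
print(f"unweighted mean: {unweighted_pts:.2f} pts, PA-weighted mean: {weighted_pts:.2f} pts")
```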

It’s not as if one year is hugely skewing the results—pretty much every year, whichever young players happen to be playing with Brandon Inge outperform their projections. The graph below illustrates the mean wOBA differential younger Inge teammates exhibited each season. I would’ve imagined, prior to viewing these results, that Inge’s positive ‘effect’ might’ve been almost entirely a product of the 2012 Athletics, but this doesn’t seem to be the case—outside of the 2006 Tigers (when Omar Infante, Curtis Granderson, and Chris Shelton collectively underperformed their ZiPS projections by a modest average of ~5 points of wOBA), Inge’s younger teammates have outperformed ZiPS every single year in the sample.

Perhaps, one could say, Inge has simply benefitted from playing on teams run by intelligent front offices. After all, the Tigers, Athletics, and (more recently) the Pirates all have reputations as relatively savvy management teams. Maybe they’re just collectively able to out-forecast ZiPS.

When we look at ZiPS wOBA differentials by team, however, the Tigers (+1.36 points of wOBA), Athletics (+0.11) and Pirates (-0.31) all had weighted mean differentials less than the Inge gap. The average over all teams was -2.89, so while all three front offices ‘beat the market,’ so to speak, they still don’t explain the huge Inge effect. It looks as though there’s something here.

After observing the results for Inge, I was curious about whether other veteran players might also exhibit similar correlations—while we’d expect to find no correlation with ZiPS wOBA differential for most players, it might be the case that, as with Inge, patterns emerge. Specifically, I looked at two players with diametrically opposite reputations—A.J. Pierzynski and Jonny Gomes. Below, I replicate the initial summary table used for the Inge analysis and note the magnitude of the effect:

A.J. Pierzynski

Group Player-Seasons Total PA Mean Weighted Diff. (wOBA pts) Mean Unweighted Diff. (wOBA pts)
Non-Teammate 3004 1,375,450 -2.75 -4.29
Teammate 84 41,247 -7.65 -7.87

The game’s most hated player didn’t fail to disappoint, as his teammates collectively underperformed their ZiPS projections by an additional 4.9 points of wOBA when compared to non-teammates, an effect worth -22.6 runs to the team over the course of a full season. I should note that I assigned Pierzynski to the 2014 Red Sox (with whom he spent considerably more time) instead of the 2014 Cardinals; both teams underperformed their ZiPS projections, but the Red Sox did so by a larger margin.

Pierzynski’s unweighted results, while still negative, are less damning, and a regression model reflects this:

wOBAdiff = 0.0128794 + (AJTeammate * -0.0033689) + (age * -0.0005939)

The intercept and coefficient for age are, understandably, almost identical to those I observed in the Inge model. The significance level for AJTeammate, however, is only 64.1%, suggesting that we can’t really conclude much of anything with the same level of confidence as for Inge.

Still, twenty-plus runs is a non-negligible amount, and Pierzynski’s numbers have been negative across all four teams for whom he’s played (White Sox, Rangers, Red Sox, Cardinals). It may be that more historical data would reveal a broader trend, given that we’ve limited our sample size to only the latter half of Pierzynski’s career.

Jonny Gomes

Group Player-Seasons Total PA Mean Weighted Diff. (wOBA pts) Mean Unweighted Diff. (wOBA pts)
Non-Teammate 3000 1,376,613 -3.05 -4.56
Teammate 88 40,084 2.58 1.52

The phenomenally-bearded Gomes, Inge’s running partner in the Brandon McCarthy quote that triggered this analysis, also appears to be a potential chemistry star, though his results are less extreme than Inge’s. His teammates outperformed non-teammates by 5.6 points of wOBA, worth an estimated 26 runs per season.

wOBAdiff = 0.0124387 + (GomesTeammate * 0.0055032) + (age * -0.0005873)

The effect, as with Pierzynski, is not statistically significant—the significance level is 87.4%.

Conclusions

We can’t make firm statements about causality from this analysis, but we can say pretty conclusively that being on the same team as Inge during the last nine years correlates positively with hitting better than ZiPS projects you to hit.

Maybe you don’t believe Inge should get credit for the extra 3.4 wins of value each year. We don’t have a ‘chemistry above replacement’ metric to account for the fact that some other player with a modicum of veteranosity might plausibly have a positive effect if analyzed the same way. And there’s no feasible way to develop one on the horizon—you can only start to do this sort of analysis retrospectively, and it requires a large number of plate appearances and player-seasons before we can conclude that any pattern has emerged. I’m not really arguing that Inge deserves all the credit for his teammates’ overperformance, only that we have reason to believe a nonzero effect may exist.

But let’s entertain, for a minute, the possibility that the 3.4 win-per-season gap we see *is* entirely attributable to Inge. That maybe all the minute, unnoticed interactions between players over the course of a season can add up to improved performance at the plate. The effect could even be greater than 3.4 wins—I didn’t examine pitching and fielding at all. After all, everything we know about human psychology suggests that happier workers are more productive, and I’ve yet to hear any compelling reason that ballplayers constitute an exception. We sometimes, in the analytics community, fall into the trap of assuming that because we can’t measure something accurately, it doesn’t deserve a meaningful place in our analysis. And yet our inability to measure a phenomenon is not proof of its nonexistence—just ten years ago, we lacked meaningful metrics for catcher framing, for instance.

Perhaps Inge contributed more hidden value over the last decade than anyone this side of Jose Molina, and Brandon McCarthy’s twenty-four wins were, if still hyperbole, grounded in a subtle truth. 3.4 wins currently has a market value north of $20M, making Inge a substantially underpaid man over the course of his career.

It’s a shame, on some level, that it’s only after he’s retired that we recognize the unheralded Inge for who he might secretly have been: Brandon Inge, Superstar.

 

[1] Before 2006, I struggled to find ZiPS projections in a readable format to develop the counterfactuals.

Data retrieved from FanGraphs and Baseball Think Factory.


Hardball Retrospective – The “Original” 2012 Tampa Bay Rays

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Consequently, Hank Aaron is listed on the Braves roster for the duration of his career while the Blue Jays claim Carlos Delgado and the Brewers declare Paul Molitor. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors that constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. Additional information and a discussion forum are available at TuataraSoftware.com.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

Assessment

The 2012 Tampa Bay Rays             OWAR: 46.4     OWS: 254     OPW%: .607

GM Chuck LaMar acquired 77.8% (21 of 27) of the ballplayers on the 2012 Rays roster. With the exception of Elliot Johnson and Jose Veras, all of the players were selected during the Amateur Draft. Based on the revised standings, the “Original” 2012 Rays registered 98 victories and secured the American League Eastern Division title by a 16-game margin over the New York Yankees.
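
As a quick arithmetic check, the 98 victories appear consistent with the .607 OPW% listed above. The sketch below assumes a standard 162-game schedule and that the win total is simply the winning percentage times games played; the book's exact method is not described here.

```python
# ASSUMPTION: wins are OPW% multiplied by a 162-game schedule.
SEASON_GAMES = 162


def expected_wins(win_pct, games=SEASON_GAMES):
    """Translate a winning percentage into wins over a full schedule."""
    return win_pct * games


wins = expected_wins(0.607)
print(f"{wins:.1f} wins, or about {round(wins)} after rounding")
```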

David Price (20-5, 2.56) collected the 2012 AL Cy Young Award for his superlative campaign in which he topped the Junior Circuit in victories and ERA while striking out 205 batters. “Big Game” James Shields (15-10, 3.52) tallied 223 whiffs and fashioned a 1.168 WHIP. Jeremy Hellickson (10-11, 3.10) provided a reliable effort in his sophomore season and added a Gold Glove Award to his trophy case. Jason Hammel (8-6, 3.43) and Matt Moore (11-11, 3.81) stabilized the back-end of the rotation.

Jake McGee led the bullpen crew with a 1.95 ERA and a WHIP of 0.795. Wade Davis contributed an ERA of 2.43 in 54 relief appearances after starting 64 contests in the three prior campaigns.

ROTATION POS WAR WS
David Price SP 6.4 19.12
Jeremy Hellickson SP 3.57 11.21
James Shields SP 2.85 12.33
Jason Hammel SP 2.82 9.74
Matt Moore SP 1.76 8.07
BULLPEN POS WAR WS
Jake McGee RP 1.09 7.5
Wade Davis RP 0.9 6.43
Jose Veras RP 0.75 5.01
Chris Seddon SW 0.37 1.97
Chad Gaudin RP -0.46 2.68
Alex Cobb SP 1.14 6.18
Jeff Niemann SP 0.5 2.07
Dan Wheeler RP -0.68 0

Josh Hamilton blasted 43 round-trippers and scored 103 runs (both career-bests) en route to a fifth-place finish in the 2012 A.L. MVP balloting. B.J. Upton and protégé Desmond Jennings nabbed 31 bags apiece at the top of the order. Upton established a personal best with 28 circuit clouts. Evan Longoria batted .289 with 17 jacks despite missing 88 games due to a partially torn hamstring. John Jaso delivered a career-high .394 OBP and Jonny “Ironsides” Gomes swatted 18 big-flies. 

LINEUP POS WAR WS
B. J. Upton CF 2.56 19.64
Desmond Jennings LF 1.62 15.18
Josh Hamilton DH/CF 4.39 25.5
Evan Longoria 3B 2.39 11.12
Jonny Gomes RF/DH 2.05 13.04
John Jaso C/DH 2.83 15.96
Aubrey Huff 1B 0.07 1.14
Elliot Johnson 2B/SS 1.02 8.62
Reid Brignac SS -0.19 0.52
BENCH POS WAR WS
Carl Crawford LF 0.46 3.19
Jason Pridie RF 0.1 0.5
Matt Diaz LF -0.25 1.61
Stephen Vogt C -0.35 0.16
Delmon Young DH -1.37 6.95

The “Original” 2012 Tampa Bay Rays roster

NAME POS WAR WS General Manager Scouting Director
David Price SP 6.4 19.12 Andrew Friedman R.J. Harrison
Josh Hamilton CF 4.39 25.5 Chuck LaMar Dan Jennings
Jeremy Hellickson SP 3.57 11.21 Chuck LaMar Tim Wilken
James Shields SP 2.85 12.33 Chuck LaMar Dan Jennings
John Jaso DH 2.83 15.96 Chuck LaMar
Jason Hammel SP 2.82 9.74 Chuck LaMar Dan Jennings
B. J. Upton CF 2.56 19.64 Chuck LaMar Dan Jennings
Evan Longoria 3B 2.39 11.12 Andrew Friedman R.J. Harrison
Jonny Gomes DH 2.05 13.04 Chuck LaMar Dan Jennings
Matt Moore SP 1.76 8.07 Andrew Friedman R.J. Harrison
Desmond Jennings LF 1.62 15.18 Andrew Friedman R.J. Harrison
Alex Cobb SP 1.14 6.18 Andrew Friedman R.J. Harrison
Jake McGee RP 1.09 7.5 Chuck LaMar Cam Bonifay
Elliot Johnson SS 1.02 8.62 Chuck LaMar Dan Jennings
Wade Davis RP 0.9 6.43 Chuck LaMar Cam Bonifay
Jose Veras RP 0.75 5.01 Chuck LaMar Dan Jennings
Jeff Niemann SP 0.5 2.07 Chuck LaMar Cam Bonifay
Carl Crawford LF 0.46 3.19 Chuck LaMar Dan Jennings
Chris Seddon SW 0.37 1.97 Chuck LaMar Dan Jennings
Jason Pridie RF 0.1 0.5 Chuck LaMar Dan Jennings
Aubrey Huff 1B 0.07 1.14 Chuck LaMar Dan Jennings
Reid Brignac SS -0.19 0.52 Chuck LaMar Cam Bonifay
Matt Diaz LF -0.25 1.61 Chuck LaMar Dan Jennings
Stephen Vogt C -0.35 0.16 Andrew Friedman R.J. Harrison
Chad Gaudin RP -0.46 2.68 Chuck LaMar Dan Jennings
Dan Wheeler RP -0.68 0 Chuck LaMar
Delmon Young DH -1.37 6.95 Chuck LaMar

Honorable Mention

The “Original” 2008 Rays                   OWAR: 39.4     OWS: 276     OPW%: .528

Five members of the 2008 Tampa Bay Rays accrued at least 20 Win Shares including Josh Hamilton, B.J. Upton, Aubrey Huff, Evan Longoria and Akinori Iwamura. Hamilton hit .304 with 32 jacks and a League-leading 130 RBI. “Huff Daddy” launched 32 four-baggers and knocked in 108 baserunners. Longoria (.272/27/85) claimed Rookie of the Year honors and Upton swiped 44 bases.

On Deck

The “Original” 2009 Rockies

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database – Transaction a – Executive 

SB Nation – “Evan Longoria injury – 2012 return in question”

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive


K-BB vs. the RotoGraphs Top Starting Pitcher Rankings

Back on January 1, I wrote an article proposing a “quick-and-easy” way to rank starting pitchers for fantasy baseball. The TL;DR (Too Long; Didn’t Read) summary is that you can take the projections of your starting pitchers and rank them by the simple metric “strikeouts minus walks” (K-BB). I also looked at slightly more complex metrics like “strikeouts minus walks minus home runs” (K-BB-HR) and “strikeout rate minus walk rate, divided by games started” (K%-BB%/GS). Both of those had a slightly better correlation, but they are not as simple.

The correlation between the starting pitcher rankings based on K-BB and starting pitcher rankings based on dollar values was around 0.80 for each of the last three years.

At the time, I created a list of the top starting pitchers based on Steamer projections, as those were the only readily available projections out there. Now that we’re getting closer to the season, more projections are available. At Fantasy411, they have a downloadable spreadsheet with the composite projections from 12 different providers. It’s a true “wisdom of the crowds” approach.

Using this collection of projections, I ranked the starting pitchers using the very simple K-BB metric and compared those rankings to the consensus rankings for starting pitchers on the updated RotoGraphs Top 300. I downloaded the spreadsheet from the post on February 17th by Paul Sporer where he explained that players not ranked by a writer would get a “last ranked+1” for that particular player. There were 87 starting pitchers ranked in the Top 300. [Note: I would have used K-BB-HR but the composite projections did not have home runs allowed for pitchers]

First off, the correlation between the RotoGraphs Top 300 rankings for these 87 starting pitchers and my rankings based on K-BB came out to 0.81. Also, 46 of the 87 pitchers (53%) were within 12 spots of each other on the two lists, or the equivalent of one round in a 12-team league. Seventy-four of the 87 pitchers (85%) were within 24 spots of each other, the equivalent of two rounds in a 12-team league.
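
The comparison described above can be sketched in a few lines. The pitchers and projections below are hypothetical, and the article does not say which correlation measure it used. This example uses the Spearman rank correlation in its simple no-ties form, which is an assumption on my part.

```python
# Hypothetical projections: name -> (strikeouts, walks).
PROJECTIONS = {
    "Pitcher A": (220, 45),
    "Pitcher B": (200, 40),
    "Pitcher C": (190, 60),
    "Pitcher D": (150, 30),
    "Pitcher E": (170, 55),
    "Pitcher F": (140, 50),
}
# Hypothetical consensus ranks standing in for the expert list.
EXPERT_RANKS = {
    "Pitcher A": 2, "Pitcher B": 1, "Pitcher C": 3,
    "Pitcher D": 6, "Pitcher E": 4, "Pitcher F": 5,
}


def rank_by_k_minus_bb(projections):
    """Map pitcher -> rank (1 = best) by projected strikeouts minus walks."""
    ordered = sorted(
        projections,
        key=lambda name: projections[name][0] - projections[name][1],
        reverse=True,
    )
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation; this shortcut formula assumes no tied ranks."""
    n = len(ranks_a)
    d_squared = sum((ranks_a[name] - ranks_b[name]) ** 2 for name in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def share_within(ranks_a, ranks_b, spots):
    """Fraction of pitchers whose two ranks differ by at most `spots`."""
    close = sum(1 for name in ranks_a if abs(ranks_a[name] - ranks_b[name]) <= spots)
    return close / len(ranks_a)


kbb_ranks = rank_by_k_minus_bb(PROJECTIONS)
rho = spearman(kbb_ranks, EXPERT_RANKS)
share = share_within(kbb_ranks, EXPERT_RANKS, spots=1)
print(f"rank correlation: {rho:.2f}, within 1 spot: {share:.0%}")
```

With the real 87-pitcher lists, `spots=12` and `spots=24` would correspond to the one-round and two-round windows quoted above. Tied ranks, such as the "last ranked+1" placeholders, would call for a tie-aware correlation instead of the shortcut formula.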

The charts below show the starting pitchers based on the RotoGraphs Top 300, along with their rank by K-BB, the difference between the two, and the composite projection from the Fantasy411 sources for selected pitchers who were off by a significant number of picks. By looking at the projections for these pitchers, we may better understand why the rankings differ so much.

The Top 20:

Most of the pitchers in the top 20 are similarly ranked by the RotoGraphs’ Five and the K-BB method. Just one of these 20 pitchers has a rankings difference that is off by more than 12 (one round in a 12-team league). Carlos Carrasco has the biggest difference in this group of pitchers between his K-BB rank of 44th and his RotoGraphs’ rank of 16th. Carrasco is a popular sleeper. He’s such a popular sleeper that he’s probably no longer a sleeper. I think most people are wide-awake on Carlos Carrasco by this point. The composite projection has Carrasco down for 156 innings in 2015. Steamer projects 163 innings, ZiPS has him for 119, and the optimistic Fans are projecting 191 innings, which is more than Carrasco has pitched in the last two seasons combined.

 

The Next 20 (21-40)

There are more differences as we move down the list of starting pitchers. Sonny Gray (ranked 25th by RotoGraphs, 47th by K-BB) is an interesting guy to look at. For his career, Gray has a 2.99 ERA and 1.17 WHIP, but his FIP is 3.39, xFIP is 3.34, and SIERA is 3.44. Gray’s ERA and WHIP have been helped by a .277 career BABIP. That’s quite low, but Oakland as a team allowed a .276 BABIP in 2013 (2nd best in baseball) and .272 BABIP in 2014 (tops in baseball). If you expect that to continue, then Gray is probably better ranked by the RotoGraphs Five. Steamer (3.75 ERA, 1.29 WHIP) and ZiPS (3.36 ERA, 1.26 WHIP) are not so optimistic.

Phil Hughes’ impressive ability to limit bases on balls might have him ranked too highly by K-BB.

Andrew Cashner has the second largest difference between his ranking by K-BB (81st) and RotoGraphs (37th) of any pitcher in the RotoGraphs Top 300. Cashner has a history of injuries and he’s on record as saying he’s focusing more on getting quick outs than strikeouts. Over the last two seasons, his K% has been 18.1% and 18.4%. That 18.4% mark last year placed him 80th among pitchers with 120 or more innings. His 5.7% BB% placed him 41st and that was the best BB% of his career. Looking at just strikeouts and walks it’s easy to see why Cashner is ranked by K-BB among pitchers like Matt Cain and Jon Niese rather than Cliff Lee and Zack Wheeler (Cashner is between those two in the RotoGraphs starting pitcher rankings).

 

The Next 20 (41-60)

Garrett Richards’ composite projection calls for 137 strikeouts in 160 innings, which comes out to a 7.7 K/9. Steamer and ZiPS both project Richards to strike out around 8.2 batters per nine innings. If Richards’ composite projection is upped to a strikeout rate of 8.2 K/9, he would move up to 64th on the K-BB list.
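
The strikeout-rate arithmetic for Richards is easy to reproduce. The sketch below uses the 137 strikeouts and 160 innings from the composite projection. The function names are illustrative, and the rank of 64th is not recomputed because that would require the full projection set.

```python
def k_per_nine(strikeouts, innings):
    """Strikeouts per nine innings."""
    return strikeouts * 9.0 / innings


def strikeouts_at_rate(k9, innings):
    """Strikeouts implied by a K/9 rate over a given number of innings."""
    return k9 * innings / 9.0


composite_k9 = k_per_nine(137, 160)
bumped_strikeouts = strikeouts_at_rate(8.2, 160)
extra_strikeouts = bumped_strikeouts - 137

print(f"composite K/9: {composite_k9:.1f}")
print(f"strikeouts at 8.2 K/9: {bumped_strikeouts:.0f} (+{extra_strikeouts:.0f})")
```

Holding walks constant, those extra strikeouts feed directly into K-BB, which is why a modest bump in strikeout rate moves him up the list.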

Dallas Keuchel was very successful last year, posting a 2.93 ERA and 1.18 WHIP despite a middling strikeout rate (6.6 K/9). He succeeded last year with a terrific ground ball rate (63.5%) and by allowing far fewer home runs than he had in his first two years in the big leagues. The K-BB metric ranks Keuchel 77th among starting pitchers based on the two things a pitcher has the most control over.

Justin Verlander and Ian Kennedy are the two pitchers with the biggest difference in rankings in favor of K-BB over the RotoGraphs Five rankings, and Drew Hutchison and Scott Kazmir are both in the top seven. Verlander is coming off an ugly season (4.54 ERA, 1.40 WHIP) in which his strikeout rate dropped just below 7.0 K/9 after being around 9 K/9 for the bulk of his career. The composite projection expects his strikeout rate to go back up to 7.7 K/9 and his ERA to come down close to his 2014 FIP of 3.74. With a projection of 208 innings, Verlander is ranked 25th by K-BB, 34 spots ahead of where he’s ranked by the RotoGraphs’ writers. Similarly, Ian Kennedy is ranked 23rd by K-BB and 55th by RotoGraphs. He’s coming off a better year than you might realize, with 9.3 K/9 and a 3.21 FIP, but a 3.63 ERA.

 

The Final 27 (61-87)

In this final group of pitchers, the guys that K-BB likes much more than the RotoGraphs’ writers include John Lackey, Mike Minor, and A.J. Burnett. It’s possible that Lackey (36 years old) and Burnett (38 years old) were ranked lower by the RotoGraphs’ writers because of expected age-related decline. Also, Burnett had a 4.59 ERA and 1.41 WHIP with the Phillies in 2014. The composite projection may be looking at the 38-year-old Burnett through rose-colored glasses when he’s projected for 195 innings and an ERA below 4.00, but he has pitched an average of 202 innings over the last seven years and had 213 2/3 innings last season. Like Burnett, Minor is coming off a terrible year (4.77 ERA, 1.44 WHIP), which has him ranked 77th by the RotoGraphs Five. As bad as his results were in 2014, Minor’s strikeout rate was in the range of his two previous seasons and his walk rate was only slightly worse than his career mark. After the season he just had, he’s a potential buy-low candidate based on K-BB.

The three pitchers in this group who are much higher ranked by the RotoGraphs writers are James Paxton (the biggest difference in ranking of all the pitchers on this list), Tanner Roark (4th largest difference), and Henderson Alvarez (5th largest difference). Paxton (143 innings) and Roark (122 innings) have low playing time projections that limit their K-BB value. Henderson Alvarez is projected for a solid 182 innings, but with a projected strikeout rate of just 5.3 K/9 he gets little love from the K-BB metric.

This comparison included all 87 pitchers who were ranked in the RotoGraphs Top 300. The following pitchers are among the top 87 when ranked by K-BB and don’t show up on the RotoGraphs Top 300:

 

#60 CC Sabathia

#69 Wade Miley

#74 Bartolo Colon

#79 Yovani Gallardo

#81 Bud Norris

#84 Jon Niese

#84 Ricky Nolasco

#86 Trevor Bauer

 

I plan to revisit this at the end of the year. I’ll compare the RotoGraphs’ rankings and the K-BB rankings for these 87 pitchers to the actual end of season dollar value rankings for starting pitchers in 2015.


A PCA for Batter Similarity Scores (Part 1: Basic Methodology)

This is the first in a series of pieces on a tool I’ve been working on. Admittedly, right now it’s quite raw, and probably needs some adjustments, which I’ll elaborate on towards the end of this post. It’s also quite lengthy – set it aside for when you have ample time to follow along, as there are some example calculations included to demonstrate the process.

Most of you are familiar with the “Similarity Scores” feature on Baseball Reference. If not, the explanation can be found here. The idea is to provide player comps using the player’s statistics. This has been around a while, and is based on a fairly simplistic “points-based” approach. Such an approach has the advantage of being easy to follow and intuitive, and as a quick tool to create fun conversation, it’s nice. However, it’s not very useful for purposes of projection for many reasons – not the least of which being that the points used are arbitrary and the statistics used are result statistics (hits, HRs, RBIs, etc) rather than being process-driven. It’s also intended to work on a player’s entire career. Some players have one or more drastic shifts in results over the course of their careers – and, to project a player in 2015 from his work in 2013-2014, we need to isolate data by season.

With the mountains of granular data available since Similarity Scores were first published, I thought it would be interesting to take a cut at creating something new in the same vein. My primary objectives were to create a similarity metric that (a) compared individual seasons rather than entire careers; (b) was based primarily on a hitter’s “process” or approach at the plate rather than strictly on results which are influenced heavily by luck; and (c) was mathematically defensible, in other words, non-arbitrary.



Pitch Grades vs. Relative Pitch Grades

When deciding on the grade for a pitcher’s breaking pitch, a scout relies on the pitch’s velocity and movement (although command can be factored in as well). These factors are combined into a single number on the 20-80 scale, with major league average set at 50 and one standard deviation equal to 10 points.

Clayton Kershaw’s curveball has long been regarded as one of the best in the business, yet by my systematic calculation factoring in velocity, horizontal movement, and vertical movement, his curveball rated among the bottom third of curveballs.  Its below-average velocity and below-average horizontal movement held it back despite its above-average vertical movement.  While I was pondering this conundrum, I remembered another fact–Kershaw’s fastball happens to have a lot of rise.  What if movement was recorded by the difference between the pitcher’s fastball movement and the breaking ball movement instead of the breaking ball’s movement compared to an arbitrary point?  I was about to find out.

(This paragraph is solely methodology, so skip it if you wish.) Using Baseball Prospectus’ excellent PITCHf/x leaderboard, I selected all pitchers who threw at least 200 four-seam fastballs and at least 100 curveballs in 2014. A breaking pitch’s horizontal and vertical movement was recorded as the difference between the pitch’s raw movement and the movement of the pitcher’s four-seam fastball. I calculated the z-score of each curveball’s velocity, then the z-scores of its relative horizontal and vertical movement, and combined the two movement z-scores into a single movement z-score. (I gave vertical movement a 150% weight relative to horizontal movement.) Then, I combined the z-scores for velocity and combined relative movement to calculate a relative pitch score. (I gave combined relative movement a 150% weight relative to velocity.) Finally, I calculated a scouting grade on the 20-80 scale based on the relative pitch score. Below is a table showing my results.
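
For readers who want to experiment, here is a minimal sketch of the kind of calculation described above. It is not the exact procedure used for the table: the sample pitchers are hypothetical, and treating a larger absolute gap from the fastball as better, applying the 150% weights as simple multipliers, and re-standardizing after each combination are all my assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def relative_grades(pitchers, vertical_weight=1.5, movement_weight=1.5):
    """Return a 20-80 style grade per pitcher (50 = sample mean, 10 = one SD)."""
    z_velocity = z_scores([p["cu_vel"] for p in pitchers])
    # Relative movement: curveball movement minus four-seam movement.
    # ASSUMPTION: a larger absolute gap from the fastball counts as better.
    z_horizontal = z_scores([abs(p["cu_h"] - p["ff_h"]) for p in pitchers])
    z_vertical = z_scores([abs(p["cu_v"] - p["ff_v"]) for p in pitchers])
    z_movement = z_scores(
        [h + vertical_weight * v for h, v in zip(z_horizontal, z_vertical)]
    )
    z_pitch = z_scores(
        [vel + movement_weight * mov for vel, mov in zip(z_velocity, z_movement)]
    )
    # No clipping to 20-80 here; extreme z-scores can fall outside that range.
    return [50 + 10 * z for z in z_pitch]


# Hypothetical pitchers (velocity in mph, movement in inches).
SAMPLE = [
    {"name": "Pitcher A", "cu_vel": 79.5, "cu_h": 6.0, "cu_v": -12.0, "ff_h": -4.0, "ff_v": 10.0},
    {"name": "Pitcher B", "cu_vel": 82.5, "cu_h": 9.0, "cu_v": -5.5, "ff_h": -6.0, "ff_v": 9.0},
    {"name": "Pitcher C", "cu_vel": 74.5, "cu_h": 2.5, "cu_v": -9.0, "ff_h": -1.0, "ff_v": 12.0},
    {"name": "Pitcher D", "cu_vel": 77.0, "cu_h": 4.5, "cu_v": -7.0, "ff_h": -5.0, "ff_v": 8.0},
]

grades = relative_grades(SAMPLE)
for pitcher, grade in zip(SAMPLE, grades):
    print(f'{pitcher["name"]}: {grade:.1f}')
```

Swapping the relative movement for raw curveball movement in the same function would give an unadjusted grade on the same scale, which makes the two approaches easy to compare side by side.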

While I may not have resolved the conflicting evaluations of Kershaw’s curveball, the relative scouting grade at least opens a discussion on how the movement and velocity of pitches should be evaluated. Is it better for a breaking pitch to be faster, or is it better to create a wider difference in velocity between the fastball and breaking ball? Is it better for a pitcher’s breaking ball movement to be as different from the fastball as possible, or do some similarities create greater deception because a hitter can’t recognize the pitch as early?

Player CU Vel CU H Mov CU V Mov Rel SG Unadj SG
Garrett Richards 79.68 5.85 -12.33 76.26 71.63
Sonny Gray 82.45 9.21 -5.44 76.09 67.11
Justin Grimm 81.63 7.15 -6.92 72.63 64.67
Blaine Hardy 78.62 1.55 -9.74 71.28 53
Adam Wainwright 75.37 9.34 -9.23 69.59 60.71
John Axford 78.77 4.37 -9.81 69.07 59.61
Carlos Torres 79.9 6.43 -9.09 68.1 64.79
Robbie Erlin 74.28 0.81 -10.64 68.01 43.53
Felix Hernandez 80.97 7.25 -8.16 67.84 66.62
Yu Darvish 78.16 8.5 -7.46 67.83 60.8
Yordano Ventura 83.76 2.28 -5.63 65.12 55.81
Clay Buchholz 78.25 8.88 -7.36 64.52 61.56
Brandon Workman 77.09 4.68 -9.35 64.24 55.08
Tyler Skaggs 77.58 6.64 -8.92 63.97 59.31
Jake Arrieta 80.12 5.81 -9.46 63.55 64.97
Juan Gutierrez 81.1 6.16 -6.77 62.94 60.89
Kevin Jepsen 84.52 3.9 -6.89 62.68 64.44
James Paxton 82.54 0.39 -2.69 62.49 41.05
Chris Tillman 76.22 3.31 -10.43 62.3 52.94
Craig Kimbrel 86.3 4.84 -5.95 62.14 68.18
Adam Warren 81.94 4.9 -6.62 61.45 59.77
Dellin Betances 83.85 7.04 -3.79 61.29 61.37
Edinson Volquez 80.67 5.95 -7.3 60.89 60.83
Wade Davis 85.56 3.29 -4.69 60.28 59.75
Tom Wilhelmsen 78.85 6.01 -7.58 60.26 57.4
Trevor May 77.76 7.56 -6.23 59.46 54.56
Cody Allen 86.88 5.09 -3.71 59.12 64.13
Tyler Thornburg 78.45 2.83 -7.21 59.01 48.63
Casey Janssen 74.81 9.84 -5.38 58.98 50.23
Trevor Bauer 79.18 5.17 -8.33 58.51 58.36
Brad Peacock 77.37 5.91 -7.39 57.85 53.18
Brett Oberholtzer 79.73 1.73 -3.22 57.83 38.69
Cole Hamels 79.01 3.23 -6.2 57.81 48.13
Josh Tomlin 76.76 4.39 -7.46 56.34 48.65
Drew Pomeranz 81.77 4.9 -7.41 56.24 61.47
Gio Gonzalez 78.4 5.35 -8.64 56.18 57.73
Andre Rienzo 79.26 6.3 -6.52 55.97 56.17
Scott Atchison 80.83 4.81 -8.52 55.78 62
Scott Kazmir 77.03 0.12 -3.35 55.77 29.19
Ian Kennedy 78.23 6.26 -9.1 55.71 60.51
Nick Tepesch 78.82 5.36 -8 55.71 57.04
Cory Rasmus 76.65 5.44 -7.84 55.67 51.66
Tom Koehler 79.92 5.1 -8.64 55.42 60.79
Marco Estrada 77.92 4.66 -6.27 55.31 48.81
Mike Fiers 72.93 3.7 -11.31 55.29 48.33
Odrisamer Despaigne 76.43 8.24 -7.25 55.08 55.59
Francisco Rodriguez 76.96 7.14 -6.73 54.97 53.1
Anthony Ranaudo 78.16 4.35 -7.93 54.82 53.13
Stephen Strasburg 80.69 7.59 -7.28 54.74 64.35
Mike Minor 81.66 0.98 -3.82 54.69 43.24
Yovani Gallardo 79.95 4.01 -6.75 54.68 53.49
Jeremy Hellickson 76.7 7.51 -9.53 54.53 60.71
Justin Verlander 79.98 4.97 -6.45 54.29 54.83
Dillon Gee 74.91 8.15 -7.75 54.19 53.13
J.A. Happ 78.27 2.13 -4.71 53.89 40.06
Kevin Quackenbush 77.17 5.05 -7.88 53.7 52.15
Roenis Elias 79.81 7.16 -6.09 53.3 58.18
Marcus Stroman 83.34 8.96 -2.3 53.27 60.33
Jason Vargas 75.68 1.63 -5.08 53.05 33.84
Clayton Kershaw 74.61 2.35 -8.93 52.85 43.08
Collin McHugh 73.68 8.26 -9 52.55 53.77
David Phelps 80.72 2.7 -5.06 52.51 48.01
Tommy Hunter 83.55 6.01 -3.14 52.48 56.72
Wesley Wright 79.8 3.86 -4.65 52.48 47.24
Miles Mikolas 75.4 6.22 -9.7 52.36 55.32
Javy Guerra 77.99 3.47 -7.42 52.2 49.48
Jason Hammel 77.19 6.3 -7.76 52.1 54.57
Vic Black 82.68 3.57 -3.91 51.95 51.46
Phil Coke 80.48 0.98 -3.37 51.9 39.25
Nick Martinez 76.92 3.84 -8.05 51.86 49.41
Jordan Zimmermann 79.71 5.65 -6.73 51.77 56.4
Phil Hughes 77.22 6.62 -7.84 51.53 55.54
Zack Greinke 72.88 6.58 -7.08 51.41 43.17
Colby Lewis 77.7 6.12 -5.81 50.97 50.21
Joe Kelly 79.88 6.94 -8.44 50.7 64.12
David Price 80.37 2.84 -1.58 50.68 38.24
Tim Lincecum 75.63 3.81 -8.37 50.62 47.15
Junichi Tazawa 76.12 6.87 -8.14 50.61 54.27
Matt Garza 75.25 4.66 -8.62 50.4 48.74
Jake Peavy 80.56 2.49 -1.91 50.39 38.81
Joba Chamberlain 79.74 5.1 -6.03 50.25 53.43
John Lackey 79.16 5.54 -5.39 50.13 51.3
Grant Balfour 82.74 2.44 -1.31 50.07 42.27
Wei-Yin Chen 74.9 3.11 -6 50 37.62
Danny Duffy 78.22 3.8 -6.76 49.95 48.98
Michael Wacha 75.4 4.99 -6.15 49.88 43.24
Zack Wheeler 79.61 6.2 -8.2 49.75 61.25
Anthony Varvaro 80.73 4.14 -5.43 49.68 52.11
Shelby Miller 77.76 7.65 -5.21 49.51 52.05
Edwin Jackson 79.94 1.31 -3.97 49.41 40.28
Miguel Gonzalez 77.47 5.77 -6.3 49.39 50.22
Anibal Sanchez 79.85 3.27 -3.53 49.35 43.11
Jesse Hahn 74.64 8.63 -8.05 49.33 54.32
Brandon McCarthy 82.24 5.96 -4.24 49 56.44
Mat Latos 76.95 3.64 -4.84 48.99 40.53
Tanner Roark 74.25 6.13 -9.04 48.69 50.65
Vance Worley 77.87 6.06 -5.44 48.53 49.5
J.J. Hoover 75.78 6.82 -6.21 48.49 48.24
Jose Fernandez 83.56 8.96 -1.82 48.41 59.58
Felix Doubront 75.43 2.87 -8.77 48.33 45.72
Jordan Lyles 81.77 2.16 -3.92 48.31 46.3
Santiago Casilla 82.01 4.48 -5.46 48.18 55.95
Kevin Correia 79.47 6.02 -3.46 48.15 47.94
Erik Bedard 74.86 4.96 -6.51 48.07 42.86
James Shields 80.39 3.11 -4.08 47.79 45.51
Gerrit Cole 84.65 6.34 -3.47 47.79 60.91
Chase Anderson 77.79 4.59 -7.32 47.24 51.15
Rick Porcello 78.15 7.45 -5.93 46.98 54.45
Travis Wood 72.92 1.78 -6.47 46.64 31.32
Samuel Deduno 81.59 6 -5.39 46.63 58.04
Johnny Cueto 81.53 2.02 -1.73 45.52 39.62
David Buchanan 77.95 4.31 -8.22 45.48 53.31
Ian Krol 79.05 5.17 -4.38 45.11 47.56
Erasmo Ramirez 80.4 3.22 -1.79 45.05 39.68
Charlie Morton 78.99 9.72 -7.1 45.02 64.43
Matt Cain 78.2 7.49 -5.26 44.99 52.88
Jose Quintana 80.94 2.22 -2.39 44.96 40.41
A.J. Burnett 82.4 4.23 -5.31 44.89 55.94
Vidal Nuno 77.29 4.84 -5.16 44.68 44.76
Daisuke Matsuzaka 75.37 8.38 -6.13 44.63 50.41
Hector Noesi 81.07 5.54 -4.12 44.59 52.45
Jake Odorizzi 70.24 4.97 -8.77 44.52 37.95
Joel Peralta 78.14 5.41 -3.91 44.15 44.68
Joe Nathan 82.63 2.69 -1.83 43.9 43.93
Carlos Carrasco 81.71 6.55 -5.33 43.55 59.35
Josh Beckett 73.88 7.73 -7.83 43.33 50
Fernando Salas 83.76 1.47 1.74 43.17 34.49
Lance Lynn 80.15 4.98 -5.79 43.07 53.5
Jered Weaver 69.96 6.47 -3.84 42.49 27.42
Will Smith 79.06 4.48 -4.32 41.86 45.94
Hector Santiago 77.64 3.59 -0.53 41.31 30.6
Jorge De La Rosa 74.81 4.56 -6.26 41.19 41.21
Hyun-jin Ryu 73.1 5.04 -7.88 41.17 42.5
Matt Shoemaker 76.24 6.77 -1.78 41.06 37.45
Fernando Abad 78.63 4.08 -4.74 40.75 45.18
Ryan Vogelsong 77.55 2.32 -4.13 40.71 37.22
Nathan Eovaldi 76.8 7.12 -7.37 40.45 54.37
Jon Niese 74.5 2.71 -5.81 39.94 35.31
Dan Haren 77.88 3.67 -3.65 39.66 39.63
Brad Hand 80.04 5.66 -2.5 39.15 45.96
Franklin Morales 74.71 5.81 -5.22 38.68 40.9
Alfredo Simon 78.2 5.32 -4.82 38.35 47.04
Eric Stults 68.63 1.8 -5.12 37.89 17.63
Madison Bumgarner 77.56 5.55 -4.39 37.85 44.88
Yusmeiro Petit 77.55 7.25 1.56 37.82 32.71
Jeremy Guthrie 76.14 4.86 -3.22 37.74 36.93
Masahiro Tanaka 74.41 5.28 -6.35 37.48 42.06
Gavin Floyd 81.7 5.13 -3.23 37.24 50.69
Aaron Harang 74.67 2.85 -4.36 36.79 32.16
Jacob deGrom 80.26 4.49 -1.82 36.32 42.16
Jon Lester 75.95 4.82 -4.15 36.25 38.87
Homer Bailey 80.44 5.72 -2.91 36.13 48.13
Jose Veras 76.8 10.11 -5.62 35.11 56.15
Jerry Blevins 74.84 6.77 -4.32 35.1 40.88
Drew Smyly 78.4 3.38 -0.21 34.68 31.1
C.J. Wilson 77.18 5.52 -4.61 34.67 44.5
John Danks 74.15 2.51 -1.84 34.63 23.5
Julio Teheran 74.02 6.32 -4.89 34.54 39.49
Jacob Turner 79.14 3.29 -2.46 34.4 38.63
Tommy Milone 75.46 4.13 -2.38 34.24 31.52
Tim Hudson 76.14 8.32 -4.29 33.02 47.21
Max Scherzer 78.2 5.92 -2.31 32.64 41.66
Hiroki Kuroda 77.15 4.58 -2.65 31.98 37.2
Paul Maholm 72.76 5.41 -5.55 29.91 36.31
Carlos Villanueva 76.58 4.05 -1.53 29.9 31.74
Scott Carroll 77.43 7.38 -2.75 25.13 44.15
Mark Buehrle 72.16 3.91 -3.74 23.02 26.85