Archive for Research

A Short History of Starters Who Fail to Record an Out

Failing to record an out is a starting pitcher’s worst nightmare. Generally, it means that either the pitcher suffered an injury or had absolutely nothing that particular day. In the case that the pitcher is healthy but eminently hittable, one can only imagine the embarrassment the pitcher feels. Additionally, it’s a pretty big letdown to the pitcher’s teammates. Players underperform from time to time, but perhaps nothing hurts a team as much as a starter who gets rocked and subsequently pulled before retiring a batter. In a matter of minutes, the pitcher’s squad can already be a few runs behind, and the bullpen becomes destined for a long day.

From data available at Baseball-Reference (since 1914), there have been 1,282 regular-season instances of starting pitchers leaving the game before recording an out (thanks, Play Index). The first time this occurred, on record, was April 24, 1914. The Cubs’ Charlie Smith faced five batters; he beaned one, allowed three hits, and one batter reached on an error. The last time it happened was August 7, 2013, when Shelby Miller was yanked after taking a line drive to the elbow off the bat of Dodgers outfielder Carl Crawford.


Jacob deGrom Fearless Forecast

Matt Harvey is getting all the hype these days, touching 99 mph on the gun, throwing nasty 84 mph curves, and looking healthy. I think he will have an excellent year. For some reason though, the world at large is still underrating Jacob deGrom.

First off, I recommend you read this FanGraphs article from midsummer, detailing the changes he made to his pitching mechanics to make this “rags to riches” leap into the upper echelon.

I’ve been notoriously high on deGrom since I watched him pitch. I wrote about him on Reddit back in July 2014. I’ll update the numbers I used below:

He’s been excellent, and not in any fluky kind of way. deGrom’s pitch types and peripherals support that what he did last year is VERY REAL.

Let me reiterate last year’s line: 140 IP (178.1 IP of usage), 9.2 K/9, 2.69 ERA, 1.14 WHIP, 2.67 FIP, 3.03 xFIP, 3.19 SIERA. Those are top-20 numbers. And unlike phenoms that regress with time (see Jesse Hahn in 2014), deGrom only got BETTER as the innings racked up.

That is what we love to see — for three reasons:

(1) His body can withstand the rigors of a 200 IP season,

(2) He IMPROVED, rather than regressing, and

(3) Hey, for those of us in H2H leagues, we want our guy pitching well for the fantasy playoffs!

His control improved with time, and his strikeouts increased. As of my last post, he had an 8.8 K/9 and 2.7 K/BB. He ended the year with a 9.3 K/9 and 3.4 K/BB. We love to see improvement in both those respects. Keep the walks down and strikeouts up, and success often naturally follows!

He’s generating a lot of swinging strikes. For reference, the league average sw/str% is approximately 8.6%.

Jacob deGrom has an overall 11.9 sw/str%, which is well above league average. Looking at PITCHf/x data, his slider (12.4% sw/str%, 46/370 pitches), changeup (20.2%, 55/272), both fastballs (10.8%, 108/1000), and curveball (16.0%, 34/212) are all above-average, strikeout-quality pitches.
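For anyone who wants to check the per-pitch whiff rates, here is a minimal Python sketch. The counts are the ones quoted in the paragraph above; the dictionary layout and the function name `swstr_pct` are just illustrative choices, not anything from a stats provider.

```python
# Swinging strikes and pitches thrown, as quoted in the paragraph above.
# Tuple layout (an illustrative choice): (swinging strikes, pitches thrown).
pitches = {
    "slider": (46, 370),
    "changeup": (55, 272),
    "fastballs": (108, 1000),
    "curveball": (34, 212),
}

LEAGUE_AVG_SWSTR = 8.6  # percent; the approximate league average cited above


def swstr_pct(whiffs: int, thrown: int) -> float:
    """Swinging-strike rate as a percentage of pitches thrown."""
    return 100.0 * whiffs / thrown


rates = {name: swstr_pct(w, n) for name, (w, n) in pitches.items()}

for name, rate in rates.items():
    gap = rate - LEAGUE_AVG_SWSTR
    print(f"{name:10s} {rate:5.1f}%  ({gap:+.1f} pts vs. league average)")
```

Every pitch in the dictionary comes out above the 8.6% baseline, which is the point the paragraph is making.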

deGrom essentially features a five-pitch arsenal. Of 2,225 MLB pitches thrown:

44.9% (1000/2225) Fastballs averaging 93.5 mph. Max Velocity, 97.3 mph.

16.5% (368/2225) 2-Seam Fastballs averaging 93.2 mph, Max Velocity, 97.4 mph.

16.6% (370/2225) Sliders averaging 86.8 mph, Max Velocity 91.3 mph (adding mph to his slider is a huge part of his success).

12.2% (272/2225) Changeups averaging 83.9 mph.

9.5% (212/2225) Curveballs averaging 79.3 mph.

3 Cutters–not really a pitch he uses.

deGrom has a diverse arsenal of pitches, with some legitimate velocity differentials, and a good fastball, topping out at 97+ mph. He has 7 mph between fastballs and slider. 10 mph between fastballs and changeup. 14+ mph between fastballs and curveball. 22.5 mph between the high-end spectrum of his fastball and low-end spectrum of his curve.
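The usage shares and velocity gaps above are simple arithmetic, sketched here in Python. The counts and average velocities are copied from the list above; treating the plain "Fastballs" line as the baseline for the gaps is my assumption about how the differentials were figured.

```python
# Pitch counts and average velocities (mph) as listed above.
counts = {
    "fastball": 1000,
    "two-seam": 368,
    "slider": 370,
    "changeup": 272,
    "curveball": 212,
    "cutter": 3,
}
avg_velo = {
    "fastball": 93.5,
    "two-seam": 93.2,
    "slider": 86.8,
    "changeup": 83.9,
    "curveball": 79.3,
}

total = sum(counts.values())

# Share of all pitches thrown, in percent.
usage = {pitch: 100.0 * n / total for pitch, n in counts.items()}

# Gap between the fastball average and each other pitch
# (ASSUMPTION: the plain "fastball" average is the baseline).
gaps = {
    pitch: avg_velo["fastball"] - velo
    for pitch, velo in avg_velo.items()
    if pitch != "fastball"
}

for pitch, share in usage.items():
    print(f"{pitch:10s} {share:5.1f}% of {total} pitches")
for pitch, gap in gaps.items():
    print(f"fastball minus {pitch:10s} {gap:4.1f} mph")
```

The rounded gaps (about 7, 10, and 14 mph for the slider, changeup, and curveball) line up with the figures quoted above.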

Essentially, deGrom is legit. His peripherals and PITCHf/x data don’t really suggest that he’s due for any significant regression. Citi Field is still an excellent pitcher’s park, despite the fact that the fences were recently moved in (3-11 feet). I don’t think it will make a significant difference; maybe a home run or two leaves the park that wouldn’t have before.

It’s worth noting that his top speeds increased late in the year, logging his highest speed fastball in the second half of the season. Again, I love a pitcher that doesn’t fatigue.

Concerns: He had Tommy John surgery in 2010, but it seems he has worked his way back from that. Sophomore slump or hitters figuring him out are worth considering. And of course, a couple fly ball outs might turn into home runs.

Fearless prediction: 32 games, 210 IP, 2.80 ERA, 1.05 WHIP, 234 Ks (10 K/9) – and deGrom finally gains some respect within the fantasy baseball community as a top-15 fantasy pitcher. That bold prediction being said, I think he’s being criminally underrated in fantasy drafts, with his ADP of 112 in Yahoo leagues.

112! At that price, go ahead and reach.


Don’t Hate Dee Because He’s Beautiful

I have every reason to hate Dee Gordon.

Prior to the 2012 season, I found myself struggling to figure out who would get the final keeper slot in a longtime, highly competitive fantasy league I played in. It came down to two players: Mike Trout and Dee Gordon. They both would have cost me the same, but Gordon was coming off a rookie campaign where he batted .304 with 24 steals in a minuscule 224 at-bats. Trout, on the other hand, was heading into 2012 with what seemed to me like a more clouded future. He had just posted a pedestrian .671 OPS with a 22.2 K%–albeit as a 19-year-old–the year prior. He was also blocked in LF at the time by the great Bobby Abreu, and was looking at possibly another year of seasoning in the minors. In the end I chose Gordon, and the rest is terrible, nightmare-inducing history.

So how strange that I find myself here now, defending Dee Gordon, the very man who hoodwinked me into choosing him over Mike mother-flippin’ Trout.

Ironically, I think the hate for Gordon has gone a bit too far this year. It’s odd to think that there’s any hate for a guy coming off a season where he led all of baseball in steals while also posting a top-25 batting average of .289. But some people seem awfully down on the guy coming into 2015. Perhaps they too were burned by his 2011 breakout, and refuse to make the same mistake twice. Though I can’t fault them if that is the case, there is reason to believe that Dee Gordon’s days of breaking our hearts are over.

Gordon's Batted Ball Percentages 2014

The first thing to point out is his batted-ball rates. As the graph illustrates, there weren’t any earth-shattering changes occurring here. It is worth noting, however, that Gordon set a career high in groundball percentage and a career low in fly-ball percentage. And if you’re willing to consider 2013 an aberration like I am (he only managed 106 plate appearances that year), he has actually been gradually trending in the right direction with both his fly-ball and groundball percentages while maintaining a fairly steady line-drive rate. Spikes in groundball percentages are rarely considered ideal, but when a player has the elite speed Gordon does, the odds of turning a weak dribbler or a grounder towards the hole into a hit get a very favorable bump.

Which brings me to perhaps the most eyebrow-raising aspect of Gordon’s 2014 season: his bunt-hit percentage (BUH%). After averaging a 28.5 BUH% over the prior three seasons, Gordon posted a ridiculous 42.6 BUH% in 2014. To put that number into perspective, here’s how it stacked up against the league’s other elite speedsters:

2014 BUH% Among Elite Speedsters

Bunting for hits is a skill. The fact that his success rate rose by nearly 15 percentage points last year tells me that he worked on and dramatically improved this skill. Perhaps more importantly, though, it tells me that he’s keenly aware of how dangerous a weapon this skill can be for him when used effectively. When paired with his declining fly-ball rates–and especially his new career low IFFB% of 8%, down from 13.2%–the numbers start to paint the picture of a player who may have finally begun to consciously tailor his plate approach to his strengths.

While I will never forgive Dee Gordon for what he did to me, I do see reasons to be optimistic about his 2015 season. Should his elite ability to bunt for hits carry over into this season, his .346 BABIP shouldn’t see as much regression as people seem to think, and another year of plus average and a stolen-base crown seems well within his reach.


Is It Time to Re-Evaluate the Value of the Walk?

One of the founding notions of sabermetrics has been its emphasis on the walk. Before sabermetrics, in the dark ages, people hardly paid attention to the walk. Teams would pay players based on their batting average, HRs, and RBIs, and no one really put a lot of stock in the “scrappy” player who would draw walks and get on base. Sabermetrics essentially started around the mid-1900s, and one of its founding principles was that the walk was badly undervalued. Now the walk is deemed an extremely valuable tool, and organizations will often pay handsomely for someone with a good walk rate. But what if the value of the walk is dropping? What if a walk in today’s game is not nearly as valuable as it used to be? Baseball, you see, is a living organism and is prone to change; just because something was valuable in the past doesn’t mean it’s valuable in the present. We constantly need to adjust how we value certain strategies and skills in order to stay ahead of the game.

This essentially all started when I looked at the correlation between pitches per plate appearance (Pit/PA) and runs scored per game (R/G) for 2014 and found that there was no real correlation (you can find the article here). I therefore decided to expand the data pool and look through a twenty-year span to examine whether 2014 was an anomaly or part of a consistent trend, or whether Pit/PA never really had any correlation with R/G.

So I calculated the correlation coefficient between Pit/PA and R/G for each individual year, dating all the way back to 1994. If you don’t know what a correlation coefficient is, or what counts as a strong or weak one, I explain it in my previous article. Anyway, the data I found had a high level of variance. I did, however, label two points: the largest correlation coefficient in the last twenty years and the smallest. Why? Because although there is large variation in the data from year to year, and it wouldn’t be unreasonable to believe that Pit/PA could have a much higher correlation with R/G in 2015, the data still displays a downward trend.
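For readers who want to try this themselves, here is a minimal Python sketch of a year-by-year Pearson correlation. It assumes one row per team-season holding that team's Pit/PA and R/G; the function names and the toy rows at the bottom are placeholders made up for illustration, not the Baseball-Reference data behind the chart.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def r_by_year(rows):
    """rows: iterable of (year, pit_per_pa, runs_per_game), one per team-season."""
    grouped = defaultdict(lambda: ([], []))
    for year, pit_pa, r_g in rows:
        grouped[year][0].append(pit_pa)
        grouped[year][1].append(r_g)
    return {year: pearson_r(xs, ys) for year, (xs, ys) in sorted(grouped.items())}


# TOY placeholder rows (not real baseball data): "season 1" is perfectly
# positively related, "season 2" perfectly negatively related.
toy_rows = [
    (1, 1.0, 2.0), (1, 2.0, 4.0), (1, 3.0, 6.0),
    (2, 1.0, 3.0), (2, 2.0, 2.0), (2, 3.0, 1.0),
]
toy_result = r_by_year(toy_rows)
print(toy_result)
```

Swapping the second column for BB% or the third for OBP gives the same kind of yearly series for any other pair of stats.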

[Chart: yearly correlation coefficient between Pit/PA and R/G, 1994-2014]

1994 had the highest correlation, while 2014 had the lowest. So at this point you’ve probably noticed the variation and the downward trend. Essentially, what this tells us is that Pit/PA’s correlation with R/G is basically unpredictable. If your team, for example, sees a lot of pitches, it doesn’t mean that it will have a good offense. In fact, if someone says that a team sees a lot of pitches and that it’s a good thing, well, he’s probably just blurting crap out. This is not to suggest that the individual is wrong; rather, it is to suggest that seeing pitches doesn’t have a consistent correlation with runs scored. It is therefore difficult, or impractical, to come to any conclusion from this data set.

Now, what follows is an examination of similar and stronger trends in the data. Oh, and I almost forgot: you’re also probably wondering, well, what about the base on balls? What was the point of that introduction? Well, after I looked at the correlation between Pit/PA and R/G, I took a look at the correlation between BB% and R/G for 2014.

[Chart: BB% vs. R/G, 2014]

This basically shows no distinct correlation between BB% and R/G in 2014. I then calculated the correlation coefficient to get an exact number and got R=0.0908, which confirms that there was essentially no correlation between BB% and R/G in 2014.

I therefore ran the numbers again, for 20 years, to see if this was just an abnormality in the data. I also wanted to get a sense of whether there was a specific trend.

[Chart: yearly correlation coefficient between BB% and R/G]

For this chart I decided to display all the data points, to give you an idea of what the correlations looked like. The two that I really want you to focus on, however, are the 2012 (R=0.083) and 2014 (R=0.0908) correlations. Both of these years show a significant drop-off in the correlation between BB% and R/G. Before, there was always a positive correlation between the two variables, at times even a strong one. In 2012 and 2014, however, there was essentially no correlation between BB% and R/G.

So what does this mean? Why the sudden drop in correlation, and will it continue? I also found it odd that in 2013 the correlation went all the way back up to R=0.4749, which is not the strongest correlation, but still a good one.

First, however, before we try to answer the two questions I’ve asked, let’s look at another set of correlation data: the correlation between BB% and OBP. Why? Well, my hypothesis was that if the correlation between BB% and OBP is getting smaller, then naturally the correlation between BB% and R/G would get smaller as well.

[Chart: yearly correlation coefficient between BB% and OBP]

As you might be able to tell, the correlation between BB% and OBP shows similar, although less drastic, results to the correlation between BB% and R/G. Again, the part of the graph you should focus on is the two outlier data points, and again they are 2012 (R=0.2317) and 2014 (R=0.3570). This gives us some explanation for the two outlier data points in the previous graph.

Essentially, what one needs to understand from this is that since BB% is becoming less correlated with OBP, it’s naturally going to have a lesser correlation with R/G, since the primary value of a BB (though obviously not the only one) is the effect it has on OBP. Also, through the 20 years of data there has generally been a strong correlation between BB% and OBP, apart from 2012 and 2014, where the correlation is weaker, although still positive.

So now we need to understand this: if the walk has a small correlation with OBP, then its value will be significantly affected. The problem is trying to figure out why there was a sudden drop in its correlation with OBP in 2012 and 2014. My first hypothesis was that it had something to do with the overall BB% of the league.

[Chart: league-wide BB% by year]

In hindsight, this was probably a simplistic hypothesis. At this point you’ve probably figured out that this was not the answer. Yes, the overall BB% is trending down, just like the previous charts, but the difference is that it doesn’t have the outliers of 2012 and 2014. (I included this to dispel a possible easy answer.)

There are in fact several possibilities for the drop in correlation between BB% and OBP. Perhaps it’s the shift, perhaps it’s the low run environment, perhaps it’s the sharp rise in strikeouts. I think another interesting element to look at is how hitters are doing later in the count. Considering the rise in strikeouts, it’s probably not unreasonable to assume that hitters are performing worse than ever when hitting with two strikes, although this of course is just a hypothesis. The answer to that question is for another study, for another day. What is certain, however, is that this upcoming season will be a fascinating data point. Will the correlations keep getting smaller, or are these two data points truly just abnormalities? In any case, I think it’s important to consider this: baseball is an ever-changing game, and just because something has value one year doesn’t mean it has value in another. Teams need to keep changing and mixing their strategies in order to stay ahead in this wacky game.

Finally, something to note: these data sets are not meant to arrive at any conclusion. I have not arrived at any conclusions about baseball through this data. What the data does is raise more questions for further, more detailed studies. For example, it would be interesting to look at Pit/PA from a pitcher’s point of view, although I’m not sure that would give us different results. These data sets are also general; they give us a general idea of the situation. Perhaps there are specific teams or players that thrive on seeing a lot of pitches or that do translate a high number of BBs into runs. Also, and this might be the most important element to note, correlation doesn’t always imply causation. For example, pop flies may have a positive correlation with Pit/PA, but that doesn’t mean that pop flies cause a higher Pit/PA. What correlations can do, however, is point us in the right direction toward finding the causation. They are a way of advancing toward more elaborate and specific research.

So I conclude by asking: now that one has digested all this data, is it time to re-evaluate the value of a walk?

All data courtesy of Baseball-Reference.


Brandon Inge, Superstar

How many wins is chemistry worth? Do nice guys really finish last?

As a Pirates fan since birth, I’ve grown used to my baseball fandom engendering a sense of sympathy in others. Born in 1989, I came of baseball-loving age in the mid-nineties, immediately following the halcyon Bonds/Bonilla/Van Slyke & co. days and immediately preceding the less-halcyon days of the Aramis Ramirez-for-Bobby Hill trade, “Operation Shutdown,” the expansion-drafting of Joe Randa, Pat Meares’ general existence, the Moskos pick, the Matt Morris trade . . . (list of soul-crushingly depressing baseball stories truncated for reader’s mental health).

And yet I remained faithful, despite having no conscious memory of a Pirates team being anything other than heartbreakingly awful. I’ve since likened this experience, in conversations with friends, to Linus sitting in the pumpkin patch each year, waiting for the Great Pumpkin to appear. It sometimes seemed that the Great Pumpkin would never come.

It’s ironic, then, that in the year that finally saw the Great Pumpkin arrive in Pittsburgh (2013), the same city also witnessed the end of the career of one Charles Brandon Inge.

Inge, nicknamed ‘Cringe’ by some of the crueler Pittsburgh faithful for his anemic .181/.204/.238 batting line during the 2013 campaign, was at that point in his thirteenth season as one of baseball’s premier utility men, playing every position on the diamond during his career. During his peak, he was a slick-fielding third baseman who also clubbed 27 HRs en route to a 4.1 fWAR season in 2006. But by 2013, Inge was 36 and on his way out of the league. Signed before the season to provide depth behind Pedro Alvarez and Neil Walker, Inge’s poor performance eventually led to his unceremonious release by the Pirates at the end of July.

And yet, this article has less to do with Inge’s on-field merits (which, as the previous paragraph suggests, were both significant and significantly variable), and more to do with Inge’s impact off the field. Inge won the 2010 Marvin Miller Man of the Year Award, given to the player whose “performance and contributions to his community inspire others to higher levels of achievement,” for his work with C.S. Mott Children’s Hospital. A frequent visitor to C.S. Mott, Inge also donated $100,000 for a new infusion center to treat pediatric cancer and twice hit home runs for young cancer patients. Dude’s a nice guy.

Perhaps more relevant, though, is pitcher and noted stathead Brandon McCarthy’s statement that Inge and fellow veteran Jonny Gomes had been worth twenty-four wins to the 2012 Athletics through chemistry alone. Normative ethics aside, it’s impossible to measure the moral character of a man—but we can measure, or at least attempt to quantify, the impact he has on his teammates.

Intrigued, I set out to determine whether Inge, patron saint of chemistry and all-around good guy, really made such a gigantic difference to his teammates’ performance. Mine is not the first investigation into this topic: Baseball Prospectus’ Russell A. Carleton examined the same issue in March of 2013, and there have been numerous attempts to place a valuation on chemistry over the years. But as you’ll see, there are some methodological differences between our approaches, and those differences lead to some interesting conclusions.

Methodology

There is no ironclad way to assess Inge’s potential effect on his teammates, short of cloning entire teams of players, randomly assigning Brandon Inges to some of them, and having them play a large number of seasons.

In order to determine Inge’s value as accurately as possible, I can’t simply measure his teammates’ performance; I’d just be concluding that Inge played with good or bad teammates. Instead, I need to develop a counterfactual, or a method of estimating how we could’ve reasonably expected Inge’s teammates to play in his absence. Fortunately, an excellent one already exists: a ZiPS projection. ZiPS, to my knowledge, does not have a ‘played with Brandon Inge’ variable, so it should be unbiased. Carleton instead used an AR(1) covariance matrix to try to adjust for player talent, but given that ZiPS explicitly incorporates past performance with a view to projecting, as accurately as possible, how a player will perform in the upcoming season, I believe it is a suitable tool.

I chose wOBA as the dependent variable for our study—while Carleton looked at multiple indicators (BB%, K%, etc), one, all-encompassing measure of players’ offensive performance seems best suited to answering the question, “Do players perform better with Brandon Inge on their team?”

In order to develop the requisite dataset for this analysis, I downloaded every player-season since 2006[1] from FanGraphs’ leaderboards and filtered the data to include only those players who amassed at least 200 plate appearances. This yielded 3130 player-seasons. Next, I created a binary variable called ‘IngeTeammate,’ with a value of ‘1’ if the player was on Inge’s team during the given season (and not Inge himself), and ‘0’ if he wasn’t. For the 2012 season, the only one in which Inge played for multiple teams, I counted Inge as having played for the Athletics, with whom he spent the majority of the season.
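The filtering and flagging step described above might look something like the following Python sketch. The `PlayerSeason` record and the helper names are hypothetical; the team-year pairs are read off the teammate tables in this article (Tigers 2006-2011, Athletics 2012, Pirates 2013), and coding Inge's own seasons as 0 is one reading of "and not Inge himself".

```python
from dataclasses import dataclass

MIN_PA = 200  # "at least 200 plate appearances"

# (year, team) pairs counted as Inge's teams, per the tables in this article.
INGE_TEAMS = {(year, "Tigers") for year in range(2006, 2012)} | {
    (2012, "Athletics"),
    (2013, "Pirates"),
}


@dataclass
class PlayerSeason:  # hypothetical record layout, for illustration only
    year: int
    name: str
    team: str
    pa: int


def inge_teammate(ps: PlayerSeason) -> int:
    """1 if the player shared a team-season with Inge, else 0."""
    if ps.name == "Brandon Inge":
        return 0  # ASSUMPTION: Inge's own seasons are coded 0
    return int((ps.year, ps.team) in INGE_TEAMS)


def build_sample(seasons):
    """Keep seasons with enough PA and attach the IngeTeammate flag."""
    return [(ps, inge_teammate(ps)) for ps in seasons if ps.pa >= MIN_PA]


# Two rows taken from the 2010 table in this article.
sample = build_sample([
    PlayerSeason(2010, "Will Rhymes", "Tigers", 213),
    PlayerSeason(2010, "Jose Bautista", "Blue Jays", 683),
])
for ps, flag in sample:
    print(ps.year, ps.name, ps.team, ps.pa, flag)
```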

The next part was a bit tricky—bringing in the ZiPS projections. The latest years, the ones for which ZiPS has been featured on FG, were easy—data was readily available, wOBA already calculated, and records already associated with a player id. But wading deeper into the past unearthed some issues—in order to match records, I had to manually match player names (including the two Chris Carters, and, apparently, two Abraham Nunezes . . . Nunezii . . . who knows?) and hand-calculate ZiPS-projected wOBA for older player-seasons using the weights provided on the FanGraphs Guts page. One potential issue with some of the oldest data is the lack of projections for things like intentional walks and sacrifice flies.

However, forging through all of the record-matching and manual wOBA-calculating eventually yielded ZiPS wOBA projections matched to 3088 of the 3130 player-seasons. Of the 42 unmatched seasons, only one was an Inge teammate (2010 Brennan Boesch). 81 of the 3088 matched seasons were Inge teammates. So unless you think ZiPS would have pegged Boesch, a relatively unknown 25-year-old at the time, for a significantly better performance than the .322 wOBA he posted in 2010, the unmatched records probably didn’t have a huge effect.

What we’re left with is data that look like this:

Year | Name | Team | Age | PA | IngeTeammate | ZiPS wOBA | wOBAdiff | wOBA
2010 | Jose Bautista | Blue Jays | 29 | 683 | 0 | 0.322 | 0.100 | 0.422
2010 | Jim Thome | Twins | 39 | 340 | 0 | 0.343 | 0.096 | 0.439
2010 | Wilson Betemit | Royals | 28 | 315 | 0 | 0.302 | 0.084 | 0.386
2010 | Josh Hamilton | Rangers | 29 | 571 | 0 | 0.365 | 0.080 | 0.445
2010 | Chris Johnson | Astros | 25 | 362 | 0 | 0.286 | 0.067 | 0.353
2010 | Carlos Gonzalez | Rockies | 24 | 636 | 0 | 0.350 | 0.063 | 0.413
2010 | Justin Morneau | Twins | 29 | 348 | 0 | 0.387 | 0.061 | 0.448
2010 | Paul Konerko | White Sox | 34 | 631 | 0 | 0.361 | 0.056 | 0.417
2010 | Joey Votto | Reds | 26 | 648 | 0 | 0.383 | 0.055 | 0.438
2010 | Danny Valencia | Twins | 25 | 322 | 0 | 0.299 | 0.052 | 0.351
2010 | Giancarlo Stanton | Marlins | 20 | 396 | 0 | 0.305 | 0.051 | 0.356
2010 | Miguel Cairo | Reds | 36 | 226 | 0 | 0.288 | 0.051 | 0.339
2010 | Will Rhymes | Tigers | 27 | 213 | 1 | 0.288 | 0.050 | 0.338
2010 | Tyler Colvin | Cubs | 24 | 395 | 0 | 0.301 | 0.050 | 0.351
2010 | Michael Morse | Nationals | 28 | 293 | 0 | 0.328 | 0.049 | 0.377
2010 | Adrian Beltre | Red Sox | 31 | 641 | 0 | 0.343 | 0.048 | 0.391
2010 | Ryan Hanigan | Reds | 29 | 243 | 0 | 0.321 | 0.048 | 0.369
2010 | Yorvit Torrealba | Padres | 31 | 363 | 0 | 0.279 | 0.044 | 0.323
2010 | Matt Joyce | Rays | 25 | 261 | 0 | 0.321 | 0.043 | 0.364
2010 | Aubrey Huff | Giants | 33 | 668 | 0 | 0.344 | 0.043 | 0.387
2010 | Drew Stubbs | Reds | 25 | 583 | 0 | 0.295 | 0.043 | 0.338
2010 | Andres Torres | Giants | 32 | 570 | 0 | 0.316 | 0.042 | 0.358
2010 | Corey Patterson | Orioles | 30 | 341 | 0 | 0.274 | 0.042 | 0.316
2010 | Austin Jackson | Tigers | 23 | 675 | 1 | 0.288 | 0.041 | 0.329
2010 | Brett Gardner | Yankees | 26 | 569 | 0 | 0.306 | 0.040 | 0.346
2010 | Colby Rasmus | Cardinals | 23 | 534 | 0 | 0.329 | 0.040 | 0.369
2010 | Andruw Jones | White Sox | 33 | 328 | 0 | 0.323 | 0.039 | 0.362

In the above table, wOBAdiff refers to the amount by which the player outperformed his ZiPS wOBA projection. A negative number would indicate that a player underperformed his projection. So Jose Bautista outperformed his 2010 projection by .100—multiplying by 1000 tells us that this was 100 points of wOBA. It was good to be Joey Bats in 2010.
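To make the wOBAdiff column and the two kinds of averages used later concrete, here is a small Python sketch. It uses only the two Inge teammates that appear in the table above, so the means it prints describe that two-row excerpt, not the full-sample figures reported in the Results section; the function names are illustrative.

```python
# (name, PA, ZiPS wOBA, actual wOBA) for the two Inge teammates in the table above.
rows = [
    ("Will Rhymes", 213, 0.288, 0.338),
    ("Austin Jackson", 675, 0.288, 0.329),
]


def woba_diff_points(zips_woba: float, actual_woba: float) -> float:
    """Actual minus projected wOBA, in points (0.001 of wOBA = 1 point)."""
    return (actual_woba - zips_woba) * 1000.0


def unweighted_mean(rows) -> float:
    """Every player-season counts equally."""
    diffs = [woba_diff_points(z, a) for _, _, z, a in rows]
    return sum(diffs) / len(diffs)


def pa_weighted_mean(rows) -> float:
    """Each player-season counts in proportion to its plate appearances."""
    total_pa = sum(pa for _, pa, _, _ in rows)
    weighted = sum(pa * woba_diff_points(z, a) for _, pa, z, a in rows)
    return weighted / total_pa


for name, pa, z, a in rows:
    print(f"{name:15s} {pa:4d} PA  {woba_diff_points(z, a):+.0f} points")
print(f"unweighted mean:  {unweighted_mean(rows):+.1f} points")
print(f"PA-weighted mean: {pa_weighted_mean(rows):+.1f} points")
```

The weighted mean leans toward Jackson's number because he had roughly three times as many plate appearances as Rhymes.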

Results

If we look at the mean wOBA deviation (in terms of points of wOBA) Inge teammates and non-teammates experienced from their ZiPS projections, we see the following results:

Group | Player-Seasons | Total PA | Mean Weighted Diff. (wOBA pts) | Mean Unweighted Diff. (wOBA pts)
Non-Teammate | 3007 | 1,378,732 | -3.09 | -4.62
Teammate | 81 | 37,965 | 4.30 | 4.24

In other words, if we weight by plate appearances, Inge teammates outperformed their ZiPS projections by an average of about 4.30 points of wOBA. All other players underperformed their projections by an average of about 3.09 points. Which might not seem like a lot, but if you were to apply that 7.4-point wOBA difference to an average-hitting team over a 6000 PA team-season, that’s roughly 34 runs. So 3.4 wins. Which is, you know, quite a bit. The unweighted gap is even larger, suggesting that players with lower numbers of PA have outperformed their projections even more when teamed with Inge.
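The conversion from points of wOBA to runs can be sketched with the standard wRAA relationship, runs = (wOBA difference / wOBA scale) x PA. The article doesn't say which wOBA scale was used; the 1.3 below is an assumed value (the real constant varies by season on the FanGraphs Guts page) that happens to land near the quoted 34 runs. The 10 runs per win is implied by "34 runs. So 3.4 wins."

```python
WOBA_SCALE = 1.3     # ASSUMPTION: the actual value varies by season
RUNS_PER_WIN = 10.0  # implied by the article's 34 runs -> 3.4 wins
TEAM_PA = 6000       # the team-season length used in the article


def points_to_runs(points: float, pa: int = TEAM_PA, scale: float = WOBA_SCALE) -> float:
    """Convert a wOBA gap in points (thousandths) into runs over `pa` plate appearances."""
    return (points / 1000.0) / scale * pa


# Weighted teammate mean minus weighted non-teammate mean, from the table above.
gap_points = 4.30 - (-3.09)

runs = points_to_runs(gap_points)
wins = runs / RUNS_PER_WIN
print(f"gap: {gap_points:.2f} points -> {runs:.1f} runs -> {wins:.1f} wins")
```

Under the same assumed scale, the other gaps in this article (for younger players, Pierzynski, and Gomes) also come out close to the run values quoted for them.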

If we simply run a regression including the independent variables IngeTeammate (binary) and age and the dependent variable wOBAdiff (unweighted), we can express the story another way:

wOBAdiff = 0.0127064 + (IngeTeammate * 0.0090544) + (age * -0.0005993)

I included age as a control because ZiPS projections, as you can see from the model above, tended to slightly overproject older players in comparison to younger players, and therefore I needed to consider the possibility that Inge simply benefitted from playing only with young players (he didn’t).

Note that in the model above, 0.001 corresponds to one point of wOBA (i.e. a hitter moving from .323 to .324 would have gained a point of wOBA). The r-squared of the model is absurdly low (0.006), but that’s to be expected—after all, I’m not trying to assert that Brandon Inge is responsible for all or even a significant part of the variation between MLB players’ expected and actual performance. More importantly, the variable ‘IngeTeammate’ is significant at a 98.4% threshold.
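To see what the fitted equation above implies for a given player, here is a short Python sketch that simply plugs values into it. The coefficients are the ones reported for the Inge model; the function name and the example ages are arbitrary illustrations.

```python
# Coefficients from the Inge regression reported above.
INTERCEPT = 0.0127064
B_INGE_TEAMMATE = 0.0090544
B_AGE = -0.0005993


def predicted_woba_diff(age: int, inge_teammate: int) -> float:
    """Model-predicted wOBA minus ZiPS-projected wOBA (in wOBA, not points)."""
    return INTERCEPT + B_INGE_TEAMMATE * inge_teammate + B_AGE * age


for age in (23, 27, 33):  # arbitrary example ages
    solo = predicted_woba_diff(age, 0) * 1000
    with_inge = predicted_woba_diff(age, 1) * 1000
    print(f"age {age}: {solo:+.1f} pts without Inge, {with_inge:+.1f} pts with Inge")
```

Because the model is additive, the gap between the two columns is the same at every age: about 9 points of wOBA, the IngeTeammate coefficient itself.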

Considering the possible influence of aging is interesting, as the Inge difference is even more pronounced among younger players, or those whom he allegedly mentored while playing with the A’s. If we filter the data above to include only players 27 and younger, the table looks like this:

Group | Player-Seasons | Total PA | Mean Weighted Diff. (wOBA pts) | Mean Unweighted Diff. (wOBA pts)
Non-Teammate | 1241 | 568,944 | -0.50 | -2.09
Teammate | 30 | 14,298 | 16.58 | 17.27

We’re starting to run into some serious sample size issues that make me uncomfortable drawing any particularly bold conclusions, but young players who play with Inge have done really, really well, collectively knocking the snot out of their ZiPS projections. There are problems with extrapolating this to a 6000 PA team-season, given that presumably an entire team won’t be composed of young players, but if one did so the result would be a ridiculous 78.6 runs of additional value.

The table below lists every 27-and-under player season for which the player was an Inge teammate:

Year | Name | Team | Age | PA | ZiPS wOBA | wOBAdiff | wOBA
2008 | Matt Joyce | Tigers | 23 | 277 | 0.275 | 0.084 | 0.359
2011 | Alex Avila | Tigers | 24 | 551 | 0.308 | 0.076 | 0.384
2013 | Jordy Mercer | Pirates | 26 | 365 | 0.282 | 0.051 | 0.333
2010 | Will Rhymes | Tigers | 27 | 213 | 0.288 | 0.050 | 0.338
2012 | Chris Carter | Athletics | 25 | 260 | 0.319 | 0.050 | 0.369
2011 | Brennan Boesch | Tigers | 26 | 472 | 0.300 | 0.048 | 0.348
2007 | Curtis Granderson | Tigers | 26 | 676 | 0.344 | 0.044 | 0.388
2010 | Austin Jackson | Tigers | 23 | 675 | 0.288 | 0.041 | 0.329
2012 | Yoenis Cespedes | Athletics | 26 | 540 | 0.328 | 0.040 | 0.368
2010 | Miguel Cabrera | Tigers | 27 | 648 | 0.399 | 0.032 | 0.431
2013 | Jose Tabata | Pirates | 24 | 341 | 0.308 | 0.032 | 0.340
2012 | Josh Reddick | Athletics | 25 | 673 | 0.296 | 0.030 | 0.326
2013 | Andrew McCutchen | Pirates | 26 | 674 | 0.365 | 0.028 | 0.393
2013 | Starling Marte | Pirates | 24 | 566 | 0.317 | 0.027 | 0.344
2006 | Omar Infante | Tigers | 24 | 245 | 0.306 | 0.016 | 0.322
2008 | Curtis Granderson | Tigers | 27 | 629 | 0.358 | 0.015 | 0.373
2009 | Clete Thomas | Tigers | 25 | 310 | 0.302 | 0.015 | 0.317
2012 | Josh Donaldson | Athletics | 26 | 294 | 0.286 | 0.014 | 0.300
2011 | Andy Dirks | Tigers | 25 | 235 | 0.297 | 0.011 | 0.308
2013 | Neil Walker | Pirates | 27 | 551 | 0.328 | 0.005 | 0.333
2013 | Pedro Alvarez | Pirates | 26 | 614 | 0.327 | 0.003 | 0.330
2006 | Curtis Granderson | Tigers | 25 | 679 | 0.335 | 0.000 | 0.335
2009 | Miguel Cabrera | Tigers | 26 | 685 | 0.407 | -0.005 | 0.402
2010 | Alex Avila | Tigers | 23 | 333 | 0.306 | -0.007 | 0.299
2011 | Austin Jackson | Tigers | 24 | 668 | 0.315 | -0.010 | 0.305
2012 | Jemile Weeks | Athletics | 25 | 511 | 0.304 | -0.028 | 0.276
2012 | Derek Norris | Athletics | 23 | 232 | 0.304 | -0.029 | 0.275
2006 | Chris Shelton | Tigers | 26 | 412 | 0.380 | -0.033 | 0.347
2013 | Travis Snider | Pirates | 25 | 285 | 0.310 | -0.039 | 0.271
2008 | Miguel Cabrera | Tigers | 25 | 684 | 0.419 | -0.043 | 0.376

It’s not as if one year is hugely skewing the results—pretty much every year, whichever young players happen to be playing with Brandon Inge outperform their projections. The graph below illustrates the mean wOBA differential younger Inge teammates exhibited each season. I would’ve imagined, prior to viewing these results, that Inge’s positive ‘effect’ might’ve been almost entirely a product of the 2012 Athletics, but this doesn’t seem to be the case—outside of the 2006 Tigers (when Omar Infante, Curtis Granderson, and Chris Shelton collectively underperformed their ZiPS projections by a modest average of ~5 points of wOBA), Inge’s younger teammates have outperformed ZiPS every single year in the sample.

Perhaps, one could say, Inge has simply benefitted from playing on teams run by intelligent front offices. After all, the Tigers, Athletics, and (more recently) the Pirates all have reputations as relatively savvy management teams. Maybe they’re just collectively able to out-forecast ZiPS.

When we look at ZiPS wOBA differentials by team, however, the Tigers (+1.36 points of wOBA), Athletics (+0.11) and Pirates (-0.31) all had weighted mean differentials less than the Inge gap. The average over all teams was -2.89, so while all three front offices ‘beat the market,’ so to speak, they still don’t explain the huge Inge effect. It looks as though there’s something here.

After observing the results for Inge, I was curious about whether other veteran players might also exhibit similar correlations—while we’d expect to find no correlation with ZiPS wOBA differential for most players, it might be the case that, as with Inge, patterns emerge. Specifically, I looked at two players with diametrically opposite reputations—A.J. Pierzynski and Jonny Gomes. Below, I replicate the initial summary table used for the Inge analysis and note the magnitude of the effect:

A.J. Pierzynski

Group Player-Seasons Total PA Mean Weighted Diff. (wOBA pts) Mean Unweighted Diff. (wOBA pts)
Non-Teammate 3004 1,375,450 -2.75 -4.29
Teammate 84 41,247 -7.65 -7.87

The game’s most hated player didn’t fail to disappoint, as his teammates collectively underperformed their ZiPS projections by an additional 4.9 points of wOBA when compared to non-teammates, an effect worth -22.6 runs to the team over the course of a full season. I should note that I assigned Pierzynski to the 2014 Red Sox (with whom he spent considerably more time) instead of the 2014 Cardinals; both teams underperformed their ZiPS projections, but the Red Sox did so by a larger margin.

Pierzynski’s unweighted results, while still negative, are less damning, and using a regressed model reflects this:

wOBAdiff = 0.0128794 + (AJTeammate * -0.0033689) + (age * -0.0005939)

The intercept and coefficient for age are, understandably, almost identical to those I observed in the Inge model. The significance level for AJTeammate, however, is only 64.1%, suggesting that we can’t really conclude much of anything with the same level of confidence as for Inge.
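To make the fitted equation concrete, here is a minimal Python sketch that plugs values into it. It assumes AJTeammate is a 0/1 indicator and that age is measured in years, neither of which is spelled out above; the function name and the example age of 25 are purely illustrative.

```python
def predicted_woba_diff(age, is_teammate, intercept, teammate_coef, age_coef):
    """Evaluate the linear model: intercept + teammate effect + age effect."""
    indicator = 1 if is_teammate else 0  # assumption: teammate flag is coded 0/1
    return intercept + teammate_coef * indicator + age_coef * age

# Coefficients copied from the Pierzynski equation above.
AJ_MODEL = dict(intercept=0.0128794, teammate_coef=-0.0033689, age_coef=-0.0005939)

# Hypothetical 25-year-old hitter, with and without Pierzynski as a teammate.
with_aj = predicted_woba_diff(25, True, **AJ_MODEL)
without_aj = predicted_woba_diff(25, False, **AJ_MODEL)

# The gap between the two predictions is just the teammate coefficient,
# expressed here in points of wOBA (thousandths).
gap_points = (with_aj - without_aj) * 1000
print(f"with: {with_aj:+.4f}  without: {without_aj:+.4f}  gap: {gap_points:+.2f} pts")
```

Because the model is linear, the teammate gap is the same at every age; only the baseline moves with the age term.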

Still, twenty-plus runs is a non-negligible amount, and Pierzynski’s numbers have been negative across all four teams for whom he’s played (White Sox, Rangers, Red Sox, Cardinals). It may be that more historical data would reveal a broader trend, given that we’ve limited our sample size to only the latter half of Pierzynski’s career.

Jonny Gomes

Group Player-Seasons Total PA Mean Weighted Diff. (wOBA pts) Mean Unweighted Diff. (wOBA pts)
Non-Teammate 3000 1,376,613 -3.05 -4.56
Teammate 88 40,084 2.58 1.52

The phenomenally-bearded Gomes, Inge’s running partner in the Brandon McCarthy quote that triggered this analysis, also appears to be a potential chemistry star, though his results are less extreme than Inge’s. His teammates outperformed non-teammates by 5.6 points of wOBA, worth an estimated 26 runs per season.

wOBAdiff = 0.0124387 + (GomesTeammate * 0.0055032) + (age * -0.0005873)

The effect, as with Pierzynski, is not statistically significant—the significance level is 87.4%.
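The run estimates quoted for Pierzynski and Gomes can be roughly reproduced with the standard wRAA-style conversion (wOBA gap divided by the wOBA scale, times plate appearances). The sketch below is only an illustration: the 6,000 team plate appearances and the wOBA scale of 1.3 are assumptions chosen for the example, not inputs stated in this analysis.

```python
def woba_points_to_runs(points, team_pa=6000, woba_scale=1.3):
    """Convert a wOBA gap in points (thousandths) to runs over team_pa plate appearances.

    team_pa and woba_scale are assumed defaults, not values taken from the article.
    """
    return (points / 1000) / woba_scale * team_pa

# Weighted teammate-minus-non-teammate gaps from the two summary tables.
pierzynski_runs = woba_points_to_runs(-7.65 - (-2.75))  # about -4.9 points
gomes_runs = woba_points_to_runs(2.58 - (-3.05))        # about +5.6 points

print(f"Pierzynski: {pierzynski_runs:+.1f} runs, Gomes: {gomes_runs:+.1f} runs")
```

Under those assumed inputs the results land close to the figures quoted in the text, though a different scale or plate appearance total would shift them.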

Conclusions

We can’t make firm statements about causality from this analysis, but we can say pretty conclusively that being on the same team as Inge during the last nine years correlates positively with hitting better than ZiPS projects you to hit.

Maybe you don’t believe Inge should get credit for the extra 3.4 wins of value each year. We don’t have a ‘chemistry above replacement’ metric to account for the fact that some other player with a modicum of veteranosity might plausibly have a positive effect if analyzed the same way. And there’s no feasible way to develop one on the horizon—you can only start to do this sort of analysis retrospectively, and it requires a large number of plate appearances and player-seasons before we can conclude that any pattern has emerged. I’m not really arguing that Inge deserves all the credit for his teammates’ overperformance, only that we have reason to believe a nonzero effect may exist.

But let’s entertain, for a minute, the possibility that the 3.4 win-per-season gap we see *is* entirely attributable to Inge. That maybe all the minute, unnoticed interactions between players over the course of a season can add up to improved performance at the plate. The effect could even be greater than 3.4 wins—I didn’t examine pitching and fielding at all. After all, everything we know about human psychology suggests that happier workers are more productive, and I’ve yet to hear any compelling reason that ballplayers constitute an exception. We sometimes, in the analytics community, fall into the trap of assuming that because we can’t measure something accurately, it doesn’t deserve a meaningful place in our analysis. And yet our inability to measure a phenomenon is not proof of its nonexistence—just ten years ago, we lacked meaningful metrics for catcher framing, for instance.

Perhaps Inge contributed more hidden value over the last decade than anyone this side of Jose Molina, and Brandon McCarthy’s twenty-four wins were, if still hyperbole, grounded in a subtle truth. 3.4 wins currently has a market value north of $20M, making Inge a substantially underpaid man over the course of his career.

It’s a shame, on some level, that it’s only after he’s retired that we recognize the unheralded Inge for who he might secretly have been: Brandon Inge, Superstar.

 

[1] Before 2006, I struggled to find ZiPS projections in a readable format to develop the counterfactuals.

Data retrieved from FanGraphs and Baseball Think Factory.


Hardball Retrospective – The “Original” 2012 Tampa Bay Rays

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Consequently, Hank Aaron is listed on the Braves roster for the duration of his career while the Blue Jays claim Carlos Delgado and the Brewers declare Paul Molitor. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors that constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. Additional information and a discussion forum are available at TuataraSoftware.com.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

Assessment

The 2012 Tampa Bay Rays             OWAR: 46.4     OWS: 254     OPW%: .607

GM Chuck LaMar acquired 77.8% (21 of 27) of the ballplayers on the 2012 Rays roster. With the exception of Elliot Johnson and Jose Veras, all of the players were selected during the Amateur Draft. Based on the revised standings, the “Original” 2012 Rays registered 98 victories and secured the American League Eastern division title by a 16-game margin over the New York Yankees.
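As a rough check on how OPW% relates to the victory total, the sketch below converts a winning percentage into wins over a 162-game schedule and shows the classic Pythagorean form. The exponent of 2 is the traditional Bill James value and is an assumption here; the exact exponent behind the revised standings is not stated.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean winning percentage (exponent of 2 is an assumption)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

def expected_wins(win_pct, games=162):
    """Round a winning percentage to whole wins over a full schedule."""
    return round(win_pct * games)

# OPW% listed above for the "Original" 2012 Rays.
rays_2012_wins = expected_wins(0.607)
print(rays_2012_wins)
```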

David Price (20-5, 2.56) collected the 2012 AL Cy Young Award for his superlative campaign in which he topped the Junior Circuit in victories and ERA while striking out 205 batters. “Big Game” James Shields (15-10, 3.52) tallied 223 whiffs and fashioned a 1.168 WHIP. Jeremy Hellickson (10-11, 3.10) provided a reliable effort in his sophomore season and added a Gold Glove Award to his trophy case. Jason Hammel (8-6, 3.43) and Matt Moore (11-11, 3.81) stabilized the back-end of the rotation.

Jake McGee led the bullpen crew with a 1.95 ERA and a WHIP of 0.795. Wade Davis contributed an ERA of 2.43 in 54 relief appearances after starting 64 contests in the three prior campaigns.

ROTATION POS WAR WS
David Price SP 6.4 19.12
Jeremy Hellickson SP 3.57 11.21
James Shields SP 2.85 12.33
Jason Hammel SP 2.82 9.74
Matt Moore SP 1.76 8.07
BULLPEN POS WAR WS
Jake McGee RP 1.09 7.5
Wade Davis RP 0.9 6.43
Jose Veras RP 0.75 5.01
Chris Seddon SW 0.37 1.97
Chad Gaudin RP -0.46 2.68
Alex Cobb SP 1.14 6.18
Jeff Niemann SP 0.5 2.07
Dan Wheeler RP -0.68 0

Josh Hamilton blasted 43 round-trippers and scored 103 runs (both career-bests) en route to a fifth-place finish in the 2012 A.L. MVP balloting. B.J. Upton and protégé Desmond Jennings nabbed 31 bags apiece at the top of the order. Upton established a personal best with 28 circuit clouts. Evan Longoria batted .289 with 17 jacks despite missing 88 games due to a partially torn hamstring. John Jaso delivered a career-high .394 OBP and Jonny “Ironsides” Gomes swatted 18 big-flies. 

LINEUP POS WAR WS
B. J. Upton CF 2.56 19.64
Desmond Jennings LF 1.62 15.18
Josh Hamilton DH/CF 4.39 25.5
Evan Longoria 3B 2.39 11.12
Jonny Gomes RF/DH 2.05 13.04
John Jaso C/DH 2.83 15.96
Aubrey Huff 1B 0.07 1.14
Elliot Johnson 2B/SS 1.02 8.62
Reid Brignac SS -0.19 0.52
BENCH POS WAR WS
Carl Crawford LF 0.46 3.19
Jason Pridie RF 0.1 0.5
Matt Diaz LF -0.25 1.61
Stephen Vogt C -0.35 0.16
Delmon Young DH -1.37 6.95

The “Original” 2012 Tampa Bay Rays roster

NAME POS WAR WS General Manager Scouting Director
David Price SP 6.4 19.12 Andrew Friedman R.J. Harrison
Josh Hamilton CF 4.39 25.5 Chuck LaMar Dan Jennings
Jeremy Hellickson SP 3.57 11.21 Chuck LaMar Tim Wilken
James Shields SP 2.85 12.33 Chuck LaMar Dan Jennings
John Jaso DH 2.83 15.96 Chuck LaMar
Jason Hammel SP 2.82 9.74 Chuck LaMar Dan Jennings
B. J. Upton CF 2.56 19.64 Chuck LaMar Dan Jennings
Evan Longoria 3B 2.39 11.12 Andrew Friedman R.J. Harrison
Jonny Gomes DH 2.05 13.04 Chuck LaMar Dan Jennings
Matt Moore SP 1.76 8.07 Andrew Friedman R.J. Harrison
Desmond Jennings LF 1.62 15.18 Andrew Friedman R.J. Harrison
Alex Cobb SP 1.14 6.18 Andrew Friedman R.J. Harrison
Jake McGee RP 1.09 7.5 Chuck LaMar Cam Bonifay
Elliot Johnson SS 1.02 8.62 Chuck LaMar Dan Jennings
Wade Davis RP 0.9 6.43 Chuck LaMar Cam Bonifay
Jose Veras RP 0.75 5.01 Chuck LaMar Dan Jennings
Jeff Niemann SP 0.5 2.07 Chuck LaMar Cam Bonifay
Carl Crawford LF 0.46 3.19 Chuck LaMar Dan Jennings
Chris Seddon SW 0.37 1.97 Chuck LaMar Dan Jennings
Jason Pridie RF 0.1 0.5 Chuck LaMar Dan Jennings
Aubrey Huff 1B 0.07 1.14 Chuck LaMar Dan Jennings
Reid Brignac SS -0.19 0.52 Chuck LaMar Cam Bonifay
Matt Diaz LF -0.25 1.61 Chuck LaMar Dan Jennings
Stephen Vogt C -0.35 0.16 Andrew Friedman R.J. Harrison
Chad Gaudin RP -0.46 2.68 Chuck LaMar Dan Jennings
Dan Wheeler RP -0.68 0 Chuck LaMar
Delmon Young DH -1.37 6.95 Chuck LaMar

Honorable Mention

The “Original” 2008 Rays                   OWAR: 39.4     OWS: 276     OPW%: .528

Five members of the 2008 Tampa Bay Rays accrued at least 20 Win Shares including Josh Hamilton, B.J. Upton, Aubrey Huff, Evan Longoria and Akinori Iwamura. Hamilton hit .304 with 32 jacks and a League-leading 130 RBI. “Huff Daddy” launched 32 four-baggers and knocked in 108 baserunners. Longoria (.272/27/85) claimed Rookie of the Year honors and Upton swiped 44 bases.

On Deck

The “Original” 2009 Rockies

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database – Transaction a – Executive 

SB Nation – “Evan Longoria injury – 2012 return in question”

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive


Pitch Grades vs. Relative Pitch Grades

When deciding on the grade for a pitcher’s breaking pitch, a scout relies on the pitch’s velocity and movement (although command can be factored in as well).  These factors are combined into a single number on a scale from 20-80, with major league average as a 50 and a standard deviation recorded at 10.

Clayton Kershaw’s curveball has long been regarded as one of the best in the business, yet by my systematic calculation factoring in velocity, horizontal movement, and vertical movement, his curveball rated among the bottom third of curveballs.  Its below-average velocity and below-average horizontal movement held it back despite its above-average vertical movement.  While I was pondering this conundrum, I remembered another fact–Kershaw’s fastball happens to have a lot of rise.  What if movement was recorded by the difference between the pitcher’s fastball movement and the breaking ball movement instead of the breaking ball’s movement compared to an arbitrary point?  I was about to find out.

(This paragraph is solely methodology, so skip it if you wish.)  Using Baseball Prospectus’ excellent PITCHf/x leaderboard, I selected all pitchers who threw at least 200 four-seam fastballs and at least 100 curveballs in 2014.  A breaking pitch’s horizontal and vertical movement was recorded as the difference between the pitch’s raw movement and the movement of the pitcher’s four-seam fastball.  I calculated the z-score for the curveball’s velocity and the z-score for the combination of the z-scores of the curveball’s relative movements.  (I gave vertical movement a 150% weight relative to horizontal movement.)  Then, I combined the z-scores of the velocity and the combined relative movement to calculate a relative pitch score.  (I gave combined relative movement a 150% weight relative to velocity.)  Finally, I calculated a scouting grade on the 20-80 scale based on the relative pitch score.  Below is a table showing my results.
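The steps above can be sketched in a few lines of Python. This is a loose illustration rather than the exact calculation: it assumes that a larger absolute gap from the fastball counts as better movement, that the 150% weights are simple 1.5x multipliers, and that population standard deviations are used. The pitchers and their numbers are invented.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def relative_grades(pitchers, vert_weight=1.5, move_weight=1.5):
    """Return a 20-80 style grade per pitcher from velocity and relative movement."""
    vel_z = z_scores([p["cu_velo"] for p in pitchers])
    # Relative movement: curveball movement minus four-seam movement.
    # Assumption: a bigger absolute separation is treated as better.
    h_z = z_scores([abs(p["cu_h"] - p["ff_h"]) for p in pitchers])
    v_z = z_scores([abs(p["cu_v"] - p["ff_v"]) for p in pitchers])
    # Combine the movement z-scores (vertical weighted 1.5x), then re-standardize.
    move_z = z_scores([h + vert_weight * v for h, v in zip(h_z, v_z)])
    # Combine with velocity (movement weighted 1.5x), then re-standardize.
    score_z = z_scores([vz + move_weight * mz for vz, mz in zip(vel_z, move_z)])
    # Map to the scouting scale: 50 is average, 10 points per standard deviation.
    return [50 + 10 * z for z in score_z]

# Invented sample data (mph and inches); not real PITCHf/x values.
sample = [
    {"name": "Pitcher A", "cu_velo": 80.0, "cu_h": 6.0, "cu_v": -9.0, "ff_h": -4.0, "ff_v": 9.0},
    {"name": "Pitcher B", "cu_velo": 75.0, "cu_h": 2.0, "cu_v": -8.0, "ff_h": -1.0, "ff_v": 12.0},
    {"name": "Pitcher C", "cu_velo": 78.0, "cu_h": 5.0, "cu_v": -4.0, "ff_h": -6.0, "ff_v": 8.0},
    {"name": "Pitcher D", "cu_velo": 83.0, "cu_h": 8.0, "cu_v": -2.0, "ff_h": -5.0, "ff_v": 10.0},
]

grades = relative_grades(sample)
for p, g in zip(sample, grades):
    print(f'{p["name"]}: {g:.1f}')
```

Note that Pitcher B has the least raw horizontal break yet gains ground on vertical separation because of a rising fastball, which is the kind of adjustment the relative grade is meant to capture.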

While I may not have resolved the disagreement over how to evaluate Kershaw’s curveball, the relative scouting grade at least opens a discussion on how the movement and velocity of pitches should be evaluated.  Is it better for a breaking pitch to be faster, or is it better to create a wider difference in velocity between the fastball and breaking ball?  Is it better for a pitcher’s breaking ball movement to be as different from the fastball as possible, or do some similarities create greater deception because a hitter can’t recognize the pitch as early?

Player CU Vel CU H Mov CU V Mov Rel SG Unadj SG
Garrett Richards 79.68 5.85 -12.33 76.26 71.63
Sonny Gray 82.45 9.21 -5.44 76.09 67.11
Justin Grimm 81.63 7.15 -6.92 72.63 64.67
Blaine Hardy 78.62 1.55 -9.74 71.28 53
Adam Wainwright 75.37 9.34 -9.23 69.59 60.71
John Axford 78.77 4.37 -9.81 69.07 59.61
Carlos Torres 79.9 6.43 -9.09 68.1 64.79
Robbie Erlin 74.28 0.81 -10.64 68.01 43.53
Felix Hernandez 80.97 7.25 -8.16 67.84 66.62
Yu Darvish 78.16 8.5 -7.46 67.83 60.8
Yordano Ventura 83.76 2.28 -5.63 65.12 55.81
Clay Buchholz 78.25 8.88 -7.36 64.52 61.56
Brandon Workman 77.09 4.68 -9.35 64.24 55.08
Tyler Skaggs 77.58 6.64 -8.92 63.97 59.31
Jake Arrieta 80.12 5.81 -9.46 63.55 64.97
Juan Gutierrez 81.1 6.16 -6.77 62.94 60.89
Kevin Jepsen 84.52 3.9 -6.89 62.68 64.44
James Paxton 82.54 0.39 -2.69 62.49 41.05
Chris Tillman 76.22 3.31 -10.43 62.3 52.94
Craig Kimbrel 86.3 4.84 -5.95 62.14 68.18
Adam Warren 81.94 4.9 -6.62 61.45 59.77
Dellin Betances 83.85 7.04 -3.79 61.29 61.37
Edinson Volquez 80.67 5.95 -7.3 60.89 60.83
Wade Davis 85.56 3.29 -4.69 60.28 59.75
Tom Wilhelmsen 78.85 6.01 -7.58 60.26 57.4
Trevor May 77.76 7.56 -6.23 59.46 54.56
Cody Allen 86.88 5.09 -3.71 59.12 64.13
Tyler Thornburg 78.45 2.83 -7.21 59.01 48.63
Casey Janssen 74.81 9.84 -5.38 58.98 50.23
Trevor Bauer 79.18 5.17 -8.33 58.51 58.36
Brad Peacock 77.37 5.91 -7.39 57.85 53.18
Brett Oberholtzer 79.73 1.73 -3.22 57.83 38.69
Cole Hamels 79.01 3.23 -6.2 57.81 48.13
Josh Tomlin 76.76 4.39 -7.46 56.34 48.65
Drew Pomeranz 81.77 4.9 -7.41 56.24 61.47
Gio Gonzalez 78.4 5.35 -8.64 56.18 57.73
Andre Rienzo 79.26 6.3 -6.52 55.97 56.17
Scott Atchison 80.83 4.81 -8.52 55.78 62
Scott Kazmir 77.03 0.12 -3.35 55.77 29.19
Ian Kennedy 78.23 6.26 -9.1 55.71 60.51
Nick Tepesch 78.82 5.36 -8 55.71 57.04
Cory Rasmus 76.65 5.44 -7.84 55.67 51.66
Tom Koehler 79.92 5.1 -8.64 55.42 60.79
Marco Estrada 77.92 4.66 -6.27 55.31 48.81
Mike Fiers 72.93 3.7 -11.31 55.29 48.33
Odrisamer Despaigne 76.43 8.24 -7.25 55.08 55.59
Francisco Rodriguez 76.96 7.14 -6.73 54.97 53.1
Anthony Ranaudo 78.16 4.35 -7.93 54.82 53.13
Stephen Strasburg 80.69 7.59 -7.28 54.74 64.35
Mike Minor 81.66 0.98 -3.82 54.69 43.24
Yovani Gallardo 79.95 4.01 -6.75 54.68 53.49
Jeremy Hellickson 76.7 7.51 -9.53 54.53 60.71
Justin Verlander 79.98 4.97 -6.45 54.29 54.83
Dillon Gee 74.91 8.15 -7.75 54.19 53.13
J.A. Happ 78.27 2.13 -4.71 53.89 40.06
Kevin Quackenbush 77.17 5.05 -7.88 53.7 52.15
Roenis Elias 79.81 7.16 -6.09 53.3 58.18
Marcus Stroman 83.34 8.96 -2.3 53.27 60.33
Jason Vargas 75.68 1.63 -5.08 53.05 33.84
Clayton Kershaw 74.61 2.35 -8.93 52.85 43.08
Collin McHugh 73.68 8.26 -9 52.55 53.77
David Phelps 80.72 2.7 -5.06 52.51 48.01
Tommy Hunter 83.55 6.01 -3.14 52.48 56.72
Wesley Wright 79.8 3.86 -4.65 52.48 47.24
Miles Mikolas 75.4 6.22 -9.7 52.36 55.32
Javy Guerra 77.99 3.47 -7.42 52.2 49.48
Jason Hammel 77.19 6.3 -7.76 52.1 54.57
Vic Black 82.68 3.57 -3.91 51.95 51.46
Phil Coke 80.48 0.98 -3.37 51.9 39.25
Nick Martinez 76.92 3.84 -8.05 51.86 49.41
Jordan Zimmermann 79.71 5.65 -6.73 51.77 56.4
Phil Hughes 77.22 6.62 -7.84 51.53 55.54
Zack Greinke 72.88 6.58 -7.08 51.41 43.17
Colby Lewis 77.7 6.12 -5.81 50.97 50.21
Joe Kelly 79.88 6.94 -8.44 50.7 64.12
David Price 80.37 2.84 -1.58 50.68 38.24
Tim Lincecum 75.63 3.81 -8.37 50.62 47.15
Junichi Tazawa 76.12 6.87 -8.14 50.61 54.27
Matt Garza 75.25 4.66 -8.62 50.4 48.74
Jake Peavy 80.56 2.49 -1.91 50.39 38.81
Joba Chamberlain 79.74 5.1 -6.03 50.25 53.43
John Lackey 79.16 5.54 -5.39 50.13 51.3
Grant Balfour 82.74 2.44 -1.31 50.07 42.27
Wei-Yin Chen 74.9 3.11 -6 50 37.62
Danny Duffy 78.22 3.8 -6.76 49.95 48.98
Michael Wacha 75.4 4.99 -6.15 49.88 43.24
Zack Wheeler 79.61 6.2 -8.2 49.75 61.25
Anthony Varvaro 80.73 4.14 -5.43 49.68 52.11
Shelby Miller 77.76 7.65 -5.21 49.51 52.05
Edwin Jackson 79.94 1.31 -3.97 49.41 40.28
Miguel Gonzalez 77.47 5.77 -6.3 49.39 50.22
Anibal Sanchez 79.85 3.27 -3.53 49.35 43.11
Jesse Hahn 74.64 8.63 -8.05 49.33 54.32
Brandon McCarthy 82.24 5.96 -4.24 49 56.44
Mat Latos 76.95 3.64 -4.84 48.99 40.53
Tanner Roark 74.25 6.13 -9.04 48.69 50.65
Vance Worley 77.87 6.06 -5.44 48.53 49.5
J.J. Hoover 75.78 6.82 -6.21 48.49 48.24
Jose Fernandez 83.56 8.96 -1.82 48.41 59.58
Felix Doubront 75.43 2.87 -8.77 48.33 45.72
Jordan Lyles 81.77 2.16 -3.92 48.31 46.3
Santiago Casilla 82.01 4.48 -5.46 48.18 55.95
Kevin Correia 79.47 6.02 -3.46 48.15 47.94
Erik Bedard 74.86 4.96 -6.51 48.07 42.86
James Shields 80.39 3.11 -4.08 47.79 45.51
Gerrit Cole 84.65 6.34 -3.47 47.79 60.91
Chase Anderson 77.79 4.59 -7.32 47.24 51.15
Rick Porcello 78.15 7.45 -5.93 46.98 54.45
Travis Wood 72.92 1.78 -6.47 46.64 31.32
Samuel Deduno 81.59 6 -5.39 46.63 58.04
Johnny Cueto 81.53 2.02 -1.73 45.52 39.62
David Buchanan 77.95 4.31 -8.22 45.48 53.31
Ian Krol 79.05 5.17 -4.38 45.11 47.56
Erasmo Ramirez 80.4 3.22 -1.79 45.05 39.68
Charlie Morton 78.99 9.72 -7.1 45.02 64.43
Matt Cain 78.2 7.49 -5.26 44.99 52.88
Jose Quintana 80.94 2.22 -2.39 44.96 40.41
A.J. Burnett 82.4 4.23 -5.31 44.89 55.94
Vidal Nuno 77.29 4.84 -5.16 44.68 44.76
Daisuke Matsuzaka 75.37 8.38 -6.13 44.63 50.41
Hector Noesi 81.07 5.54 -4.12 44.59 52.45
Jake Odorizzi 70.24 4.97 -8.77 44.52 37.95
Joel Peralta 78.14 5.41 -3.91 44.15 44.68
Joe Nathan 82.63 2.69 -1.83 43.9 43.93
Carlos Carrasco 81.71 6.55 -5.33 43.55 59.35
Josh Beckett 73.88 7.73 -7.83 43.33 50
Fernando Salas 83.76 1.47 1.74 43.17 34.49
Lance Lynn 80.15 4.98 -5.79 43.07 53.5
Jered Weaver 69.96 6.47 -3.84 42.49 27.42
Will Smith 79.06 4.48 -4.32 41.86 45.94
Hector Santiago 77.64 3.59 -0.53 41.31 30.6
Jorge De La Rosa 74.81 4.56 -6.26 41.19 41.21
Hyun-jin Ryu 73.1 5.04 -7.88 41.17 42.5
Matt Shoemaker 76.24 6.77 -1.78 41.06 37.45
Fernando Abad 78.63 4.08 -4.74 40.75 45.18
Ryan Vogelsong 77.55 2.32 -4.13 40.71 37.22
Nathan Eovaldi 76.8 7.12 -7.37 40.45 54.37
Jon Niese 74.5 2.71 -5.81 39.94 35.31
Dan Haren 77.88 3.67 -3.65 39.66 39.63
Brad Hand 80.04 5.66 -2.5 39.15 45.96
Franklin Morales 74.71 5.81 -5.22 38.68 40.9
Alfredo Simon 78.2 5.32 -4.82 38.35 47.04
Eric Stults 68.63 1.8 -5.12 37.89 17.63
Madison Bumgarner 77.56 5.55 -4.39 37.85 44.88
Yusmeiro Petit 77.55 7.25 1.56 37.82 32.71
Jeremy Guthrie 76.14 4.86 -3.22 37.74 36.93
Masahiro Tanaka 74.41 5.28 -6.35 37.48 42.06
Gavin Floyd 81.7 5.13 -3.23 37.24 50.69
Aaron Harang 74.67 2.85 -4.36 36.79 32.16
Jacob deGrom 80.26 4.49 -1.82 36.32 42.16
Jon Lester 75.95 4.82 -4.15 36.25 38.87
Homer Bailey 80.44 5.72 -2.91 36.13 48.13
Jose Veras 76.8 10.11 -5.62 35.11 56.15
Jerry Blevins 74.84 6.77 -4.32 35.1 40.88
Drew Smyly 78.4 3.38 -0.21 34.68 31.1
C.J. Wilson 77.18 5.52 -4.61 34.67 44.5
John Danks 74.15 2.51 -1.84 34.63 23.5
Julio Teheran 74.02 6.32 -4.89 34.54 39.49
Jacob Turner 79.14 3.29 -2.46 34.4 38.63
Tommy Milone 75.46 4.13 -2.38 34.24 31.52
Tim Hudson 76.14 8.32 -4.29 33.02 47.21
Max Scherzer 78.2 5.92 -2.31 32.64 41.66
Hiroki Kuroda 77.15 4.58 -2.65 31.98 37.2
Paul Maholm 72.76 5.41 -5.55 29.91 36.31
Carlos Villanueva 76.58 4.05 -1.53 29.9 31.74
Scott Carroll 77.43 7.38 -2.75 25.13 44.15
Mark Buehrle 72.16 3.91 -3.74 23.02 26.85

 


Does Seeing More Pitches Lead to More Runs?

Baseball is full of widely held notions that turn out to be false. For example, pundits have long suggested that a good hitter provides protection for another good hitter. Studies have been done on this, and it is false. Another commonly stated notion is that seeing a lot of pitches is a good thing. This notion is repeated not only by former players making statements based on no evidence, or by TV broadcasters who use a never-ending array of cliché lines, but also by smart sabermetricians.

But is this notion true? Does seeing more pitches really lead to more runs? First and foremost, I want to thank Owen Watson, who on September 30, 2014, published an article for The Hardball Times showing that there is a correlation between seeing pitches and drawing walks (you can find his article here). This is basically where I got the idea for this study. The study was well done; however, I don’t think it was asking the right question. Yes, there is a correlation between seeing pitches and walks, and walks are good, but this doesn’t necessarily mean that seeing more pitches leads to more runs or that seeing more pitches is a good thing. There are other factors that one must consider before coming to this conclusion (Watson’s article was on pitching efficiency, and I want to make it clear that I’m only focusing on this specific aspect of the article).

For example, the Red Sox in 2014 saw a lot of pitches, yet they weren’t one of the top teams when it came to run scoring. Also, the Royals went all the way to the World Series last year, and they don’t exactly see a lot of pitches. In fact, they’re famous for having a bunch of free swingers on the team. Finally, while getting into deep counts leads to more walks, it’s also very possible that it will lead to more strikeouts. This is what made me question whether seeing more pitches is a good thing. While Watson’s study looked at the correlation between walks and pitches per plate appearance, it ignored several other factors that could make seeing a lot of pitches counterproductive.

Ok, now let’s get to the fun stuff. The way I constructed this study was rather simple: I basically used the same model Watson did for his study and just changed BB% to R/G (runs per game). Below is a chart that examines the correlation between Pit/PA (pitches per plate appearance) and R/G for every team in the 2014 season. The X-axis represents the teams. Two series are plotted against the Y-axis: the blue represents R/G, and the red represents Pit/PA. Oh, and if you don’t know what LgA is on the X-axis, that’s the league average.

[Chart: R/G (blue) and Pit/PA (red) for every team and the league average (LgA), 2014]

So there it is. As you might be able to tell, there is no real correlation between pitches seen and runs scored. The correlation coefficient, by the way, is R = -0.0486. If you are unfamiliar with correlation coefficients, all you really need to understand is that a coefficient of 0 indicates no linear relationship between the two sets of data, while values near 1 or -1 indicate a strong one. The correlation here is slightly negative, but it’s too close to zero to really be interpreted as a negative correlation.
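For anyone who wants to run the same kind of check, here is a minimal Python sketch of the Pearson correlation coefficient. The team values in it are invented placeholders, not the 2014 figures from Baseball Reference, so the number it prints says nothing about the real result.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations, and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented Pit/PA and R/G values for six hypothetical teams.
pit_per_pa = [3.72, 3.85, 3.91, 3.78, 3.96, 3.81]
runs_per_game = [4.30, 3.95, 4.10, 4.45, 4.05, 3.90]

r = pearson_r(pit_per_pa, runs_per_game)
print(f"R = {r:.4f}")
```

Swapping in the real Pit/PA and R/G columns for all 30 teams would be the way to compare against the R value reported above.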

You might, at this point, find this data hard to believe. Well, I would ask you to consider this: strikeouts, as I’ve already mentioned and can’t mention enough, are at an all-time high. Going deeper into counts therefore puts one at a higher risk of striking out. This may be one of the explanations for the data above. Also, seeing more pitches means you are wearing the starting pitcher out, meaning you are far more likely to face the bullpen. This is not necessarily a good thing! Bullpen pitchers are better than ever. Facing the bullpen, in today’s game, may actually be counterproductive.

Now let’s consider one final element. This study is not perfect and has a few flaws. Most notably, it only takes into account 2014, which after all may have just been a blip on the radar. I will therefore be looking at more seasons to examine whether this result holds up. I will also take a look at the correlation between pitches seen and K% to get a better understanding of whether it is beneficial to see a lot of pitches. I just thought that this data point was too interesting not to share, especially as we head into a new season of baseball. Hopefully this will allow people to be more critical when they are watching the game and listening to pundits speak on TV. Remember, just because someone says something doesn’t mean it is true.

Thanks to Owen Watson for doing his study in The Hardball Times; he now writes for FanGraphs. The data was also all found at Baseball Reference.


The Most Signature Pitch of 2014

If you were feeling charitable, you could say this post owes a lot to Jeff Sullivan’s recent set of articles examining pitch comps. If you weren’t feeling charitable, you could say this post is a shameless appropriation of his ideas. Either way, you should read those articles! They were very good, and very entertaining, and directly inspired this post. There were seven, in total: here, here, here, here, here, here, and here. I’ll wait.

Back? Good! In the comments of the third article, someone asked Jeff about finding the “most signature” pitch, or the pitch with the worst/fewest comps. Jeff said: “Wouldn’t be surprised if it was Dickey or the Chapman fastball. That math… I’m afraid of that math, but I might make an attempt.” Jeff has looked at unique pitches twice (Carlos Carrasco’s changeup and Odrisamer Despaigne’s changeup, the last two articles linked above), but I wanted to attack the question in a less ad-hoc fashion, looking at all pitches rather than singling some out.

Jeff wasn’t wrong, though – the math is not simple. His methodology doesn’t really work here for a couple of reasons. First of all, I’m looking for uniqueness rather than similarity. I could just flip Jeff’s method around and look for high comp scores, like what he did for the Carrasco/Despaigne changeups, but I also want to consider all pitch types. Again, Jeff sort of did this in the Despaigne article, by comparing his changeup to a few different pitch types, but that is not really feasible for every pitch thrown.

What this means is that a new method is needed to directly calculate dissimilarity. We could find the maximum distances from the mean (basically Jeff’s method), which would work for a single pitch type: if all the pitches are clustered together, with similar velocities and breaks, calculating the distance from the mean to find the weirdest pitch makes sense. But consider this hypothetical set of pitches, graphed on two axes for simplicity:

[Figure: hypothetical pitches plotted on two axes, with one pitch marked in red]

Obviously, the pitch that corresponds to the red point is the sort of thing we’d like to identify as unique. It’s also exactly at the center of that dataset, and would show up as the least unique pitch, if distance from the mean was used to determine uniqueness. Luckily, there’s an algorithm that is designed to find outliers in a more rigorous way.

This is where the math gets scary. The algorithm is called Local Outlier Factor analysis, which identifies outliers in a dataset based on the density of data around that point as compared to its neighbors. In this context, the density around a point is a function of how similar the best comps are for each pitch. Each point gets a score, where anything near 1 indicates normal, and higher values indicate greater isolation. I’m not going to go into detail, but if anyone wants to learn more, feel free to ask in the comments, or just Google it. It’s fairly simple to run it on all pitches, with the relevant variables of velocity, horizontal break, and vertical break.
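For the curious, here is a bare-bones Python sketch of the Local Outlier Factor idea on a two-dimensional toy dataset. It is a simplified illustration, not the code used for this post: it ignores ties in neighbor distance, assumes no duplicate points, and uses k=3 and invented coordinates. The real analysis would use velocity, horizontal break, and vertical break as the dimensions.

```python
import math

def local_outlier_factors(points, k=3):
    """Return one LOF score per point (near 1 = typical, larger = more isolated)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point (excluding itself) and its k-distance.
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Two tight clusters of invented "pitches" plus one oddball between them.
cluster_a = [(x, y) for x in (0, 1, 2) for y in (0, 1, 2)]
cluster_b = [(x, y) for x in (10, 11, 12) for y in (10, 11, 12)]
oddball = (6, 6)
pitches = cluster_a + cluster_b + [oddball]

# The oddball sits exactly on the dataset mean, so distance-from-mean calls it ordinary.
center = tuple(sum(c) / len(pitches) for c in zip(*pitches))
print("oddball distance from mean:", math.dist(oddball, center))

scores = local_outlier_factors(pitches)
print("oddball LOF:", round(scores[-1], 2))
print("largest LOF among clustered pitches:", round(max(scores[:-1]), 2))
```

The clustered points score near 1 while the oddball scores far higher, even though it sits at the exact center of the data, which is why a density-based measure suits this problem better than distance from the mean.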

Any pitch thrown more than 100 times in 2014 was included, and righties and lefties were considered separately (since pitches that move the same way obviously are very different based on what side of the rubber they come from). But enough about methodology! Here are the top five most signature pitches, for righties and lefties, along with their LOF scores, followed by some gratuitous gifs.

RIGHTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
R.A. Dickey Knuckleball 76.6 0.2 1.6 2.26
Mike Morin Change 73.7 2.0 5.7 2.16
Steven Wright Knuckleball 74.2 0.7 0.3 2.13
David Hale Fourseam 91.9 4.2 5.8 2.04
Pat Neshek Change 70.9 7.0 3.5 1.00

LEFTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
Aroldis Chapman Fourseam 101.2 3.7 11.1 2.53
Erik Bedard Slider 73.6 2.0 4.1 2.19
Sean Marshall Curve 74.4 9.5 -6.7 1.91
Dan Jennings Fourseam 93.6 4.9 5.8 1.86
Zach Britton Sinker 96.2 8.6 4.7 1.85

 

 

Chapman fastball

It’s nice when things work exactly like you expect them to. The top pitches on the two lists are incredible, and incredibly unique, and while it’s not a surprise to see them here, it does provide some reassurance that this measure is doing what it’s supposed to. Everyone knows about Dickey’s knuckleball, and if anything, it’s underrated by this measure. Since it moves so randomly, the knuckle’s season averages end up being slow and pretty much neutral horizontally and vertically. While that’s enough to make them show up as very odd under this measure, the individual pitches don’t often follow that straight trajectory, as seen in the above gif. The same can be said for Steven Wright’s knuckleball in third, but it’s nice that this measure still picks them out as unique pitches.

As for Chapman, there’s not that much to say about his fastball that hasn’t already been said. It feels wrong in some way to call his fastball strange, since it is disturbingly direct in practice, but there was truly no pitch like it in 2014. The velocity is the carrying factor behind the massive outlier score, almost a full 2 MPH greater than the next fastest pitch. Interestingly, Chapman’s pitch was the only one in either top five with notably high velocity.

Looking at the weirdest pitches in baseball, what can we conclude about them as a group? First, the pitchers throwing them are generally not bad. While you’d expect someone to be at least halfway decent to get in the position to throw 100 pitches of a single type, the owners of these pitches averaged about 1 WAR in 2014. With eight of these 10 throwing primarily in relief, and having only 710.2 innings collectively, that comes out to a very respectable 2.4 WAR/200.

The pitches themselves varied in usage, from Neshek’s change, thrown 13.4% of the time, to Britton’s sinker, thrown 89.3% of the time. They also varied in effectiveness, as measured by run values, from Neshek’s 3.6/100 to Marshall’s -1.63/100. Overall, the best pitch is probably Chapman’s fastball, followed by Britton’s sinker, given both the results on those pitches and how often they use them, but as a group, these pitches are pretty good. Maybe that isn’t totally surprising, but weird does not necessarily equal effective. Any pitcher could immediately have the weirdest pitch in baseball, if he threw 40 MPH meatballs, but less absurdly, mix and control matter just as much as the movement of the pitch.

Finally, all this stuff tracks fairly well with what Jeff identified previously. Obviously, he called Dickey and Chapman, but he also wrote this article about how Zach Britton’s sinker is pretty much comp-less, and we see that very pitch in fifth for lefthanders. Odrisamer Despaigne’s change was 12th for righthanders. Interestingly, Carrasco’s change is 98th on that same list, indicating this method doesn’t think he’s incredibly unique. Overall, this was mostly just a fun exercise, but maybe there’s more to this list, so if you want to poke around, it’s in a public Google Doc here. And like I said, if you have any questions about the methodology or anything like that, I’d be glad to answer them in the comments.


Hardball Retrospective – The “Original” 1980 Kansas City Royals

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901 to the present) on their original team. Consequently, Babe Ruth is listed on the Red Sox roster for the duration of his career while the Orioles claim Eddie Murray and the Cubs declare Lou Brock. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors who constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. Additional information and a discussion forum are available at TuataraSoftware.com.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

Assessment

The 1980 Kansas City Royals         OWAR: 42.6     OWS: 272     OPW%: .596

GM Cedric Tallis acquired two-thirds of the ballplayers on the 1980 Royals roster. The organization selected 24 of the 33 players through the Amateur Draft. Based on the revised standings, the “Original” 1980 Royals amassed 97 victories and captured the American League pennant by a five-game margin over the Oakland Athletics.
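As a rough illustration of how a Pythagorean winning percentage maps to a win total, here is a short Python sketch. It assumes the classic Bill James form with an exponent of 2 and a 162-game schedule. The exact exponent behind OPW% is not stated here, and the run totals in the second example are hypothetical, not the 1980 club's actual figures.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x).

    Assumption: exponent 2 is the classic form; other exponents exist.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(win_pct: float, games: int = 162) -> int:
    """Convert a winning percentage to a rounded win total."""
    return round(win_pct * games)


# The listed OPW% of .596 over an assumed 162 games:
print(expected_wins(0.596))  # 97

# Hypothetical run totals, only to show the formula at work:
print(round(pythagorean_pct(800, 660), 3))  # 0.595
```

Under those assumptions, a .596 OPW% works out to 96.6 wins, which rounds to the 97 victories cited above.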

George Brett was batting .337 when he returned to the lineup on July 10 following a month-long absence. “Mullet” went on an absolute tear, collecting 71 hits in 150 at-bats (.473 BA) and driving in 47 runs to boost his average to .401 on August 17. Brett hovered around the elusive .400 mark into the middle of September 1980 before settling for a .390 BA. In addition to securing his second batting title, he recorded personal bests in RBI (118), OBP (.454) and SLG (.664) while collecting the American League MVP Award. Brett was selected to 13 consecutive All-Star contests (1976-1988), registered 3,154 base hits and supplied a .305 career BA.

Fleet-footed left fielder Willie Wilson paced the Junior Circuit with 230 base knocks, 133 runs scored and 15 triples. He earned the Gold Glove Award, manufactured a .326 BA and nabbed 79 bags in 89 attempts after swiping 83 the previous year. John “Duke” Wathan (.305/6/58) pilfered 17 bases and established a career high in batting average while shortstop U.L. Washington contributed 11 three-baggers and stole 20 bases.

Outfield chores were handled by Wilson, Ruppert Jones, Clint Hurdle and Al Cowens. Jones backed the club’s baserunning endeavors with 18 stolen bases but otherwise yielded substandard output compared to the 21 home runs and 33 steals from his ’79 campaign. Cowens (.268/6/59) provided further proof that his runner-up finish in the 1977 AL MVP race was an outlier. Hurdle (.294/10/60) drilled 31 doubles and registered personal bests in virtually every offensive category.

Slick-fielding second baseman Frank “Smooth” White collected six consecutive Gold Glove Awards from 1977 to 1982 while Rodney “Cool Breeze” Scott purloined 63 bases and legged out 13 three-base hits. Luis Salazar solidified the bench with a .337 BA following his mid-August promotion.

Brett placed second behind Mike Schmidt in “The New Bill James Historical Baseball Abstract” for the best third baseman of all time. White (31st) and Wilson (54th) finished in the top 100 at their positions while Dan Quisenberry placed 68th among pitchers.

LINEUP             POS      WAR    WS
Willie Wilson      LF       7.86   31.52
Frank White        2B      -0.08   12.93
George Brett       3B       8.36   36.2
Clint Hurdle       RF       1.77   14.01
Al Cowens          DH/RF   -0.76   10.67
Ruppert Jones      CF       0.84   7.2
John Wathan        C        2.39   16.49
Ken Phelps         1B      -0.06   0.01
U. L. Washington   SS       2.1    16.13

BENCH              POS      WAR    WS
Luis Salazar       3B       1.11   7.11
Rodney Scott       2B       0.36   13.18
Jim Wohlford       LF       0.36   4.92
Jamie Quirk        3B       0.06   3.47
German Barranca             0      0
Onix Concepcion    SS      -0.18   0.05
Jeff Cox           2B      -0.78   1.32

Dennis Leonard reached the 20-win plateau for the third time in four seasons. Pacing the circuit with 38 starts, Leonard also served up the most gopher balls (30) and earned runs (118) in the American League. Rich Gale (13-9, 3.92), Renie Martin (10-10, 4.39) and Paul Splittorff (14-11, 4.05) provided adequate support in the starting rotation.

The back end of the bullpen pitched “lights-out” ball for the Royal Blue crew. Dan Quisenberry perplexed the opposition with his unorthodox delivery. “Quiz” tallied 12 victories and topped the leaderboards with 33 saves and 75 appearances. Rookie right-hander Doug Corbett (8-6, 1.98) saved 23 contests and finished third in the 1980 AL Rookie of the Year vote. Greg “Moon-Man” Minton added 19 saves and fashioned a 2.46 ERA while Aurelio “Señor Smoke” Lopez recorded 13 wins in relief.

ROTATION           POS      WAR    WS
Dennis Leonard     SP       3.28   17.1
Rich Gale          SP       1.78   10.92
Paul Splittorff    SP       1.48   10.31
Renie Martin       SP      -0.72   5.03
Steve Busby        SP      -0.61   0

BULLPEN            POS      WAR    WS
Doug Corbett       RP       5.8    23.88
Dan Quisenberry    RP       2.38   19.09
Greg Minton        RP       1.5    12.69
Bob McClure        RP       1.42   7.9
Bobby Castillo     RP       1.19   9.72
Doug Bird          RP       0.82   4.89
Aurelio Lopez      RP       0.79   12.85
Mark Souza         RP      -0.27   0
Craig Chamberlain  RP      -0.35   0
Mike C. Jones      SP      -0.41   0
Jeff Twitty        RP      -0.61   0.06
Mark Littell       RP      -0.67   0
Mark Littell RP -0.67 0

The “Original” 1980 Kansas City Royals roster

NAME               POS   WAR    WS      General Manager  Scouting Director
George Brett       3B    8.36   36.2    Cedric Tallis    Lou Gorman
Willie Wilson      LF    7.86   31.52   Cedric Tallis    Lou Gorman
Doug Corbett       RP    5.8    23.88   Cedric Tallis    Lou Gorman
Dennis Leonard     SP    3.28   17.1    Cedric Tallis    Lou Gorman
John Wathan        C     2.39   16.49   Cedric Tallis    Lou Gorman
Dan Quisenberry    RP    2.38   19.09   Joe Burke        Lou Gorman
U. L. Washington   SS    2.1    16.13   Cedric Tallis    Lou Gorman
Rich Gale          SP    1.78   10.92   Joe Burke        Lou Gorman
Clint Hurdle       RF    1.77   14.01   Joe Burke        Lou Gorman
Greg Minton        RP    1.5    12.69   Cedric Tallis    Lou Gorman
Paul Splittorff    SP    1.48   10.31   Cedric Tallis    Charlie Metro
Bob McClure        RP    1.42   7.9     Cedric Tallis    Lou Gorman
Bobby Castillo     RP    1.19   9.72    Cedric Tallis    Lou Gorman
Luis Salazar       3B    1.11   7.11    Cedric Tallis    Lou Gorman
Ruppert Jones      CF    0.84   7.2     Cedric Tallis    Lou Gorman
Doug Bird          RP    0.82   4.89    Cedric Tallis    Charlie Metro
Aurelio Lopez      RP    0.79   12.85   Joe Burke        Lou Gorman
Jim Wohlford       LF    0.36   4.92    Cedric Tallis    Lou Gorman
Rodney Scott       2B    0.36   13.18   Cedric Tallis    Lou Gorman
Jamie Quirk        3B    0.06   3.47    Cedric Tallis    Lou Gorman
German Barranca          0      0       Joe Burke        Lou Gorman
Ken Phelps         1B   -0.06   0.01    Joe Burke
Frank White        2B   -0.08   12.93   Cedric Tallis    Lou Gorman
Onix Concepcion    SS   -0.18   0.05    Joe Burke
Mark Souza         RP   -0.27   0       Cedric Tallis    Lou Gorman
Craig Chamberlain  RP   -0.35   0       Joe Burke        John Schuerholz
Mike Jones         SP   -0.41   0       Joe Burke        John Schuerholz
Steve Busby        SP   -0.61   0       Cedric Tallis    Lou Gorman
Jeff Twitty        RP   -0.61   0.06    Joe Burke        John Schuerholz
Mark Littell       RP   -0.67   0       Cedric Tallis    Lou Gorman
Renie Martin       SP   -0.72   5.03    Joe Burke        John Schuerholz
Al Cowens          RF   -0.76   10.67   Cedric Tallis    Charlie Metro
Jeff Cox           2B   -0.78   1.32    Cedric Tallis    Lou Gorman
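
The roster table can also be tallied to check the two-thirds figure quoted in the assessment. The Python sketch below simply copies the General Manager column from the table; the variable names are illustrative.

```python
from fractions import Fraction

# Players grouped by the General Manager listed in the roster table.
TALLIS = [
    "George Brett", "Willie Wilson", "Doug Corbett", "Dennis Leonard",
    "John Wathan", "U. L. Washington", "Greg Minton", "Paul Splittorff",
    "Bob McClure", "Bobby Castillo", "Luis Salazar", "Ruppert Jones",
    "Doug Bird", "Jim Wohlford", "Rodney Scott", "Jamie Quirk",
    "Frank White", "Mark Souza", "Steve Busby", "Mark Littell",
    "Al Cowens", "Jeff Cox",
]
BURKE = [
    "Dan Quisenberry", "Rich Gale", "Clint Hurdle", "Aurelio Lopez",
    "German Barranca", "Ken Phelps", "Onix Concepcion",
    "Craig Chamberlain", "Mike Jones", "Jeff Twitty", "Renie Martin",
]

total = len(TALLIS) + len(BURKE)
tallis_share = Fraction(len(TALLIS), total)

print(len(TALLIS), len(BURKE), total)  # 22 11 33
print(tallis_share)                    # 2/3
```

Tallis is credited with 22 of the 33 players and Joe Burke with the remaining 11, so the two-thirds figure holds exactly.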

Honorable Mention

The “Original” 2009 Royals                OWAR: 45.7     OWS: 268     OPW%: .544

Zack Greinke (16-8, 2.16) claimed the 2009 AL Cy Young Award while pacing the league in ERA and WHIP (1.073). Carlos Beltran furnished a .325 BA despite missing all of July and August due to injury. Johnny Damon (.282/24/82) slashed 36 two-base hits and scored 107 runs. Billy “Country Breakfast” Butler clubbed 51 doubles and launched 21 long balls while batting .301.

On Deck

The “Original” 2012 Rays

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database – Transaction a – Executive

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive