Archive for Research

Linearization and Fantasy Baseball

Among the astounding phenomena abundant throughout calculus, linearization remains one of the least glamorous. It’s incredibly simple, taught in less than a day, and a more precise (and more complicated) method can often be substituted for it. On the other hand, it’s an incredibly powerful tool and one with weighty implications for fantasy baseball. Because of the concept’s relative simplicity, a reader with even the most basic inkling of what calculus actually is should be able to understand the idea of it, so don’t let a fear of mathematics deter you.

First, let’s think about graphs, functions, and derivatives. Put simply, continuous functions, whether they’re linear, quadratic, or exponential, will generally have some rate of change, or slope. Think of it as the change in the y direction per unit change in the x direction. Measured between two points on the curve, this is the slope of a secant line, or the average rate of change between those points. More interesting, however, is the concept of the tangent line, whose slope is the instantaneous rate of change at a given point. Note that the tangent line is defined at one point rather than two, so it lets us evaluate the rate of change at that exact point instead of averaging over an interval. Importantly, the slope of the tangent line tells us how quickly the function is increasing or decreasing there. The larger a positive slope, the faster the function is rising (as on the steep part of an exponential curve), and the more negative the slope, the faster it is falling (as on the right side of a downward-opening parabola).
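To make the secant-versus-tangent distinction concrete, here is a small sketch. The function `f` and the helper `secant_slope` are illustrative choices, not anything from a particular library. As the second point slides toward the first, the average rate of change settles toward the tangent slope.

```python
def f(x):
    """Example curve: a simple quadratic."""
    return x ** 2


def secant_slope(func, x0, x1):
    """Average rate of change of func between x0 and x1."""
    return (func(x1) - func(x0)) / (x1 - x0)


# The tangent slope of x**2 at x = 1 is exactly 2 (the derivative 2x at x = 1).
# Shrinking the gap h moves the secant slope toward that value.
for h in (1.0, 0.1, 0.001):
    print(h, secant_slope(f, 1.0, 1.0 + h))
```

With a gap of 1 the secant slope is 3; with smaller gaps it closes in on 2.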

In calculus, the formula for linearization is:

L(x) = f(a) + f'(a)(x – a)

Here, given some value of a, we get a y-value, or f(a). To that, we add the product of the derivative of f at a, written f'(a), and the difference between the point we are estimating at, x, and the point we already have, a. This gives the linear approximation, which is a pretty good estimate as long as x stays close to a.

When rendered down to its most basic essence, linearization is a glorified form of estimation that gives credence to gut instinct through a formula. Using the tangent line at a certain point, one can estimate nearby values, but it’s important to note that the step away from that point must be very small. The farther from the initial point a that one travels to find an approximation of y, the less accurate the result will be.
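The loss of accuracy with distance is easy to see numerically. The sketch below uses the standard tangent-line form L(x) = f(a) + f'(a)(x – a) on the square-root function near a = 4. The helper name `linearize` and the choice of function are assumptions made purely for illustration.

```python
import math


def linearize(f, df, a):
    """Return the tangent-line approximation L(x) = f(a) + f'(a) * (x - a)."""
    fa, slope = f(a), df(a)
    return lambda x: fa + slope * (x - a)


# sqrt near a = 4: f(4) = 2 and f'(4) = 1 / (2 * sqrt(4)) = 0.25
L = linearize(math.sqrt, lambda x: 0.5 / math.sqrt(x), 4.0)

for x in (4.1, 5.0, 9.0):
    error = abs(L(x) - math.sqrt(x))
    print(f"x={x}: estimate={L(x):.4f}, actual={math.sqrt(x):.4f}, error={error:.4f}")
```

The estimate at 4.1 is nearly exact, while the estimate at 9 misses by a quarter of a unit.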

It seems that this would have little application to baseball, but that’s incorrect. Recently, I started toying with a couple of formulas that could actually have some importance in the realm of amateur fantasy baseball. Both use a regression line fitted to a player’s entire career in pretty much any statistic.

L(x) = f(k) + f'(a)(x – a)

Here, f(k) is the actual value at the known point (k), f'(a) is the slope (derivative) of the regression line at the starting point, x is the point for which we are predicting the value, and a is the value we start from.

L(x) = f(a) + f'(a)(x – a)

Differing here, f(a) is the predicted value on the regression line at a, f'(a) is the slope (derivative) of the regression line at that point, x is the point for which we are predicting the value, and a is the value we start from.

I don’t know which would work best, but my guess is that the first formula would be more accurate due to its mix of actual and predicted values. Neither of them would be terribly precise, but either is a heck of a lot better than relying on what you feel might be best.

Regardless of which formula you might prefer, the implications of the linearization idea as applied to fantasy baseball are apparent. Probably best used for 10-day predictions, linearization mixes short-term performance with long-term talent to assess how well a player might perform for a short period of time — whether he’s likely to continue streaking, slumping, or somewhere in between. Rather than having to rely on gut instinct or dated and/or biased statistical analysis, a fantasy player could rely on some concrete math to make short-term decisions. This would be especially helpful in leagues that play for only a month, or can only alter their rosters once a week, or even at the end of a highly competitive season (perhaps making the risky move of dropping a slumping MVP for the streaking rookie).

It’s understandable if it’s unclear how to use one of the formulas at this point. To simplify matters, let’s use formula 1 to demonstrate how this might work for something as simple as batting average. What you might have is a regression line of predicted rolling 10-game batting averages for a player, plotted along with the actual values. In this case, the x-values are career games played, scaled so that each game adds 0.01 (the scale is arbitrary). So 1.1 is the x-value at 110 games played, while 1.2 is the x-value at 120 games. Let’s just say for simplicity that the player has played 110 games in his career, had an actual average of .264 during the last 10-game stretch, and the derivative of the regression line at this point is 0.12. We want to guess his average for the next 10 games, up to career game number 120.

L(1.2) = .264 + (0.12)(1.2 – 1.1)

L(1.2) = .276

We’d expect him to hit .276 over the next 10 games, since the positive slope says his rolling average is trending upward. Hopefully that makes some sense. Obviously it’s still in development and I haven’t done a whole lot of research yet, but expect some to come out later along with some clarifying material if necessary. Confusion is to be expected, but with some explanation, applied linearization could potentially help a lot of people out next season in fantasy.
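As a minimal sketch of the batting-average example, the calculation can be written out using the standard tangent-line form, in which a positive slope raises the estimate. The function name `linear_estimate` is hypothetical, and the inputs are the ones quoted in the example.

```python
def linear_estimate(value_at_start, slope, x_start, x_target):
    """Tangent-line estimate: value_at_start + slope * (x_target - x_start)."""
    return value_at_start + slope * (x_target - x_start)


# Inputs from the worked example: a .264 average over the last 10 games,
# a regression-line slope of 0.12, and a move from x = 1.1 to x = 1.2.
estimate = linear_estimate(0.264, 0.12, 1.1, 1.2)
print(round(estimate, 3))  # 0.276
```

Swapping in a negative slope for a slumping player would pull the estimate below his recent average instead.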


The Case for No Starting Pitchers in the National League

I’ve watched many a baseball game over my lifetime (that’s 50+ years), and I’ve cringed every time I see a National League manager send his starting pitcher up to bat any time prior to the seventh inning. Especially with runners on base! Doesn’t he know that pitchers can’t hit? Doesn’t he know that if he would just pinch-hit for the lame-batting starter he’d improve his team’s chances of winning?

So, after years of pondering this problem for five seconds at a time every couple of days, I decided to see if I could build a solid quantitative case for never letting a pitcher come to the plate for a National League team (obviously this is not an issue for the American League with their designated hitters). How would this change the look of the team’s pitching staff? And more importantly, how many more games would a team expect to win in a season if they adopted a “pitchers never bat” strategy?

The answer to the first question is pretty easy. The staff would “look” different. There would be no more “starting pitchers.” A team’s pitching staff would consist only of “relievers.” Sure, one of the “relievers” would throw the first pitch of the game and could technically be called a “starter,” but given that he’ll be taken out of the game as soon as his spot in the batting line-up comes up, he’s effectively a “reliever,” just like the other 10 or 11 guys on the staff.

Now, the conventional wisdom would say that the current starting pitchers, especially the “aces,” get in a groove, and can give you six or seven solid innings. Why would anyone take them out of the game in the second or third inning? Well, let’s do a “cost-benefit” analysis and see if we can make a case for “The Pitchers Never Bat” strategy.

 

Key Components of the Case:

The two primary components of the analysis are 1) how many more runs would a team expect to score in a season by pinch-hitting for every pitcher, and 2) how many more runs would a team expect to give up in a season because their starting pitchers are no longer going six, seven, or more innings in an outing? Or, maybe the team adopting such a strategy would actually give up FEWER runs per year by giving up on the century-old strategy of planning for the starting pitcher to pitch deep into the game.

A third component of the analysis could include the benefit of being able to choose from any of the team’s entire staff (probably 11 or 12 pitchers) and use only the ones that look like they’ve got their “stuff” while warming up before the game, instead of sticking with the “starter” who is scheduled to pitch today because it’s his turn in the “rotation.”

A fourth component of the analysis could include the benefit a team could achieve because the other team can no longer stack their starting batting order with a lot of lefties (to face a right-handed starter), or with a lot of righties (to face a left-handed starter), because the team with no “starters” will pinch-hit for their first pitcher after one, two, or three innings. So, in total, the “handedness battle” tilts slightly more in favor of the team implementing the new strategy.

A fifth component could include the cost (or benefit) of reducing the size of the pitching staff by one or two, and adding one or two more everyday players, who would be needed to pinch-hit in the early innings.

A sixth component could be an added benefit that batters will not be able to get “used to” a pitcher by seeing them multiple times in a single game. Under the new strategy batters will see each pitcher once, or, at most, twice in a game.

I’m going to focus on the two primary components above, and leave the lesser components alone for now. Perhaps others can weigh in on how to quantify the potential impacts of these changes.

 

Component #1: How much more offense will the “Pitchers Never Bat” strategy create?

This is the easiest of the components to quantify. I will use the wOBA (weighted On Base Average) statistic as defined and measured by FanGraphs to evaluate this component. Let’s start with some basic information and rules-of-thumb.

Using data from the National League for the 2015 season, I find that pinch-hitters had a wOBA of .275 across the entire league, while pitchers, when batting, had a wOBA of just .148. The difference in wOBA between pinch-hitters and pitchers is .127 (that’s .275 minus .148). Note that all position players in the NL combined for an average wOBA of .318 in 2015. I’m assuming that our new pinch-hitters won’t get anywhere near that figure, but will be comparable to the 2015 pinch-hitters, who came in way lower, at .275.

Now, let’s assume we can replace every pitcher’s plate appearance (PA) with a pinch-hitter. This improvement of .127 in wOBA needs to be applied 336 times per season, because that was the average number of times that a National League team sent its pitchers up to the plate in 2015. Lastly, we need two rules of thumb from FanGraphs to complete the analysis of the first component: 1) every additional 20 points in wOBA is expected to result in an additional 10 runs per 600 plate appearances, and 2) every 10 additional runs a team expects to score in a season translates into one additional win per year. OK, so let’s do the math:

If 20 additional points of wOBA translates into 10 runs per 600 PA, then our new pinch-hitters who are now batting for pitchers will provide the team with 63.5 incremental runs per 600 PA (which equals 127/20 * 10.) And since these pinch-hitters will be coming to the plate 336 times, not 600 times, we need to reduce the 63.5 incremental runs per season down to 35.6 incremental runs per season (which is 336 / 600 * 63.5).

Finally, the last step is to take our 35.6 incremental runs per season and translate that into incremental wins per year using the rule-of-thumb that ten runs equates to one win. Therefore, our 35.6 extra runs results in an expected 3.6 incremental wins per year. That’s a decent-sized pick-up in expected wins.
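The arithmetic for Component #1 can be checked with a few lines of Python. This is only a sketch of the calculation described above; the constant names are made up for readability, and the rules of thumb are the ones quoted from FanGraphs.

```python
WOBA_PINCH_HITTERS = 0.275  # 2015 NL pinch-hitters
WOBA_PITCHERS = 0.148       # 2015 NL pitchers when batting
PITCHER_PA = 336            # average pitcher plate appearances per NL team, 2015
RUNS_PER_20_POINTS = 10     # rule of thumb: 20 wOBA points = 10 runs per 600 PA
RUNS_PER_WIN = 10           # rule of thumb: 10 runs = 1 win

# Gap in wOBA "points" (thousandths): about 127
woba_gain_points = (WOBA_PINCH_HITTERS - WOBA_PITCHERS) * 1000

# Incremental runs per 600 PA, then scaled down to 336 PA
runs_per_600_pa = woba_gain_points / 20 * RUNS_PER_20_POINTS
extra_runs = runs_per_600_pa * PITCHER_PA / 600
extra_wins = extra_runs / RUNS_PER_WIN

print(round(runs_per_600_pa, 1), round(extra_runs, 1), round(extra_wins, 1))
```

Because every step is a simple proportion, changing any one input (say, a different pinch-hitter wOBA) scales the final win estimate linearly.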

OK, so now, what about the pitching staff? Will replacing the conventional pitching staff with a staff consisting of no starters and all relievers cause the runs allowed to increase, and if so, by how much? Enough to offset our 3.6 extra wins that we just picked up on offense?

 

Component #2: How many more runs will pitchers give up using the “Pitchers Never Bat” strategy?

Imagine, for the moment, that a GM is to build his pitching staff from scratch. (We’ll worry about how to transition from a conventional staff to an all-reliever staff later.) And let’s just assume he’ll pick just 11 pitchers. (Most NL teams use 12-man staffs while some use 13, so that will give the team one or two additional position players.) Currently, starting pitchers typically throw 160-200 innings per season, and relievers tend to throw 50-80 innings per season. But with the new all-reliever strategy, and using only 11 pitchers, each of our new guys will need to average around 130 innings each, with perhaps some pitching as much as 160, and some as low as 100 innings per year. So, the GM is looking for 11 guys who can each contribute 100-160 innings per season. Each outing will be for about one to three innings for each pitcher. How will they fare?
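As a rough check on the workload figure, here is a sketch under the simplifying assumption of nine innings in each of 162 games (extra innings and unplayed bottom-of-the-ninth frames are ignored). The two-inning outing length is also just an assumed midpoint of the one-to-three-inning range.

```python
GAMES = 162
INNINGS_PER_GAME = 9      # assumption: ignores extra innings and shortened games
STAFF_SIZE = 11
INNINGS_PER_OUTING = 2    # assumption: midpoint of the 1-3 inning range

total_innings = GAMES * INNINGS_PER_GAME           # 1458
innings_per_pitcher = total_innings / STAFF_SIZE   # roughly 130 each
outings_per_pitcher = innings_per_pitcher / INNINGS_PER_OUTING

print(total_innings, round(innings_per_pitcher, 1), round(outings_per_pitcher))
```

Under these assumptions each pitcher would appear in roughly 40 percent of the team’s games.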

Let’s look at the National League’s pitchers for 2015. Starting pitchers had an aggregate WHIP (Walks Plus Hits per Inning Pitched) of 1.299, while relievers, in total, recorded an identical WHIP of 1.299. So my takeaway from this is that the average starter was just as good (or bad) as the average reliever. From this, I am going to take a leap of faith, and assume that a staff of 11 new-style relievers could be expected to perform equivalently. (And that doesn’t even factor in some of the lesser elements of the new strategy, as mentioned above, such as Components 3 and 4 of the analysis.)

From this, albeit simplified, evaluation of Component #2, I estimate that a team moving to an all-reliever pitching staff will have an expected change in Runs Allowed of zero, and therefore the change will neither offset, nor supplement, the offensive benefit evaluated in Component #1.

 

Conclusion and Final Thoughts

In summary, using the two primary components of my analysis, I estimate that adopting a “Pitchers Never Bat” strategy in the National League (a.k.a. an “All Reliever Pitching Staff” strategy) will improve a team’s offense by an expected 36 runs per year, which will increase the team’s expected win total by 3.6 games. I estimate that the impact on runs allowed will be near zero. Some lesser elements, Components #3 through #6, could also add some additional value to the strategy.

Implementing the strategy does not necessarily need to be a complete, 100% adoption of the “pitchers never bat” rule. Modifications can be made. Perhaps a pitcher is doing well through two innings and comes to bat with two out and no one on base. In this case the manager could let the pitcher bat, so that he can stay in and pitch another two or three innings. This would change the name of the strategy to something like the “Pitchers Very, Very Rarely Bat” strategy.

As far as transitioning to an all-reliever staff from a conventional staff, it could be done over time, or only in part, such that a team could maintain, say, its two top aces, and complement them with eight or nine relievers. This way, the aces could pitch as they do now, going six-plus innings, every fifth day, while limiting the “Pitchers Never Bat” strategy to the three out of the five days when the two starters are resting.

Finally, let’s try to put a dollar value on this new strategy. The guys at FanGraphs, and other places, have tried to estimate how much teams are willing to pay for each additional win. Without going into all the various estimates and approaches at trying to answer that question, let’s just go with a simple $8 million per win. I’m sure it could be argued to be more or less, but let’s just put $8 million out there as a base case. If that’s true, a 3.6-win strategy, such as the “Pitchers Never Bat” strategy, is worth about $29 million per year. Go ahead and implement the strategy now, and, if it takes, say, three years before any of the other NL teams catch on, you’ve just picked up a cool $87 million (3 * 29 million).

And if the other components of the analysis (#3 through #6) are quantified and it can be determined that they add another 0.5 wins per year, which I think is quite doable, then we can get the total up to 4.1 wins per year, for a value of $33 million per year, or just around a cool $100 million over the first three years. And that’s how you make $100 million without really trying!
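The dollar figures follow from simple multiplication, sketched below. The function name `strategy_value` is hypothetical, and $8 million per win is the base-case assumption stated above. Note that the unrounded results come out slightly under the article’s figures, which round to whole millions before multiplying by three.

```python
DOLLARS_PER_WIN_M = 8  # base-case assumption: $8 million per win


def strategy_value(wins_per_year, years=3):
    """Return (annual value, total value over `years`) in millions of dollars."""
    annual = wins_per_year * DOLLARS_PER_WIN_M
    return annual, annual * years


for wins in (3.6, 4.1):
    annual, total = strategy_value(wins)
    print(f"{wins} wins/yr: ${annual:.1f}M per year, ${total:.1f}M over three years")
```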


Predicting the Next 300-Game Winner

With the special attention pitchers receive today, such as pitch counts and innings limits, as well as the host of PITCHf/x data that can notify teams when a pitcher is fatigued, it seems like the days of 300-game winners have come and gone. And for the most part, some of this is true. We’ve seen pitchers be shut down during their earlier years to prevent injuries, such as the Nationals keeping a close eye on Stephen Strasburg. When we think of 300 wins, the math isn’t that hard. It’s some combination like 20 seasons of 15 wins, or 15 seasons of 20 wins, over an entire career. Let’s dive in to what sets these pitchers apart.

I gathered data on pitchers who finished their careers after 1980 as well as pitchers younger than that; I did this to avoid looking at pitchers such as Cy Young who are a little tough to compare to the modern day, with rule changes and the different run-scoring environments. In my query, I looked at pitchers with at least 250 wins. This gave me more data, and since 250-win pitchers are reasonably close to 300, it will allow me to get at what exactly creates a pitcher of this caliber.

My list included 19 names:

Greg Maddux

Roger Clemens

Steve Carlton

Nolan Ryan

Don Sutton

Phil Niekro

Gaylord Perry

Tom Seaver

Tom Glavine

Randy Johnson

Tommy John

Bert Blyleven

Fergie Jenkins

Jim Kaat

Mike Mussina

Jamie Moyer

Jim Palmer

Andy Pettitte

Some of these guys were absolute iron men, pitching over 5000 innings in their careers. Maddux did this, as well as Carlton, Ryan, and Sutton. Most of this group barely reached 12 wins per season, showing that they reached the 300-club with longevity, not necessarily dominance. The other guys on this list, by default, either had higher win totals or pitched forever, but without racking up a ton of innings (Kaat, Moyer). Surprisingly, or perhaps not, only four of the 19 pitchers did not pitch for 20 seasons, so again, dominance might not be the key factor; longevity is.

I then looked at where these pitchers were at when they were 30 years old. Thirty years seems to be about a halfway point, but the data indicates otherwise. In fact, only three of these 19 pitchers had at least 150 wins at 30. This again drives home the point that these pitchers do not necessarily have to be untouchable every single year they pitched; it just means they have to be pitchers that stay healthy and can pitch for a long, long time. At the same time, the average pitcher on this list had 115 wins at 30, so they did need to have a productive youth in terms of racking up wins.

Here is a table displaying the careers of our 19 pitchers:

[Image: table of career statistics for the 19 pitchers]

The amazing part, at least in my opinion, is that these pitchers almost seemed to get better with age, at least in terms of wins. I know that wins is not a good stat for tracking the effectiveness of pitchers, but since we are talking about the 300-win club, it is what we have in front of us. Anyways, 17 of these 19 pitchers had more wins after 30 than they did before. Again, this hammers home the idea that longevity and durability are more important than complete dominance. Yes, you have to be a good, if not great, pitcher, but you also have to stay healthy.

So when looking at current pitchers that possibly have a chance at 300, I filtered through active pitchers fulfilling a few different qualifications. First, the pitcher must average at least 190 innings pitched per year, including years of injuries (this helps get at longevity and durability). Second, the pitcher must average at least 12 wins per year. I came up with a group of pitchers who were close to matching these requirements. From this list of 14 pitchers, I think eight or so have the best chance of eclipsing 300.

Here is a table of possible contenders:

[Image: table of possible 300-win contenders]

This list includes: Clayton Kershaw, Chris Sale, Justin Verlander, Madison Bumgarner, David Price, Rick Porcello, Jon Lester, and Felix Hernandez. CC Sabathia, although at 223 career wins, does not make this list, since I don’t think he has 5-8 more seasons of decent pitching in front of him. I will go into each pitcher in more detail to describe what each pitcher needs to do to have a chance.

I’m going to start with Lester. Lester is currently at 146 wins, with 2003 regular-season innings pitched. He has been great through his first 11 seasons, in nine of which he was a full-time starter. In those nine seasons, he failed to pitch 200 innings just once, when he posted 191.2 innings pitched. He has been an iron man, and at age 32, the recipe is simple. He just needs to stay healthy and he needs his game to age well. This is going to be a repetitive theme, but to be honest, that’s what we would expect. Things helping Lester? Well, playing for the Cubs is one. Not only do they have a great defense, but they also create great run support, which can help Lester pick up a lot of wins. He was 19-5 this past year, matching his career high in Boston in 2010.

Now on to Justin Verlander. After an injury-riddled 2015, Verlander was great this year, posting a 16-9 record and an ERA of 3.04 (FIP of 3.48). Currently, he sits at 173 wins and is 33 years old. I mentioned his injury struggles in 2015. He only pitched 133 innings. In his 11 years as a full-time starter, that was only the second time he failed to reach 200 innings pitched. People may worry that Verlander is starting to lose his velocity, which could limit his effectiveness, but in 2016, he struck out batters at a career-high rate and also had a career-best strikeout-to-walk ratio. Verlander is back with the elite, and if he can avoid injury trouble, he deserves to be in the discussion for a possible 300-win flirtation.

I’ll now move on to Clayton Kershaw. Kershaw has been the best pitcher in baseball for the past five years, and has only struggled with injuries this past year, when he hit the DL with back issues. He still picked up 12 wins, and looked like peak Kershaw when he came back. Kershaw continues to strike out hitters and not allow walks, and in his shortened 2016, he posted a career-best FIP. Kershaw currently sits at 126 wins, and is 28 years old, in the middle of his prime. I think there are two factors that could keep Kershaw from getting close. The first one is his back. The Dodgers shut Kershaw down for half the year, and hopefully it heals, but if it is one of those lingering injuries that can affect his timing and delivery as well as his overall health, his game won’t age well enough for him to hit 300. The second is run support. I’m not sure this will be a big factor now that the Dodgers have Andrew Friedman at the helm, but if he cannot get the run support he needs, that could cost him two or three wins every year.

Chris Sale is next. Sale sits at 74 wins and is 27. He has some work to do. He has been relatively healthy, however, over his five full years as a starter. I think the best bet for Sale is to get out of Chicago, or at least the White Sox, and get on a team that can give him some good defense and offense. His win totals just aren’t high enough, but he is young enough where if he finds a new team and can age well, he might be able to hit 250.

I’ll do Bumgarner next. He really hasn’t had any injury trouble in his six years as a full-time starter. He is 27 and has 100 wins. He is a little harder to project, but I would say he’s got a better shot than Sale. I mean, he is already at 100 and only 27. Kershaw might have a leg up on him, but MadBum has been able to stay healthy. To be honest, Kershaw had been healthy too before this year, which somewhat shows that pitching 20 full seasons does not happen too often. Anyways, Bumgarner hasn’t quite been as dominant as some of the other names on this list, but he has been very good, and has stayed healthy. He is on a solid team with a good defense. The conditions are right; he just needs to age well and stay healthy. I still like Kershaw’s odds a little more, but Bumgarner’s are not far behind.

Now I’ll move on to David Price. Price is 31, has 121 wins, and has pitched relatively healthy for seven full seasons. He is on the Red Sox now, and although their poor defense won’t help some of his pitching metrics, they should give him the run support he needs. He wasn’t terrible this year; I have a feeling people think he fell off the map. He had 17 wins, an ERA of 3.99, and a FIP of 3.60. His ERA and FIP were at career highs, but the FIP really wasn’t too far off what we’d expect. I’d credit the higher ERA to playing in Fenway with not the best defense behind him. Price may not be as dominant as he once was, but the Red Sox should give him support. He might be a little behind pace, but he could be the next CC Sabathia or Mike Mussina, where upon retirement, we say, “I didn’t realize he had 260 wins!” For the record, I doubt CC gets there, but the point is that if Price can stay healthy and moderately effective on a team that will support him, he may be able to move up in the wins chart. Will he hit 300? I don’t see it, but realistically, I’m not sure any of these guys will.

Now I’ll move on to the other Red Sox pitcher on this list: Rick Porcello. Porcello had a modest beginning in Detroit, but his FIP always seemed to outperform his ERA, so he has that going for him. Porcello is only 27 and somehow has 107 wins already. Although he is on the Red Sox, who can support him, Porcello really hasn’t been able to stay healthy over his career, and only eclipsed 200 innings pitched in a season twice: 2014 in Detroit, and this past season in Boston. Still, he is young, and if he can hang around awhile, he might be able to pick up 100 wins or more if he can stay decent on an offensive team. Again, he doesn’t need to contend for the Cy Young, but he has to stay relatively effective, so he keeps his starting spot and racks up wins.

Finally, I move on to my dark horse, King Felix Hernandez. Felix is only 30, but has been a full-time starter for 11 years. He sits at 154 wins. I feel like, as a baseball community, we tend to forget about Felix. He has been very durable, although he hit the DL this past season after injuring his calf while celebrating a win. But hey, forgive the guy; he plays for Seattle, which hadn’t given him much help until recently. He now plays on a good Seattle team, so he should be able to pick up wins. He might not be as good as he once was, but he has stayed healthy and he now plays on a winning team. The conditions are there, and if he can stay effective and age well, I think he has the best shot of anyone on this list.
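To put the ages and win totals quoted above side by side, here is a rough sketch of how many more seasons each contender would need at a constant pace. The helper `seasons_to_300` and the flat 12- and 15-win paces are illustrative assumptions; real careers do not accumulate wins this evenly.

```python
import math

# (name, age, career wins) as quoted in this article after the 2016 season
CONTENDERS = [
    ("Jon Lester", 32, 146),
    ("Justin Verlander", 33, 173),
    ("Clayton Kershaw", 28, 126),
    ("Chris Sale", 27, 74),
    ("Madison Bumgarner", 27, 100),
    ("David Price", 31, 121),
    ("Rick Porcello", 27, 107),
    ("Felix Hernandez", 30, 154),
]


def seasons_to_300(wins, wins_per_season):
    """Whole seasons needed to reach 300 wins at a constant pace."""
    return math.ceil((300 - wins) / wins_per_season)


for name, age, wins in CONTENDERS:
    for pace in (12, 15):
        n = seasons_to_300(wins, pace)
        print(f"{name}: {n} more seasons at {pace} wins/yr, reaching 300 around age {age + n}")
```

Even at a steady 15 wins a year, every pitcher here would need to remain a regular starter to about age 40 or beyond, which is the longevity point made throughout.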

Realistically, if I had to choose between none of them winning 300 or one of them winning, that would be a much harder choice than picking one out of the group. Realistically, do I think any of these guys have a shot? Sure, but a shot is a lot different than actually getting there. Who knows, maybe one of these guys will age well and will stay healthy. Your guess may be as good as mine.


Hardball Retrospective – What Might Have Been – The “Original” 1902 Orphans

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 1902 Chicago Orphans 

OWAR: 37.4     OWS: 280     OPW%: .527     (74-66)

AWAR: 29.9      AWS: 203     APW%: .496     (68-69)

WARdiff: 7.5                        WSdiff: 77  
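The two difference figures are simply the “original” totals minus the “actual” totals. A minimal sketch using the numbers above (the dictionary layout is only for illustration):

```python
original = {"WAR": 37.4, "WS": 280}  # OWAR and OWS for the 1902 Orphans
actual = {"WAR": 29.9, "WS": 203}    # AWAR and AWS for the 1902 Orphans

# Positive differences mean the "original" roster outperformed the "actual" one.
war_diff = round(original["WAR"] - actual["WAR"], 1)
ws_diff = original["WS"] - actual["WS"]

print(war_diff, ws_diff)  # 7.5 77
```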

The 1902 “Original” Orphans finished in third place, ten games behind the Reds. Bill Bradley (.340/11/77) thrived against opposing hurlers, notching career-bests in base hits (187), runs scored (104), doubles (39), home runs and batting average. “Bad” Bill Dahlen drilled 25 two-baggers and swiped 20 bags. Danny Green delivered a .302 BA and pilfered 35 bases. Jimmy “Pony” Ryan slashed 32 two-base knocks and produced a .320 BA. Johnny “Noisy” Kling succeeded on 25 stolen base attempts. Jimmy “Rabbit” Slagle executed 41 thefts and supplied a .315 BA for the “Actual” Orphans.

Bill Dahlen rated twenty-first among shortstops in “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Orphans teammates registered in the “NBJHBA” top 100 rankings include Frank Chance (25th-1B), Johnny Evers (25th-2B), Jimmy Ryan (26th-CF), Joe Tinker (33rd-SS), Bill Bradley (46th-3B), Johnny Kling (48th-C) and Tom Daly (55th-2B). “Actuals” second-sacker Bobby Lowe placed fifty-sixth.

| Original 1902 Orphans | POS | OWAR | OWS | Actual 1902 Orphans | POS | AWAR | AWS |
|---|---|---|---|---|---|---|---|
| STARTING LINEUP | | | | STARTING LINEUP | | | |
| Jimmy Ryan | LF/CF | 3.26 | 18.59 | Jimmy Slagle | LF | 5.11 | 22.25 |
| Davy Jones | CF | 2.67 | 13.4 | Davy Jones | CF | 2.67 | 13.4 |
| Danny Green | RF | 3.52 | 20.73 | John Dobbs | RF/CF | 0.8 | 8.31 |
| Frank Chance | 1B | 2.66 | 12.37 | Frank Chance | 1B | 2.66 | 12.37 |
| Tom Daly | 2B | -1.87 | 10.46 | Bobby Lowe | 2B | 0.79 | 10.24 |
| Bill Dahlen | SS | 4.65 | 21.9 | Joe Tinker | SS | 3.31 | 16.58 |
| Bill Bradley | 3B | 5.38 | 25.61 | Charlie Dexter | 3B | -0.47 | 4.12 |
| Johnny Kling | C | 2.47 | 17.06 | Johnny Kling | C | 2.47 | 17.06 |
| BENCH | | | | BENCH | | | |
| Charlie Irwin | 3B | 0.74 | 17.4 | Dusty Miller | LF | -0.25 | 3.95 |
| Joe Tinker | SS | 3.31 | 16.58 | Art Williams | RF | -0.33 | 2.46 |
| Harry Wolverton | 3B | 0.41 | 10.43 | Larry Schlafly | RF | 0.46 | 2.15 |
| Frank Isbell | 1B | -0.32 | 9.1 | Bunk Congalton | RF | -0.99 | 1.55 |
| Art Nichols | 1B | 0.09 | 8.68 | Johnny Evers | 2B | -0.17 | 1.27 |
| Malachi Kittridge | C | 0.59 | 8.44 | Hal O’Hagan | 1B | -0.06 | 1.09 |
| Duke Farrell | C | -0.06 | 5.46 | Jack Hendricks | RF | 0.19 | 0.91 |
| Dusty Miller | LF | -0.25 | 3.95 | Germany Schaefer | 3B | -2.27 | 0.71 |
| Art Williams | RF | -0.33 | 2.46 | Sammy Strang | 3B | 0.07 | 0.42 |
| Larry Schlafly | RF | 0.46 | 2.15 | Jim Murray | RF | -0.52 | 0.27 |
| Zaza Harvey | RF | 0.15 | 1.58 | Mike Jacobs | SS | -0.15 | 0.18 |
| Bunk Congalton | RF | -0.99 | 1.55 | Mike Lynch | CF | -0.34 | 0.14 |
| Johnny Evers | 2B | -0.17 | 1.27 | Snapper Kennedy | CF | -0.06 | 0.14 |
| Germany Schaefer | 3B | -2.27 | 0.71 | Ed Glenn | SS | -0.08 | 0.1 |
| Jim Murray | RF | -0.52 | 0.27 | Mike Kahoe | C | -0.11 | 0.09 |
| Mike Jacobs | SS | -0.15 | 0.18 | Pete Lamer | C | -0.06 | 0.07 |
| Mike Lynch | CF | -0.34 | 0.14 | Dad Clark | 1B | -0.31 | 0.05 |
| Snapper Kennedy | CF | -0.06 | 0.14 | Chick Pedroes | RF | -0.1 | 0.03 |
| Jim Delahanty | RF | -0.14 | 0.09 | R.E. Hillebrand | RF | -0.06 | 0.01 |
| Pete Lamer | C | -0.06 | 0.07 | Joe Hughes | RF | -0.05 | 0 |
| Dad Clark | 1B | -0.31 | 0.05 | | | | |
| Chick Pedroes | RF | -0.1 | 0.03 | | | | |
| R.E. Hillebrand | RF | -0.06 | 0.01 | | | | |
| Joe Hughes | RF | -0.05 | 0 | | | | |

Jack W. Taylor (23-11, 1.29) paced the National League in ERA, shutouts (8) and WHIP (0.953). Mal “Kid” Eason contributed 10 victories with a 2.76 ERA and Carl Lundgren (9-9, 1.97) completed 17 of 18 starts during his rookie campaign. Jock Menefee (12-10, 2.42) and Pop Williams (11-16, 2.49) rounded out the rotation for the “Actuals”.

  Original 1902 Orphans                         Actual 1902 Orphans

ROTATION POS OWAR OWS ROTATION POS OWAR OWS
Jack Taylor SP 7.47 31.24 Jack Taylor SP 7.47 31.24
Mal Eason SP 0.55 12.06 Jock Menefee SP 1.82 14.41
Carl Lundgren SP 0.89 10.79 Pop Williams SP 0.7 13.84
Tom Hughes SP 1.4 9 Carl Lundgren SP 0.89 10.79
BULLPEN POS OWAR OWS BULLPEN POS OWAR OWS
Jim St.Vrain SP 0.51 5.85 Jim St.Vrain SP 0.51 5.85
Bob Rhoads SP -1.48 3.4 Bob Rhoads SP -1.48 3.4
Jack Katoll SP -1.74 3.04 Frank Morrissey SP 0.05 2.12
Alex Hardy SP -0.29 1.16 Mal Eason SP 0.13 1.41
Fred Glade SP -0.49 0.27 Alex Hardy SP -0.29 1.16
Jim Gardner SP -0.1 1.01
Fred Glade SP -0.49 0.27

 

Notable Transactions

Bill Bradley 

Before 1901 Season: Jumped from the Chicago Orphans to the Cleveland Blues. 

Bill Dahlen 

January 25, 1899: Traded by the Chicago Orphans to the Baltimore Orioles for Gene DeMontreville.

March 11, 1899: Assigned to the Brooklyn Superbas by the Baltimore Orioles. 

Danny Green 

Before 1902 Season: Jumped from the Chicago Orphans to the Chicago White Sox. 

Jimmy Ryan

Before 1902 Season: To the Washington Senators in unknown transaction.

Charlie Irwin

July 11, 1901: Released by the Cincinnati Reds.

July 12, 1901: Signed as a Free Agent with the Brooklyn Superbas.

Honorable Mention

The 1966 Chicago Cubs 

OWAR: 43.3     OWS: 235     OPW%: .510     (83-79)

AWAR: 27.1       AWS: 176      APW%: .364    (59-103)

WARdiff: 16.2                        WSdiff: 59

The “Original” 1966 Cubs placed fourth with a record north of .500 yet fifteen games off the pace of the Giants. Ron Santo (.312/30/94) merited Gold Glove honors for the third straight season and paced the circuit with 95 bases on balls and a .412 OBP. Lou Brock aka “The Franchise” tallied 94 runs and topped the National League with 74 stolen bases. “Sweet Swingin’” Billy L. Williams socked 29 long balls and registered 100 runs scored. Al “Red” Worthington (2.46, 16 SV) fashioned a 1.018 WHIP and secured the late-inning leads. Ernie “Mr. Cub” Banks contributed 23 two-baggers and a .272 BA. Ken Holtzman collected 11 victories while furnishing an ERA of 3.79 in his inaugural season.

On Deck

What Might Have Been – The “Original” 1921 Tigers

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive


2016 ALCS Game One: Batter vs. Pitcher Stats

The FanGraphs Twitter page tweeted out a bingo card for Game One of the ALCS. As I looked through it, I thought it was a terrific idea by Michelle Jay and a fun way to follow the game that night. I was going to play along, but then I had another idea. Some slots were much more likely to happen, such as the “Pitcher v hitter stats are mentioned” slot. I figured I would let somebody else receive a t-shirt and just count up exactly how many times the TBS broadcast team mentioned batter vs. pitcher stats. We all know announcers love doing this, and we all know that it’s pretty useless for predicting the outcome of that particular at-bat. I just thought it would be cool to experiment and see how many times they actually mentioned these stats.

First, I’ll go over the final numbers for batter vs. pitcher stats. There were 65 batters in this game, and batter vs. pitcher stats were either mentioned by the announcers or shown on a graphic for eight of those batters. On two of those occasions, the broadcast showed a graphic and then mentioned the stats later in the plate appearance, or vice versa. Four of the eight instances occurred when the Jays were hitting against Corey Kluber, three came when Andrew Miller was pitching, and the last one came when Marco Estrada was on the mound. It’s interesting that these stats came up so often with a reliever on the mound, considering the sample size is sure to be even smaller against relievers than against starters.

For fun, I marked each occurrence and tried to quickly type out how the announcer mentioned these stats:

  1. Top 1, Josh Donaldson vs. Corey Kluber: “He’s got some pretty good numbers, 6 for 16 with a jack, so he sees him well” -Cal Ripken
  2. Top 1, Russell Martin vs. Corey Kluber: “Martin is only 2 for 10 in his career against Kluber, both home runs…in fact, two of his last seven off Kluber have been home runs” -Ernie Johnson (graphic added later in the plate appearance reading “2 for last 7 off Kluber with 2 HR”)
  3. Top 2, Michael Saunders vs. Corey Kluber: “Saunders steps in, he’s 3 for 8 in his career against Kluber, and he fouls it off” -Ernie Johnson
  4. Top 6, Michael Saunders vs. Corey Kluber: “Saunders with his two hits, now 5 for 10 off Kluber” -Ron Darling
  5. Bottom 6, Jason Kipnis vs. Marco Estrada: graphic shown reading “0 for 7 4 K VS ESTRADA”
  6. Top 7, Melvin Upton Jr. vs. Andrew Miller: “Upton’s got some numbers against Miller, 5 for 12 with three home runs” -Ron Darling (“That is some numbers” -Cal Ripken)
  7. Top 8, Edwin Encarnacion vs. Andrew Miller: “Encarnacion in his last six at-bats against Miller a couple of home runs and a double” -Ernie Johnson
  8. Top 8, Jose Bautista vs. Andrew Miller: graphic shown reading “.286 (2 for 7) 1 HR 2 BB VS MILLER” (later in the plate appearance: “One of the two hits that Bautista has off Miller…long ball” -Ron Darling)

I’m not trying to knock these announcers by saying that they’re not good at what they do or anything. I would be a terrible announcer. I just think these stats are pretty useless and it was interesting to see how many times they actually mentioned them during a game. Mike Petriello pointed out on Twitter an example of why these numbers aren’t good to look at.
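To see how little a line like “6 for 16” pins down, one can put an interval around it. The sketch below is only an illustration and is not taken from the broadcast or from Petriello’s example. It treats every at-bat as an independent trial with a fixed hit probability (a simplifying assumption) and computes a 95% Wilson score interval for the Donaldson vs. Kluber line quoted above. The function name wilson_interval is made up for this example.

```python
import math

def wilson_interval(hits, at_bats, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = hits / at_bats
    denom = 1 + z ** 2 / at_bats
    center = (p + z ** 2 / (2 * at_bats)) / denom
    half = z * math.sqrt(p * (1 - p) / at_bats + z ** 2 / (4 * at_bats ** 2)) / denom
    return center - half, center + half

# Donaldson vs. Kluber as quoted on the broadcast: 6 for 16
low, high = wilson_interval(6, 16)
print(f"6 for 16: plausible true average between {low:.3f} and {high:.3f}")
```

Under those assumptions the interval runs from roughly .18 to .61, which is wide enough to fit almost any major-league hitter, so the matchup line says very little on its own.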

This would be kind of fun to track during the regular season for the really good ones, such as “so and so: 1 for 2 (.500), single career vs. so and so.” Maybe this can be a new metric or something, bpBAAR (batter pitcher Baseball Announcer Above Replacement).


Clustering Pitchers With PITCHf/x

At any point, feel free to scroll down to the bottom to see some of the tables of pitcher clusters.

Clustering Pitches

Clustering individual pitches using data from PITCHf/x is a fairly simple task. All you need to do is pick out the important attributes that you believe define a pitch (velocity, movement, etc.) and use a clustering algorithm, such as K-Means clustering.

With K-Means clustering, you decide what K (the number of clusters) should be. For my analysis, I chose K to be 500 (rather arbitrarily). Different pitch clusters can represent the same type of pitch (e.g., fastball) but with varying attributes. For example, clusters 50 and 100 might both correspond to fastballs, but cluster 50 might be a typical Chris Young fastball whereas cluster 100 might be a typical Aroldis Chapman fastball.
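As a rough illustration of the idea, here is a minimal K-Means sketch in plain Python. It is not the code behind this analysis. The nine “pitches” are made-up toy values (speed, horizontal movement, vertical movement) rather than real PITCHf/x rows, K is 3 instead of 500, and the deterministic farthest-point seeding is an assumption chosen to keep the example reproducible; a library implementation such as scikit-learn’s KMeans would normally be used instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two pitches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the closest cluster center."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def init_centers(points, k):
    """Farthest-point seeding (an assumption; random seeding is more common)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=20):
    centers = init_centers(points, k)
    for _ in range(iterations):
        # Assignment step: each pitch joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its pitches.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, [nearest(p, centers) for p in points]

# Toy pitches, invented for illustration: (start_speed, pfx_x, pfx_z)
pitches = [
    (95.0, -4.0, 10.0), (96.5, -5.0, 11.0), (94.0, -3.5, 9.5),   # fastball-like
    (78.0, 6.0, -8.0), (79.5, 5.0, -7.0), (77.0, 6.5, -9.0),     # curveball-like
    (84.0, -8.0, 4.0), (85.0, -7.5, 3.0), (83.5, -8.5, 4.5),     # changeup-like
]
centers, labels = kmeans(pitches, 3)
print(labels)
```

The algorithm only returns cluster numbers. Calling a cluster “fastball-like” is a label the analyst adds after looking at the pitches inside it.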

One important point to remember is that you, the analyst, must decide what the clusters represent. By looking at attributes of the pitches in a given cluster, you might identify the cluster as “lefty changeups” or “submariner fastballs” (which is actually a category you will discover).

The Problem of Clustering Pitchers

We can identify every pitch that a pitcher throws as belonging to a cluster from 1 to 500. Therefore, we know the distribution of pitch clusters for a given pitcher. The difficult problem, however, is how to compare two pitchers using this information. Let’s say we have two pitchers:

  • Pitcher A’s pitches are 50% from cluster 1 and 50% from cluster 200.
  • Pitcher B’s pitches are 33% from cluster 1, 33% from cluster 300, and 33% from cluster 139.

The question remains, are Pitcher A and Pitcher B similar pitchers?
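One way to see the difficulty is to try the most naive comparison. The sketch below stores each pitcher as a mapping from cluster number to share of pitches and measures the probability mass the two pitchers have in common. The overlap function is a hypothetical helper for this illustration, and Pitcher B’s 33% shares are written as exact thirds.

```python
# Pitch-cluster distributions from the example above
pitcher_a = {1: 0.50, 200: 0.50}
pitcher_b = {1: 1 / 3, 300: 1 / 3, 139: 1 / 3}

def overlap(p, q):
    """Shared probability mass: sum over clusters of min(p[c], q[c])."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))

shared = overlap(pitcher_a, pitcher_b)
print(f"shared mass: {shared:.2f}")
```

This measure only credits the pitches both pitchers throw from cluster 1. It treats cluster 200 and cluster 300 as completely unrelated even if they happen to be nearly identical pitches, which is one reason a simple cluster-by-cluster comparison may understate how similar two pitchers are.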

The problem of clustering pitchers is a more complicated one than clustering pitches because we now have a collection of pitches instead of just individual pitches to compare. In order to cluster pitchers, I use a model that is typically used for topic modeling called Latent Dirichlet Allocation (LDA).

An Aside on LDA

In LDA for topic modeling, our data is a collection of documents.

Let’s imagine that our collection of documents is articles from the New York Times. There are global topics that govern how these articles are generated. For example, if you think of a newspaper, the topics might be sports, finance, health, politics, etc. Additionally, each article can be a mixture of these topics. We might imagine there is an article in the sports section titled, “Yankees payroll exceeds $300 million”, which our algorithm may discover is 50% about sports and 50% about finance.

Similar to what is mentioned above, the analyst must figure out what the topics actually are. You do not tell the algorithm that there is a sports topic. You discover that the topic is sports by observing that the most probable words are “baseball”, “Jeter”, “LeBron”, “touchdown”, etc. The algorithm will tell you that a particular document is 50% about topic 1 and 50% about topic 20, but you must ultimately infer what topic 1 and topic 20 are.

I am harping on this point mainly just to mention that there is no magic to these clustering algorithms. An algorithm can cluster data, but it cannot tell you what these clusters mean.

Relevance of LDA to Pitchers

Anyway, how can this model be used to analyze pitchers? We just need to use our imagination. Instead of a collection of documents, we now have a collection of pitcher seasons. Whereas each document is made up of a collection of words, each pitcher season is made up of a collection of pitches. We have already discretized each pitch using K-Means clustering in order to create our own “dictionary” of pitches. In our baseball model, we imagine that each pitcher is a mixture of repertoires, whereas in topic modeling, each document was a mixture of topics. We can then cluster pitchers together by figuring out who has the most similar repertoires.

Nitty Gritty Details

If you are not interested in getting into the nitty gritty details, feel free to skip ahead to the next section to just see the cluster groupings.

  • Data used is from 2007-2014.
  • The dictionary of pitches (500 clusters) was created by running K-Means using all of the pitches from 2014. The choice of 2014 is arbitrary, but I used just one year’s worth of data because I thought it might be a sufficient amount and it was much quicker to run K-Means.
  • The PITCHf/x attributes that were used to cluster pitches were start_speed, pfx_x/pfx_z (horizontal/vertical movement), px/pz (horizontal/vertical location), vx0/vz0 (components of velocity).
  • For each pitcher from 2007-2014, each pitch was assigned to its closest cluster (determined by distance to the cluster center). I filtered out pitcher seasons in which the pitcher threw fewer than 500 pitches.
  • I then ran LDA on pitcher seasons, choosing the number of repertoires (topics) to be 5.
  • I used the method from this paper to get a vector representation of each pitcher season. I could have used the inferred repertoire proportions as my vector representations, but for various reasons, this did not produce as nice of clusters.
  • Finally, I ran K-Means (K=100) on these vectors to get clusters of pitchers.
  • Whereas in topic modeling, it is often interesting to interpret what the global topics actually are, I am not really interested in what the global “repertoires” are for the model. I am really using LDA as a dimensionality reduction technique to produce smaller vectors (5 vs. 500) that can be clustered together.
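The data-preparation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for this analysis: the two cluster centers, the two-attribute pitches, and the names “Pitcher A” and “Pitcher B” are all invented, and the LDA and final K-Means steps are left out. Each output vector plays the role that a document’s word counts play in topic modeling.

```python
from collections import Counter, defaultdict

MIN_PITCHES = 500  # pitcher seasons with fewer pitches are dropped, as in the text

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_cluster(features, centers):
    """Index of the cluster center nearest to this pitch."""
    return min(range(len(centers)), key=lambda j: sq_dist(features, centers[j]))

def pitch_counts(pitches, centers, min_pitches=MIN_PITCHES):
    """Map (pitcher, year) -> list of pitch counts per cluster."""
    counts = defaultdict(Counter)
    for pitcher, year, features in pitches:
        counts[(pitcher, year)][closest_cluster(features, centers)] += 1
    return {
        season: [c[j] for j in range(len(centers))]
        for season, c in counts.items()
        if sum(c.values()) >= min_pitches
    }

# Hypothetical "dictionary" of two pitch clusters: (speed, vertical movement)
centers = [(95.0, 10.0), (78.0, -8.0)]

# Invented pitch log: Pitcher A throws 600 pitches, Pitcher B only 10
pitches = (
    [("Pitcher A", 2014, (94.0, 9.0))] * 400
    + [("Pitcher A", 2014, (79.0, -7.0))] * 200
    + [("Pitcher B", 2014, (96.0, 11.0))] * 10
)

vectors = pitch_counts(pitches, centers)
print(vectors)
```

With 500 real clusters each vector would have 500 entries, and a topic-model implementation (for example scikit-learn’s LatentDirichletAllocation) could take the stacked vectors as its input matrix.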

Some Observations

The actual clusters along with some relevant FanGraphs statistics are provided below. Each table is sortable. For brevity, I have only included clusters in which there are 10 or fewer pitchers. Only the first cluster shown (cluster 3) has more than 10 pitchers, which I simply included to demonstrate that a cluster could be quite big.

  • As is probably expected, clusters are almost always entirely righties or lefties even though this is not an input to the model.
  • Guys with similar numbers of batters faced cluster together. This is by design, as the way I determined the repertoire proportions accounts for the number of times a particular pitch is thrown.
  • Sometimes weird clusters can form, such as Cluster 37, which contains both Chapman and Wakefield. Cluster 37 is mostly cohesive with hard-throwing left-handers and I believe Wakefield ends up here simply because he did not fit well into any cluster.
  • This is not to say that the algorithm cannot find clusters of knuckleballers. Cluster 14 is all R.A. Dickey from years 2011-2014.
  • There are also other clusters that contain exclusively (or almost exclusively) one pitcher. Cluster 8 is 5 Kershaw years and one Hamels year. Cluster 68 is 5 Verlander years. I believe these clusters form partially because their stuff is so good. Another factor is that these pitchers might be able to repeat their mechanics so well that they remain in the same cluster because they are always throwing the same pitch types. By contrast, there are other pitchers who fall almost exclusively into one cluster but who are joined there by many other pitchers.
  • Clusters of individual pitchers also form if a pitcher has an incredibly unique style. Justin Masterson has his own cluster because he is such an extreme ground-ball pitcher. Josh Collmenter does as well due to the extreme rise he generates on his “fastball”.
  • Cluster 29 contains just Kershaw’s 2014 season and J.A. Happ’s 2009 season. If you do a Ctrl-F for J.A. Happ, you will find him in some pretty flattering clusters. This is especially interesting because from 2007-2014, he does not have particularly good seasons, but he has been quite good the last two years. This is not to suggest that these clusters can uncover hidden gems, but it’s not fully out of the realm of possibility.
  • Most clusters produce quite similar ground-ball percentages. One of the factors that goes into clustering pitches (and therefore pitchers) is horizontal and vertical movement, which plays a huge role in a pitcher’s ability to produce ground balls.
  • Submarine pitchers always end up together. Check out Clusters 9, 60, and 92.

Overall, I think this is pretty interesting stuff. I was honestly surprised that the clusters turned out to be as cohesive as they were. Additionally, besides being a descriptive tool, I have to wonder whether this information can be used for predictive purposes. For example, we often talk about regression to the mean when discussing a player’s performance, whether it be a pitcher or a batter. It is possible that the appropriate mean for many pitchers is the mean of the cluster that they happen to fall into.
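To make the regression idea concrete, here is a small sketch that pulls one pitcher season toward its cluster mean. It is purely illustrative and untested as a forecasting method. The K/9 and TBF values are copied from the Cluster 24 table below, while the weighting scheme and the PRIOR_TBF constant are hypothetical placeholders, not established values.

```python
# (season, TBF, K9) copied from the Cluster 24 table
cluster_24 = [
    ("2008 CC Sabathia", 1023, 8.93),
    ("2011 CC Sabathia", 985, 8.72),
    ("2010 David Price", 861, 8.11),
]

PRIOR_TBF = 150  # hypothetical weight given to the cluster mean

def cluster_mean(rows):
    """Simple unweighted mean of K/9 across the cluster."""
    return sum(k9 for _, _, k9 in rows) / len(rows)

def regress_to_cluster(observed, tbf, mean, prior_tbf=PRIOR_TBF):
    """Weighted average of the observed rate and the cluster mean."""
    return (tbf * observed + prior_tbf * mean) / (tbf + prior_tbf)

mean_k9 = cluster_mean(cluster_24)
name, tbf, k9 = cluster_24[2]
estimate = regress_to_cluster(k9, tbf, mean_k9)
print(f"{name}: observed {k9:.2f}, cluster mean {mean_k9:.2f}, regressed {estimate:.2f}")
```

The more batters a pitcher has faced, the less the estimate moves toward the cluster mean. A stricter version would leave the pitcher’s own season out of the mean, and whether a cluster mean actually predicts better than the league mean would have to be checked against later seasons.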

Cluster 3

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Chris Carpenter Cardinals 750 6.73 1.78 0.33 55.0 28.0 4.6 5.5
2010 Hiroki Kuroda Dodgers 810 7.29 2.20 0.69 51.1 32.1 8.0 4.3
2010 Gavin Floyd White Sox 798 7.25 2.79 0.67 49.9 32.1 7.6 4.1
2008 Hiroki Kuroda Dodgers 776 5.69 2.06 0.64 51.3 28.6 7.6 3.6
2012 Doug Fister Tigers 673 7.63 2.06 0.84 51.0 26.7 11.6 3.4
2011 Josh Beckett Red Sox 767 8.16 2.42 0.98 40.1 42.2 9.6 3.3
2011 Michael Pineda Mariners 696 9.11 2.89 0.95 36.3 44.8 9.0 3.2
2012 A.J. Burnett Pirates 851 8.01 2.76 0.80 56.9 24.3 12.7 3.0
2013 Rick Porcello Tigers 736 7.22 2.14 0.92 55.3 23.7 14.1 2.9
2008 Carlos Zambrano Cubs 796 6.20 3.43 0.86 47.2 34.9 9.0 2.8
2013 Andrew Cashner Padres 707 6.58 2.42 0.62 52.5 28.7 8.1 2.7
2012 Jeff Samardzija Cubs 723 9.27 2.89 1.03 44.6 33.1 12.8 2.7
2010 Scott Baker Twins 725 7.82 2.27 1.22 35.6 43.5 10.2 2.6
2014 Kyle Gibson Twins 757 5.37 2.86 0.60 54.4 26.6 7.8 2.3
2012 Tim Hudson Braves 749 5.13 2.41 0.60 55.5 25.2 8.3 2.1
2014 Henderson Alvarez Marlins 772 5.34 1.59 0.67 53.8 24.3 9.5 2.1
2008 Todd Wellemeyer Cardinals 807 6.29 2.91 1.17 39.3 39.8 10.6 2.0
2010 Rick Porcello Tigers 700 4.65 2.10 1.00 50.3 32.1 9.9 1.7
2011 Luke Hochevar Royals 835 5.82 2.82 1.05 49.8 32.2 11.5 1.7
2008 Jason Marquis Cubs 738 4.90 3.77 0.81 47.6 32.5 8.3 1.7
2014 Charlie Morton Pirates 666 7.21 3.26 0.51 55.7 22.8 8.8 1.6
2012 Luis Mendoza Royals 709 5.64 3.20 0.81 52.1 27.1 10.6 1.5
2009 Aaron Cook Rockies 675 4.44 2.68 1.08 56.5 24.7 14.2 1.4
2014 Doug Fister Nationals 662 5.38 1.32 0.99 48.9 34.2 10.1 1.4
2010 Mitch Talbot Indians 696 4.97 3.90 0.73 47.8 35.3 7.0 1.2
2008 Armando Galarraga Tigers 746 6.35 3.07 1.41 43.5 39.7 13.0 1.2
2008 Carlos Silva Mariners 689 4.05 1.88 1.17 44.0 33.3 10.4 1.2
2009 Ross Ohlendorf Pirates 725 5.55 2.70 1.27 40.6 42.1 11.1 1.2
2008 Vicente Padilla Rangers 757 6.68 3.42 1.37 42.7 38.1 12.5 1.1
2012 Luke Hochevar Royals 800 6.99 2.96 1.31 43.3 35.0 13.5 1.1
2012 Derek Lowe – – – 640 3.47 3.22 0.63 59.2 21.0 9.1 1.0
2013 Edinson Volquez – – – 777 7.50 4.07 1.00 47.6 29.6 11.9 0.9
2011 Chris Volstad Marlins 719 6.36 2.66 1.25 52.3 27.7 15.5 0.7
2010 Jeremy Bonderman Tigers 754 5.89 3.16 1.32 44.7 39.2 11.4 0.7
2010 Brad Bergesen Orioles 746 4.29 2.70 1.38 48.7 36.6 11.9 0.6
2014 Hector Noesi – – – 733 6.42 2.92 1.46 38.0 40.6 12.7 0.3
2009 Armando Galarraga Tigers 642 5.95 4.20 1.50 39.9 38.6 13.3 0.2
2008 Kyle Kendrick Phillies 722 3.93 3.30 1.33 44.3 28.7 14.0 0.1
2014 Roberto Hernandez – – – 722 5.74 3.99 1.04 49.7 29.9 12.2 0.0
2013 Lucas Harrell Astros 707 5.21 5.15 1.17 51.5 27.4 14.3 -0.8

 

Cluster 5

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Cliff Lee – – – 843 7.84 0.76 0.68 41.9 40.4 6.3 7.0
2011 Cliff Lee Phillies 920 9.21 1.62 0.70 46.3 32.4 9.0 6.8
2009 Jon Lester Red Sox 843 9.96 2.83 0.89 47.7 34.5 10.6 5.3
2014 Jose Quintana White Sox 830 8.00 2.34 0.45 44.7 33.2 5.1 5.1
2013 Derek Holland Rangers 894 7.99 2.70 0.85 40.8 36.4 8.8 4.3
2012 Matt Moore Rays 759 8.88 4.11 0.91 37.4 42.9 8.6 2.7
2013 Wade Miley Diamondbacks 847 6.53 2.93 0.93 52.0 27.2 12.5 1.8

 

Cluster 6

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 CC Sabathia Indians 975 7.80 1.38 0.75 45.0 36.6 7.8 6.4
2014 Jake McGee Rays 274 11.36 2.02 0.25 38.0 42.9 2.9 2.6
2014 Tyler Matzek Rockies 503 6.96 3.37 0.69 49.7 30.3 8.3 1.7
2013 J.A. Happ Blue Jays 415 7.48 4.37 0.97 36.5 46.0 7.6 1.1
2010 J.A. Happ – – – 374 7.21 4.84 0.82 39.0 43.4 7.4 1.0
2009 Sean West Marlins 467 6.10 3.83 0.96 40.2 40.8 8.0 1.0
2009 Andrew Miller Marlins 366 6.64 4.84 0.79 48.0 30.0 9.3 0.7
2012 Drew Pomeranz Rockies 434 7.73 4.28 1.30 43.9 35.9 13.6 0.7
2013 Jake McGee Rays 260 10.77 3.16 1.15 42.5 38.8 12.9 0.6
2008 Jo-Jo Reyes Braves 512 6.21 4.14 1.43 48.5 31.8 15.5 0.2

 

Cluster 8

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Clayton Kershaw Dodgers 908 8.85 1.98 0.42 46.0 31.3 5.8 7.1
2011 Clayton Kershaw Dodgers 912 9.57 2.08 0.58 43.2 38.6 6.7 7.1
2012 Clayton Kershaw Dodgers 901 9.05 2.49 0.63 46.9 34.0 8.1 5.9
2010 Clayton Kershaw Dodgers 848 9.34 3.57 0.57 40.1 42.1 5.8 4.7
2009 Clayton Kershaw Dodgers 701 9.74 4.79 0.37 39.4 41.6 4.1 4.4
2010 Cole Hamels Phillies 856 9.10 2.63 1.12 45.4 37.9 12.3 3.5

 

Cluster 9

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Peter Moylan Braves 309 7.52 4.32 0.00 62.4 19.5 0.0 1.4
2014 Joe Smith Angels 285 8.20 1.81 0.48 59.1 25.9 8.0 1.0
2011 Joe Smith Indians 267 6.04 2.82 0.13 56.6 23.5 2.2 1.0
2009 Brad Ziegler Athletics 313 6.63 3.44 0.25 62.3 19.7 4.4 1.0
2013 Brad Ziegler Diamondbacks 297 5.42 2.71 0.37 70.4 10.8 12.5 0.6
2012 Brad Ziegler Diamondbacks 263 5.50 2.75 0.26 75.5 7.7 13.3 0.6
2012 Joe Smith Indians 278 7.12 3.36 0.54 58.0 24.9 8.3 0.6
2008 Cla Meredith Padres 302 6.27 3.07 0.77 66.8 17.3 15.8 0.3
2010 Peter Moylan Braves 271 7.35 5.23 0.71 67.8 21.3 13.5 -0.3

 

Cluster 14

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 R.A. Dickey Mets 927 8.86 2.08 0.92 46.1 34.1 11.3 5.0
2011 R.A. Dickey Mets 876 5.78 2.33 0.78 50.8 32.9 8.3 2.5
2014 R.A. Dickey Blue Jays 914 7.22 3.09 1.09 42.0 37.6 10.7 1.7
2013 R.A. Dickey Blue Jays 943 7.09 2.84 1.40 40.3 40.5 12.7 1.7

 

Cluster 16

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Max Scherzer Tigers 836 10.08 2.35 0.76 36.3 44.6 7.6 6.1
2014 Max Scherzer Tigers 904 10.29 2.57 0.74 36.7 41.6 7.5 5.2
2011 Daniel Hudson Diamondbacks 921 6.85 2.03 0.69 41.7 39.1 6.4 4.6
2012 Max Scherzer Tigers 787 11.08 2.88 1.10 36.5 41.5 11.6 4.4
2014 Jeff Samardzija – – – 879 8.28 1.76 0.82 50.2 30.5 10.6 4.1
2014 Lance Lynn Cardinals 866 8.00 3.18 0.57 44.3 36.0 6.1 3.4

 

Cluster 18

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Brandon Webb Diamondbacks 944 7.27 2.58 0.52 64.4 20.4 9.6 5.5
2013 Justin Masterson Indians 803 9.09 3.54 0.61 58.0 24.2 10.7 3.5
2012 Justin Masterson Indians 906 6.94 3.84 0.79 55.7 25.0 11.4 2.3
2011 Derek Lowe Braves 830 6.59 3.37 0.67 59.0 22.5 10.2 2.1

 

Cluster 20

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 John Danks White Sox 878 6.85 2.96 0.76 45.4 38.9 7.4 4.4
2010 Brian Matusz Orioles 760 7.33 3.23 0.97 36.2 45.0 7.9 3.0
2009 John Danks White Sox 839 6.69 3.28 1.26 44.2 40.9 11.5 2.7
2013 Felix Doubront Red Sox 705 7.71 3.94 0.72 45.6 34.4 7.8 2.2
2014 J.A. Happ Blue Jays 673 7.58 2.91 1.25 40.6 39.5 11.5 1.0

 

Cluster 24

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 CC Sabathia – – – 1023 8.93 2.10 0.68 46.6 31.7 8.8 7.3
2011 CC Sabathia Yankees 985 8.72 2.31 0.64 46.6 30.3 8.4 6.4
2010 David Price Rays 861 8.11 3.41 0.65 43.7 39.6 6.5 4.2

 

Cluster 29

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Clayton Kershaw Dodgers 749 10.85 1.41 0.41 51.8 29.2 6.6 7.6
2009 J.A. Happ Phillies 685 6.45 3.04 1.08 38.4 42.9 9.5 1.7

 

Cluster 35

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Chris Young Mariners 688 5.89 3.27 1.42 22.3 58.7 8.8 0.1
2014 Marco Estrada Brewers 624 7.59 2.63 1.73 32.7 49.5 13.2 -0.1

 

Cluster 36

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Justin Masterson Indians 908 6.58 2.71 0.46 55.1 26.7 6.3 4.2
2010 Justin Masterson Indians 802 7.00 3.65 0.70 59.9 24.9 10.0 2.3

 

Cluster 37

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Aroldis Chapman Reds 276 15.32 2.89 0.50 37.3 42.9 7.4 3.3
2009 Matt Thornton White Sox 291 10.82 2.49 0.62 46.4 36.3 7.7 2.3
2008 Matt Thornton White Sox 268 10.29 2.54 0.67 53.0 27.4 10.9 1.7
2012 Drew Smyly Tigers 416 8.52 2.99 1.09 39.9 41.3 10.3 1.7
2008 Clayton Kershaw Dodgers 470 8.36 4.35 0.92 48.0 31.3 11.6 1.5
2008 Tim Wakefield Red Sox 754 5.82 2.98 1.24 35.5 48.9 9.1 1.1
2011 Tim Wakefield Red Sox 677 5.41 2.73 1.45 38.4 45.8 10.5 0.2

 

Cluster 38

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Cliff Lee Phillies 876 8.97 1.29 0.89 44.3 33.3 10.9 5.5
2008 Johan Santana Mets 964 7.91 2.42 0.88 41.2 36.4 9.4 5.3
2010 Jon Lester Red Sox 861 9.74 3.59 0.61 53.6 29.6 8.9 4.8
2012 CC Sabathia Yankees 833 8.87 1.98 0.99 48.2 30.7 12.5 4.7
2008 Jon Lester Red Sox 874 6.50 2.82 0.60 47.5 31.6 7.0 4.1
2013 Hyun-Jin Ryu Dodgers 783 7.22 2.30 0.70 50.6 30.5 8.7 3.6
2014 Wei-Yin Chen Orioles 772 6.59 1.70 1.11 41.0 37.5 10.5 2.4
2010 Jonathan Sanchez Giants 812 9.54 4.47 0.98 41.5 43.7 9.8 2.3
2014 Wade Miley Diamondbacks 866 8.18 3.35 1.03 51.1 28.0 13.9 1.6

 

Cluster 44

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Cole Hamels Phillies 850 8.08 1.83 0.79 52.3 32.6 9.9 4.9
2008 Cole Hamels Phillies 914 7.76 2.10 1.11 39.5 38.7 11.2 4.8
2008 John Danks White Sox 804 7.34 2.63 0.69 42.8 35.4 7.4 4.8
2009 Cole Hamels Phillies 814 7.81 2.00 1.12 40.4 38.7 10.7 3.9
2014 Danny Duffy Royals 606 6.81 3.19 0.72 35.8 46.0 6.1 1.9
2011 J.A. Happ Astros 698 7.71 4.78 1.21 33.0 44.2 10.2 0.6

 

Cluster 46

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Roy Halladay Phillies 993 7.86 1.08 0.86 51.2 29.7 11.3 6.1
2013 Lance Lynn Cardinals 856 8.84 3.39 0.62 43.1 34.4 7.4 3.7
2008 Mike Pelfrey Mets 851 4.93 2.87 0.54 49.6 29.6 6.3 3.1
2009 A.J. Burnett Yankees 896 8.48 4.22 1.09 42.8 39.2 10.8 3.0
2010 Roberto Hernandez Indians 880 5.31 3.08 0.73 55.6 30.8 8.3 2.6
2009 Derek Lowe Braves 855 5.13 2.91 0.74 56.3 25.8 9.4 2.5
2010 Derek Lowe Braves 824 6.32 2.83 0.84 58.8 22.6 13.1 2.2

 

Cluster 49

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Aroldis Chapman Reds 202 17.67 4.00 0.17 43.5 34.8 4.2 2.8
2014 James Paxton Mariners 303 7.18 3.53 0.36 54.8 22.6 6.4 1.2
2013 Rex Brothers Rockies 281 10.16 4.81 0.67 48.8 32.5 9.3 0.9
2012 Antonio Bastardo Phillies 224 14.02 4.50 1.21 27.7 50.0 12.5 0.8
2012 Tim Collins Royals 295 12.01 4.39 1.03 40.9 42.8 11.8 0.7
2012 Christian Friedrich Rockies 377 7.87 3.19 1.49 42.2 34.6 15.4 0.7
2013 Justin Wilson Pirates 295 7.21 3.42 0.49 53.0 30.0 6.7 0.6
2011 Aroldis Chapman Reds 207 12.78 7.38 0.36 52.7 30.8 7.1 0.5
2014 Justin Wilson Pirates 256 9.15 4.50 0.60 51.3 34.4 7.3 0.2
2011 Mike Dunn Marlins 267 9.71 4.43 1.29 38.5 46.0 12.2 -0.2

 

Cluster 51

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Cliff Lee – – – 969 7.03 1.67 0.66 41.3 36.5 6.5 6.3
2009 CC Sabathia Yankees 938 7.71 2.62 0.70 42.9 37.3 7.4 5.9
2010 CC Sabathia Yankees 970 7.46 2.80 0.76 50.7 34.1 8.6 5.1

 

Cluster 54

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Hisashi Iwakuma Mariners 709 7.74 1.06 1.01 50.2 28.7 13.2 3.1
2009 Justin Masterson – – – 568 8.28 4.18 0.84 53.6 31.4 10.4 1.5
2014 Justin Masterson – – – 592 8.11 4.83 0.84 58.2 21.6 14.6 0.4

 

Cluster 58

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 David Price – – – 1009 9.82 1.38 0.91 41.2 38.1 9.7 6.0
2014 Jon Lester – – – 885 9.01 1.97 0.66 42.4 37.0 7.2 5.6
2012 Gio Gonzalez Nationals 822 9.35 3.43 0.41 48.2 30.0 5.8 5.0
2011 David Price Rays 918 8.75 2.53 0.88 44.3 36.9 9.7 4.4
2013 Gio Gonzalez Nationals 819 8.83 3.50 0.78 43.9 33.3 9.7 3.2
2011 Gio Gonzalez Athletics 864 8.78 4.05 0.76 47.5 34.1 8.9 3.1
2010 Gio Gonzalez Athletics 851 7.67 4.13 0.67 49.3 35.3 7.4 3.1

 

Cluster 60

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Brad Ziegler – – – 239 6.79 2.93 0.00 68.6 13.4 0.0 1.0
2007 Cla Meredith Padres 342 6.67 1.92 0.68 72.0 13.6 17.1 1.0
2008 Brad Ziegler Athletics 229 4.53 3.32 0.30 64.7 18.8 6.3 0.5
2013 Joe Smith Indians 259 7.71 3.29 0.71 49.1 30.1 9.6 0.5
2008 Chad Bradford – – – 241 2.58 2.28 0.46 66.5 16.0 9.4 0.4
2012 Cody Eppley Yankees 194 6.26 3.33 0.59 60.3 19.1 11.1 0.3
2008 Joe Smith Mets 271 7.39 4.41 0.57 62.6 17.9 12.5 0.3
2009 Cla Meredith – – – 283 5.10 3.44 0.55 62.9 21.1 8.9 0.2
2010 Brad Ziegler Athletics 257 6.08 4.15 0.59 54.4 26.9 8.2 0.1
2014 Brad Ziegler Diamondbacks 281 7.25 3.22 0.67 63.8 18.9 13.5 0.1

 

Cluster 68

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Justin Verlander Tigers 982 10.09 2.36 0.75 36.0 42.8 7.4 7.7
2012 Justin Verlander Tigers 956 9.03 2.27 0.72 42.3 35.6 8.3 6.8
2011 Justin Verlander Tigers 969 8.96 2.04 0.86 40.2 42.1 8.8 6.4
2010 Justin Verlander Tigers 925 8.79 2.85 0.56 41.0 40.3 5.6 6.3
2013 Justin Verlander Tigers 925 8.95 3.09 0.78 38.4 38.9 7.8 4.9

 

Cluster 69

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Manny Parra Brewers 741 7.97 4.07 0.98 51.6 26.6 13.5 2.3
2014 Drew Smyly – – – 618 7.82 2.47 1.06 36.6 43.4 9.5 2.2
2012 J.A. Happ – – – 627 8.96 3.48 1.18 44.0 38.9 11.9 1.9

 

Cluster 70

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Gerrit Cole Pirates 571 9.00 2.61 0.72 49.2 31.8 9.4 2.3
2009 Luke Hochevar Royals 631 6.67 2.90 1.45 46.6 35.8 13.8 1.0
2012 Joe Kelly Cardinals 457 6.31 3.03 0.84 51.7 27.5 11.0 0.9
2008 Sidney Ponson – – – 612 3.85 3.18 0.93 54.5 26.2 10.9 0.9
2013 Joe Kelly Cardinals 532 5.73 3.19 0.73 51.1 28.2 8.9 0.7
2009 Roberto Hernandez Indians 596 5.67 5.03 1.15 55.2 27.0 13.7 0.0

 

Cluster 71

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Chris Young Padres 434 8.18 4.22 1.14 21.7 53.4 8.7 1.4
2012 Chris Young Mets 493 6.26 2.82 1.25 22.3 58.2 7.7 1.2
2013 Josh Collmenter Diamondbacks 384 8.32 3.23 0.78 32.7 46.8 6.9 1.0
2012 Josh Collmenter Diamondbacks 375 7.97 2.19 1.30 37.4 43.1 11.5 0.8
2009 Chris Young Padres 336 5.92 4.74 1.42 30.2 51.7 10.0 0.0

 

Cluster 72

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Madison Bumgarner Giants 873 9.07 1.78 0.87 44.4 35.8 10.0 4.0
2013 Jon Lester Red Sox 903 7.47 2.83 0.80 45.0 35.4 8.3 3.5

 

Cluster 77

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Josh Collmenter Diamondbacks 621 5.83 1.63 0.99 33.3 47.0 7.7 2.3
2014 Josh Collmenter Diamondbacks 719 5.77 1.96 0.90 38.8 39.9 8.3 1.9

 

Cluster 78

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 Rich Hill Cubs 812 8.45 2.91 1.25 36.0 42.9 11.7 3.1
2014 Tyler Skaggs Angels 464 6.85 2.39 0.72 50.1 30.9 8.7 1.5
2011 Danny Duffy Royals 474 7.43 4.36 1.28 37.5 40.3 11.5 0.5
2010 Manny Parra Brewers 560 9.52 4.65 1.33 47.2 34.5 14.8 0.3

 

Cluster 79

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 David Price Rays 836 8.74 2.52 0.68 53.1 27.0 10.5 5.0
2011 C.J. Wilson Rangers 915 8.30 2.98 0.64 49.3 31.9 8.2 4.9
2010 C.J. Wilson Rangers 850 7.50 4.10 0.44 49.2 33.5 5.3 4.1
2013 C.J. Wilson Angels 913 7.97 3.60 0.64 44.4 33.4 7.2 3.2
2012 Madison Bumgarner Giants 849 8.25 2.12 0.99 47.9 33.3 11.7 3.1
2011 Derek Holland Rangers 843 7.36 3.05 1.00 46.4 33.6 11.0 3.0
2012 Wandy Rodriguez – – – 875 6.08 2.45 0.92 48.0 31.6 10.1 2.5
2014 Jason Vargas Royals 790 6.16 1.97 0.91 38.3 38.7 8.2 2.2
2012 C.J. Wilson Angels 865 7.70 4.05 0.85 50.3 29.9 10.8 2.2

 

Cluster 85

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Cliff Lee Phillies 847 8.83 1.19 1.11 45.0 36.9 11.8 5.0
2014 Cole Hamels Phillies 829 8.71 2.59 0.62 46.4 31.1 8.2 4.3
2009 Wandy Rodriguez Astros 849 8.45 2.76 0.92 44.9 37.1 9.9 4.1
2012 Wade Miley Diamondbacks 807 6.66 1.71 0.65 43.3 33.7 6.9 4.1
2013 Jose Quintana White Sox 832 7.38 2.52 1.03 42.5 37.4 10.2 3.5
2009 Andy Pettitte Yankees 834 6.84 3.51 0.92 42.9 37.8 8.9 3.4
2012 Wei-Yin Chen Orioles 818 7.19 2.66 1.35 37.1 42.1 11.7 2.3

 

Cluster 86

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Josh Beckett Red Sox 883 8.43 2.33 1.06 47.2 31.7 12.8 4.2
2010 Max Scherzer Tigers 800 8.46 3.22 0.92 40.3 40.0 9.6 3.7
2014 Nathan Eovaldi Marlins 854 6.40 1.94 0.63 44.8 32.9 6.6 2.9
2012 Lucas Harrell Astros 827 6.51 3.62 0.60 57.2 22.5 9.7 2.8
2013 Jeff Samardzija Cubs 914 9.01 3.29 1.05 48.2 31.4 13.3 2.7
2011 Max Scherzer Tigers 833 8.03 2.58 1.34 40.3 39.5 12.6 2.2
2009 Mike Pelfrey Mets 824 5.22 3.22 0.88 51.3 30.0 9.5 1.7
2011 Roberto Hernandez Indians 833 5.20 2.86 1.05 54.8 26.6 13.0 0.9

 

Cluster 92

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Steve Cishek Marlins 275 11.57 2.89 0.41 42.7 31.1 5.9 2.0
2007 Sean Green Mariners 304 7.01 4.50 0.26 60.9 18.8 5.1 0.7
2008 Sean Green Mariners 358 7.06 4.10 0.34 63.3 19.5 6.1 0.7
2011 Shawn Camp Blue Jays 292 4.34 2.98 0.41 53.5 25.7 5.2 0.3
2010 Shawn Camp Blue Jays 298 5.72 2.24 1.00 52.0 31.4 11.1 0.2

 

Cluster 95

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Cliff Lee Indians 891 6.85 1.37 0.48 45.9 35.1 5.1 6.7
2012 Cole Hamels Phillies 867 9.03 2.17 1.00 43.4 35.1 11.9 4.6
2013 Cole Hamels Phillies 905 8.26 2.05 0.86 42.7 36.7 9.1 4.5
2008 Scott Kazmir Rays 641 9.81 4.14 1.36 30.8 48.9 12.0 2.0

 

Cluster 97

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Jered Weaver Angels 926 7.56 2.14 0.76 32.5 48.6 6.3 5.7
2009 Jered Weaver Angels 882 7.42 2.82 1.11 30.9 50.4 8.3 3.9
2014 Chris Tillman Orioles 871 6.51 2.86 0.91 40.6 39.3 8.3 2.3
2009 Joe Blanton Phillies 837 7.51 2.72 1.38 40.6 39.5 12.9 2.2
2013 Chris Tillman Orioles 845 7.81 2.97 1.44 38.6 39.8 14.2 1.9

 


A Year In xISO

For the type of baseball fan I’ve become — one who follows the sport as a whole rather than focuses on a particular team — 2016 was the season of Statcast. Even for those who watch the hometown team’s broadcast on a nightly basis, exit velocity and launch angle have probably become familiar terms. While Statcast was around last season, it seems fans and commentators alike have really embraced it in 2016.

Personally, I commend MLB for democratizing Statcast data, at least partially, especially when they are under no apparent obligation to do so. I’ve enjoyed the Statcast Podcast this season, but most of all, I’ve benefited from the tools available at Baseball Savant. For it is that tool which has allowed me to explore xISO. I first introduced an attempt to incorporate exit velocity into a player’s expected isolated slugging (xISO). I subsequently updated the model and discussed some notable first half players. Alex Chamberlain was kind enough to include my version of xISO in the RotoGraphs x-stats Omnibus, and I’ve been maintaining a daily updated xISO resource ever since.

Happily for science, all of my 2016 first half “Overperformers” saw ISO declines in the second half, while most of my first half “Underperformers” saw large drops in second half playing time. Rather than focus on individuals, though, let’s try to estimate the predictive value of xISO in 2016.

Yuck. This plot shows how well first-half ISO predicted second-half ISO, compared to how well first-half xISO predicted the same, for 2016 first AND second-half qualified hitters. Both of these are calculated using the model as it was at the All-Star break. There are two takeaways: First-half ISO was a pretty bad predictor of second-half ISO, and first-half xISO was also a pretty bad predictor of second-half ISO. Mercifully though, first-half xISO was a bit better than ISO at predicting future ISO. This is consistent with the findings in my first article, and a basic requirement I set out to satisfy.

Now, an interesting thing happened recently. After weeks of hinting, Mike Petriello unveiled “Barrels”. Put simply, Barrels are meant to be a classification of the best kind of batted balls. Shortly thereafter, Baseball Savant began tabulating total Barrels, Barrels per batted ball (Brls/BBE), and Barrels per plate appearance (Brls/PA). In a way, this is similar to Andrew Perpetua’s approach to using granular batted-ball data to track expected outcomes for each batted ball, except that the Statcast folks have taken only a slice of launch angles and exit velocities to report as Barrels.

By definition, these angles and velocities are those for which the expected slugging percentage is over 1.500, so it would appear that this stat could be a direct replacement for my xISO. Not so fast! First of all, because ISO is on a per at-bat (AB) basis, we definitely need to calculate Brls/AB from Brls/PA. This is not so hard if we export a quick FanGraphs leaderboard. Let’s check how well Brls/AB works in a single-predictor linear model for ISO:

Not too bad. The plot reports both R-squared and adjusted R-squared, for comparison with multiple regression models. I won’t show it, but this is almost exactly the coefficient of determination that my original xISO achieves with the same training data. I still notice a hint of nonlinearity, and I bet we can do better.
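Two of the calculations mentioned above are easy to sketch: converting Brls/PA to Brls/AB, and adjusting R-squared for the number of predictors. This is only an illustration, not the code behind the plots. The function names are made up, the sample hitter is hypothetical, and Brls/PA is assumed to be a fraction rather than a percentage.

```python
def barrels_per_ab(brls_per_pa, pa, ab):
    """Convert a Brls/PA rate to Brls/AB using PA and AB totals."""
    if ab <= 0:
        raise ValueError("ab must be positive")
    total_barrels = brls_per_pa * pa  # recover the barrel count
    return total_barrels / ab


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: penalize R-squared for extra predictors."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical hitter: 0.06 Brls/PA over 600 PA and 540 AB.
rate = barrels_per_ab(0.06, pa=600, ab=540)
print(f"Brls/AB = {rate:.4f}")
```

Because AB never exceeds PA, Brls/AB is always at least as large as Brls/PA, and the gap is widest for hitters who walk a lot.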

Hey now, that’s nice. In terms of adjusted R-squared, we’ve picked up about 0.06, which is not insignificant. The correlation plot also looks better to my eye. So what did I do? As is my way, I added a second-order term, and sprinkled in FB% and GB% as predictors. The latter two are perhaps controversial inclusions. FB% and/or GB% might be suspected to be strongly correlated with Brls/AB, introducing some undesired multicollinearity. While I won’t show the plots, it doesn’t actually turn out to be a big problem in this case. Both FB% and GB% have Pearson correlation coefficients close to 0.5 with Brls/AB (negative correlation in the case of GB%). Here’s the functional form of the multiple regression model plotted above, which was trained on all 2016 qualified hitters:

To be honest, there is something about my first model that I liked better. This version, using Barrels, feels like a bit of a half-measure between Andrew Perpetua’s bucketed approach and my previous philosophy of using only average exit-velocity values and batted-ball mix. My original intent was to create a metric that could be easily calculated from readily available resources, so in that sense, I’m still succeeding. Going forward, I will be calculating both versions on my spreadsheet. I’m excited to see which version serves the community better heading into 2017!

As always, I’m happy to entertain comments, questions, or criticisms.


Did the Cubs and Giants Have the Best Pitcher-Hitting Series Ever?

With a wild comeback in Game 4 on Tuesday night, the Cubs secured their spot in the NLCS for the second straight season. Considering where the team was just five years ago, this is obviously an impressive achievement. But maybe more impressive is how they reached that second consecutive NLCS. The Cubs scored 17 runs against the Giants in their NLDS showdown, and six of those were driven in by their pitchers! That’s an absurd 35% of the Cubs’ run output coming from the guys who usually do the run prevention.

When Travis Wood hit his incredible home run as a relief pitcher in Game 2, it was the first postseason home run from a pitcher since Joe Blanton took Edwin Jackson deep in Game 4 of the 2008 World Series, and the first postseason home run from a reliever since 1924.

When Jake Arrieta left the yard in the first inning of the very next game, it became the first postseason series with multiple home runs off the bats of pitchers since the 1968 World Series, when Mickey Lolich and Bob Gibson each went deep in a seven-game series. Of course, Lolich and Gibson were rivals, not teammates, making the Wood-Arrieta accomplishment even more impressive and rare. In fact, it was only the second time in the history of baseball (per Baseball-Reference Play Index) that two pitchers on the same team hit home runs in the same series. The only other time was in the 1924 World Series, when New York Giants pitchers and teammates Jack Bentley and Rosy Ryan homered in Games 3 and 5 of the epic seven-game series. Wood and Arrieta were the only ones to do so in back-to-back games.

* * *

Now, it wasn’t just the Cubs pitchers getting in on the fun. For a while Tuesday night, it looked as though Giants starter Matt Moore was going to be a two-fold hero, shutting down the Cubs offense from the mound and knocking in the first run of the game for the Giants in the bottom of the fourth. While that was the only hit from Giants pitchers in the series, it was still enough to bring the combined hitting totals for the two teams’ pitchers to a .250 batting average and a .625 slugging percentage, while knocking in 23 percent of the total runs scored.

Those are some pretty crazy totals, but are they the best ever?

Using the aforementioned Play Index search of all-time postseason home runs from pitchers, there are 18 different series (including the 2016 NLDS) in which a pitcher homered. In those series, on three occasions, the pitcher who hit the home run was the only pitcher to get a hit in the entire series (1984 Rick Sutcliffe, 1978 Steve Carlton, 1975 Don Gullett). Only twice did pitchers combine for more than the 10 total bases from the Giants and Cubs, and only once did they drive in more than the seven runs (and they never topped the percent of runs driven in). Let’s go to the chart:

Top Team Pitcher Performances in the Playoffs

Year Hits AB BA TB SLG RBI Series runs % of RBI
2016 NLDS 4 16 0.250 10 0.625 7 30 23.33
2008 WS 2 13 0.154 5 0.385 1 39 2.56
2006 NLCS 2 25 0.080 5 0.200 1 55 1.82
2003 NLCS 3 28 0.107 6 0.214 3 82 3.66
1984 NLCS 4 17 0.235 7 0.412 1 48 2.08
1978 NLCS 2 17 0.118 5 0.294 4 38 10.53
1975 NLCS 2 12 0.167 5 0.417 3 26 11.54
1974 WS 4 20 0.200 8 0.400 1 27 3.70
1970 WS 2 25 0.080 5 0.200 4 53 7.55
1970 ALCS 5 18 0.278 10 0.556 6 37 16.22
1969 WS 5 26 0.192 10 0.385 5 24 20.83
1968 WS 5 36 0.139 11 0.306 4 63 6.35
1967 WS 2 30 0.067 8 0.267 2 46 4.35
1965 WS 5 32 0.156 9 0.281 6 44 13.64
1958 WS 7 37 0.189 10 0.270 8 54 14.81
1940 WS 3 39 0.077 7 0.179 2 50 4.00
1926 WS 4 39 0.103 8 0.205 2 52 3.85
1924 WS 8 42 0.190 14 0.333 5 53 9.43
1920 WS 6 39 0.154 9 0.231 3 29 10.34
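The rate columns in the chart follow directly from the counting columns. Here is a minimal sketch of that arithmetic; the function name is made up for illustration, and the “% of RBI” column is read as pitcher RBI divided by total series runs.

```python
def pitcher_batting_line(hits, at_bats, total_bases, rbi, series_runs):
    """Return (BA, SLG, pitcher RBI as a percent of series runs)."""
    ba = hits / at_bats
    slg = total_bases / at_bats
    rbi_share = 100.0 * rbi / series_runs
    return round(ba, 3), round(slg, 3), round(rbi_share, 2)


# Counting stats copied from three rows of the chart above:
# (hits, at-bats, total bases, RBI, series runs)
rows = {
    "2016 NLDS": (4, 16, 10, 7, 30),
    "1970 ALCS": (5, 18, 10, 6, 37),
    "1924 WS": (8, 42, 14, 5, 53),
}

for series, counts in rows.items():
    print(series, pitcher_batting_line(*counts))
```

Running the same function over every row is a quick way to confirm that the percentages line up with the raw totals.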

After a brief peruse, it’s clear that there are only a few cases in which the pitchers in a series can even come close to what we just saw. Let’s take a look at the five best, in ascending order:

1968 World Series

This was one of the three series before the 2016 NLDS in which multiple pitchers hit home runs. In 1968, it was, as noted above, Bob Gibson and Mickey Lolich who homered in the series, one each for the Cardinals and Tigers. The reason this series ranks fifth among the challengers to Cubs-Giants is that those two pitchers were really it. They drove in the only four runs from pitchers in the series (three of the four RBI coming on the two home-run swings), and only one hit came from a non-Gibson/Lolich pitcher.

1969 World Series

Just a year after our first entry into this challenge, the Mets and Orioles played in the first World Series to be preceded by a League Championship Series. The extra-long season didn’t stop the Mets and Orioles pitchers from contributing all over the diamond, however, as they crammed five hits, 10 total bases, and five RBI into just a five-game series. Because of the abbreviated length of the series, this is one of the few series that can challenge the 2016 NLDS in terms of percentages. That being said, the Cubs-Giants pitchers take all three percentage categories, leaving no real room for debate on this one.

1958 World Series

The 1958 series stands out in that it had the highest RBI total for pitchers in any postseason series to date. That was thanks in large part to the top two pitchers for the Braves, Warren Spahn and Lew Burdette, tallying three RBI apiece. Burdette did it with the long ball, while Spahn preferred the death-by-a-thousand-cuts method, tallying his three RBI on four hits in the series. The Yankees got two RBI of their own from Bob Turley, but I’m not quite willing to give these guys the edge over the Cubs-Giants pitchers. The easiest argument for this year’s NLDS is that the Cubs-Giants pitchers tallied as many total bases and only one fewer RBI in three fewer games, as the 1958 World Series went to seven games, while this year’s NLDS went just four games.

1924 World Series

Here’s where the challenge gets real stiff. The 1924 World Series is the other series in which we have two home runs from pitchers, the aforementioned Bentley and Ryan, teammates on the Giants. This series tops our charts in hits (8) and total bases (14), and is a reasonable choice for best-hitting series from a group of pitchers. I’m still giving the edge to Cubs-Giants in this showdown, though, for one reason with a couple of different explanations: opportunity. Similar to the 1958 World Series, the 1924 World Series went to seven games, meaning that pitchers had far more games to rack up those hits and total bases. Pitchers were also left in games far longer in the 1920s, and as such, tallied almost three times as many at-bats as the 2016 NLDS pitchers. When comparing batting average (.250 to .190) and, even more so, slugging percentage (.625 to .333), it becomes clear that this year’s Cubs-Giants pitchers still reign supreme.

1970 ALCS

Here’s our winner, the only series that I believe tops the recently concluded Cubs-Giants NLDS in terms of output from pitchers at the plate. This was an even shorter series than Cubs-Giants, as the Orioles only needed three games to dispatch the Twins. And their pitchers were a good chunk of the reason why. The Orioles used just four pitchers in the series, but all four got hits, combining for all of the offense you see above. (Twins pitchers were 0-for-5 in the series.) Not only did all four get hits, but all three starters got extra-base hits, as Dave McNally, Jim Palmer, and Mike Cuellar (Dick Hall was the reliever) all showed what they were capable of on the other side of the ball. Of course, the very next season, these three starters, along with Pat Dobson, would form just the second-ever set of four 20-game winners on the same team, proving just how awesome the late ’60s and early ’70s Orioles really were. They reign supreme for now, but let’s see how those Cubs starting pitchers do for the rest of the 2016 playoffs.


Hardball Retrospective – What Might Have Been – The “Original” 2002 Blue Jays

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 2002 Toronto Blue Jays 

OWAR: 51.4     OWS: 312     OPW%: .572     (93-69)

AWAR: 34.2      AWS: 234     APW%: .481     (78-84)

WARdiff: 17.2                        WSdiff: 78  

The 2002 “Original” Blue Jays breezed to the American League East title, vanquishing the Yankees by a nine-game margin. Toronto topped the American League in OWAR and OWS. Shawn Green (.285/42/114) registered 110 tallies, achieved his second All-Star appearance and finished fifth in the MVP balloting. Jeff Kent (.313/37/108) drilled 42 doubles and attained a career-high in home runs. Carlos Delgado belted 33 round-trippers and coaxed 102 bases on balls. John Olerud (.300/22/102) laced 39 two-base hits and collected the Gold Glove Award. In the midst of five straight seasons with a batting average above .300, Shannon Stewart sliced 38 doubles and scored 103 runs. Vernon Wells reached the century mark in RBI and added 34 two-base knocks in his first full season. The “Actual” squad featured 2002 AL Rookie of the Year Eric Hinske (.279/24/84) at the hot corner.

Jeff Kent placed 48th among second-sackers in “The New Bill James Historical Baseball Abstract” top 100 player rankings while John Olerud secured the 53rd slot at first base.

Original 2002 Blue Jays                            Actual 2002 Blue Jays

STARTING LINEUP POS OWAR OWS STARTING LINEUP POS AWAR AWS
Shannon Stewart LF 2.37 18.47 Shannon Stewart LF 2.37 18.47
Vernon Wells CF 0.83 16.7 Vernon Wells CF 0.83 16.7
Shawn Green RF 6.18 32.07 Jose L. Cruz RF/LF 1.73 12.62
John Olerud DH/1B 4.64 25.92 Josh Phelps DH 1.46 9.8
Carlos Delgado 1B 4.76 25.97 Carlos Delgado 1B 4.76 25.97
Jeff Kent 2B 6.04 29.93 Dave Berg 2B 0.18 8.61
Alex S. Gonzalez SS 2.78 14.36 Chris Woodward SS 2.17 11.74
Chris Stynes 3B -0.02 3.46 Eric Hinske 3B 3.8 21.81
Greg Myers C 0.57 5.57 Tom Wilson C 0.43 5.88
BENCH POS OWAR OWS BENCH POS AWAR AWS
Jay Gibbons RF 0.59 11.97 Raul Mondesi RF 0.08 6.33
Chris Woodward SS 2.17 11.74 Orlando Hudson 2B 1.17 5.89
Craig A. Wilson RF 0.95 10.78 Felipe Lopez SS 0.08 5.8
Michael Young 2B -0.63 10.72 Ken Huckaby C -1.24 1.78
Josh Phelps DH 1.46 9.8 Joe Lawrence 2B -0.83 1.48
Orlando Hudson 2B 1.17 5.89 Dewayne Wise RF -0.42 1.39
Felipe Lopez SS 0.08 5.8 Jayson Werth RF 0.04 0.77
Brent Abernathy 2B -0.44 4.99 Homer Bush 2B -0.27 0.75
Abraham Nunez 2B 0.04 4.88 Darrin Fletcher C -0.44 0.64
Cesar Izturis SS -0.68 3.77 Brian Lesher 1B -0.5 0.23
Ryan Thompson LF 0.14 2.84 Kevin Cash C -0.14 0.08
Joe Lawrence 2B -0.83 1.48 Pedro Swann DH -0.18 0
Pat Borders DH 0.06 0.36
Mike Coolbaugh 3B -0.17 0.16
Casey Blake 3B -0.11 0.11
Kevin Cash C -0.14 0.08

Roy “Doc” Halladay (19-7, 2.93) warranted his first All-Star invitation and led the American League with 239.1 innings pitched. David “Boomer” Wells compiled 19 victories with a 3.75 ERA. Toronto’s superb bullpen staff was anchored by Billy Koch (3.27, 44 SV) and Jose Mesa (2.97, 45 SV). The setup corps consisted of Steve Karsay (3.26, 12 SV), Ben Weber (7-2, 2.54) and Kelvim Escobar (4.27, 38 SV).

Original 2002 Blue Jays                          Actual 2002 Blue Jays

ROTATION POS OWAR OWS ROTATION POS AWAR AWS
Roy Halladay SP 6.74 21.67 Roy Halladay SP 6.74 21.67
David Wells SP 3.99 14.79 Pete Walker SP 1.85 8.74
Woody Williams SP 3.2 9.65 Mark Hendrickson SP 1.23 4.01
Gary Glover SP 0.03 4.54 Esteban Loaiza SP -0.15 3.86
Mark Hendrickson SP 1.23 4.01 Justin Miller SP -0.23 3.4
BULLPEN POS OWAR OWS BULLPEN POS AWAR AWS
Billy Koch RP 1.44 18.37 Kelvim Escobar RP 0.53 9.14
Jose Mesa RP 1.28 12.4 Cliff Politte RP 1.05 6.49
Steve Karsay RP 2.01 11 Corey Thurman RP 0.54 3.66
Ben Weber RP 1.33 10.48 Felix Heredia RP 0.09 3.12
Kelvim Escobar RP 0.53 9.14 Scott Eyre RP 0.11 2.83
Mike Timlin RP 1 8.04 Chris Carpenter SP 0.41 2.73
Giovanni Carrara RP 0.62 6.77 Steve Parris SP 0 1.88
David Weathers RP 1.02 6.68 Scott Cassidy RP -0.43 1.67
Chris Carpenter SP 0.41 2.73 Dan Plesac RP 0.33 1.39
Graeme Lloyd RP -0.53 1.89 Brian Bowles RP 0.04 1.37
Scott Cassidy RP -0.43 1.67 Jason Kershner RP 0.12 0.65
Jose Silva RP 0.11 1.38 Pedro Borbon RP -0.07 0.48
Brian Bowles RP 0.04 1.37 Scott Wiggins RP 0.05 0.2
Mark Lukasiewicz RP 0 1.17 Pasqual Coco RP -0.13 0
Jim Mann RP 0.18 1.02 Brian Cooper SP -0.59 0
Carlos Almanzar SW 0.24 0.94 Bob File RP -0.47 0
Tom Davey RP -0.36 0.17 Brandon Lyon SP -0.56 0
Pasqual Coco RP -0.13 0 Luke Prokopec SP -0.91 0
Bob File RP -0.47 0 Mike Smith SP -0.45 0
Pat Hentgen SP -0.54 0
Brandon Lyon SP -0.56 0
Aaron Small RP -0.08 0
Mike Smith SP -0.45 0
Todd Stottlemyre SP -0.38 0

Notable Transactions

Shawn Green 

November 8, 1999: Traded by the Toronto Blue Jays with Jorge Nunez (minors) to the Los Angeles Dodgers for Pedro Borbon and Raul Mondesi. 

Jeff Kent 

August 27, 1992: Traded by the Toronto Blue Jays with a player to be named later to the New York Mets for David Cone. The Toronto Blue Jays sent Ryan Thompson (September 1, 1992) to the New York Mets to complete the trade.

July 29, 1996: Traded by the New York Mets with Jose Vizcaino to the Cleveland Indians for Carlos Baerga and Alvaro Espinoza.

November 13, 1996: Traded by the Cleveland Indians with a player to be named later, Julian Tavarez and Jose Vizcaino to the San Francisco Giants for a player to be named later and Matt Williams. The Cleveland Indians sent Joe Roa (December 16, 1996) to the San Francisco Giants to complete the trade. The San Francisco Giants sent Trent Hubbard (December 16, 1996) to the Cleveland Indians to complete the trade. 

John Olerud 

December 20, 1996: Traded by the Toronto Blue Jays with cash to the New York Mets for Robert Person.

October 27, 1997: Granted Free Agency.

November 24, 1997: Signed as a Free Agent with the New York Mets.

October 29, 1999: Granted Free Agency.

December 15, 1999: Signed as a Free Agent with the Seattle Mariners. 

Billy Koch

December 7, 2001: Traded by the Toronto Blue Jays to the Oakland Athletics for Eric Hinske and Justin Miller.

Honorable Mention

The 1995 Toronto Blue Jays 

OWAR: 27.1     OWS: 208     OPW%: .469     (76-86)

AWAR: 25.4       AWS: 168      APW%: .389    (56-88)

WARdiff: 1.7                        WSdiff: 40

The “Original” ’95 Jays plodded to a fourth-place finish in the AL East, eleven games behind the Orioles while the horrific “Actuals” placed 30 games behind the Red Sox. David Wells delivered a 16-8 record with a 3.24 ERA and made his first appearance at the Mid-Summer Classic. Jose Mesa (1.13, 46 SV) blossomed in the closer’s role, meriting second place in the Cy Young Award balloting along with a fourth-place finish in the MVP race. Derek Bell pilfered 27 bases and established personal-bests in BA (.334) and OBP (.385). Fellow outfielder Glenallen Hill clubbed 24 long balls and set career-highs with 86 RBI and 25 stolen bases. Geronimo Berroa clubbed 22 taters and knocked in 88 runs. Jeff Kent contributed 20 dingers and John Olerud socked 32 doubles.

On Deck

What Might Have Been – The “Original” 1902 Cubs

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

 


Defense Is Cheap — and It Wins

One of the most common phrases in all of sports is “defense wins championships.” Defense isn’t flashy; it doesn’t put people in the seats (unless you’re a desperate Twins fan wanting to see Byron Buxton do more of this — or this). People like to see the home runs, the strikeouts. People also like to see the diving plays, but diving plays are a poor indicator of a team’s total defensive quality. So even the plays on defense that do put people in the seats aren’t indicative of a team’s overall level of defense. Other sports are the same way. People don’t realize the ins and outs of NBA defenses; they only see the steals and the lockdown plays — or lack thereof. NFL fans love to see big hits, but sometimes these big hits could be avoided if a team had defended the play better and stopped the ball carrier earlier.

Yes, it is true the nuances of defense can be monotonous, and this is true through all sports. Another factor about defense is the lack of a way to quantify defensive skill. Some metrics, like RPM (shameless plug to my boy Ricky Rubio, clearly a top-5 PG), try to do this for basketball. But in baseball, defense really is quantifiable, using different metrics that can track how effective a defensive player or team is against league average. For example, read up on UZR, just one of the metrics that can put a number on a defense.

I came to this thinking on the undervaluation of defense through a different path. I had always wondered if an incredible defense could bail out an average pitching staff. I had always been interested in this facet; to reminisce, I once created an outfield of Torii Hunter, Rocco Baldelli, and Carl Crawford on MVP Baseball 2004. These were the best and fastest fielders in the game, and it seemed like they could get any fly ball. As much as I want to credit EA Sports for making an accurate game, I obviously cannot deduce the real-world effectiveness from a video game. Instead, I turned to the numbers.

To quantify how much a defense could “bail out” its pitching staff, I looked at each team’s ERA compared to its FIP. The difference between these numbers can somewhat quantify how much a team’s defense (and other factors) moves its pitching results away from what we would expect them to be. For example, if a team had a FIP of 4.00 and an ERA of 3.50, this would indicate that a good defense was able to reach more balls than an average defense, meaning the team’s ERA should be lower, as there were more recorded outs than we would expect. The opposite, a team’s ERA being greater than its FIP, would indicate that a poor defense hurt the pitching staff’s performance, as the fielders should have been able to get to more balls than they did. To sum up, my hypothesis was that the teams with the largest FIP-ERA differences had great defenses, while teams that had the lowest FIP-ERA differences (negative values) had poor defenses. Now, I understand that many factors outside of defense can influence ERA, and that FIP does not perfectly match what a pitcher’s ERA would be with an average defense, but these anomalies should largely cancel out in a large enough data set.
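The FIP-ERA difference described above is a single subtraction. Here is a minimal sketch using the 4.00 FIP and 3.50 ERA example from the paragraph; the function name and the wording of the labels are illustrative assumptions.

```python
def fip_minus_era(fip, era):
    """Positive values mean the staff beat its FIP (ERA lower than FIP)."""
    return round(fip - era, 2)


def reading(diff):
    """Rough interpretation of the sign, per the hypothesis above."""
    if diff > 0:
        return "ERA below FIP: consistent with a defense that helped"
    if diff < 0:
        return "ERA above FIP: consistent with a defense that hurt"
    return "ERA equal to FIP: no visible effect"


diff = fip_minus_era(fip=4.00, era=3.50)
print(diff, "->", reading(diff))
```

The sign is only suggestive, since luck, park, and sequencing also sit inside the gap between ERA and FIP.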

For the data, I measured playoff-contending teams (at least 85 wins) since 2002 (the furthest back I could get a value for a defensive rating) through 2015. From these teams, I parsed values for ERA, FIP, and defense, as well as the team’s payroll, runs scored, runs allowed, and run differential.

While taking my initial walks through the data, I saw two types of teams on this list. There were teams that scored few runs, but allowed even fewer, and there were teams that scored a host of runs, although they conceded a large, but lesser amount. The teams that scored little and allowed less had a common trend: they had great defenses and ERAs generally lower than FIPs. On the other hand, the teams that blasted the seams off the ball and had no problems putting runs on the scoreboard tended to have poor defenses, and their FIP-ERA difference was negative.

Using this data, I decided to run a regression analysis between a team’s defense and this FIP-ERA difference. There was a solid relationship between these two variables, with an r-squared of 0.48. This indicates that a team’s FIP-ERA difference tends to increase as the skill level of its defense increases.

[Figure: FIP-ERA difference vs. team defense]
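For readers who want to run this kind of one-predictor regression themselves, here is a minimal pure-Python sketch. It is not the analysis behind the figures in this post. The data points are made up for illustration and the function name is hypothetical.

```python
def simple_linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, r-squared is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up (defense rating, FIP-ERA difference) pairs, for illustration only.
defense = [-40.0, -15.0, 0.0, 20.0, 45.0]
fip_era_diff = [-0.30, -0.05, -0.10, 0.25, 0.20]

slope, intercept, r2 = simple_linear_fit(defense, fip_era_diff)
print(f"slope={slope:.4f}, r-squared={r2:.2f}")
```

A positive slope means better defensive ratings go with larger FIP-ERA differences, and r-squared is the share of the variation in one variable explained by the other.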

Now, we know correlation does not imply causation, but the strength of this relationship is telling. The better a team’s defense is, the more likely it is to positively influence the pitching staff’s performance. These were teams like the 2002 Atlanta Braves, the 2011 Tampa Bay Rays, or the 2004 and 2005 St. Louis Cardinals. These teams didn’t have great offenses, but they had great defenses, they had good team ERAs, and they prevented teams from scoring runs.

On the other hand, there were teams like the 2003 and 2004 Red Sox as well as the mid-2000s Yankees. These were teams with massive payrolls that paid a premium for a punishing lineup. These lineups, however, lacked defensive talent, causing their pitching staffs to underperform expectations, as their teams’ ERAs were higher than their FIPs.

So how related is this FIP-ERA difference to the number of runs allowed? Well, pretty strongly, with an r-squared of 0.46. Again, a strong relationship, this time negative, indicating that as a team’s FIP-ERA difference increases, the runs that team allows decrease.

[Figure: FIP-ERA difference vs. runs allowed]

To reinforce this relationship, I looked at defense and runs allowed. Again, this was a good, not great, relationship, with an r-squared of 0.28.

[Figure: runs allowed vs. team defense]

From these relationships, we can deduce that as a team’s defense rises in skill, the runs they allow tend to decrease and their team FIP-ERA difference tends to increase. Similarly, as a team’s FIP-ERA increases, the amount of runs a team allows decreases. From these relationships, we can conclude that these three variables are related.

As a team’s defense improves, it can positively influence the effectiveness of the pitching staff and decrease the team’s runs allowed. This may seem like common sense, and it probably is.

Now when we look at Bill James’ Pythagorean Win Expectation and other similar formulae, we notice that a team’s expected winning percentage depends not on the runs they score alone but on their run differential. So yes, if you want to, you can construct a team like the Bronx Bombers and spend millions to assemble some of the best lineups of recent history. If you do that, you’ll score a host of runs, and with decent pitching and decent fielding (or below-average defense and good pitching, like those mid-2000s Yankees teams), you’ll be able to outscore your opponents and have a high run differential.
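For reference, the classic form of the Pythagorean expectation is runs scored squared divided by the sum of runs scored squared and runs allowed squared. The sketch below uses that textbook formula with an exponent of 2, which is an assumption here since other variants use slightly different exponents. The two run totals are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("run totals must be non-negative and not both zero")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Two hypothetical teams with the same +100 run differential.
low_scoring = pythagorean_win_pct(700, 600)   # run prevention route
high_scoring = pythagorean_win_pct(800, 700)  # big offense route

print(f"700 scored / 600 allowed: {low_scoring:.3f}")
print(f"800 scored / 700 allowed: {high_scoring:.3f}")
```

Both hypothetical teams project well above .500. Under this formula the lower-scoring team comes out slightly ahead (roughly .576 to .566), because the same differential is a larger share of a smaller run environment.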

Or, you can assemble a team that will limit the amount of runs you’ll give up, by investing in defense. You will be able to compensate for average hitting and pitching, as you will boost your pitching staff’s effectiveness, and you will reduce the need for your offense to put up great numbers. Again, we have seen teams like this. The 2002 Braves were a combination of good defense, great pitching (aided by that defense), and average or perhaps even below-average offense; yet, this team won 101 games by scoring a mediocre 702 runs on the season (the average for the NL was 720 that season, 747 for all of baseball). Similarly, the 2011 Tampa Bay Rays put up 707 runs, against an American League average of 723, and still put up 91 wins and made the playoffs with good pitching and better defense. In fact, FIP would indicate their pitching was expected to perform right at American League average, a 4.08 ERA, yet they posted a 3.58 ERA.

Moreover, in that same season, the Los Angeles Angels won 86 games on just 667 runs, as they had even better pitching than the Rays. FIP would indicate the Angels’ pitching would be around a 3.94 ERA with league-average defense, but it was at a 3.57 ERA. The impact of good pitching paired with defense clearly is high, and I can’t think of a better final example than the 2010 World Series-winning San Francisco Giants, who couldn’t have embodied this structure any better: great pitching, great defense, and below-average offense.

So when one is trying to construct a team, and, unlike with the Yankees or Red Sox, money is a constraint, one might want to consider investing in defense. I say this because I looked directly at the relationship of a team’s payroll and their defensive ability, and it actually produced a negative relationship.

[Figure: team payroll vs. team defense]

I know this data may be influenced by the fact that salaries have increased essentially every year in the span between 2002 and 2015, but if this truly did influence the graph, it would point to one of two things. Teams recently may have lessened their focus on defense and spent on hitting and pitching (explaining why defense-oriented teams had smaller payrolls); or, even with the rising caps, teams have still been able to assemble winning rosters by focusing on defense. Whether it is the first condition or the second, or perhaps a combination of both, perhaps defense is undervalued in today’s MLB. I doubt I’m the first to figure this one out, but the Cubs have far and away the best defense in baseball. The Red Sox and Indians have stellar gloves as well, forming a solid second tier of defense that has put them in playoff position. So maybe Jason Heyward’s contract shouldn’t look so bad after all.

You don’t have to score a ton of runs to be a playoff baseball team. You just have to score more than the other team does, which can be done through limiting the amount of runs they score. It may seem like common sense, but common sense eludes us all at times.

There are many ways to construct a baseball team, and this might be just one more. And for stingy owners, it wouldn’t break the bank.