What Is a Pitcher? What Is a Batter?
When we consider individual baseball players, we think that we understand how to divide them into pitchers and hitters. Clayton Kershaw is a pitcher, we say confidently, and the recently traded Nori Aoki is a hitter. But from the perspective of the statistical record, the question can be a little harder to answer. Kershaw, after all, had appeared in six more games as a batter than as a pitcher through 2016, thanks to his occasional use as a pinch-hitter or pinch-runner, and Mr. Aoki stood on the mound and induced a fly out from Aaron Judge earlier this season (among other less satisfactory results). Is there a programmatic way to divide baseball players into hitters and pitchers from the perspective of the statistical record?
For me, the question isn’t merely academic: I am building a baseball trivia game, and it is very important for the rules of the game that I be able to programmatically divide baseball players into hitters and pitchers. In particular, I need to divide players into pitchers and hitters over the course of their careers, not merely from the standpoint of a particular season or game. And I need to do so definitively: a player can’t be both a pitcher and a hitter. The data that I am working with for my baseball trivia game comes from Sean Lahman’s database, and includes batter seasons and pitcher seasons back to the 1870s.
The Lahman database does not attempt to disambiguate between hitters and pitchers, merely including hitting seasons and pitching seasons. If a player only hit, that’s a hitter; if he only pitched, that’s a pitcher. Easy enough, but there are of course complications. Pitchers bat in real baseball, so there are lots of hitting seasons by pitchers in the data. And sometimes, as noted above, hitters pitch in blowouts, so there are pitching seasons by batters included as well.
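Because the data arrives as one row per player per season, a career-level view first requires summing games across those rows. Here is a minimal sketch of that roll-up, assuming each row has already been reduced to a (player ID, games) pair. The IDs and counts are made-up placeholders rather than real Lahman records, and `career_games` is a hypothetical helper name.

```python
from collections import Counter

def career_games(season_rows):
    """Sum per-season game counts into career totals keyed by player ID."""
    totals = Counter()
    for player_id, games in season_rows:
        totals[player_id] += games
    return totals

# Placeholder season rows, invented for illustration (not real data).
batting_seasons = [("player_a", 150), ("player_a", 140),
                   ("player_b", 30), ("player_b", 33)]
pitching_seasons = [("player_a", 1), ("player_b", 30), ("player_b", 32)]

batter_games = career_games(batting_seasons)
pitcher_games = career_games(pitching_seasons)

# A Counter returns 0 for players missing from one of the two tables.
for player_id in sorted(set(batter_games) | set(pitcher_games)):
    print(player_id, batter_games[player_id], pitcher_games[player_id])
```

A `Counter` is convenient here because a player who never pitched (or never batted) simply reads back as zero games instead of raising an error.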
Then there’s Babe Ruth, who really was both a pitcher and a hitter, you might say, throwing lots of innings in the 1910s before becoming a full-time hitter in the ’20s. What does it mean to pitch and hit “a lot”? Carlos Zambrano was a pitcher, informed baseball fans presumably agree. He was also a decent hitter and was used as a pinch-hitter fairly often. He’s not a batter, though. Right?
Here’s the programmatic metric that I’ve decided on and used to divide players in my game:
According to the Lahman database, there have been 5,277,522 batter games and 1,064,580 pitcher games in baseball history through 2016. That’s a ratio of about 4.95 batter games to 1 pitcher game. My claim is that this league-wide ratio is the dividing line: any player whose career ratio of “Games appeared in as a batter” to “Games appeared in as a pitcher” is higher than that is a batter, and the player is otherwise a pitcher. (A player who never pitched has no finite ratio and is simply a hitter.) Some data points that fall out of this classification:
Ruth: 2503 hitter games, 193 pitcher games, 12.9 ratio: Hitter
Kershaw (through 2016): 288 hitter games, 282 pitcher games, 1.02 ratio: Pitcher (Kershaw has been used as a pinch-hitter and pinch-runner, stupidly, from time to time)
Zambrano: 384 hitter games, 354 pitcher games, 1.08 ratio: Pitcher
Rick Ankiel: 653 hitter games, 51 pitcher games, 12.8 ratio: Hitter
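The rule and the data points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the game's actual code: `classify` is a hypothetical name, the comparison is cross-multiplied so that players with zero pitcher games need no special case, and a player sitting exactly on the threshold is called a pitcher. The exact quotient of the two league totals is closer to 4.957 than 4.95, and the ratios are printed to two decimals, so Ruth shows as 12.97.

```python
# League-wide totals through 2016, as quoted in the article.
TOTAL_BATTER_GAMES = 5_277_522
TOTAL_PITCHER_GAMES = 1_064_580
THRESHOLD = TOTAL_BATTER_GAMES / TOTAL_PITCHER_GAMES  # roughly 4.957

def classify(batter_games, pitcher_games, threshold=THRESHOLD):
    """Return "hitter" or "pitcher" from career game counts.

    Assumption: comparing batter_games to threshold * pitcher_games
    avoids dividing by zero for players who never pitched; ties go
    to "pitcher".
    """
    if batter_games > threshold * pitcher_games:
        return "hitter"
    return "pitcher"

# Career (batter games, pitcher games) for the four players listed above.
careers = {
    "Ruth": (2503, 193),
    "Kershaw": (288, 282),
    "Zambrano": (384, 354),
    "Ankiel": (653, 51),
}

for name, (bg, pg) in careers.items():
    print(f"{name}: {bg / pg:.2f} -> {classify(bg, pg)}")
```

Because every player gets exactly one label, the classification stays definitive even for someone like Ruth.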
We might be interested in “hittery” pitchers or “pitchery” hitters: players whose ratio of batter games to pitcher games approaches the dividing ratio of 4.95 to 1. By this metric, the “hittery-est pitcher” with a career of any length is Jimmy “Nixey” Callahan, who pitched and played left field for various Chicago teams and the Phillies in the late 1890s and early 1900s.
The “pitchery-est hitter” is John Ward, who was mostly a pitcher for the Providence Grays for seven years and then a middle infielder for various New York teams for a decade. He’s about twice as “pitchery” a hitter as Ruth.
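One way to search for these borderline players is to take the pitcher with the highest ratio and the hitter with the lowest. The sketch below assumes that reading of "hittery-est" and "pitchery-est". The name `borderline_players` and the `min_games` cutoff are hypothetical, since no specific career-length minimum is given here. It runs on only the four careers quoted earlier, so it names the closest players in that small sample rather than Callahan and Ward.

```python
THRESHOLD = 5_277_522 / 1_064_580  # league-wide ratio from the quoted totals

def borderline_players(careers, threshold=THRESHOLD, min_games=0):
    """Return (hittery-est pitcher, pitchery-est hitter) by name.

    careers maps a name to (batter_games, pitcher_games). min_games is
    a hypothetical career-length cutoff applied to the larger count.
    """
    pitchers, hitters = [], []
    for name, (bg, pg) in careers.items():
        if pg == 0 or max(bg, pg) < min_games:
            continue  # no finite ratio to rank, or career too short
        ratio = bg / pg
        (hitters if ratio > threshold else pitchers).append((ratio, name))
    # The pitcher nearest the line has the highest ratio below it;
    # the hitter nearest the line has the lowest ratio above it.
    hittery_pitcher = max(pitchers)[1] if pitchers else None
    pitchery_hitter = min(hitters)[1] if hitters else None
    return hittery_pitcher, pitchery_hitter

# The four careers quoted in the article: (batter games, pitcher games).
sample = {
    "Ruth": (2503, 193),
    "Kershaw": (288, 282),
    "Zambrano": (384, 354),
    "Ankiel": (653, 51),
}

print(borderline_players(sample))
```

Players who never pitched are skipped because they have no finite ratio to rank, which matches the intuition that a pure hitter is not "pitchery" at all.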
Most of the real double-duty guys played in the dead-ball era. A man named Hal Jeffcoat played CF and often provided relief innings for some lousy Cubs and Reds teams in the 1950s. Eno Sarris mentioned him in the context of an article on two-way players earlier this year. In our modern era of extreme specialization, not-too-good OF turned not-too-good pitcher Brooks Kieschnick is about as close as it gets.
It might be slightly more precise to use Innings Pitched and Innings As A Batter Or Fielder, but that would introduce some problems (that I am eliding here) and probably wouldn’t move the ratio very much. What do you think? How would you programmatically and consistently divide players into batters and pitchers?
If you’d like to be a beta tester for the trivia game, or be kept in the loop for when the game is released, sign up here.