Foundations of Batting Analysis – Part 1: Genesis

This was originally written as a single piece of research, but as it grew in length far beyond what I originally anticipated, I’ve broken it into three parts for ease of digestion. In each part, I have linked to images of the original source material when possible. There has been nothing quite as frustrating in researching the creation of baseball statistics as being misled by faulty citations, so I figured including actual copies of the original material would mitigate this issue for future researchers. Full bibliographic citations will be included for the entirety of the paper at the conclusion of Part III.

“[Statistics’] object is the amelioration of man’s condition by the exhibition of facts whereby the administrative powers are guided and controlled by the lights of reason, and the impulses of humanity impelled to throb in the right direction.”

–Joseph C. G. Kennedy, Superintendent of the United States Census, 1859

In a Thursday afternoon game at Marlins Park last season, Yasiel Puig faced Henderson Alvarez in the top of the fourth inning and demolished a first-pitch slider to straight-away center field. As Puig flipped his bat with characteristic flair and began to trot towards first base, remnants of the ball soared over the head of Justin Ruggiano and hit the highest point on the 16-foot wall, 418 feet away from home plate; Puig coasted into second base with a stand-up double.

Two months earlier, in another afternoon game, this time at Yankee Stadium, Puig hit the ball sharply onto the ground between Reid Brignac and second base, sending it rolling into left-center field. Puig sprinted towards first base, rounding the bag hard before Brett Gardner was able to gather the ball. Gardner made a strong, accurate throw into second base, but it was a moment too late; Puig slid into second, safe with a double.

In MLB 13: The Show, virtual Yasiel Puig faced virtual Justin Verlander in Game Seven of the Digital World Series. Verlander had managed to get two outs in the inning, but the bases were loaded as Puig came to the plate. The Tiger ace reared back and threw the 100-mph heat the Dodger phenom was expecting. Puig began his swing but, at the moment of contact, there was a glitch in the game. Suddenly, Puig was standing on second base, all three baserunners had scored, and Verlander had the ball again; “DOUBLE” flashed on the scoreboard.

If the outcome is the same, is there any difference between a monster fly ball, a well-placed groundball, and a glitch in the matrix?

Analysis of batting presented over the past 150 years has suggested that the answer is no – a double is a double. However, with detailed play-by-play information compiled over the last few decades, we can show that the traditional concepts of the “clean hit” and “effective batting” have limited our ability to accurately measure value produced by batters. I’d like to begin by examining how the hit found its way into the baseball lexicon and how it has impacted player valuation for the entire history of the professional game.

The earliest account of a baseball game that included a statistical chart, the first primordial box score, appeared in the 22 October 1845 issue of the New York Morning News edited by J. L. O’Sullivan. This “abstract” recorded two statistics—runs scored and “hands out”—for the eight players on each team (the number of players wasn’t standardized to nine until 1857). Runs scored was the same as it is today, while hands out counted the total number of outs a player made both as a batter and as a baserunner.

For the next two decades, statistical accounting of baseball games was limited to these two statistics and basic variations of them. Through the bulk of this period, the box score was little more than an addendum to the game story – a way to highlight specific contributions made by each player in a game. It wasn’t until 1859 that a music teacher turned sports journalist took the first steps in developing methods to examine the general effectiveness of batters.

Henry Chadwick had immigrated to Brooklyn from Exeter, England, with his parents and younger sister a few weeks before his 13th birthday in 1837. He came from a family of reformists guided by the Age of Enlightenment. Henry’s grandfather, Andrew, was a friend and follower of John Wesley, who helped form a movement within the Church of England in the mid-18th century aimed at combining theological reflection with rational analysis that became known as Methodism. Henry’s father, James, spent time in Paris in the late 18th century in support of the French Revolution and stressed the importance of education to learn how to “distinguish truth from error to combat the evil propensities of our nature.” Henry’s half-brother, Edwin, 24 years Henry’s senior, was a disciple of Jeremy Bentham, whose philosophies on reason, efficiency, and utilitarianism inspired Edwin’s work on improving sanitation and conditions for the poor in England, eventually earning him a knighthood. This rational approach to reform, so prevalent in his family, is easy to see in Henry Chadwick’s later promotion of baseball.

Chadwick’s work as a journalist began at least as early as 1843 with the Long Island Star, when he was just 19 years old, but as a young adult he worked primarily as a music teacher and composer. By the 1850s, his focus had turned to journalism. While his early writing was on cricket, he eventually moved to covering baseball in assorted New York City and Brooklyn periodicals. Retrospectively, Chadwick described his initial interest in promoting baseball, and outdoor games and sports in general, as a way to improve public health, both physically and psychologically. In The Game of Base Ball, published in 1868, Chadwick recounted a thought he had had over a decade earlier:

“…that from this game of ball a powerful lever might be made by which our people could be lifted into a position of more devotion to physical exercise and healthful out-door recreation than they had hitherto, as a people, been noted for.”

From his writing on baseball during the 1850s, Chadwick became such a significant voice for the sport that, in 1857, he was invited to suggest amendments at the meeting of the “Committee to Draft a Code of Laws on the Game of Base Ball” for a convention of delegates representing 16 baseball clubs (two of which were absent) based in and around New York City and Brooklyn. The Convention of 1857 laid down rules standardizing games played by those clubs, including setting the number of innings in a game to nine, the number of players on a side to nine, and the distance between the bases to 90 feet. The following year, another convention was held, now with delegates from 25 teams, which formed the first permanent organizing body for baseball: the National Association of Base Ball Players (NABBP).[i] The “Constitution,” “By-Laws,” and “Rules and Regulations of the Game of Base Ball” adopted by the NABBP for that year were printed in the 8 May 1858 issue of the New York Clipper.

As the rules were being unified among New York teams, the methods used to recount games were evolving. By 1856, early versions of the line score, an inning-by-inning tally of the number of runs scored by each team, were being tested in periodicals, like this one from the 9 August issue of the Clipper. On 13 June 1857, the Clipper included its first use of a traditional line score for the opening game of the season between the Knickerbockers and the Eagles.[ii] In August 1858, Chadwick—who by this time had become the Clipper’s baseball reporter—began testing out various other statistics, noting the types of outs each player was making and the number of pitches by each pitcher. A game on 7 August 1858, between the Resolutes and the Niagaras, featured 812 total pitches in eight innings before the game was called due to darkness.

In 1859, Chadwick conducted a seasonal analysis of the performance of baseball players—the first of its kind. In the 10 December issue of the Clipper, the Excelsior Club’s performance during the prior season was analyzed through a pair of charts titled, “Analysis of the Batting” and “Analysis of the Fielding.” Most notably, within the “Analysis of the Batting” were two columns, both titled “Average and Over.” These columns reflected the number of runs per game and outs per game by each player during the season – the forebears of batting average. The averages were written in the cricket style of X—Y, where X is the number of runs or outs per game divided evenly (the “average”) and Y is the remainder (the “over”). For instance, Henry Polhemus scored 31 runs in 14 games for the Excelsiors in the 1859 season, an average of 2—3 (14 divides evenly into 31 twice, leaving a remainder of 3). Runs and outs per game became standard inclusions in annual batting analyses over the next decade.
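To make the arithmetic concrete, here is a minimal Python sketch of the "average and over" notation next to the decimal form that replaced it. The function name `average_and_over` is my own, chosen only for illustration, and the separator is printed as a plain hyphen; the Polhemus figures are the ones quoted above.

```python
def average_and_over(total: int, games: int) -> str:
    """Format a per-game figure in the cricket style: whole average, then remainder."""
    if games <= 0:
        raise ValueError("games must be a positive integer")
    # divmod returns (quotient, remainder) in one step:
    # the quotient is the "average" and the remainder is the "over".
    average, over = divmod(total, games)
    return f"{average}-{over}"


# Henry Polhemus, 1859 Excelsiors: 31 runs in 14 games.
polhemus_runs, polhemus_games = 31, 14

print(average_and_over(polhemus_runs, polhemus_games))  # 2-3
print(round(polhemus_runs / polhemus_games, 2))         # 2.21 in decimal form
```

The cricket notation throws away nothing, since the remainder can always be divided back out, but it makes players with different game totals awkward to compare at a glance, which is one plausible reason the decimal form eventually won out.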

These seasonal averages marked a significant leap forward for baseball analysis, and yet, their foundation, runs and outs, was the same as that used for nearly every statistic in baseball’s brief history. It’s important to note that the baseball players and journalists covering the sport in this period all generally had a cricket background.[iii] In cricket, there are three possible outcomes on any pitch: a run is scored, an out is made, or nothing changes. When the batter successfully moves from base to base in cricket, he is scoring a run; there are no intermediate base states like those that exist in baseball. Consequently, the number of runs a cricket player scores tends to be a very accurate representation of the value he provided his team as a batter.

In baseball, batters rarely score due solely to their performance at the plate. Excluding outside-the-park home runs, successfully rounding the bases to score a run requires baserunning, fielding, help from teammates, and the general randomness that happens in games. It was 22 years after the appearance of that first box score in the New York Morning News before an attempt was made to isolate a player’s batting performance.

In June 1867, Chadwick began editing a weekly periodical called The Ball Players’ Chronicle – the first newspaper devoted “to the interest of the American game of base ball and kindred sports of the field.” To open the first issue on 6 June, a three-game series between the Harvard College Club and the Lowell Club of Boston was recounted. The deciding game, a 39-28 Harvard victory to win the “Championship of New England,” received a detailed, inning-by-inning recap of the events, followed by a box score. The primary columns of the chart featured runs and outs, as always. What was noteworthy about this box score, though, was the inclusion of a list titled “Bases Made on Hits,” reflecting the number of times each player reached first base on a clean hit. Writers had described batters reaching base on hits in their game accounts since the 1850s, but it was always just a rhetorical device to describe the action of the game. This was the first time anyone counted those occurrences as a measurement of batting performance.

Three months after this game account, in the 19 September issue of the Chronicle, Chadwick explained his rationale for counting hits in an editorial titled “The True Test of Batting”:

“Our plan of adding to the score of outs and runs the number of times…bases are made on clean hits will be found the only fair and correct test of batting; and the reason is, that there can be no mistake about the question of a batsman’s making his first base, that is, whether by effective batting, or by errors in the field…whereas a man may reach his second or third base, or even get home, through…errors which do not come under the same category as those by which a batsman makes his first base…

In the score the number of bases made on hits should be, of course, estimated, but as a general thing, and especially in recording the figures by the side of the outs and runs, the only estimate should be that of the number of times in a game on which bases are made on clean hits, and not the number of bases made.”

Taking his own advice, Chadwick printed “the number of times in a game on which bases are made on clean hits” side-by-side with runs and outs for the first time in the same 19 September issue of the Chronicle.[iv] Within a few months, most major newspapers covering baseball had begun including hits in the main body of their box scores as well. The hit had become baseball’s first unique statistic.

By 1868, hits had permeated the realm of averages. On 5 December of that year, the Clipper included a chart on the “Club Averages” for the Cincinnati Club.[v] In addition to listing runs per game and outs per game for each player, the chart included “Average to game of bases on hits,” the progenitor of the modern batting average. All three of these averages were listed in decimal form for the first time in the Clipper. A year later, on 4 December 1869, “Average total bases on hits to a game” appeared as well in the Clipper, the precursor to slugging average.

As hits per game became the standard measurement of “effective batting” over the next few seasons, H. A. Dobson of the Clipper noted an issue with this “batting average” in a letter he wrote to Nick E. Young, the Secretary of the Olympic Club in Washington, D.C., and future president of the National League, who would be attending the Secretaries’ Meeting of the newly formed National Association of Professional Base Ball Players (NAPBBP).[vi] The letter, which was published in the Clipper on 11 March 1871, was “on the subject of a new and accurate method of making out batting averages.”

Dobson was a strong proponent of using hits to form batting averages, noting that “times first base on clean hits…is the correct basis from which to work a batting average, as he who makes his first base by safe hitting does more to win a game than he who makes his score by a scratch. This is evident.” He notes, though, that measuring the average on a per-game basis does not allow for comparison of teammates, as the “members of the same nine do not have the same or equal chance to run up a good score,” and it does not allow the comparison of players across teams, “as the clubs seldom play an equal number of games.” Dobson continues:

“In view of these difficulties, what is the correct way of determining an average so that justice may be done to all players?

This question is quickly answered, and the method easily shown.

According to a man’s chances, so should his record be. Every time he goes to the bat he either has an out, a run, or is left on his base. If he does not go out he makes his base, either by his own merit or by an error of some fielder. Now his merit column is found in ‘times first base on clean hits,’ and his average is found by dividing his total ‘times first base on clean hits’ by his total number of times he went to the bat. Then what is true of one player is true of all…In this way, and in no other, can the average of players be compared…

It is more trouble to make up an average this way than up the other way. One is erroneous, one is right.”

At the end of the letter, Dobson includes a calculation, albeit for theoretical players, of hits per at-bat—the first time it was ever published.
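Dobson's objection is easy to see with a small sketch. The two batting lines below are hypothetical numbers of my own, not figures from his letter, and the names `BattingLine`, `hits_per_game`, and `hits_per_time_at_bat` are likewise invented for illustration. The assumption is simply that a leadoff man gets more trips to the plate than a ninth batter over the same number of games.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    name: str
    games: int
    times_at_bat: int  # Dobson's denominator: every trip to the plate
    clean_hits: int    # "times first base on clean hits"


def hits_per_game(line: BattingLine) -> float:
    """The older per-game average that Dobson called erroneous."""
    return line.clean_hits / line.games


def hits_per_time_at_bat(line: BattingLine) -> float:
    """Dobson's proposal: clean hits divided by times at bat."""
    return line.clean_hits / line.times_at_bat


# Hypothetical teammates who played the same ten games.
leadoff = BattingLine("Leadoff man", games=10, times_at_bat=60, clean_hits=24)
ninth = BattingLine("Ninth batter", games=10, times_at_bat=45, clean_hits=20)

for line in (leadoff, ninth):
    print(
        f"{line.name}: {hits_per_game(line):.2f} per game, "
        f"{hits_per_time_at_bat(line):.3f} per time at bat"
    )
```

Per game, the leadoff man looks better (2.40 hits to 2.00); per time at bat, the ninth batter does (.444 to .400). The ranking flips only because the two men did not have "the same or equal chance to run up a good score," which is the difficulty Dobson set out to remove.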

Thus, the modern batting average was born.[vii]


[i] The Chicago Cubs can trace their lineage back to the Chicago White Stockings, who formed in 1870 and are the lone surviving member of the NABBP. The Great Chicago Fire in 1871 destroyed all of their equipment and their new stadium, the Union Base-Ball Grounds, only a few months after it opened, holding them out of competition for two years. If not for the fire, the Cubs would be the oldest continuously operating franchise in American sports. That honor instead goes to the Atlanta Braves, who were founding members of the National Association of Professional Base Ball Players (NAPBBP) in 1871 as the Boston Red Stockings.

[ii] Though the game was described as the “first regular match of Base Ball played this season,” it did not abide by the rules set forth in the Convention of 1857 that occurred just a few months prior. Rather, the teams appear to have been playing under the 1854 rules agreed to by the Knickerbockers, Gothams, and Eagles where the winner was the first to score 21 runs.

[iii] The first known issue of cricket rules was formalized in 1744 in London, England, and brought to America in 1754 by Benjamin Franklin, 91 years before William R. Wheaton and William H. Tucker drafted the Rules and Regulations of the Knickerbocker Base Ball Club, the first set of baseball rules officially adopted by a club. Years later, Wheaton claimed to have written rules for the Gotham Base Ball Club in 1837, on which the Knickerbocker rules were based, but there is no existing copy of those rules. Early forms of cricket and baseball were played well before the rules of either were officially adopted, but trying to put a start date on each game before the formal inception of its rules is effectively impossible.

[iv] There is an oft-cited article written by H. H. Westlake in the March 1925 issue of Baseball Magazine, titled “First Baseball Box Score Ever Published,” in which Westlake claims that Chadwick invented the modern box score, one that included runs, hits, put outs, assists, and errors, in a “summer issue” of the New York Clipper in 1859. However, the box score provided by Westlake doesn’t actually exist, at least not in the Clipper. For comparison, here is the Westlake box score printed side-by-side with a box score printed in the 10 September 1859 issue of the Clipper. While the players are listed in the same order, and the run totals are identical (and the total put outs are nearly identical), the other statistics are completely imaginary.

[v] This club, featuring the renowned Harry Wright, became the first professional club in the following season, 1869, when the NABBP began to allow professionalism.

[vi] The NAPBBP is more commonly known today as, simply, the National Association (NA). However, before the NAPBBP formed, the common name for the NABBP was also the National Association. It seems somewhat disingenuous after the fact to call the later league the National Association, but I suppose it’s easier than saying all those letters.

[vii] I immediately take this back, but only on a technicality. “Hits per at-bat” is the modern form of batting average, but at-bats as defined by Dobson are not the same as what we use today. Dobson defined a time at bat as the number of times a batter makes an “out, a run, or is left on his base.” In the decades after the article was published, “times at bat” began to exclude certain events. Notably, walks were excluded beginning in 1877 (with a quick reappearance in 1887 when they were counted the same as hits), times hit by the pitcher were excluded in 1887, sacrifice bunts in 1894, catcher’s interference in 1907, and sacrifice flies in 1908 (though sacrifice flies went in and out of the rules multiple times over the next few decades, and weren’t firmly excluded until 1954).
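As a rough illustration of how much the denominator matters, the sketch below compares a Dobson-style average, which counts every trip to the plate, with one that uses the modern exclusions listed in this note. The season line is hypothetical, the function name `modern_at_bats` is my own, and I am assuming that every trip to the plate ends in exactly one of Dobson's three outcomes, so that his "times at bat" equals total trips.

```python
def modern_at_bats(
    trips_to_plate: int,
    walks: int = 0,
    hit_by_pitch: int = 0,
    sacrifice_bunts: int = 0,
    sacrifice_flies: int = 0,
    catcher_interference: int = 0,
) -> int:
    """Remove the events that no longer count as a time at bat."""
    excluded = (
        walks
        + hit_by_pitch
        + sacrifice_bunts
        + sacrifice_flies
        + catcher_interference
    )
    if excluded > trips_to_plate:
        raise ValueError("excluded events cannot exceed trips to the plate")
    return trips_to_plate - excluded


# Hypothetical season: 600 trips to the plate and 150 hits.
trips, hits = 600, 150
at_bats = modern_at_bats(
    trips, walks=60, hit_by_pitch=5, sacrifice_bunts=2, sacrifice_flies=5
)

print(at_bats)                   # 528
print(f"{hits / trips:.3f}")     # 0.250 with Dobson's denominator
print(f"{hits / at_bats:.3f}")   # 0.284 with the modern denominator
```

The same 150 hits read as a .250 average under Dobson's definition and a .284 average under the modern one, so the technicality is not a trivial one.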





Dang
9 years ago

Fantastic article. Seriously awesome.

Locke
9 years ago

Very well researched, I’m interested to see where we go in part 2.

Nick Mandarano
9 years ago

Absolutely amazing article. Weird to think such simple and common sense statistics were once groundbreaking. Makes me feel confident that statistics are ever-growing and we have to widen our horizons when it comes to sabermetrics.