
Foundations of Batting Analysis – Part 3: Run Creation

I’ve decided to break this final section in half and address the early development of run estimation statistics first, and then examine new ways to make these estimations next week. In Part 1, we examined the early development of batting statistics. In Part 2, we broke down the weaknesses of these statistics and introduced new averages based on “real and indisputable facts.” In Part 3, we will examine methods used to estimate the value of batting events in terms of their fundamental purpose: run creation.

The two main objectives of batters are to not cause an out and to advance as many bases as possible. These objectives exist as a way for batters to accomplish the most fundamental purpose of all players on offense: to create runs. The basic effective averages presented in Part 2 provide a simple way to observe the rate at which batters succeed at their main objectives, but they do not tell us how those successes lead to the creation of runs. To gather this information, we’ll apply a method of estimating the run values of events that can trace its roots back nearly a century.

The earliest attempt to estimate the run value of batting events came in the March 1916 issue of Baseball Magazine. F.C. Lane, editor of the magazine, discussed the weakness of batting average as a measure of batting effectiveness in an article titled “Why the System of Batting Averages Should be Changed”:

“The system of keeping batting averages…gives the comparative number of times a player makes a hit without paying any attention to the importance of that hit. Home runs and scratch singles are all bulged together on the same footing, when everybody knows that one is vastly more important than the other.”

To address this issue, Lane considered the fundamental purpose of making hits.

“Hits are not made as mere spectacular displays of batting ability; they are made for a purpose, namely, to assist in the all-important labor of scoring runs. Their entire value lies in their value as run producers.”

In order to measure the “comparative ability” of batters, Lane suggests a general rule for evaluating hits:

“It would be grossly inaccurate to claim that a hit should be rated in value solely upon its direct and immediate effect in producing runs. The only rule to be applied is the average value of a hit in terms of runs produced under average conditions throughout a season.”

He then proposed a method to estimate the value of each type of hit based on the number of bases that the batter and all baserunners advanced on average during each type of hit. Lane’s premise was that each base was worth one-fourth of a run, since a player must advance through four bases to score. By accounting for all of the bases advanced by a batter and the baserunners due to a hit, he could determine the number of runs that the hit created. However, as the data necessary to actually implement this method did not exist in March 1916, the work done in this article was little more than a back-of-the-envelope calculation built on assumptions concerning how often baserunners were on base during hits and how far they tended to advance because of those hits.

As he wanted to conduct a rigorous analysis with this method, Lane spent the summer of 1916 compiling data on 1,000 hits from “a little over sixty-two games”[i] to aid him in this work. During these games, he would note “how far the man making the hit advanced, whether or not he scored, and also how far he advanced other runners, if any, who were occupying the bases at the time.” Additionally, in any instance when a batter who had made a hit was removed from the base paths due to a subsequent fielder’s choice, he would note how far the replacement baserunner advanced.

Lane presented this data in the January 1917 issue of Baseball Magazine in an article titled similarly to his earlier work: “Why the System of Batting Averages Should be Reformed.” Using the collected data, Lane developed two methods for estimating the run value that each type of hit provided for a team on average. The first method, the one he initially presented in March 1916, which I’ll call the “advancement” method,[ii] counted the total number of bases that the batter and the baserunners advanced during a hit, and any bases that were advanced to by batters on a fielder’s choice following a hit (an addition not included in the first article). For example, of the 1,000 hits Lane observed, 789 were singles. Those singles resulted in the batter advancing 789 bases, runners on base at the time of the singles advancing 603 bases, and batters on fielder’s choice plays following the singles advancing to 154 bases – a total of 1,546 bases. With each base estimated as being worth one-fourth of a run, these 1,546 bases yielded 386.5 runs – an average value of .490 runs per single. Lane repeated this process for doubles (.772 runs), triples (1.150 runs), and home runs (1.258 runs).
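The advancement arithmetic for singles is easy to check. Below is a minimal Python sketch; the function name `advancement_value` and its parameters are hypothetical labels for the quantities Lane tallied, and the only rule built in is his premise that each base advanced is worth one-fourth of a run.

```python
# Minimal sketch of Lane's "advancement" method (function and parameter names are hypothetical).
RUNS_PER_BASE = 0.25  # Lane's premise: four bases advanced make one run


def advancement_value(batter_bases: int, runner_bases: int,
                      fc_bases: int, events: int) -> float:
    """Average runs per hit when every base advanced counts as 1/4 run.

    batter_bases: bases advanced by the batters who made the hits
    runner_bases: bases advanced by runners already on base at the time
    fc_bases:     bases reached by batters on fielder's choice plays after the hits
    events:       number of hits of this type
    """
    total_bases = batter_bases + runner_bases + fc_bases
    total_runs = total_bases * RUNS_PER_BASE
    return total_runs / events


# Lane's figures for his 789 singles
single_value = advancement_value(batter_bases=789, runner_bases=603,
                                 fc_bases=154, events=789)
print(f"{single_value:.3f}")  # 0.490
```

Plugging in the matching tallies for doubles, triples, and home runs would give the other advancement values, though those tallies aren’t quoted here.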

At some point during his research, however, Lane decided that a second method, which I’ll call the “instrumentality” method, was preferable.[iii] In this method, Lane considered the number of runs that were scored because of each hit (RBI), the runs scored by the batters that made each hit, and the runs scored by baserunners that reached on a fielder’s choice following a hit. For instance, of the 789 singles that Lane observed, there were 163 runs batted in, 182 runs scored by the batters that hit the singles, and 16 runs scored by runners that reached on a fielder’s choice following a single. The 361 runs “created” by the 789 singles yielded an average value of .457 runs per single. This method was repeated for doubles (.786 runs), triples (1.150 runs), and home runs (1.551 runs).
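The instrumentality method simply counts runs instead of bases. Here is a minimal Python sketch; `instrumentality_value` and its parameter names are hypothetical. For the singles, 361 runs over 789 singles comes to about 0.4575 runs per single, which Lane reported as .457.

```python
# Minimal sketch of Lane's "instrumentality" method (names are hypothetical).
def instrumentality_value(rbi: int, batter_runs: int,
                          fc_runs: int, events: int) -> float:
    """Average runs per event credited through the event's "instrumentality".

    rbi:         runs batted in on the events
    batter_runs: runs later scored by the batters who made them
    fc_runs:     runs scored by runners who replaced those batters on a fielder's choice
    events:      number of events of this type
    """
    runs = rbi + batter_runs + fc_runs
    return runs / events


single_value = instrumentality_value(rbi=163, batter_runs=182, fc_runs=16, events=789)
print(f"{single_value:.4f}")  # 0.4575
```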

In March 1917, Lane went one step further. In an article titled “The Base on Balls,” Lane decried the treatment of walks by the official statisticians and aimed to estimate their value. In 1887, the National League had counted walks as hits in an effort to reward batters for safely reaching base, but the sudden rise in batting averages was so off-putting that the method was quickly abandoned following the season. As Lane put it:

“…the same potent intellects who had been responsible for this wild orgy of batting reversed their august decision and declared that a base on balls was of no account, generally worthless and henceforth even forever should not redound to the credit of the batter who was responsible for such free transportation to first base.

The magnates of that far distant date evidently had never heard of such a thing as a happy medium…‘Whole hog or none’ was the noble slogan of the magnates of ’87. Having tried the ‘whole’ they decreed the ‘none’ and ‘none’ it has been ever since…

‘The easiest way’ might be adopted as a motto in baseball. It was simpler to say a base on balls was valueless than to find out what its value was.”

Lane attempted to correct this disservice by applying his instrumentality method to walks. Over the same sample of 63 games in which he collected information on the 1,000 hits, he observed 283 walks. Those walks yielded six runs batted in, 64 runs scored by the batter, and two runs scored by runners that replaced the initial batter due to a fielder’s choice. Through this method, Lane calculated the average value of a walk as .254 runs.[iv]

Each method Lane used was certainly affected by his limited sample of data. The proportions of each type of hit that he observed were similar to the annual rates in 1916, but the examination of only 1,000 hits made it easy for randomness to affect the calculation, particularly for the low-frequency events. Had five fewer runners been on first base at the time of the 29 home runs observed by Lane, the average value of a home run would have dropped from 1.258 runs to 1.129 runs using the advancement method and from 1.551 runs to 1.379 runs using the instrumentality method. It’s hard to trust values that are so easily affected by a slight change in circumstances.
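That sensitivity check is plain arithmetic. Under the advancement method, a runner on first advances three bases on a home run (three-fourths of a run); under the instrumentality method, he scores exactly one run. Spreading five such runners over 29 home runs gives the drops quoted. A short Python sketch, with a hypothetical function name:

```python
HOME_RUNS_OBSERVED = 29  # Lane's sample of home runs
RUNS_PER_BASE = 0.25     # advancement-method premise


def value_drop(runners_removed: int) -> tuple[float, float]:
    """Per-home-run drop in value if `runners_removed` fewer runners had been on first."""
    # Advancement: each runner on first advances three bases on a home run.
    advancement = runners_removed * 3 * RUNS_PER_BASE / HOME_RUNS_OBSERVED
    # Instrumentality: each runner on first scores one run.
    instrumentality = runners_removed * 1 / HOME_RUNS_OBSERVED
    return advancement, instrumentality


adv_drop, inst_drop = value_drop(5)
print(f"{1.258 - adv_drop:.3f}")   # 1.129
print(f"{1.551 - inst_drop:.3f}")  # 1.379
```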

Lane was well aware of these limitations, but treated the work more as an exercise to prove the merit of his rationale, rather than an official calculation of the run values. In an article in the February 1917 issue of Baseball Magazine titled, “A Brand New System of Batting Averages,” he notes:

“Our sample home runs, which numbered but 29, were of course less accurate. But we did not even suggest that the values which were derived from the 1,000 hits should be incorporated as they stand in the batting averages. Our labors were undertaken merely to show what might be done by keeping a sufficiently comprehensive record of the various hits…our data on home runs, though less complete than we could wish, probably wouldn’t vary a great deal from the general averages.”

In the same article, Lane applied the values calculated with the instrumentality method to the batting statistics of players from the 1916 season, creating a statistic he called Batting Effectiveness, which measured the number of runs per at-bat that a player created through hits. The leaderboard he included is the first example of batters being ranked with a run average since runs per game in the 1870s.

Lane didn’t have a wide audience ready to appreciate a run estimation of this kind, and it gained little attention going forward. In his March 1916 article, Lane referenced an exchange he had with the Secretary of the National League, John Heydler, concerning how batting average treats all hits equally. Heydler responded:

“…the system of giving as much credit to singles as to home runs is inaccurate…But it has never seemed practicable to use any other system. How, for instance, are you going to give the comparative values of home runs and singles?”

Seven years later, by which point Heydler had become President of the National League, the method to address this issue was chosen. In 1923, the National League adopted the slugging average—total bases on hits per at-bat—as its second official average.

While Lane’s work on run estimation faded away, another method to estimate the run value of individual batting events was introduced nearly five decades later, in the July/August 1963 issue of Operations Research. George R. Lindsey, a Canadian military strategist with a passion for baseball, wrote an article for the journal titled “An Investigation of Strategies in Baseball.” In it, Lindsey proposed a novel approach to measuring the value of any event in baseball, including batting events.

Lindsey began by observing all or parts of 373 games from 1959 through 1960 by radio, television, or personal attendance, compiling 6,399 half-innings of play-by-play data. With this information, he calculated P(r|T,B), “the probability that, between the time that a batter comes to the plate with T men out and the bases in state B,[v] and the end of the half-inning, the team will score exactly r runs.” For example, P(0|0,0), that is, the probability of exactly zero runs being scored from the time a batter comes to the plate with zero outs and the bases empty through the end of the half-inning, was found to be 74.7 percent; P(1|0,0) was 13.6 percent, P(2|0,0) was 6.8 percent, etc.

Lindsey used these probabilities to calculate the average number of runs a team could expect to score following the start of a plate appearance in each of the 24 out/base states: E(T,B).[vi] The table that Lindsey produced including these expected run averages reflects the earliest example of what we now call a run expectancy matrix.
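As footnote [vi] explains, E(T,B) can be computed either Lindsey’s way, by summing r × P(r|T,B), or as the total runs scored from that state divided by the number of plate appearances in it. The Python sketch below shows that the two always agree on the same sample; the sample is hypothetical, not Lindsey’s data, and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def expected_runs_from_distribution(runs_to_end: list[int]) -> Fraction:
    """Lindsey-style: estimate P(r|T,B) from the sample, then sum r * P(r|T,B)."""
    n = len(runs_to_end)
    probabilities = {r: Fraction(count, n) for r, count in Counter(runs_to_end).items()}
    return sum((r * p for r, p in probabilities.items()), Fraction(0))


def expected_runs_from_totals(runs_to_end: list[int]) -> Fraction:
    """Modern shortcut: R(T,B) / N(T,B)."""
    return Fraction(sum(runs_to_end), len(runs_to_end))


# Hypothetical: runs scored from the start of each plate appearance in one
# out/base state through the end of its half-inning.
sample = [0, 0, 0, 1, 0, 2, 0, 0, 1, 3]
print(expected_runs_from_distribution(sample))  # 7/10
print(expected_runs_from_totals(sample))        # 7/10
```

Exact fractions are used only so the equality is visible without floating-point noise.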

With this tool in hand, Lindsey began tackling assorted questions in his paper, culminating with a section on “A Measure of Batting Effectiveness.” He suggested an approach to assessing batting effectiveness based on three assumptions:

“(a) that the ultimate purpose of the batter is to cause runs to be scored

(b) that the measure of the batting effectiveness of an individual should not depend on the situations that faced him when he came to the plate (since they were not brought about by his own actions), and

(c) that the probability of the batter making different kinds of hits is independent of the situation on the bases.”

Lindsey focused his measurement of batting effectiveness on hits. To estimate the run values of each type of hit, Lindsey observed that “a hit which converts situation {T,B} into {T',B'} increases the expected number of runs by E(T',B') – E(T,B).” For example, a single hit in out/base state {0,0} will yield out/base state {0,1}. If you consult Lindsey’s run expectancy table, you’ll note that this creates a change in run expectancy, as calculated by Lindsey, of .352 runs (.813 – .461). By repeating this process for each of the 24 out/base states, and weighting the values based on the relative frequency in which each out/base state occurred, the average value of a single was found to be 0.41 runs.[vii] This was repeated for doubles (0.82 runs), triples (1.06 runs), and home runs (1.42 runs). By applying these weights to a player’s seasonal statistics, Lindsey created a measurement of batting effectiveness in terms of “equivalent runs” per time at bat.
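Lindsey’s two steps, the change in run expectancy for a single transition and the frequency-weighted average across out/base states, can be sketched in Python. Only the two run expectancy entries for {0,0} and {0,1} come from the text; the states and frequencies in the weighting example are hypothetical. The `runs_scored` term (runs that cross the plate on the play itself, zero in the bases-empty example) is an assumed, standard addition rather than part of the quoted passage.

```python
# Run expectancy keyed by (outs, base state). Only these two values are Lindsey's;
# a full matrix would have 24 entries.
RUN_EXPECTANCY = {
    (0, "0"): 0.461,  # no outs, bases empty
    (0, "1"): 0.813,  # no outs, man on first
}


def transition_value(before: tuple[int, str], after: tuple[int, str],
                     runs_scored: int = 0) -> float:
    """Change in expected runs caused by one event.

    runs_scored is an assumed addition: runs that score on the play itself.
    """
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


def weighted_event_value(values: dict, frequencies: dict) -> float:
    """Average an event's per-state values, weighted by how often each state occurred."""
    total = sum(frequencies.values())
    return sum(values[state] * frequencies[state] for state in values) / total


single_from_empty = transition_value((0, "0"), (0, "1"))
print(f"{single_from_empty:.3f}")  # 0.352

# Hypothetical two-state illustration of the weighting step
toy_values = {"state A": 0.352, "state B": 0.500}
toy_freqs = {"state A": 3, "state B": 1}
print(f"{weighted_event_value(toy_values, toy_freqs):.3f}")  # 0.389
```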

As with Lane’s methods, the work done by Lindsey was not widely appreciated at first. However, 21 years after his article was published in Operations Research, his system was repurposed and presented in The Hidden Game of Baseball by John Thorn and Pete Palmer, the man who helped make on base average an official statistic just a few years earlier. Using play-by-play accounts of 34 World Series games from 1956 through 1960,[viii] and simulations of games based on data from 1901 through 1977, Palmer rebuilt the run expectancy matrix that Lindsey introduced two decades earlier.

In addition to measuring the average value of singles (.46 runs), doubles (.80 runs), triples (1.02 runs), and home runs (1.40 runs) as Lindsey had done, Palmer also measured the value of walks and times hit by the pitcher (0.33 runs), as well as at-bats that ended with a batting “failure,” i.e. outs and reaches on an error (-0.25 runs). While I’ve already addressed issues with counting times reached on an error as a failure in Part 2, the principle of acknowledging the value produced when the batter failed was an important step forward from Lindsey’s work, and Lane’s before him. When an out occurs in a batter’s plate appearance, the batting team’s expected run total for the remainder of the half-inning decreases. When the batter fails to reach base safely, he not only doesn’t produce runs for his team, he takes away potential run production that was expected to occur. In this way, we can say that the batter created negative value—a decrease in expected runs—for the batting team.

Palmer applied these weights to a player’s seasonal totals, as Lindsey had done, and formed a statistic called Batter Runs reflecting the number of runs above average that a player produced in a season. Palmer’s work came during a significant period for the advancement of baseball statistics. Bill James had gained a wide audience with his annual Baseball Abstract by the early 1980s, and The Hidden Game of Baseball was published in the midst of this new appreciation for complex analysis of baseball systems. While Lane’s and Lindsey’s work had been cast aside, there was finally an audience ready to acknowledge the value of run estimation.
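Applying linear weights to a season line is a weighted sum. The Python sketch below uses only the run values quoted above for Palmer’s matrix; Palmer’s published Batter Runs formula may differ in its details, so treat this as an illustration of the idea rather than his exact statistic. The stat line is hypothetical, and “failure” follows Palmer’s grouping of outs with reaches on an error.

```python
# Run values quoted above for Palmer's rebuilt matrix.
PALMER_WEIGHTS = {
    "single": 0.46,
    "double": 0.80,
    "triple": 1.02,
    "home_run": 1.40,
    "walk_or_hbp": 0.33,
    "failure": -0.25,  # outs and reaches on error
}


def batter_runs(counts: dict[str, int]) -> float:
    """Runs relative to average: each event count times its weight, summed."""
    return sum(PALMER_WEIGHTS[event] * n for event, n in counts.items())


# Hypothetical season line
season = {"single": 100, "double": 30, "triple": 5, "home_run": 20,
          "walk_or_hbp": 60, "failure": 400}
print(f"{batter_runs(season):.1f}")  # 22.9
```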

Perhaps the most important effect of this new era of baseball analysis was the massive collection of data that began to occur in the background. Beginning in the 1980s, play-by-play accounts were being constructed to cover entire seasons of games. Lane had tracked 1,000 hits, Lindsey had observed 6,399 half-innings, and Palmer had used just 34 games (along with computer simulations) to estimate the run values of batting events. By the 2000s, play-by-play accounts of tens of thousands of games were publicly available online.

Gone were the days of estimations weakened by small sample sizes. With complete play-by-play data available for every game over a given time period, the construction of a run expectancy matrix was effectively no longer an estimation. Rather, it could now reflect, over that period of games, the average number of runs that scored between a given out/base state and the end of the half-inning, with near absolute accuracy.[ix] Similarly, assumptions about how baserunners moved around the bases during batting events were no longer necessary. Information concerning the specific effects on the out/base state caused by every event in every baseball game over many seasons could be found with relative ease.

In 2007, Tom M. Tango,[x] Mitchel G. Lichtman, and Andrew E. Dolphin took advantage of this glut of information and reconstructed Lindsey’s “linear weights” method (as named by Palmer) in The Book: Playing the Percentages in Baseball. Tango et al. used data from every game from 1999 through 2002 to build an updated run expectancy matrix. Using it, along with the play-by-play data from the same period, they calculated the average value of a variety of events, most notably eight batting events: singles (.475 runs), doubles (.776 runs), triples (1.070 runs), home runs (1.397 runs), non-intentional walks (.323 runs), times hit by the pitcher (.352 runs), times reached on an error (.508 runs), and outs (-.299 runs). These events were isolated to form an estimate of a player’s general batting effectiveness called weighted On Base Average (wOBA).

Across 90 years, there were five different attempts to estimate the number of runs that batters created, with varying amounts of data, varying methods of analysis, and varying run scoring environments, and yet the estimates all look quite similar.

Method / Event        Advancement  Instrumentality  Equivalent Runs  Batter Runs  wOBA
Single                .490         .457             .41              .46          .475
Double                .772         .786             .82              .80          .776
Triple                1.150        1.150            1.06             1.02         1.070
Home Run              1.258        1.551            1.42             1.40         1.397
Non-Intentional Walk  n/a          .254             n/a              .33          .323
Intentional Walk      n/a          .254             n/a              .33          .179
Hit by Pitch          n/a          n/a              n/a              .33          .352
Reach on Error        n/a          n/a              n/a              -.25         .508
Out                   n/a          n/a              n/a              -.25         -.299

Beyond the general goal of measuring the run value of certain batting events, each of these methods had another thing in common: each method was designed to measure the effectiveness of batters. Lane and Lindsey focused exclusively on hits, the traditional measures of batting effectiveness.[xi] Palmer added in the “on base” statistics of walks and times hit by the pitcher, while also accounting for the value of those times the batter showed ineffectiveness. Tango et al. threw away intentional walks as irrelevant events when it came to testing a batter’s skill, while crediting the positive value created by batters when reaching on an error.

The same inconsistencies present in the traditional averages for deciding when to reward batters for succeeding and when to punish them for failing are present in these run estimators. In the same way we created the basic effective averages in Part 2, we should establish a baseline for the total production in terms of runs caused by a batter’s plate appearances, independent of whether that production occurred due to batting effectiveness. We can later judge how much of that value we believe was caused by outside forces, but we should begin with this foundation. This will be the goal of the final part of this paper.


[i] In his article the next month, Lane says explicitly that he observed 63 games, but I prefer his unnecessarily roundabout description in the January 1917 article.

[ii] I’ve named these methods because Lane didn’t, and it can get confusing to keep going back and forth between the two methods without using distinguishing names.

[iii] Lane never explains why exactly he prefers this method, and just states that it “may be safely employed as the more exact value of the two.” He continues, “the better method of determining the value of a hit is…in the number of runs which score through its instrumentality than through the number of bases piled-up for the team which made it.” This may be true, but he never proves it explicitly. Nevertheless, the “instrumentality” method was the only one he used going forward.

[iv] This value has often been misrepresented as .164 runs in past research due to a separate table from Lane’s article. That table reflected the value of each hit, and walks, with respect to the value of a home run. Walks were worth 16.4 percent of the value of a home run (.254 / 1.551), but this is obviously not the same as the run value of a base on balls.

[v] The base states, B, are the various arrangements of runners on the bases: bases empty (0), man-on-first (1), man-on-second (2), man-on-third (3), men-on-first-and-second (12), men-on-first-and-third (13), men-on-second-and-third (23), and the bases loaded (123).

[vi] The calculation of these expected run averages involved an infinite summation of each possible number of runs that could score (0, 1, 2, 3,…) with respect to the probability that that number of runs would score. For instance, here are some of the terms for E(0,0):

E(0,0) = (0 runs * P(0|0,0)) + (1 run * P(1|0,0)) + (2 runs * P(2|0,0)) + … + (∞ runs * P(∞|0,0))

E(0,0) = (0 runs * .747) + (1 run * .136) + (2 runs * .068) + … + (∞ runs * .000)

E(0,0) = .461 runs

Lindsey could have just as easily found E(T,B) by finding the total number of runs that scored following the beginning of all plate appearances in a given out/base state through the end of the inning, R(T,B), and dividing that by the number of plate appearances to occur in that out/base state, N(T,B), as follows:

E(T,B) = Total Runs (T,B) / Plate Appearances (T,B) = R(T,B) / N(T,B)

This is the method generally used today to construct run expectancy matrices, but Lindsey’s approach works just as well.

[vii] To simplify his estimations, Lindsey made certain assumptions about how baserunners tend to move during hits, similar to the assumptions Lane made in his initial March 1916 article. Specifically, he assumed that “runners always score from second or third base on any safe hit, score from first on a triple, go from first to third on 50 per cent of doubles, and score from first on the other 50 per cent of doubles.” While he did not track the movement of players in the same detail that Lane eventually employed, the total error caused by these assumptions did not have a significant effect on his results.

[viii] In The Hidden Game of Baseball, Thorn wrote that Palmer used data from “over 100 World Series contests,” but in the foreword to The Book: Playing the Percentages in Baseball, Palmer wrote that “the data I used which ended up in The Hidden Game of Baseball in the 1980s was obtained from the play-by-play accounts of thirty-five World Series games from 1956 to 1960 in the annual Sporting News Baseball Guides.” I’ll lean towards Palmer’s own words, though I’ve adjusted “thirty-five” down to 34 since there were only 34 World Series games over the period Palmer referenced.

[ix] The only limiting factor in the accuracy of a run expectancy matrix in the modern “big data” era is in the accuracy of those who record the play-by-play information and in the quality of the programs written to interpret the data. Additionally, the standard practice when building these matrices is to exclude all data from the home halves of the ninth inning or later, and any other partial innings. These innings do not follow the standard rules observed in every other half-inning, namely that they must end with three outs, and thus introduce bias into the data if included.

[x] The only nom de plume I’ve included in this history, as far as I’m aware.

[xi] Lane didn’t include walks in his Batting Effectiveness statistic, despite eventually calculating their value.


Pitch Win Values for Starting Pitchers – May 2014

Introduction

A few weeks back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology.  That post details the basic framework of these calculations and can be found here.  This post is simply the May 2014 update of the same data.  What follows is predominantly data-heavy but should still provide useful talking points for discussion.  Let’s dive in and see what we can find.  Please note that the same caveats apply as last month.  We’re at the mercy of pitch classification.  I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available.  Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball).  Anything that may be classified outside of these categories is not included.  Also, anything classified as a “slow curve” (here’s looking at you, Yu) is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for May.  As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale.  We will tackle them each individually.

First, let’s discuss the strikeout constant.  In May, there were 52,100 strikes thrown by starting pitchers.  Of these 52,100 strikes, 5,005 were turned into hits and 15,110 outs were recorded.  Of these 15,110 outs, 4,058 were converted via the strikeout, leaving us with 11,052 ball-in-play outs.  Those 11,052 ball-in-play outs and 5,005 hits sum to 16,057 balls-in-play.  Subtracting 16,057 balls-in-play from our original 52,100 strikes leaves us with 36,043 strikes to distribute over our 4,058 strikeouts.  That’s a ratio of 8.88 strikes per strikeout.  This is up from 8.47 strikes per strikeout in March and April.  Hitters were slightly harder to strike out in May than in the previous two months.
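The strikeout constant follows a short chain of subtractions. Here is a minimal Python sketch of that accounting; the function name is hypothetical, and it carries over the post’s implicit assumption that every ball in play (hit or out) used up exactly one strike.

```python
def strikes_per_strikeout(strikes: int, hits: int, outs: int, strikeouts: int) -> float:
    """Non-contact strikes left over per strikeout, following the post's accounting."""
    ball_in_play_outs = outs - strikeouts
    # Assumes each ball in play (hit or out) used up exactly one strike.
    balls_in_play = ball_in_play_outs + hits
    remaining_strikes = strikes - balls_in_play
    return remaining_strikes / strikeouts


may = strikes_per_strikeout(strikes=52_100, hits=5_005, outs=15_110, strikeouts=4_058)
print(f"{may:.2f}")  # 8.88
```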

The next two constants are much easier to ascertain.  In May, there were 29,567 balls thrown by starters and 1,575 walked batters.  That’s a ratio of 18.77 balls per walk, up from 18.50 balls per walk in March and April.  This data would suggest that hitters were slightly less likely to walk in May than previously.  The FIP subtotal for all pitches in May was 0.75.  The MLB Run Average for May was 4.32, meaning our FIP constant for May is 3.58.
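The other two constants are a division and a subtraction, sketched below in Python with hypothetical function names. With the rounded figures quoted here (4.32 and 0.75), the subtraction gives 3.57; the 3.58 above presumably comes from unrounded inputs.

```python
def balls_per_walk(balls: int, walks: int) -> float:
    """Balls thrown per walk issued."""
    return balls / walks


def fip_constant(league_run_average: float, fip_subtotal: float) -> float:
    """Constant that moves the FIP subtotal onto the run-average scale."""
    return league_run_average - fip_subtotal


print(f"{balls_per_walk(29_567, 1_575):.2f}")  # 18.77
print(f"{fip_constant(4.32, 0.75):.2f}")      # 3.57 with these rounded inputs
```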

Constant Value
Strikes/K 8.88
Balls/BB 18.77
cFIP 3.58


Pitch Values – May 2014

For reference, the following table details the FIP for each pitch type in the month of May.

Pitch FIP
Four-Seam 4.43
Sinker 4.29
Cutter 4.13
Splitter 4.03
Curveball 4.01
Slider 4.13
Changeup 4.80
Screwball 2.56
Knuckleball 3.38
MLB RA 4.32

As we can see, only two pitches would be classified as below average for the month of May: four-seam fastballs and changeups.  Sinkers also came in right around league average.  Pitchers that were able to stand out in these categories tended to have better overall months than pitchers who excelled at the other pitches.  Now, let’s proceed to the data for the month of May.
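Because FIP sits on the same scale as the league run average, the “below average” label is a direct comparison against 4.32. A quick Python check over the table above; reading “higher than the MLB run average” as the cutoff is my interpretation of the criterion used here.

```python
MAY_2014_FIP = {
    "Four-Seam": 4.43, "Sinker": 4.29, "Cutter": 4.13, "Splitter": 4.03,
    "Curveball": 4.01, "Slider": 4.13, "Changeup": 4.80,
    "Screwball": 2.56, "Knuckleball": 3.38,
}
MLB_RA = 4.32

# A pitch type is below average when its FIP is higher than the league run average.
below_average = [pitch for pitch, fip in MAY_2014_FIP.items() if fip > MLB_RA]
print(below_average)  # ['Four-Seam', 'Changeup']
```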

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Phil Hughes 0.7 185 Vidal Nuno -0.3
2 Ian Kennedy 0.6 186 Doug Fister -0.3
3 Jose Quintana 0.6 187 Wei-Yin Chen -0.3
4 Tom Koehler 0.5 188 John Danks -0.3
5 Lance Lynn 0.5 189 Mike Minor -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Mike Leake 0.5 171 Brandon Maurer -0.2
2 Dallas Keuchel 0.4 172 Wandy Rodriguez -0.2
3 Tyson Ross 0.4 173 Tom Koehler -0.2
4 Charlie Morton 0.4 174 Kyle Lohse -0.3
5 Chris Archer 0.4 175 Edinson Volquez -0.6

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Corey Kluber 0.5 74 Shelby Miller -0.1
2 Josh Collmenter 0.4 75 Kevin Correia -0.1
3 Adam Wainwright 0.4 76 Hector Santiago -0.1
4 Jarred Cosart 0.4 77 Brandon McCarthy -0.2
5 Madison Bumgarner 0.3 78 Cliff Lee -0.2

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.3 27 Alfredo Simon -0.1
2 Hisashi Iwakuma 0.2 28 Franklin Morales -0.1
3 Hiroki Kuroda 0.2 29 Clay Buchholz -0.1
4 Jake Odorizzi 0.2 30 Jorge De La Rosa -0.1
5 Ubaldo Jimenez 0.2 31 Danny Salazar -0.2

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 0.3 160 Clay Buchholz -0.1
2 Brandon McCarthy 0.2 161 Tyler Lyons -0.1
3 Ryan Vogelsong 0.2 162 Dan Straily -0.1
4 Tyler Skaggs 0.2 163 Yordano Ventura -0.1
5 Collin McHugh 0.2 164 Franklin Morales -0.2

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jason Hammel 0.4 120 Robbie Erlin -0.1
2 Ricky Nolasco 0.3 121 Kyle Gibson -0.2
3 Garrett Richards 0.3 122 Julio Teheran -0.2
4 Bud Norris 0.3 123 Johnny Cueto -0.2
5 Edwin Jackson 0.3 124 Yovani Gallardo -0.3

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.3 165 Josh Collmenter -0.3
2 Stephen Strasburg 0.3 166 Jake Peavy -0.3
3 Francisco Liriano 0.2 167 Danny Duffy -0.3
4 Henderson Alvarez 0.2 168 Drew Smyly -0.3
5 Eric Stults 0.2 169 Marco Estrada -0.7

Screwball

Rank Pitcher Pitch Value
1 Alfredo Simon 0.0
2 Trevor Bauer 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.6
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 1.2 192 Edinson Volquez -0.3
2 Mike Leake 1.1 193 Alfredo Simon -0.3
3 Jason Hammel 1.0 194 CC Sabathia -0.3
4 Dallas Keuchel 1.0 195 Franklin Morales -0.4
5 Masahiro Tanaka 0.9 196 Marco Estrada -0.7

Pitch Ratings – May 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jason Hammel 60 86 Brandon Maurer 38
2 Aaron Harang 60 87 John Danks 36
3 Phil Hughes 59 88 Trevor Bauer 35
4 Yordano Ventura 59 89 Rafael Montero 35
5 Jose Quintana 59 90 Mike Minor 28

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jeff Samardzija 58 71 Alfredo Simon 41
2 Jake Arrieta 58 72 Kyle Lohse 39
3 Aaron Harang 58 73 Ricky Nolasco 37
4 Blake Treinen 58 74 James Shields 37
5 Matt Shoemaker 57 75 Edinson Volquez 22

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Josh Tomlin 60 26 Ryan Vogelsong 46
2 Corey Kluber 60 27 Josh Beckett 45
3 Franklin Morales 59 28 Dan Haren 44
4 David Price 58 29 Kevin Correia 41
5 Jorge De La Rosa 58 30 Jesse Chavez 40

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jake Odorizzi 60 10 Ricky Nolasco 54
2 Masahiro Tanaka 59 11 Tim Lincecum 53
3 Wei-Yin Chen 58 12 Kyle Kendrick 46
4 Ubaldo Jimenez 57 13 Dan Haren 43
5 Alex Cobb 57 14 Jorge De La Rosa 40

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Felix Hernandez 60 61 Roenis Elias 42
2 John Lackey 59 62 Tommy Milone 41
3 Collin McHugh 58 63 Wei-Yin Chen 40
4 Jose Fernandez 58 64 Yordano Ventura 36
5 Mike Minor 58 65 Scott Carroll 35

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Yu Darvish 61 46 Jeremy Guthrie 40
2 Jhoulys Chacin 61 47 Homer Bailey 38
3 Corey Kluber 60 48 Julio Teheran 35
4 Edwin Jackson 60 49 Yovani Gallardo 31
5 Gavin Floyd 59 50 Kyle Gibson 30

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Stephen Strasburg 59 59 Hector Noesi 33
2 Wade Miley 58 60 Cesar Ramos 30
3 Justin Verlander 58 61 Josh Collmenter 26
4 Francisco Liriano 57 62 Ian Kennedy 23
5 Anibal Sanchez 57 63 Marco Estrada 20

Screwball

Rank Pitcher Pitch Rating
1 Alfredo Simon 57
2 Hector Santiago 56
3 Trevor Bauer 56

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 55

Monthly Discussion

As we can see, Felix Hernandez ascended to the throne for this month riding the overall quality of his entire repertoire.  Hernandez was classified as throwing five different pitches in May (Four-Seam, Sinker, Curveball, Slider, and Changeup) and managed to earn at least 0.1 WAR in each category.  His best two pitches were his Sinker (0.4 WAR) and Changeup (0.3 WAR).  The most valuable pitch overall in May was the Four-Seam Fastball thrown by Phil Hughes.  The least valuable was Marco Estrada’s changeup.  As for offspeed pitches, R.A. Dickey’s 0.6 WAR from his knuckleball led the way.  Excluding Dickey’s knuckleball due to the sheer number of times it was thrown, the most valuable offspeed pitch was Jason Hammel’s slider.  The least valuable fastball was Edinson Volquez’s sinker.

On our 20-80 scale pitch ratings, the highest-rated qualifying pitch was Yu Darvish’s slider. Unsurprisingly, the lowest-rated was Marco Estrada’s changeup. It’s difficult to generate -0.7 WAR with a single pitch unless it is truly awful. The highest-rated fastball was Jake Odorizzi’s splitter, and the lowest-rated fastball was Edinson Volquez’s sinker.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Ian Kennedy 1.0 210 Doug Fister -0.3
2 Phil Hughes 1.0 211 Marco Estrada -0.3
3 Michael Wacha 0.9 212 Eric Stults -0.3
4 Jose Quintana 0.9 213 Dan Straily -0.4
5 Lance Lynn 0.7 214 Mike Minor -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Cliff Lee 1.0 195 Mike Pelfrey -0.3
2 Charlie Morton 0.9 196 Edinson Volquez -0.3
3 Felix Hernandez 0.8 197 Erasmo Ramirez -0.3
4 Dallas Keuchel 0.8 198 Dan Straily -0.3
5 Justin Masterson 0.7 199 Wandy Rodriguez -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Madison Bumgarner 0.7 88 Shelby Miller -0.2
2 Adam Wainwright 0.7 89 Brandon McCarthy -0.2
3 Corey Kluber 0.7 90 Felipe Paulino -0.2
4 Clay Buchholz 0.5 91 Johnny Cueto -0.3
5 Josh Collmenter 0.4 92 C.J. Wilson -0.3

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.5 27 Jorge De La Rosa -0.1
2 Tim Hudson 0.3 28 Alfredo Simon -0.2
3 Hisashi Iwakuma 0.2 29 Franklin Morales -0.2
4 Hiroki Kuroda 0.2 30 Clay Buchholz -0.2
5 Wei-Yin Chen 0.2 31 Danny Salazar -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jose Fernandez 0.6 182 Ivan Nova -0.1
2 Sonny Gray 0.6 183 Bronson Arroyo -0.2
3 A.J. Burnett 0.5 184 Clay Buchholz -0.2
4 Brandon McCarthy 0.5 185 Franklin Morales -0.2
5 Stephen Strasburg 0.4 186 Felipe Paulino -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Edwin Jackson 0.5 139 Yovani Gallardo -0.2
2 Bud Norris 0.5 140 Tim Lincecum -0.2
3 Jason Hammel 0.4 141 Jeremy Guthrie -0.2
4 Aaron Harang 0.4 142 Erasmo Ramirez -0.2
5 Garrett Richards 0.4 143 Danny Salazar -0.4

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Stephen Strasburg 0.5 191 Matt Cain -0.2
2 Francisco Liriano 0.5 192 Danny Duffy -0.3
3 Felix Hernandez 0.4 193 Drew Smyly -0.3
4 Eric Stults 0.4 194 Wandy Rodriguez -0.4
5 John Danks 0.4 195 Marco Estrada -0.6

Screwball

Rank Pitcher Pitch Value
1 Alfredo Simon 0.0
2 Trevor Bauer 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 1.1
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 1.8 216 Franklin Morales -0.4
2 Adam Wainwright 1.7 217 Dan Straily -0.4
3 Corey Kluber 1.6 218 Felipe Paulino -0.5
4 Aaron Harang 1.5 219 Marco Estrada -0.7
5 Jeff Samardzija 1.5 220 Wandy Rodriguez -0.8

Year-to-Date Discussion

If we look at the year-to-date numbers, Felix Hernandez still sits in the top spot. Current AL and NL FIP leaders Corey Kluber and Aaron Harang rank third and fourth, respectively. The least valuable starter has been Wandy Rodriguez. On a per-pitch basis, the most valuable pitch has been R.A. Dickey’s knuckleball, which should remain the case for much of the season given his heavy pitch totals. Other than Dickey’s, the most valuable pitch has been Ian Kennedy’s four-seam fastball. I guess there’s something to the idea of throwing a lot of fastballs in an extreme pitcher’s park after all. The most valuable offspeed pitch has been Jose Fernandez’s curveball. The fact that he still tops this list even after being injured and missing starts is simply astounding. Get healthy, Jose; we all miss your brilliance. The least valuable pitch has been Marco Estrada’s changeup, and the least valuable fastball has been Mike Minor’s four-seam. Qualitatively, I feel fairly encouraged by the year-to-date results. The leaderboard is topped by two no-doubt aces, with the current FIP leaders right behind them. For reference, the top five in the year-to-date overall rankings are currently 1st, 6th, 2nd, 14th, and 22nd, respectively, on the FanGraphs WAR leaderboard. Please feel free to provide feedback in the comments section.


Peter O’Brien’s Raw Power: Estimating Batted-Ball Velocities in the Minor Leagues

On May 20th, Peter O’Brien hit a massive home run to straightaway center, clearing the 32-foot-tall batter’s eye at Arm & Hammer Park more than 400 feet from home plate. O’Brien is currently one home run behind Joey Gallo in what looks to be an exciting race for the minor league home run title. O’Brien isn’t as highly touted a prospect as Gallo, but he still has some of the most impressive power in the minor leagues. Reggie Jackson saw O’Brien’s home run and said it was one of the hardest-hit balls he had ever seen in the minor leagues (and Reggie knows a thing or two about tape-measure home runs).

How hard was that ball actually hit? It is impossible to figure out exactly how hard and how far the ball was hit from the available information. You can, however, use basic physics to make a reasonable estimate.

Below I explain the assumptions and thought process I used to estimate how hard the ball was hit. If that does not interest you, just skip to the end to find out what it takes to impress Reggie Jackson. But if you’re curious or skeptical, stick around.

OBSERVATIONS

I started off by watching the video to see what information I could gather (O’Brien’s at bat starts at the 37 second mark in the video).

TIME OF FLIGHT – From the crack of the bat to the ball leaving the park, it appears to take 5 seconds. If you watched the video, you can tell this is not a perfect measurement, since the camera doesn’t track the ball very closely. If you think you have a better estimate, let me know and I’ll rework the numbers.

LOCATION LEAVING THE PARK – The ball was hit to straightaway center. From the park dimensions, we know that when it left the park it was 407 feet from home plate and at least 32 feet in the air to clear the batter’s eye.

ASSUMPTIONS

COEFFICIENT OF DRAG (Cd) – The Cd determines how much a ball slows down as it moves through the air. I chose 0.35 for the Cd because it is right in the middle of the most frequently inferred Cd values for the home runs that Alan Nathan examined in this paper. In studying the Cds of baseballs, Nathan showed there is reason to believe that Cd varies significantly from one baseball to another (meaning by more than can be explained by random measurement error).

ORIGIN OF BALL – I assume the ball was 3.5 feet off the ground and 2 feet in front of home plate when it was hit. These are the standard parameters in Dr. Nathan’s trajectory calculator. But what if the location is off by a foot? The effect of the origin on the trajectory is a simple translation: one foot up, one foot higher; one foot down, one foot lower. The other observations and assumptions matter much more in determining the trajectory of the home run.

Using these assumptions and the trajectory calculator, I determined the minimum speed and backspin a ball would need to clear the 32-foot batter’s eye 5 seconds after being hit, for a range of launch angles. The table below shows the vertical launch angle (in degrees), the backspin (in rpm), and the speed of the batted ball (in MPH).

Vertical launch angle (degrees) Backspin (rpm) Speed off bat (MPH)
19 14121 101
21 6817 101.9
23 4155 102.75
25 2779 103.69
27 1940 104.7
29 1375 105.89
30 1156 106.5
32 805 107.88
34 536 109.4
36 322 111.1
38 149 112.99
40 4 115.1

The graph below shows the trajectories from the table above (with the batter’s eye added for reference).

http://i1025.photobucket.com/albums/y314/GWR87/OBrienhomerun_zpsb1507cf4.png

Looking at the graph, you will notice that all of these balls would be scraping the top of the batter’s eye. This makes sense, because the table shows the minimum velocities and backspins needed for the ball to just clear the batter’s eye.

What is the slowest O’Brien could have hit the ball?

If you were in a rush, looking at the table you would think the slowest O’Brien could have hit the ball was 101 MPH at 19°. But not so fast! The amount of backspin required for the ball to follow that trajectory is humanly impossible.

What is a reasonable backspin?

I am highly skeptical of backspin values greater than 4,000 rpm based on Alan Nathan’s Baseball Prospectus article “How Far Did That Fly Ball Travel?” The backspin on the home runs Nathan examined ranged from 500 to 3,500 rpm, with most falling around 2,000. The first three entries in the table have backspins of over 4,000 rpm and can be eliminated as possibilities. If the ball with the 19° launch angle had only 3,500 rpm of backspin, it would have hit the batter’s eye less than 11 feet off the ground instead of clearing it. Maybe you’re skeptical that I eliminated the third entry because it’s close to the 4,000 rpm cutoff. Think about it this way: if a player were able to hit a ball with over 4,000 rpm of backspin, he would have to be hitting it at a much higher launch angle than 23° (higher launch angles generate more spin, while lower launch angles generate less).

The high-launch-angle trajectories with very little backspin (like the bottom three in the table) are also unlikely. A ball hit with a 40° launch angle would almost certainly have more than 4 rpm of backspin. If the ball hit at 40° had 1,000 rpm of backspin (instead of 4), it would have been 70 feet off the ground, easily clearing the 32-foot batter’s eye.

Accounting for reasonable backspin, the slowest O’Brien could have hit the ball is 103.69 MPH at 25° with 2,779 rpm of backspin.
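The elimination step above can be sketched as a simple filter over the table. This is a minimal Python sketch, assuming a plausible-backspin window of 500 to 4,000 rpm: the 4,000 rpm ceiling is the cutoff used above, and the 500 rpm floor is the low end of the range Nathan reported, used here as a stand-in for discarding the three low-spin, high-angle entries. The names `TRAJECTORIES` and `slowest_plausible` are illustrative only.

```python
# Minimum-clearance trajectories from the table above:
# (vertical launch angle in degrees, backspin in rpm, speed off bat in MPH)
TRAJECTORIES = [
    (19, 14121, 101.0),
    (21, 6817, 101.9),
    (23, 4155, 102.75),
    (25, 2779, 103.69),
    (27, 1940, 104.7),
    (29, 1375, 105.89),
    (30, 1156, 106.5),
    (32, 805, 107.88),
    (34, 536, 109.4),
    (36, 322, 111.1),
    (38, 149, 112.99),
    (40, 4, 115.1),
]

# Assumed window of humanly plausible backspin (rpm); see the lead-in.
MIN_SPIN_RPM = 500
MAX_SPIN_RPM = 4000


def slowest_plausible(rows, lo=MIN_SPIN_RPM, hi=MAX_SPIN_RPM):
    """Return the slowest (angle, spin, speed) row whose backspin lies in [lo, hi]."""
    plausible = [row for row in rows if lo <= row[1] <= hi]
    # Among the remaining rows, pick the one with the lowest speed off the bat
    return min(plausible, key=lambda row: row[2])


angle, spin, speed = slowest_plausible(TRAJECTORIES)
print(f"{speed} MPH at {angle} degrees with {spin} rpm of backspin")
```

Moving either bound shows how sensitive the conclusion is to the backspin assumption; for example, raising the ceiling above 4,155 rpm lets the 23° row back in.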

So what do all these observations and assumptions get us?

We can say that the ball was likely hit at 103.69 MPH or harder, with a launch angle of 25° or greater. A launch velocity of 103.69 MPH is not that impressive; it is essentially the league-average launch velocity for a home run. Distance-wise, how impressive a home run was it? Unobstructed, the ball would have landed at least 440 feet from home plate (assuming the 25° scenario). It probably went farther than 440 feet, because it did not scrape the batter’s eye. So how rare is a 440+ foot home run? During last year’s regular season, 160 of the 4,661 home runs hit went 440 feet or farther, meaning only 3.4% of all home runs were hit at least that far.
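As a quick check of the rarity figure above, here is a short Python snippet that uses only the two regular-season counts quoted in that paragraph (no other data is assumed):

```python
# Regular-season counts quoted in the text (via hittrackeronline.com)
long_home_runs = 160    # home runs of 440 feet or more
total_home_runs = 4661  # all home runs that season

share = long_home_runs / total_home_runs
print(f"{share:.1%} of home runs went 440+ feet")
```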

For those of you who skipped to the end: my educated guess is that the ball traveled at least 440 feet and left the bat at 103.69 MPH or faster.

If you like this, you can read other articles on my blog, GWRamblings, or follow me on Twitter @GWRambling.

None of this would have been possible without Alan Nathan’s great work on the physics of baseball. I used his trajectory calculator to do this, and I referred to his articles frequently to make sure I wasn’t making wildly wrong assumptions. The information on major league home run distances is based on hittrackeronline.com.


Nick Markakis, What Happened?

Nick Markakis has carved out a nice major league career. He now has the 8th most hits in Orioles history, and by season’s end he’ll likely be in sole possession of 6th place. Markakis, now with nearly 1,500 hits at 30 years old, has a shot at reaching 2,500 hits in his career. While hits are a compiling statistic, that would still place him in the top 100 of all time. However, Markakis still strikes me as a player of unfulfilled potential. In his last four seasons, Markakis has not compiled a WAR higher than he did in his rookie season (2.1 in 2006). His two highest WAR seasons, far and away, came at ages 23 and 24. In 2008, a season in which he compiled 6.1 WAR, he had the 11th highest total in all of baseball. Peaking so young is a very odd career trajectory. While Markakis was on the path to being one of the best all-around players in baseball, he cratered early. This loss in value has two causes, both readily apparent again this season: a reduction in power and a decline in defense.

Early on, Markakis posted decent advanced defensive numbers. But since 2009, he has been bad according to the metrics. To follow that up with some traditional scouting: he has simply lost a step. He lost his range at a young age and has never been able to get it back. His arm keeps him respectable, but he has lost some of that strength as well. He remains a below-average right fielder, and it is not getting any better.

While his defense has hindered his overall value, the most critical part of his game to leave him at a young age was his power. Markakis never hit many home runs, with a career high of 23, but doubles were critical to his value. He had four straight seasons with 43, 48, 45, and 45 doubles, all fantastic numbers. In fact, after the 2010 season, he had a decent shot at reaching the top 10 to 20 on the all-time doubles list if he kept up that pace. However, his homers and doubles fell off after 2010. If he had maintained a 40-double, 15-to-20-homer pace over the course of his career, alongside his .300 batting average and decent walk rate, Markakis could have been one of the most valuable outfielders in the game. The graph below shows best when he lost his power: it plots his season-by-season ISO and SLG.

[Graph: Nick Markakis season-by-season ISO and SLG]

Looking at the graph above, one can see that Markakis was average to above average in power production for his first handful of seasons. His power began to fall below average in 2010. His numbers spiked in 2012, but that is his shortest season to date, so the sample size is smaller than in the years around it. Also, 2012 was still lower in both ISO and SLG than 2007 and 2008. Since 2009, Nick Markakis has been a below-average power hitter. And his most recent full season, 2013, was his worst ever: he produced a paltry .085 ISO (.145 is considered average and .080 awful) and posted -0.1 WAR. But the question remains: why did he lose his power?
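ISO (isolated power) is slugging average minus batting average, so it counts only the extra bases a hitter collects per at-bat. A minimal Python sketch follows, using the .145 (average) and .080 (awful) benchmarks quoted above; the "below average" label for values in between is an assumption, and the slash-line inputs are hypothetical, not Markakis's actual numbers.

```python
def iso(slg: float, avg: float) -> float:
    """Isolated power: extra bases per at-bat, i.e. SLG minus AVG."""
    return slg - avg


def describe_iso(value: float) -> str:
    # Benchmarks quoted in the text: .145 is roughly average, .080 is awful.
    # The "below average" band in between is an assumed label.
    if value >= 0.145:
        return "average or better"
    if value > 0.080:
        return "below average"
    return "awful"


# Hypothetical (SLG, AVG) pairs, for illustration only
for slg, avg in [(0.480, 0.300), (0.380, 0.290), (0.350, 0.275)]:
    value = iso(slg, avg)
    print(f"ISO {value:.3f}: {describe_iso(value)}")
```

Because ISO subtracts batting average out of slugging, a hitter can keep a .300 average while his ISO collapses, which is the pattern described above.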

After watching Markakis for years and staring at hours of tape, it is hard to tell whether this power reduction is due to mechanical issues. Markakis has been known to change his stance and approach at the plate nearly every week. He will lower or raise his hands, stay open or close up; he is a constant tinkerer with his mechanics at the plate. I do not believe mechanics have anything to do with the steady power decline. Nor is it necessarily how pitchers are pitching to Markakis. Looking at the numbers, he is seeing a similar share of pitches in the zone: a little less than in the early years, but nothing unexpected, and in fact the rate has rebounded recently. Furthermore, the mix of pitches he is seeing is similar to his early years. It has not been an adjustment by pitchers. Rather, much like his defense, he simply lost a step earlier than most other position players do.

Consider the two heat maps below. The first shows his peak power years (2007 to 2010), and the second shows the last two seasons (2013 to 2014). They are ISO heat maps showing which pitches, in which locations, Markakis has been able to drive for extra bases.

[Heat maps: Markakis ISO by pitch location, 2007 to 2010 and 2013 to 2014]

Clearly, over the past two seasons Nick Markakis has not been able to drive pitches for extra bases the way he once could. In particular, pitches on the outside middle of the plate, which in those great Markakis years he could artfully fade right between the center fielder and the left fielder for a double like clockwork, are ones he can no longer drive for extra bases. The only power left in Markakis’ game comes from pitches down and in, and even then it’s limited power at best. Basically, he can still run into a meatball, but his double-hitting days are over. And as someone who has never been able to hit the ball out of the park readily, Markakis is basically a slap-hitting right fielder who can post some decent value at the plate, but nothing special.

The career arc is strange and unfortunate, but clear. Markakis simply could not, and cannot, maintain the production of his early seasons. His skills broke down sooner than most. He is a nice piece, and had he kept up his early pace, he would have been a steal on his current contract. However, unless he is brought back at a reduced price (or Peter Angelos decides that loyalty is worth $17.5 million), Orioles fans had better get used to having a new right fielder in 2015.

Article originally posted at www.Orioles-Nation.com


Satchel Paige: Baseball’s Believable Myth

One of the biggest drawbacks of statistics is how they can get in the way of our imagination. I’ve heard stories of how Pete Rose could will his team to victory on any given day of a career that spanned 23 years. Our stats claim that, actually, you can value his contributions at 80 wins. Rickey Henderson’s speed was electric and unfathomable, and no one can put a number on that, we’ve heard. FanGraphs says, really, his baserunning was worth 142 runs. Aroldis Chapman throws so hard that his fastball isn’t comparable to anyone else’s in baseball. Our data suggest that last year it was 7 runs above average.

While statistics have contributed significantly more than they’ve taken from us, it is occasionally fun to ignore them and just pretend the stories we want to believe are true. However, for a pitcher who is the focus of some of the most incredible tales in baseball history, a few stats from the end of his career are all the more reason to trust the absurd stories we have about him.

Satchel Paige pitched almost all of his professional baseball career in the Negro Leagues and barnstorming. He estimated that he played for 250 teams, though his “facts” about himself were often far from reality (for instance, he claimed that he never hit under .300, but he actually hit .097 in the majors). Baseball wasn’t integrated until Paige was 41 years old. Up until that point, he had built a legendary career that earned him the first Hall of Fame induction for any Negro Leagues player. Unfortunately, record keeping from these leagues was nearly non-existent, and almost no statistical evidence remains of his elite performances.

Stories of Paige paint a picture of arguably the most talented and entertaining pitcher to ever throw a baseball. As a teenager playing semi-pro baseball in Alabama, he supposedly got so mad at a poorly performing defense that he ordered his outfielders to sit down in the infield, where they watched him strike out the game’s last batter to complete his shutout with the bases loaded.

The greatest Negro Leagues hitter, Josh Gibson, once told Paige that he was going to hit a grand slam off of him in an upcoming game. With Gibson in the hole and one player on base, Paige intentionally walked the next two hitters, so Gibson would have an opportunity to hit a grand slam. Paige struck him out.

Joe DiMaggio called Paige the best pitcher and hardest thrower he had ever seen. Teammates claimed he could consistently throw his fastball over a gum wrapper. In his six exhibition matchups against Dizzy Dean (during two seasons in which Dean achieved a total WAR over 13), Paige won 4 games, and Dean said Paige’s fastball made his own look like a changeup.

Witnesses of Paige’s pitching would go on to tell countless other stories of his heroics, and a good number of them can’t be true. But what is possibly most remarkable is how historically effective he was when he was finally allowed to play in the majors, long after his prime.

Satchel Paige’s pitching workload was enormous, because for almost his entire career, people paid specifically to watch him pitch. He would frequently throw over 100 pitches on consecutive days. While his estimate of 2,500 games started is almost certainly exaggerated, he may very well have thrown more professional innings than anyone ever has. He pitched professionally for 22 years before major league teams would allow him to join a roster, and throughout those years he had more financial incentive to pitch often than anyone could reasonably expect.

Considering the wear and tear on his arm, expectations for his performance in his 40s would need to be heavily tempered, even for such a legendary pitcher. After all, only 67 pitchers have ever thrown even 100 innings after turning 40.

Of those 67, Paige ranks 8th in ERA- (81). Of the seven in front of him, three were knuckleball pitchers, one pitched before World War I, and one has been held out of the Hall of Fame due to steroid allegations (whether fair or not).

Over the span of his first four major league seasons, 128 pitchers threw at least 300 innings. Of those 128, Paige’s strikeout rate ranked 2nd. At the end of that four-year stretch, he was 46, and 46-year-olds don’t strike batters out. You have to go down 20 spots to find a pitcher who was less than 10 years younger than Paige.

After Paige had been out of the majors for over a decade, the Kansas City A’s had him throw for them when he was 59 years old. He threw three scoreless innings, allowing only one runner.

It’s easy to wish we had better stats of Satchel Paige’s early career. It could help us establish if he really had, as he said, over 20 no-hitters. We could definitively say whether or not he had 250 shutouts, 2000 wins, 21 straight wins, or over 60 consecutive scoreless innings, all of which he claimed to be true. It’s quite likely all those numbers are fabricated. It’s possible that many of the stories about his pitching are exaggerated.

But when Satchel Paige was finally given a chance to prove himself, he blew away any realistic expectations anyone could have set for him. No one will ever know which stories about Satchel Paige really happened, or how trustworthy people’s observations of him were. But 25 years into his career, at an age when few people are still pitching professionally, he gave us a reason to believe them.


Foundations of Batting Analysis – Part 2: Real and Indisputable Facts

In Part 1 (http://www.fangraphs.com/community/foundations-of-batting-analysis-part-1-genesis/), we examined how the hit became the first estimate of batting effectiveness in 1867 leading to the creation of the modern batting average in 1871. In Part 2, we’ll look more closely at what the hit actually measures and the inherent flaws in its estimation.

Over the century-and-a-half since Henry Chadwick wrote “The True Test of Batting,” it has been a given that if the batter makes contact with the ball, he has only shown “effectiveness” when that contact results in a clean hit – anything else is a failure. At first glance, this may seem somewhat reasonable. The batter is being credited for making contact with the ball in such a way that it is impossible for the defense to make an out, an action that must be indicative of his skill. If the batter makes an out, or reaches base due to a defensive error that should have resulted in an out, it was due to his ineffectiveness – he failed the “test of skill.”

This is an oversimplified view of batting.

By claiming that a hit is entirely due to the success of the batter and that an out, or reach on error, is due to his failure, we make fallacious assumptions about the nature of the game. Consider all of the factors involved in a play when a batter swings away. The catcher calls for a specific pitch with varying goals in mind depending on the batter, the state of the plate appearance, and the game state. The pitcher tries to pitch the ball in a way that will accomplish the goals of the catcher.[i] The batter attempts to make contact with the ball, potentially with the intent to hit the ball into the air or on the ground, or in a specific direction. The fielders aim to use the ball to reduce the ability of the batting team to score runs, either by putting out baserunners or limiting their ability to advance bases. The baserunners react to the contact and try to safely advance on the bases without being put out. All the while, the dirt, the grass, the air, the crowd, and everything else that can have some unmeasurable effect on the outcome of the play, are acting in the background. It is misleading to suggest that when contact between the bat and ball results in a hit, it must be due to “effective batting.”

Let’s look at some examples. Here is a Stephen Drew pop up from the World Series last year:

Here is a Michael Taylor line drive from 2011:

The contact made by Taylor was certainly superior to that made by Drew, reflecting more batting effectiveness in general, but due to fielding effectiveness—and luck—Taylor’s ball resulted in an out while Drew’s resulted in a hit.

Here are three balls launched into the outfield:

In each case, the batter struck the ball in a way that could potentially benefit his team, but varying levels of performance by the fielders resulted in three different scoring outcomes: a reach on error, a hit, and an out, respectively.

Here is a pair of groundballs:

Results so dramatically affected by luck and randomness reflect little on the part of the batter, and yet we act as if Endy Chavez was effective and Kyle Seager was ineffective.

Home runs may be considered the ultimate success of a batter, but even they may not occur simply due to batting effectiveness. Consider these three:

Does a home run reflect more batting effectiveness when it lands in front of the centerfielder, when it’s hit farther than humanly possible,[ii] or when it doesn’t technically get over the wall?

The hit, at its core, is an estimate of value. Every time the ball is put into play in fair territory, some amount of value is generated for the batter’s team. When an out is made, the team has less of an opportunity to score runs: negative value. When an out is not made, the team has a greater opportunity to score runs: positive value. Hits estimate this value by being counted when an out is not made and when certain other aspects of the play conform to accepted standards of batting effectiveness, i.e. the 11 subsections of Rule 10.05 of the Official Baseball Rules that define what are and are not base hits, as well as the eight subsections of Rule 10.12.(a) that define when to charge an error against a fielder.

Rule 10.05 includes the phrase “scorer’s judgment” four times, and seven of the 11 parts of the rule involve some form of opinion on the part of the scorer to determine whether or not to award a hit. All eight subsections of Rule 10.12.(a) that define when to charge an error against a fielder are entirely subjective. Not only is the hit as an estimate of batting effectiveness muddled by the forces in the game that are outside of the batter’s control, but the decision whether to award a hit or an error can be based on subjective opinion. Imagine you’re the official scorer; are these hits or errors?

If you agreed with the official scorer on the last play, that Ortiz reached on a defensive error, you were “wrong” according to MLB, which overturned the call and awarded Ortiz a hit retroactively (something I doubt would have occurred if Darvish had completed the no-hitter). Despite Chadwick’s claim in 1867 that “there can be no mistake about the question of a batsman’s making his first base…whether by effective batting, or by errors in the field,” uncertainty in how to designate the outcome of a play is all too common, and not a modern phenomenon.

In an article in the 6 April 1916 issue of the Sporting News, John H. Gruber explains that before scoring methods became standardized in 1880, the definition of a hit could vary wildly from scorer to scorer.

“It was evidently taken for granted that everybody knew a base hit when he saw one made…a group of ‘tight’ and another of ‘open’ scorers came into existence.

‘Tight’ were those who recognized only ‘clean’ hits, when the ball was not touched by a fielder either on the ground or in the air. Should the fielder get even the tip of his fingers on the ball, though compelled to jump into the air, no hit was registered; instead an error was charged.

The ‘open’ contingent was more liberal. To it belonged the more experienced scorers who used their judgment in deciding between a hit and an error, and always in favor of the batter. They gave the batter a hit and insisted that he was entitled to a hit if he sent a ‘hot’ ball to the short-stop or the third baseman and the ball be only partly stopped and not in time to throw it to a bag.

Some of them even advocated the ‘right field base hit,’ which at present is scored a sacrifice fly. ‘For instance,’ they said, ‘a man is on third base and the batsman, in order to insure the scoring of the run by the player on third base, hits a ball to right field in such a way that, while it insures his being put out himself, sends the base runner on third home, and scores a run. This is a play which illustrates ”playing for the side” pretty strikingly, and it seems to us that such a hit should properly come under the category of base hits.’”

While official scorers have since become more consistent in how they score a game, there will never be a time when hits will not involve a “scorer’s judgment” on some level. As Isaac Ray wrote in the North American Review in 1856, building statistics based on opinion or “shrewd conjecture” leads to “no real advance in knowledge”:

“The common fallacy that, imperfect as they are, they still constitute an approximation of the truth, and therefore are not to be despised, is founded upon a total misconception of the proper objects of statistical inquiry, as well as of the first rules of philosophical induction. Facts—real and indisputable facts—may serve as a basis for general conclusions, and the more we have of them the better; but an accumulation of errors can never lead to the development of truth. Of course we do not deny that, in a mere matter of quantity, the errors on one side generally balance the errors on the other, and thus the value of the result is not materially affected. What we object to is the attempt to give a statistical form to things more or less doubtful and subjective.”

Hits, these “approximations of the truth,” have been used as the basic measurement of success for batters for the entire history of the professional game. However, in the 1950s, Branch Rickey, the former general manager of the Brooklyn Dodgers, and Allan Roth, his statistical man-behind-the-curtain, acknowledged that a batter could provide value to his team outside of just swinging the bat. On August 2, 1954, Life magazine printed an article titled “Goodby to Some Old Baseball Ideas” in which Rickey wrote on methods used to estimate batting effectiveness:

“…batting average is only a partial means of determining a man’s effectiveness on offense. It neglects a major factor, the base on balls, which is reflected only negatively in the batting average (by not counting it as a time at bat). Actually walks are extremely important…the ability to get on base, or On Base Average, is both vital and measurable.”

While the concept didn’t propagate widely at first, by 1984 on base average (OBA) had become one of three averages, along with batting average (BA) and slugging average (SLG), calculated by the official statisticians for the National and American Leagues. These averages are currently calculated as follows:

BA = Hits/At-Bats = H/AB

OBA = (Hits + Walks + Times Hit by Pitcher) / (At-Bats + Walks + Times Hit by Pitcher + Sacrifice Flies) = (H + BB + HBP) / (AB + BB + HBP + SF)

SLG = Total Bases on Hits / At-Bats = TB/AB
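The three official formulas translate directly into code. Below is a minimal Python sketch; the `BattingLine` class and the season line it is fed are hypothetical, and total bases on hits use the usual weighting of one base per single, two per double, three per triple, and four per home run.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats needed for BA, OBA, and SLG."""
    ab: int       # at-bats
    h: int        # hits of all types
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # times hit by pitcher
    sf: int       # sacrifice flies

    @property
    def total_bases(self) -> int:
        # Singles are the hits that are not doubles, triples, or home runs
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    def ba(self) -> float:
        return self.h / self.ab

    def oba(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        return self.total_bases / self.ab


# Hypothetical season line, for illustration only
line = BattingLine(ab=500, h=150, doubles=30, triples=3, hr=20, bb=60, hbp=5, sf=5)
print(f"BA {line.ba():.3f}  OBA {line.oba():.3f}  SLG {line.slg():.3f}")
```

Note that sacrifice flies appear only in the OBA denominator, so two batters with identical hits and at-bats can differ in OBA but not in BA or SLG.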

The addition of on base average as an official statistic was due in large part to Pete Palmer, who began recording the average for the American League in 1979. Before he began tracking these figures, Palmer wrote an article published in the Baseball Research Journal in 1973 titled “On Base Average for Players,” in which he examined the OBA of players throughout the history of the game. To open the article, he wrote:

“There are two main objectives for the hitter. The first is to not make an out and the second is to hit for distance. Long-ball hitting is normally measured by slugging average. Not making an out can be expressed in terms of on base average…”

While on base average has proven popular with modern sabermetricians, it does not actually express the rate at which a batter does not make an out, as claimed by Palmer. Rather, it reflects the rate at which a batter does not make an out when showing accepted forms of batting effectiveness; it is a modern take on batting average. The suggestion is that when a batter reaches base due to a walk or being hit by a pitch he has shown effectiveness, but when he reaches on interference, obstruction, or an error he has not.

Here are a few instances of batters reaching base without swinging.

What effectiveness did the batter show in the first three plays that he failed to show in the final play?

In the same way that there is a litany of forces in play when a batter tries to make contact with the ball, reaching base on non-swinging events requires more than just batting effectiveness. Reaching on catcher’s interference may not require any skill on the part of the batter, but there are countless examples of batters being walked or hit by a pitch that similarly reflect no batting skill. A batter may be intentionally walked because he is greatly skilled and the pitcher, catcher, or manager fears what he may be able to do if he makes contact, but in the actual plate appearance itself, that rationalization is inconsequential. If we’re going to estimate the effectiveness of a batter in a plate appearance, only what occurs during the plate appearance is relevant.

Inconsistency in when we decide to reward batters for reaching base has limited our ability to accurately reflect the value produced by batters. We intentionally exclude certain results and condemn others as failures despite the batter’s team benefiting from the outcomes of these plays. Instead of restricting ourselves to counting only the value produced when the batter has shown accepted forms of effectiveness, we should aim to accurately reflect the total value that is produced due to a batter’s plate appearance. We can then judge how much of the value we think was due to effective batting and how much due to outside forces, but we need to at least set the baseline for the total value that was produced.

To accomplish this goal, I’d like to repurpose the language Palmer used to begin “On Base Average for Players”:

There are two main objectives for the batter. The first is to not make an out and the second is to advance as many bases as possible.

“Hitters” aim to “hit for distance” as it will improve their likelihood of advancing on the bases. “Batters” aim to do whatever it takes to advance on the bases. Hitting for distance may be the best way to accomplish this, in general, but batters will happily advance on an error caused by an errant throw from the shortstop, or a muffed popup in shallow right field, or a monster flyball to centerfield.

Unlike past methods that estimate batting effectiveness, there will be no exceptions or exclusions in how we reflect a batter’s rate at accomplishing these objectives. Our only limitation will be that we will restrict ourselves to those events that occur due to the action of the plate appearance. By this I mean that baserunning and fielding actions that occur following the initial result of the plate appearance are not to be considered. For instance, events like a runner advancing due to the ball being thrown to a different base, or a secondary fielding error that allows runners to advance, are to be ignored.

The basic measurement of success in this system is the reach (Re), which is credited to a batter any time he reaches first base without causing an out.[iii] A batter could receive credit for a reach in a myriad of ways: on a clean hit,[iv] a defensive error, a walk, a hit by pitch, interference, obstruction, a strikeout with a wild pitch, passed ball, or error, or even a failed fielder’s choice. The only essential element is that the batter reached first base without causing an out. The inclusion of the failed fielder’s choice may seem counterintuitive, as there is an implication that the fielder could have made an out if he had thrown the ball to first base, but “could” is opinion rearing its ugly head and this statistic is free of such bias.

The basic average resulting from this counting statistic is effective On Base Average (eOBA), which reflects the rate at which a batter reaches first base without causing an out per plate appearance.

eOBA = Reaches / Plate Appearances = Re/PA

Note that unlike the traditional on base average, all plate appearances are counted, not just at-bats, walks, times hit by the pitcher, and sacrifice flies. MLB may be of the opinion that batters shouldn’t be punished when they “play for the side” by making a sacrifice bunt, but that opinion is irrelevant for eOBA; the batter caused an out, nothing else matters.[v]
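A minimal Python sketch of how the two denominators differ. Each plate appearance is reduced to a short outcome code; the codes (for example K_REACH for a strikeout on which the batter reaches first on a wild pitch, passed ball, or error, and FC_SAFE for a fielder's choice on which no out is recorded) are hypothetical labels invented for this example, not an official scoring scheme.

```python
# Hypothetical outcome codes for one plate appearance (not an official scheme).
HITS = {"1B", "2B", "3B", "HR"}
# Every way the batter reaches first without causing an out: hits, walks, hit by
# pitch, reaching on an error (E), catcher's interference (CI), a strikeout with a
# wild pitch/passed ball/error (K_REACH), or a fielder's choice with no out (FC_SAFE).
REACHES = HITS | {"BB", "HBP", "E", "CI", "K_REACH", "FC_SAFE"}
# Outs caused by the plate appearance: strikeouts (K), batted-ball outs (OUT),
# fielder's choices that retire a runner (FC), sacrifice bunts (SH), sacrifice flies (SF).
OUTS = {"K", "OUT", "FC", "SH", "SF"}


def eoba(outcomes):
    """Effective On Base Average: reaches per plate appearance (Re/PA)."""
    unknown = set(outcomes) - REACHES - OUTS
    if unknown:
        raise ValueError(f"unrecognized outcome codes: {unknown}")
    return sum(o in REACHES for o in outcomes) / len(outcomes)


def oba(outcomes):
    """Traditional OBA: (H + BB + HBP) / (AB + BB + HBP + SF).

    In this code set, that denominator is every plate appearance except
    sacrifice bunts (SH) and catcher's interference (CI).
    """
    on_base = sum(o in HITS or o in {"BB", "HBP"} for o in outcomes)
    denominator = sum(o not in {"SH", "CI"} for o in outcomes)
    return on_base / denominator


season = ["1B", "OUT", "E", "SH", "BB", "K", "CI", "K_REACH"]
print(f"OBA  {oba(season):.3f}")   # OBA  0.333
print(f"eOBA {eoba(season):.3f}")  # eOBA 0.625
```

The sacrifice bunt lowers eOBA because it is an out in a plate appearance, while OBA simply leaves it out of the count; the error, the interference, and the dropped third strike all raise eOBA but do nothing for OBA.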

eOBA measures the rate at which batters accomplish their first main objective: not causing an out. To measure the second objective, advancing as many bases as possible, we’ll define the second basic measurement of success as total bases reached (TBR), which reflects the number of bases to which a batter advances due to a reach.[vi] So, a walk, a single, and catcher’s interference, among other things, are worth one TBR; a two-base error and a double are worth two TBR; etc.

The average resulting from TBR is effective Total Bases Average (eTBA), which reflects the average number of bases to which a batter advances per plate appearance.

eTBA = Total Bases Reached / Plate Appearances = TBR/PA
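Because the scorer records which base the batter reached on each play, eTBA reduces to simple arithmetic. A minimal sketch, assuming each plate appearance has already been reduced to a hypothetical integer TBR value (0 for an out, up to 4 for a home run):

```python
def etba(tbr_per_pa):
    """Effective Total Bases Average: total bases reached per plate appearance (TBR/PA).

    Each entry is the base the batter reached due to the action of that plate
    appearance: 0 if he caused an out, 1 for a walk, single, or catcher's
    interference, 2 for a double or two-base error, and so on up to 4.
    """
    if not tbr_per_pa:
        raise ValueError("need at least one plate appearance")
    if any(not 0 <= bases <= 4 for bases in tbr_per_pa):
        raise ValueError("each TBR value must be between 0 and 4")
    return sum(tbr_per_pa) / len(tbr_per_pa)


# Single, out, two-base error, walk, home run, strikeout, catcher's interference,
# and a single plus an advance to second on the throwing error (footnote [vi]).
season = [1, 0, 2, 1, 4, 0, 1, 2]
print(f"eTBA {etba(season):.3f}")  # eTBA 1.375
```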

We now have ways to measure the rate at which a batter does not cause an out and how far they advance on average in a plate appearance. While these are the two main objectives for batters, it can be informative to know similar rates for when a batter attempts to make contact with the ball.

To build such averages, we need to first define a statistic that counts the number of attempts by a batter to make contact, as no such term currently exists. At-bats come close, but they have been altered to exclude certain contact events, namely sacrifices. For our purposes, it is irrelevant why a batter attempted to make contact, whether to sacrifice himself or otherwise, only that he did so. We’ll define an attempt-at-contact (AC) as any plate appearance in which the batter strikes out or puts the ball into play. The basic unit to measure success when attempting to make contact is the reach-on-contact (C), for which a batter receives credit when he reaches first base by making contact without causing an out. A strikeout where the batter reaches first base on a wild pitch, passed ball, or error counts as a reach but it does not count as a reach-on-contact, as the batter did not reach base safely by making contact.

The basic average resulting from this counting statistic is effective Batting Average (eBA), which reflects the rate at which a batter reaches first base by making contact without causing an out per attempt-at-contact.

eBA = Reaches-on-Contact / Attempts-at-Contact = C/AC
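The attempt-at-contact denominator is the part most likely to trip up an implementation, so here is a minimal sketch of eBA using hypothetical outcome codes invented for this example (K_REACH is a strikeout on which the batter reaches first on a wild pitch, passed ball, or error; E is reaching on a fielding error on a batted ball; FC_SAFE is a fielder's choice on which no out is recorded).

```python
# Hypothetical outcome codes (not an official scheme).
# Balls in play on which the batter reached first without causing an out.
CONTACT_REACHES = {"1B", "2B", "3B", "HR", "E", "FC_SAFE"}
# Balls in play on which the batter caused an out, including sacrifices and
# fielder's choices that retire another runner (FC).
CONTACT_OUTS = {"OUT", "FC", "SH", "SF"}
# Strikeouts count as attempts-at-contact whether or not the batter reaches (K_REACH).
STRIKEOUTS = {"K", "K_REACH"}
# Walks, hit by pitch, and interference are neither strikeouts nor balls in play.
NON_ATTEMPTS = {"BB", "HBP", "CI"}


def eba(outcomes):
    """Effective Batting Average: reaches-on-contact per attempt-at-contact (C/AC)."""
    known = CONTACT_REACHES | CONTACT_OUTS | STRIKEOUTS | NON_ATTEMPTS
    unknown = set(outcomes) - known
    if unknown:
        raise ValueError(f"unrecognized outcome codes: {unknown}")
    attempts = [o for o in outcomes if o not in NON_ATTEMPTS]
    if not attempts:
        raise ValueError("no attempts-at-contact")
    reaches_on_contact = sum(o in CONTACT_REACHES for o in attempts)
    return reaches_on_contact / len(attempts)


season = ["1B", "K", "K_REACH", "BB", "OUT", "E", "SH", "HBP", "2B"]
# 7 attempts-at-contact (everything but BB and HBP); 3 reaches-on-contact
# (1B, E, 2B). K_REACH is a reach, but not a reach-on-contact.
print(f"eBA {eba(season):.3f}")  # eBA 0.429
```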

Finally, we’ll define total bases reached-on-contact (TBC) as the number of bases to which a batter advances due to a reach-on-contact. The average resulting from this is effective Slugging Average (eSLG), which reflects the average number of bases to which a batter advances per attempt-at-contact.

eSLG = Total Bases Reached-on-Contact / Attempts-at-Contact = TBC/AC
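A minimal sketch of eSLG, assuming each plate appearance has been reduced to a hypothetical (kind, bases) pair: kind is "contact" for a ball put into play, "strikeout", or "other" (walk, hit by pitch, interference), and bases is the base the batter reached due to the action of the plate appearance. Only bases reached on contact count toward TBC, while strikeouts still count as attempts.

```python
def eslg(plate_appearances):
    """Effective Slugging Average: total bases reached-on-contact per attempt-at-contact (TBC/AC).

    plate_appearances: iterable of (kind, bases) pairs, where kind is "contact",
    "strikeout", or "other", and bases is 0 when the batter caused an out.
    """
    attempts = [(kind, bases) for kind, bases in plate_appearances
                if kind in ("contact", "strikeout")]
    if not attempts:
        raise ValueError("no attempts-at-contact")
    tbc = sum(bases for kind, bases in attempts if kind == "contact")
    return tbc / len(attempts)


season = [
    ("contact", 1),    # single
    ("strikeout", 0),  # strikeout, batter out
    ("strikeout", 1),  # strikeout, reached first on a wild pitch: counts in AC, adds no TBC
    ("other", 1),      # walk: not an attempt-at-contact
    ("contact", 0),    # groundout
    ("contact", 2),    # two-base error
    ("contact", 4),    # home run
]
print(f"eSLG {eslg(season):.3f}")  # eSLG 1.167
```

Dividing every entry's bases by all seven plate appearances instead gives eTBA (9/7, about 1.286); the walk and the dropped third strike are exactly the kind of value eSLG leaves out.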

The two binary effective averages, eOBA and eBA, are the most basic tools we can build to describe the value produced by batters. They answer a very simple question: was an out caused due to the action in the plate appearance? There are no assumptions made about whose effectiveness caused an out to be made or not made; we only note that it occurred during a batter’s plate appearance. These are “real and indisputable facts.”

The value of these statistics lies not only in their reflection of whether a batter accomplishes his first main objective, but also in their linguistic simplicity. Miguel Cabrera led qualified batters with a .442 OBA in 2013. This means that he reached base while showing batting effectiveness (i.e. through a hit, walk, or hit by pitch) in 44.2 percent of the opportunities he had to show batting effectiveness (i.e. an at-bat, a walk, a hit by pitch, or a sacrifice fly). That’s a bit of a mouthful, and somewhat convoluted. Conversely, Mike Trout led all qualified batters with a .445 eOBA in 2013, meaning he reached base without causing an out in 44.5 percent of his plate appearances. There are no exceptions that need to be acknowledged for plate appearances or times safely reaching base that aren’t counted; it’s simple and to the point.

The two weighted effective averages—eTBA and eSLG—depend on the scorer to determine which base the batter reached due to the action of the plate appearance, and thus reflect a slight level of estimation. As we want to differentiate between actions caused by a plate appearance and those caused by subsequent baserunning and fielding, it’s necessary for the scorer to make these estimations. This process at least comes with fewer difficulties, in general, than those that can arise when scoring a hit or an error. No matter what we do, official scorers will always be a necessary evil in the game of baseball.

While I won’t get into any real analysis with these statistics yet, accounting for all results can certainly have a noticeable effect on how we may perceive the value of some players. For example, an average batter last season had an OBA of .318 with an eOBA of .325. Norichika Aoki was well above average with a .356 OBA last season, but by accounting for the 16 times he reached base “inefficiently,” he produced an even more impressive .375 eOBA. While he was ranked 37th among qualified batters in OBA, in the company of players like Marco Scutaro and Jacoby Ellsbury, he was 27th among qualified batters in eOBA, between Buster Posey and Jason Kipnis; a significant jump.

In the past, we have only cared about how many total bases a batter reaches when he puts the ball into play, which is a disservice to those batters who are able to reach base at a high rate without swinging. Joey Votto had an eSLG of .504 last season – 26th overall among qualified batters. However, his eTBA, which accounts for the 139 total bases he reached when not making contact, was .599 – 7th among qualified batters.

This is certainly not the first time that such a method of tracking value production has been proposed, but it never seems to gain any traction. The earliest such proposal may have come in the Cincinnati Daily Enquirer on 14 August 1876, when O.P. Caylor suggested that there was a strong probability that “a different mode of scoring will be adopted by the [National] League next year”:

“Instead of the base-hit column will be the first base column, in which will be credited the times a player reached first base in each game, whether by an error, called balls, or a safe hit. The intention is to thereby encourage not only safe hitting, but also good first-base running, which has of late sadly declined. Players are too apt, under the present system of averages, to work only for base hits, and if they see they have not made one, they show an indifference about reaching first base in advance of the ball. The new system will make each member of a club play for the club, and not for his individual average.”

Of course, this new mode was not adopted. However, the National League did count walks as hits for a single season in 1887; an experiment that was widely despised and abandoned following the end of the season.

It has been 147 years since Henry Chadwick introduced the hit and began the process of estimating batting effectiveness. Maybe it’s time we accept the limitations of these estimations and start crediting batters for “reaching first base in advance of the ball” and advancing as far as possible, no matter how they do so.


 

[i] Whether it’s the catcher, pitcher, or manager who ultimately decides on what pitch is to be thrown is somewhat irrelevant. The goal of the pitching battery is to execute pitches that offer the greatest chance to help the pitching team, whether that’s by trying to strike out the batter, trying to induce weak or inferior contact, or trying to avoid the potential for any contact whatsoever.

[ii] Technically, it only had a true distance of 443 feet—not terribly deep in the grand pantheon of home runs—but the illusion works for me on many levels.

[iii] The fundamental principle of this system, that a reach is credited when an out doesn’t occur due to the action of the plate appearance, means that some plays that end in outs are still counted as reaches. In this way, we don’t incorrectly subtract value that was lost due to fielding and baserunning following the initial event. For instance, if a batter hits the ball cleanly into right field and safely reaches first base, but the right fielder throws out a baserunner advancing from first to third, the batter would still receive credit for a reach. Similarly, if a batter safely reaches first base but is thrown out trying to advance to second base, for consistency, this is considered a baserunning mistake and is still treated as a reach of first base.

[iv] There is one type of hit that is not counted as a reach. When a batted ball hits a baserunner, the batter receives credit for a hit while an out is recorded, presumably because it is considered an event that reflects batting effectiveness. In this system, that event is treated as an out due to the action of the plate appearance—a failure to safely reach base.

[v] Sacrifice hits may be strategically valuable events, as the value of the sacrifice could be worth more than the average expected value that the batter would create if swinging away, but they are still negative events when compared to those that don’t end in an out—a somewhat obvious point, I hope. The average sacrifice hit is significantly more valuable than the average out, which we will show more clearly in Part III, but for consistency in building these basic averages, it’s only logical to count them as what they are: outs.

[vi] There are occasionally plays where a batter hits a groundball that causes a fielder to make a bad throw to first, in which the batter is credited with a single and then an advance to second on the throwing error. As the fielding play is part of the action of the plate appearance—it occurs directly in response to the ball being put into play—the batter would be credited with two TBR for these types of events.


 

I’ve included links to spreadsheets containing the leaders, among qualified batters, for each effective average, as well as the batters with the largest difference between their effective and traditional averages, for comparison. Additionally, the same statistics have been generated for each team along with the league-wide averages.

2013 – Effective Averages for Qualified Players

2013 – Largest Difference Between Effective and Traditional Averages for Qualified Players

2013 – Effective Averages for Teams and Leagues


Ben Revere: Frustratingly Frustrating and False Hope-Inducing

I can picture it so clearly – Ruben Amaro Jr., on a cold December day in 2012, finalizing a trade with the Twins, patting himself on the back while sipping on the finest scotch in the office. He just cashed in on Vance Worley’s irrationally high stock and added a top-of-the-order center fielder, filling both a defensive void and a lineup void – one that could push the impatient Jimmy Rollins out of the leadoff spot. That player, little Ben Revere, also happened to be one of the fastest in baseball. What was there to like about need-filling Revere in the winter of 2012?

Of the outfielders who logged 1,500 innings from 2011-2012, only four of them had higher UZR/150.

Name              Inn      UZR/150
Josh Reddick      1881     19.3
Jason Heyward     2346     18.3
Peter Bourjos     1771     17.5
Chris Young       2098.1   15.9
Ben Revere        1987     15.7
Jacoby Ellsbury   1969.2   13.9

And only Jason Heyward posted a higher RngR.

Name              Inn      ARM    RngR
Jason Heyward     2346     -1     32.6
Ben Revere        1987     -2.4   27.4
Josh Reddick      1881     5.4    24.2
Chris Young       2098.1   1.1    23.5
Jacoby Ellsbury   1969.2   -4.7   22

Of the 262 players with 600+ PA from 2011-2012, only Marco Scutaro and Juan Pierre posted better Swinging Strike rates.

Name              SwStr%
Marco Scutaro     1.80%
Juan Pierre       2.40%
Ben Revere        3.00%
Jeff Keppinger    3.00%
Brett Gardner     3.00%
Denard Span       3.00%

Over those two seasons in Minnesota (241 games), Revere stole 74 bases and posted a WAR of 4.7.

He was an incredibly fast, rangy center fielder who put the ball in play and walked just enough to give his speed a chance to create extra bases. He had holes in his game, namely arm strength and a complete lack of XBH potential. Still, I was all-in.

Fast forward roughly 18 months, and I’m sitting in Citizens Bank Park, watching a bases-loaded, no-out line drive sink into left field, where Carl Crawford slides to make the catch. I turn my head back towards the infield and Ben Revere is running in the wrong direction. “Oh no!” I blurt out. He races back to third base to avoid being doubled up instead of tagging up and racing home.

I sit and watch Dee Gordon – he of speed and contact – foul off numerous pitches, eventually work a walk, and then steal second as a formality, all while double-checking that Revere still has a 2.0% walk rate this season. He still does.

His arm strength remains Pierre-esque, but his route-running has transcended legendary status to a place I’ve never seen before. It’s a dark place. It’s trepidation to a Westerosian extent. It’s a gasp, then a breath-hold, followed by either a head shake or an exhale, depending on whether Revere recuperated from his disastrous first step and somehow worse second through tenth steps to catch the ball.

Ben Revere

Exhale.

Ben Revere

Shake head.

Over 2013-2014, Revere’s panic-inducing time in Philadelphia, 69 of the 98 outfielders with 750 innings played have posted a better UZR/150 than he has.

His RngR has gone from 27.4 as a Twin to -1.8 as a Phillie. That’s good for a drop from 2nd in all of baseball to 62nd out of 98.

One of the fastest outfielders in the majors is posting a below average RngR. One of the best contact-makers in the majors is unwilling to work the count.

In a sports world of placing blame, where does it lie? Should the general manager have seen through the lofty defensive numbers and recognized the flaws? Should Amaro Jr. have known this player wouldn’t get on base enough to utilize his best asset? Do we blame the player for being unable to improve aspects of his game that have been shown to be improvable over time?

Or do I blame myself, for looking at this player, looking at these numbers, and still seeing hope on the horizon? If only he could… If he just…

Revere’s 0.9 WAR in an injury-shortened 2013 season – in which all of the aforementioned regressions occurred – gives hope for a 2-3 WAR-per-season player. He just needs to…


Is Bud Norris this Good?

Back in January, while everyone was still waiting around for the Orioles to sign a free agent, I wrote this post on what I thought about Bud Norris at the time. I came to the conclusion that Bud Norris pitched too few innings, gave up too many walks and home runs, and struggled too much against lefties to be an effective starter for the Orioles. I believed he was better suited to a bullpen role where his stuff would play up some and Buck could protect him against his horrendous splits. To date, Bud Norris has proven that diagnosis incorrect. Norris has posted a 3.58 ERA and averaged 6.27 innings per start and has been one of the better starters on the Orioles this year. I wanted to figure out what has made Bud better this year and see if he could continue his early season success.

The worst part of Norris’ game prior to this season was his bad splits against lefties. Last season, he allowed a .387 wOBA to lefties; this season it’s a much more manageable .312. His HR/FB rate is also down, from 12.9% in the second half of last year to a more reasonable 10.9% this season. His walk rate is down as well, to 6.8% this year from 10.8% in the second half of last season. However, Norris’ strikeouts are also down: he is striking out only 16.1% of batters this season after striking out 23.0% of batters in the second half of 2013. With the reduced walks and strikeouts, Norris has been able to pitch longer into games this season.

Norris is walking fewer batters, controlling his home runs, and improving against lefties, but he is also not missing as many bats as he used to. At first glance at some of the peripheral statistics, one would say Bud Norris has been very lucky to this point in the season, mainly because his BABIP to date is .253. League average is somewhere around .300, meaning that about 30% of balls put into play fall for hits; Norris is yielding hits on only about 25% of balls put into play. That would indicate some luck. However, I hate it when people simply cite BABIP as evidence of good or bad luck. It’s a decent indicator, but how balls are put into play matters most. Line drives fall for hits more often than ground balls, and ground balls more often than fly balls.
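For reference, BABIP is conventionally computed as (H - HR) / (AB - K - HR + SF); for a pitcher, the same formula is applied to the hits, home runs, at-bats, strikeouts, and sacrifice flies he allows. A minimal sketch with made-up numbers, not Norris’ actual line:

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf  # home runs and strikeouts are not balls in play
    return (h - hr) / balls_in_play


# Hypothetical pitcher line, for illustration only.
rate = babip(h=100, hr=10, ab=400, k=80, sf=5)
print(f"BABIP {rate:.3f}")  # BABIP 0.286
```

Read this way, a .253 BABIP means roughly one ball in play in four fell for a hit, which is why it looks lucky next to a league mark near .300.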

Looking at the batted ball numbers, Norris has shown some improvement. He has a 20.5% line drive rate, a 43.0% ground ball rate, and a 36.4% fly ball rate. In the second half of 2013 he had rates of 22.7%, 39.5%, and 37.8%, respectively. The reduction in line drives and increase in ground balls are good indicators that batters are barreling up the ball on Norris less than they did last season. (Side note: while fly balls fall for hits less often than ground balls do, it’s better for a pitcher like Norris to have a higher GB% because he is home run prone and the Orioles’ infield defense is plus.) All of his batted ball rates are around league average thus far into the season.

However, I would not be a good analyst if I did not tell you how he has gotten batters to make weaker contact against him. Looking at the tape reveals no major mechanical changes, unlike Matt Wieters, for instance. He is a little higher in his set this year and a little taller on the follow-through, but nothing of particular note. Bud Norris is simply pitching better this season. As the graph below shows, all of his pitches are lower in the zone (the middle of the graph is the middle of the plate). In particular, he is locating his change up nearly half an inch lower than he did last season, indicating he is sharpening his command and refining that pitch. Also, his whiff rate on his change up is double what it was last season (6.5% in 2013 and 13.24% to date in 2014). This improved change up is likely what is helping him against the lefties.

[Chart: Brooks Baseball, Bud Norris pitch location]

Furthermore, as seen in the graph below, Norris’ velocity has increased on every pitch type this season. His average fastball velocity is 1 MPH faster than last season. Velocity is not everything, but higher velocity, regardless of location, tends to make it harder for batters to make strong contact.

[Chart: Brooks Baseball, Bud Norris pitch velocity]

There are good and bad sides to Bud Norris’ start to the 2014 season. The good side includes a falling walk rate, better results against lefties, a controlled home run rate, increased velocity, and improved pitch location. The bad side is the decreasing rate at which he gets hitters to swing and miss and his unsustainably low BABIP (even with the slightly improved batted ball numbers). His plummeting strikeout rate and low BABIP, despite the increased velocity and improved location, are bad signs moving forward. Bud Norris has been a better pitcher than he was last year, but I highly doubt he is a 3.5 ERA pitcher for the entirety of the 2014 season.


Samuel Deduno’s Crazy Fastball, Explained…Maybe

Like many other Twins fans, Samuel Deduno fascinates me. Thanks to some terrible starting pitching from the likes of Jason Marquis and Carl Pavano, Deduno earned his first extended stint in the major leagues in 2012. Immediately he dazzled fans and frustrated hitters with his success relative to the rest of the rotation despite his lack of control and self-proclaimed “crazy” fastball (he’s admitted he has no clue how to control its movement).

Deduno started the 2013 season in the minor leagues again, but received another call-up and improved significantly. He cut his walks from 15% to 9% of batters faced, and though his strikeouts also dropped, he combated that by allowing fewer home runs.

A few weeks ago, I chronicled Deduno’s fastball by showing that whether by accident or on purpose, Deduno’s moving fastball has become almost exclusively a cutter. In the past, it also would sink into the hands of a right-handed batter, but for whatever reason that movement has been eliminated.

Another fascinating fact about Deduno’s fastball is how much it drops, regardless of horizontal movement. MLB pitchers average nearly 9 inches of “rise” (relative to a ball thrown without spin, which would be affected by gravity alone), but Deduno’s gets only 2 inches of rise. His fastball drops more than some pitchers’ sliders and change-ups, which is pretty impressive. And while I’m no expert in the sciences, I may have figured out what the hell is going on with his fastball.

But first, a quick anecdote that gets passed around by some of the Twins part-time staff (I’m an usher at Target Field and have heard this story a few times). A few years ago, the Yankees were in town and Derek Jeter faced Deduno. The story goes that Jeter grounded out, and upon returning to the dugout he started shaking his fist violently while saying, “His fastball moves all over the (redacted) place!” He then went to a security guard and asked who was the pitcher on the mound. The security guard replied with, “He says, ‘My name is Sam Deduno and I have a crazy fastball.'”

But I digress. There are plenty of pitchF/X websites out there, and for reasons I cannot explain, I’ve become rather attached to the Texas Leaguers website. I typically pay the most attention to velocity and movement, but something caught my eye the last time I was drooling over Deduno’s data. There is also a number charted for every pitch that is titled “Spin Rate” and Deduno’s fastball spin rate is absolutely absurd. If you want to look at his cutting fastball since 2013, it’s 606 revolutions per minute. His sinking fastball is at 727.

“Yeah, okay, but why does that matter?” the casual observer would note. Well, unfortunately Texas Leaguers does not keep track of the average spin rate for fastballs and cutters, but I’ll post a variety of pitchers for context. All data is from the start of the 2013 season to the present.

Fastballs

Glen Perkins – 2,475

R.A. Dickey – 1,543

Josh Collmenter – 2,259

Rick Porcello – 1,867

Joe Saunders – 1,944

Cutters

Mariano Rivera – 1,580

Kevin Correia – 1,243

Dan Haren – 1,243

Andy Pettitte – 1,521

Scott Feldman – 958
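To put a rough number on “absurd,” here is a small Python sketch that averages the sample pitchers listed above and compares Deduno against them. Keep in mind these five-pitcher samples are just the examples listed, not league averages, so the percentages are only illustrative.

```python
from statistics import mean

# Spin rates (rpm) listed above, 2013 to present, as charted by Texas Leaguers.
fastballs = {"Glen Perkins": 2475, "R.A. Dickey": 1543, "Josh Collmenter": 2259,
             "Rick Porcello": 1867, "Joe Saunders": 1944}
cutters = {"Mariano Rivera": 1580, "Kevin Correia": 1243, "Dan Haren": 1243,
           "Andy Pettitte": 1521, "Scott Feldman": 958}
deduno_cutter, deduno_sinker = 606, 727

# Averages of these small samples only, not league-wide figures.
fastball_avg = mean(list(fastballs.values()))
cutter_avg = mean(list(cutters.values()))

print(f"Sample fastball average: {fastball_avg:.0f} rpm")  # 2018 rpm
print(f"Sample cutter average: {cutter_avg:.0f} rpm")      # 1309 rpm
print(f"Deduno cutter vs. sample cutters: {deduno_cutter / cutter_avg:.0%}")     # 46%
print(f"Deduno sinker vs. sample fastballs: {deduno_sinker / fastball_avg:.0%}")  # 36%
```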

Now can you understand why I said that Deduno’s fastball spin rate was absurd? Remember when I said that his pitch rises far less than the average MLB fastball? Well, you might be aware of the Magnus effect, where the rotation of a ball affects its flight path. Now that it’s warming up outside, you have another excuse to inflate a beach ball. Go ahead and do it, then try to throw it overhand. Because of the backspin on the ball, no matter how hard you try to throw it straight forward, it will also rise. If you try to throw it underhand, it’s going to sink because of the topspin. Major league fastballs appear to rise because backspin creates a Magnus force that partially offsets gravity. Because Deduno’s fastball has (relatively) little spin on it, the Magnus effect is reduced and it doesn’t rise as much.

Now why does Deduno’s fastball have so little spin? This is one of those cases where I wish I could see a slo-mo of his fastball, because I get the feeling that the ball might slip out of his hand a bit, kind of like Robert Coello’s “forkleball.” Speaking of which, let’s check out some of those pitches that aren’t supposed to spin very much.

Other Pitches

R.A. Dickey knuckleball – 901

Steven Wright knuckleball – 1,001

Robert Coello “forkleball” – 950 (it’s classified as about 4 different pitches and I found the average)

Ladies and gentlemen, here is why Samuel Deduno’s fastball is crazy. It supposedly spins less than noted knuckleballer R.A. Dickey’s famed knuckleball. I’ll admit that seems absurd (especially after watching the “forkleball” GIF above), but remember, that GIF is merely a highlight of one of the best Coello has thrown. He’s thrown many more that had far more spin than that pitch. The same is true of Dickey’s knuckleball; he’s thrown tons of them that failed to achieve a lack of spin. While my understanding is likely wrong, I still prefer to think that Samuel Deduno is practically firing 90 MPH knuckleballs at Kurt Suzuki and Josmil Pinto. That’s probably why, in the anecdote above, Derek Jeter said it was like the ball was shaking as it came to home plate; it’s also why Deduno has no idea where his fastball is going, and why his control has been below average for his career.

Samuel Deduno’s crazy fastball, explained. Maybe.


Foundations of Batting Analysis – Part 1: Genesis

This was originally written as a single piece of research, but as it grew in length far beyond what I originally anticipated, I’ve broken it into three parts for ease of digestion. In each part, I have linked to images of the original source material when possible. There has been nothing quite as frustrating in researching the creation of baseball statistics as being misled by faulty citations, so I figured including actual copies of the original material would mitigate this issue for future researchers. Full bibliographic citations will be included for the entirety of the paper at the conclusion of Part III.

“[Statistics’] object is the amelioration of man’s condition by the exhibition of facts whereby the administrative powers are guided and controlled by the lights of reason, and the impulses of humanity impelled to throb in the right direction.”

–Joseph C. G. Kennedy, Superintendent of the United States Census, 1859

In a Thursday afternoon game in Marlins Park last season, Yasiel Puig faced Henderson Alvarez in the top of the fourth inning and demolished a first-pitch slider to straightaway center field. As Puig flipped his bat with characteristic flair and began to trot towards first base, remnants of the ball soared over the head of Justin Ruggiano and hit the highest point on the 16-foot wall, 418 feet away from home plate; Puig coasted into second base with a stand-up double.

Two months earlier, in another afternoon game, this time at Yankee Stadium, Puig hit the ball sharply onto the ground between Reid Brignac and second base causing it to roll into left-center field. Puig sprinted towards first base, rounding the bag hard before Brett Gardner was able to gather the ball. Gardner made a strong, accurate throw into second base, but it was a moment too late; Puig slid into second, safe with a double.

In MLB 13: The Show, virtual Yasiel Puig faced virtual Justin Verlander in Game Seven of the Digital World Series. Verlander had managed to get two outs in the inning, but the bases were loaded as Puig came to the plate. The Tiger ace reared back and threw the 100-mph heat the Dodger phenom was expecting. Puig began his swing but, at the moment of contact, there was a glitch in the game. Suddenly, Puig was standing on second base, all three baserunners had scored, and Verlander had the ball again; “DOUBLE” flashed on the scoreboard.

If the outcome is the same, is there any difference between a monster fly ball, a well-placed groundball, and a glitch in the matrix?

Analysis of batting presented over the past 150 years has suggested that the answer is no – a double is a double. However, with detailed play-by-play information compiled over the last few decades, we can show that the traditional concepts of the “clean hit” and “effective batting” have limited our ability to accurately measure value produced by batters. I’d like to begin by examining how the hit found its way into the baseball lexicon and how it has impacted player valuation for the entire history of the professional game.

The earliest account of a baseball game that included a statistical chart, the first primordial box score, appeared in the 22 October 1845 issue of the New York Morning News edited by J. L. O’Sullivan. This “abstract” recorded two statistics—runs scored and “hands out”—for the eight players on each team (the number of players wasn’t standardized to nine until 1857). Runs scored was the same as it is today, while hands out counted the total number of outs a player made both as a batter and as a baserunner.

For the next two decades, statistical accounting of baseball games was limited to these two statistics and basic variations of them. Through the bulk of this period, the box score was little more than an addendum to the game story – a way to highlight specific contributions made by each player in a game. It wasn’t until 1859 that a music teacher turned sports journalist took the first steps in developing methods to examine the general effectiveness of batters.

Henry Chadwick had immigrated to Brooklyn from Exeter, England, with his parents and younger sister a few weeks before his 13th birthday in 1837. He came from a family of reformists guided by the ideals of the Age of Enlightenment. Henry’s grandfather, Andrew, was a friend and follower of John Wesley, who in the mid-18th century helped form a movement within the Church of England, later known as Methodism, that aimed to combine theological reflection with rational analysis. Henry’s father, James, spent time in Paris in the late 18th century in support of the French Revolution and stressed the importance of education to learn how to “distinguish truth from error to combat the evil propensities of our nature.” Henry’s half-brother, Edwin, 24 years his senior, was a disciple of Jeremy Bentham, whose philosophies on reason, efficiency, and utilitarianism inspired Edwin’s work on improving sanitation and conditions for the poor in England, eventually earning him a knighthood. This rational approach to reform, so prevalent in his family, is readily apparent in Henry Chadwick’s later promotion of baseball.

Chadwick’s work as a journalist began at least as early as 1843 with the Long Island Star, when he was just 19 years old, but as a young adult he worked primarily as a music teacher and composer. By the 1850s, his focus had turned primarily to journalism. While his early writing was on cricket, he eventually moved on to covering baseball in assorted New York City and Brooklyn periodicals. Looking back, Chadwick described his initial interest in promoting baseball, and outdoor games and sports in general, as a way to improve public health, both physical and psychological. In The Game of Base Ball, published in 1868, Chadwick recounted a thought he had had over a decade earlier:

“…that from this game of ball a powerful lever might be made by which our people could be lifted into a position of more devotion to physical exercise and healthful out-door recreation than they had hitherto, as a people, been noted for.”

Through his writing on baseball during the 1850s, Chadwick became such a significant voice for the sport that, in 1857, he was invited to suggest amendments at the meeting of the “Committee to Draft a Code of Laws on the Game of Base Ball” for a convention of delegates representing 16 baseball clubs (two of which were absent) based in and around New York City and Brooklyn. The Convention of 1857 laid down rules standardizing games played by those clubs, including setting the number of innings in a game to nine, the number of players on a side to nine, and the distance between the bases to 90 feet. The following year, another convention was held, this time with delegates from 25 clubs, and it formed the first permanent organizing body for baseball: the National Association of Base Ball Players (NABBP).[i] The “Constitution,” “By-Laws,” and “Rules and Regulations of the Game of Base Ball” adopted by the NABBP for that year were printed in the 8 May 1858 issue of the New York Clipper.

As the rules were being unified among New York teams, the methods used to recount games were evolving. By 1856, early versions of the line score, an inning-by-inning tally of the number of runs scored by each team, were being tested in periodicals, such as one in the 9 August issue of the Clipper. On 13 June 1857, the Clipper included its first use of a traditional line score for the opening game of the season between the Knickerbockers and the Eagles.[ii] In August 1858, Chadwick, who by this time had become the Clipper’s baseball reporter, began testing out various other statistics, noting the types of outs each player was making and the number of pitches thrown by each pitcher. A game on 7 August 1858 between the Resolutes and the Niagaras featured 812 total pitches in eight innings before it was called due to darkness.

In 1859, Chadwick conducted a seasonal analysis of the performance of baseball players—the first of its kind. In the 10 December issue of the Clipper, the Excelsior Club’s performance during the prior season was analyzed through a pair of charts titled, “Analysis of the Batting” and “Analysis of the Fielding.” Most notably, within the “Analysis of the Batting” were two columns, both titled “Average and Over.” These columns reflected the number of runs per game and outs per game by each player during the season – the forebears of batting average. The averages were written in the cricket style of X—Y, where X is the number of runs or outs per game divided evenly (the “average”) and Y is the remainder (the “over”). For instance, Henry Polhemus scored 31 runs in 14 games for the Excelsiors in the 1859 season, an average of 2—3 (14 divides evenly into 31 twice, leaving a remainder of 3). Runs and outs per game became standard inclusions in annual batting analyses over the next decade.
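The cricket-style “Average and Over” is simply integer division with a remainder. Below is a minimal Python sketch of that arithmetic, assuming only a season total (runs or outs) and a number of games as inputs; the function names `average_and_over` and `cricket_style` are hypothetical, chosen for illustration. Note that the “over” is not a fraction: Polhemus’s figure means two whole runs per game with three runs left over, which corresponds to roughly 2.21 in decimal form.

```python
def average_and_over(total: int, games: int) -> tuple[int, int]:
    """Return the cricket-style (average, over) pair for a per-game figure.

    The "average" is how many whole times `games` divides into `total`;
    the "over" is the remainder left behind.
    """
    if games <= 0:
        raise ValueError("games must be a positive integer")
    return divmod(total, games)


def cricket_style(total: int, games: int) -> str:
    """Format the pair as printed in the Clipper: average, a dash, then over."""
    average, over = average_and_over(total, games)
    return f"{average}\u2014{over}"  # U+2014 dash separator


# Henry Polhemus, 1859 Excelsiors: 31 runs in 14 games
print(average_and_over(31, 14))  # (2, 3)
print(round(31 / 14, 2))         # 2.21, the same figure as a decimal
```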

These seasonal averages marked a significant leap forward for baseball analysis, and yet their foundation, runs and outs, was the same as that of nearly every statistic in baseball’s brief history. It’s important to note that the baseball players and journalists covering the sport in this period generally had a background in cricket.[iii] In cricket, there are three possible outcomes on any delivery: runs are scored, an out is made, or nothing changes. When a cricket batter successfully runs between the wickets, he is scoring a run; there are no intermediate base states like those that exist in baseball. Consequently, the number of runs a cricket player scores tends to be a very accurate representation of the value he provided his team as a batter.

In baseball, batters rarely score due solely to their performance at the plate. Excluding outside-the-park home runs, successfully rounding the bases to score a run requires baserunning, fielding, help from teammates, and the general randomness that happens in games. It was 22 years after the appearance of that first box score in the New York Morning News before an attempt was made to isolate a player’s batting performance.

In June 1867, Chadwick began editing a weekly periodical called The Ball Players’ Chronicle, the first newspaper devoted “to the interest of the American game of base ball and kindred sports of the field.” The first issue, on 6 June, opened with an account of a three-game series between the Harvard College Club and the Lowell Club of Boston. The deciding game, a 39-28 Harvard victory to win the “Championship of New England,” received a detailed, inning-by-inning recap, followed by a box score. The primary columns of the chart featured runs and outs, as always. What was noteworthy about this box score, though, was the inclusion of a list titled “Bases Made on Hits,” reflecting the number of times each player reached first base on a clean hit. Writers had described batters reaching base on hits in their game accounts since the 1850s, but only as a rhetorical device to describe the action of the game. This was the first time anyone counted those occurrences as a measurement of batting performance.

Three months after this game account, in the 19 September issue of the Chronicle, Chadwick explained his rationale for counting hits in an editorial titled “The True Test of Batting”:

“Our plan of adding to the score of outs and runs the number of times…bases are made on clean hits will be found the only fair and correct test of batting; and the reason is, that there can be no mistake about the question of a batsman’s making his first base, that is, whether by effective batting, or by errors in the field…whereas a man may reach his second or third base, or even get home, through…errors which do not come under the same category as those by which a batsman makes his first base…

In the score the number of bases made on hits should be, of course, estimated, but as a general thing, and especially in recording the figures by the side of the outs and runs, the only estimate should be that of the number of times in a game on which bases are made on clean hits, and not the number of bases made.”

Taking his own advice, Chadwick printed “the number of times in a game on which bases are made on clean hits” side-by-side with runs and outs for the first time in the same 19 September issue of the Chronicle.[iv] Over the next few months, most major newspapers covering baseball were including hits in the main body of their box scores as well. The hit had become baseball’s first unique statistic.

By 1868, hits had permeated the realm of averages. On 5 December of that year, the Clipper included a chart on the “Club Averages” for the Cincinnati Club.[v] In addition to listing runs per game and outs per game for each player, the chart included “Average to game of bases on hits,” the progenitor of the modern batting average. All three averages were listed in decimal form, a first for the Clipper. A year later, on 4 December 1869, “Average total bases on hits to a game,” the precursor to slugging average, also appeared in the Clipper.

As hits per game became the standard measurement of “effective batting” over the next few seasons, H. A. Dobson of the Clipper noted an issue with this “batting average” in a letter to Nick E. Young (Secretary of the Olympic Club in Washington, D.C., and future president of the National League), who would be attending the Secretaries’ Meeting of the newly formed National Association of Professional Base Ball Players (NAPBBP).[vi] The letter, published in the Clipper on 11 March 1871, was “on the subject of a new and accurate method of making out batting averages.”

Dobson was a strong proponent of using hits to form batting averages, noting that “times first base on clean hits…is the correct basis from which to work a batting average, as he who makes his first base by safe hitting does more to win a game than he who makes his score by a scratch. This is evident.” He noted, though, that measuring the average on a per-game basis does not allow for comparison of teammates, as the “members of the same nine do not have the same or equal chance to run up a good score,” nor does it allow the comparison of players across teams, “as the clubs seldom play an equal number of games.” Dobson continued:

“In view of these difficulties, what is the correct way of determining an average so that justice may be done to all players?

This question is quickly answered, and the method easily shown.

According to a man’s chances, so should his record be. Every time he goes to the bat he either has an out, a run, or is left on his base. If he does not go out he makes his base, either by his own merit or by an error of some fielder. Now his merit column is found in ‘times first base on clean hits,’ and his average is found by dividing his total ‘times first base on clean hits’ by his total number of times he went to the bat. Then what is true of one player is true of all…In this way, and in no other, can the average of players be compared…

It is more trouble to make up an average this way than up the other way. One is erroneous, one is right.”

At the end of the letter, Dobson included a calculation, albeit for theoretical players, of hits per at-bat, the first time such a figure had ever been published.

Thus, the modern batting average was born.[vii]
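Dobson’s objection to per-game averages is easy to demonstrate numerically. The sketch below uses two entirely hypothetical batters (the numbers are invented for illustration and are not Dobson’s) and counts a time at bat the way Dobson described it: any trip to the plate that ends in an out, a run, or being left on base. The class name `Batter` and its fields are illustrative assumptions, not anything from the original letter.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    name: str
    games: int
    outs: int          # times the batter was put out
    runs: int          # times the batter came around to score
    left_on_base: int  # times the batter was left on base
    clean_hits: int    # "times first base on clean hits"

    @property
    def times_at_bat(self) -> int:
        # Dobson: every time at bat ends in an out, a run, or being left on base
        return self.outs + self.runs + self.left_on_base

    def hits_per_game(self) -> float:
        return self.clean_hits / self.games

    def hits_per_at_bat(self) -> float:
        return self.clean_hits / self.times_at_bat


# Hypothetical players: A played more games and batted more often per game
a = Batter("Player A", games=20, outs=60, runs=25, left_on_base=15, clean_hits=30)
b = Batter("Player B", games=15, outs=33, runs=17, left_on_base=10, clean_hits=21)

for p in (a, b):
    print(f"{p.name}: {p.hits_per_game():.2f} per game, "
          f"{p.hits_per_at_bat():.3f} per at-bat")
# Player A: 1.50 per game, 0.300 per at-bat
# Player B: 1.40 per game, 0.350 per at-bat
```

Ranked by hits per game, Player A looks like the better hitter largely because he came to the plate more often; per time at bat, Player B reached first on a clean hit more frequently, which is exactly the distortion Dobson’s method removes.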


[i] The Chicago Cubs, who trace their lineage back to the Chicago White Stockings formed in 1870, are the lone surviving member of the NABBP. The Great Chicago Fire in 1871 destroyed all of their equipment and their new stadium, the Union Base-Ball Grounds, only a few months after it opened, keeping them out of competition for two years. If not for the fire, the Cubs would be the oldest continuously operating franchise in American sports. That honor instead goes to the Atlanta Braves, who were founding members of the National Association of Professional Base Ball Players (NAPBBP) in 1871 as the Boston Red Stockings.

[ii] Though the game was described as the “first regular match of Base Ball played this season,” it did not abide by the rules set forth in the Convention of 1857 that occurred just a few months prior. Rather, the teams appear to have been playing under the 1854 rules agreed to by the Knickerbockers, Gothams, and Eagles where the winner was the first to score 21 runs.

[iii] The earliest known code of cricket laws was formalized in 1744 in London, England, and brought to America in 1754 by Benjamin Franklin, 91 years before William R. Wheaton and William H. Tucker drafted the Rules and Regulations of the Knickerbocker Base Ball Club, the first set of baseball rules officially adopted by a club. Years later, Wheaton claimed to have written rules for the Gotham Base Ball Club in 1837, on which the Knickerbocker rules were based, but no copy of those rules is known to exist. Early forms of cricket and baseball were played well before their rules were formally adopted, and trying to put a start date on either game before the formal inception of its rules is effectively impossible.

[iv] There is an oft-cited article written by H. H. Westlake in the March 1925 issue of Baseball Magazine, titled “First Baseball Box Score Ever Published,” in which Westlake claims that Chadwick invented the modern box score, one that included runs, hits, put outs, assists, and errors, in a “summer issue” of the New York Clipper in 1859. However, the box score provided by Westlake doesn’t actually exist, at least not in the Clipper. Comparing the Westlake box score side-by-side with a box score printed in the 10 September 1859 issue of the Clipper makes this clear: while the players are listed in the same order, and the run totals are identical (and the total put outs nearly so), the other statistics are completely imaginary.

[v] This club, featuring the renowned Harry Wright, became the first professional club in the following season, 1869, when the NABBP began to allow professionalism.

[vi] The NAPBBP is more commonly known today as, simply, the National Association (NA). However, before the NAPBBP formed, the common name for the NABBP was also the National Association. It seems somewhat disingenuous after the fact to call the later league the National Association, but I suppose it’s easier than saying all those letters.

[vii] I immediately take this back, but only on a technicality. “Hits per at-bat” is the modern form of batting average, but at-bats as defined by Dobson are not the same as what we use today. Dobson defined times at bat as the number of times a batter makes an “out, a run, or is left on his base.” In the decades after the article was published, “times at bat” began to exclude certain events. Notably, walks were excluded beginning in 1877 (with a brief reappearance in 1887, when they were counted the same as hits), times hit by a pitch were excluded in 1887, sacrifice bunts in 1894, catcher’s interference in 1907, and sacrifice flies in 1908 (though sacrifice flies went in and out of the rules multiple times over the next few decades, and weren’t firmly excluded until 1954).
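To see how much the denominator matters, here is a small Python sketch contrasting Dobson’s count with a modern-style at-bat. It assumes, as a simplification, that Dobson’s “times at bat” counts every trip to the plate (roughly today’s plate appearances), and it subtracts only the exclusions listed in this note. The season line is hypothetical, and the names `PlateAppearances` and `batting_average` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearances:
    total: int                      # every trip to the plate
    walks: int = 0                  # excluded from 1877 (counted as hits in 1887)
    hit_by_pitch: int = 0           # excluded from 1887
    sacrifice_bunts: int = 0        # excluded from 1894
    catchers_interference: int = 0  # excluded from 1907
    sacrifice_flies: int = 0        # excluded on and off from 1908, firmly from 1954

    def dobson_times_at_bat(self) -> int:
        # Assumption: Dobson's 1871 count included every trip to the plate
        return self.total

    def modern_at_bats(self) -> int:
        # Remove only the event types listed in this note
        return (self.total - self.walks - self.hit_by_pitch
                - self.sacrifice_bunts - self.catchers_interference
                - self.sacrifice_flies)


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat; returns 0.0 when there are no at-bats."""
    return hits / at_bats if at_bats else 0.0


# Hypothetical season: 150 hits in 600 trips to the plate
season = PlateAppearances(total=600, walks=50, hit_by_pitch=5,
                          sacrifice_bunts=5, sacrifice_flies=5)
hits = 150
print(f"Dobson: {batting_average(hits, season.dobson_times_at_bat()):.3f}")  # Dobson: 0.250
print(f"Modern: {batting_average(hits, season.modern_at_bats()):.3f}")       # Modern: 0.280
```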