Archive for Research

Basic Machine Learning With R (Part 3)

Previous parts in this series: Part 1 | Part 2

If you’ve read the first two parts of this series, you already know how to do some pretty cool machine-learning stuff, but there’s still a lot to learn. Today, we will be updating this nearly seven-year-old chart featured on Tom Tango’s website. We haven’t done anything with Statcast data yet, so that will be cool. More importantly, though, this will present us with a good opportunity to work with an imperfect data set. My motto is “machine learning is easy — getting the data is hard,” and this exercise will prove it. As always, the code presented here is on my GitHub.

The goal today is to take exit velocity and launch angle, and then predict the batted-ball type from those two features. Hopefully by now you can recognize that this is a classification problem. The question becomes, where do we get the data we need to solve it? Let’s head over to the invaluable Statcast search at Baseball Savant to take care of this. We want to restrict ourselves to just balls in play, and to simplify things, let’s just take 2016 data. You can download the data from Baseball Savant in CSV format, but if you ask it for too much data, it won’t let you. I recommend taking the data a month at a time, like in this example page. You’ll want to scroll down and click the little icon in the top right of the results to download your CSV.



Go ahead and do that for every month of the 2016 season and put all the resulting CSVs in the same folder (I called mine statcast_data). Once that’s done, we can begin processing it.

Let’s load the data into R using a trick I found online (Google is your friend when it comes to learning a new programming language — or even using one you’re already pretty good at!).

filenames <- list.files(path = "statcast_data", full.names=TRUE)
data_raw <- do.call("rbind", lapply(filenames, read.csv, header = TRUE))

The columns we want here are “hit_speed”, “hit_angle”, and “events”, so let’s create a new data frame with only those columns and take a look at it.

data <- data_raw[,c("hit_speed","hit_angle","events")]
str(data)

 

'data.frame':	127325 obs. of  3 variables:
 $ hit_speed: Factor w/ 883 levels "100.0","100.1",..: 787 11 643 ...
 $ hit_angle: Factor w/ 12868 levels "-0.01               ",..: 7766 1975 5158  ...
 $ events   : Factor w/ 25 levels "Batter Interference",..: 17 8 11 ...

Well, it had to happen eventually. See how all of these columns are listed as “Factor” even though some of them are clearly numeric? Let’s convert those columns to numeric values.

data$hit_speed <- as.numeric(as.character(data$hit_speed))
data$hit_angle <- as.numeric(as.character(data$hit_angle))
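To see why the intermediate as.character() call matters, here is a minimal sketch on a made-up factor (the values are illustrative and not taken from the Statcast files). Calling as.numeric() directly on a factor returns the internal level codes rather than the numbers shown on screen.

```r
# Toy factor standing in for a column such as hit_speed (values are made up)
speeds <- factor(c("100.0", "95.3", "88.1", "null"))

# Direct conversion returns the integer level codes, not the measurements
codes <- as.numeric(speeds)

# Going through character first recovers the real numbers; entries that
# are not numeric (here "null") become NA, with a warning we silence
values <- suppressWarnings(as.numeric(as.character(speeds)))

print(codes)
print(values)
```

The NA produced for the non-numeric entry is the kind of missing value that a later na.omit() call would drop.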

There is also some missing data in this data set. There are several ways to deal with such issues, but we’re simply going to remove any rows with missing data.

data <- na.omit(data)

Let’s next take a look at the data in the “events” column, to see what we’re dealing with there.

unique(data$events)

 

 [1] Field Error         Flyout              Single             
 [4] Pop Out             Groundout           Double Play        
 [7] Lineout             Home Run            Double             
[10] Forceout            Grounded Into DP    Sac Fly            
[13] Triple              Fielders Choice Out Fielders Choice    
[16] Bunt Groundout      Sac Bunt            Sac Fly DP         
[19] Triple Play         Fan interference    Bunt Pop Out       
[22] Batter Interference
25 Levels: Batter Interference Bunt Groundout ... Sacrifice Bunt DP

The original classification from Tango’s site had only five levels (POP, GB, FLY, LD, HR), but we’ve got over 20. We’ll have to (a) restrict to rows that look like something we can classify and (b) convert them to the levels we’re after. Thanks to another tip I got from Googling, we can do it like this:

library(plyr)
# Step (b): map the raw event names onto the five target classes
data$events <- revalue(data$events, c("Pop Out"="Pop",
      "Bunt Pop Out"="Pop","Flyout"="Fly","Sac Fly"="Fly",
      "Bunt Groundout"="GB","Groundout"="GB","Grounded Into DP"="GB",
      "Lineout"="Liner","Home Run"="HR"))
# Step (a): keep only the rows that now carry one of those five classes
data <- data[data$events %in% c("Pop","Fly","GB","Liner","HR"),]
# Take another look to be sure
unique(data$events)
# The data looks good except there are too many levels.  Let's re-factor
data$events <- factor(data$events)
# Re-index to be sure
rownames(data) <- NULL
# Make 100% sure!
str(data)
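The relabel-and-restrict idea can also be sketched in base R with a named lookup vector, which avoids the plyr dependency. This is an alternative illustration on a handful of made-up events, not the code used for the article; the names `lookup`, `mapped` and `classes` are hypothetical.

```r
# A few made-up raw events, including ones we cannot classify
events <- c("Flyout", "Single", "Pop Out", "Groundout",
            "Home Run", "Lineout", "Double")

# Named vector: raw event name -> target class (same mapping as the text)
lookup <- c("Pop Out" = "Pop", "Bunt Pop Out" = "Pop",
            "Flyout" = "Fly", "Sac Fly" = "Fly",
            "Bunt Groundout" = "GB", "Groundout" = "GB",
            "Grounded Into DP" = "GB",
            "Lineout" = "Liner", "Home Run" = "HR")

# Events missing from the lookup come back as NA
mapped <- unname(lookup[events])

# Keep only the classifiable events and build a clean factor
keep <- !is.na(mapped)
classes <- factor(mapped[keep])

print(table(classes))
```

Because the factor is built after filtering, it has no leftover unused levels and needs no re-factoring step.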

Oof! See how much work that was? We’re several dozen lines of code into this problem and we haven’t even started the machine learning yet! But that’s fine; the machine learning itself is the easy part. Let’s do that now.

library(caret)
inTrain <- createDataPartition(data$events,p=0.7,list=FALSE)
training <- data[inTrain,]
testing <- data[-inTrain,]

method <- 'rf' # sure, random forest again, why not
# train the model
ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 5)
modelFit <- train(events ~ ., method=method, data=training, trControl=ctrl)

# Run the model on the test set
predicted <- predict(modelFit,newdata=testing)
# Check out the confusion matrix
confusionMatrix(predicted, testing$events)

 

Prediction   GB  Pop  Fly   HR Liner
     GB    9059    5    4    1   244
     Pop      3 1156  123    0    20
     Fly      6  152 5166  367   457
     HR       0    0  360 1182    85
     Liner  230   13  449   77  2299
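As a rough check on "pretty good", the matrix above can be re-entered by hand to compute overall accuracy and per-class recall. This is a sketch that assumes caret's usual layout, with predictions in rows and true classes in columns.

```r
classes <- c("GB", "Pop", "Fly", "HR", "Liner")

# Counts copied from the confusion matrix printed above
cm <- matrix(c(9059,    5,    4,    1,  244,
                  3, 1156,  123,    0,   20,
                  6,  152, 5166,  367,  457,
                  0,    0,  360, 1182,   85,
                230,   13,  449,   77, 2299),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

# Overall accuracy: correct predictions (the diagonal) over all test rows
accuracy <- sum(diag(cm)) / sum(cm)

# Recall per class: correct predictions over the true count of that class
recall <- diag(cm) / colSums(cm)

print(round(accuracy, 3))
print(round(recall, 3))
```

Overall accuracy comes out at roughly 88%. Ground balls are recalled most reliably, while line drives are the class most often confused with flies and grounders.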

We did it! And the confusion matrix looks pretty good. All we need to do now is view it, and we can make a very pretty visualization of this data with the amazing Plotly package for R:

#install.packages('plotly')
library(plotly)
# Exit velocities from 40 to 120
x <- seq(40,120,by=1)
# Hit angles from 10 to 50
y <- seq(10,50,by=1)
# Make a data frame of the relevant x and y values
plotDF <- data.frame(expand.grid(x,y))
# Add the correct column names
colnames(plotDF) <- c('hit_speed','hit_angle')
# Add the classification
plotPredictions <- predict(modelFit,newdata=plotDF)
plotDF$pred <- plotPredictions

p <- plot_ly(data=plotDF, x=~hit_speed, y = ~hit_angle, color=~pred, type="scatter", mode="markers") %>%
    layout(title = "Exit Velocity + Launch Angle = WIN")
p



Awesome! It’s a *little* noisy, but overall not too bad. And it does kinda look like the original, which is reassuring.

That’s it! That’s all I have to say about machine learning. At this point, Google is your friend if you want to learn more. There are also some great classes online you can try, if you’re especially motivated. Enjoy, and I look forward to seeing what you can do with this!


Hardball Retrospective – What Might Have Been – The “Original” 1999 White Sox

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 1999 Chicago White Sox 

OWAR: 45.1     OWS: 289     OPW%: .504     (82-80)

AWAR: 28.5      AWS: 225     APW%: .466     (75-86)

WARdiff: 16.6                        WSdiff: 64  

The “Original” 1999 White Sox tied the Royals for second place in the American League Central, eight games behind the Indians. Robin Ventura (.301/32/120) established career-highs in batting average and RBI while earning his sixth Gold Glove Award at the hot corner. Randy Velarde (.317/16/76) rapped 200 base knocks and set personal-bests in almost every offensive category. Mike Cameron drilled 34 doubles and pilfered 38 bags. Harold Baines (.312/25/103) topped the century mark in RBI for the third time in his career during his age-40 season. Ray Durham registered 109 tallies and swiped 34 bags. Magglio Ordonez (.301/30/117) scored 100 runs and merited his first All-Star invitation. Frank E. Thomas clubbed 36 two-baggers and delivered a .305 BA. Chris Singleton (.300/17/72) placed sixth in the AL Rookie of the Year balloting and Paul Konerko contributed 24 dingers and 81 ribbies for the “Actuals”.

Frank E. Thomas rated tenth among first basemen according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” White Sox chronicled in the “NBJHBA” top 100 ratings include Robin Ventura (22nd-3B) and Harold Baines (42nd-RF).

Original 1999 White Sox (left) | Actual 1999 White Sox (right)

STARTING LINEUP | POS | OWAR | OWS | STARTING LINEUP | POS | AWAR | AWS
Carlos Lee | LF | -0.04 | 10.36 | Carlos Lee | LF | -0.04 | 10.36
Mike Cameron | CF | 3.63 | 21.44 | Chris Singleton | CF | 2.61 | 16.33
Magglio Ordonez | RF | 1.7 | 18.56 | Magglio Ordonez | RF | 1.7 | 18.56
Harold Baines | DH | 1.7 | 12.96 | Frank E. Thomas | DH | 2.2 | 17.07
Frank E. Thomas | 1B/DH | 2.2 | 17.07 | Paul Konerko | 1B | 1.45 | 14.68
Randy Velarde | 2B | 5.23 | 24.19 | Ray Durham | 2B | 3.63 | 20.45
Liu Rodriguez | SS/2B | -0.12 | 1.41 | Mike Caruso | SS | -2.58 | 4.25
Robin Ventura | 3B | 5.1 | 28.27 | Greg Norton | 3B | 0.06 | 12.36
Mark Johnson | C | 0.28 | 6.12 | Brook Fordyce | C | 1.59 | 11.45
BENCH | POS | OWAR | OWS | BENCH | POS | AWAR | AWS
Ray Durham | 2B | 3.63 | 20.45 | Mark Johnson | C | 0.28 | 6.12
Greg Norton | 3B | 0.06 | 12.36 | Craig Wilson | 3B | -0.38 | 4.06
Olmedo Saenz | 3B | 1.35 | 8.68 | Darrin Jackson | LF | -0.05 | 2.68
Craig Grebeck | 2B | 0.82 | 4.39 | Brian Simmons | LF | -0.15 | 1.76
Craig Wilson | 3B | -0.38 | 4.06 | Liu Rodriguez | 2B | -0.12 | 1.41
Brian Simmons | LF | -0.15 | 1.76 | Jeff Liefer | 1B | -0.6 | 0.91
Jeff Liefer | 1B | -0.6 | 0.91 | McKay Christensen | CF | -0.27 | 0.47
Norberto Martin | 2B | 0.09 | 0.44 | Jason Dellaero | SS | -0.39 | 0.32
Jason Dellaero | SS | -0.39 | 0.32 | Josh Paul | C | -0.09 | 0.27
Josh Paul | C | -0.09 | 0.27 | Jeff Abbott | LF | -0.73 | 0.18
Robert Machado | C | -0.08 | 0.22 | | | |
Chris Tremie | C | -0.18 | 0.18 | | | |
Jeff Abbott | LF | -0.73 | 0.18 | | | |
Frank Menechino | SS | -0.08 | 0.14 | | | |
John Cangelosi | LF | -0.06 | 0.02 | | | |

Mike Sirotka (11-13, 4.00) and James Baldwin (12-13, 5.00) labored through their second seasons in the Sox rotation. Alex Fernandez supplied a 7-8 record with a 3.38 ERA after missing the entire 1998 campaign due to injury. Bob Wickman notched 37 saves with an ERA of 3.39 for the “Originals” while Keith Foulke (2.22, 9 SV) and Bob Howry (3.59, 28 SV) secured late-inning leads for the “Actuals”.

Original 1999 White Sox (left) | Actual 1999 White Sox (right)

ROTATION | POS | OWAR | OWS | ROTATION | POS | AWAR | AWS
Mike Sirotka | SP | 3.94 | 13.5 | Mike Sirotka | SP | 3.94 | 13.5
Alex Fernandez | SP | 3.34 | 10.47 | James Baldwin | SP | 2.19 | 9.47
James Baldwin | SP | 2.19 | 9.47 | Jim Parque | SP | 1.26 | 6.82
Brian Boehringer | SP | 1.64 | 6.91 | Kip Wells | SP | 0.79 | 2.93
Jim Parque | SP | 1.26 | 6.82 | Jaime Navarro | SP | -1.15 | 2.16
BULLPEN | POS | OWAR | OWS | BULLPEN | POS | AWAR | AWS
Bob Wickman | RP | 1.33 | 10.19 | Keith Foulke | RP | 3.86 | 16.7
Al Levine | RP | 0.77 | 6.84 | Bob Howry | RP | 0.61 | 10.06
Pedro Borbon | RP | 0.36 | 4.11 | Sean Lowe | RP | 1.58 | 7.94
Buddy Groom | RP | -0.27 | 3.49 | Bill Simas | RP | 0.68 | 6.46
Steve Schrenk | RP | 0.54 | 3.04 | Carlos Castillo | SW | 0.05 | 1.45
Kip Wells | SP | 0.79 | 2.93 | John Snyder | SP | -0.97 | 1.22
Scott Radinsky | RP | 0 | 2.35 | Tanyon Sturtze | SP | 0.48 | 0.91
Jason Bere | SP | -0.6 | 1.6 | Pat Daneker | SP | 0.23 | 0.82
Carlos Castillo | SW | 0.05 | 1.45 | Jesus Pena | RP | -0.27 | 0.42
Pat Daneker | SP | 0.23 | 0.82 | Joe Davenport | RP | 0.13 | 0.25
Aaron Myette | SP | 0 | 0.11 | Aaron Myette | SP | 0 | 0.11
Chad Bradford | RP | -0.5 | 0 | Bryan Ward | RP | -1.15 | 0.09
John Hudek | RP | -1.04 | 0 | Chad Bradford | RP | -0.5 | 0
David Lundquist | RP | -0.74 | 0 | Scott Eyre | RP | -0.66 | 0
Jack McDowell | SP | -0.36 | 0 | David Lundquist | RP | -0.74 | 0
Nerio Rodriguez | RP | -0.16 | 0 | Todd Rizzo | RP | -0.11 | 0

 

Notable Transactions

Robin Ventura 

October 23, 1998: Granted Free Agency.

December 1, 1998: Signed as a Free Agent with the New York Mets. 

Randy Velarde

January 5, 1987: Traded by the Chicago White Sox with Pete Filson to the New York Yankees for Mike Soper (minors) and Scott Nielsen.

December 23, 1994: Granted Free Agency.

April 12, 1995: Signed as a Free Agent with the New York Yankees.

November 2, 1995: Granted Free Agency.

November 21, 1995: Signed as a Free Agent with the California Angels.

October 23, 1998: Granted Free Agency.

December 7, 1998: Signed as a Free Agent with the Anaheim Angels.

Mike Cameron

November 11, 1998: Traded by the Chicago White Sox to the Cincinnati Reds for Paul Konerko. 

Harold Baines

July 29, 1989: Traded by the Chicago White Sox with Fred Manrique to the Texas Rangers for Wilson Alvarez, Scott Fletcher and Sammy Sosa.

August 29, 1990: Traded by the Texas Rangers to the Oakland Athletics for players to be named later. The Oakland Athletics sent Joe Bitker (September 4, 1990) and Scott Chiamparino (September 4, 1990) to the Texas Rangers to complete the trade.

January 14, 1993: Traded by the Oakland Athletics to the Baltimore Orioles for Allen Plaster (minors) and Bobby Chouinard.

November 1, 1993: Granted Free Agency.

December 2, 1993: Signed as a Free Agent with the Baltimore Orioles.

October 20, 1994: Granted Free Agency.

December 23, 1994: Signed as a Free Agent with the Baltimore Orioles.

November 6, 1995: Granted Free Agency.

December 11, 1995: Signed as a Free Agent with the Chicago White Sox.

November 18, 1996: Granted Free Agency.

January 10, 1997: Signed as a Free Agent with the Chicago White Sox.

July 29, 1997: Traded by the Chicago White Sox to the Baltimore Orioles for a player to be named later. The Baltimore Orioles sent Juan Bautista (minors) (August 18, 1997) to the Chicago White Sox to complete the trade.

October 29, 1997: Granted Free Agency.

December 19, 1997: Signed as a Free Agent with the Baltimore Orioles.

Alex Fernandez 

December 7, 1996: Granted Free Agency.

December 9, 1996: Signed as a Free Agent with the Florida Marlins. 

Bob Wickman 

January 10, 1992: Traded by the Chicago White Sox with Domingo Jean and Melido Perez to the New York Yankees for Steve Sax.

August 23, 1996: Traded by the New York Yankees with Gerald Williams to the Milwaukee Brewers for a player to be named later, Pat Listach and Graeme Lloyd. The Milwaukee Brewers sent Ricky Bones (August 29, 1996) to the New York Yankees to complete the trade. Pat Listach returned to original team on October 2, 1996.

Honorable Mention

The 1932 Chicago White Sox 

OWAR: 21.5     OWS: 205     OPW%: .380     (58-96)

AWAR: 17.0      AWS: 147     APW%: .325     (49-102)

WARdiff: 4.5                        WSdiff: 58  

The cellar-dwelling “Original” 1932 White Sox fared better than their “Actual” counterparts in terms of team WAR, Win Shares and winning percentage. Although the “Actuals” recorded only 49 victories, the team finished in seventh place ahead of the miserable Red Sox (43-111). Willie Kamm clubbed 34 doubles, delivered a .286 BA and drove in 83 baserunners for the Pale Hose. Second-sacker Bill Cissell posted career-bests in batting average (.315), runs (85), hits (184), doubles (36), home runs (7) and RBI (98). Rookie right fielder Bruce Campbell (.286/14/87) contributed 36 two-baggers and 11 three-base hits. Smead “Smudge” Jolley (.312/18/106) drilled 30 doubles while outfield mate Carl Reynolds produced a .305 BA. Luke Appling aka “Old Aches and Pains” rewarded the Chicago brass with 20 two-base hits and 10 triples after achieving full-time status. Ted Lyons completed 19 of 26 starts and furnished an ERA of 3.28.

On Deck

What Might Have Been – The “Original” 2001 Rangers

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

 


MLB to Across the Pacific and Back

The player that all Milwaukee Brewers fans, and baseball fans for that matter, should be watching most closely this spring is Eric Thames. Thames, after three incredible seasons in the KBO, signed a three-year, $16-million deal to man first base for the Brewers. The front office likes what they see from the 2015 KBO MVP, but admittedly did not scout him in person while he was playing overseas; instead, they relied on video to make their assessment of his game. I’ll admit, I can’t wait to see Thames play this year; the mystery, concerns, and potential all make for great theater, but there is one question that keeps haunting me at night: How do former MLB players fare when they play overseas and then return? As much as this post is about Thames, it is also about those few players who have done what he is doing.

I approached this by looking at all the major-league players who have played in either Korea or Japan over the past 10 years. I could have gone further back to the days when Cecil Fielder was playing in Japan, but the game, both in North America and across the Pacific, has changed significantly since then. The argument could be made that the game has changed significantly over the past 10 years (it changes every season), but that is the beauty of baseball.

I wanted to isolate Korea only, but, perhaps not surprisingly, there were too few players to make anything of that. Out of the several hundred total players in both these leagues over the past 10 years, only a total of 11 players who began their career in MLB returned to MLB after an overseas hiatus. That’s 11 between the KBO AND NPB. 11! Four players from the KBO and seven from NPB. Here’s a table that shows their names and WAR before and after their careers in Japan and Korea:

Player | Pre WAR | MLB Season(s) Pre | Post WAR | MLB Season(s) Post
Joey Butler | 0 | 2013-2014 | 0.5 | 2015
Brooks Conrad | -0.1 | 2008-2012 | -0.5 | 2014
Lew Ford | 8.4 | 2003-2007 | 0 | 2012
Andy Green | -1.2 | 2004-2006 | 0 | 2009
Dan Johnson | 4.0 | 2005-2008 | -0.8 | 2010-2015
Casey McGehee | 1.6 | 2008-2012 | -0.4 | 2014-2016
Kevin Mench | 5.8 | 2002-2008 | -0.4 | 2010
Brad Snyder | -0.1 | 2010-2011 | 0.1 | 2014
Chad Tracy | 5.7 | 2004-2010 | -0.3 | 2012-2013
Wilson Valdez | 0.7 | 2004-2005, 2007 | -1.1 | 2009-2012
Matt Watson | -0.5 | 2003, 2005 | 0.1 | 2010
Total WAR | 24.3 | | -2.8 |
Eric Thames | -0.6 | 2011-2012 | ? | 2017-?

(Numbers courtesy of baseball-reference.com)
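The totals in the table can be reproduced with a short sketch. The WAR figures are copied from the table above; the data-frame and column names are only illustrative.

```r
# Pre- and post-hiatus WAR for the 11 returnees, copied from the table
war <- data.frame(
  player = c("Joey Butler", "Brooks Conrad", "Lew Ford", "Andy Green",
             "Dan Johnson", "Casey McGehee", "Kevin Mench", "Brad Snyder",
             "Chad Tracy", "Wilson Valdez", "Matt Watson"),
  pre  = c(0, -0.1, 8.4, -1.2, 4.0, 1.6, 5.8, -0.1, 5.7, 0.7, -0.5),
  post = c(0.5, -0.5, 0, 0, -0.8, -0.4, -0.4, 0.1, -0.3, -1.1, 0.1),
  stringsAsFactors = FALSE
)

# Column totals, rounded to one decimal to avoid floating-point noise
totals <- round(colSums(war[, c("pre", "post")]), 1)

# Players who were above replacement level after returning
positive_post <- war$player[war$post > 0]

print(totals)
print(positive_post)
```

Only three of the 11 posted a positive WAR after coming back, and none exceeded 0.5.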

The outcome for these players is, well, not good. A select few players like Lew Ford and Chad Tracy carry the “pre-Japan/Korea WAR” section thanks to longer, successful careers in MLB before they changed leagues. It also seems unfair to compare these players to each other given how different their careers, or lack thereof, were upon their return. For example, Ford’s 79 plate appearances are not comparable to Wilson Valdez’s 966. But in every case, the story arc is the same: begin a professional baseball career in North America, make it to the majors as a 20-something, decline at the major- and minor-league level, go to Japan/Korea, return to North America in a very limited capacity and fail to make an impact with a major-league-affiliated team.

If the careers of these 11 players are a trend, then Eric Thames is in for a lot of trouble.

But there is reason to believe that Thames is the exception to the rule. Will Franta wrote a convincing Community Research article about the reason to believe that Eric Thames will do well. Additionally, various projections believe that Thames could be anywhere from a 1.2 to 2.2 WAR player with mid- to high-20 home-run totals and an above-average wRC+. Dave Cameron wrote an article analyzing the projections for Thames and concluded that he has the potential to be “the steal of the winter,” and for three years and $16 million, that could very well be true.

But there are factors going against Thames. It isn’t often that professional players find their footing at the major-league level in their 30s (Thames will be 30 on Opening Day). Plus, the Brewers have several other corner infielders in Hernan Perez, Travis Shaw and Jesus Aguilar, along with others who could fill in at first if need be, such as Ryan Braun and Scooter Gennett, so a team in the middle of a rebuild might not be completely opposed to disposing of the incumbent starting first baseman if another star emerges. Even comparing career KBO and NPB players and their transitions to MLB, we can see that there are a lot more Tsuyoshi Nishiokas than Jung-ho Kangs, which is why players like Kang, Ichiro Suzuki, Hideo Nomo, and Yu Darvish are lauded when they succeed in the majors.

I believe that Eric Thames will not be like the 11 others who, by and large, failed in their returns. Thames is intriguing and there is a lot to like about him — and a lot to worry about with him. There are pros and cons to his game. I believe that he will be a great addition to a team that, honestly, could afford to wait for him to assimilate completely to the game.


Adjusting Appearance Data for Base-Out State

So far, we’ve developed some mathematical principles for visualizing appearance data for relief pitchers, and for measuring how far apart they are. The goal has been to say something about how pitchers are being used, not only in a vacuum, but in the context of the way in which the team has chosen to divide up its relief innings for the season. We’ve only partially gotten there so far, but today let’s take a slight detour to ask: Is the underlying data conveying the most useful information?

Inning and score differential at the time of entering the game are the critical data elements in answering questions related to usage. The numbers and tables in my previous articles all focused on using these two elements. Here’s an example of the underlying data being used, in the form of three Daniel Hudson appearances which appear identical.

Three (Similar?) Daniel Hudson Appearances
Date Player Season Inning Score
6/28/2016 Daniel Hudson 2016 8 1
8/20/2016 Daniel Hudson 2016 8 1
9/21/2016 Daniel Hudson 2016 8 1

Inning and score differential are critical; however, as far as data elements are concerned, they are somewhat raw. Fortunately, those aren’t the only data elements we can look at. The next-most impactful data, I would argue, is the base-out state at the time that the pitcher enters the game.

Let’s establish a baseline: It’s the norm for relief pitchers to enter the game in a clean inning (no outs, no runners on base). Among pitchers with 20+ relief appearances in 2016, this was the situation in 68.1% of appearances. That’s a very high percentage, considering that there are 24 base-out states. It’s also very intuitive when we think about the game. Among other reasons, pitchers need time to warm up, and mostly, they do so while their own team is batting. It’s also the only base-out state which is guaranteed to happen every inning.

It would be atypical – and therefore, interesting – for a pitcher to be used frequently in other base-out states. Moreover, we should be giving credit to pitchers who are being used in that way. An appearance where a pitcher enters with a four-run lead but the bases loaded should not be viewed in the same way as an appearance where a pitcher enters with a four-run lead in a clean inning. More than likely, the manager has a different pitcher in mind for each of these scenarios.

Adjusting the inning is easy: Credit partial innings in the event that the pitcher enters with more than zero outs in the inning. This will bump the inning component of every pitcher’s “center of gravity” up a bit, giving credit to players for working slightly later in the game when called upon mid-inning. (Note: we could also define terms in a different way, and say that a pitcher who enters in a “clean” 9th inning is actually entering at inning 8.0, as 8 innings have been recorded prior to his entrance; however, this makes the resulting metric less intuitive.)

Adjusting the score differential doesn’t seem as straightforward at first, but fortunately, we can use the concept of RE24 to accomplish this. Given that entering in a clean inning is the default status, we will make no adjustment to the score differential for a given appearance if the pitcher entered in a clean inning. For any other base-out state, we will add or subtract the difference between expected runs in that base-out state and expected runs in a clean inning state (0 on, 0 out).

Let’s return to the three appearances shown above. As you might have guessed by now, they are not identical. Rather, they illustrate the importance of adjusting for base-out state.

Three Daniel Hudson Appearances (in greater detail)
Date Player Inning Score Outs Bases Adj. Inn. Adj. Score
6/28/2016 Daniel Hudson 8 1 0 ___ 8.00 1.00
8/20/2016 Daniel Hudson 8 1 0 123 8.00 -0.82
9/21/2016 Daniel Hudson 8 1 2 _2_ 8.67 1.16
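The two adjustments can be sketched as a small function. This is one reading of the rules described above, not the author's actual code: the inning is credited with outs/3, and the score differential is reduced by the extra runs the opponent is expected to score relative to a clean inning. The run-expectancy values are the ones quoted in this article (0.481 for a clean inning, 0.319 for a runner on second with two outs); the function and variable names are hypothetical.

```r
# Run expectancy for the states used here (values quoted in the article)
re_clean        <- 0.481  # bases empty, 0 out
re_2out_runner2 <- 0.319  # runner on second, 2 out

# Adjust one appearance for the base-out state at entry
adjust_appearance <- function(inning, score, outs, re_state, re_clean) {
  c(adj_inning = inning + outs / 3,              # credit partial innings
    adj_score  = score - (re_state - re_clean))  # shift lead by extra expected runs
}

# 6/28/2016: clean 8th inning, one-run lead
clean_entry <- adjust_appearance(8, 1, 0, re_clean, re_clean)

# 9/21/2016: two outs, runner on second, one-run lead
two_out_entry <- adjust_appearance(8, 1, 2, re_2out_runner2, re_clean)

print(round(clean_entry, 2))
print(round(two_out_entry, 2))
```

These reproduce the 8.00/1.00 and 8.67/1.16 rows of the table. Applying the same arithmetic to the bases-loaded entry with a run expectancy of 2.282 gives roughly -0.80 rather than the table's -0.82, so the run-expectancy values behind the table presumably differ slightly from the ones assumed here.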

If you were to ask Daniel Hudson to recall what he could about these three appearances, he’d probably feel very differently about each of them (if he remembers, anyway). In the first case, he’s coming into a clean 8th inning, protecting a one-run lead. It was a situation he found himself in with some regularity in 2016, prior to assuming the closer’s role.

The second situation is an absolute bear. Jake Barrett has allowed a leadoff single, and poor Steve Hathaway, who shouldn’t be touching this game situation with a 10-foot pole at this point in his career, has subsequently allowed a double and a walk to load the bases. Hudson has been brought in to protect a one-run lead with the bases loaded and nobody out. The opposing team has an expected run value of 2.282. While technically Hudson has been given a lead, it’s one that he would be hard-pressed to keep, even if he does everything right. The reality is that this appearance is associated with an expectation that Arizona will trail by the end of it – as you can see on the play-by-play log, the Padres have a 70.6% win probability at this point. It would be silly to give this appearance the same treatment as the other two. (Hudson, by the way, does a masterful job of escaping this situation without surrendering the lead!)

The third case is the one I want to focus on. Rather than a clean inning, Hudson was asked to get the third out of the 8th inning, with the tying run standing on second base. While the Leverage Index at the time of entry for this appearance is higher (3.50) than in the first instance (2.17), Hudson actually has an easier job: He needs just one out instead of three, and the opposing team is expected to score fewer runs in this situation, all else being equal. In the “clean” 8th inning, he can be expected to give up 0.481 runs, while in the two-out, runner-on-second situation, he can be expected to give up just 0.319 runs. Moreover, the chance of scoring at least one run – presumably the more important question where one-run leads are concerned – is also lower in the “higher leverage” situation. (This doesn’t even account for the batter, Hector Sanchez, who is hardly Wil Myers at the plate, and is probably inferior to the 4-5-6 hitters in the Phillies lineup, as well.)

This brings up an important distinction between leverage and run prevention. Leverage Index, certainly, is an important tool. What it measures, however, is variance in win probability for a single at-bat. Managers rarely have the luxury of giving their pitchers one-batter appearances in the regular season. Even the notoriously fleeting Javier Lopez averaged nearly three batters per appearance in 2016. Managers must therefore determine how to maximize the value of relief appearances as a whole, not just at the time when the reliever is entering the game. Leverage Index shows how much variance can arise from the current plate appearance, but a manager may very well be better served having their best pitcher throw the entirety of the 8th inning, rather than having him get the third out in a situation that commands high leverage but still has relatively low run expectation.

Next time, we’ll look at how base-out state adjustments impacted the raw inning-score matrix data in 2016, to draw conclusions about which relievers were used most often in high-pressure, mid-inning situations, and whether that sort of usage aligns with what we’d expect from an optimal manager.


An Attempt to Quantify Quality At-Bats (Part 2)

In my first article, I created a definition for what I feel constitutes a quality at-bat. I also examined a few test cases [1] and hypothesized different ways in which this data could be used going forward. As a reminder, my definition of a quality at-bat (QAB) is an at-bat that results in at least one of the following:

  1. Hit
  2. Walk
  3. Hit by pitch
  4. Reach on error
  5. Sac bunt
  6. Sac fly
  7. Pitcher throws at least six pitches
  8. Batter “barrels” the ball.

 

To calculate a QAB percentage I divided the player’s total number of QABs by his total number of plate appearances. I then dove a little deeper into QABs to see what conclusions I could draw from this statistic.
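As a minimal sketch of that calculation, the eight criteria can be applied to a plate-appearance log and averaged. The log below is entirely made up, and the column names (`outcome`, `pitches`, `barrel`) and outcome labels are hypothetical stand-ins for whatever the real data source provides.

```r
# Hypothetical plate-appearance log for one hitter (rows are made up)
pa <- data.frame(
  outcome = c("single", "strikeout", "walk", "groundout",
              "flyout", "sac_fly", "lineout", "strikeout"),
  pitches = c(3, 7, 5, 2, 4, 1, 3, 4),
  barrel  = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Criteria 1-6: outcomes that always count as a quality at-bat
qab_outcomes <- c("single", "double", "triple", "home_run", "walk",
                  "hit_by_pitch", "reach_on_error", "sac_bunt", "sac_fly")

# A plate appearance qualifies if it meets at least one criterion:
# a listed outcome, six or more pitches seen, or a barreled ball
pa$qab <- pa$outcome %in% qab_outcomes | pa$pitches >= 6 | pa$barrel

# QAB% = quality at-bats divided by total plate appearances
qab_pct <- 100 * sum(pa$qab) / nrow(pa)

print(qab_pct)
```

In this toy log, five of the eight plate appearances qualify, including a seven-pitch strikeout and a barreled flyout, which shows how the pitch-count and barrel criteria lift QAB% above on-base percentage.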

The first thing I did was compute QAB% for every hitter in 2016 who had more than 400 at-bats and build a leaderboard. The players with the best and worst QAB% are displayed below. The average QAB percentage in 2016 was 48.54%. Not surprisingly, Mike Trout leads all hitters and is followed closely by Joey Votto, a player who always finds a way to get on base. The player who stuck out to me most on this list was Chris Carter. This is a player who had a lot of trouble getting a contract this offseason, despite leading the league in homers. In fact, he had so much trouble that he considered going to Japan before finally signing with the Yankees. However, he had the 10th-highest QAB percentage. Mike Napoli’s QAB% also surprised me because I do not view him as a particularly elite hitter; yet he ranked fourth, between two of baseball’s best hitters.

Players with best QAB% (left) | Players with worst QAB% (right)

Name | QAB % | Name | QAB %
Mike Trout | 64.02% | Josh Harrison | 41.83%
Joey Votto | 63.52% | Rajai Davis | 41.82%
Freddie Freeman | 57.93% | Andrelton Simmons | 41.74%
Mike Napoli | 57.89% | Ryan Zimmerman | 41.67%
Josh Donaldson | 57.71% | Alcides Escobar | 41.40%
Paul Goldschmidt | 57.65% | Jason Heyward | 41.34%
Dexter Fowler | 57.61% | Adeiny Hechavarria | 41.32%
DJ LeMahieu | 57.30% | Jonathan Schoop | 40.49%
David Ortiz | 55.27% | Salvador Perez | 40.22%
Chris Carter | 55.16% | Alexei Ramirez | 38.46%

 

One commenter on my last post pointed out that OBP could be highly correlated with QAB%. They were right. In fact, there is a strong correlation of r² = .82 between OBP and QAB%, which makes sense since they share many of the same parameters. After this finding, I decided to create an interactive scatter plot of OBP and QAB% to see what the data looked like and to see if I could find any interesting patterns. If you interact with the graph, you can see that the five players who sit a little above the rest of the data between .300 and .350 OBP are Chris Carter, Mike Napoli, Michael Saunders, Miguel Sano, and Jayson Werth.

 

Click here for an interactive version

Why does QAB% seem to favor this group of players more than others? By investigating the other parameters in my definition of QABs, I found that these five hitters were taking a lot of pitches. In fact, all five of these hitters were in the top 15 last year in pitches per plate appearance, with Jayson Werth and Mike Napoli ranking first and second, respectively. Additionally, Chris Carter’s score was likely higher because he barreled the eighth-most balls last season. This leads me to believe that QAB% tends to favor hard-hitting, patient sluggers.

Is QAB% another way in which we should be evaluating hitter performance? Probably not. As much as I love seeing Chris Carter on a list with the best players in baseball, this statistic uses an old-school mindset that does not show true value. That being said, it can still be helpful. It is a good way to show which hitters are taking a lot of pitches. It also helps quantify what coaches and broadcasters mean when they say a player had a “good at-bat.” Finally, perhaps you watched a lot of Indians games last season and you couldn’t help but feel like Mike Napoli was the best hitter ever. His QAB% may explain why you feel that way. Mike Napoli is a good hitter, but not nearly as good as former MVP Josh Donaldson, even though the two had a very similar rate of at-bats that a coach would call “quality.” Overall, I think this statistic does a good job of quantifying something that used to be a lot harder to quantify. At the very least, QAB% has given me a reason to be excited about Chris Carter joining the Yankees, my favorite team. Opening day cannot come soon enough.

 

  1. In my first article I made a mistake with my test cases. Barrels, a Statcast statistic, did not start being counted until 2015. I had provided QAB numbers starting in 2014. With the way I wrote my code this actually caused the barrels in 2015 and 2016 not to be counted. I should not have provided 2014 numbers at all, and the numbers for 2015 and 2016 were a little lower than they should have been. All of my calculations have been corrected for this article.

 


WAR and the Relief Pitcher, Part II

Background

Back on 2016-Nov-11 I posted WAR and Eating Innings.

Basically, I was looking at reliever WAR and concluded that giving relievers a different replacement level isn’t quite correct. Inning for inning, a replacement reliever needs to be better than a replacement starter, because eating innings has real value. But the reliever/starter split doesn’t actually capture the ability to eat innings, and I gave several examples where it fails historically.

I don’t have roster-usage numbers and don’t want to penalize a pitcher for sitting on the bench, but outs per appearance makes a nice proxy for the ability to eat innings. A linear formula that attempts to duplicate the current distribution of wins between relievers and starters gives roughly 0.367 win% as the pitcher replacement level (as opposed to the current 0.38 for starters and 0.47 for relievers), and then penalizes the pitcher roughly 1/100th of a win per appearance.

The LOOGY needs to be pretty good against his one guy to make up for that penalty, but for a starter it will make almost no difference.

That’s pretty much the entire article summarized in three paragraphs. By design, this doesn’t change much about 2016 WAR — it will give long relievers a modest boost, and very short relievers (LOOGYs and the like) a very modest penalty, and have an even smaller effect on starters.
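
To make the size of those effects concrete, here is a rough sketch of how the two schemes differ for a given pitcher. The replacement levels (0.38, 0.47, 0.367) and the 0.01-win cost per appearance are the figures quoted above. Converting innings to game-equivalents by dividing by nine is an assumption made for this illustration, not something taken from the FanGraphs calculation, and the three season lines are hypothetical.

```python
# Replacement levels and per-appearance cost as quoted in the text.
CURRENT_REPLACEMENT = {"starter": 0.38, "reliever": 0.47}
PROPOSED_REPLACEMENT = 0.367
APPEARANCE_COST = 0.01  # wins charged per appearance


def war_change(role, innings, appearances):
    """Approximate change in WAR when moving from the current scheme to the
    proposed one. Innings are plain decimals here (80.0, not 80.1 notation)."""
    game_equivalents = innings / 9.0  # ASSUMPTION: nine innings = one game
    gained = (CURRENT_REPLACEMENT[role] - PROPOSED_REPLACEMENT) * game_equivalents
    return gained - APPEARANCE_COST * appearances


# Hypothetical season lines, chosen only to show the direction of the effect.
starter = war_change("starter", innings=200, appearances=32)
long_reliever = war_change("reliever", innings=80, appearances=40)
loogy = war_change("reliever", innings=40, appearances=70)

print(f"starter:       {starter:+.2f} WAR")
print(f"long reliever: {long_reliever:+.2f} WAR")
print(f"LOOGY:         {loogy:+.2f} WAR")
```

Under these assumptions the starter barely moves, the long reliever picks up about half a win, and the LOOGY loses about a quarter of a win, which is the pattern described above.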

So why did I bother?

Well, first, there are historical cases where it does matter; but more to the point, I was thinking that relievers are being undervalued by current WAR, and to examine this I needed a method to evaluate a reliever’s value compared to a starter’s value, and different replacement levels complicate that.

Why Do I Think Relievers Are Undervalued?

You could just go to this and read it; it shows that MLB general managers thought relievers were undervalued as of a few years ago. But that’s not what convinced me. What convinces me is the 2016 Reds pitching staff. 32 men pitched at least once for the Cincinnati Reds in 2016. Their total net WAR was negative.

Given that the Reds did spend resources (money and draft picks) on pitching, if replacement level is freely available, then that net negative WAR is either spectacularly bad luck, or spectacularly bad talent evaluation.

32 Reds pitchers were used; sort by innings pitched, and the top seven are all positive WAR, accounting for 5.6 of the Reds’ total of 6.7 positive WAR. Of their other 25 pitchers, only three had positive WAR: Michael Lorenzen (reliever, 50 innings, part of the Reds’ closer plans for the coming year), Homer Bailey (starter, coming off Tommy John and then injured again, only six appearances), and Daniel Wright (traded away mid-season, after which he turned back into a pumpkin and accumulated negative WAR for the season).

It sure sounds like the Reds coaches knew who their best pitchers were and used them. Their talent evaluation was not spectacularly bad. But they had 17 relievers with fewer than 50 innings, and not one of them managed to accumulate positive WAR for the year.

Based on results, we can list the possible mistakes in who they gave innings to: Maybe they could have used Lorenzen a bit more. That’s it; otherwise it’s hard to improve on who they gave the innings to. They also usually gave the high-leverage innings to their best relievers.

So, if replacement level is freely available, why did the Reds coaches give a total of 574.2 innings to 22 pitchers who managed between them to accumulate no positive WAR and 7.1 negative WAR?

If that’s just bad luck, it is spectacularly bad luck; and spectacularly consistent, as the Reds seem to have known in advance exactly who was going to have all this bad luck.

I don’t really believe it is bad luck. Thus, I don’t really believe that the Reds pitchers were below replacement, and the alternative is that replacement (at least for relievers) is too high.

GMs Still Agree: Relievers Are Undervalued by WAR

The article I referenced above was from the 2011-2012 off season; maybe something has changed.

As I write this (2017-Feb-24), FanGraphs’ Free Agent Tracker shows 112 free agents signed over the 2016-2017 off-season. 10 got qualifying offers and thus aren’t truly representative of their free-market value. 22 have no 2017 projection listed, and most of those went for minor-league deals (Sean Rodriguez and Peter Bourjos are the exceptions, and they aren’t pitchers). I’m going to throw those 32 out.

That leaves a sample of 80 players, 28 of them relievers or SP/RP. A fairly simple-minded chart is below:

(Hmm, no chart. There was supposed to be a chart. Don’t see an option that will change this. Relief pitcher Average $/Year = 5.7105 * projected 2017 WAR with an R² of 0.585; everyone else Average $/Year = 4.6028 + 1.401 * projected 2017 WAR with an R² of 0.5917. Note that the “everyone else” line, if you could see it, is below the relief pitcher line at 0 WAR, and then slopes up faster from there.)

R² values aren’t great, and overall values per WAR are low because most of the big paydays are on multiyear contracts where value can be assumed likely to collapse by the end of the contract (I’m not including any fall-off). But the trend continues: MLB general managers think relievers are worth more than FanGraphs thinks they are.

The formula I give above (replacement of 0.367 win% with a −0.01 wins/appearance) is based on trying to reproduce the FanGraphs results. But if the FanGraphs results are wrong, then so is my formula.

Why the Current Values Might Be Wrong

I’ve shown why I think the current values are wrong, but what could cause such an error?

Roster spots change in value over time. That’s all it takes; the reliever is held to a higher (per-inning) standard because historical analysis indicated that he should be. But if roster spots were free, then it would be absurd to evaluate starters and relievers at all differently. The difference in value depends on the value of a roster spot; or, if using my method, the “cost” imposed per appearance needs to be based on the value of a roster spot.

Prior to 1915, clubs had 21 players, and no DL at all. In 1941, the DL restrictions were substantially loosened, and a team could have two players on the DL at the same time (60-day DL only at that time). In 1984, they finally removed the limits to the number of players on a DL at a time; in 2011, a seven-day concussion DL was added, and a 26th roster spot for doubleheader days; in 2017, the normal DL will be shortened to 10 days.

21 players and no DL makes roster spots golden. You simply could not have modern pitcher usage in such a period.

Not to mention the fact that, in 1913, you’d never have been able to get a competent replacement on short notice. Jets and minor-league development contracts both also dropped the value of a roster spot.

25-26 roster spots, September call-ups to 40, and starting this year you can DL as many players as you want for periods short enough that it’s worth thinking about DLing your fifth starter any time you have an off day near one of his scheduled starts. Roster spots are worth a lot less today; it’s not surprising that reliever WAR seems off when it was based on historical data and the very reason for having a different reliever replacement level is the value of a roster spot.

Conclusion

When I started this, I was hoping to produce a brilliant result about what relief-pitcher replacement should be. I have failed to do so; there’s simply too little data, as shown by the low R² values on the chart I tried to include above, to make a serious try at figuring out what general managers are actually doing in terms of their concept of reliever replacement level.

But the formula I suggested back in November has an explicit term acting as a proxy for the value of a roster spot, and that term can be adjusted for era. If you drop the cost of an appearance from 0.01 WAR to some lower value, raising replacement a bit to compensate, you’ll represent the fact that roster spots have changed in value over time.

Given any reasonable attempt to estimate the cost per appearance based on era, I don’t see how this could be worse than the current methods.


Prospect Watch: 5 Future All-Stars No One Is Talking About

I chose to stick with hitters in this article, because pitching prospects are extremely difficult to predict, and I think the pitchers who do get the hype are typically deserving. However, I do see a trend of some unnoticed hitting prospects turning out great careers in the majors. Let’s get right to it.

1. Travis Demeritte – 2B – ATL

In 2016, Demeritte went from the Rangers’ to the Braves’ system and spent the entire year in high-A ball, where he dominated at the plate. A 2B with power like Cano, good speed and the ability to get on base is such a rarity.

In my opinion, Demeritte has the highest chance of being a perennial All-Star out of these five prospects. The middle infield in Atlanta has an extremely bright future. I’m predicting that Demeritte will make his splash in 2018, and make his first ASG appearance by 2020 (age 25). Let’s look at his numbers from a season ago:

 

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Travis Demeritte 21 145 547 635 145 33 13 32 78 200 20 4 12.3% 31.5% 0.905 0.283 0.393 139


Let’s compare these to the four All-Star 2B in 2016 and Brian Dozier.

Name G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Jose Altuve 161 640 717 216 42 5 24 60 70 30 10 8.4% 9.8% 0.928 0.194 0.391 150
Robinson Cano 161 655 715 195 33 2 39 47 100 0 1 6.6% 14.0% 0.882 0.235 0.37 138
Brian Dozier 155 615 691 165 35 5 42 61 138 18 2 8.8% 20.0% 0.886 0.278 0.37 132
Dustin Pedroia 154 633 698 201 36 1 15 61 73 7 4 8.7% 10.5% 0.825 0.131 0.358 120
Ian Kinsler 153 618 679 178 29 4 28 45 115 14 6 6.6% 16.9% 0.831 0.196 0.356 123


Some things to keep in mind as we compare these players: Demeritte was playing in A+ ball, but he did play an average of 12 fewer games than these major-leaguers. As you can see, it’s basically a two-man race (other than Dozier’s 42 HRs) between Altuve and Demeritte here. While we cannot expect these A+ ball numbers to translate directly against ML pitching, Demeritte definitely deserves more attention in top-prospect lists. While he’s not quite as speedy as Altuve, he has more power, and he walks at a far higher rate. The one glaring weakness is the K numbers for Demeritte. However, some of the top players in the league K at very high rates. As long as the OPS stays high, it doesn’t really matter how a guy makes outs anymore.
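
If you want to check the rate columns in these tables yourself, the sketch below recomputes BB%, K%, and ISO from the counting stats. It uses the standard definitions (walks and strikeouts per plate appearance, and extra bases per at-bat for ISO); the function name `rate_stats` is just something I made up for this example. The inputs are Demeritte’s and Altuve’s lines from the tables above.

```python
def rate_stats(pa, ab, doubles, triples, hr, bb, so):
    """Return BB% and K% (per plate appearance, as percentages) and ISO
    (extra bases per at-bat, i.e. SLG minus AVG)."""
    return {
        "BB%": 100.0 * bb / pa,
        "K%": 100.0 * so / pa,
        "ISO": (doubles + 2 * triples + 3 * hr) / ab,
    }


# Counting stats copied from the tables above.
demeritte = rate_stats(pa=635, ab=547, doubles=33, triples=13, hr=32, bb=78, so=200)
altuve = rate_stats(pa=717, ab=640, doubles=42, triples=5, hr=24, bb=60, so=70)

for name, line in (("Demeritte", demeritte), ("Altuve", altuve)):
    print(f"{name}: BB% {line['BB%']:.1f}, K% {line['K%']:.1f}, ISO {line['ISO']:.3f}")
```

The gap in walk rate and the much larger gap in strikeout rate both fall straight out of the raw counts.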

I should note that 2016 was a breakout year for Demeritte; in years past he didn’t quite live up to his potential, and also served an 80-game PED suspension. These could be the main reasons why he hasn’t garnered much attention yet. He still has to prove himself to most. However, I’m sold. I’d pencil him in for the majority of the 2020s’ ASGs right now.

 

2. Ramon Laureano – OF – HOU

Laureano has all the tools: he can play any OF spot well, he has speed and pop, and he gets on base. Houston’s farm has taken a bit of a hit due to some trades in the last two years, but that’s because they knew they had guys like Laureano who don’t have super high trade value, but have a chance to be great ML players like the guys they traded. Let’s look at Laureano’s 2016 numbers.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Ramon Laureano 21 128 461 555 146 32 9 15 73 128 48 15 13.2% 23.1% 0.943 0.206 0.418 159


The numbers speak for themselves. This is the making of a star; where is the hype? I know it’s not a huge sample size, and we don’t have much to go off from the previous year either, but in A+ and AA last year he put up those phenomenal numbers you see above.

If those aren’t All-Star numbers, then I don’t know what are. Laureano’s ability to play all three OF spots will keep him in the lineup every day and help his chances of making it to the ASG. When he does get the call-up, if his numbers stay relatively close to this, there’s no way he doesn’t make three to four All-Star Games. As of now, he’s more of a speed threat, but as he develops, the speed/power combo will even out and he will be an Andrew McCutchen-type player. Keep tabs on this guy.

 

3. Christin Stewart – OF – DET

While researching Stewart, I couldn’t find an article more recent than September of 2015. There’s no one talking about him…why? As we know, Detroit is aging and looking to deal top players. So, I’m assuming we will be seeing a lot of opportunities for young guys to step up and prove themselves. Detroit’s system isn’t super deep, but that could change anytime if they do decide to move some key pieces. Regardless, I see Stewart as the prospect to watch moving forward; he has the tools to be an All-Star. Let’s check out his numbers from 2016.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Christin Stewart 22 147 514 622 132 29 2 31 93 154 4 2 15.0% 24.8% 0.883 0.245 0.407 156


The power is impressive, and by this chart he looks even a bit better than the two previous guys I mentioned. However, with the K numbers pretty high up there, and not a whole lot of speed, Stewart is a player who could fall into slumps. Oftentimes, adjusting to the majors can be challenging, and some top prospects never quite figure it out. While Stewart’s MiLB numbers are pretty insane, his slump potential makes him a pretty risky pick here. However, I do believe that if he does indeed figure it out, he will make it to a few ASGs and serve as an everyday player in this league for a decade. HRs and BBs get it done. Keep an eye on Stewart.

 

4. Jason Martin – OF – HOU

Another Houston OF prospect…another future All-Star? I think so. The future is certainly bright over at Minute Maid Park: Altuve is a cornerstone, Correa is a centerpiece, Springer is a baller, and they have prospects for days. If they can just figure out how to pitch, they could be a WS contender for the next eight years.

Why Martin, though? Let’s check out his 2016 numbers from high-A ball.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Jason Martin 20 121 431 502 114 25 7 23 63 112 22 12 12.5% 22.3% 0.874 0.251 0.382 131


Impressive, to say the least. At just 20 years old, he pumped out 23 homers in 121 games. He walks once every eight plate appearances, and he also grabbed 22 bags on the season. The ability to walk and run (lol) will typically keep guys out of major slumps. While Martin is not a highly touted prospect at this point, I think he will be a household name by 2022. I expect him to get the call-up in 2019 and play a significant role during a pennant race that year. In 2020, he will burst onto the scene and prove his worth to this franchise.

With Houston’s current build, this might be a guy we see dealt if they are trying to add talent at the deadline this year. That doesn’t change my prediction, however. I see Martin suiting up for the ASG a few times throughout his career. Stay posted.

 

5. Tom Murphy – C – COL

You can’t keep putting Yadier Molina in there every year. And with Buster Posey most likely making that change to 1B full-time within three years, Jonathan Lucroy getting dealt to the AL, Kyle Schwarber playing OF, etc., pathways for guys like Tom Murphy open up. Making the All-Star Game as a C is not saying as much as it is at other positions, in my opinion. A decent hot streak in the first half will inflate your hitting numbers. Take Derek Norris in 2014, for example: it may have seemed like he was the best catcher in the league at the halfway point, but, as usual, it evened out by season’s end.

With that being said, Murphy has proven he has pop, and playing in Colorado is a huge advantage for him. While I don’t think he will be a Hall-of-Fame catcher, I do think he’s flying under the radar right now and will probably open some eyes in 2017. I’d say he makes two appearances in the ASG before 2022. However, once he gets up near 30 and he’s no longer playing in Colorado, I think he will have trouble keeping a job.

I have him on the list, first of all, because he meets the criteria, and also because I think people should pay attention to him, and lastly because he’s ML-ready, unlike the rest of these guys. Trevor Story didn’t have a whole lot of hype; most people didn’t expect him to make the team out of spring, but with the Jose Reyes situation, the kid got a shot and as we all know, he ran with it. I’m not saying Murphy will make a cannonball-esque splash like Story, but I think he will turn some heads and maybe even get some ASG votes this year. Anything can happen, especially in Colorado. Keep tabs on him.

Honorable Mentions

Dylan Cozens – OF – PHI

There’s not a lot of buzz surrounding Cozens, which is surprising to me, because usually when we see 40 HR in 134 games, we really perk up. In his age-22 season, he played all 134 games at the AA level for the Phillies affiliate, Reading Fightin’ Phils, a place where most Phillies prospects prosper. The reason why Cozens doesn’t quite make the cut here is because of the words, “future All-Star.” He is one of those lefties that mash in the right ballpark and against RHP, but usually career platoon hitters, even if they are highly effective, don’t make the ASG.

Rhys Hoskins – 1B – PHI

Hoskins is another AA player in the Phillies system. He probably has a little bit more of a well-rounded hitting ability than does Cozens, but he’s a 1B, and that’s an overloaded position. You have to be incredible to crack that ASG squad, and I just don’t think Hoskins will ever be quite at that level. I do believe he will pan out to be an everyday guy for a good amount of time in this league. He has really good power and he gets on base, two things that will keep you in the lineup more often than not.

Bobby Bradley – 1B – CLE

Bradley is another guy I would keep an eye on; I’m just not sold on him yet. He has a lot of raw power, but a really high K rate in the low levels of the minors. Also, he’s a 1B, so once again, it’s really hard to make the ASG at that position.


xFantasy, Part IV: “Projecting” Breakouts and Busts in 2017

Back in December, I introduced “xFantasy” through a series of entries here at the FanGraphs Community blog. At its inception, xFantasy was a system based on xStats that integrated hitters’ xAVG, xOBP, and xISO in order to predict expected fantasy production (HR, R, RBI, SB, AVG). The underlying models are put together into an embedded “Triple Slash Converter” in Part 2. Part 3 compares the predictive value of xFantasy (and therefore xStats) vs. Steamer and historic stats, ultimately finding that for players under 26, xStats are indeed MORE predictive than Steamer!

To quote myself from the first piece, Andrew Perpetua over at the main blog has developed a great set of data using his binning strategy, which has been explained and updated this offseason, including some additional work since then to include park factors and weather factors. He produces xBABIP, xBACON, and xOBA numbers based on Statcast’s exit velocity/launch angle data, along with the resulting ‘expected’ versions of the typical slash-line stats, xAVG/xOBP/xSLG. Recently, Andrew has published a set of “2017 estimates” that takes the past two years of Statcast data and weights them appropriately to come up with the best estimate for a player’s xStats moving forward. After a bit of back and forth on Twitter with Andrew discussing how exactly these numbers get weighted, I think they are looking really good. I’m now adopting these numbers as the basis for xFantasy from this point on.

There are a few key takeaways from xFantasy so far that will tell us where to go next:

  1. xFantasy is not *truly* a projection. We don’t have minor-league data. We don’t have data from before 2015. At this point, xFantasy for 2017 is a weighted average of player performance from 2015-2016, so keep in mind that things like injuries or down years might have tanked a player’s xStats.
  2. More data is always better than less data. Steamer projections do a better job with established players than xFantasy does, likely due to having more info about past performance.
  3. Players under 26 have short track records, and xFantasy beats Steamer in projecting them going forward! For young players, or players that have undergone some significant, recent transformation at the MLB level, xFantasy could give us better info than traditional projections.

So what’s it mean? At this time, I will echo Andrew’s repeated recommendations that you should *not* use xFantasy as your projection system of choice in 2017. On average, Steamer will do better (at least for now…I think 2017 could be the year where we finally have enough Statcast data to put up a challenge). But xFantasy could be very useful in helping you to identify players (on a case-by-case basis) with short track records that might deserve a bump up or down from the projections spit out by the traditional systems.

For now, I’ve identified 10 (five up, five down) hitters aged 26 and under heading into 2017 that might deserve a second look based on xFantasy. Included below is each player’s xFantasy line and Steamer-projected 2017 line, both scaled to 600 PA, along with the 5×5 $ values, and at the far right, the difference between the two.

While the Billy Butler/Danny Valencia debacle was definitely the most interesting thing going on with the A’s late in 2016, Ryon Healy was a pretty good story himself. He came seemingly out of nowhere to hit over .300 with 13 HR in 283 second-half PAs, playing his way into a spot as the everyday 3B and likely No. 3 hitter for the 2017 A’s. xStats says you should believe it, with a .324 xAVG and 30 xHR. Steamer hasn’t bought into the average/power yet, but the relatively low ~20% K rate looks real.
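
As a quick illustration of what “scaled to 600 PA” means, the sketch below prorates a counting stat to a 600-PA season. It uses Healy’s actual second-half line quoted above (13 HR in 283 PA), so the result is his raw home-run pace, not an xFantasy or Steamer output. The helper name `per_600_pa` is made up for this example.

```python
def per_600_pa(count, plate_appearances, target_pa=600):
    """Prorate a counting stat to a fixed number of plate appearances."""
    return count * target_pa / plate_appearances


# Healy's second half, as quoted in the text: 13 HR in 283 PA.
healy_hr_600 = per_600_pa(13, 283)
print(f"Healy HR per 600 PA: {healy_hr_600:.1f}")
```

That raw pace lands a little below the 30 xHR that xStats gives him.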

Trevor Story was the best player in baseball for a couple of weeks this past year, and it seems to me that the late-season injury has made people forget that. xFantasy didn’t forget, though, and even with the huge K-rate, is seeing a .281 xAVG with 39 HR and 12 SB. Based on this line, I’m slotting Story comfortably into the same tier of SS’s as Correa, Seager, and Lindor for 2017. Downgrade in weekly H2H leagues where the away games can kill him a bit.

Gary Sanchez and Trea Turner have been well covered by Andrew here and here. I’ll just add that even though both are expected to regress from their lofty 2016 performances, xFantasy backs up the idea that they’ll both still be among the best players in baseball. Steamer is missing the boat on both guys.

I personally had a love/hate relationship with Tyler Naquin in 2016, who bounced on and off my roster in the “Beat Paul Sporer” NFBC league and always seemed to hit well when he was on the wire, and never when he was on my team. He’s been a trendy topic this offseason among people still using “Sabermetrics 1.0” to point at his BABIP and say he’ll be terrible in ’17. Statcast says he actually hit well enough to earn a .370 BABIP! Combine that with what seems to be a developing power profile and something like 15 SBs and you’ll have a nice little player for your fantasy squad. Just hope Cleveland plays him!

On the downside, we have quite a few players that have been trendy ‘sleeper’ picks in the lead-up to 2017 drafts so far. Javier Baez, even if he manages to find playing time in a crowded Cubs infield, just hasn’t hit the ball well enough to overcome the poor plate discipline. Mitch Haniger hit .229 in limited time (123 PA) but Statcast says he hit even worse than that — let’s hope it’s just a sample-size thing, because a .213 xAVG won’t cut it if you’re only getting 20 HR from him.

Yasiel Puig has been in the major leagues longer than many of these guys, so at this point maybe we should just believe Steamer, but I figured it would be worth including him here because it’s an interesting case to study. He hit .255 and .263 in 2015 and 2016 respectively, and that wasn’t bad luck according to Statcast, with a .249 xAVG in that time. Steamer still buys a bounceback to his pre-2015 ways with a .284 projection. I’m actually leaning toward Steamer here, because I believe that Puig’s stats have been heavily influenced by his various leg injuries over the past two years. Maybe I should see repeated injuries and use that to project future injuries, but in this case I’m going to give a 26-year-old the benefit of the doubt and say that a healthy Puig should match this Steamer projection in 2017.

Two more 24-year-olds close us out:  Max Kepler was very, very good in July and very, very bad after that, en route to an xFantasy line that doesn’t believe in the power, and *does* believe in the very poor BABIP and AVG. Staying away from that garbage pile, and moving on to another…A.J. Reed! He was supposed to be the chosen one last year, and instead he gave us his best 2014 Melvin Upton impression…without the speed. His playing-time picture is even more unclear than Baez’s, and even if he plays, Statcast tells me he has some work to do.

And finally, for an honorable mention of a player that’s new on the scene, but too old to qualify, I have to bring up Ryan Schimpf:

Woah.

Next time…

I closed out Part 3 by promising xFantasy for pitchers was coming, and it is! Using a model based on scFIP, xOBA, and xBACON, xFantasy for pitchers v1.0 now exists. There’s still work to be done in order to determine how useful it actually is, though!

As I said last time, it’s been fun doing this exploration of rudimentary projections using xFantasy and xStats. Hopefully others find it interesting; hit me up in the comments and let me know anything you might have noticed, or if you have any suggestions.


Which MLB Hitters Have Gotten Off the Ground?

Following up on excellent recent pieces by Travis Sawchik and Jeff Sullivan, I had a hypothesis: If there is truly a swing-path revolution underway in MLB, perhaps the best hitters by wOBA and wRC+ showed a more marked FB%+LD% (Air%) tendency in 2015-2016 than in years past? If not them, then perhaps there is a trend among the middle and/or lower classes of hitters?

The hypothesis was wrong, but the investigation still gave some interesting context to the 2016 power spike and the profiles of recent successful/unsuccessful MLB hitters in general.

Here’s a plot of the average FB%+LD% (Air%) for each year, 2009-2016, for all qualifying MLB hitters per FanGraphs leaderboards, divided into three roughly even buckets of 40-50 players by wRC+ (<100wRC+ left, 100-120wRC+ center, >120 wRC+ right):

Here’s a plot of the average FB%+LD% (Air%) for each year, 2009-2016, for all qualifying MLB hitters per FanGraphs leaderboards, divided into three roughly even buckets of 40-50 players by wOBA ( <.320 left, .320-.350 center, >.350 right):

The consistency of these numbers is remarkable. The writing has been on the wall for some time with regard to the benefits of hitting it in the air.

Perhaps plenty of hitters are (and always have been) trying to hit it in the air more often and are either failing to make the change stick, or not finding success quickly enough to stick with the change / stay in the league?

We aren’t seeing across-the-board nor player-class-specific changes that stand out beyond random variation by this method (yet).

There could be an equilibrium point here where given the best pools of pitching and hitting talent available (regardless of how they arrived at said status), the outcomes will be pretty similar at a macro level, save for major fundamental changes to how the game is played.

This does not mean that individual players cannot aspire to find more optimal approaches. Surely there have always been hitters finding success via these means, and only recently have we begun focusing on batted-ball data and on these traits of their transformations.

Preach on, Josh Donaldson: Ground balls? They call those outs up here.


Hardball Retrospective – What Might Have Been – The “Original” 1993 Angels

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 1993 California Angels 

OWAR: 39.3     OWS: 277     OPW%: .533     (86-76)

AWAR: 27.8      AWS: 212     APW%: .438     (71-91)

WARdiff: 11.5                        WSdiff: 65  
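As a quick check on the summary lines above, the differences and won-loss records can be recomputed from the listed totals. This is only a sketch: the function name `summarize` is hypothetical, and it assumes a 162-game schedule and that each record is obtained by rounding the Pythagorean percentage times games played. The source states neither assumption explicitly.

```python
def summarize(owar, ows, opw, awar, aws, apw, games=162):
    """Compare 'original' and 'actual' roster totals for one season.

    Assumption (not stated in the article): wins are the Pythagorean
    percentage times a 162-game schedule, rounded to the nearest game.
    """
    def record(pct):
        wins = round(pct * games)
        return wins, games - wins

    return {
        # Round to one decimal to avoid floating-point noise in the WAR gap.
        "war_diff": round(owar - awar, 1),
        "ws_diff": ows - aws,
        "original_record": record(opw),
        "actual_record": record(apw),
    }


# Totals copied from the 1993 California Angels summary lines.
angels_1993 = summarize(owar=39.3, ows=277, opw=0.533,
                        awar=27.8, aws=212, apw=0.438)
print(angels_1993)
```

Under those assumptions the sketch reproduces the WARdiff, WSdiff and both records listed for the 1993 club, and the gap in wins between the two records matches the fifteen-game advantage cited for the "Original" roster.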

The “Original” 1993 Angels placed runner-up to the Rangers for the division title, yet the ball club held a fifteen-game advantage over the “Actual” Halos. Tim Salmon garnered 1993 AL Rookie of the Year honors with a .283 BA, 31 dingers, 95 ribbies and 93 runs. Devon White collected his fifth Gold Glove Award and posted career-bests with 42 doubles and 116 runs scored. “Devo” successfully swiped 34 bags in 38 attempts. Dante Bichette provided a .310 BA while clubbing 43 two-base hits and launching 21 moon-shots. Wally Joyner aka “Wally World” contributed 36 doubles along with a .292 BA. Chad Curtis tallied 94 runs and pilfered 48 bases in his sophomore season. Brian Harper (.304/12/73), Mark T. McLemore (.284/4/72) and Paul Sorrento (.257/18/65) augmented the Angels’ attack.

Wally Joyner ranked thirty-seventh among first basemen according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Angels registered in the “NBJHBA” top 100 ratings include Dickie Thon (57th-SS), Tim Salmon (72nd-RF), Devon White (81st-CF), Tom Brunansky (85th-RF), Dante Bichette (90th-RF) and Brian Harper (99th-C). Furthermore, the list includes Gary Gaetti (34th-3B) and Chili Davis (64th-RF) from the “Actual” Angels ’93 roster.

Original 1993 Angels vs. Actual 1993 Angels

| STARTING LINEUP (Original) | POS | OWAR | OWS | STARTING LINEUP (Actual) | POS | AWAR | AWS |
|---|---|---|---|---|---|---|---|
| Chad Curtis | LF/CF | 2.16 | 16.51 | Luis Polonia | LF | -0.17 | 10 |
| Devon White | CF | 4.47 | 21.28 | Chad Curtis | CF | 2.16 | 16.51 |
| Tim Salmon | RF | 4.36 | 24.61 | Tim Salmon | RF | 4.36 | 24.61 |
| Dante Bichette | DH/RF | 1.71 | 19.35 | Chili Davis | DH | 0.33 | 11.91 |
| Wally Joyner | 1B | 3.14 | 18.09 | J. T. Snow | 1B | 0.66 | 10.09 |
| Mark McLemore | 2B/RF | 2.19 | 13.37 | Damion Easley | 2B | 1.15 | 8.38 |
| Gary Disarcina | SS | -1.15 | 5.73 | Rene Gonzales | 3B | 0.29 | 7.04 |
| Damion Easley | 3B/2B | 1.15 | 8.38 | Gary Disarcina | SS | -1.15 | 5.73 |
| Brian Harper | C | 1.27 | 15.66 | Greg Myers | C | 0.59 | 4.26 |

| BENCH (Original) | POS | OWAR | OWS | BENCH (Actual) | POS | AWAR | AWS |
|---|---|---|---|---|---|---|---|
| Paul Sorrento | 1B | 1.03 | 13.23 | Torey Lovullo | 2B | 0.39 | 7.35 |
| Erik Pappas | C | 1 | 8.23 | Stan Javier | LF | 1.17 | 7.1 |
| Dickie Thon | SS | 0.02 | 4.88 | Eduardo Perez | 3B | -0.21 | 3.25 |
| Eduardo Perez | 3B | -0.21 | 3.25 | Rod Correia | SS | -0.15 | 2.84 |
| Dick Schofield | SS | -0.15 | 2.43 | Chris Turner | C | 0.6 | 2.25 |
| Ruben Amaro | CF | 0.44 | 2.29 | Kelly Gruber | 3B | 0.2 | 2.19 |
| Chris Turner | C | 0.6 | 2.25 | Kurt Stillwell | 2B | -0.19 | 1.33 |
| Tom Brunansky | RF | -0.6 | 1.56 | Ron Tingley | C | -0.47 | 1.24 |
| Doug Jennings | 1B | 0.17 | 1.46 | John Orton | C | 0.05 | 1.03 |
| John Orton | C | 0.05 | 1.03 | Jim Edmonds | RF | -0.13 | 0.78 |
| J. R. Phillips | 1B | 0.17 | 0.87 | Ty Van Burkleo | 1B | -0.03 | 0.5 |
| Jim Edmonds | RF | -0.13 | 0.78 | Jim Walewander | SS | 0.04 | 0.41 |
| Larry Gonzales | C | 0.06 | 0.24 | Larry Gonzales | C | 0.06 | 0.24 |
| Jeff Manto | 3B | -0.23 | 0.09 | Gary Gaetti | 3B | -0.39 | 0.12 |
| Gus Polidor | 3B | -0.04 | 0.02 | Jerome Walton | DH | -0.03 | 0.06 |

Chuck Finley (16-14, 3.15) whiffed 187 batsmen and paced the Junior Circuit in complete games with 13. The Halos compensated for a pedestrian rotation with a stellar bullpen consisting of Bryan Harvey (1.70, 45 SV), Roberto Hernandez (2.29, 38 SV) and Alan Mills (5-4, 3.23). Mark Langston (16-11, 3.20) topped the “Actuals” in strikeouts (196) and innings pitched (256.1) while earning his fourth All-Star invitation.

Original 1993 Angels vs. Actual 1993 Angels

| ROTATION (Original) | POS | OWAR | OWS | ROTATION (Actual) | POS | AWAR | AWS |
|---|---|---|---|---|---|---|---|
| Chuck Finley | SP | 4.9 | 18.94 | Mark Langston | SP | 6.16 | 20.37 |
| Jim Abbott | SP | 1.34 | 9.75 | Chuck Finley | SP | 4.9 | 18.94 |
| Frank Tanana | SP | 1.03 | 7.07 | Scott Sanderson | SP | 0.65 | 5.75 |
| Phil Leftwich | SP | 1.5 | 5.13 | Phil Leftwich | SP | 1.5 | 5.13 |
| Kirk McCaskill | SP | -0.43 | 2.35 | Joe Magrane | SP | 0.26 | 2.58 |

| BULLPEN (Original) | POS | OWAR | OWS | BULLPEN (Actual) | POS | AWAR | AWS |
|---|---|---|---|---|---|---|---|
| Bryan Harvey | RP | 3.46 | 17.47 | Joe Grahe | RP | 0.86 | 7.28 |
| Roberto Hernandez | RP | 2.49 | 15.5 | Steve Frey | RP | 0.67 | 6.92 |
| Alan Mills | RP | 1.45 | 9.45 | Mike Butcher | RP | 0.33 | 4.35 |
| Joe Grahe | RP | 0.86 | 7.28 | Gene Nelson | RP | 0.32 | 4.31 |
| Mike Fetters | RP | 0.25 | 4.25 | Ken Patterson | RP | 0.19 | 2.92 |
| Hilly Hathaway | SP | 0.04 | 2.15 | Hilly Hathaway | SP | 0.04 | 2.15 |
| Scott Lewis | SP | 0.3 | 1.61 | Scott Lewis | SP | 0.3 | 1.61 |
| Mike Witt | SP | -0.13 | 1.23 | Brian Anderson | SP | 0.17 | 0.63 |
| Brian Anderson | SP | 0.17 | 0.63 | Darryl Scott | RP | -0.22 | 0.42 |
| Mike Cook | RP | 0.08 | 0.47 | Chuck Crim | RP | -0.27 | 0.4 |
| Darryl Scott | RP | -0.22 | 0.42 | John Farrell | SP | -1.65 | 0 |
| Marcus Moore | RP | -0.56 | 0.36 | Mark Holzemer | SP | -0.83 | 0 |
| Mark Holzemer | SP | -0.83 | 0 | Doug Linton | RP | -0.81 | 0 |
| Dennis Rasmussen | SP | -0.62 | 0 | Jerry Nielsen | RP | -0.61 | 0 |
| Paul Swingle | RP | -0.37 | 0 | Russ Springer | SP | -1.03 | 0 |
| | | | | Paul Swingle | RP | -0.37 | 0 |
| | | | | Julio Valera | SP | -1.13 | 0 |

Notable Transactions

Devon White 

December 2, 1990: Traded by the California Angels with Willie Fraser and Marcus Moore to the Toronto Blue Jays for a player to be named later, Junior Felix and Luis Sojo. The Toronto Blue Jays sent Ken Rivers (minors) (December 4, 1990) to the California Angels to complete the trade. 

Dante Bichette

March 14, 1991: Traded by the California Angels to the Milwaukee Brewers for Dave Parker.

November 17, 1992: Traded by the Milwaukee Brewers to the Colorado Rockies for Kevin Reimer.

Wally Joyner

October 28, 1991: Granted Free Agency.

December 9, 1991: Signed as a Free Agent with the Kansas City Royals. 

Bryan Harvey

November 17, 1992: Drafted by the Florida Marlins from the California Angels as the 20th pick in the 1992 expansion draft.

Brian Harper 

December 11, 1981: Traded by the California Angels to the Pittsburgh Pirates for Tim Foli.

December 12, 1984: Traded by the Pittsburgh Pirates with John Tudor to the St. Louis Cardinals for Steve Barnard (minors) and George Hendrick.

April 1, 1986: Released by the St. Louis Cardinals.

April 25, 1986: Signed as a Free Agent with the Detroit Tigers.

March 23, 1987: Released by the Detroit Tigers.

May 12, 1987: Purchased by the Oakland Athletics from San Jose (California).

October 12, 1987: Released by the Oakland Athletics.

January 4, 1988: Signed as a Free Agent with the Minnesota Twins.

November 4, 1991: Granted Free Agency.

December 19, 1991: Signed as a Free Agent with the Minnesota Twins. 

Mark T. McLemore 

August 17, 1990: The California Angels sent Mark McLemore to the Cleveland Indians to complete an earlier deal made on September 6, 1989. September 6, 1989: The California Angels sent a player to be named later to the Cleveland Indians for Ron Tingley.

December 13, 1990: Released by the Cleveland Indians.

March 6, 1991: Signed as a Free Agent with the Houston Astros.

June 25, 1991: Released by the Houston Astros.

July 5, 1991: Signed as a Free Agent with the Baltimore Orioles.

October 15, 1991: Granted Free Agency.

February 5, 1992: Signed as a Free Agent with the Baltimore Orioles.

December 19, 1992: Released by the Baltimore Orioles.

January 6, 1993: Signed as a Free Agent with the Baltimore Orioles.

Honorable Mention

The 2001 Anaheim Angels 

OWAR: 37.4     OWS: 267     OPW%: .467     (76-86)

AWAR: 31.1      AWS: 225     APW%: .463     (75-87)

WARdiff: 6.3                        WSdiff: 42  

The “Original” and “Actual” 2001 Angels finished in the American League West basement. Perennial Gold Glove center fielder Jim Edmonds socked 38 doubles and 30 long balls. “Jimmy Baseball” supplied a .304 BA with 95 runs scored and 110 ribbies. Mark T. McLemore batted .286 and nabbed 39 bags in 46 attempts. Troy Glaus crushed 41 circuit clouts and 38 two-baggers as he topped the century mark in runs and RBI. Garret Anderson rapped 194 base knocks including 39 doubles and 28 round-trippers while establishing a personal-best with 123 RBI. Jarrod Washburn delivered 11 victories with an ERA of 3.77. Troy Percival (1.65, 39 SV) made his fourth appearance in the Mid-Summer Classic and furnished a 0.988 WHIP with more than 11 strikeouts per 9 innings pitched. Glaus, Anderson, Washburn and Percival appear on the “Original” and “Actual” Angels rosters in 2001.

On Deck

What Might Have Been – The “Original” 1999 White Sox

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive