
Hardball Retrospective – What Might Have Been – The “Original” 2001 Rangers

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 2001 Texas Rangers 

OWAR: 48.4     OWS: 278     OPW%: .513     (83-79)

AWAR: 34.2      AWS: 219     APW%: .451     (73-89)

WARdiff: 14.2                        WSdiff: 59  
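The differentials and won-loss records above are simple arithmetic. The snippet below is a minimal sketch, assuming the record is the Pythagorean percentage scaled to a 162-game schedule and rounded to whole wins; the function names are illustrative and do not come from the book.

```python
def record_from_pct(pct, games=162):
    """Convert a winning percentage into a (wins, losses) pair.

    Assumes a 162-game schedule unless told otherwise.
    """
    wins = round(pct * games)
    return wins, games - wins


def diff(original, actual, digits=1):
    """Difference between an "original" and an "actual" team total."""
    return round(original - actual, digits)


# 2001 Rangers figures quoted above
print(record_from_pct(0.513))   # "original" roster record
print(record_from_pct(0.451))   # "actual" roster record
print(diff(48.4, 34.2))         # WARdiff
print(diff(278, 219, 0))        # WSdiff
```

The same two helpers reproduce the 2007 figures in the Honorable Mention section when given .496, .463, 36.9/27.8 and 249/225.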

The “Original” 2001 Rangers placed third in the American League West behind Seattle and Oakland. Sammy “Say It Ain’t” Sosa (.328/64/160) established personal bests in batting average, runs scored (146), RBI and bases on balls (116) while placing runner-up in the MVP balloting. Rich Aurilia (.324/37/97) contributed career-highs in nearly every batting classification including 114 tallies and 206 safeties. Juan “Igor” Gonzalez (.325/35/140) achieved his third All-Star invite and finished fifth in the American League MVP race. Ivan “Pudge” Rodriguez (.308/25/65) merited his tenth straight Gold Glove Award. Jose Hernandez swatted 26 two-baggers and 25 big-flies.

The “Actuals” lineup featured Alex Rodriguez (.318/52/135), who paced the circuit in four-baggers and runs scored (133). Rafael Palmeiro (.273/47/123) surpassed the century mark in walks and equaled his single-season high in home runs. Frank Catalanotto batted at a .330 clip and ripped 31 two-base hits.

Ivan Rodriguez rated thirteenth among backstops according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Rangers registered in the “NBJHBA” top 100 ratings include Sammy Sosa (45th-RF), Juan Gonzalez (52nd-RF) and Ruben Sierra (70th-RF). Moreover, Alex Rodriguez (17th-SS), Rafael Palmeiro (19th-1B), Ken Caminiti (25th-3B) and Andres Galarraga (42nd-1B) achieved the distinction among members of the “Actuals” roster.

  Original 2001 Rangers                              Actual 2001 Rangers

STARTING LINEUP POS OWAR OWS STARTING LINEUP POS AWAR AWS
Rusty Greer LF -0.04 5.64 Frank Catalanotto LF 2.19 16.86
Mark Little CF 0.39 2.69 Gabe Kapler CF 0.85 12.52
Sammy Sosa RF 9.56 43.85 Ricky Ledee RF -0.48 2.21
Juan Gonzalez DH/RF 4.21 23.5 Ruben Sierra DH 0.82 9.21
Carlos Pena 1B 0.21 2.01 Rafael Palmeiro 1B 3.62 24.62
Benji Gil 2B/SS 0.99 6.69 Randy Velarde 2B 1.57 8.75
Rich Aurilia SS 5.46 32.44 Alex Rodriguez SS 8.2 34.67
Mike Lamb 3B -0.03 6.37 Mike Lamb 3B -0.03 6.37
Ivan Rodriguez C 3.92 19.8 Ivan Rodriguez C 3.92 19.8
BENCH POS OWAR OWS BENCH POS AWAR AWS
Rey Sanchez SS 2.37 13.45 Michael Young 2B 0.09 6.32
Jose Hernandez SS 2.7 12.63 Rusty Greer LF -0.04 5.64
Ruben Sierra DH 0.82 9.21 Bill Haselman C 0.09 3.71
Chad Kreuter C 1.28 9.03 Ken Caminiti 3B -0.07 3.66
Dean Palmer DH 0.14 4.53 Andres Galarraga DH -0.71 3.22
Bill Haselman C 0.09 3.71 Chad Curtis CF 0.14 2.04
Jeff Frye 2B -0.45 3.61 Carlos Pena 1B 0.21 2.01
Fernando Tatis 3B -0.26 2.25 Doug Mirabelli C -0.09 1.31
Ruben Mateo RF -0.61 1.31 Ruben Mateo RF -0.61 1.31
Andy Barkett LF 0.11 1.26 Scott Sheldon 3B -0.56 0.89
Kevin L. Brown C 0.11 1.09 Bo Porter LF -0.18 0.77
Craig Monroe RF 0.03 0.61 Craig Monroe RF 0.03 0.61
Warren Morris 2B -0.43 0.53 Mike Hubbard C 0.06 0.41
Cliff Brumbaugh RF -0.39 0.24 Marcus Jensen C -0.27 0.29
Scott Podsednik LF -0.06 0.04 Chris Magruder LF -0.32 0.12
Kelly Dransfeldt SS -0.03 0.04 Kelly Dransfeldt SS -0.03 0.04
Cliff Brumbaugh RF -0.16 0.02

Kevin J. Brown (10-4, 2.65) fashioned a 1.141 WHIP in an abbreviated season (19 starts). Robb Nen (3.01, 45 SV) struck out 93 batters in 77.2 innings and topped the circuit in saves. Jeff Zimmerman (2.40, 28 SV) was nearly unhittable out of the bullpen, producing a 0.897 WHIP.

  Original 2001 Rangers                            Actual 2001 Rangers 

ROTATION POS OWAR OWS ROTATION POS AWAR AWS
Kevin J. Brown SP 2.66 10.63 Doug Davis SP 2.6 9.25
Doug Davis SP 2.6 9.25 Rick Helling SP 1.67 8.01
Jim Brower SP 1.47 8.13 Darren Oliver SP -0.06 3.78
Rick Helling SP 1.67 8.01 Kenny Rogers SP -0.37 1.96
Ryan Dempster SP 0.53 7.65 Aaron Myette SP -0.79 0.19
BULLPEN POS OWAR OWS BULLPEN POS AWAR AWS
Robb Nen RP 1.3 13.82 Jeff Zimmerman RP 3.13 13.09
Jeff Zimmerman RP 3.13 13.09 Mike Venafro RP 0.24 4.77
Danny Patterson RP 1.29 6.66 Pat Mahomes RP -0.13 3.56
Scott Stewart RP 0.73 5.53 Juan Moreno RP 0.44 3
Mike Venafro RP 0.24 4.77 Chris Michalak RP 0.49 1.82
Darren Oliver SP -0.06 3.78 Danny Kolb RP 0.16 0.85
Bobby Witt SP 0.5 2.53 Jeff Brantley RP 0.09 0.68
Kenny Rogers SP -0.37 1.96 J. D. Smart RP -0.15 0.26
Scott Eyre RP 0.34 1.82 Mark Petkovsek RP -1.42 0.22
Brian Bohanon SP 0.11 1.78 Francisco Cordero RP 0.06 0.1
Joey Eischen RP 0 1.27 Rob Bell SP -1.14 0.08
Danny Kolb RP 0.16 0.85 R. A. Dickey RP -0.17 0.01
Luis Pineda RP -0.01 0.65 Kevin Foster RP -0.32 0.01
Mark Petkovsek RP -1.42 0.22 Joaquin Benoit SP -0.2 0
Billy Taylor RP 0.01 0.1 Tim Crabtree RP -0.39 0
R. A. Dickey RP -0.17 0.01 Justin Duchscherer SP -0.8 0
Joaquin Benoit SP -0.2 0 Ryan Glynn SP -0.51 0
Ryan Glynn SP -0.51 0 Jonathan Johnson RP -0.44 0
Jonathan Johnson RP -0.44 0 Mike Judd SP -0.33 0
Brandon Knight RP -0.54 0 Brandon Villafuerte RP -0.51 0
Matt Whiteside RP -0.61 0

 Notable Transactions

Sammy Sosa 

July 29, 1989: Traded by the Texas Rangers with Wilson Alvarez and Scott Fletcher to the Chicago White Sox for Harold Baines and Fred Manrique.

March 30, 1992: Traded by the Chicago White Sox with Ken Patterson to the Chicago Cubs for George Bell.

Rich Aurilia

December 22, 1994: Traded by the Texas Rangers with Desi Wilson to the San Francisco Giants for John Burkett.

Juan Gonzalez

November 2, 1999: Traded by the Texas Rangers with Danny Patterson and Gregg Zaun to the Detroit Tigers for Alan Webb (minors), Frank Catalanotto, Francisco Cordero, Bill Haselman, Gabe Kapler and Justin Thompson.

November 1, 2000: Granted Free Agency.

January 9, 2001: Signed as a Free Agent with the Cleveland Indians. 

Robb Nen

July 17, 1993: Traded by the Texas Rangers with Kurt Miller to the Florida Marlins for Cris Carpenter.

November 18, 1997: Traded by the Florida Marlins to the San Francisco Giants for Mick Pageler (minors), Mike Villano (minors) and Joe Fontenot.

Rey Sanchez 

January 3, 1990: Traded by the Texas Rangers to the Chicago Cubs for Bryan House (minors).

August 16, 1997: Traded by the Chicago Cubs to the New York Yankees for Frisco Parotte (minors).

November 3, 1997: Granted Free Agency.

January 22, 1998: Signed as a Free Agent with the San Francisco Giants.

November 5, 1998: Granted Free Agency.

December 11, 1998: Signed as a Free Agent with the Kansas City Royals.

October 29, 1999: Granted Free Agency.

December 7, 1999: Signed as a Free Agent with the Kansas City Royals. 

Jose Hernandez 

April 3, 1992: Selected off waivers by the Cleveland Indians from the Texas Rangers.

June 1, 1993: Traded by the Cleveland Indians to the Chicago Cubs for Heathcliff Slocumb.

July 31, 1999: Traded by the Chicago Cubs with Terry Mulholland to the Atlanta Braves for a player to be named later, Micah Bowie and Ruben Quevedo. The Atlanta Braves sent Joey Nation (August 24, 1999) to the Chicago Cubs to complete the trade.

November 5, 1999: Granted Free Agency.

December 16, 1999: Signed as a Free Agent with the Milwaukee Brewers.

Kevin J. Brown 

October 15, 1994: Granted Free Agency.

April 9, 1995: Signed as a Free Agent with the Baltimore Orioles.

November 3, 1995: Granted Free Agency.

December 22, 1995: Signed as a Free Agent with the Florida Marlins.

December 15, 1997: Traded by the Florida Marlins to the San Diego Padres for Steve Hoff (minors), Derrek Lee and Rafael Medina.

October 26, 1998: Granted Free Agency.

December 12, 1998: Signed as a Free Agent with the Los Angeles Dodgers.

Honorable Mention

The 2007 Texas Rangers 

OWAR: 36.9     OWS: 249     OPW%: .496     (80-82)

AWAR: 27.8      AWS: 225     APW%: .463     (75-87)

WARdiff: 9.1                        WSdiff: 24  

Texas finished a distant sixteen games behind Seattle in ’07. Carlos Pena (.282/46/121) registered 99 tallies and achieved personal-bests in virtually every offensive category. Mark Teixeira tagged 30 long balls, drove in 105 baserunners and contributed a .306 BA. Ian Kinsler swiped 23 bases in 25 attempts, scored 96 runs and clubbed 20 dingers during his sophomore season. Travis “Pronk” Hafner blasted 24 dingers and eclipsed the century mark in RBI for the fourth consecutive campaign. Ivan Rodriguez drilled 31 two-base hits while third-sacker Edwin Encarnacion delivered a .289 BA with 16 jacks. Aaron Harang (16-6, 3.73) posted a career-best 1.144 WHIP and placed fourth in the Cy Young balloting. Joaquin Benoit whiffed 87 batsmen over 82 innings while furnishing a 2.85 ERA along with a WHIP of 1.171.

On Deck

What Might Have Been – The “Original” 2003 Indians

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database 

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive


The 2017 Atlanta Braves: A .500 Team?

The 2016 Atlanta Braves were built to suck.  After all, starting a season 0-9 basically kills any hope left in the fan base, and gets them prepared for the contagious losing.  For the few fans who paid to go see their beloved Braves play in the now retired Turner Field, losing 93 games is heartbreaking.  A large volume of articles exists detailing the extent to which the Atlanta Braves, under both John Hart and John Coppolella, are remodeling their organization.  This article serves the purpose of examining one thing:

2016 Atlanta Braves

             Record  Runs Scored  Runs Against  Run Differential
First Half   31-58   307          414           -107
Second Half  37-35   342          265           -20

That’s right! The 2016 second-half Atlanta Braves won more games than they lost!  If you did not already know this you either (a) are not a Braves fan, or (b) could not manage to care less.  However, this could have some real value behind it.  While the Braves managed to be outscored by 20 runs in the second half, they still managed to win two more games than they lost.  They scored 35 more runs in 17 fewer games.  Their runs/game increased from 3.45 to 4.75, which would have placed them in between the Mariners and Cardinals in that regard had it been 4.75 the entire 2016 season.  The most important takeaway is how much better the second-half Braves were at preventing runs: 149 fewer runs allowed than in the first half.  Shaving off that many runs in only 17 fewer games is huge.
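The runs-per-game figures quoted above can be checked with a few lines. This is only a sketch using the runs scored and won-loss records from the table; the helper name is made up for illustration.

```python
def runs_per_game(runs, wins, losses):
    """Average runs scored per game over a stretch of the season."""
    return runs / (wins + losses)


first_half = runs_per_game(307, 31, 58)    # 89 games
second_half = runs_per_game(342, 37, 35)   # 72 games
print(f"{first_half:.2f} -> {second_half:.2f}")
```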

But let’s not get ahead of ourselves.  A winning record is unsustainable at a deficit of 20 runs in 72 games.  But I am not asking whether the 2017 Atlanta Braves can win even 82 games.  Can they win 81?  Could the great finish down the stretch of the 2016 season carry over into 2017?  While going .500 is technically meaningless because a .500 team will not make the playoffs, not losing more than they win in the new SunTrust Park will energize the organization and the fan base, and prepare the team for future success.

When the 2016-2017 offseason kicked off, the Braves signed two popular starting pitchers, and acquired one via trade, to eat innings so their crop of young pitching could ripen on the farm.

Braves 2017 Offseason Acquisitions (2016 Statistics)

               Record  ERA   FIP   BB/9  K/9   WHIP  WAR
Bartolo Colon  15-8    3.43  3.99  1.50  6.01  1.21  2.9
R.A. Dickey    10-15   4.46  5.03  3.34  6.68  1.37  1.0
Jaime Garcia   10-13   4.67  4.49  2.99  7.86  1.37  1.2

Two of the three had subpar years in 2016.  The other one became an internet sensation for his antics in the batter’s box and even hit a homer against the San Diego Padres.  But let’s assess what each pitcher brings to Atlanta’s rotation.

Bartolo Colon ages like a fine wine.  His ERA was better last year than any Atlanta starter except Julio Teheran.  While pitching record is not a statistic to measure performance, it is worth noting he won more games last year than any Atlanta starter.  He was better pretty much across the board than anyone not named Julio Teheran.  But can he keep this level of production up?  I would like to think so.  His two-seam velocity has stayed relatively consistent over the past three years.  All the Braves should ask Colon to do is turn in around 20 quality starts (he turned in 19 last year).  Consistency was a hallmark of his time with the Mets, and should continue in Atlanta for at least the 2017 season.

The other old guy the Atlanta Braves picked up this offseason happens to be a knuckleballer: R.A. Dickey, the 2012 National League Cy Young Award winner.  While Dickey will more than likely not be in the running for any hardware as he nears his 43rd birthday, he can still meet the immediate needs of his new team.  From 2011 to 2015, Dickey’s lowest innings total was 208.2, and he peaked in his legendary 2012 with 233.2 innings pitched.  This is what the Braves need.  They need Dickey to turn in a mountain of good, quality innings.  If he could get over 200 innings again, and remain viable at the big-league level, then it is mission accomplished.

The third major addition to the Atlanta rotation is southpaw pitcher Jaime Garcia.  On December 1 of last year, the St. Louis Cardinals accepted minor-league infielder Luke Dykstra, right-handed pitcher John Gant, and righty Chris Ellis for Garcia’s services.  First, let’s look at the positives of this — Garcia is a definite mid-rotation talent, who posted a 3.73 ERA in 31.1 IP and a 3.18 ERA in 28.1 IP in April and May of last year, respectively.  He gives Atlanta a lefty in a rotation filled with righties.  The downside?  His low ERAs early in the season turned into a 5.40 ERA in June and a 5.60 ERA in the second half of the season.  So much for success in the second half driving this article, right?  Let’s remain optimistic.  After all, that is the whole purpose of this.  Garcia’s HR/FB rate was up from 7.1% in 2015 to a ghastly 20.2% in 2016.  He got consumed by the league-wide power surge.  I do not think such a high rate is sustainable or will happen again.

Let’s make a prediction.  Bartolo Colon makes us all fall back in love with “The Great Bart-Bino” all over again and he turns in around 16-20 quality starts for the upstart Braves.  Dickey, the workhorse of the staff, follows suit and dizzies batters with his knuckler for over 200 innings.  Garcia returns to early-2016 form, and posts something in the ballpark of 1.5 WAR.  Of course, the likelihood of all three scenarios playing out is small, but what I am trying to get across is it is possible.

Now, time to switch gears. The Braves lineup has changed its look dramatically since this time last year, sticking with a solid mixture of recognizable names and some guy named Dansby Swanson.  Here is a look at their projected Opening Day lineup:

2017 Atlanta Braves

Position  Name              Bats  2016 WAR  Projected WAR
CF        Ender Inciarte    L     3.8       2.5
SS        Dansby Swanson    R     0.9       2.4
1B        Freddie Freeman   L     6.5       3.7
LF        Matt Kemp         R     0.0       0.0
RF        Nick Markakis     L     1.7       0.5
2B        Brandon Phillips  R     0.8       0.7
3B        Adonis Garcia     R     0.2       0.2
C         Tyler Flowers     R     0.3       0.7

The projected WAR figures were retrieved from the FanGraphs.com ZiPS projections.

Look at the top of their lineup.  To me, those three guys, Inciarte, Swanson, and Freeman, look like the core of a team poised to wreak havoc on the NL East before the end of this decade.  It is hard to project exactly what we are going to get out of Dansby Swanson, but most Braves fans and analysts expect him to take the reins as the face of the franchise.

Starting in the leadoff spot is Ender Inciarte, who was brought over as icing on the cake in the Shelby Miller trade that landed Swanson and pitching prospect Aaron Blair.  In his first year in Atlanta, Inciarte posted a .732 OPS and won a Gold Glove for his outstanding play in center field.  I really could not think of a better leadoff guy for the Braves.  He is signed through 2021 at a team-friendly cost of $30.5 million, with a $9-million team option in 2022.  In each of his first three years in the bigs, Inciarte has played in at least 118 games and posted a WAR above 3.7 (with a peak of 5.3 in 2015), and he shows no sign of slowing down as his prime years lie ahead.  What if he crosses the 3.0 WAR plateau for the fourth time in four seasons, and maybe even adds another Gold Glove?  That is all his organization needs out of him.

Inciarte is a vital part of the Braves defense, which, according to 2017 PECOTA projections, leads the NL East in Fielding Runs Above Average (they are projected to attain an average figure of 3.6, while the other four teams are either at 0.0 or negative).  BaseballProspectus.com explains FRAA as an “individual defensive metric created using play-by-play data with adjustments made based on plays made, the expected numbers of plays per position, the handedness of the batter, the park, and base-out states.”  In short, the higher the number, the better the fielder, and vice versa.  The higher the team average, the better the team is overall in the field.  In his Gold Glove campaign, Inciarte registered a FRAA of 23.0, according to BP.  The graduation of Dansby Swanson and the addition of web-gem-prone second baseman Brandon Phillips will certainly strengthen the middle cone of the field.  Just how good is this team going to be at preventing runs?  Many projection systems think they will be around the top of their division, and many fans are excited to see the double-play tandem of Swanson and Phillips at work.

Freddie Freeman is the undisputed anchor of the lineup, and has finally seen the Braves ADD instead of SUBTRACT from the lineup around him.  The addition of Matt Kemp has helped tremendously.  With a recognizable slugger swinging behind Freeman, managers and pitchers had to pitch to him in the latter months of the year. With Kemp slotted behind him, Freeman hit to the tune of a .340/.456/.665 slash with 16 home runs and 18 doubles.  Kemp also matched the theme of this article with a strong second half — hitting .280/.336/.519 with 12 bombs in 241 plate appearances as a Brave.  The duo should have Braves fans excited for a full season of similar production from Freeman if Kemp is behind him.  Kemp, on the other hand, has a lower bar to pass, and could re-tool his value as an offensive player in his first full year off the West Coast.

So why is it unreasonable for the 2017 Atlanta Braves to win 81 games?  I do not think it is that far-fetched.  This article has not mentioned their incredibly deep farm system, which includes guys such as Ozzie Albies, Sean Newcomb, and Lucas Sims, but instead focuses on the immediate roster — a roster which has the potential to do unexpected things in 2017.  The dominoes would have to fall in all the right places, but this is baseball.  Anything is possible.

Theodore Hooper’s Official 2017 Atlanta Braves Prediction: 81-81

 

The statistics used in this study were found on BaseballProspectus.com, Baseball-Reference.com and FanGraphs.com, and the rosters on RosterResource.com were a great help in referencing players and transactions. 


Let’s Build Our Own Catch Probability Metric

By now you’ve seen the Statcast Catch Probabilities. They’re great! Or, at the very least, they’re a shiny new toy to play with until the regular season rolls around. But, as you may have noticed, there are a few frustrating details about it — namely, the actual math behind the statistic is completely opaque, and the details about when an individual catch happened are hard to find. So let’s fix those two problems! We’ll create a catch probability metric that anyone can compute in Excel, using data that anyone can download easily.

You may have noticed a problem with this plan, though — the data that is used for the official Statcast catch probability isn’t easily accessible. We’ll have to make do with what we can get from the Statcast search at Baseball Savant. Specifically, instead of using hang time and distance traveled, we’ll use exit velocity and launch angle. Note that this completely disregards defensive positioning and it even disregards the horizontal angle off the bat*! It’s going to make for a less perfect metric, of course, but (spoiler alert) it will turn out okay.

*This really makes more sense if you think about it in terms of probability of the hitter making an out. The old saying goes “hit ’em where they ain’t” but in recent years we’ve come to understand that it’s really “hit it hard and in the air.”

I’m not going to go into the details of how I computed this metric; it’s standard machine learning stuff. If you want to follow along with the computation, I’ve put my code up on GitHub. Instead of going through all that here, I’ll just jump to the finish line: the formula for catch probability ends up being

1/(1+exp(-(-10.152 + 0.057 * hit_speed + 0.218 * hit_angle)))
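For readers who prefer code to a spreadsheet cell, here is a minimal Python sketch of the same logistic formula. The coefficients are copied from the line above; the units (mph for exit velocity, degrees for launch angle) are an assumption based on how Statcast normally reports those fields, and the sample inputs are invented for illustration.

```python
import math

# Coefficients copied from the formula above.
INTERCEPT = -10.152
SPEED_COEF = 0.057   # applied to hit_speed (assumed to be exit velocity in mph)
ANGLE_COEF = 0.218   # applied to hit_angle (assumed to be launch angle in degrees)


def catch_probability(hit_speed, hit_angle):
    """Logistic model: estimated probability that a batted ball is caught."""
    z = INTERCEPT + SPEED_COEF * hit_speed + ANGLE_COEF * hit_angle
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs only, not taken from any real play.
print(round(catch_probability(90.0, 30.0), 3))
```

Because the model is a plain logistic function, the output always falls strictly between 0 and 1.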

Now you might be worried that such a simple formula, excluding tons of information, might be totally worthless. I was worried about that too! But applying this formula to a test set revealed this formula to be surprisingly accurate:

Catch Probability Assessment
Statistic Value
Accuracy 0.8385
Precision 0.8338
Recall 0.8671
F1 0.8501

(if you’ve never seen those numbers before — closer to 1 is better. Trust me, it’s pretty good.)
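As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two rows. This is a small sketch of that check, not part of the original analysis code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
print(round(f1_score(0.8338, 0.8671), 4))
```

The result agrees with the F1 row of the table to four decimal places.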

Well, that’s all well and good, but how can you get this for yourself and play around with it? Start by downloading the data you’re interested in from Baseball Savant. For instance, you can get all the data from, say, May 1 of last year by going here. Download the CSV with the link at the bottom and then you can simply add the above formula in a new column in Excel. If you need a concrete example of how this looks in Google Sheets, I’ve put one here.
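If you would rather script it than use a spreadsheet, something like the following should work. It is a sketch under assumptions: it presumes the downloaded CSV has columns named `hit_speed` and `hit_angle` (the names used in the formula; an export may label them differently), and the sample rows are made up so the example runs on its own.

```python
import csv
import io
import math


def catch_probability(hit_speed, hit_angle):
    """Same logistic formula as in the article."""
    z = -10.152 + 0.057 * hit_speed + 0.218 * hit_angle
    return 1.0 / (1.0 + math.exp(-z))


def add_catch_probability(rows):
    """Yield each row with an added 'catch_prob' field.

    Rows missing either measurement (or holding non-numeric
    placeholders such as 'null') are skipped.
    """
    for row in rows:
        try:
            speed = float(row["hit_speed"])
            angle = float(row["hit_angle"])
        except (KeyError, ValueError, TypeError):
            continue
        out = dict(row)
        out["catch_prob"] = catch_probability(speed, angle)
        yield out


# Stand-in for a downloaded file; these rows are invented for illustration.
sample = io.StringIO(
    "player_name,hit_speed,hit_angle\n"
    "Example A,95.0,28.0\n"
    "Example B,null,null\n"
    "Example C,70.0,45.0\n"
)
scored = list(add_catch_probability(csv.DictReader(sample)))
for row in scored:
    print(row["player_name"], round(row["catch_prob"], 3))
```

To run it on a real download, replace the `io.StringIO` stand-in with an opened file, for example `open("savant.csv", newline="")`, where the filename is whatever you saved the export as.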

Okay, now you’ve got this, but what are you going to do with it? One possibility is to use this to try to figure out which plays the official metric estimated as being difficult. For instance, let’s say you’ve noticed that Miguel Sano made two highlight-quality plays but you don’t know Mike Petriello well enough to ask him which ones those are. Just compute your own probabilities and you’re off! Although, as expected, the numbers differ. Our numbers do have Sano making two plays in the 0-25% range, but they’re not the same ones that Statcast flagged (sorry about the quality of the GIFs).

Catch #1: estimated catch probability 18.3%
https://gfycat.com/IdealisticNecessaryIslandwhistler

Catch #2: estimated catch probability 21.3%
https://gfycat.com/ChubbyUnkemptBunny
The Twins announcers praised his first step in the former video, while in the second they talked about how the ball “hung up” for Sano to be able to catch it. Not spectacular plays by any means, but neither were the other two, of course.

Finally, because I’m sure you’re curious, here’s the top catch of 2016 according to this metric (estimated catch probability: 8.6%).
https://gfycat.com/DearJaggedFish
Of course it’s a Kevin Kiermaier catch. Hey, at least we know we’re doing something right.


Desert Optimism

I recently had the opportunity to tour Chase Field, home of the Arizona Diamondbacks.  While there, I saw a lot of banners for Zack Greinke.  After all, he is the face of the franchise (if you’re not considering Paul Goldschmidt).  After signing a six-year/$206.5-million contract before the 2016 season, Greinke changed the focus and the philosophy of the Diamondbacks.  Suddenly, they were contenders.

After signing Greinke, the D-Backs traded for Shelby Miller, who was coming off what many considered one of the best years in baseball.  However, his price was laughable.  It cost Arizona top prospect Dansby Swanson, who has emerged as a candidate for a franchise player in Atlanta.  They also coughed up Ender Inciarte, a very capable center fielder who posted a .732 OPS and won a Gold Glove in 2016.  But wait, there’s more! The Braves also received pitching prospect Aaron Blair.

The purpose of this study is not to criticize former General Manager Dave Stewart’s transactions.  After all, he truly believed, after signing ace Zack Greinke, the Diamondbacks were in a position to win, and rightly so.  Stewart felt, as did many people inside the Arizona organization, their core was established.  Below is their 2016 lineup, listing the player who appeared most often at each position:

POSITION Name 2016 WAR Total
C Welington Castillo 2.4
1B Paul Goldschmidt 4.8
2B Jean Segura 5.7
3B Jake Lamb 2.6
SS Nick Ahmed 0.2
LF Brandon Drury 0.0
CF Michael Bourn 0.3
RF Yasmany Tomas -0.4
Total 15.6

 

AJ Pollock, who was coming off an All-Star season in which he produced 7.4 WAR and posted an .865 OPS, played in 12 games.  Inciarte was traded to the Braves after providing 5.3 WAR playing right field in 2015.  David Peralta, who started in left field in 2015, played in 48 games last year.  Nick Ahmed also had an injury-plagued season following a strong 2015 in which he put up 2.5 WAR in his first full year in MLB.

The injuries to Pollock, Peralta, and Ahmed were unfortunate.  The Diamondbacks got roughly replacement-level production from those three positions in 2016.  As a hypothetical, let’s say the three guys stay healthy and, after subtracting their replacements’ production, raise the Diamondbacks’ total runs scored from 752 to 790.  After some number crunching, the Diamondbacks’ Pythagorean expectation comes out to around 71 wins.  Give or take a few, a healthy trio of Pollock, Peralta, and Ahmed would have raised Arizona’s win expectation by between two and five games.

But let’s be optimistic: the hypothetical healthy trio helps Arizona to an expected 74-88 record, far better than their 69-93 actual record.  That would have moved them up in the standings from fourth in the NL West to…drum roll please…fourth in the NL West.  The problem Arizona experienced in 2016 was run prevention, not run support.  As a matter of fact, total runs increased to 752 from 720 in 2015, when they finished 79-83 with the run differential of an 82-80 team.  However, the real increase was in runs allowed, up to 890 (!!!) in 2016, as opposed to 713 in 2015.
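The “number crunching” behind those win totals can be sketched with the Pythagorean expectation. The article does not say which exponent was used, so the classic exponent of 2 and a 162-game schedule are assumptions here; the function name is illustrative.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    exponent=2.0 is the classic form; other exponents are in common use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


actual_2016 = pythagorean_wins(752, 890)    # 2016 runs scored and allowed
healthy_2016 = pythagorean_wins(790, 890)   # hypothetical healthy trio
season_2015 = pythagorean_wins(720, 713)    # 2015 totals quoted above

for label, wins in [("2016 actual", actual_2016),
                    ("2016 healthy", healthy_2016),
                    ("2015", season_2015)]:
    print(f"{label}: {wins:.1f} expected wins")
```

Under these assumptions the 38 extra runs are worth about four expected wins, while the jump in runs allowed from 713 to 890 accounts for most of the gap between the two seasons.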

So why does a pitching staff that added Zack Greinke, a bona fide ace and top-tier talent, and Shelby Miller, who would fit well in the middle of any rotation, give up such a whopping number of runs?  Catching.  Below is a chart of the framing runs each pitcher’s primary catcher added or cost in 2015:

Pitcher Team Catcher Framing Runs Rank
Zack Greinke LAD Yasmani Grandal +23.3 1st
Shelby Miller ATL AJ Pierzynski -8.7 103rd

 

As you can see, any pitcher would love to pitch to Yasmani Grandal.  In 2015, he ranked as the best in framing runs.  Essentially, what the statistic does is quantify the catcher’s ability to get strikes called, which is incredibly valuable to a staff.  Positive is good and negative is bad.  While there is not as direct a correlation between Shelby Miller’s success and AJ Pierzynski’s lack of pitch-framing ability, it is apparent there is a direct link between Greinke’s 2015 performance and Yasmani Grandal.

In 2016, Greinke and Miller both joined a staff caught by Welington Castillo.  The best way to describe Welington is he’s an offense-first, defense-second catcher.  The theme of this study is to advocate for the use of defense-first, offense-second catchers.  Look at this chart of past World Series champion catchers:

Year Team Name Framing Runs Rank
2012 SFG Buster Posey +20.0 4th
2013 BOS Jarrod Saltalamacchia -4.6 93rd
2014 SFG Buster Posey +21.5 2nd
2015 KCR Salvador Perez -7.5 99th
2016 CHC Miguel Montero +14.6 4th

After looking at that chart, there are a few observations to make.  First, three of the last five World Series champions have had top-four catchers in terms of pitch framing and pitch presentation.  Second, Jarrod Saltalamacchia was replaced by AJ Pierzynski, who was replaced by Blake Swihart, who is now competing with Sandy Leon and Christian Vazquez, both of whom are defense-first catchers lauded for their ability to frame pitches.  Third, Salvador Perez is the heart and soul of the Kansas City Royals, and I guarantee Dayton Moore could not care less about his pitch-framing abilities.

Essentially, what you should take away from this is teams that win have skilled catchers.  Luckily for the Giants, Buster Posey can also hit the baseball.  To bring this full circle back to the Diamondbacks: Welington Castillo is the wrong type of catcher.  He does not frame like Posey or Montero, and the bat is nothing too special.

But alas! Castillo is no longer part of the Arizona organization! This offseason, freshly-appointed general manager Mike Hazen has added four new catchers to the picture: Chris Iannetta, Jeff Mathis, Hank Conger, and Josh Thole.  Let’s look at their pitch-framing stats from last year:

Name Team Framing Chances Framing Runs Rank
Chris Iannetta SEA 5,495 -13.8 102nd
Jeff Mathis MIA 2,248 +7.2 15th
Hank Conger TBR 2,366 +3.6 25th
Josh Thole TOR 2,410 +4.6 21st

 

As you can see, the Diamondbacks have added a starting catcher who is not very good at framing pitches and three back-ups who do or might fit the desirable profile of this study.  Chris Iannetta signed a $1.5 million, one-year deal; Mathis signed a $4 million, one-year deal; the other two are minor-league contracts. Hazen, who came over to the Diamondbacks from the Boston Red Sox (who are leaning towards more defense-first options at catcher), made some efforts to boost his catching corps’ defensive ability, but was it enough?

In a perfect world, I think a guy like Jason Castro fits the bill perfectly in Arizona.  While the financial situation in Arizona may have made the price for Castro too high, he fits the type of catcher this study calls for, and the type of catcher Zack Greinke and Shelby Miller deserve.  He tallied +16.3 framing runs in 6,623 chances in 2016, good for third in MLB behind Buster Posey and Yasmani Grandal.  He signed for $24.5 million over three years with the Minnesota Twins, and will surely help their young staff develop.

Let’s not dwell on the hypotheticals.  The Diamondbacks have five and a half million dollars invested in two guys: Chris Iannetta and Jeff Mathis.  While Iannetta had an abysmal year in 2016 in terms of framing runs, his track record is mixed.  In 2013, for example, he recorded a framing-runs figure of -16.6, which is comparable to his 2016 number.  In 2015, however, he recorded a figure of +13.1, good for fifth in all of baseball.  What caused such a dramatic, roller-coaster shift?  I do not know — that question could be the subject of an entirely different study.

Should Iannetta get most of the starts, I would say Mike Hazen would not care if he hits below the Mendoza line, as long as his defensive statistics match his 2015 numbers.  Should he not get most of the starts at catcher, they will more than likely go to veteran backstop Jeff Mathis.  Mathis, who is lauded for his skills behind the plate, is essentially a cheap Jason Castro.  If you divide Mathis’s +7.2 framing runs by his 2,248 chances last year, and multiply that per-chance rate by Castro’s 6,623 chances, you get roughly +21.2 framing runs.  That would have ranked him third behind Grandal and Posey.  Of course, this method is unreliable because every additional chance is an opportunity for his framing runs to drop as well as rise.  With that being said, the efficiency of Mathis behind the plate makes giving him a chance to handle the Diamondbacks’ staff worthwhile.
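The back-of-the-envelope extrapolation above can be written out as a short Python sketch. The function name is my own invention for illustration, and the calculation assumes a constant per-chance framing rate, which is exactly the assumption the paragraph warns is unreliable.

```python
def extrapolate_framing_runs(runs: float, chances: int, target_chances: int) -> float:
    """Scale a catcher's framing runs to a different number of chances.

    Assumes the per-chance rate stays constant, which is a simplification.
    """
    rate_per_chance = runs / chances
    return rate_per_chance * target_chances


# Figures quoted in the article: Mathis had +7.2 runs in 2,248 chances;
# Castro saw 6,623 chances.
mathis_at_castro_workload = extrapolate_framing_runs(7.2, 2248, 6623)
print(f"{mathis_at_castro_workload:+.1f}")  # +21.2
```

The rate works out to about 0.0032 runs per chance, so small changes in a part-time sample move the full-season estimate a lot.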

The addition of Taijuan Walker, who was the return on shipping Jean Segura to Seattle, is a healthy investment in the pitching staff.  With him slotting in along with Zack Greinke, Shelby Miller, Robbie Ray, and Patrick Corbin, the Diamondbacks have the makeup of a sleeper-type rotation — one that could surprise a lot of people in 2017.  If the front office has embraced the importance of defense at the catcher position like their offseason moves suggest, their staff could cut down on runs allowed dramatically, putting their lineup in position to do some damage in the NL West this year.

One team that should be noted in this study is the Houston Astros.  Whether Jeff Luhnow’s front office emphasized framing runs and having defensively-elite catchers or not, two of the catchers mentioned in this study were teammates in Houston — Jason Castro and Hank Conger.  Castro and Conger were the only two backstops on the 2015 Houston Astros, the year Dallas Keuchel won the American League Cy Young award.  This further supports the benefits a defense-first catcher can have on a pitching staff.

In conclusion, baseball is trending toward sacrificing offense for defense at a premium position.  One club that can change the face of their organization by embracing the principles outlined in this study is the Arizona Diamondbacks.  While the Diamondbacks may face public scrutiny long after Shelby Miller and Zack Greinke are gone, fans should be optimistic about 2017.  An elite defensive catcher can make a world of difference in the performance of a pitching staff.

 

The statistics used in this study were found on baseballprospectus.com, the historical rosters and statistics were found on baseball-reference.com and fangraphs.com, and rosterresource.com was a great help in referencing players and transactions.


When Do Managers Use the Hook?

For the uninitiated, this piece heavily relies on my previous work around refining the inning/score matrix to quantify bullpen usage, and more recently, using RE24 to adjust the score differential for the base/out state in cases where the pitcher is not entering into a “clean” inning.

In that most recent piece, I concluded by alluding to a sort of “leaderboard” for base/out state adjustments. One hypothesis that you might have – certainly, one that this author had – was that we might see elite non-closers at the top of the list, implying that those pitchers are being brought in with runners on base more often than usual. Although closers are generally among the most highly-regarded relief pitchers in the game, the managerial status-quo has been to use closers almost exclusively in the “clean inning” state entering the 9th. Thus, while closers might not lead in terms of score adjustments due to inherited runners, an elite setup man certainly might.

Without further ado, here’s what that leaderboard looked like in 2016.

Largest Average Negative Score Adjustments
Player | Team | # Apps | Mean Adj. Score | Mean Adj. Inn | Score Diff | Inn Diff
Colton Murray | PHI | 24 | -2.30 | 6.90 | -0.22 | 0.15
Chaz Roe | ATL | 21 | -0.73 | 7.57 | -0.21 | 0.11
Gavin Floyd | TOR | 28 | 0.54 | 8.04 | -0.21 | 0.11
Dean Kiekhefer | STL | 26 | -1.78 | 7.59 | -0.21 | 0.13
Alex Wilson | DET | 62 | 0.18 | 6.97 | -0.19 | 0.13
Carl Edwards | CHC | 36 | 1.31 | 7.84 | -0.19 | 0.15
James Hoyt | HOU | 22 | -1.77 | 7.26 | -0.18 | 0.26
Jordan Lyles | COL | 35 | 0.68 | 7.34 | -0.18 | 0.09
Tommy Layne | NYY | 29 | 0.83 | 7.49 | -0.17 | 0.25
Matt Bowman | STL | 59 | 1.08 | 7.28 | -0.17 | 0.06

So… this isn’t exactly what I thought I’d find. There aren’t any closers in this group, but there really aren’t many top-flight middle relievers, either. If anything, this group came in when the team was tied or trailing more often than not. What’s going on here?

What we can’t discern is whether mid-inning appearances tend to be high-leverage affairs. There are most certainly cases where long men are used in the middle of the 4th inning to relieve an ineffective starter. That situation isn’t interesting in a vacuum; but it may be interesting to know what portion of those mid-inning appearances are of this low-leverage variety, and which are of the high-leverage variety.

One way that we can answer this question is to stratify qualifying relief pitchers by their average inning when entering the game. To accomplish this, let’s define a “closer” as a pitcher with an average inning of 8.5 or higher, and a “middle reliever” as a pitcher with an average inning between 7 and 8.5. Then we can look at the percentage of appearances for each group which were not “clean” innings.
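As a rough sketch of that stratification, here is how the grouping could look in Python. The sample rows are made up for illustration, the function names are mine, and I am assuming that an average inning of exactly 7 counts as a middle reliever and that a “clean” inning means no outs and nobody on base.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (pitcher, inning entered, outs, runners on base)
appearances = [
    ("Closer A", 9, 0, 0),
    ("Closer A", 9, 0, 0),
    ("Closer A", 8, 2, 1),
    ("Setup B", 7, 1, 2),
    ("Setup B", 8, 0, 0),
    ("Setup B", 8, 1, 0),
    ("Long C", 4, 1, 2),
    ("Long C", 5, 0, 0),
]


def classify(avg_inning: float) -> str:
    """Label a reliever by average entry inning (boundary handling is an assumption)."""
    if avg_inning >= 8.5:
        return "closer"
    if avg_inning >= 7.0:
        return "middle reliever"
    return "other"


def non_clean_share_by_role(rows):
    """Return, for each role, the share of appearances that were not clean innings."""
    by_pitcher = defaultdict(list)
    for pitcher, inning, outs, runners in rows:
        by_pitcher[pitcher].append((inning, outs, runners))

    totals = defaultdict(lambda: [0, 0])  # role -> [non-clean count, total count]
    for apps in by_pitcher.values():
        role = classify(mean(inning for inning, _, _ in apps))
        for _, outs, runners in apps:
            is_clean = outs == 0 and runners == 0
            totals[role][0] += 0 if is_clean else 1
            totals[role][1] += 1
    return {role: non_clean / total for role, (non_clean, total) in totals.items()}


shares = non_clean_share_by_role(appearances)
for role, share in shares.items():
    print(f"{role}: {share:.0%} of appearances were not clean innings")
```

Note that the role is assigned per pitcher from his season-long average, while the percentages pool every appearance made by the pitchers in each group.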

[Figure: regular-season appearances by reliever type, 2012-2016 (AppearancesByRPType1216)]

As you might expect – even if you vehemently disagree with the practice – closers very rarely enter the game mid-inning. 85-90% of their appearances come in clean innings. Middle relievers, on the other hand, come into the game at the start of an inning closer to 60-65% of the time. That number has been on the rise recently, which seems a bit odd, or at least, at odds with what we’ve seen in the postseason recently (more on that in a bit).

Some small percentage of the time – the area between the lines of the same color – pitching changes are made with 1 or 2 outs in the inning but with no one on base. This is probably not optimal: The pitcher coming into that situation has an easier-than-average job, as they’re essentially getting a shortened inning to work through. If a guy like Dellin Betances can face 300 batters in a season, why waste 20 of them on situations that are easier than average?

The orange lines represent a subset of the overall middle relief group where the team in question is either tied or has no greater than a 3-run lead, in either the 7th or 8th inning. These are situations of high importance and leverage. An effective manager might be employing mid-inning pitching changes more often in these situations in order to limit damage and preserve leads.
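The filter behind that subset is simple enough to state as a predicate. This is only a sketch: the function name is hypothetical, and I am assuming the score differential is measured from the pitching team’s point of view, so 0 means tied and positive values mean a lead.

```python
def is_close_late(inning: int, score_diff: int) -> bool:
    """True for a 7th- or 8th-inning entry with the game tied or a lead of 3 runs or fewer."""
    return inning in (7, 8) and 0 <= score_diff <= 3


# A few illustrative game states: (inning, score differential)
for state in [(7, 0), (8, 3), (8, 4), (9, 1), (7, -1)]:
    print(state, is_close_late(*state))
```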

Yet, this subset isn’t very different from the overall middle relief group. Whatever difference existed in 2012 and 2013 has been eroded in the last few years, as part of a general trend: Mid-inning appearances in the regular season are becoming less common.

As a final step, let’s contrast this picture of usage with an analogous graph on postseason appearances. We’ll maintain the same definitions of “closer” and “middle reliever” for consistency.

[Figure: postseason appearances by reliever type, 2012-2016 (PlayoffAppsByRPType1216)]

Chaos! This graph looks more disorganized than the regular-season version, but then again, the postseason is more chaotic in general. We’re dealing with smaller samples and we can’t put too much faith into these trends. That said, two things stand out when comparing postseason usage to regular-season usage:

  • Closers are no longer treated as a special species. Even through 2014, closers were entering postseason games in clean innings about 80% of the time. In the postseason! When the managers are paying attention! When there are high-leverage situations at every turn! But in the past two seasons, closers have been used increasingly with runners on base – in fact, even more so than middle relievers have in close/lead situations during that time. Again, small samples, but this screams efficiency. If your closer is your most effective weapon, you should be using him with runners on base and a late lead, rather than turning to your second-most effective weapon.
  • Middle relievers have been used more often in “matchup” situations. 2014 and 2016 stand out in this regard, and it probably has something to do with guys named Bochy and Maddon representing large shares of the sample in those years. Recall that the gap between the dotted and solid lines of the same color represents the frequency of “1+ out, 0 on” appearances. Those gaps are huge in 2014 and 2016! While mid-inning appearances among all classes of pitchers were highest in 2016, that’s not the case at all for “men on base” appearances, which were more or less in line with historical norms. This represents an increase in match-up-based thinking, not leverage-based thinking.

These graphs look different, and they probably always will. Teams have relatively fewer resource constraints in the bullpen come October. They have more days off between games, and fewer games to budget resources for in the future.

That said, there’s been no carryover at all from the wild, and relatively new, bullpen management seen in the postseasons of 2015 and 2016. Constraints will limit the extent to which managers can call upon their best arms with runners on base late in games, but it would be hard to imagine that a status quo which holds the closer for the 9th inning almost 90% of the time can’t be improved upon in some way. Teams have spent more on bullpens, but they haven’t figured out how to use them any more efficiently in the regular season, and the differences we’ve witnessed in the postseason show that they’re only getting it about half right, even when it matters most.


Basic Machine Learning With R (Part 3)

Previous parts in this series: Part 1 | Part 2

If you’ve read the first two parts of this series, you already know how to do some pretty cool machine-learning stuff, but there’s still a lot to learn. Today, we will be updating this nearly seven-year-old chart featured on Tom Tango’s website. We haven’t done anything with Statcast data yet, so that will be cool. More importantly, though, this will present us with a good opportunity to work with an imperfect data set. My motto is “machine learning is easy — getting the data is hard,” and this exercise will prove it. As always, the code presented here is on my GitHub.

The goal today is to take exit velocity and launch angle, and then predict the batted-ball type from those two features. Hopefully by now you can recognize that this is a classification problem. The question becomes, where do we get the data we need to solve it? Let’s head over to the invaluable Statcast search at Baseball Savant to take care of this. We want to restrict ourselves to just balls in play, and to simplify things, let’s just take 2016 data. You can download the data from Baseball Savant in CSV format, but if you ask it for too much data, it won’t let you. I recommend taking the data a month at a time, like in this example page. You’ll want to scroll down and click the little icon in the top right of the results to download your CSV.


Go ahead and do that for every month of the 2016 season and put all the resulting CSVs in the same folder (I called mine statcast_data). Once that’s done, we can begin processing it.

Let’s load the data into R using a trick I found online (Google is your friend when it comes to learning a new programming language — or even using one you’re already pretty good at!).

filenames <- list.files(path = "statcast_data", full.names=TRUE)
data_raw <- do.call("rbind", lapply(filenames, read.csv, header = TRUE))

The columns we want here are “hit_speed”, “hit_angle”, and “events”, so let’s create a new data frame with only those columns and take a look at it.

data <- data_raw[,c("hit_speed","hit_angle","events")]
str(data)

 

'data.frame':	127325 obs. of  3 variables:
 $ hit_speed: Factor w/ 883 levels "100.0","100.1",..: 787 11 643 ...
 $ hit_angle: Factor w/ 12868 levels "-0.01               ",..: 7766 1975 5158  ...
 $ events   : Factor w/ 25 levels "Batter Interference",..: 17 8 11 ...

Well, it had to happen eventually. See how all of these columns are listed as “Factor” even though some of them are clearly numeric? Let’s convert those columns to numeric values.

data$hit_speed <- as.numeric(as.character(data$hit_speed))
data$hit_angle <- as.numeric(as.character(data$hit_angle))

There is also some missing data in this data set. There are several ways to deal with such issues, but we’re simply going to remove any rows with missing data.

data <- na.omit(data)

Let’s next take a look at the data in the “events” column, to see what we’re dealing with there.

unique(data$events)

 

 [1] Field Error         Flyout              Single             
 [4] Pop Out             Groundout           Double Play        
 [7] Lineout             Home Run            Double             
[10] Forceout            Grounded Into DP    Sac Fly            
[13] Triple              Fielders Choice Out Fielders Choice    
[16] Bunt Groundout      Sac Bunt            Sac Fly DP         
[19] Triple Play         Fan interference    Bunt Pop Out       
[22] Batter Interference
25 Levels: Batter Interference Bunt Groundout ... Sacrifice Bunt DP

The original classification from Tango’s site had only five levels — POP, GB, FLY, LD, HR — but we’ve got over 20. We’ll have to (a) restrict to rows that look like something we can classify and (b) convert them to the levels we’re after. Thanks to another tip I got from Googling, we can do it like this:

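library(plyr)
# Map the raw Statcast event names onto the five batted-ball classes
data$events <- revalue(data$events, c("Pop Out"="Pop",
      "Bunt Pop Out"="Pop","Flyout"="Fly","Sac Fly"="Fly",
      "Bunt Groundout"="GB","Groundout"="GB","Grounded Into DP"="GB",
      "Lineout"="Liner","Home Run"="HR"))
# Keep only the rows that carry one of the five target classes
# (assumed step: the confusion matrix below contains exactly these five)
data <- data[data$events %in% c("Pop","Fly","GB","Liner","HR"),]
# Take another look to be sure
unique(data$events)
# The data looks good except there are too many levels.  Let's re-factor
data$events <- factor(data$events)
# Re-index to be sure
rownames(data) <- NULL
# Make 100% sure!
str(data)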

Oof! See how much work that was? We’re several dozen lines of code into this problem and we haven’t even started the machine learning yet! But that’s fine; the machine learning itself is the easy part. Let’s do that now.

library(caret)
inTrain <- createDataPartition(data$events,p=0.7,list=FALSE)
training <- data[inTrain,]
testing <- data[-inTrain,]

method <- 'rf' # sure, random forest again, why not
# train the model
ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 5)
modelFit <- train(events ~ ., method=method, data=training, trControl=ctrl)

# Run the model on the test set
predicted <- predict(modelFit,newdata=testing)
# Check out the confusion matrix
confusionMatrix(predicted, testing$events)

 

Prediction   GB  Pop  Fly   HR Liner
     GB    9059    5    4    1   244
     Pop      3 1156  123    0    20
     Fly      6  152 5166  367   457
     HR       0    0  360 1182    85
     Liner  230   13  449   77  2299

We did it! And the confusion matrix looks pretty good. All we need to do now is view it, and we can make a very pretty visualization of this data with the amazing Plotly package for R:

#install.packages('plotly')
library(plotly)
# Exit velocities from 40 to 120
x <- seq(40,120,by=1)
# Hit angles from 10 to 50
y <- seq(10,50,by=1)
# Make a data frame of the relevant x and y values
plotDF <- data.frame(expand.grid(x,y))
# Add the correct column names
colnames(plotDF) <- c('hit_speed','hit_angle')
# Add the classification
plotPredictions <- predict(modelFit,newdata=plotDF)
plotDF$pred <- plotPredictions

p <- plot_ly(data=plotDF, x=~hit_speed, y = ~hit_angle, color=~pred, type="scatter", mode="markers") %>%
    layout(title = "Exit Velocity + Launch Angle = WIN")
p

[Figure: predicted batted-ball type plotted by exit velocity (x-axis) and launch angle (y-axis)]


Awesome! It’s a *little* noisy, but overall not too bad. And it does kinda look like the original, which is reassuring.

That’s it! That’s all I have to say about machine learning. At this point, Google is your friend if you want to learn more. There are also some great classes online you can try, if you’re especially motivated. Enjoy, and I look forward to seeing what you can do with this!


Hardball Retrospective – What Might Have Been – The “Original” 1999 White Sox

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 1999 Chicago White Sox 

OWAR: 45.1     OWS: 289     OPW%: .504     (82-80)

AWAR: 28.5      AWS: 225     APW%: .466     (75-86)

WARdiff: 16.6                        WSdiff: 64  

The “Original” 1999 White Sox tied the Royals for second place in the American League Central, eight games behind the Indians. Robin Ventura (.301/32/120) established career-highs in batting average and RBI while earning his sixth Gold Glove Award at the hot corner. Randy Velarde (.317/16/76) rapped 200 base knocks and set personal-bests in almost every offensive category. Mike Cameron drilled 34 doubles and pilfered 38 bags. Harold Baines (.312/25/103) topped the century mark in RBI for the third time in his career during his age-40 season. Ray Durham registered 109 tallies and swiped 34 bags. Magglio Ordonez (.301/30/117) scored 100 runs and merited his first All-Star invitation. Frank E. Thomas clubbed 36 two-baggers and delivered a .305 BA. Chris Singleton (.300/17/72) placed sixth in the AL Rookie of the Year balloting and Paul Konerko contributed 24 dingers and 81 ribbies for the “Actuals”.

Frank E. Thomas rated tenth among first basemen according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” White Sox chronicled in the “NBJHBA” top 100 ratings include Robin Ventura (22nd-3B) and Harold Baines (42nd-RF).

  Original 1999 White Sox                          Actual 1999 White Sox

STARTING LINEUP POS OWAR OWS STARTING LINEUP POS AWAR AWS
Carlos Lee LF -0.04 10.36 Carlos Lee LF -0.04 10.36
Mike Cameron CF 3.63 21.44 Chris Singleton CF 2.61 16.33
Magglio Ordonez RF 1.7 18.56 Magglio Ordonez RF 1.7 18.56
Harold Baines DH 1.7 12.96 Frank E. Thomas DH 2.2 17.07
Frank E. Thomas 1B/DH 2.2 17.07 Paul Konerko 1B 1.45 14.68
Randy Velarde 2B 5.23 24.19 Ray Durham 2B 3.63 20.45
Liu Rodriguez SS/2B -0.12 1.41 Mike Caruso SS -2.58 4.25
Robin Ventura 3B 5.1 28.27 Greg Norton 3B 0.06 12.36
Mark Johnson C 0.28 6.12 Brook Fordyce C 1.59 11.45
BENCH POS OWAR OWS BENCH POS AWAR AWS
Ray Durham 2B 3.63 20.45 Mark Johnson C 0.28 6.12
Greg Norton 3B 0.06 12.36 Craig Wilson 3B -0.38 4.06
Olmedo Saenz 3B 1.35 8.68 Darrin Jackson LF -0.05 2.68
Craig Grebeck 2B 0.82 4.39 Brian Simmons LF -0.15 1.76
Craig Wilson 3B -0.38 4.06 Liu Rodriguez 2B -0.12 1.41
Brian Simmons LF -0.15 1.76 Jeff Liefer 1B -0.6 0.91
Jeff Liefer 1B -0.6 0.91 McKay Christensen CF -0.27 0.47
Norberto Martin 2B 0.09 0.44 Jason Dellaero SS -0.39 0.32
Jason Dellaero SS -0.39 0.32 Josh Paul C -0.09 0.27
Josh Paul C -0.09 0.27 Jeff Abbott LF -0.73 0.18
Robert Machado C -0.08 0.22
Chris Tremie C -0.18 0.18
Jeff Abbott LF -0.73 0.18
Frank Menechino SS -0.08 0.14
John Cangelosi LF -0.06 0.02

Mike Sirotka (11-13, 4.00) and James Baldwin (12-13, 5.00) labored through their second seasons in the Sox rotation. Alex Fernandez supplied a 7-8 record with a 3.38 ERA after missing the entire 1998 campaign due to injury. Bob Wickman notched 37 saves with an ERA of 3.39 for the “Originals” while Keith Foulke (2.22, 9 SV) and Bob Howry (3.59, 28 SV) secured late-inning leads for the “Actuals”.

  Original 1999 White Sox                       Actual 1999 White Sox 

ROTATION POS OWAR OWS ROTATION POS AWAR AWS
Mike Sirotka SP 3.94 13.5 Mike Sirotka SP 3.94 13.5
Alex Fernandez SP 3.34 10.47 James Baldwin SP 2.19 9.47
James Baldwin SP 2.19 9.47 Jim Parque SP 1.26 6.82
Brian Boehringer SP 1.64 6.91 Kip Wells SP 0.79 2.93
Jim Parque SP 1.26 6.82 Jaime Navarro SP -1.15 2.16
BULLPEN POS OWAR OWS BULLPEN POS AWAR AWS
Bob Wickman RP 1.33 10.19 Keith Foulke RP 3.86 16.7
Al Levine RP 0.77 6.84 Bob Howry RP 0.61 10.06
Pedro Borbon RP 0.36 4.11 Sean Lowe RP 1.58 7.94
Buddy Groom RP -0.27 3.49 Bill Simas RP 0.68 6.46
Steve Schrenk RP 0.54 3.04 Carlos Castillo SW 0.05 1.45
Kip Wells SP 0.79 2.93 John Snyder SP -0.97 1.22
Scott Radinsky RP 0 2.35 Tanyon Sturtze SP 0.48 0.91
Jason Bere SP -0.6 1.6 Pat Daneker SP 0.23 0.82
Carlos Castillo SW 0.05 1.45 Jesus Pena RP -0.27 0.42
Pat Daneker SP 0.23 0.82 Joe Davenport RP 0.13 0.25
Aaron Myette SP 0 0.11 Aaron Myette SP 0 0.11
Chad Bradford RP -0.5 0 Bryan Ward RP -1.15 0.09
John Hudek RP -1.04 0 Chad Bradford RP -0.5 0
David Lundquist RP -0.74 0 Scott Eyre RP -0.66 0
Jack McDowell SP -0.36 0 David Lundquist RP -0.74 0
Nerio Rodriguez RP -0.16 0 Todd Rizzo RP -0.11 0

 

Notable Transactions

Robin Ventura 

October 23, 1998: Granted Free Agency.

December 1, 1998: Signed as a Free Agent with the New York Mets. 

Randy Velarde

January 5, 1987: Traded by the Chicago White Sox with Pete Filson to the New York Yankees for Mike Soper (minors) and Scott Nielsen.

December 23, 1994: Granted Free Agency.

April 12, 1995: Signed as a Free Agent with the New York Yankees.

November 2, 1995: Granted Free Agency.

November 21, 1995: Signed as a Free Agent with the California Angels.

October 23, 1998: Granted Free Agency.

December 7, 1998: Signed as a Free Agent with the Anaheim Angels.

Mike Cameron

November 11, 1998: Traded by the Chicago White Sox to the Cincinnati Reds for Paul Konerko. 

Harold Baines

July 29, 1989: Traded by the Chicago White Sox with Fred Manrique to the Texas Rangers for Wilson Alvarez, Scott Fletcher and Sammy Sosa.

August 29, 1990: Traded by the Texas Rangers to the Oakland Athletics for players to be named later. The Oakland Athletics sent Joe Bitker (September 4, 1990) and Scott Chiamparino (September 4, 1990) to the Texas Rangers to complete the trade.

January 14, 1993: Traded by the Oakland Athletics to the Baltimore Orioles for Allen Plaster (minors) and Bobby Chouinard.

November 1, 1993: Granted Free Agency.

December 2, 1993: Signed as a Free Agent with the Baltimore Orioles.

October 20, 1994: Granted Free Agency.

December 23, 1994: Signed as a Free Agent with the Baltimore Orioles.

November 6, 1995: Granted Free Agency.

December 11, 1995: Signed as a Free Agent with the Chicago White Sox.

November 18, 1996: Granted Free Agency.

January 10, 1997: Signed as a Free Agent with the Chicago White Sox.

July 29, 1997: Traded by the Chicago White Sox to the Baltimore Orioles for a player to be named later. The Baltimore Orioles sent Juan Bautista (minors) (August 18, 1997) to the Chicago White Sox to complete the trade.

October 29, 1997: Granted Free Agency.

December 19, 1997: Signed as a Free Agent with the Baltimore Orioles.

Alex Fernandez 

December 7, 1996: Granted Free Agency.

December 9, 1996: Signed as a Free Agent with the Florida Marlins. 

Bob Wickman 

January 10, 1992: Traded by the Chicago White Sox with Domingo Jean and Melido Perez to the New York Yankees for Steve Sax.

August 23, 1996: Traded by the New York Yankees with Gerald Williams to the Milwaukee Brewers for a player to be named later, Pat Listach and Graeme Lloyd. The Milwaukee Brewers sent Ricky Bones (August 29, 1996) to the New York Yankees to complete the trade. Pat Listach returned to original team on October 2, 1996.

Honorable Mention

The 1932 Chicago White Sox 

OWAR: 21.5     OWS: 205     OPW%: .380     (58-96)

AWAR: 17.0      AWS: 147     APW%: .325     (49-102)

WARdiff: 4.5                        WSdiff: 58  

The cellar-dwelling “Original” 1932 White Sox fared better than their “Actual” counterparts in terms of team WAR, Win Shares and winning percentage. Although the “Actuals” recorded only 49 victories, the team finished in seventh place ahead of the miserable Red Sox (43-111). Willie Kamm clubbed 34 doubles, delivered a .286 BA and drove in 83 baserunners for the Pale Hose. Second-sacker Bill Cissell posted career-bests in batting average (.315), runs (85), hits (184), doubles (36), home runs (7) and RBI (98). Rookie right fielder Bruce Campbell (.286/14/87) contributed 36 two-baggers and 11 three-base hits. Smead “Smudge” Jolley (.312/18/106) drilled 30 doubles while outfield mate Carl Reynolds produced a .305 BA. Luke Appling aka “Old Aches and Pains” rewarded the Chicago brass with 20 two-base hits and 10 triples after achieving full-time status. Ted Lyons completed 19 of 26 starts and furnished an ERA of 3.28.

On Deck

What Might Have Been – The “Original” 2001 Rangers

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

 


MLB to Across the Pacific and Back

The player that all Milwaukee Brewers fans, and baseball fans for that matter, should be watching most closely this spring is Eric Thames. Thames, after three incredible seasons in the KBO, signed a three-year, $16-million deal to man first base for the Brewers. The front office likes what they see from the 2015 KBO MVP, but admittedly did not scout him in person while he was playing overseas; instead, they relied on video to make their assessment of his game. I’ll admit, I can’t wait to see Thames play this year; the mystery, concerns, and potential all make for great theater, but there is one question that keeps haunting me at night: How do former MLB players fare when they play overseas and then return? As much as this post is about Thames, it is also about those few players who have done what he is doing.

I approached this by looking at all the major-league players who have played in Korea or Japan over the past 10 years. I could have gone further back to the days when Cecil Fielder was playing in Japan, but the game, both in North America and across the Pacific, has changed significantly since then. The argument could be made that the game has changed significantly over the past 10 years — it changes every season — but that is the beauty of baseball.

I wanted to isolate Korea only, but, perhaps not surprisingly, there were too few players to make anything of that. Out of the several hundred total players in both these leagues over the past 10 years, only a total of 11 players who began their career in MLB returned to MLB after an overseas hiatus. That’s 11 between the KBO and NPB. 11! Four players from the KBO and seven from NPB. Here’s a table that shows their names and WAR before and after their careers in Japan and Korea:

Player | Pre WAR | MLB Season(s) Pre | Post WAR | MLB Season(s) Post
Joey Butler | 0 | 2013-2014 | 0.5 | 2015
Brooks Conrad | -0.1 | 2008-2012 | -0.5 | 2014
Lew Ford | 8.4 | 2003-2007 | 0 | 2012
Andy Green | -1.2 | 2004-2006 | 0 | 2009
Dan Johnson | 4.0 | 2005-2008 | -0.8 | 2010-2015
Casey McGehee | 1.6 | 2008-2012 | -0.4 | 2014-2016
Kevin Mench | 5.8 | 2002-2008 | -0.4 | 2010
Brad Snyder | -0.1 | 2010-2011 | 0.1 | 2014
Chad Tracy | 5.7 | 2004-2010 | -0.3 | 2012-2013
Wilson Valdez | 0.7 | 2004-2005, 2007 | -1.1 | 2009-2012
Matt Watson | -0.5 | 2003, 2005 | 0.1 | 2010
Total WAR | 24.3 | | -2.8 |
Eric Thames | -0.6 | 2011-2012 | ? | 2017-?

(Numbers courtesy of baseball-reference.com)

The outcome for these players is, well, not good. A select few players like Lew Ford and Chad Tracy carry the “pre-Japan/Korea WAR” section thanks to longer, successful careers in MLB before they changed leagues. It also seems unfair to compare these players to each other due to their careers, or lack thereof, upon their return. For example, Ford’s 79 plate appearances are hardly comparable to Wilson Valdez’s 966. But, in every case, the story arc is the same: Begin their professional baseball career in North America, make it to the majors as a 20-something, decline at the major- and minor-league level, go to Japan/Korea, return to North America in a very limited capacity and fail to make an impact with a major-league-affiliated team.
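For anyone who wants to check the totals, here is a small Python sketch that re-adds the WAR figures from the table. The numbers are copied from the table; the variable names are my own, and because WAR is cumulative, players with longer stints weigh more heavily in these sums.

```python
# (player, WAR before going overseas, WAR after returning), copied from the table
war = [
    ("Joey Butler", 0.0, 0.5),
    ("Brooks Conrad", -0.1, -0.5),
    ("Lew Ford", 8.4, 0.0),
    ("Andy Green", -1.2, 0.0),
    ("Dan Johnson", 4.0, -0.8),
    ("Casey McGehee", 1.6, -0.4),
    ("Kevin Mench", 5.8, -0.4),
    ("Brad Snyder", -0.1, 0.1),
    ("Chad Tracy", 5.7, -0.3),
    ("Wilson Valdez", 0.7, -1.1),
    ("Matt Watson", -0.5, 0.1),
]

pre_total = round(sum(pre for _, pre, _ in war), 1)
post_total = round(sum(post for _, _, post in war), 1)
positive_returns = [name for name, _, post in war if post > 0]

print(pre_total, post_total)   # 24.3 -2.8
print(positive_returns)        # ['Joey Butler', 'Brad Snyder', 'Matt Watson']
```

Only three of the 11 posted a positive WAR after coming back, and none of them exceeded 0.5.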

If the careers of these 11 players are a trend, then Eric Thames is in for a lot of trouble.

But there is reason to believe that Thames is the exception to the rule. Will Franta wrote a convincing Community Research article about the reason to believe that Eric Thames will do well. Additionally, various projections believe that Thames could be anywhere from a 1.2 to 2.2 WAR player with mid- to high-20 home-run totals and an above-average wRC+. Dave Cameron wrote an article analyzing the projections for Thames and concluded that he has the potential to be “the steal of the winter,” and for three years and $16 million, that could very well be true.

But there are factors going against Thames. It isn’t often that professional players find their footing at the major-league level in their 30s (Thames will be 30 on Opening Day). Plus, the Brewers have several other corner infielders in Hernan Perez, Travis Shaw and Jesus Aguilar, along with others who could fill in at first if need be, such as Ryan Braun and Scooter Gennett, so a team in the middle of a rebuild might not be completely opposed to disposing of the incumbent starting first baseman if another star emerges. Even when looking at career KBO and NPB players who made the transition to MLB, we can see that there are a lot more Tsuyoshi Nishiokas than Jung-ho Kangs, which is why players like Kang, Ichiro Suzuki, Hideo Nomo, and Yu Darvish are lauded when they succeed in the majors.

I believe that Eric Thames will not be like the 11 others who, by and large, failed in their returns. Thames is intriguing and there is a lot to like about him — and a lot to worry about with him. There are pros and cons to his game. I believe that he will be a great addition to a team that, honestly, could afford to wait for him to assimilate completely to the game.


Adjusting Appearance Data for Base-Out State

So far, we’ve developed some mathematical principles for visualizing appearance data for relief pitchers, and for measuring how far apart they are. The goal has been to say something about how pitchers are being used, not only in a vacuum, but in the context of the way in which the team has chosen to divide up its relief innings for the season. We’ve only partially gotten there so far, but today let’s take a slight detour to ask: Is the underlying data conveying the most useful information?

Inning and score differential at the time of entering the game are the critical data elements in answering questions related to usage. The numbers and tables in my previous articles all focused on using these two elements. Here’s an example of the underlying data being used, in the form of three Daniel Hudson appearances which appear identical.

Three (Similar?) Daniel Hudson Appearances
Date | Player | Season | Inning | Score
6/28/2016 | Daniel Hudson | 2016 | 8 | 1
8/20/2016 | Daniel Hudson | 2016 | 8 | 1
9/21/2016 | Daniel Hudson | 2016 | 8 | 1

Inning and score differential are critical; however, as far as data elements are concerned, they are somewhat raw. Fortunately, those aren’t the only data elements we can look at. The next-most impactful data, I would argue, is the base-out state at the time that the pitcher enters the game.

Let’s establish a baseline: It’s the norm for relief pitchers to enter the game in a clean inning (no outs, no runners on base). Among pitchers with 20+ relief appearances in 2016, this was the situation in 68.1% of appearances. That’s a very high percentage, considering that there are 24 base-out states. It’s also very intuitive when we think about the game. Among other reasons, pitchers need time to warm up, and mostly, they do so while their own team is batting. It’s also the only base-out state which is guaranteed to happen every inning.
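To make the "24 base-out states" figure concrete, here is a minimal sketch that enumerates them and computes the share of appearances that start in a clean inning. The tuple representation and the function name `clean_share` are my own illustrative choices, not something taken from the data source used for this article.

```python
from itertools import product

# Each of the three bases is either empty (False) or occupied (True),
# which gives 2**3 = 8 base states.
BASE_STATES = list(product([False, True], repeat=3))
OUTS = [0, 1, 2]

# Every (outs, bases) pair is one base-out state: 3 * 8 = 24 in total.
BASE_OUT_STATES = [(outs, bases) for outs in OUTS for bases in BASE_STATES]

# The "clean inning": nobody out, nobody on.
CLEAN_INNING = (0, (False, False, False))


def clean_share(entry_states):
    """Return the fraction of appearances that began in a clean inning.

    `entry_states` is a list of (outs, bases) tuples, one per appearance.
    """
    if not entry_states:
        raise ValueError("need at least one appearance")
    clean = sum(1 for state in entry_states if state == CLEAN_INNING)
    return clean / len(entry_states)
```

Run over every qualifying relief appearance, a function like this is the kind of calculation that would produce a league-wide figure such as the 68.1% quoted above.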

It would be atypical – and therefore, interesting – for a pitcher to be used frequently in other base-out states. Moreover, we should be giving credit to pitchers who are being used in that way. An appearance where a pitcher enters with a four-run lead but the bases loaded should not be viewed in the same way as an appearance where a pitcher enters with a four-run lead in a clean inning. More than likely, the manager has two different pitchers in mind for each of these scenarios.

Adjusting the inning is easy: Credit partial innings in the event that the pitcher enters with more than zero outs in the inning. This will bump the inning component of every pitcher’s “center of gravity” up a bit, giving credit to players for working slightly later in the game when called upon mid-inning. (Note: we could also define terms in a different way, and say that a pitcher who enters in a “clean” 9th inning is actually entering at inning 8.0, as 8 innings have been recorded prior to his entrance; however, this makes the resulting metric less intuitive.)
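As a rough sketch of the inning adjustment, each out already recorded can be credited as one third of an inning. The function name `adjusted_inning` is a label I am introducing for illustration.

```python
def adjusted_inning(inning, outs):
    """Credit a partial inning for each out already recorded at entry.

    A clean 8th inning stays at 8.0; entering the 8th with two outs
    becomes 8 + 2/3.
    """
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1 or 2")
    return inning + outs / 3
```

Under this convention a pitcher who enters with two outs in the 8th is treated as working at roughly inning 8.67, while a pitcher who starts the 8th is left at 8.00.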

Adjusting the score differential doesn’t seem as straightforward at first, but fortunately, we can use the concept of RE24 to accomplish this. Given that entering in a clean inning is the default status, we will make no adjustment to the score differential for a given appearance if the pitcher entered in a clean inning. For any other base-out state, we will add or subtract the difference between expected runs in that base-out state and expected runs in a clean inning state (0 on, 0 out).
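Here is a minimal sketch of the score adjustment. It assumes the direction of the adjustment is "subtract the extra expected runs from the pitching team's lead," and it uses only the three run-expectancy values quoted later in this article (0.481, 2.282 and 0.319); a full version would need a complete 24-state run-expectancy table. The names `RUN_EXPECTANCY` and `adjusted_score` are illustrative.

```python
# Expected runs for the batting team, keyed by (outs, bases).
# Only the three states discussed in this article are filled in;
# a complete table would cover all 24 base-out states.
RUN_EXPECTANCY = {
    (0, "___"): 0.481,  # clean inning
    (0, "123"): 2.282,  # bases loaded, nobody out
    (2, "_2_"): 0.319,  # runner on second, two outs
}
CLEAN = (0, "___")


def adjusted_score(score_diff, outs, bases, run_exp=RUN_EXPECTANCY):
    """Shift the score differential by the run expectancy the pitcher inherits.

    `score_diff` is from the pitching team's point of view. The extra runs
    the opponent is expected to score, relative to a clean inning, are
    subtracted from it (assumed sign convention).
    """
    extra_runs = run_exp[(outs, bases)] - run_exp[CLEAN]
    return score_diff - extra_runs
```

With these inputs, a one-run lead in a clean inning stays at 1.00, a one-run lead with a runner on second and two outs becomes about 1.16, and a one-run lead with the bases loaded and nobody out becomes about -0.80. The last value is close to but not identical to the -0.82 reported in the detailed table in this article, so the run-expectancy inputs behind that table presumably differ slightly from the ones assumed here.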

Let’s return to the three appearances shown above. As you might have guessed by now, they are not identical. Rather, they illustrate the importance of adjusting for base-out state.

Three Daniel Hudson Appearances (in greater detail)
Date | Player | Inning | Score | Outs | Bases | Adj. Inn. | Adj. Score
6/28/2016 | Daniel Hudson | 8 | 1 | 0 | ___ | 8.00 | 1.00
8/20/2016 | Daniel Hudson | 8 | 1 | 0 | 123 | 8.00 | -0.82
9/21/2016 | Daniel Hudson | 8 | 1 | 2 | _2_ | 8.67 | 1.16

If you were to ask Daniel Hudson to recall what he could about these three appearances, he’d probably feel very differently about each of them (if he remembers, anyway). In the first case, he’s coming into a clean 8th inning, protecting a one-run lead. It was a situation he found himself in with some regularity in 2016, prior to assuming the closer’s role.

The second situation is an absolute bear. Jake Barrett has allowed a single to lead off the inning, and poor Steve Hathaway, who shouldn’t be touching this game situation with a 10-foot pole at this point in his career, has subsequently allowed a double and a walk to load the bases. Hudson has been brought in to protect a one-run lead with the bases loaded and nobody out. The opposing team has an expected run value of 2.282. While technically Hudson has been given a lead, it’s one that he would be hard-pressed to keep, even if he does everything right. The reality is that this appearance is associated with an expectation that Arizona will trail by the end of it – as you can see on the play-by-play log, the Padres have a 70.6% win probability at this point. It would be silly to give this appearance the same treatment as the other two. (Hudson, by the way, does a masterful job of escaping this situation without surrendering the lead!)

The third case is the one I want to focus on. Rather than a clean inning, Hudson was asked to get the third out of the 8th inning, with the tying run standing on second base. While the Leverage Index at the time of entry for this appearance is higher (3.50) than in the first instance (2.17), Hudson actually has an easier job: He needs just one out instead of three, and the opposing team is expected to score fewer runs in this situation, all else being equal. In the “clean” 8th inning, he can be expected to give up 0.481 runs, while in the two-out, runner-on-second situation, he can be expected to give up just 0.319 runs. Moreover, the chance of scoring at least one run – presumably the more important question where one-run leads are concerned – is also lower in the “higher leverage” situation. (This doesn’t even account for the batter, Hector Sanchez, who is hardly Wil Myers at the plate, and is probably inferior to the 4-5-6 hitters in the Phillies lineup, as well.)

This brings up an important distinction between leverage and run prevention. Leverage Index, certainly, is an important tool. What it measures, however, is variance in win probability for a single at-bat. Managers rarely have the luxury of giving their pitchers one-batter appearances in the regular season. Even the notoriously fleeting Javier Lopez averaged nearly three batters per appearance in 2016. Managers must therefore determine how to maximize the value of relief appearances as a whole, not just at the time when the reliever is entering the game. Leverage Index shows how much variance can arise from the current plate appearance, but a manager may very well be better served having their best pitcher throw the entirety of the 8th inning, rather than having him get the third out in a situation that commands high leverage but still has relatively low run expectation.

Next time, we’ll look at how base-out state adjustments impacted the raw inning-score matrix data in 2016, to draw conclusions about which relievers were used most often in high-pressure, mid-inning situations, and whether that sort of usage aligns with what we’d expect from an optimal manager.


An Attempt to Quantify Quality At-Bats (Part 2)

In my first article, I created a definition for what I feel constitutes a quality at-bat. I also examined a few test cases[1] and hypothesized different ways in which this data could be used going forward. As a reminder, my definition of a quality at-bat (QAB) is an at-bat that results in at least one of the following:

  1. Hit
  2. Walk
  3. Hit by pitch
  4. Reach on error
  5. Sac bunt
  6. Sac fly
  7. Pitcher throws at least six pitches
  8. Batter “barrels” the ball.

 

To calculate a QAB percentage I divided the player’s total number of QABs by his total number of plate appearances. I then dove a little deeper into QABs to see what conclusions I could draw from this statistic.
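The definition and the percentage calculation can be sketched as follows. This is not the code I actually ran; the `PlateAppearance` record, its field names and the result labels are hypothetical stand-ins for whatever a play-by-play data source provides.

```python
from dataclasses import dataclass

# Plate-appearance results that count as quality on their own
# (items 1 through 6 in the list above). The labels are hypothetical.
QUALITY_RESULTS = {
    "hit", "walk", "hit_by_pitch", "reach_on_error", "sac_bunt", "sac_fly",
}


@dataclass
class PlateAppearance:
    result: str             # e.g. "hit", "strikeout", "groundout"
    pitches: int            # number of pitches seen
    barreled: bool = False  # Statcast barrel flag


def is_quality(pa):
    """A plate appearance is a QAB if it meets at least one criterion."""
    return pa.result in QUALITY_RESULTS or pa.pitches >= 6 or pa.barreled


def qab_percentage(plate_appearances):
    """QAB% = quality at-bats divided by total plate appearances, times 100."""
    if not plate_appearances:
        raise ValueError("need at least one plate appearance")
    quality = sum(1 for pa in plate_appearances if is_quality(pa))
    return 100 * quality / len(plate_appearances)
```

Because the criteria are joined with "or", a seven-pitch strikeout and a barreled flyout both count, which is exactly why patient, hard-hitting players can score well here without reaching base.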

The first thing I did was calculate QAB% for every hitter with more than 400 at-bats in 2016 and build a leaderboard. The players with the best and worst QAB% are displayed below. The average QAB percentage in 2016 was 48.54%. Not surprisingly, Mike Trout leads all hitters and is followed closely by Joey Votto, a player who always finds a way to get on base. The player who stuck out to me most on this list was Chris Carter. This is a player who had a lot of trouble getting a contract this offseason, despite leading the league in homers. In fact, he had so much trouble that he considered going to Japan before finally signing with the Yankees. Yet he had the 10th-highest QAB percentage. Mike Napoli’s QAB% also surprised me because I do not view him as a particularly elite hitter, yet he ranked fourth, between two of baseball’s best hitters.

Players with best QAB% (left) and players with worst QAB% (right)
Name | QAB % | Name | QAB %
Mike Trout | 64.02% | Josh Harrison | 41.83%
Joey Votto | 63.52% | Rajai Davis | 41.82%
Freddie Freeman | 57.93% | Andrelton Simmons | 41.74%
Mike Napoli | 57.89% | Ryan Zimmerman | 41.67%
Josh Donaldson | 57.71% | Alcides Escobar | 41.40%
Paul Goldschmidt | 57.65% | Jason Heyward | 41.34%
Dexter Fowler | 57.61% | Adeiny Hechavarria | 41.32%
DJ LeMahieu | 57.30% | Jonathan Schoop | 40.49%
David Ortiz | 55.27% | Salvador Perez | 40.22%
Chris Carter | 55.16% | Alexei Ramirez | 38.46%

 

One commenter on my last post pointed out that OBP could be highly correlated with QAB%. They were right. In fact, there is a strong correlation of r² = .82 between OBP and QAB%, which makes sense since they share many of the same inputs. After this finding, I decided to create an interactive scatter plot of OBP and QAB% to see what the data looked like and whether I could find any interesting patterns. If you interact with the graph you can see that the five players who sit a little above the rest of the data between .300 and .350 OBP are Chris Carter, Mike Napoli, Michael Saunders, Miguel Sano, and Jayson Werth.
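For readers who want to check a relationship like this themselves, here is a small sketch of an r² calculation. It assumes r² means the square of the Pearson correlation coefficient between two equal-length lists, one of OBP values and one of QAB% values; the function name `r_squared` is illustrative.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov ** 2 / (var_x * var_y)
```

Passing in each qualifying hitter's OBP and QAB% gives a value between 0 and 1, where a value near 1 means the two statistics move almost in lockstep.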

 


Why does QAB% seem to favor this group of players more than others? By investigating the other parameters in my definition of QABs, I found that these five hitters were taking a lot of pitches. In fact, all five of these hitters were in the top 15 last year in pitches per plate appearance, with Jayson Werth and Mike Napoli being numbers one and two, respectively. Additionally, Chris Carter’s score was likely higher since he barreled the 8th-most balls last season. This leads me to believe that QAB% tends to favor or distinguish hard-hitting, patient sluggers.

Is QAB% another way in which we should be evaluating hitter performance? Probably not. As much as I love seeing Chris Carter on a list with the best players in baseball, this statistic uses an old-school mindset that does not show true value. That being said, it can still be helpful. It is a good way to show which hitters are taking a lot of pitches. It also helps quantify what coaches and broadcasters mean when they say a player had a “good at-bat.” Finally, perhaps you watched a lot of Indians games last season and you couldn’t help but feel like Mike Napoli was the best hitter ever. His QAB% may identify why you feel that way. Mike Napoli is a good hitter, but not nearly as good as former MVP Josh Donaldson, despite the fact that they have a very similar share of at-bats that a coach would call “quality.” Overall, I think this statistic does a good job of quantifying something that used to be a lot harder to quantify. At the very least, QAB% has given me a reason to be excited about Chris Carter joining the Yankees, my favorite team. Opening Day cannot come soon enough.

 

  1. In my first article I made a mistake with my test cases. Barrels, a Statcast statistic, did not start being counted until 2015. I had provided QAB numbers starting in 2014. With the way I wrote my code this actually caused the barrels in 2015 and 2016 not to be counted. I should not have provided 2014 numbers at all, and the numbers for 2015 and 2016 were a little lower than they should have been. All of my calculations have been corrected for this article.