Home Run Skewness, Babe Ruth, and Maybe PEDs

The breaking of the brand of baseball known as the dead-ball era is generally considered a phenomenon of Babe Ruth’s 1919 season, when he hit a record 29 homers for the Red Sox. That was a good year, but nothing jaw-dropping: three players had managed 25+ homers by that point, and Ned Williamson’s record from 1884 was only two behind Babe’s total. The next season brought the unprecedented explosion, when Ruth redefined power by posting 54 home runs, doubling up anyone else who had ever played in the big leagues.

It only took a few years for the trajectory of offense, and especially home run production, to change drastically. In 1922 Rogers Hornsby hit 42, Ken Williams 39, and Tilly Walker 37, all besting The Bambino’s paltry 35 that season. Over the next several decades, home run production continued to shift as power reshaped the game.

[Chart: home run skewness by season]

Skewness here is based on the Excel formula, under which anything between -1 and 1 is treated as not skewed. Since we have no negative values here, we will focus on values above 1, meaning positive skewness (a long right tail). As the chart shows, skewness in HR production peaked in that 1920 season, when Ruth was an extreme outlier; see below:

[Chart: 1920 home run distribution]

You can see the skewness, a long right tail, and most of it is driven by one observation. Positive skewness was always present in early baseball because of the large cluster of players at or slightly above 0, but this took it to a new level. If you go back to the previous chart, though, you will see that as the league started hitting more long balls the skewness quickly dissipated, and by the late ’40s it had gone away. Only twice since 1949 did we see a skewness above 1, in 1981 and 1981, where the skewness shows up as 1.05 and 1.04 respectively, right on the dividing line between skewed and not. Interestingly, the skewness left and stayed away shortly after the talent pool widened with an influx from the Negro Leagues, which may have cut out some of the lower end that was causing it.
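For readers who want to reproduce the skewness figure outside a spreadsheet, here is a minimal sketch of the sample-skewness formula that Excel documents for its SKEW function. That SKEW is the exact function used for the charts is an assumption, and the home run totals below are made up for illustration, not real season data.

```python
import math

def excel_style_skew(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3),
    where s is the sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical, skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Made-up totals (NOT real data): a low cluster, then the same cluster plus one huge outlier
clustered = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
with_outlier = clustered + [54]

print("cluster only:      ", round(excel_style_skew(clustered), 2))
print("cluster + outlier: ", round(excel_style_skew(with_outlier), 2))
```

A single extreme observation is enough to push the statistic well past the 1.0 threshold, which is the effect the 1920 distribution shows.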

One of the things to keep in mind for all of this is that a lot of people look at the steroid era as another period where baseball was broken with scientifically enhanced freaks blasting way more home runs than should be seen.  Yet, in the data we don’t see a large spike in skewness through that period, which of course leads to a lot of ambiguity and no answers as you could read it in multiple ways including the two extreme views:

1) See, EVERYONE was cheating in the steroid era, so the entire distribution shifted enough to prevent even 1998’s home run chase ending with two players breaking the all-time record from becoming a skewed distribution.

2) Despite the cheating, nothing was greatly affected. A couple of cheaters happened to succeed, but most stayed with the pack, and thus we see no skewness.

So what did the distribution look like in 1998?

[Chart: 1998 home run distribution]

Rather than the highest frequencies being 0 to 4 home runs and then tapering off quickly as in 1920, we now see that every qualified batter hit at least 1 HR and that the largest mass runs from 9 to 23 home runs. Mark McGwire’s 70 HRs were about 3.5 times the average and the median, which were 20.7 and 20 for the year. In comparison, Babe Ruth hit 10 times the average of 5.3 HRs in 1920 and 18 times the median of 3, so you can see how much farther from the pack he was.
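The comparison above is simple division, and a short sketch makes it easy to check. The leader totals, averages, and medians are the figures quoted in this post; the function and dictionary names are just illustrative.

```python
def multiple_of(leader_hr, typical_hr):
    """How many times the typical (mean or median) total the leader hit."""
    return leader_hr / typical_hr

# Figures quoted in the post: league leader, average, and median HR totals
seasons = {
    1920: {"leader": 54, "mean": 5.3, "median": 3},    # Babe Ruth
    1998: {"leader": 70, "mean": 20.7, "median": 20},  # Mark McGwire
}

for year, s in seasons.items():
    vs_mean = multiple_of(s["leader"], s["mean"])
    vs_median = multiple_of(s["leader"], s["median"])
    print(f"{year}: {vs_mean:.1f}x the mean, {vs_median:.1f}x the median")
```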

Whether or not PEDs broke baseball again is not something I am prepared to answer here, but we can at least say it didn’t break it to the degree that Babe Ruth did when he signaled the end of the dead-ball era. What we can tell from home run production is that it seems to be distributed fairly evenly, and has been for more than half a century of baseball, a period in which we have seen many changes to the game. In reality, all that leaves me with is more questions, and that is just fine by me.


Leadoff Rating 2.0

It feels icky to create a statistical formula based on what “feels right”.

Last month, I introduced a stat called Leadoff Rating, or LOR. The idea was that most systems to identify great leadoff hitters tab players like Ted Williams and Mickey Mantle, who would always hit closer to the middle of the order. I wanted to distinguish players specially suited to batting leadoff. The formula was simple: OBP minus ISO. By subtracting isolated power, we identified players who get on base a lot but aren’t true sluggers. It’s an easy calculation, and it produced fairly reasonable results. Two particular things bothered me:

1. Bad hitters occasionally had good leadoff ratings because of their very low ISO.

2. Rickey Henderson ranked 45th.

We know that leadoff is one of the two or three most important positions in the batting order. As little impact as lineup construction has on winning percentage, leadoff hitters are important. But LOR saw high OBP and low ISO as equally meaningful, so players with no power sometimes rated as desirable leadoff hitters. That seemed like something to correct.

Rickey Henderson is generally recognized as the greatest leadoff man of all time. LOR did not show this, for two main reasons. One was that the formula did not include baserunning. The other was that the all-time list slanted heavily towards Deadball players. Before Babe Ruth, everyone had low isolated power. Ty Cobb was a terrific power hitter, who led the AL in slugging eight times. Cobb’s career ISO (.146) is basically the same as Rickey’s (.140). Henderson only ranked among the top 10 in slugging twice. The game has changed.

Based on the feedback of FanGraphs readers and on my own muddlings, I’ve reworked the leadoff rating formula. The new system is more complicated — it’s annoying to do without a spreadsheet — and it’s kind of haphazard. OBP – ISO was a nice system because of its simplicity. With the updated formula, I’m guessing, choosing numbers that seem right. If someone better than I am at math would care to suggest revisions, please do so. I am fully prepared to give this stat away to smart people.

The formula I’m using now is — wait. There’s another calculation I abandoned, but it’s important for explaining how we arrived at the current iteration, and that middle step looked like this: OBP – ( .75 * ISO ) + ( ( .005 * BsR ) / ( PA / 600 ) )

On-base percentage is the heart of leadoff rating. A good hitter, and especially a good leadoff hitter, must get on base. But I only subtracted 3/4 of ISO, because (1) low ISO is not as important as high OBP, and (2) the original formula was probably a little too hard on doubles hitters. Guys like Rickey and Tim Raines ranked too low because they had more power than players like Jason Kendall and Ozzie Smith.

Commenter foxinsox suggested adding (Constant * BsR) to the calculation, which was a fine idea I should have seen earlier. The hitch was turning BsR into a rate stat.  By using BsR/PA or BsR/G, we can incorporate that element smoothly.

When I ran the numbers, the historical lists looked great (Rickey Henderson in the top 10!), but for active players, there were hits and misses. Elvis Andrus came back as the ideal leadoff hitter in 2013, and Craig Gentry (.264/.326/.299) ran away with 2014 to date. Even with the adjustments, LOR rewarded low ISO. While a .250 ISO isn’t really the right fit for the top of the batting order, neither is a sub-.050 ISO. We don’t want a guy who only hits singles, we just don’t want a cleanup hitter. Looking at the historical lists, I found that most of the top players had an ISO right around .100, so I created a Goldilocks formula, preferring a minimal absolute difference from .100 ISO. Rather than simply treating low ISO as desirable, we’re looking for the sweet spot between singles and slugging. The new formula is:

OBP – ( .75 * | .100 – ISO | ) + ( ( .005 * BsR ) / ( PA / 600 ) )

That’s on-base percentage, minus 3/4 of the absolute difference between ISO and .100, plus .005 times BsR per 600 plate appearances. Now very low isolated power is punished just as much as very high ISO.
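For anyone who would rather not build the spreadsheet, the formula above can be sketched as a small function. The function name, parameter names, and the sample stat lines are illustrative assumptions; only the constants (.75, .100, .005, 600) come from the formula itself.

```python
def leadoff_rating(obp, iso, bsr, pa, target_iso=0.100):
    """LOR 2.0: OBP, minus 3/4 of the distance between ISO and .100,
    plus .005 times baserunning runs (BsR) per 600 plate appearances."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    bsr_per_600 = bsr / (pa / 600)
    return obp - 0.75 * abs(target_iso - iso) + 0.005 * bsr_per_600

# Made-up stat lines (NOT real players) to show the Goldilocks effect of the ISO term
examples = {
    "slap hitter": dict(obp=0.350, iso=0.040, bsr=3.0, pa=650),
    "sweet spot":  dict(obp=0.350, iso=0.100, bsr=3.0, pa=650),
    "slugger":     dict(obp=0.350, iso=0.250, bsr=3.0, pa=650),
}
for label, line in examples.items():
    print(f"{label:12s} {leadoff_rating(**line):.3f}")
```

With OBP and baserunning held equal, the hitter at a .100 ISO rates highest, and both the slap hitter and the slugger are marked down in proportion to their distance from that target.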

Hopefully you want to see some lists. I’ll show you five: the all-time list, the post-Jackie Robinson list, the leaders for the 2013 season, 2014 to date (through July 31), and 2014 rest-of-season projections (ZiPS). We’ll also look at the 2014 leaders (both to date and projected) for every team in the major leagues.


Best/Worst Starting Pitchers According to ISO

ISO measures a hitter’s ability to get extra-base hits; it is slugging percentage minus batting average. Using the same idea, with slugging percentage against and batting average against, we can evaluate which pitchers are best at limiting extra-base hits. First we will look at the 10 best starting pitchers in 2014 according to ISO.

PLAYER ISO
Garrett Richards 0.069
Chris Sale 0.077
Felix Hernandez 0.083
Chris Archer 0.083
Sonny Gray 0.084
Adam Wainwright 0.089
Jose Quintana 0.089
Clayton Kershaw 0.092
Tyson Ross 0.093
Jarred Cosart 0.094

As would have been expected, the top ten includes some of the best pitchers in the league. Guys like Wainwright, Kershaw, and many of the others are also found near the top of the ERA leaderboards. However, one name does not quite fit with the others on this list: Jarred Cosart. The hard-throwing right-hander, who was traded at the deadline from Houston to Miami, has been one of the best pitchers in the league at limiting extra-base hits. However, his 4.51 ERA does not match.

Cosart’s lack of success despite his ability to limit hitters to singles comes down to two areas where he struggles. The first is stranding runners. Cosart’s LOB% of 67.4 is 9th worst in the league. Although Cosart has excelled at allowing mainly singles, he has not done a good job of keeping those hits from coming around to score. However, the main area where Cosart has struggled this season is his control. His BB% is tied with A.J. Burnett for third worst in the league at 10%. Thus Cosart’s high frequency of baserunners, driven by his walk rate, and his struggles in stranding runners have caused the hits he has allowed to do more damage.

Next, the 10 worst starting pitchers in 2014 according to ISO.

Player ISO
R.A. Dickey 0.174
Josh Beckett 0.175
Wei-Yin Chen 0.177
Edwin Jackson 0.177
John Danks 0.184
Chris Young 0.186
Eric Stults 0.188
Jake Peavy 0.191
Dan Haren 0.202
Marco Estrada 0.234

Again, not surprisingly, several of these pitchers are among the worst qualifying starters in terms of ERA in 2014: the bottom 4 pitchers all have high-4 ERAs, and Jackson is pitching to a 5.66 ERA. However, there are also a few outliers in terms of success, with Beckett and Chris Young both pitching to much better ERAs than their ISO allowed would suggest. Beckett’s 2.88 ERA is good for 19th best in the league, and Young’s ERA places him in the top 40 among starters.

Where both pitchers have succeeded this season is in stranding runners. Beckett ranks number 1 in the league in LOB% while Young finds himself at 4th. Both pitchers have been very successful at pitching themselves out of jams this season. For that reason both pitchers have been able to allow a large number of extra-base hits and still be among the best in the league at preventing runs.
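The ISO-allowed figures in this post are slugging percentage against minus batting average against. A minimal sketch of that calculation from a pitcher's hits-allowed line follows; the function name and the sample numbers are made up for illustration and are not any pitcher's real totals.

```python
def iso_allowed(singles, doubles, triples, home_runs, at_bats):
    """ISO allowed = SLG against - AVG against."""
    if at_bats <= 0:
        raise ValueError("at-bats against must be positive")
    hits = singles + doubles + triples + home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    avg_against = hits / at_bats
    slg_against = total_bases / at_bats
    return slg_against - avg_against

# Made-up line: 100 singles, 20 doubles, 2 triples, 10 homers in 600 at-bats against
print(f"{iso_allowed(100, 20, 2, 10, 600):.3f}")
```

Because singles count once in both slugging and average, they cancel out, which is why a pitcher can give up plenty of hits and still post a low ISO allowed.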


Theo Sells High, Amazes Onlookers

There’s an old joke about a guy who’s just lost everything — marriage, job, home — and decides to end it all, so he goes to the top of the Empire State Building and jumps. As he’s plummeting to his doom, at the last possible second he performs a triple somersault and lands on his feet, completely unharmed. Two cats are watching across the street and one says to the other, “See? That’s how you do that.”

Since taking over the Cubs front office in October, 2011, Theo Epstein has been carrying out a three-pronged rebuilding plan: (1) acquire a stable of fast-developing power hitters; (2) find a #1 starter; and (3) rebuild the roster with a yard sale. The first prong is coming along well, with Arismendy Alcantara joining Anthony Rizzo in the majors and another wave (including but not necessarily limited to Kris Bryant, Javy Baez, and Jorge Soler) on the way soon. The second prong has borne no fruit yet; there is still no one in the Cubs system that realistically projects as an ace.

The jury is still out on the third prong, which has involved international signings of, and trades for, young players who in some cases are several years away from the majors. But at least on the surface, Theo has made the most out of the tattered wares in his basement. This is especially true of the parade of pitchers (some he inherited and some he acquired as reclamation projects) that he has, by and large, successfully sold high. The folks stopping by Crazy Theo’s Pitching Palace have mostly walked away happy, only to soon suffer buyer’s remorse.

Here’s a list of pitchers Theo has traded away, together with the principal player received in return. In this post, all slash numbers are ERA/FIP: the first pair is the player’s numbers with the Cubs, and the second is his numbers with the team that acquired him.

 

Sean Marshall (3.96/4.02, 3.27/2.67 (CIN))

Swag: Travis Wood

Skinny: Marshall’s thrown just 24 innings since 2012.

 

Andrew Cashner    (4.29/4.84, 3.08/3.25 (SDP))

Swag: Anthony Rizzo

Skinny: Trade could end up helping both clubs, though Cashner’s durability is still questionable.

 

Paul Maholm  (3.74/4.14, 4.14/4.09 (ATL))

Swag: Arodys Vizcaino

Skinny: Vizcaino could be a future closer, but the T.J. survivor has logged just 34 IP in the minors this year.

 

Ryan Dempster (3.74/3.78, 5.09/4.08 (TEX))

Swag: Kyle Hendricks

Skinny: Hendricks is already benefiting from the long Wrigley grass.

 

Scott Feldman  (3.46/3.93, 4.27/4.13 (BAL))

Swag: Jake Arrieta

Skinny: Arrieta won’t defy gravity forever, but some of his improvement may be for real.

 

Matt Garza  (3.45/3.45, 4.38/3.96 (TEX))

Swag:  C.J. Edwards

Skinny: Rangers got little from this deal, in which they also gave away Neil Ramirez, Mike Olt, and Justin Grimm.

 

Jeff Samardzija  (3.97/3.80, 3.19/4.00 (OAK))

Swag: Addison Russell

Skinny: Sharknado 2 is about as good as the original, but his 2014 FIP jumped a run after the trade.

 

Jason Hammel  (2.98/3.19, 9.53/7.31 (OAK))

Swag:  Addison Russell

Skinny: Might be time to try pine tar, Jason.

 

Epstein hasn’t been able to spin all the lead into gold: he may have held onto Travis Wood past the sell-by date, and Edwin Jackson, inked to a union-appeasing contract, has been barrel-bomb bad and is now unmovable. Taken together, however, these trades brought 60% of the Cubs’ current rotation, two guys (Rizzo and Russell) who may have numerous all-star seasons in them, and a potential closer of the future. In virtually no case except Cashner did the player traded improve after the trade. (Marshall had one good year in the Reds’ bullpen, but he’s spent the bulk of the last 2 seasons in the trainer’s room.)

See? That’s how you do that.


Sorting Out Boston’s Outfield Logjam

The Red Sox made some noise at this trade deadline. On a day reminiscent of August 25, 2012, when the Red Sox and Dodgers completed the Nick Punto trade, Boston unloaded key pieces of the 2013 world championship team.

The players they acquired show a clear intent to contend in 2015, just as Dave and Paul stated before. Yoenis Cespedes and Allen Craig add something the Red Sox have lacked for quite some time now: right-handed, power-hitting outfielders. However, these additions raise questions about the surplus of outfielders the Red Sox now have. With Mike Carp designated for assignment, they now have Cespedes, Craig, Victorino, Bradley, Holt, Nava, and the recently called-up Mookie Betts, all of whom have seen time in the outfield this season.

Cespedes will occupy one of those spots, most likely in right field with Victorino moving back to the DL.  Craig will probably take over in left.  Holt will be a super utility man who can fill in for literally any of the seven positions not called catcher and pitcher.  Nava will most likely be a fourth outfielder, or he could possibly platoon with Craig in left.

Craig has had a down year, but has had injury woes and still has a 110 wRC+ against LHP this year.  He owns a career wRC+ of 136 against lefties.  That figures to be an ideal platoon situation with Nava who owns a career 126 wRC+ against RHP.  It was Nava and Gomes platooning in 2013, and with Gomes out and Craig in, it looks as if Craig could be an option to replace Gomes and provide an upgrade in that role.

That leaves center field: Betts or Bradley.

Bradley has shown he’s one of the premier defensive center fielders in all of baseball. He has been worth +17.7 runs defensively and has a UZR/150 of 28.2, which makes him the third-best defensive outfielder in the game behind Heyward and Gordon. The problem is his bat. He has a decent walk rate of 8.3%, but he strikes out far too often (27.6%) for a hitter with no power (1 HR, .083 ISO). If he wants to stay the center fielder of the Red Sox, he needs to cut down on his strikeouts and show that he can at least be an 85-90 wRC+ guy (he’s at 67 in 2014).

Betts figures to be more of an offensive force.  Although he struggled during his brief major league stint, Betts has absolutely torn up the minor leagues.  In 54 AA games he hit .355/.443/.551 and in 34 AAA games he has hit .321/.408/.496.  He will not be what Bradley is in center field defensively, but that’s a lot to ask.  If he can be an average to above average defender, he looks to be the better choice heading forward.  With his recent call up, he will get two months to show what he can do at the big league level.

As far as 2015 goes, it seems like Shane Victorino doesn’t fit into what the Red Sox are planning to do.  After a breakout 2013, he has just not been able to consistently stay healthy.  He has one year remaining on his contract, but he may be dealt in August or sometime in the offseason. In my opinion, Betts will eventually win the center field job and Bradley could potentially be a part of a trade package in the offseason for a starting pitcher, which is another need for Boston moving forward. These new pieces will go along with their core of Pedroia, Ortiz, and Napoli to help boost an offense that has been abysmal in 2014.  Boston also has money to spend and a boatload of prospects.  According to ESPN Boston, Ben Cherington recently stated that “My expectation is that we would be active in the starting pitching market this winter with trades, free agency, whatever.”

Once they add some pieces to the top of their rotation, the Red Sox will be in prime position to contend again in 2015.


The Rays’ Not So Simple Arithmetic

The other day, the Rays traded David Price — 2012 Cy Young Award winner, homegrown star, and overall great guy — to the Tigers in a three-team deal that netted them soft-tossing fifth starter Drew Smyly, former top prospect turned utilityman Nick Franklin, and 18 year old Dominican shortstop Willy Adames.

On the surface, it looks absolutely atrocious, and there’s no way to avoid that. Most people look at the Rays’ previous trades and compare this return with what they got for Matt Garza and James Shields. Since both are good-but-not-quite-Price caliber pitchers, one would think the Rays should have gotten a much better return this time. And they didn’t. There’s no sugarcoating that. It is widely accepted that the Royals overpaid for Shields, and you could argue the Cubs did for Garza. But history is history. Heck, we even got more out of Victor Zambrano!

I wish we could see if the Garza or Shields trades worked out, but the minor leaguers that determine this are still hanging in the balance. Hak-ju Lee, from the Cubs, is in AAA Durham after recovering from a nasty ACL injury. He was the centerpiece of the Garza deal. Chris Archer, also from the Cubs, has yet to finish a full season. Similarly, from the Shields deal, neither Wil Myers nor Jake Odorizzi has played a full season; Patrick Leonard and Mike Montgomery have shown promise but are still a ways away from making a big league impact. So we have to resort to analyzing this year’s trade as it looks right now.

Drew Smyly has an obvious role: he will fill in the spot Price vacated. No, he’s not Price, but he’s a legitimate young big league starter right now and that is important. The consensus has him as a high-floor-low-ceiling type, not likely to develop too much further but not likely to be relegated to long relief. Over the past three years, his WAR is at 1.8, 1.9, and on pace for 2.0 this year. So if you have him under team control for four years, you could reasonably expect him to bring a total of 8.0 WAR to Tampa Bay. I will use the conservative (and inaccurate) yet convenient figure of $5 million per WAR and put the 4-year value of Drew Smyly at $40 million.

Nick Franklin was a 1st round draft pick and blossomed into a top 100 prospect... who currently holds a career .214 big league average and an even more atrocious 0.34 BB/K ratio. Upon his acquisition, Tampa Bay talking heads proclaimed him to be the next Ben Zobrist. The comparison is not unfair: utilityman, can play multiple positions, solid all-around, a nice touch of power and speed. FanGraphs has him in the mid 3’s for WAR over the next four years. As it was with Zobrist, this is entirely possible if he can draw walks at the big league level, post consistent power numbers, and maintain his defense and speed. If. He could be a quad-A player if he doesn’t put it together. But still, let’s assume he will hold a starting job and mark him down to a total of 12.0 WAR over four years. Using the same constant as above, we get $60 million for four years of Franklin.

Rumor has it that Andrew Friedman wanted Willy Adames as the key piece of the deal. Sure enough, Adames instantly found himself slotted as the Rays’ #2 prospect. As an 18 year old in a league with guys well older than him, Adames has held his own and then some — posting a 112 wRC+. Being better than average as one of the youngest players in the league is promising. Here, the Rays are testing their player evaluators. They evidently believe in Adames, but a lot can go wrong between now and his projected MLB debut. Adames is the crapshoot, the pie-in-the-sky. While I could, I’m not going to assign him a WAR value, mostly because of the abnormally high risk and my inability to calculate it. But it is something, and he could even turn out to BE the key piece of this deal.

Now here’s where it gets interesting.

David Price is having a career year — while the traditional numbers are lagging, he has a career-low FIP at 2.93, and is on pace for 6 WAR in 2014. In previous full seasons he has averaged 4.3 WAR (that’s it?), and he is projected to be right around there for the next four years. Let’s put him at 17 WAR over the next four years. That means his services would be worth $85 million were he to become a free agent today.
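The arithmetic so far can be laid out in a few lines. This is a sketch using only the post's own figures: the $5 million per WAR constant (which the post itself calls conservative and inaccurate) and the four-year WAR totals assigned to each player. The variable and function names are illustrative.

```python
DOLLARS_PER_WAR = 5_000_000  # the post's convenient, admittedly inaccurate constant

def market_value(total_war, dollars_per_war=DOLLARS_PER_WAR):
    """Dollar value of a WAR total at a flat price per win."""
    return total_war * dollars_per_war

# Four-year WAR totals assumed in the post (Adames is left out, as in the post)
four_year_war = {"Smyly": 8.0, "Franklin": 12.0, "Price": 17.0}

values = {name: market_value(war) for name, war in four_year_war.items()}
package = values["Smyly"] + values["Franklin"]

for name, dollars in values.items():
    print(f"{name:9s} ${dollars / 1e6:.0f}M")
print(f"Smyly + Franklin: ${package / 1e6:.0f}M vs Price: ${values['Price'] / 1e6:.0f}M")
```

Under these assumptions the two big-league pieces project to at least as much four-year value as Price, before salary or risk is considered.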

Price, while being a bargain over the past few years, could earn $20 million per year next year. That is barely less than he is worth on the hypothetical free agent market, and that is approximately the market value of his next four years. That’s great for a team that can pay that, and on the open market someone WILL pay that, but the Rays simply can’t. He is also likely to be a depreciating asset who will be a worse and worse bargain for whatever team he signs with. Price leaves at the height of his career and at the point of the Rays’ most leverage. Meanwhile, Smyly and Franklin are much more flexible and can be under team control for the next four years. Rays fans are getting used to this pattern of salary dumping, and they will have to — unless their available payroll magically doubles sometime soon.

This trade is a reminder that the Rays are one of the most efficient teams in baseball because they have to be. They cannot just look for the best players; they must look for the BEST VALUE players. A player is valuable to them IF AND ONLY IF the player is under team control. Smyly and Franklin, on paper, add up to be just as valuable as David Price right now, for a fraction of the cost. So while it was no secret that the mediocre Rays had to dump salary this year, what shocked me was how valuable these three players could actually end up being.

This is still not a perfect trade, even if the trio performs as expected. What took everyone by surprise was the lack of name recognition. Where were the “untouchable” prospects? Where’s the upside? While Bryant, Buxton, and Correa are all legitimately untouchable, it was presumed that the Rays would get a combination like Taveras/Miller, or Gausman/Harvey, or Pederson/Seager. All six of those names were floated among the talking heads of baseball, along with many more possibilities, yet all six stayed put. Did the Rays wait too long to pull the trigger? I personally would have taken Russell and McKinney for Price, but instead the Rays lost one buyer when the A’s landed Samardzija.

Did they ask for too much in return? It is possible that they tried to, and they lost another potential buyer in the Cardinals upon their acquisition of Masterson. It is very plausible that the Rays, in trying to rip off Major League Baseball like they have previously, forced themselves into a buyers’ market instead of a sellers’ market. While the arithmetic above shows that they didn’t get totally ripped off, it is absolutely plausible that at one point they were offered much more than what they ended up getting. This is a disappointment, because the Rays HAVE to maximize every asset that they have to compete.

Finally, the David Price trade also puts this good ol’ fashioned arithmetic to the test one more time. By trading for two big-league ready players that add up to Price’s value, Friedman is not-so-subtly hinting that he plans on contending right now, but on a budget. Even if Smyly and Franklin add up to Price’s mid-4 WAR per year in 2015, will the Rays end up contending? If they are good enough to make the playoffs, will their plethora of #3ish starters be good enough to match up against Hernandez/Iwakuma, Kershaw/Greinke, or... uh... Scherzer/Price? I doubt it. And this is where Friedman’s arithmetic meets its match. The Rays have been right at the top of the pack in regular season wins after 2008, but have not won a playoff series in that stretch. Friedman’s math could very well be just fine for the regular season, when the sample is large enough for a team to prove better than the other guy over 162 games, but not for the postseason, where if you’re not better than the other guy in this exact five-game series, you’re kicked to the curb.

We are still waiting on word whether a trade from before the 2011 season has worked out, so it will be a long time before we know for sure who “won.” Therefore, the only way to truly measure success is by on-field performance, and Drew Smyly and Nick Franklin surprisingly do add up to David Price’s value as far as we can tell. Before we know who won for sure, the Rays will probably have to make more tough decisions with more players to make the most of their precious money. They will probably contend, too, but if the last few years are a precedent, there won’t be any ring deliveries to St. Petersburg anytime soon. Maybe this is the tragedy of being a small-market team. Maybe, as we’ve seen with the Moneyball A’s, small-market teams just aren’t normally destined to go deep into the postseason. We should feel somewhat sorry that teams like the Rays have their hands tied, and feel sorry that Friedman only has so much room to maneuver. Meanwhile, the rings go to St. Louis and San Francisco and Boston.

With every stars-for-prospects trade as a contending team, Andrew Friedman bets on the Rays being more valuable than the sum of their parts.

With every year the scrap heap Rays rumble and tumble into contention, Andrew Friedman proves his mettle as a wise, pragmatic GM that can overcome the odds.

But with every good-but-not-great season, Andrew Friedman also cements his legacy as the genius whose arithmetic didn’t quite add up.


Using Short-Season A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A, and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
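To make the probit idea concrete, here is a minimal sketch of how a fitted probit model turns a player's inputs into a probability: the inputs are combined into a single linear index, and the standard normal CDF maps that index to a value between 0 and 1. The intercept, coefficients, and sample player below are entirely hypothetical and are not KATOH's fitted values.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coefficients, features):
    """P(event) = Phi(intercept + sum of coefficient * feature)."""
    index = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return normal_cdf(index)

# HYPOTHETICAL numbers for illustration only, not the fitted KATOH model
HYPOTHETICAL_INTERCEPT = 4.5
HYPOTHETICAL_COEFS = {"age": -0.25, "strikeout_rate": -3.0, "iso": 4.0}

example_player = {"age": 19, "strikeout_rate": 0.18, "iso": 0.150}
p = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, example_player)
print(f"Estimated MLB probability: {p:.0%}")
```

The sign of each coefficient tells you the direction of the effect: here a negative coefficient on age means that, with everything else equal, a younger player gets a higher probability.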

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.

[Image: R output for the Short-Season A model]

Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are:

[Plot: effect of strikeout rate by age]
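An interaction term like “Strikeout_Rate:Age” means the strikeout penalty is no longer one fixed number: in the model's linear index, the effect of strikeout rate becomes the strikeout coefficient plus the interaction coefficient times age. A minimal sketch follows, with hypothetical coefficients chosen only so that the pattern matches the description above (a penalty at every age, larger for younger players). They are not the fitted values from the R output.

```python
def strikeout_effect_on_index(age, b_strikeout, b_interaction):
    """Change in the probit index per unit of strikeout rate at a given age,
    for a model containing b_strikeout * K + b_interaction * K * age."""
    return b_strikeout + b_interaction * age

# HYPOTHETICAL coefficients for illustration only
B_STRIKEOUT = -12.0
B_INTERACTION = 0.5

for age in (18, 20, 22):
    effect = strikeout_effect_on_index(age, B_STRIKEOUT, B_INTERACTION)
    print(f"age {age}: {effect:+.1f} index units per unit of strikeout rate")
```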

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player Organization Age MLB Probability
Rowan Wick STL 21 82%
Eduard Pinto TEX 19 68%
Marcus Greene TEX 19 60%
Mauricio Dubon BOS 19 59%
Franklin Barreto TOR 18 57%
Christian Arroyo SFG 19 57%
Skyler Ewing SFG 21 56%
Taylor Gushue PIT 20 55%
Domingo Leyba DET 18 55%
Raudy Read WSN 20 53%
Nick Longhi BOS 18 52%
Andrew Reed HOU 21 52%
Danny Mars BOS 20 51%
Amed Rosario NYM 18 49%
Yairo Munoz OAK 19 48%
Seth Spivey TEX 21 47%
Mike Gerber DET 21 47%
Mark Zagunis CHC 21 47%
Kevin Krause PIT 21 46%
Leo Castillo CLE 20 45%
Jordan Luplow PIT 20 45%
Mason Davis MIA 21 40%
Kevin Ross PIT 20 40%
Franklin Navarro DET 19 40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Dallas Keuchel: Pitching to Strengths

Platoon splits have become a major part of baseball today.  The Athletics have ridden a split of Jaso and Norris to production from their catcher position.  Many left-handed starters have had success against righties and many have struggled.  Over the course of his career Cole Hamels has had more success against RHB than LHB (.294 vs .301 wOBA).  Hamels is known for his best pitch — his change up — which has helped him neutralize RHB throughout his career.

So often when a LHP struggles against RHBs the common fix is to use a change up more often or to improve the change up.  However, for some pitchers this model does not work.  Dallas Keuchel, a pitcher who used a change up as his primary offspeed pitch against RHBs at the beginning of his career, still struggled against righties.  As shown in this article, from the time his career began until May 31st of this season Keuchel was one of the worst starting pitchers against RHB.  However, by breaking this down season by season it can be seen that Keuchel’s numbers have actually improved as his career has progressed.

Stat (vs. RHB)  2012  2013  2014
wOBA            .365  .363  .313
K%              10.3  15.0  16.3
HR/9            1.51  1.11  0.40

The key to Keuchel’s increased success against opposite-handed hitters seems to be found in his pitch selection.

Pitch %  2012  2013  2014
FT       36.0  31.2  38.5
SL       0.3   13.2  18.6
CH       20.4  16.5  19.2
FF       19.9  27.3  15.9
FC       11.4  5.7   7.7
CU       12.0  6.1   0.1

Keuchel has been an often-discussed topic on this site this season.  The key to his success this season has been his increased use of his rapidly-improving slider, which was covered by Eno Sarris here. As Sarris states, the slider will allow Keuchel to have increased success against lefties.  However, looking at Keuchel’s splits this season shows he has improved his numbers against righties significantly. According to PitchF/x data Keuchel has used the slider significantly more against righties this season.  He has done this at the expense of four-seam fastballs, opting to throw more two-seamers and sliders.

While he is still using the changeup at around his career averages, his heavy increase in sliders is the biggest difference in his way of attacking hitters.  As his numbers for the season have shown, he is limiting home runs and striking out the highest percentage of right-handed hitters in his career.  This has also led to a significant improvement in his wOBA allowed. This season against righties the slider has produced a better than MLB average whiff rate (18% vs 13.%).

Keuchel provides a blueprint for other left-handed starters who struggle against righties.  Contrary to the typical belief that pitchers must develop their change up in order to improve against opposite-handed batters, Keuchel has begun using his best offspeed pitch, the slider, more as a putaway pitch against off-handed batters.  Keuchel has become the poster boy for pitching to strengths, riding his sinking two-seamer and slider to a breakout season while significantly improving his platoon splits.

Another pitcher who was mentioned as the worst in the league against righties was Eric Stults.  Stults, a lefty like Keuchel, features both a slider and a change up.  Additionally, much like Keuchel, Stults’s best offspeed pitch according to pitch values is his slider.  However, looking at his pitch selection to RHB he has used the change more than twice as much as the slider since 2007 (25.6% vs 10.8%).  If Stults followed in the footsteps of Keuchel and began to use his best pitch more against opposite-handed hitters it could cause him to minimize his platoon split and make him a better all-around starting pitcher.


Brandon McCarthy: A Different Pitcher

Earlier this month the Yankees took a chance on Brandon McCarthy, trading Vidal Nuno to the Diamondbacks for the sinker-balling right hander.  While McCarthy’s numbers in Arizona were ugly (5.01 ERA), his FIP was much better (3.82).  Through 4 starts the investment the Yankees made has paid off.  McCarthy has pitched to a 3-0 record with a 2.55 ERA.  Since the trade many of McCarthy’s peripherals have not changed much; however, there have been a few differences.

Team          K%    GB%   BB%  BABIP
Diamondbacks  20.0  55.3  4.3  .345
Yankees       19.2  50.0  3.9  .333

Two key factors for pitcher success, K% and BB%, have not changed much, with both decreasing slightly, and neither can be looked at as the reason McCarthy has been so much better since the deal.  Another important stat to look at is his BABIP, which has improved from .345 to .333 since the trade but is still above his career average of .297.  However, a major difference that can be seen in McCarthy’s numbers since the trade is his GB%.  In recent seasons, as McCarthy has begun featuring his sinker more, his groundball percentage has increased significantly.  The 55.3% he showed with the Diamondbacks was more than 7 percentage points higher than the career high he set in 2013, and it has slipped to 50.0% with the Yankees.  Seeing this major change in GB% opens the question of what exactly McCarthy has been doing differently with the Yankees.

During a few of McCarthy’s starts with the Yankees, New York broadcaster Michael Kay has mentioned that McCarthy did not throw his cutter as frequently with the Diamondbacks compared to how often he has used it since the trade.  Looking at his PitchF/x pitch selection data does show an increase in the use of his cutter but it also shows several other interesting trends.

Team          FA%  FC%   FS%  SI%   CU%
Diamondbacks  -    16.4  0.5  56.0  26.1
Yankees       8.6  18.9  -    56.8  15.4

As Kay has noted, McCarthy has used his cutter more frequently, but the increase is minimal compared to several other big changes McCarthy has made.  With the Dbacks McCarthy threw his curveball on more than a quarter of his pitches, making it his second most frequently used pitch.  Since the trade, the cutter has been his second most common pitch.  The significant drop in his curveball usage did not go to the cutter, though; it went instead to a pitch he did not use in Arizona, the four-seam fastball.

Since his trade to New York from Arizona, Brandon McCarthy has been a completely different pitcher.  While his ability has not changed and the park has not been much of an improvement (park factor of 103 for NY, 104 for ARI), the biggest difference in McCarthy as a pitcher has been in his pitch selection, once again featuring a four-seam fastball while reducing the usage of his curveball.  To this point the move to the Yankees may have been exactly what McCarthy’s career needed, simply because it allowed him to change the way he attacks hitters.


Pirates Do Not Need Help Against Left-Handed Pitching

Stats in this post are current up to right before the July 31, 2014 PIT-ARZ game.

The MLB non-waiver trade deadline just passed. I’m not interested in debating what teams should or should not have done except to say the price for quality players was very high this year. The whole supply & demand, free market thing really worked in the favor of teams that were already out of the postseason race. It was suggested that the Pirates needed a right-handed batter (RHB), since they don’t do well against left-handed pitching (LHP). I had my doubts that this was really true, and I didn’t believe adding another RHB would improve the team much. MLB teams generally do better against LHP, since most batters are RHB and the RHB/LHP split favors the batter.

Before getting into this, note that LHP account for only 21% of the Pirates’ season-to-date plate appearances. Out of all the problems the Pirates could have, this one doesn’t call for a roster move unless you are looking to platoon. More on that later.

Looking at the team batting splits, the Pirates have a .722 OPS overall and a .670 OPS against LHP. On the surface, it appears they are performing worse against LHP, and I will concede the argument that the Pirates HAVE performed worse against LHP so far in 2014, but this shouldn’t continue going forward.

The Pirates have racked up 4,152 plate appearances through July 30th, but only 867 of them have occurred against LHP (~21%). To put this in perspective, that is equivalent to less than one month of games. How accurate are batting statistics at the end of April? They aren’t. Put simply, the Pirates’ ‘struggles’ against LHP can mostly be attributed to a small sample size.

I laid out all the outcomes (1B, BB, 2B, etc.) in a vector of plate appearances, had the computer randomly draw a 900-plate-appearance sample from the entire Pirates season, and computed its OPS. I repeated this 1,000 times and plotted the results below.

Pirates LHP Central Limit Theorem
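
The resampling described above can be sketched roughly as follows. This is not the original script: the outcome counts are made up (chosen only so the season OPS lands near .722), hit-by-pitches and sacrifice flies are ignored, the function names are hypothetical, and drawing without replacement is an assumption, since the post doesn’t say which way the draws were made.

```python
import random
import statistics

# Total bases for each hit type; walks and outs contribute none.
TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def ops(plate_appearances):
    """OPS = OBP + SLG for a list of outcome labels (simplified: no HBP or SF)."""
    walks = plate_appearances.count("BB")
    at_bats = len(plate_appearances) - walks
    hits = sum(1 for pa in plate_appearances if pa in TOTAL_BASES)
    total_bases = sum(TOTAL_BASES.get(pa, 0) for pa in plate_appearances)
    obp = (hits + walks) / len(plate_appearances)
    slg = total_bases / at_bats
    return obp + slg


def resample_ops(plate_appearances, sample_size, trials, seed=0):
    """Draw `trials` random samples (without replacement) and return each one's OPS."""
    rng = random.Random(seed)
    return [ops(rng.sample(plate_appearances, sample_size))
            for _ in range(trials)]


# HYPOTHETICAL outcome counts totalling 4,152 plate appearances.
# These are NOT the Pirates' real numbers.
season = (["1B"] * 660 + ["2B"] * 185 + ["3B"] * 20 + ["HR"] * 105
          + ["BB"] * 370 + ["OUT"] * 2812)

draws = resample_ops(season, sample_size=900, trials=1000, seed=1)
print(f"season OPS:            {ops(season):.3f}")
print(f"mean of sampled OPS:   {statistics.mean(draws):.3f}")
print(f"spread of sampled OPS: {statistics.stdev(draws):.3f}")
print(f"share below .670:      {sum(d < 0.670 for d in draws) / len(draws):.3f}")
```

The last line counts the low draws directly, which is a useful cross-check on any probability read off a fitted normal curve.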

Due to the central limit theorem the mean should hover around .720 (the overall OPS) and the sampled OPS values should be approximately normally distributed. Because of this I constructed the normal distribution curve and then used it to calculate the probability that a 900-plate-appearance sample drawn from the Pirates’ total plate appearances comes in below a .670 OPS. It turns out 9% of the time the program will select plate appearances that total a < .670 OPS. 9% isn’t that likely, but it is not outrageous to conclude the Pirates’ low vs. LHP OPS is due to small sample size.
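
As a rough sketch of that last step, a normal curve can be fitted to the simulated OPS values and its left tail read off at .670. The stand-in data here are an assumption: they are generated from a normal distribution whose sigma (about .039) is simply backed out of the 9% figure quoted above, not taken from the original simulation, and `prob_below` is a hypothetical helper name.

```python
from statistics import NormalDist


def prob_below(simulated_ops, threshold):
    """Fit a normal curve to simulated OPS values and return P(OPS < threshold)."""
    fitted = NormalDist.from_samples(simulated_ops)
    return fitted.cdf(threshold)


# Stand-in for the 1,000 simulated OPS values (ASSUMED mean and sigma,
# not the author's output).
stand_in = NormalDist(mu=0.722, sigma=0.039).samples(1000, seed=42)

p = prob_below(stand_in, 0.670)
print(f"P(OPS < .670) for a 900-PA sample: {p:.1%}")
```

Swapping `stand_in` for real simulated values is the only change needed to run this on actual data.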

This is not just applicable to LHP vs overall splits, but any low-percentage split including RISP. I wrote about this previously and came to a similar conclusion.

The composite distribution curves below illustrate what happens when sample size increases and why small sample sizes are problematic. The vertical red line is the .670 OPS mark. On the 900-sample distribution (vs LHP) there is a 9% probability of drawing a .670 OPS or lower from the Pirates’ total plate appearances. This is the area underneath the curve to the left of the red line. Using the 3000-sample distribution curve, it’s 0.0016%. There is barely any area under the 3000-PA curve at that point, and this is a huge difference. (3000 samples are approximately how many the team has had against RHP.)

Small Sample Size Comparison
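
One way to see why the 3000-PA curve is so much narrower is to scale the spread by sample size. The sketch below is a back-of-the-envelope check, not the original calculation: it treats OPS like a simple average with an effective per-plate-appearance spread, backs that spread out of the 9% figure above, and applies a finite population correction when draws are made without replacement. Whether the original simulation drew with or without replacement isn’t stated, so both cases are computed; all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

SEASON_PA = 4152      # total plate appearances, from the text
OVERALL_OPS = 0.722   # overall team OPS, from the text
TARGET_OPS = 0.670    # OPS against LHP, from the text


def sample_sd(per_pa_sd, n, population=None):
    """Spread of a sample average; applies the finite population
    correction when `population` is given (draws without replacement)."""
    sd = per_pa_sd / sqrt(n)
    if population is not None:
        sd *= sqrt((population - n) / (population - 1))
    return sd


def per_pa_sd_from_tail(tail_prob, n, population=None):
    """Back out the effective per-PA spread that gives `tail_prob` below the target."""
    z = NormalDist().inv_cdf(tail_prob)          # negative for a left tail
    sd_at_n = (TARGET_OPS - OVERALL_OPS) / z
    return sd_at_n / sample_sd(1.0, n, population)


def prob_below(n, per_pa_sd, population=None):
    """P(sample OPS < TARGET_OPS) under the normal approximation."""
    sd = sample_sd(per_pa_sd, n, population)
    return NormalDist(OVERALL_OPS, sd).cdf(TARGET_OPS)


# Case 1: ASSUME draws without replacement from the 4,152-PA season.
s_without = per_pa_sd_from_tail(0.09, 900, SEASON_PA)
p_without = prob_below(3000, s_without, SEASON_PA)

# Case 2: ASSUME draws with replacement (no correction).
s_with = per_pa_sd_from_tail(0.09, 900)
p_with = prob_below(3000, s_with)

print(f"3000 PA, without replacement: {p_without:.5%}")
print(f"3000 PA, with replacement:    {p_with:.3%}")
```

Under these assumptions the without-replacement case comes out at a few thousandths of a percent, the same ballpark as the 0.0016% quoted above, while the with-replacement case is closer to 0.7%. Either way the tail shrinks dramatically as the sample grows.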

One more graph! This is a histogram of the differences between the LHP OPS and the overall OPS. The Pirates are on the low end of it. Not great, but there’s a lot of variation there.

Team OPS Difference

Switching from statistics to baseball, the Pirates have the second-fewest plate appearances against LHP in MLB. They are 11-9 in games started by a LHP. That alone should discount the poor-performance-against-LHP argument, but the team batting stats suggest otherwise, and that has been woven into a narrative.

Looking closely at the Pirates’ roster, there are many solid RHBs: McCutchen (their best hitter), Martin, Marte, Sanchez, and Mercer/Harrison are all pretty good against lefties. Now, some of these players are underperforming against LHP this year, but this is where the small sample size comes in again. You wouldn’t conclude that any of these batters had lost his platoon advantage after only 80 plate appearances. Going forward almost all of these bats should regress to their normal platoon splits.

Then there are the left-handed bats: Pedro Alvarez, Gregory Polanco, and Ike Davis. Their platoon splits are pretty atrocious both for 2014 and career-wise. For example, Alvarez has a .787 OPS vs RHP and a .517 OPS against LHP this year. I don’t want to get into analyzing what’s wrong with the Pirates’ left-handed bats, except to say they are terrible against LHP. The argument should change from “the Pirates don’t do well against LHP” to “the Pirates’ left-handed batters are terrible against LHP.”

What can be done about this? The simple answer is to get better left-handed batters. Since that’s not really possible, the next best option would be platooning the left-handed batters. Ike Davis is already platooned with Gaby Sanchez, and Pedro Alvarez is barely starting any games. Polanco has regressed from his debut, but I think the best idea is for him to play every day and deal with LOOGY relievers. I also don’t know how many fans actually want to see him platooned or are suggesting that he should be. With all this in mind I’m not quite sure what acquiring a right-handed bat would accomplish. The Pirates are already trying to find a place for RHB Josh Harrison to play. He’s been having a good season, no matter what you think about Harrison. Furthermore, the Pirates have a guy who’s been killing LHP this year and has decent splits against them for his career. And that’s Jose Tabata.

Bottom line, adding a RHB wouldn’t help much because the team splits are still a small sample size against LHP. Beyond the statistics, the two big left-handed bats have terrible splits against LHP, and these problems have been already addressed by platooning and benching.