Archive for Research

MLB Past and Future Payrolls

I’m a big fan of Bill Simmons’ BS Report podcast. Some of my favorite parts are when Bill talks about trade possibilities between teams. It’s always fun to try and step into a general manager’s shoes and imagine what they can and can’t do to improve their teams. During one of these shows, Jonah Keri was on, and he and Bill were doing a pretty good job of breaking down the options that some MLB teams had in the coming years. It seemed like Jonah had a great command of the restrictions on some of these teams and even what the free agent market is going to look like at various points in the future. I found myself trying to picture and organize all this information in my head. I was inspired to map all this out in a big visualization.

Also, I just wanted to find out how screwed my beloved Phillies are in the coming years.

The image below is a link to the visualization:

[Image: MLB Payrolls thumbnail]

The first thing you can do is to click the arrows or use the left and right arrow keys to scroll through past and future years. I collected data back to 1998, when the Baltimore Orioles led the league in payroll with players like Mike Mussina and Rafael Palmeiro. Scrolling back to the present day shows a lot of story lines: how the Yankees expanded their payroll way faster than the rest of the league in the early 2000s, fire sales of the Marlins in 2006 and to a lesser extent in 2013, and the Dodgers’ rapid leapfrog to post the absolute largest payroll this year.

When you scroll to future years, the 2013 payroll hangs around as a ghost image to provide a rough benchmark of what you might expect the team to eventually pay. The solid bars drop down to show the contracts that the teams are currently obligated to pay in that particular year. Here, you can clearly see the Dodgers and Angels leading the league in earmarked money over the next few seasons. Going all the way to 2023 shows that the Reds have actually signed the longest contract so far.

Clicking on a team in that upper chart will show a time series of that team’s payrolls over the years broken out by player. For example, clicking on the Reds shows large green boxes way out into the future. Clicking on any of those boxes will show you that first baseman Joey Votto can expect to be paid $25M to play baseball in the year 2023. Each color in these bottom charts corresponds to a position.

There are some caveats here. I grabbed the data from Baseball Reference, which gets its data from Cot’s Baseball Contracts. As far as I can tell, the data is not updated very regularly, because I know of a couple of contract extensions that have not made it onto their pages yet. Those contracts won’t be displayed here.

Also, when a player misses a whole season to injury, that player’s salary doesn’t show up on the Baseball Reference page. I took care to add the biggest instances of these missed seasons back into the data by hand, but I’m sure I didn’t get them all. There’s also the question of whether those salaries really should be here. I believe most teams take out insurance policies on players and thus they aren’t responsible for paying injured players. Since I have no details about that sort of thing, I just tried to include all the missed seasons I could find.

Lastly, teams sometimes agree to pay part of a player’s salary when they trade them away to another team. A good recent example of that is the Cubs paying most of Alfonso Soriano’s salary while he plays for the Yankees. The Baseball Reference site has good information about these arrangements in the current and future years. But the site does not have information about past arrangements. Again, I took care of a couple of the biggest discrepancies by hand (hello Mike Hampton!), but I’m sure there are lots still in there.

Despite those few issues, I believe this chart does a great job of showing a snapshot of the MLB economy. I learned a lot just clicking around the whole thing while building it. I think it’s a great indication that you’re building something interesting if you constantly get distracted playing with the thing instead of working on it.


The Folly of Pitching to Contact

‘Pitching to contact’ and ‘throwing ground balls’ are classic baseball buzzwords. Twins pitching coach Rick Anderson has essentially built a career around this philosophy. It seems like every time a young pitching phenom arrives and starts striking hitters out, people start talking about how he needs to pitch to contact. The strategy has been around since this guy played, and while Kirk Rueter pitched in his last game in 2005, Kevin Correia is still hanging around and Jeremy Guthrie signed a three-year deal last offseason. And, lest we forget, Aaron Sele got a Hall of Fame vote. To take a more in-depth look at the merits of pitching to contact I grouped all 394 starting pitchers from 2002 onward (the batted ball era) who had thrown 200 or more innings, and organized them by Contact% into eight groups. The following spreadsheet details the results of my study. Groups 1-4 are classified as contact pitchers, while groups 5-8 are strikeout pitchers.

Group    Contact% range  xFIP-  ERA-  WAR/200 IP  RA9-WAR/200 IP  GB%   K%    BB%  HR%  BABIP  FB velo (mph)  FB%   Pitches/IP
MLB      80.0-82.2       101    103   2.4         2.3             43.0  16.8  7.9  2.8  0.295  90.3           59.3  16.2
Group 1  85.2-89.9       109    112   1.7         1.5             44.7  11.8  6.8  2.9  0.299  89.2           64.3  15.8
Group 2  84.0-85.2       106    110   2.1         2.0             43.7  13.8  7.2  2.8  0.300  89.6           61.7  16.0
Group 3  83.1-84.0       106    112   2.0         1.7             44.0  14.6  7.3  2.8  0.295  89.3           59.0  15.9
Group 4  82.1-83.1       105    110   2.4         2.0             42.4  15.6  7.6  2.8  0.299  89.4           60.1  16.2
Group 5  81.0-82.0       105    106   2.3         2.3             42.4  16.8  8.3  2.7  0.290  90.0           60.3  16.4
Group 6  79.7-80.9       100    101   3.0         3.0             43.2  18.4  7.5  2.7  0.292  90.5           59.0  16.0
Group 7  78.0-79.6       98     98    3.0         3.1             43.1  19.5  8.2  2.6  0.290  91.1           58.8  16.2
Group 8  71.3-77.8       89     90    3.8         3.7             42.1  22.7  8.2  2.5  0.290  91.9           58.5  16.2

Of the Group 1 pitchers, only 5 had an xFIP- better than the league average, and only 6 had an ERA- better than league average. Two of these were posted by aging control artists Rick Reed and David Wells, who had success on the strength of their walk rates of 4.0% and 3.7%, respectively. Chien-Ming Wang rode his 59.5 GB% to a 98 xFIP- and 99 ERA-. Overall, Nate Cornejo was more typical of the group than these three. xFIP- went down with decreasing contact, and except for a small blip between groups 2 and 3 (both contact groups), so did ERA-.

There is a strong connection here between fastball velocity and contact rates, but there is also a strong connection between fastball usage and contact rates. Group 1 had both the slowest average fastballs and the highest use of fastballs. As anyone watching Gerrit Cole and the Pirates can tell, contact rate has almost as much to do with fastball usage as fastball velocity.

Though the contact pitchers had lower walk rates than the strikeout groups, their strikeout rates were far below average. The separation between strikeout and walk rates was better for the strikeout pitchers, with an average separation of 11.3, compared to 6.7 for the contact pitchers. In terms of K/BB, the strikeout pitchers posted a 2.4 K/BB, and the contact pitchers were at 1.9 K/BB. The old adage that groundball pitchers prevent home runs did not bear out. While the contact pitchers had a groundball rate of 43.7% compared to 42.7% for the strikeout pitchers, the contact pitchers had a HR% of 2.8, and the strikeout pitchers had a HR% of 2.6. Home runs are connected to contact.

The contact pitchers also slightly underachieved their peripherals. The ERA- for the contact groups was an average of 4.5 points higher than their xFIP-, while the ERA- for the strikeout groups was on average less than 1 point higher. The contact pitchers had an average BABIP of .298 compared to the .291 for the strikeout pitchers. High strikeout pitchers can often sustain slightly lower BABIP than their counterparts.
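The group comparisons in the last two paragraphs can be re-derived from the table. The sketch below assumes the contact and strikeout figures are simple unweighted averages of the four group rows on each side (an assumption; it happens to reproduce the quoted numbers). The names `GROUPS`, `FIELDS`, `mean_of`, and `summarize` are made up for illustration.

```python
# Group rows copied from the table above:
# (K%, BB%, GB%, HR%, BABIP, xFIP-, ERA-)
GROUPS = {
    1: (11.8, 6.8, 44.7, 2.9, 0.299, 109, 112),
    2: (13.8, 7.2, 43.7, 2.8, 0.300, 106, 110),
    3: (14.6, 7.3, 44.0, 2.8, 0.295, 106, 112),
    4: (15.6, 7.6, 42.4, 2.8, 0.299, 105, 110),
    5: (16.8, 8.3, 42.4, 2.7, 0.290, 105, 106),
    6: (18.4, 7.5, 43.2, 2.7, 0.292, 100, 101),
    7: (19.5, 8.2, 43.1, 2.6, 0.290, 98, 98),
    8: (22.7, 8.2, 42.1, 2.5, 0.290, 89, 90),
}
FIELDS = ("k", "bb", "gb", "hr", "babip", "xfip_minus", "era_minus")


def mean_of(groups, field):
    """Unweighted mean of one column over the given group numbers."""
    i = FIELDS.index(field)
    return sum(GROUPS[g][i] for g in groups) / len(groups)


def summarize(groups):
    """Averages used in the text for one side (contact or strikeout)."""
    k, bb = mean_of(groups, "k"), mean_of(groups, "bb")
    return {
        "k_minus_bb": k - bb,
        "k_per_bb": k / bb,
        "gb": mean_of(groups, "gb"),
        "hr": mean_of(groups, "hr"),
        "babip": mean_of(groups, "babip"),
        "era_minus_gap": mean_of(groups, "era_minus") - mean_of(groups, "xfip_minus"),
    }


contact = summarize((1, 2, 3, 4))
strikeout = summarize((5, 6, 7, 8))

for label, side in (("contact", contact), ("strikeout", strikeout)):
    print(label, {name: round(value, 3) for name, value in side.items()})
```

If the author weighted the groups by innings instead, the results would shift slightly, but the table does not include innings per group, so that cannot be checked here.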

The connection between contact and efficiency is slight. The difference in Pitches/IP was the biggest between group 1 and group 5. The difference of 0.6 Pitches/IP translates to only 120 pitches per 200 IP. While the pitch count and innings limit debate has overtaken the nature of starting pitching, pitching to contact does not seem to be the answer. Teams and pitching coaches that advocate pitching to contact as a means to pitch longer in games are essentially sacrificing a lot of quality for a tiny amount of quantity. And with 12- or 13-man pitching staffs being the rule of the day, this strategy seems absurd.
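The efficiency arithmetic is small enough to check directly. This is only a sketch of the calculation in the paragraph above; `extra_pitches` is a hypothetical helper name.

```python
def extra_pitches(pitches_per_ip_low, pitches_per_ip_high, innings=200):
    """Additional pitches thrown over `innings` at the higher Pitches/IP rate."""
    return (pitches_per_ip_high - pitches_per_ip_low) * innings


# Group 1 (15.8 Pitches/IP) versus group 5 (16.4 Pitches/IP), from the table.
gap_per_200_ip = extra_pitches(15.8, 16.4)
print(round(gap_per_200_ip))
```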

Despite mounting evidence that pitching to contact is a futile strategy, teams keep encouraging their young pitchers to stash away their strikeout stuff in the name of efficiency. Young pitchers Nathan Eovaldi and Gerrit Cole currently own the 3rd and 4th fastest fastballs among starting pitchers. Both of them, and Cole in particular, posted very high strikeout rates in the minor leagues. Yet both of them own strikeout rates well below the NL average, and Cole and Eovaldi’s respective xFIP- rates of 99 and 101 are decidedly average.  I know, almost anybody with a good fastball can rack up a lot of strikeouts in the minors, and Eovaldi in particular has a limited repertoire that may keep him from reaching his potential. But shouldn’t young pitchers focus on developing strikeout pitches rather than trying to get ground balls? After all, fastball velocity peaks early and Cole and Eovaldi will probably have a tougher time getting outs on contact when they aren’t throwing 96. While Mike Pelfrey has carved out a decent career for himself, I’m sure most teams hope for more out of their top pitching prospects.


An Introduction to GRIT

Earlier in the month I had an idea. It all stemmed from the idea of quantifying the un-quantifiable. I was going to record grit.

A lot of times we hear about how gritty a player is, but it’s tossed around with no real proof. Sure, Nick Punto dives into first a lot, but is that really more gritty than stupid? Is a guy like David Eckstein really the grittiest of all gritty players, or can it be a guy we don’t really notice?

To figure all of this out I, along with some help, wrote a formula. The formula is imperfect, because of a lack of reliable sources for things like headfirst slides and broken-up double plays, but it tries and does its job. The formula is as follows:

(((InfH+1stS3+(.5*CS+SB2+1.5*SB3+3*SBH))*(2*P/PA+.5*Foul/S%))/(HR+1))+(.1*PA/Seasons)+PitchingAppearances
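For anyone who wants to play with the formula, here is a minimal sketch of one way to read it. It assumes the product of the two bracketed sums is divided by (HR+1) before the last two terms are added, that P/PA is pitches per plate appearance, that SB2, SB3 and SBH are steals of second, third and home, and that Foul/S% is a single foul-per-strike number whose units the formula does not spell out. The class and field names are invented for the example, and the sample inputs are made up, not any real player’s line.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    inf_hits: float             # InfH
    first_to_third: float       # 1stS3
    caught_stealing: float      # CS
    sb_second: float            # SB2 (assumed: steals of second)
    sb_third: float             # SB3 (assumed: steals of third)
    sb_home: float              # SBH (assumed: steals of home)
    pitches_per_pa: float       # P/PA
    foul_per_strike_pct: float  # Foul/S% (units are an assumption)
    home_runs: float            # HR
    plate_appearances: float    # PA
    seasons: float              # Seasons
    pitching_appearances: float = 0.0


def grit(p: PlayerLine) -> float:
    """GRIT under the reading described in the lead-in."""
    running = (0.5 * p.caught_stealing + p.sb_second
               + 1.5 * p.sb_third + 3 * p.sb_home)
    hustle = p.inf_hits + p.first_to_third + running
    grind = 2 * p.pitches_per_pa + 0.5 * p.foul_per_strike_pct
    return (hustle * grind / (p.home_runs + 1)
            + 0.1 * p.plate_appearances / p.seasons
            + p.pitching_appearances)


# Made-up single-season line, purely to exercise the function.
example = PlayerLine(inf_hits=20, first_to_third=10, caught_stealing=4,
                     sb_second=15, sb_third=2, sb_home=0,
                     pitches_per_pa=4.0, foul_per_strike_pct=0.4,
                     home_runs=9, plate_appearances=600, seasons=1)
print(round(grit(example), 2))
```

Because home runs sit in the denominator, the same baserunning line scores much lower for a slugger than for a slap hitter, which is the behavior the stat is going for.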

Here InfH stands for infield hits and 1stS3 means first to third on a single. With this, we have found a way to see a player’s GRIT (Game Rating In Testosterone). All this stat is designed to show is who works harder to score a run for their team; it doesn’t show you who is better or worse, but it does show who tries.

Using this formula my small team of experts has found David Eckstein to have a career GRIT of 172.16, which is very impressive over a 10-year career, but it’s no Juan Pierre, who has amassed a career GRIT of, wait for it, 1582.

We also compared Martin Prado and Justin Upton. Diamondbacks GM Kevin Towers criticized Upton for not being gritty enough before trading him for Prado. We found out that Kevin Towers may have been wrong.

Using their numbers, the formula says that Prado has put together a GRIT of 57.93 in his career, whereas Upton has a GRIT of 68.65, despite playing one fewer season. So, Kevin Towers, you may need to rethink your strategy.

Also invented was TeamGRIT, a stat that uses numerous numbers to calculate how hard a team works for each run.

A disclaimer here before I list the GRITs: I am not trying to say that some teams work harder than others, nor am I saying that a high GRIT is more or less valuable than a low GRIT. All these numbers illustrate is that some teams are more comfortable with power numbers to win games, while others are more inclined to small ball.

The formula used is

(((InfH+1.5*BuntHits)+1stS3+2ndDH(.5*CS+SB2+1.5*SB3+3*SBH)(Pitches/PA+.5*Fouls/Strike%)+(GIDPinduced+OFAssists))/(HR+.5*HRA))+(.1*PA/GamesPlayed)

The following are the AL leaders prior to games played on August 7th, 2013:

Royals – 90.57 (9th in wins)

Indians – 74.77 (6th in wins)

Red Sox – 73.92 (1st in wins)

A’s – 70.57 (5th in wins)

Blue Jays – 61.73 (10th in wins)

Rangers – 56.52 (4th in wins)

Astros – 55.70 (15th in wins)

White Sox – 51.62 (14th in wins)

Rays – 51.10 (2nd in wins)

Angels – 48.98 (12th in wins)

Twins – 46.97 (13th in wins)

Yankees – 45.59 (8th in wins)

Orioles – 40.49 (7th in wins)

Tigers – 30.30 (3rd in wins)

Mariners – 25.90 (11th in wins)

The most interesting numbers to me are those of the Royals and the Tigers, who sit on opposite ends of the spectrum. The Tigers absolutely crush the ball: everything that comes their way, they hit, and they’re fine with it. They don’t feel the need to manufacture runs the way that the Royals do. The Royals seem to grind more to score their runs, more than any other team in the league by a large margin. They, like the Astros at 55 GRITs, are doing everything in their power to score more runs. It doesn’t always work, but there’s something to be said about a team that works to get extra runs and extra outs. If anything, they’re less comfortable with a lead than the Tigers. That isn’t to say the Tigers get lazy, just that they tend to not have to try so much.

In the NL there appears to be a negative correlation between GRIT and wins; I assure you, this is just a coincidence.

NL leaders prior to games played on August 7th, 2013:

Pirates – 80.83 (2nd in wins)

Rockies – 77.08 (8th in wins)

Marlins – 76.31 (15th in wins)

Brewers – 73.57 (14th in wins)

Mets – 67.33 (11th in wins)

Giants – 64.21 (12th in wins)

Padres – 62.53 (9th in wins)

Phillies – 57.06 (10th in wins)

Dodgers – 51.83 (4th in wins)

Cardinals – 47.67 (3rd in wins)

Nationals – 45.03 (7th in wins)

Cubs – 44.79 (13th in wins)

Diamondbacks – 42.38 (6th in wins)

Reds – 39.99 (5th in wins)

Braves – 31.12 (1st in wins)
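For anyone who wants to put a number on the GRIT-versus-wins relationship, one option is a Spearman rank correlation between each team’s GRIT rank and its rank in wins. This is a sketch, not something the GRIT numbers were built with. It uses only the values listed above and the no-ties shortcut formula, which applies because no two teams share a GRIT value or a wins rank. The function names are made up for the example.

```python
# (team, TeamGRIT, rank in league wins), copied from the two lists above.
AL = [("Royals", 90.57, 9), ("Indians", 74.77, 6), ("Red Sox", 73.92, 1),
      ("A's", 70.57, 5), ("Blue Jays", 61.73, 10), ("Rangers", 56.52, 4),
      ("Astros", 55.70, 15), ("White Sox", 51.62, 14), ("Rays", 51.10, 2),
      ("Angels", 48.98, 12), ("Twins", 46.97, 13), ("Yankees", 45.59, 8),
      ("Orioles", 40.49, 7), ("Tigers", 30.30, 3), ("Mariners", 25.90, 11)]
NL = [("Pirates", 80.83, 2), ("Rockies", 77.08, 8), ("Marlins", 76.31, 15),
      ("Brewers", 73.57, 14), ("Mets", 67.33, 11), ("Giants", 64.21, 12),
      ("Padres", 62.53, 9), ("Phillies", 57.06, 10), ("Dodgers", 51.83, 4),
      ("Cardinals", 47.67, 3), ("Nationals", 45.03, 7), ("Cubs", 44.79, 13),
      ("Diamondbacks", 42.38, 6), ("Reds", 39.99, 5), ("Braves", 31.12, 1)]


def ranks_desc(values):
    """Rank 1 goes to the largest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_grit_vs_wins(teams):
    """Spearman rho between GRIT rank and wins rank (no-ties formula)."""
    grit_rank = ranks_desc([grit for _, grit, _ in teams])
    wins_rank = [wins for _, _, wins in teams]
    n = len(teams)
    d_squared = sum((g - w) ** 2 for g, w in zip(grit_rank, wins_rank))
    return 1 - 6 * d_squared / (n * (n * n - 1))


al_rho = spearman_grit_vs_wins(AL)
nl_rho = spearman_grit_vs_wins(NL)
print(f"AL rho = {al_rho:.3f}, NL rho = {nl_rho:.3f}")
```

A positive value means grittier teams tend to rank higher in wins and a negative value means the opposite. With only 15 teams per league, neither number should be read as more than a rough description.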

The only thing these numbers definitively tell us is that there is a lot more GRIT in the American League than the stereotype of hard-hitting AL clubs would suggest: the 15 AL teams above average about 55.0, close to the NL’s 57.4. By this measure, the longball is less important in the American League, and manufacturing runs more emphasized, than that stereotype implies. In the National League one team stands out from the pack: the Pirates.

They have a GRIT of 80.83 while also being 2nd in wins; they are the only team in the top 5 in wins that is also in the top 5 in GRIT. The Pirates also hit a fair amount of home runs, but that’s not enough for them. They aren’t comfortable with just a lead. They want more of a lead. They try their damnedest to score more runs than anyone else by any means necessary. Is this because they spent so many years as a losing team? Possibly, but that’s just a theory.

As I said before, these numbers are not proof that any team is better than another, nor are they proof that any player is better than another, just that some teams and players are GRITtier than others.

So there you have it, your introduction to GRIT.


In Defense of Striking Out: Ideal Strikeout Rates for Hitters

Strikeout rates have climbed since 2006, while league wOBA has dropped. Responses to ballooning strikeout rates have been mixed. One response is to trade one of your best hitters, while another is to lead MLB in home runs. Some clubs are more averse to strikeouts than others.

It’s no secret that Diamondbacks GM Kevin Towers hates strikeouts. Since taking over in 2010, Towers has discarded every Diamondbacks player who struck out 100 times or more from the 2010 club that set the major-league record for strikeouts in a season by striking out 24.7% of the time. His 2013 squad’s 18.5% strikeout rate is 10th-lowest in the majors. However, the decreased strikeout rate has not resulted in increased offense. The 2010 D-Backs scored 4.40 runs per game, posting a .325 wOBA and 93 wRC+, a shade better than the more contact-driven 2013 Diamondbacks, who currently average 4.17 runs per game with a .313 wOBA and 92 wRC+. While the 2010 team had the 4th-best walk rate at 9.5%, the 2013 Diamondbacks are just 13th at 8.1%. Though the 2010 Diamondbacks struck out more, they also walked more and made more quality contact, as shown by a .312 BABIP and .166 ISO, which were 2nd and 4th in the majors, respectively. The 2013 team has a .301 BABIP and .135 ISO, good for 10th and 23rd in the majors. A look at the plate discipline numbers shows that the 2013 Diamondbacks swing at more pitches out of the strike zone and make more contact on those swings than the 2010 team.

Diamondbacks  O-Swing%  Z-Swing%  Swing%  O-Contact%  Z-Contact%  Contact%  Zone%  F-Strike%  SwStr%
2010          27.6%     64.7%     44.6%   57.9%       84.2%       75.4%     45.8%  58.5%      10.6%
2013          31.4%     64.8%     46.4%   68.6%       87.8%       80.6%     44.9%  59.9%      8.7%

If a hitter can cut his strikeout rate while maintaining his walk rate and power production, that is special. However, there is usually a tradeoff between power/walks and contact. After all, not everyone can be vintage Albert Pujols. To dig deeper into the balance between power and contact, I separated MLB hitters by strikeout percentage into five groups, with 30 hitters per group. I limited the study to qualified hitters, to eliminate the presence of pitchers and small sample size hitters. Not surprisingly, the first group was the clear leader in home run rate.
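The grouping step can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the table: the hitters are randomly generated stand-ins rather than real players, the group rates are pooled (total strikeouts over total PA) since the post does not say whether rates were pooled or averaged per hitter, and every name here (`Hitter`, `split_by_k_rate`, `pooled_rates`) is hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    pa: int
    so: int
    bb: int
    hr: int


def k_pct(h: Hitter) -> float:
    return 100 * h.so / h.pa


def split_by_k_rate(hitters, n_groups=5):
    """Sort by K% (highest first) and cut into equal-sized groups.

    Any remainder after integer division is dropped; 150 hitters in
    5 groups divides evenly into 30 per group.
    """
    ordered = sorted(hitters, key=k_pct, reverse=True)
    size = len(ordered) // n_groups
    return [ordered[i * size:(i + 1) * size] for i in range(n_groups)]


def pooled_rates(group):
    """Group-level rates from summed counts (an assumed convention)."""
    pa = sum(h.pa for h in group)
    return {
        "K%": 100 * sum(h.so for h in group) / pa,
        "BB%": 100 * sum(h.bb for h in group) / pa,
        "HR%": 100 * sum(h.hr for h in group) / pa,
        "PA": pa,
    }


# Synthetic stand-ins for 150 qualified hitters (not real data).
rng = random.Random(0)
hitters = []
for i in range(150):
    pa = rng.randint(502, 700)
    hitters.append(Hitter(f"hitter_{i}", pa,
                          so=round(pa * rng.uniform(0.08, 0.30)),
                          bb=round(pa * rng.uniform(0.04, 0.14)),
                          hr=round(pa * rng.uniform(0.005, 0.06))))

groups = split_by_k_rate(hitters)
for number, group in enumerate(groups, start=1):
    print(f"Group {number}", {k: round(v, 1) for k, v in pooled_rates(group).items()})
```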

Group    K%    BB%  HR%  wOBA   BABIP  WAR   Total PA
MLB      19.7  7.9  2.6  0.313  0.296
Group 1  27.2  8.7  4.4  0.336  0.305  61.7  13008
Group 2  20.7  8.6  2.5  0.337  0.323  65.9  12962
Group 3  17.1  8.1  3.0  0.342  0.313  68.9  13510
Group 4  14.3  8.5  2.4  0.342  0.313  71.5  13895
Group 5  10.4  7.0  2.1  0.317  0.284  51.9  13187

I included WAR even though it includes defensive and baserunning values because I thought that the contact-heavy hitters in group 5 might make up for their offensive deficiencies by being better defenders or baserunners. However, the total WAR for each group tracked offensive production for the most part. The first four groups are very close together with regard to wOBA. As I expected, the most strikeout-heavy group owned the highest walk and home run rates. Group 2 made up for its lower home run rate with a higher BABIP. The rates of doubles were very close in all groups, ranging from 4.5% in group 5 to 5.2% in group 3. Group 5 had the lowest home run and walk rates. Despite group 5’s ability to put the ball in play, the contact generated was of a lesser quality due to higher contact rates on pitches out of the zone. With the exception of Edwin Encarnacion, Adrian Beltre, and Buster Posey, none of the hitters in group 5 had more than 20 weighted runs above average (wRAA). The group average was 0.9 wRAA. Though group 5 had the lowest WAR of any group by a wide margin, they had the 3rd most plate appearances.

As the above table shows, there is not a significant negative connection between higher strikeout rates and offensive production. In fact, the most contact-heavy hitters are far less productive offensively than their more strikeout-prone counterparts. Of course, the plate approach of Chris Davis would not work for Marco Scutaro and vice versa. The idea of an ideal groundball rate for individual hitters has been posited. I would suggest that there is also an ideal strikeout rate for individual hitters. The following is a list of five hitters who I believe would benefit from a more or less contact-friendly approach.

Matt Holliday has trimmed his strikeout rate from 19.2% in 2012 to 14.4% this year. However, he has also trimmed his wRC+ from 141 to 137. His BABIP is down from .337 to .312, but this is likely due to a less formidable batted ball profile, as his xBABIP has dropped from .328 to .304. His Line Drive/Infield Fly ratio is down from 89/11 to 58/16. Furthermore, his rate of home runs on contact has dropped from 5.7% to 4.8% and his overall home run rate has dropped from 4.9% to 3.5%. His fly ball distance has decreased from 305.15 to 294.66 feet. A look at the PITCHf/x data shows that Holliday is swinging more and making more contact on those swings. His Swing% has jumped from 47.2 to 49.9 and his Contact% has gone from 78.5 to 81.8. His O-Contact% has gone from 65.0 to 66.1 and his Z-Contact% from 86.1 to 89.0. While Holliday is striking out less while walking at the same rate, his swings have been noticeably less aggressive, and his overall offensive production is down.

Mike Moustakas has reduced his strikeouts even more than Matt Holliday, going from 20.2% in 2012 to 13.6% in 2013 while essentially maintaining his walk rate. However, his offensive production is down significantly, from 90 wRC+ to 79 wRC+. His home run rate has dropped from 3.3% to 2.6%, and his rate of home runs on contact is a paltry 3.3% compared to 4.5% in 2012. His fly ball distance has dropped from 279.2 to 274.6 feet. Moustakas’ increased contact rate has come largely from swings on balls outside of the zone, as he has seen an increase in O-Contact% from 63.7 to 74.3. During GM Dayton Moore’s tenure, the Royals have had an emphasis on putting the ball into play. Their 16.4 K% since 2007 is the lowest in the league over that time frame. However, they have only a 92 wRC+ over that span, good for 21st in the league, and their BB% of 7.0 is dead last. While the Royals’ emphasis on contact appears to have helped Eric Hosmer, its application to Moustakas has had a negative impact on his production.

Adrian Gonzalez has undergone a significant change since being traded from the Padres. While playing in the spacious Petco Park, Gonzalez posted home run rates between 3.8% and 5.9% and walk rates of 8.2% to 17.5%. His wRC+ numbers ranged from 123 to 156. His home run rate dipped to 3.8% in his first year at Fenway, his lowest since his first full season, but a still solid walk rate of 10.3% and a .380 BABIP led him to an excellent 154 wRC+. Since then, his ability to draw walks and hit for power has plummeted. From 2012 to the present, Gonzalez has a 2.9 HR% and a 6.7 BB%. While Gonzalez has posted his three best contact rates since 2011, his O-Contact% has been between 70.1 and 75.9, well above his career rate of 67.1. Though Gonzalez has slightly improved his power production from 2012, his 126 wRC+ remains a far cry from his peak years. In Gonzalez’ best years, he had strikeout rates in the 17-20% range. He can still be a productive player, but the make-contact approach has taken away much of his power and walks.

Asdrubal Cabrera is posting career-high strikeout and fly ball rates in 2013. Unfortunately for him, this approach has not led to an increased power output, as his home runs on contact, average fly ball distance, and ISO are virtually unchanged from 2012. The 22.0% strikeout rate has conspired to cut his wRC+ from 113 to 91. As Cabrera has tried to hit for more power, his contact rate has gone from 84.0% to 78.6%, a career-low figure, and his walk rate has dropped from 8.4% to 5.8%, also a career low. Though Cabrera’s BABIP has dropped from .303 to .286, his xBABIP is up from .319 to .334, suggesting that he can be productive when he puts the ball in play. Not yet 28, it is time for the Indians shortstop to go back to the plate approach that made him a productive hitter in 2009-12, controlling the strike zone with a more level swing. In picture form, here is a swing from 2011 when Cabrera had a K% of 17.8 and a 119 wRC+.

Yoenis Cespedes has improved his home runs on contact from 5.9% in 2012 to 6.4% in 2013. However, because of the jump in his strikeout rate from 18.9% to 23.9%, his overall home run rate remains at 4.3% and his ISO is basically the same. His wRC+ is only 96, compared to 136 in his debut season. Cespedes is hitting more fly balls at 47.7% compared to 39.9%, and their average distance is the same, but those fly balls have come at the expense of line drives and ground balls, which has caused his xBABIP to sink from .305 to .279 and his actual BABIP to go from .326 to .255. Because Cespedes is relatively new to the league, I wanted to see if pitchers are attacking him differently. However, Cespedes has been pitched to in largely the same fashion as 2012, but with slightly more fastballs and fewer changeups. Cespedes has been less able to hit those fastballs, as he is only 0.37 runs above average per 100 fastballs, compared to 1.71 last year. Cespedes has been seeing slightly more pitches out of the zone, as his Zone% has decreased from 46.2% to 45.1%, but his O-Swing% is mostly the same. For the most part, Cespedes has been getting beat in the strike zone, as his Z-Contact% is down from 84.2% to 81.0%. Because Cespedes’ raw power and athleticism are so impressive, there is a temptation to be overaggressive at the plate. He will likely always be an aggressive hitter, but if he can cut his strikeout rate to his 2012 level, it will be worth the decrease in home runs on contact.

Unlike many people, I do not think that strikeouts are inherently bad. For some hitters, the increased strikeouts are the cost of home runs and walks. Other hitters would be well served to put more balls in play while suffering a loss of power. However, start implementing a one-size-fits-all approach of strikeout avoidance and you’ll end up like the Royals.


Yasiel Puig’s Batting Title

I think one of the most fun parts of baseball is this part of the year; as we wind down, you can start to root for unlikely things to happen. For example, I’m kind of hoping the Pirates manage to lose at an .800+ clip and keep their sub-.500 streak alive. I’d love to see the Royals make the playoffs. Finally, I’d love to see Yasiel Puig win the NL batting title.

The rules of the game are that you have to have 502 plate appearances to win a batting title. If you’re short, you’re given an 0-fer for the rest. So if Puig finished with 492 PAs, he’d take an 0-for-10 for the purposes of the batting title. Right now, Puig is projected by STEAMER to finish the year with 435 PAs. We’ll accept that number for now, but given that number, let’s think about how likely it is that he has a high enough batting average to win the title.

The first step is to figure out the mark he needs. Let’s go with STEAMER again, and we see Michael Cuddyer, Joey Votto, Yadier Molina, and Chris Johnson all projected to finish at about .320. Let’s assume that one of those four players finishes right at his 87.5th-percentile projection (the middle of the highest quartile)…I’ll say Joey Votto, who is projected to go .302 for the rest of the year (the highest of the bunch). Using the binomial distribution, there’s a 16.2% chance Votto finishes 51/149 or better given his “true” .302 batting average. We’ll say that that is the target Puig has to reach: Votto (or one of the others) adds something like 51/149 to his current stats, for a .329 batting average.

What are the chances Puig reaches that clip? To keep it simple, let’s assume STEAMER is right on the number of PAs, ABs, and Puig’s true chance of getting a hit, and then figure out Puig’s chance of getting enough hits to finish at .329 or better. He’s going to end the year with 435 PAs and 390 ABs, if he keeps up his current pace. To that, add an 0-for-67 to get him up to 502 PAs. So he needs enough hits to have a .329 batting average in 457 ABs. That number is 150. He currently has 85 hits in 224 ABs, so for the rest of the year he needs 65 hits in 166 ABs.

Given that STEAMER projects a .293 batting average for the rest of the year, it’s pretty unlikely that he’ll hit at a .392 clip. In fact, his chances of doing so are only about 0.4%, using the binomial model.

What could help his chances? First, there’s no guarantee Votto/Johnson/Molina will get hot enough to make the mark .329. If we drop the required average to .320, using the same method as above, he’d only need 146 hits, which raises his chance to about 2.3%.

Another possibility is that he’s a better hitter than STEAMER projects. If he only regresses to .310, which would make him one of the better hitters in the league admittedly, he has about a 1.6% chance of winning the batting title. And if he is truly a .310 hitter, AND none of the other players near the top of the leaderboard stay hot enough to beat .320, Puig has a whopping 6.6% chance of winning the batting title.
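The binomial calculations above can be reproduced with a short script. This is a sketch under the same simplifying assumptions as the text (every remaining at-bat is an independent trial with a fixed hit probability); the hits-needed figures of 65 and 61 in 166 at-bats are taken straight from the paragraphs above rather than recomputed, and `prob_at_least` is a made-up helper name.

```python
from math import comb


def prob_at_least(hits_needed, at_bats, p_hit):
    """P(X >= hits_needed) for X ~ Binomial(at_bats, p_hit)."""
    return sum(comb(at_bats, k) * p_hit ** k * (1 - p_hit) ** (at_bats - k)
               for k in range(hits_needed, at_bats + 1))


REMAINING_ABS = 166  # 390 projected ABs minus the 224 already in the books

# (hits still needed, assumed true batting average)
scenarios = {
    ".329 target, .293 hitter": (65, 0.293),
    ".320 target, .293 hitter": (61, 0.293),
    ".329 target, .310 hitter": (65, 0.310),
    ".320 target, .310 hitter": (61, 0.310),
}

for label, (needed, p_hit) in scenarios.items():
    chance = prob_at_least(needed, REMAINING_ABS, p_hit)
    print(f"{label}: {chance:.1%}")

# The Votto benchmark from earlier: 51 or more hits in 149 ABs at .302.
votto = prob_at_least(51, 149, 0.302)
print(f"Votto 51/149 or better: {votto:.1%}")
```

The exact sum is used instead of a normal approximation because the interesting probabilities sit far out in the tail, where the approximation is least reliable.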

Yeah, I know batting average is stupid. And I know this is a minuscule chance. But isn’t it amazing that Puig has a chance to do something like this at all, after making his debut in June? Baseball!


Mark Reynolds and his Ilk

Note: I have no idea if I’m the first to do this, but quite frankly I don’t care.

Today, it was reported that seven-year veteran and noted ump hater* Mark Reynolds was released by the Indians. As an Orioles fan who enjoyed watching Reynolds, this was disheartening for me–I’ve always liked TTO guys, and it’s hard to find a more TTO guy than Reynolds**. However, I was (and am) also a fan of the Orioles, meaning I would want them to win, preferably as often as possible. This means that starting a player with a career WAR of 7.4 (in nearly 4000 plate appearances, no less) probably isn’t the best way to accomplish that goal.

Now, about that WAR…

As of Thursday, August 8th, 2013 (i.e. the day of his release), Reynolds is 322nd all-time in homers, and has nearly 200–for the record, there are 311 players with 200 dingers, as of the aforementioned date. Anyone who has watched Reynolds knows that he has formidable power, and his stats, at least for his career, reflect that–his .232 career ISO*** would rank 16th in the majors this year. However, that power comes at a price: namely, every other aspect of his game. Like, seriously. Plate discipline, baserunning, fielding, everything. The end result of this is the aforementioned WAR value, which translates to 1.2 WAR per 600 plate appearances; as a point of reference, these scrubs have WAR/600PA numbers of 1.9 and 1.8, respectively.

Now, the main point to get out of this is that Reynolds–a player with nearly 200 career long-balls, considered by the small-minded to be the symbol of all success–has a single-fucking-digit career WAR, when some players are able to get double-digits in a single season. This led me to the question: how many other players, of the 311 with 200 round-trippers, can fit this dubious distinction? This question led me to the answer: three. They are listed below in order of lowest to highest WAR, for your amusement, along with my best guess as to why this person was so shitty.

Jose Guillen–214 career bombs; 4.5 career WAR (.4 per 600 PAs)(!)

Guillen is remembered for a few things:

1. Pulling a reverse Bedard (i.e. protesting when his manager removes him from the game) and being suspended for the Angels’ 2004 playoff trip; this actually happened during a decent season for him (3.0 WAR), so don’t be too sure he wouldn’t have helped them had he participated.

2. Holding that grudge with him**** for the rest of his career.

3. Being an all-around genial person.

4. His exceptional rookie year, which earned him comparisons to the immortal Neifi Perez, in addition to being, as of last June, the worst season for a right fielder ever.

5. Being, y’know, a generally horrible baseball player.

For all the talk recently of Jesus Montero being terrible despite PED usage, Guillen was pretty bad, and he juiced, too. In terms of career numbers, he had a triple-slash of .270/.321/.440, and a .330 wOBA; while he never really played in a hitter’s ballpark (he had brief stops in Cincinnati and Arizona), he still played in a hitter’s era, meaning his career wRC+ was only 98. His D, however, was what truly set him apart: -56.7 fRAA for his career, and it would’ve been even worse, if not for a ridiculously fluky 2005 (12.5 fRAA, by far the highest of his career). He also wasn’t a particularly good baserunner (-16.5 BsR).

He didn’t strike out nearly as much as Reynolds (17.2% career), but he also didn’t walk nearly as much (5% career), and his ISO was considerably lower (.169).

Dante Bichette–274 career four-baggers; 8.9 career WAR (.8 per 600 PAs)

The career of Bichette was best epitomized by his unfathomable 1999 season; I’ll provide a quick summary. Bichette had a triple-slash of .298/.354/.541 over 659 PAs, which translated to a .376 wOBA. A casual sabermetrician would look at that figure and say, “Well shoot, that’s pretty darn good!”, not knowing that it came while he played for Colorado, in 1999 (i.e. one of only three seasons in MLB history where teams averaged more than 5 runs a game). Thus, after adjusting for park and league effects, Bichette’s wRC+ for that season sat at a mere 100–he was an average hitter. For the sake of comparison, Josh Donaldson has a .372 wOBA for the Athletics this year–and a 139 wRC+. As Mr. Remington points out in the article*****, Bichette in 1999 was one of just two seasons where a hitter had a .370 wOBA or higher and a wRC+ of 100 or lower; the other season was Jeff Cirillo in 2000, playing for–you guessed it–the Rockies.

Focusing on Bichette’s career as a whole, he hit .299/.336/.499, for a .359 wOBA; however, because a lot of that was spent in Colorado, his career wRC+ was a mere 104; this, combined with poor defense (career -92 fRAA) and relatively poor baserunning (career -1.2 BsR), gave him the undesirable WAR seen above.

Bichette’s K% and BB% were somewhat similar to Guillen’s (15.7% and 5.2%, respectively), meaning they were considerably lower than Reynolds’ numbers; his ISO (.200) was considerably lower than Reynolds’, though not as low as Guillen’s.

Deron Johnson–245 career circuit clouts; 9.7 career WAR (.9 per 600 PAs)

The only old (i.e. pre-UZR) player who fit the criteria, Johnson was, allegedly, described by Pete Rose as the hardest ball-hitter he had ever seen. It’s too bad he struck out in nearly 20% of his plate appearances (high for the time period, when the average was about 15%).

Johnson only had one 4-win season (4.3 in 1965 for the Reds); in that year, he had a .370 wOBA, albeit with -9 fRAA. Fielding was his main problem (career -63 fRAA); his career triple-slash of .244/.311/.420 comes out to a .326 wOBA and a decent 102 wRC+, and his BsR was only -3.0. His K% and BB% (19.9% and 8.8%, respectively) were higher than the averages for his era, but not to the degree of Reynolds’, though his ISO (.176) was pretty high for the time.

He’s the least spectacular of the bunch, probably because he played back in the 60’s and, therefore, is completely insignificant.

 

 

So what was the point of this? To use as many variations of the word “home run” as possible?****** Possibly. To find the closest companions to a favorite player? Possibly. Was this whole thing completely, utterly pointless? Definitely.

————————————————————————————————–

*He actually made some good points in the rant. Here’s the quote that really resounded with me: “…It’s a shame [the umpires] don’t have accountability. They don’t have any, if they make a bad call, it’s like, ‘Ho-hum, next day is coming.’ If we have a bad couple of games we get benched or we get sent down. They have nobody breathing down their throats. They have nobody, they are just secure in their jobs.”

**To be fair, Reynolds acknowledges his approach may not always be the best.

***Reynolds’ and Jose Reyes’ 2011 seasons are a perfect example of why SLG% is overrated. At the conclusion of the season in question, Reyes’ SLG% was 10 points higher than Reynolds’ (.493 to .483), despite Reynolds having an ISO a HUNDRED AND SIX points higher (.262 to .156). Now, in the context of this season, was Reynolds a better overall hitter? Certainly not (in case you forgot, this was Reyes’ last year with the Mets, when he had a phenomenal year, leading the league in batting average, etc.). Was Reynolds a better power hitter? Certainly yes. Hmmmmm…not sure if “Certainly yes” is grammatically correct. Whatever.

****The quote from Guillen should really win an award for Worst Butchering of the English Language (particularly the first sentence).

*****In the article, Remington cites Bichette as having a 98 wRC+ in 1999, when on his player page, it lists him as having a 100 wRC+. Have the park or league factors changed since last year?

******I used homers, dingers, long-balls, round-trippers, bombs, four-baggers, and circuit clouts. Thanks to this post for supplying me.


What Kind of A-Rod Will We See?

News Today

     The Yankees welcomed third baseman Alex Rodriguez in Chicago today, where he will make his season debut tonight against the White Sox. A-Rod is expected to be in the lineup, returning to his original position for the club. The other major event on Monday is Commissioner Bud Selig announcing the suspensions of 12 players for 50 games, along with Rodriguez’s 211-game suspension, which takes effect on Thursday, August 8th. A-Rod has already appealed, as reported by the MLB Twitter account. He will be on the active roster through the appeal process, and should be able to play a few weeks before his status is ultimately decided, so what can we expect to see from him on the field?

Looking at his Performance

     Rodriguez played in 15 minor league rehab games to ease his return to the big leagues from off-season hip surgery. In those games, he hit .214 with a double and 3 HR, while driving in 10 runs. In this extremely small sample across varying levels, it’s difficult to make any reasonable assessment. However, we can look at a few peripheral statistics to try to gauge the type of A-Rod we’re going to see. In his 51 minor league plate appearances, A-Rod struck out 13 times and walked 6. This leads to a 25.5% K-Rate and an 11.8% BB-Rate. The small sample size accounts for a large amount of error, but these numbers don’t appear to be too drastically apart from his usual self. A-Rod’s career K-Rate is 18.2%, and it is 19% over the last five seasons. As he’s aged, Rodriguez’s strikeout numbers have marginally increased, and he seems to be following that trend. He walked 10.9% of the time over his career, and 11.3% over the last five seasons. A-Rod has become a more disciplined hitter with time, and pitchers have also become more cautious, pitching around him at the plate.
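The rate arithmetic above is easy to check, and a rough error bar helps show how little 51 plate appearances can say. The sketch below is only an illustration: the binomial standard error is an assumption added here (it treats every plate appearance as an independent trial), not something taken from the rehab data, and the helper names `rate` and `binomial_se` are made up for the example.

```python
import math


def rate(events: int, plate_appearances: int) -> float:
    """Events per plate appearance, as a percentage."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return 100.0 * events / plate_appearances


def binomial_se(events: int, plate_appearances: int) -> float:
    """Rough standard error of the rate, in percentage points.

    Assumes each plate appearance is an independent trial, which is a
    simplification made for this illustration.
    """
    p = events / plate_appearances
    return 100.0 * math.sqrt(p * (1.0 - p) / plate_appearances)


rehab_pa = 51
k_rate = rate(13, rehab_pa)   # 13 strikeouts in 51 PA
bb_rate = rate(6, rehab_pa)   # 6 walks in 51 PA
k_se = binomial_se(13, rehab_pa)

print(f"K%: {k_rate:.1f}  BB%: {bb_rate:.1f}")
print(f"Approximate standard error on K%: {k_se:.1f} points")
```

Under that assumption the standard error on the strikeout rate comes out to roughly six percentage points, which is about the size of the gap between the rehab K-Rate and the career figure.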

     Because A-Rod’s K% and BB% in the minors seem fairly stable compared to his past performance, I believe that we’ll see A-Rod maintain his current career trajectory. His durability is not what it has been in the past, but he should return to the player he would’ve been in 2013, injury or not. I don’t see a sudden huge drop-off, or a surprising upturn in performance, happening.

Career Trajectory

     The following plots show A-Rod’s career trajectory in OPS (On-Base Percentage Plus Slugging Percentage), wOBA (Weighted On-Base Average), wRC+ (Weighted Runs Created Plus, scaled so that 100 is league average), and WAR/162 (Wins Above Replacement prorated for 162 games). In all of the categories, higher numbers indicate a better performance. I used 4th-degree polynomial trend lines in all of these cases except for WAR, where I used a 6th-degree polynomial to account for the increased variance.
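For anyone who wants to reproduce this kind of trend line, here is a minimal sketch of a least-squares polynomial fit with NumPy. It is not the tool used for the plots (which is not stated), the function name `fit_trend` is hypothetical, and the values fed to it are placeholder numbers generated from a known curve, not A-Rod’s actual stats.

```python
import numpy as np


def fit_trend(seasons, values, degree=4):
    """Least-squares polynomial trend line; returns a callable Polynomial."""
    x = np.asarray(seasons, dtype=float)
    y = np.asarray(values, dtype=float)
    if x.shape != y.shape:
        raise ValueError("seasons and values must have the same length")
    if len(x) <= degree:
        raise ValueError("need more data points than the polynomial degree")
    # Polynomial.fit rescales the x-axis internally, which keeps the fit
    # numerically stable even when x holds four-digit years.
    return np.polynomial.Polynomial.fit(x, y, deg=degree)


# Placeholder data only: a made-up quartic sampled once per season.
seasons = np.arange(1996, 2013)
t = (seasons - 2004) / 8.0
placeholder = 0.95 + 0.02 * t - 0.15 * t**2 + 0.01 * t**3 + 0.05 * t**4

trend = fit_trend(seasons, placeholder, degree=4)
recovered = bool(np.allclose(trend(seasons), placeholder))
print("Quartic recovered from placeholder data:", recovered)
```

Swapping `degree=6` gives the higher-order fit described for WAR. A higher degree follows noisy seasons more closely, but it also swings more wildly at the ends of the career, so it is a trade-off rather than a strict improvement.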

     The reason for choosing a 4th-degree polynomial is that I believe it truly reflects the path of A-Rod’s career. He burst onto the scene during his first full year in 1996 with the Mariners, as he was named an All-Star, won the Silver Slugger Award, and finished 2nd in MVP voting. His line that year was .358 / .414 / .631 with an OPS of 1.045. Rodriguez experienced a “Sophomore Slump”, if you can call it that, where he hit a measly .300 / .350 / .496 with an OPS of .846, garnering his second All-Star Game appearance. It would take A-Rod two more years to return to his 1996 performance, causing this first curve. This curve started slowly climbing upward in 2001, his first year with the Rangers, where Rodriguez admitted steroid use due to the pressure he felt to perform. He reached his peak in 2007, an MVP season where he hit .314 / .422 / .645 with an OPS of 1.067 and 54 home runs, the most of his career.

     This is where his current downward trend begins, as A-Rod began creeping into his mid-to-late 30s, which brings us to where we are today. I’ve indicated A-Rod’s drop-off since 2007 by the vertical black lines. Notably, A-Rod’s agent Scott Boras announced during Game 4 of the 2007 World Series, as the Red Sox were about to clinch a championship, that Rodriguez would be opting out of his contract. The Yankees initially didn’t want to negotiate with A-Rod, but later signed him to a new deal, worth $275MM over 10 years. Seeing A-Rod’s current decline, this was not a good move for the Yankees. However, this was perfect for A-Rod, as he secured the deal coming off of an MVP-caliber season when his value was the highest. It’s just Boras working his magic again.

     Alex Rodriguez is in decline, but as stated earlier, we should see a version of A-Rod resembling what he would be if he had never missed time for injury. This is a much-needed boost for the Yankees, whose third basemen have accumulated -0.9 WAR this year, which is 26th in the league. With A-Rod, whom I projected to have a 2.1 WAR, the Yankees greatly improve at his position. Assuming A-Rod plays 15 games before we know the results of his appeal, he’ll accumulate a 0.19 WAR, while the Yankees’ other 3B options would produce a -.08 WAR based on their performance this season. This is a 0.27 WAR swing for the Yankees. If you prorate this over a 162-game season, this would be a 2.92 WAR improvement, which is on the Solid Starter/Good Player borderline. For however long the Yankees have Alex Rodriguez available, he will be a huge improvement in their lineup. It’s just a question of how well A-Rod can focus on playing during one of the most controversial and stressful times in his long career.
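     The proration above can be written out as a short sketch. One assumption to flag loudly: the -.08 figure is reproduced here by treating the -0.9 WAR as a full 162-game pace, which is an inference from the numbers rather than something stated outright. The function name `prorate` is made up for the example.

```python
GAMES_IN_SEASON = 162


def prorate(war_per_season: float, games: int) -> float:
    """Scale a full-season WAR pace down to a given number of games."""
    return war_per_season * games / GAMES_IN_SEASON


games = 15
arod = prorate(2.1, games)     # projected A-Rod pace
others = prorate(-0.9, games)  # ASSUMPTION: -0.9 treated as a 162-game pace

# Rounding each piece to two decimals first, as in the text.
rounded_swing = round(arod, 2) - round(others, 2)
rounded_full_season = rounded_swing * GAMES_IN_SEASON / games

# Carrying full precision through instead.
exact_swing = arod - others
exact_full_season = exact_swing * GAMES_IN_SEASON / games

print(f"15-game swing (rounded inputs): {rounded_swing:.2f}")
print(f"Full-season swing (rounded inputs): {rounded_full_season:.2f}")
print(f"Full-season swing (unrounded): {exact_full_season:.2f}")
```

     With rounded intermediate values the full-season swing comes out to 2.92, matching the figure above. Without the intermediate rounding it is simply 2.1 minus -0.9, or 3.0 WAR, so the small gap is a rounding artifact and does not change the conclusion.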


CAIN: Counting a Pitcher’s HR/FB Out-Performance

Dave Cameron recently posted an interesting article about Jhoulys Chacin and how he is defying the rules of HR/FB rates. His HR/FB rate this year is a mere 2.8%: he has pitched 120 innings, allowed 106 fly balls, and given up just three home runs. Very impressive. But it makes you wonder if there are other pitchers who are maintaining low rates while allowing more fly balls overall. Because while Chacin is obviously benefiting from his HR/FB ratio, it’s possible for a pitcher to allow more fly balls while maintaining a slightly higher HR/FB and benefit more. So I invented CAIN, a counting stat to help measure that.

CAIN does not stand for anything. I’m just paying homage to a famous outlier.

CAIN = FB – (9.34 x HR)

To explain, the Fangraphs Glossary says that the league-average HR/FB rate is “~9-10% depending on the year”. In fact, among the 91 qualified pitchers in the Fangraphs database for 2013, the average HR/FB ratio is 10.7 percent, which works out to about 9.34 fly balls for every homer. So we can say that for most pitchers, if they had ten homers at this point in the season, they would have allowed about 93.4 fly balls. Ten homers and 93.4 fly balls would give you a CAIN of exactly 0. Make sense?
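For anyone who wants to play along at home, the formula is a one-liner. This is just a sketch: the function name `cain` is made up for the example, and the constant is the 9.34 from the formula itself (dividing 1 by the rounded 10.7% gives closer to 9.35, so the 9.34 presumably comes from the unrounded average).

```python
FB_PER_HR = 9.34  # league-average fly balls per home run, as used in the formula


def cain(fly_balls: float, home_runs: float, fb_per_hr: float = FB_PER_HR) -> float:
    """CAIN = FB - (fly balls per homer x HR).

    Positive means fewer homers than a league-average HR/FB rate would
    predict for that many fly balls; negative means more.
    """
    return fly_balls - fb_per_hr * home_runs


# FB and HR totals taken from the tables in this post.
chacin = cain(106, 3)
stults = cain(163, 8)
blanton = cain(133, 24)

print(f"Chacin:  {chacin:.1f}")
print(f"Stults:  {stults:.1f}")
print(f"Blanton: {blanton:.1f}")
```

Passing a different `fb_per_hr` is how you would adjust the constant by league or year, if you were feeling more scientific than I am.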

Now for what you came here for. Note that I’m not saying any players might actually be able to sustain their CAIN; I just think it’s an interesting little tidbit, and perhaps a nice follow-on to Dave Cameron’s article.

Here are the top ten in CAIN this year:

Name Team IP HR FB CAIN HR/FB
Eric Stults Padres 133 8 163 88.3 4.90%
Jhoulys Chacin Rockies 120 3 106 78 2.80%
Bartolo Colon Athletics 135.2 9 161 76.9 5.60%
Travis Wood Cubs 128.1 10 159 65.6 6.30%
Adam Wainwright Cardinals 154.2 6 113 57 5.30%
Bud Norris Astros 119.2 10 150 56.6 6.70%
Lance Lynn Cardinals 122 7 121 55.6 5.80%
Matt Moore Rays 116.1 8 130 55.3 6.20%
Derek Holland Rangers 133.2 9 137 52.9 6.60%
Clayton Kershaw Dodgers 152.1 9 136 51.9 6.60%

And Jhoulys Chacin is not #1. It turns out that Eric Stults is in fact benefiting more from his HR/FB rate outlier this year. Of course, that’s partially happening in Petco. Petco is not Coors. At the other end, here are the bottom ten in CAIN:

Name Team IP HR FB CAIN HR/FB
Joe Blanton Angels 116 24 133 -91.2 18.0%
Roberto Hernandez Rays 113.1 18 91 -77.1 19.8%
Jason Marquis Padres 117.2 18 99 -69.1 18.2%
CC Sabathia Yankees 142 23 150 -64.8 15.3%
Chris Tillman Orioles 119.2 21 135 -61.1 15.6%
Ryan Dempster Red Sox 115.2 20 130 -56.8 15.4%
R.A. Dickey Blue Jays 134.2 23 163 -51.8 14.1%
Jeremy Guthrie Royals 126.2 22 155 -50.5 14.2%
Hisashi Iwakuma Mariners 138.1 21 146 -50.1 14.4%
Lucas Harrell Astros 112 15 96 -44.1 15.6%

Poor Joe Blanton. His peripherals aren’t that bad this year. But he’s been posting some pretty high HR/FB rates for the last five years or so. I’ll leave it to someone else to puzzle that out.

After doing this analysis I wanted to know about exceptional seasons in the “UZR era” for pitchers’ CAINs. I am continuing to use 9.34 as the FB/HR value, not adjusted for year. If I was being very scientific I would probably break that constant out for league AND year, but I’m lazy and unpaid. Anyway, here, unsurprisingly, is Matt Cain:

Season Name Team IP HR FB HR/FB CAIN
2011 Matt Cain Giants 221.2 9 246 3.70% 161.94
2007 Chris Young Padres 173 10 243 4.10% 149.6
2002 Jarrod Washburn Angels 206 19 317 6.00% 139.54
2009 Zack Greinke Royals 229.1 11 242 4.50% 139.26
2002 Mark Redman Tigers 203 15 273 5.50% 132.9
2011 Jered Weaver Angels 235.2 20 319 6.30% 132.2
2010 Anibal Sanchez Marlins 195 10 222 4.50% 128.6
2010 Livan Hernandez Nationals 211.2 16 278 5.80% 128.56
2010 Jason Vargas Mariners 192.2 18 295 6.10% 126.88
2007 Matt Cain Giants 200 14 255 5.50% 124.24

So in summary, CAIN is a nice little tool if you are interested in seeing just how much a HR/FB rate is affecting a pitcher’s performance. If anyone can think of a better acronym, like one that actually is an acronym, please leave a comment.


The New Golden Age of Cuban Baseball in MLB

Editor’s note: This piece was removed at the request of the author.


Plate Discipline Correlations, 2008-2013

In fall 2008 FanGraphs was kind enough to release new plate-discipline metrics, including first-pitch strike percentage (F-Strike %), outside-the-zone swing rate (O-Swing %), and inside-the-zone swing rate (Z-Swing %). At the time, Eric Seidman was even kinder when he investigated the correlation of these plate-discipline statistics with standard pitcher metrics like WHIP, FIP, BB/9, and K/9. Very thoughtful indeed.

Now we have another 4.5 years of plate discipline data, compiled by PITCHf/x rather than Baseball Info Solutions. It may be worthwhile to see how these numbers compare with Seidman’s, as well as add a measure of uncertainty to the correlations. It is possible for two factors to have a strong relationship, but because of small sample sizes or other forms of variability, the correlation value may not be as precise a measure as a high R-value may suggest.

Bootstrapping

Correlation coefficients, which fall between -1 and 1, allow us to measure the strength of linear dependence between two variables, such as O-Swing % and K %. We can use bootstrapping techniques to obtain 95% confidence intervals for these correlation coefficients. Calculating confidence intervals for correlations adds a measure of uncertainty to the process—narrow intervals indicate we can have greater confidence that the R-value we obtain represents the true correlation between the two metrics.

Bootstrapping is a statistical technique in which we resample our current sample with replacement, in this case 500 times. This repeated process allows us to assign measures of accuracy to sample estimates, such as medians, means, or correlation coefficients. For our purposes here, it is only important to note that we can be 95% confident that the true R-value lies within the interval. If the interval includes 0, which corresponds to no linear correlation at all, we conclude that there is not enough evidence to indicate any relationship between the two variables.
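To make the procedure concrete, here is a minimal Python sketch of a percentile bootstrap for a correlation coefficient. It is an illustration under stated assumptions, not a port of the R code behind the tables: the percentile method is one of several ways to build a bootstrap interval, the function name `bootstrap_corr_ci` is hypothetical, and the data fed to it are made-up numbers rather than FanGraphs values.

```python
import numpy as np


def bootstrap_corr_ci(x, y, n_resamples=500, level=0.95, seed=0):
    """Pearson R plus a percentile-bootstrap confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    n = len(x)
    resampled_r = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n pitchers with replacement, keeping each (x, y) pair together.
        idx = rng.integers(0, n, size=n)
        resampled_r[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    alpha = 1.0 - level
    lower, upper = np.percentile(
        resampled_r, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    r = np.corrcoef(x, y)[0, 1]
    return r, (lower, upper)


# Made-up data standing in for F-Strike % and BB % across 90 pitchers.
rng = np.random.default_rng(42)
first_strike = rng.normal(60.0, 3.0, size=90)
walk_rate = 30.0 - 0.4 * first_strike + rng.normal(0.0, 0.5, size=90)

r, (lower, upper) = bootstrap_corr_ci(first_strike, walk_rate)
print(f"R = {r:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("Interval excludes 0:", not (lower <= 0.0 <= upper))
```

The key detail is that each resample draws whole pitchers, so a pitcher’s two metrics always travel together. Resampling the two columns independently would destroy the very relationship being measured.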

First Strike %

The First Strike % correlations below correspond well enough to the values obtained by Seidman, with one exception worth noting. While he used K/9 and BB/9 to correlate with F-Strike %, here we examine the correlation with strikeout and walk percentages. Our K% correlation coefficient is similar in magnitude to his, at .24 versus .19, but its wide confidence interval approaches the null value and suggests the estimate is not very precise. This is worth noting, especially considering that BB % appears to have such a strong correlation with F-Strike %, at -.72 with a relatively narrow confidence interval. Seidman observed a similar pattern: getting into an 0-1 count does more to help pitchers avoid walks than to help them rack up strikeouts.

First Strike %

Metric    R-Value    (95% CI)
K%        0.24       (.024, .455)
BB%       -0.72      (-.848, -.604)
WHIP      -0.52      (-.649, -.376)
FIP       -0.41      (-.576, -.237)

 

O-Swing %

O-Swing % is the percentage of pitches outside the strike zone that batters swing at. Think anyone facing Pablo Sandoval. Here we again see relatively moderate correlations with relatively tight confidence intervals ranging from 0.30 to 0.19. Pitchers who induce swings at pitches outside the zone may be especially tricky for hitters to do damage against. So far this season Adam Wainwright and Matt Harvey are both in the top three in O-Swing %, and top two in both WHIP and FIP.

O-Swing %

Metric    R-Value    (95% CI)
K%        0.39       (.274, .548)
BB%       -0.44      (-.637, -.254)
WHIP      -0.50      (-.677, -.317)
FIP       -0.45      (-.650, -.283)

Z-Swing %

We can see from the results below that Z-Swing %, the rate of inducing swings at pitches in the zone, bears little relationship with any of these metrics. Seidman’s analysis showed that the correlations were negligible at best. The confidence intervals for all of these metrics include 0, meaning that we cannot be 95% confident that there is any relationship present. A quick glance at the leaderboards shows that Ian Kennedy and Miguel Gonzalez are near the top of the list this season, and these guys aren’t exactly shoving.

Z-Swing %

Metric    R-Value    (95% CI)
K%        -0.17      (-.370, .035)
BB%       -0.17      (-.381, .048)
WHIP      -0.09      (-.276, .111)
FIP       0.10       (-.090, .286)

All data courtesy of FanGraphs.

 Because I’m a believer in open data, you can find my R code here.