Using low-A Stats to Predict Future Performance

For a piece I wrote a couple of weeks ago, I used historical minor league stats to construct a model that predicts how likely it is that a teenager in A-ball will make it to the major leagues. While this method produced some interesting results, it also had some flaws, most notably that it didn’t take scouting or defense into account. This basically meant that a great defensive player (or a raw, toolsy player) could easily get an undeservedly low rating if he had a poor year at the plate. Another drawback was that it only applied to teenaged players in low-A, who represent a pretty small portion of players at the level, and just a sliver of the prospect population.

With these shortcomings in mind, I’ve taken another stab at predicting which players from the South Atlantic and Midwest leagues are most and least likely to make it to the show. Like last time, I ran a probit regression, which tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. But instead of limiting my analysis to players under the age of 20, I considered all players and included age as a variable in my model. I also attempted to quantify scouting by taking into account whether or not a player made Baseball America’s pre-season prospect rankings. The model still relies heavily on offensive performance, but isn’t entirely guilty of “scouting the stat line.”
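For readers who want to see the mechanics, here is a minimal Python sketch of how a probit model turns a set of inputs into a probability. The variable names, coefficients, and intercept below are invented purely for illustration; they are not KATOH's fitted values, and the actual model was fit in R.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the link function a probit model uses."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coefs, features):
    """Return P(event) = Phi(intercept + sum of coef * feature)."""
    index = intercept + sum(coefs[name] * features[name] for name in coefs)
    return normal_cdf(index)

# Hypothetical coefficients, chosen only to show the direction of each effect.
HYPOTHETICAL_INTERCEPT = 5.0
HYPOTHETICAL_COEFS = {
    "age": -0.30,        # older for the level -> lower probability
    "ba_ranked": 1.00,   # 1 if on Baseball America's pre-season list, else 0
    "k_rate": -3.00,     # strikeouts per plate appearance
    "iso": 4.00,         # isolated power
    "babip": 2.00,
}

# Two made-up players with identical batting lines but different age and prospect status.
teen_prospect = {"age": 19, "ba_ranked": 1, "k_rate": 0.18, "iso": 0.150, "babip": 0.320}
older_unranked = {"age": 23, "ba_ranked": 0, "k_rate": 0.18, "iso": 0.150, "babip": 0.320}

p_teen = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, teen_prospect)
p_older = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, older_unranked)
print(f"teen prospect: {p_teen:.0%}, older unranked player: {p_older:.0%}")
```

The point of the sketch is only that the weighted sum of inputs is pushed through the normal CDF, so the output always lands between 0 and 1 and can be read as a probability.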

It’s come to my attention that Chris St. John of Beyond the Boxscore is doing something very similar with his JAVIER projection system, and it will be interesting to see where his model and mine agree and disagree once I repeat this exercise for all minor leaguers. Chris named his system after Chicago Cubs prospect Javier Baez, so I’ll follow suit and also name mine after a prospect. Yankees prospect Gosuke Katoh was my original inspiration for this idea, so I’ll call my methodology KATOH. Without further ado, here’s the resulting R output if you’re into that kind of stuff:

Low-A Output
All hitting stats were taken relative to league average and then scaled to 2014 low-A league averages.
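One way to read that adjustment, sketched in Python: divide the player's stat by his league's average for that season, then multiply by the 2014 low-A average. The ratio form is an assumption on my part (the adjustment could also be done with differences), and the numbers in the example are made up.

```python
def scale_to_reference(player_stat, league_avg, reference_avg):
    """Express a stat relative to its own league, then rescale to a reference league average."""
    if league_avg <= 0:
        raise ValueError("league average must be positive")
    return player_stat / league_avg * reference_avg

# Hypothetical example: a .150 ISO in a league that averaged .120,
# rescaled to a reference environment that averages .125.
scaled_iso = scale_to_reference(0.150, 0.120, 0.125)
print(round(scaled_iso, 3))
```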

A player’s age, prospect status, strikeout rate, ISO, and even BABIP all proved to be predictive in the direction you’d expect. But the show-stopper here is that a player’s walk rate isn’t at all predictive of whether or not he’ll make it to the majors. One possible explanation is that — unlike power or speed — plate discipline is a skill that can be learned, and many players in low-A are still developing their batting eye and learning to lay off pitches. As one example, Brian McCann walked less than 5% of the time as a 19-year-old in the Sally League, but still developed into a relatively patient big leaguer.

Another possibility is that you don’t have to be a particularly good hitter to run a high walk rate in low-A. Pitchers at that level often have little idea where the ball’s going, which enables hitters to take an ultra-passive approach in the hopes that they’ll see four balls before they see three strikes. That strategy might work in the low minors, but can lose its effectiveness in the upper levels where pitchers have a better handle on their control. I’ve included an excerpt of what KATOH spits out for modern-day players in low-A who logged at least 250 plate appearances through July 7th. The full list of qualifying players can be seen here.

Player Name | Organization | Age | MLB Probability
David Dahl | COL | 20 | 89%
Jake Bauers | SDP | 18 | 89%
J.P. Crawford | PHI | 19 | 87%
Dominic Smith | NYM | 19 | 79%
Willy Adames | DET | 18 | 78%
Chance Sisco | BAL | 19 | 74%
Reese McGuire | PIT | 19 | 73%
Andrew Velazquez | ARI | 19 | 70%
Manuel Margot | BOS | 19 | 69%
Ryan McMahon | COL | 19 | 68%
Franmil Reyes | SDP | 18 | 66%
Brett Phillips | HOU | 20 | 65%
Wendell Rijo | BOS | 18 | 64%
Carson Kelly | STL | 19 | 63%
Kean Wong | TBR | 19 | 63%
Trey Michalczewski | CHW | 19 | 62%
Clint Frazier | CLE | 19 | 62%
Clint Coulter | MIL | 20 | 62%
Evan Van Hoosier | TEX | 20 | 59%
Austin Dean | MIA | 20 | 59%
Drew Ward | WSN | 19 | 58%
Raimel Tapia | COL | 20 | 56%
Tanner Rahier | CIN | 20 | 55%
Correlle Prime | COL | 20 | 55%
Carlos Asuaje | BOS | 22 | 54%
Dustin Peterson | SDP | 19 | 54%
Jesmuel Valentin | LAD | 20 | 54%
Dawel Lugo | TOR | 19 | 54%
Avery Romero | MIA | 21 | 53%
Chad Wallach | MIA | 22 | 53%
Nomar Mazara | TEX | 19 | 52%

Over the next couple of weeks, I plan to repeat this exercise for all levels of minor league play. As I climb the minor league ladder, it will be interesting to see when — or even if — a hitter’s walk rate starts to be predictive of whether or not he’ll make it to the majors. Keep an eye out for the next iteration, which will look at high-A stats and slap probabilities on current high-A players.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Pitch(er)’s F/x

The MLB is not facing a crisis yet, but it may be soon. In an age of instant gratification and the desire to see the biggest, loudest, and longest of highlights, baseball is getting slower and lower scoring. Although picking up the pace would be a simple task for the Commissioner’s Office, picking up the scoring would be much, much more difficult. The reason for the decline in runs per game is not obvious at first glance. But, like all things in the MLB these days, the key lies in the data.

At the turn of the century, the Steroid Era was going strong. Even after league-wide PED testing was implemented in 2003, runs per game increased from 4.73 in 2003 to 4.86 in 2006. Since then, runs have dropped significantly, hovering just above four per game. Rather than pointing to the usual explanations, such as PED use, I think the real proof lies in observation. The major change from 2006 to now is the use of PitchF/x data. In 2006, PitchF/x became a staple in every MLB ballpark. The applications for the system are endless, but the focus for scouting hitters is Hot Zones.

Nearly every hitter has a “hole” in their swing. Even Mike Trout struggles hitting balls up in the zone. Miguel Cabrera has (some) trouble with balls on the outer edge, although limited. Pitchers meanwhile dictate the zone. Although they may prefer to throw to one side of the plate or a certain elevation, elite pitchers have no problem working the ball to all parts of the zone and outside it. The game’s most dominant pitcher this year (not up for argument) has scattered pitches everywhere, especially to lefties. For Kershaw of course, the Heat Map does little justice to his ability to locate the ball. Most hitters have a similar hole, so he is more likely to throw it there than he is all over the heat map. It does show his ability to pitch the ball to a spot better than a hitter can make good contact on a pitch in a certain spot. Let’s take a peek at an example.

Paul Goldschmidt is a really, really good hitter of white balls with red laces. If you don’t believe me, ask Tim Lincecum. First, let’s take a look at Goldschmidt’s Heat Map over his career. Nothing too surprising, he likes his baseballs on the inner half of the zone. Once you get out of the zone on the inside though, he becomes not-so-amazing. Now if we take a peek at DJ Pauly G (I will never call him this to his face because I like my current face structure) vs Kershaw, you can see that Kershaw has been pretty good at targeting his cooler zones. The result of this has been a batting average of just over the Mendoza Line. When you look at him against Lincecum, you see something a lot different. This is probably why Lincecum typically has a sore neck the day after he faces the Diamondbacks. While Kershaw has been able to get it out of the zone low and in, Lincecum has tended to leave them over the plate, resulting in the ball coming to rest in the stands.

At first glance, it may be a pretty simple difference that one pitcher is hitting his spots and one is not. At second glance, it might still look the same. If you really squint though, you can see that conventional wisdom would say very rarely throw it inside to Goldschmidt. Goldy would have been pitched around 10 years ago, and almost all the balls would have been dotting the lefty batter’s box. Prior to the installation of PitchF/x, pitchers would likely have been scared to throw it inside to the slugger. Advanced data available via Heat Maps can show something different, which Kershaw has capitalized on.

From a hitter’s perspective, you probably have a decent idea of what you can and cannot do at the plate. Prior to PitchF/x, hitters kind of knew what to expect. There was once a hitter that pitchers really didn’t know how to face, so they walked him. His name was Barry, and a large part of why he couldn’t be pitched to was that pitchers had no idea what to do when he came to the plate. It got to the point where a 2001 USA Today article asked, “How do you pitch to Bonds?” Bonds had no holes, or so it was thought. I would venture to guess that Bonds, and other greats, would have hit far fewer home runs in an age where pitchers knew the specific places hitters could and could not put the ball over the wall.

Now, hitters are faced with more of a dilemma due to the hyper-advanced scouting. Back when it was a simple “he likes to chase sliders outside the zone late in the count”, hitters had some expectations of what they would likely face. Now, their approach has changed to, “I better look for the low and away slider, but he might try to get me with the high heat since I have a high whiff rate there. Or maybe he’ll go for the change since I have trouble when I am behind in the count and I have fouled off two pitches after seeing one or more sinkers on the outer half of the zone during night games played on the West Coast.” The moral is, pitchers have so much data they can know a hitter better than he can know himself. A hitter’s guess on what he may face is much less educated than it was prior to PitchF/x, making it a lot harder to put the barrel on the ball.

Although there are surely outside causes, PitchF/x is a large part of the reason that runs are on the decline. Pitchers have control over where the ball will end up 60’6” later, and if they are able to put it in a place where the hitter is poor, there will be fewer runs. The new data available has helped pitchers much more than hitters thus far, and until something changes in hitters’ approaches or new data comes along favoring batters, we can expect more of the same. Unfortunately for fans like me who loved watching Barry knock them into the bay, it looks like the Steroid Era’s high-scoring affairs are long gone. Low-scoring baseball is here to stay.


In an Imperfect World, Chase Utley is a Hall-of-Famer

“Criminally underrated” is now an overused phrase, meaning exactly what I want it to mean in regards to Chase Utley.

Overshadowed by inferiors, Utley has flown under the mainstream radar for the most part because of the common fan’s obsession with statistics that, while not useless, are very much flawed.

“Inferior” does not mean bad.  Ryan Howard was a good baseball player for a number of years.  Ditto for Jimmy Rollins.  The two players range somewhere in the above-average range, to just plain good.

But neither player can touch Utley in either peak seasons or cumulative value.

But this isn’t written to compare Utley to non-Hall of Famers.  And it’s not written to compare him to Hall of Famers that are probably not deserving of the honor, either.

Utley stands up well to the actual Hall of Famers.  The players who already have their plaques enshrined in Cooperstown.  And the guys that aren’t there yet, but should be eventually (not voted in yet/not eligible).  He is one of the all-time greats and he still has some mediocre to good baseball left, especially since he is currently on pace to exceed five wins again this year, if one were to assume good health.  Which with Utley though, is not necessarily a safe assumption.

He knocked out five 7-7.9 win seasons in five consecutive seasons from 2005-2009.  It’s not like my normal loose threshold of Hall of Fame caliber seasons that I set at 6 wins.  Utley eclipsed the *6* by at least a win, in every one of those five seasons.

I get that 58 wins is generally perceived to be a borderline Hall of Famer.  And Utley has not reached the counting stats that so many of the current Hall of Fame voters have grown — and adopted permanently, apparently — a love for.  So if an observer of baseball does not consider advanced statistics and/or sabermetrics then the case for Utley seems less apparent.

But with that said, the right to vote should at least be exercised by observers of the game who realize that playing a certain position, and playing it well, matter greatly.  It’s not necessarily the case, but it should be.  You don’t have to be infatuated with WAR and WARP to know that a guy who can handle second base defensively has more value than a guy that can only handle first base.

Utley could obviously handle 2B.  But he wasn’t just an adequate “handler” of the position as much as one of the better handlers of the position of all time.  Perennially a good defender, perennially a 2B, perennially one of the best-hitting 2B ever…and what we have is a guy that might just end up getting lost in an extremely crowded ballot.

58 wins may not be enough.  But if he ages with any kind of grace, I don’t see how 65 is out of the realm of possibility.

The one thing Utley has going for him is that sabermetrics is growing.  There will still be hard-headed voters when Utley’s case ultimately rolls around, but there should be fewer stubborn, “set-in-their-ways” voters than we currently have to deal with.  And most likely, there will be guys that just don’t view Utley as a Hall of Famer without some kind of superhero-like finish to his career.

That’s their right.

But Chase Utley was — at his best — better than Whitaker.  He was better than Biggio.  And he was better than Alomar.

If he retired after this season, he’d get my vote.  But since it is likely he stays healthy enough to produce at a decent-enough level for a few more seasons, he may get a lot of other people’s votes as well.

In reflection, Chase Utley will look better when the ballot rolls around, to the voters, than he does to them now.  Even his peak years will.


Historic Lack of Positional Development All-Star Team

As a Royals fan, I have been subjected to a horrific progression of shortstops during my lifetime that seems to have finally come to an end with Alcides Escobar.  That’s good because I am not sure I could have taken any more Neifi Perez, Angel Berroa, Tony Pena Jr., or Yuniesky Betancourt seasons.  Among Royals shortstops with more than 1000 PAs, only Freddie Patek accumulated more than 10 WAR in his Royals career, and he of course retired the year before I was born.  All the shortstops who meet those criteria for the Royals added together have 59.9 WAR from 1969 through 2013, so over the Royals’ existence they are averaging about 1.3 WAR per season at short.  Is that the worst organization at SS ever?  Let’s find out.

I went position by position to find which organization has been the most inept historically at each.  I only counted players who had 1000+ PAs for the team, though they didn’t need to play exclusively at that position, and I am not including anything from 2014.
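The bookkeeping behind each number in this post is simple enough to sketch in Python: keep the players at a position who cleared the plate-appearance cutoff, add up their WAR, and divide by the franchise's seasons. The function name and the three placeholder players are hypothetical; only the Royals shortstop total (59.9 WAR, 1969 through 2013) comes from the text above.

```python
def war_per_season(players, seasons, min_pa=1000):
    """Sum WAR for players meeting the PA cutoff, then divide by seasons played."""
    qualified = [p for p in players if p["pa"] >= min_pa]
    total = sum(p["war"] for p in qualified)
    return total, total / seasons

# Placeholder players (not real data) to show the 1000 PA filter at work.
demo_players = [
    {"name": "Player A", "pa": 2500, "war": 8.0},
    {"name": "Player B", "pa": 1200, "war": 1.5},
    {"name": "Player C", "pa": 400, "war": 3.0},   # excluded: under 1000 PA
]
demo_total, demo_rate = war_per_season(demo_players, seasons=10)

# Royals shortstops, using the total quoted in the post.
royals_seasons = 2013 - 1969 + 1
royals_rate = 59.9 / royals_seasons
print(royals_seasons, round(royals_rate, 1))
```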

Catcher –

The Rays put up an impressively bad 0.55 WAR/year at catcher, but in the end I am going to give the Astros the nod for the first position of my All-Star team.  Over a 52-year span, their organization’s best catcher was Alan Ashby who only managed 9.7 WAR for the team.  Not even one player in double digits of WAR in half a century is pretty impressive.  All told this group managed 48 WAR for a paltry 0.9 per season, or the value of Humberto Quintero last year as a back-up.  It is hard to keep up that bad of a pace for so long.

1B –

First base is traditionally manned by a large person who can mash.  That has not been the case for the Nationals/Expos.  The Diamondbacks gave them a run here, but Paul Goldschmidt kept them from taking the position.  The Nats/Expos’ best first baseman by accumulated WAR has been Ron Fairly at 17.5 total.  If your best 1B option in 45 years slugged .440 for you, basically Jorge Cantu or Brad Wilkerson, then you are doing something wrong.  They did have better players, but at the wrong time.  They had young Andres Galarraga and old Tony Perez, who did most of their stat accumulation elsewhere, and of course the rented Adam Dunn for a couple of years.  Still, they have only managed 80.1 WAR at a traditionally big-bopping position, and that is about 1.8 per season.

2B –

There were some solid contenders at second, but in the end the Rockies, despite being relatively new, were bad enough to get the spot thanks to Jim Gantner and Rickie Weeks being just good enough to save the Brewers.  The Rockies have been around for 21 years now, and in that time their best player by WAR at second base is Eric Young at 9.5 total.  Even that is cheating since he only played at 2B about half of the time, but my arbitrary parameters for the team allow all of it to be counted.  In second for them is Clint Barmes at 3.9, so it is quite a steep drop-off from not-so-lofty heights.  Their second basemen have only managed 14.7 WAR total in over 2 decades for a rate of 0.7 per year.  Babe Ruth once put up more WAR than that in one season.

SS –

I was truly expecting the Royals to run away with this, but Patek was enough to keep them out of short, though they still managed to make the team.  In the end, the Padres were just too weak to ignore.  In 45 years the best they have been able to manage from a player at short is the 8.7 WAR that Khalil Greene managed to amass.  Their BEST player at the position had a career slash line of .245/.302/.422, which is not so good.  They had the first four years of Ozzie Smith’s career, so at one point they had a future Hall of Famer at the position, but they even managed to screw that up by trading him in a deal that brought back Garry Templeton (second to Greene at 8.4 WAR), Sixto Lezcano, and Luis DeLeon.  Ouch, how is this trade not discussed more for its awfulness?  Total Padre SS WAR of 42.5 gives them 0.9 WAR per season.

3B –

This was the only position where I selected a team with more than 2 WAR per year, though how they got there makes it less impressive.  I went back and forth on this, but the Tigers ended up getting it due to the more than 100 years of marginal to terrible play at third.  That is a long time to fail to produce any good players.  Their WAR leader at the position is Miguel Cabrera, of course, at 35.1, so to get a good player at third they had to trade for a stud and then play him out of position for two years to get to the plate appearance level I set.  Before Miggy, the best they had managed at third was Travis Fryman followed closely by George Kell who put up 24.6 and 23.4 WAR respectively for Detroit.  Those aren’t terrible players, but again they had over 100 years and that is the best they could do.  With Cabrera they ended up at 231.5 WAR in 113 seasons for just over 2 WAR per year, but before he moved to 3B in 2012 they were at 1.8 per year.

LF –

Another corner position where you expect some power production…unless you are a Mets fan.  The Mariners do get a nod for having a slightly lower WAR per season figure, but the Mets’ extra decade and a half gave them the edge.  Cleon Jones topped the Mets LF list at 18.1 WAR, which is not awful.  He played 12 seasons, 8 of them as a full-time player, and hit only 93 HRs.  For left field, that is pretty mediocre production from your best ever.  Kevin McReynolds is their only player at the position to break the 100 homer mark.  Their total was 96.9 WAR in 52 years for a rate a little shy of 1.9 per year.

CF –

The Marlins looked like a slam dunk at first here with their top guy being Juan Pierre; seriously, that should get you a spot on the team, shouldn’t it?  Then the Rangers came along and stole the spot out from under them.  Josh Hamilton got 1400+ PAs to keep this from being a complete disaster of a position for Texas.  His 21.8 WAR while he was with the team is almost double that of their second-place center fielder, Don Lock at 11.1.  Prior to Hamilton the Rangers had managed one double-digit WAR player in center over a 53-year span.  With Hamilton their total and rate are 79.5 and 1.5 per year, but prior to Hamilton (pre-2008) it was 57.7 and 1.2 per season.

RF –

After shortstop was done I thought my team was in the clear.  Then we got to right field.  The Royals have had some decent right fielders like Jermaine Dye and Al Cowens, but they have also had Jeff Francoeur and Jose Guillen.  Danny Tartabull is tops with 13.9 WAR and is the only one in double digits.  He was only a Royal for five seasons and fought injuries a lot in the final three, playing in only 133, 88, and 132 games in those years.  Right fielders for the Royals have accumulated 59.9 WAR over 45 seasons for a rate of 1.3 per year, and I am now convinced that Wil Myers was traded to avoid losing this distinction.

SP –

For pitching I looked at each team’s top 5 starters by WAR all time.  There were only 3 contenders for the lowest sum of those 5 divided by the organization’s years of existence, and they were the Brewers, Padres, and Rangers.  The Rangers already have center field and the Padres shortstop, so I thought about giving it to the Brewers for no repeats.  Instead I am going to make the rotation all three teams because I can.  Here are their respective rotations.

Brewers | Padres | Rangers
Ben Sheets (29.6 WAR) | Jake Peavy (24.6) | Kenny Rogers (26.1)
Teddy Higuera (28) | Randy Jones (21.4) | Charlie Hough (23.7)
Moose Haas (20.6) | Andy Benes (21) | Kevin Brown (22.3)
Chris Bosio (19.9) | Andy Ashby (15.2) | Fergie Jenkins (22.1)
Yovani Gallardo (16.9) | Bruce Hurst (14.8) | Nolan Ryan (21.6)

Texas gets the first spot in the rotation.  They are tied with the Padres for the worst rate, 2.2 WAR per season of existence for their top 5, but what sets them apart is that three of their 5 came from other organizations, so only Kenny Rogers and Kevin Brown were developed by them.  The Padres’ top three were all drafted and developed in house, so they get to go second.  The Brewers’ rate was a bit better at 2.6 WAR per year, so they go third.  Only the Rangers have a real chance of escaping their current position if Yu Darvish can continue being awesome; he already has 13.4 WAR and is only 27, so he could pass Ryan in a couple more years.  As for the other two teams’ current rotations, the Brewers are led by Yovani Gallardo, who is already 5th for the Brewers all time, but he is trending the wrong way and is only controlled through next year.  Ian Kennedy is at the top for San Diego right now, which explains a lot about their season.
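If you want to check the rotation math, here is a short Python sketch that sums each team's top 5 and divides by seasons of existence through 2013. The WAR figures are the ones listed above. The season counts for the Padres (45) and Rangers (53) match the spans quoted earlier in this post; the Brewers' 45 is my assumption, counting from the franchise's first season in 1969.

```python
# Top-5 starter WAR by team, copied from the rotation lists above.
rotations = {
    "Brewers": [29.6, 28.0, 20.6, 19.9, 16.9],
    "Padres": [24.6, 21.4, 21.0, 15.2, 14.8],
    "Rangers": [26.1, 23.7, 22.3, 22.1, 21.6],
}

# Seasons of existence through 2013 (the Brewers figure is an assumption).
seasons = {"Brewers": 45, "Padres": 45, "Rangers": 53}

# Rate = combined WAR of the top 5 divided by franchise seasons.
rates = {team: sum(wars) / seasons[team] for team, wars in rotations.items()}

for team in sorted(rates, key=rates.get):
    print(f"{team}: {rates[team]:.1f} WAR per season")
```

Rounded to one decimal, the Rangers and Padres both land at 2.2 and the Brewers at 2.6, which lines up with the rates quoted above.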

There you have it, historic ineptitude by position.  I am going to go ahead and leave the relief pitchers alone.  That will be my fan vote I guess, so go figure out your favorite and comment below.


On Sabermetric Rhetoric

Dear FanGraphs community,

This isn’t a post about baseball, per se, but rather about the way we talk about it. Lately, I’ve been thinking a lot about how to improve the quality of dialogue surrounding sabermetrics. Please excuse my rambling, as I tend to get rather emotional and philosophical when discussing this particular topic.

When reading posts and especially comments, I sometimes get the sense that we think we are right merely due to the fact that statistics are objective. In a sense, this is true. As long as the methodology is clearly laid out, stats really are just numbers. But people are biased. All language is persuasive in some sense, and the inherent neutrality of numbers is often hijacked by various human agendas. Sabermetrics are not exempt from this phenomenon.

Most modern discourse surrounding baseball analysis pits “old-school” vs. “new-school” in a largely arbitrary ideological cage fight. These sorts of polemical constructs make for good television, but slow progress. It’s easy to get caught up in the excitement of a debate while completely missing out on what really matters. Baseball is a beautiful game and it brings people together. It’s America’s pastime for a reason! It transcends cultural differences, generation gaps, and even language itself.

Statistics help us to understand and evaluate how well this great game is being played. They act as a mental “handle” by which we can intellectually grasp the importance of each individual event and performance. Everyone, regardless of their stance on sabermetrics, wants statistics that are both intuitive and accurate. So let’s set aside our agendas for a minute and think about how to proactively bridge the gap between these two sides that have so much to offer!

For starters, we should minimize our implementation of hostile methodologies. Getting on a soapbox and proclaiming the evils of traditionalism simply doesn’t do anybody any good. It feeds our pride, as well as the opposition’s presumption that we care more about our statistics than we do about, you know, actual baseball. Over the last few years, I’ve begun to think of myself more as a teacher of sabermetrics than a defender of them. This approach has two important ramifications.

First, it dictates that we get along with those who disagree with us. In my experience, people are only open to new information in the context of a trusting relationship. As fellow baseball fanatics, we have an easy point of contact with traditionalists: we both like baseball. Duh! Focus on that first rather than stuffing a lecture on DIPS theory down their throats.

Second, a teaching disposition encourages us to refine and adapt our communication of sabermetric concepts. Next time you want to call someone a nincompoop on a message board, first ask yourself, “What could I have done to explain this idea more clearly?” Chances are, the person isn’t stupid, just unenlightened and/or overly argumentative. Over my next few posts, I’ll get into the nitty-gritty of how we might make this happen.

Contrary to popular belief, numbers aren’t evil. Baseball statistics in particular have come a long way toward being less deceptive. Let’s represent them well, shall we?

Sincerely yours,

KK-Swizzle


Jedd Gyorko’s Struggles

A couple of months ago, I wrote a community post on FanGraphs stating that I felt as though Jedd Gyorko was a special player. My argument was that Gyorko goes against the normal second baseman positional identity. Rather than being the slappy-hitting second baseman, Gyorko was a second baseman with some serious power. A second baseman with power is not something you see every day. In today’s game, you can really only point to guys like Robinson Cano and Ian Kinsler who have played second base and had success because of their power.

Gyorko’s success last season was mainly driven by his power. Gyorko hit 23 homers to go along with a line of .249/.301/.444.  Gyorko’s contact rate was below league average in 2013 with a mark of 73%, and when you pair that with a walk rate of only 6.4%, you end up getting a player who makes most of his value from driving the ball a long ways.

This season has been a bit of a different story. Gyorko has been one of the worst hitters in the league. In just 56 games this season, before going down with a foot injury, Gyorko has hit an abysmal .162/.213/.270. Gyorko’s lack of production could be attributed to a below-average BABIP of .192. Gyorko has been unlucky, but it’s also likely that he’s just not been very good.

In 2013, Gyorko’s FB% was slightly higher than league average (39%), and that has remained the same for 2014. The difference this year has been that Gyorko has been hitting more groundballs, more IFFBs, and fewer line drives. Whenever you’re hitting fewer line drives, you’re probably not getting as many hits.

Year | O-Swing% | Z-Swing% | Swing% | O-Contact% | Z-Contact% | Contact% | Zone%
2013 | 33.6% | 70.8% | 50.1% | 60.0% | 82.1% | 73.8% | 44.4%
2014 | 30.0% | 66.3% | 47.5% | 54.4% | 84.8% | 74.8% | 48.1%

If you look at Gyorko’s plate discipline, the story hasn’t actually been that much different from 2013. None of his plate discipline stats has moved by more than about 6 percentage points from 2013 to 2014. The overall contact rate has been steady. Gyorko is swinging at fewer pitches outside of the zone; however, when he does swing at those pitches, he’s making less contact than he did in 2013. For the most part it looks as though Gyorko’s plate approach has remained relatively consistent.
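To put numbers on that comparison, here is a small Python sketch that computes the year-over-year change in each plate discipline stat from the table above. The values are copied from the table; the variable names are just mine.

```python
# (2013, 2014) values in percent, copied from the plate discipline table.
discipline = {
    "O-Swing%": (33.6, 30.0),
    "Z-Swing%": (70.8, 66.3),
    "Swing%": (50.1, 47.5),
    "O-Contact%": (60.0, 54.4),
    "Z-Contact%": (82.1, 84.8),
    "Contact%": (73.8, 74.8),
    "Zone%": (44.4, 48.1),
}

# Change in percentage points from 2013 to 2014, rounded to one decimal.
changes = {stat: round(y2014 - y2013, 1) for stat, (y2013, y2014) in discipline.items()}

# The stat that moved the most in either direction.
largest = max(changes, key=lambda stat: abs(changes[stat]))

for stat, delta in changes.items():
    print(f"{stat}: {delta:+.1f}")
print("largest move:", largest)
```

The biggest swing is the drop in contact on pitches outside the zone, and every change stays inside the 6-point band.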

Jedd Gyorko » Heatmaps » RAA/100P | FanGraphs Baseball.

In 2013, Gyorko’s heatmaps indicated that he had success mainly on pitches low and inside. However, he hit pretty well on pitches inside most of the strike zone excluding pitches up and in or low and outside.

Jedd Gyorko » Heatmaps » RAA/100P | FanGraphs Baseball.

In 2014, Gyorko has struggled in nearly all locations in the strike zone. He has only had success with pitches that are low and inside, and even that location covers a pretty small area. For the most part Gyorko has not been able to punish anything inside the zone.

Overall, pitchers have been able to get away with throwing Gyorko strikes. However, the thing that is also mysterious about Gyorko is that the power has been gone. Even if Gyorko hasn’t been making a whole lot of contact, you would at least think that when he did make contact it would be going a long way. Thanks to Baseball Savant’s Pitch F/x tool, I was able to take a look at the velocities of the pitches on which Gyorko was hitting home runs. None of Gyorko’s home runs came off of pitches that were slower than 90 MPH.

Ironically, despite all of Gyorko’s home runs having come off of high-velocity pitches, he has struggled against fastballs this season. In 2013 Gyorko had a 3.6 wRAA against fastballs. In 2014, Gyorko has had a -8.3 wRAA against fastballs: nearly a 12-run difference.  The struggle against fastballs is something that is new for Gyorko, but what has remained steady for Gyorko between 2013 and 2014 has been the struggle against breaking balls. Gyorko has posted negative wRAA against every single type of off-speed pitch. When you can’t hit anything very well, and have never been able to hit off-speed pitches well, it makes the pitcher’s job very easy.

This dilemma is not something I know how to fix. It may be something mechanical or it may be something mental. Right now, Jedd Gyorko is on the disabled list taking care of a foot injury. Hopefully he can take advantage of his rehabilitation and make some adjustments to his swing. In my post a couple of months ago I mentioned Jedd Gyorko in the same sentence as Dan Uggla. This season Gyorko might be showing that he may never reach Uggla’s ceiling. He’s played like Uggla’s floor. However, the good news is that there is a whole second half of baseball, and Gyorko is still young. There’s still the chance that Gyorko can fix whatever it is that is making him perform terribly, and be the second baseman that breaks positional identities.


Roster Doctor: Baltimore Orioles

With the simultaneous (if temporary) collapses of the Yankee and Red Sox dynasties, the Baltimore Orioles hit the All-Star break with a very real chance of emerging atop the smoking wreckage of the AL East.  If they miss the playoffs it will be at least in part for one reason the Washington Nationals did so last year: too many bad plate appearances from second base. Jonathan Schoop, the O’s primary second baseman, is slashing  a putrid .219/.257/.322, good for the 16th best WAR among AL second basemen. While dumpster-diving Dan Duquette has found serviceable patches for catcher (Nick Hundley) and left field (the incredibly powerful alien inhabiting Steve Pearce), a solution at second base continues to elude him. Schoop’s head is barely above replacement level water thanks to his stellar defense, but his bat is missing more balls than Julio Cesar.

For now the organization publicly and vigorously defends Schoop, who may yet turn out to be a high-quality two-way player. Ryan Flaherty seems to have taken up residence in Buck Showalter’s split-level dog house, having started just 12 games in June and July. His unimpressive .647 OPS still beats Schoop’s by 50 points. The farm offers little immediate hope; the only O’s middle infield prospect besides Schoop in the team’s Baseball America top 30, Adrian Marin, appears overmatched for now in high-A.

Should the Duke decide to look outside the current roster, here’s a review of cellar-dwelling second basemen who may be on the block (contract status from Baseball Reference).

Chase Utley (.297/.354/.452   3.2 WAR) Signed thru 2015, 2 yrs/$25M (14-15) & 16-18 vesting option

Enjoying a Chipper Jonesian late-career resurgence, Utley remains the phace of the phading Phils. He also has a brutal contract and a full no-trade, so he might be cost-prohibitive even if Ruben Amaro were willing to trade him. (Utley has said he won’t waive his no-trade, but most players say that – Baltimore would be about the only place he could be traded and still spend homestands mostly at home.) If Amaro did trade Utley he would need to sleep in kevlar pajamas, so this move seems unlikely.

Darwin Barney (.224/.261/.316 0.2 WAR) 1st-Year Arb Eligible, 1 yr/$2.3M (14)

Here’s something about Darwin Barney you might not have known: he doesn’t just do crosswords, he creates them. Here’s something about Darwin Barney you almost certainly know: he just can’t hit. At all. With essentially the same skill set as Schoop is showing this year, he’s not an option for the O’s. Another Cubs middle infielder, Arismendy Alcantara, would probably make Duke salivate, but AA would cost the Orioles at least two of their top three pitching prospects. With Kevin Gausman now firmly entrenched in the rotation (thanks to Ubaldo Jimenez’ heaven-sent trip to the List) he is almost certainly off the block. Dylan Bundy and Hunter Harvey together may be too high a price to pay for a still-raw position player, and one of them alone probably won’t be enough for Theo to pull the trigger.

Aaron Hill (.238/.273/.351 -0.9 WAR) Signed thru 2016, 5 yrs/$46M (12-16)

Aaron Hill’s principal remaining function in baseball is to serve as a warning to others. Disappearing bat speed, immobility in the field, and an albatross contract mean there’s really nothing to see here. Perhaps the O’s think they can fix Hill’s bat, but his 4:1 K/BB ratio suggests otherwise.

DJ LeMahieu (.279/.337/.346 1.1 WAR) Pre-Arb Eligible, 1 yr/$501k (14)

No one has unlocked the secret to winning at Coors yet, but loading up on heavy-groundball starters and assembling a stellar infield defense might be one of the few approaches that Dan O’Dowd hasn’t tried yet. LeMahieu would be a key component of any such strategy. LeMahieu is only 25 and still plays for the MLB equivalent of free; it would almost certainly take a significant package for the O’s to pry him away from the Rox. One problem the Orioles face is that their top-heavy system makes it hard to go after a guy like LeMahieu. He’s not worth any of the top 3 pitchers, and the O’s have little else that would entice a team to part with a solid but unspectacular player. (Christian Walker is raking in AA; maybe he could be part of the answer.) The Rox also have Josh Rutledge, who plays all the infield positions badly but can hit a little. He could form an offense/defense platoon with Schoop, and might be available at a reasonable cost.

Ben Zobrist (.268/.353/.406 2.7 WAR) 5 yrs/$23M (10-14) & 15 team option

In theory, Zobrist is the perfect answer for the Orioles — a short-term rental who could spur their pennant run while Schoop sorts things out at AAA. In practice, of course, he’s in the Orioles’ division. While the Rays have said they are even willing to trade David Price within the division, they have also said they will exact an intra-division premium. The same is presumably true for Zobrist. If he’s traded to a team with orange on their uniforms, it will probably be the Giants.

Brian Dozier (.237/.340/.414 2.7 WAR) Pre-Arb Eligible, 1 yr/$540k (14)

Dozier went from afterthought to asset by jumping his walk rate up this year (12.6% as compared to his career rate of 8.6%). Eddie Rosario’s plan to be the Twins’ starting 2B in 2015 went up in smoke earlier this year, and he has struggled in AA since returning from his suspension. (According to one of the better baseball headlines this year, Terry Ryan has offered “high praise” for Rosario since his return.) So Dozier is both more valuable and less expendable now than he seemed in spring training. The Twins’ minor league system is one of the best in the majors, so it’s hard to see a match here except in the unlikely event the O’s would be willing to part with one of the Big Three for Dozier.

It seems unlikely that any second baseman on the Texas Rangers would be a good trade fit. Rougned Odor, though struggling now, is presumably untouchable. Luis Sardinas has a bright future, but right now it’s unlikely he would be much of an upgrade over Flaherty, who the O’s can start without giving up any talent.

This list is obviously not exhaustive, but it suggests that Duquette’s options outside the organization may be little more appealing than the internal ones. In his tenure as Orioles GM, Duquette has shown a surprising ability to pull rabbits out of his baseball cap. How he solves the O’s second base conundrum will be one of the small but fascinating dramas to follow as this year’s trade deadline draws near.


The Cubs are Bettin’ on Bats

The Cubs are a team that is best described in the future tense. That is not to say that they are completely unwatchable at the major league level; they have a budding star first baseman in Anthony Rizzo and an enigmatically talented shortstop in Starlin Castro. But it is the players that have not yet reached The Show that intrigue baseball fans. Since trading Jeff Samardzija and Jason Hammel for wunderkind SS prospect Addison Russell and others, the mystique and potential of the Cubs system has increased dramatically. They have an amazingly talented and deep farm that, according to prospect wizard Keith Law, has the No. 5, 8, and 9 prospects along with many more in the top 100. Almost all of those players have something in common: their jobs are to crush baseballs and eat planets.

Besides C.J. Edwards (acquired in the Matt Garza heist), whether the Cubs become a great team will depend on whether those prospects hit. This is why many thought that Theo Epstein and Jed Hoyer would target a club with pitching prospects to send back in a trade. It seems, however, that such a deal never materialized, so the front office did the smart thing and traded their two talented pitchers for the best overall assets, which ended up being Addison Russell and co. In the process they created an interesting case study on the farm system composition of rebuilding teams. For this piece I’ll look at the Cubs with their hitter-heavy system, the Astros with their more balanced system, and the Orioles’ pitcher-heavy system.

What is perhaps the most important caveat to remember, though, is that GMs don’t get their way every time; assembling a farm system does not happen in a vacuum. The Cubs, Astros, and Orioles composed their farm systems with the parts that were available to them, and who knows how each decision maker would build his ideal farm system. Each of the three franchises, however, does have amazing talent in its minor league system, and if everything breaks right those clubs will be well equipped to compete for the foreseeable future.

The way the Cubbies have constructed their farm could be described as putting all of their eggs in one basket. After all, it’s great if you can average 5 runs a game, but if you can’t get anyone out it’s a moot point. But the kind of eggs the Cubs are investing in are much less fragile than the pitching prospect variety. We live in a baseball age where fans fear the words “elbow soreness” and worry about their favorite pitcher throwing too many breaking balls. That is not to say that hitting prospects don’t get injured (just look at Miguel Sano and Carlos Correa), but as a whole hitters seem less likely to spontaneously explode. The Cubs front office knows that can’t-miss prospects do indeed miss all the time, but by having such a large amount of hitting talent they can hope at least a few of them will reach All-Star levels.

The Astros’ farm system is also very deep and talented, like the Cubs’, but their top players are a mix of pitchers and hitters. Even after the recent graduations of Springer, Singleton and Santana (who promptly spilled his cup of major league coffee on himself), they still have Correa in the minors along with Aiken, Appel, and Foltynewicz to make a pretty enticing next generation of Astros. This is a more even approach than the Cubs’, one that allows for the inevitable disappointment of a couple of those big names by having depth in both batters and hurlers. Unfortunately, Aiken apparently has an elbow ligament injury and has not even taken the mound yet. This, along with the Correa injury, takes out the headliners of both their pitching and hitting departments.

To be fair, those two players just happened to get hurt around the time of this piece, so in a sense I am cherry picking a bit. But it goes to show just how much has to go right for prospects to make an impact in the majors, and by diversifying your assets you can sometimes spread yourself a little thin. Nothing is worse than watching a player get hurt, but thankfully modern medicine has come a long way and odds are that both of those prospects will again be healthy and productive. However, nothing is a sure bet, and injuries that require surgery are serious by definition.

The Orioles’ minor league system is not in the same class as the Cubs’ or Astros’, but it does have three pitchers that are considered to be top-of-the-line starters, if not outright aces. Dylan Bundy, Kevin Gausman, and Hunter Harvey are the pitchers Baltimore is hoping to have anchor its staff by 2016. Those guys each have filthy stuff, and in a hitter-friendly environment like Camden Yards, having dominant pitching is especially valuable. While the Orioles’ hitting prospects are nothing to write home about, not many other systems (if any) can boast the top-of-the-line pitching the Orioles have on hand.

But like any top-heavy system there is the concern of injury wiping out the crème de la crème and leaving the team with next to nothing. Already Bundy has gone under the steady hand of Dr. James Andrews (and has looked great so far, especially considering it hasn’t been a full year since he underwent surgery), and Harvey is still in Low-A ball with plenty of time between now and the majors. Gausman, on the other hand, has already pitched for the Orioles and at times has been excellent, which makes the team’s handling of him curious to say the least. While having all three of those guys become aces seems unlikely, even if only two of them reach their potential that would still give Baltimore a pair of feared fire-breathing hurlers to hold court in the AL East. On the other hand, I’m sure most still remember Generation K back in 1995 and the promise they showed, and while that is an oversimplified comparison it is a reminder of how pitching prospects can break your heart.

Another factor that I believe shows that building a farm system around hitters is the way to go is the group of players likely to test free agency in the next couple of years. Rarely do elite position players enter free agency, and if they do, they do so with their best years likely behind them and cost the GDP of countries to sign. That is not to say that elite pitchers are flooding the free agent market, but the pitching talent that will be on the free agent market is indubitably better than the hitting. For your entertainment, here are a couple of the best hitting free agents-to-be in the 2015 class and their 2014 WAR so far in parentheses (I have not included players that have any sort of option for 2015): Victor Martinez (2.5), Adam LaRoche (1.1), Chase Headley (1.1), Hanley Ramirez (2.4), Russell Martin (2.1), Melky Cabrera (1.8).

If your eyeballs still work after reading that, remind yourself that all those guys are going to be at least 30 years old when the 2015 season starts and many have injury histories. Sure, V-Mart is a great hitter, but he is 35 and almost strictly a DH at this point. Ramirez can be a real difference-maker when healthy, but he unfortunately hasn’t been able to stay on the field the last two years. The free agent pitching class is headlined by Max Scherzer, James Shields, Jon Lester and the immortal Edinson Volquez. While Scherzer, Shields, and Lester all have their warts, they have the potential to anchor a staff for at least a few more years. And in 2016 there are some incredibly attractive starting pitchers who could test the market.

So while having the arms the Orioles can trot out or the excellent combination of hitting and pitching the Astros have on the farm is an enviable position for a GM, having a surplus of athletic hitting prospects who can play multiple positions like the Cubs have seems to be the safest approach to building a major league roster. For a club like the Cubbies that has suffered for years you can’t help but hope this incoming tsunami of talent will be the core of their next great team. And in the process perhaps the idea of hoarding hitting prospects in a time when scoring runs is at a premium will be copied by other franchises looking to rebuild. Until then the Cubs doubling down on bats will be a fascinating storyline.


Breaking Down the Aging Curve Some More

Now that I have gone through the individual cohorts in parts 1, 2, 3, and 4 (click them if you need some background on what I am doing), it is time to pull things together. To start I will show you three charts with some simple, and I don’t think overly shocking, things to remember. Then I will get into some regressions that will hopefully help explain what I think is going on. Keep in mind throughout that the groups that should be trusted most are the larger cohorts, those with first full seasons at ages 22 to 26. The others might have some sample size issues, and you will see in these charts that the 19- and 20-year-old cohorts don’t behave well in almost all cases.

First up is this:

 photo 1stYearofMaxByCohort_zps2f9ded4d.jpg

 

If you look at the average percent of max for each cohort in their first season, it shows an upward sloping line for both hitting skill and overall value. The younger cohorts are therefore farther from their peak production when they show up in the league and should be expected to grow if they stick around. You see much higher percentages for wRC+ than for WAR, mostly because of differences in scaling and volatility. Going from 1 WAR to 2 WAR is a 100% improvement and not terribly hard to do. Going from 80 wRC+ to 160 wRC+ is much, much harder. One standard deviation for wRC+ is about 25% of the average, while for WAR it is almost 100% of the average, so wRC+ is significantly less volatile in relative terms.
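To put rough numbers on that scaling point, here is a minimal Python sketch. The means and standard deviations are assumptions built from the rules of thumb above, not values pulled from the data set: a league-average wRC+ of 100 with a spread of about 25% of average, and an assumed average of 1.5 WAR with a spread of about 100% of average.

```python
# Rough illustration only: the means and spreads below are assumptions
# based on the rules of thumb in the text, not values from the data set.
WRC_MEAN, WRC_SD = 100.0, 25.0   # SD assumed to be ~25% of average
WAR_MEAN, WAR_SD = 1.5, 1.5      # SD assumed to be ~100% of average


def jump_in_sds(start, end, sd):
    """Size of a season-to-season jump measured in standard deviations."""
    return (end - start) / sd


def pct_change(start, end):
    """Percent improvement from start to end."""
    return 100.0 * (end - start) / start


# Both jumps are a 100% improvement...
war_pct = pct_change(1.0, 2.0)
wrc_pct = pct_change(80.0, 160.0)

# ...but they are very different sizes relative to normal variation.
war_jump = jump_in_sds(1.0, 2.0, WAR_SD)
wrc_jump = jump_in_sds(80.0, 160.0, WRC_SD)

print(f"WAR 1 -> 2: {war_pct:.0f}% improvement, {war_jump:.2f} SDs")
print(f"wRC+ 80 -> 160: {wrc_pct:.0f}% improvement, {wrc_jump:.2f} SDs")
```

Under those assumptions, doubling WAR is a move of well under one standard deviation, while doubling wRC+ is a move of more than three.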

Because of those characteristics and the randomness around true talent level, a cohort averaging 50% of max WAR might already be at its peak true talent level at 24 or 25 years old. Volatility makes it hard for WAR to get very close to 100%, but the hitting gets much closer. Anyway, players coming up later are much closer to their peak on average and just don’t have much room to grow. Next let’s look at the two stats, starting with wRC+, at overall level rather than percent of max production:

 photo 1stwRCVSmax_zpsb9114e86.jpg

 

In the first full season each cohort performs at a very similar level, and the older cohorts might actually slightly outperform the younger. That is a pretty flat line for first year average. If you take each player’s best season, though, the younger cohorts destroy the older cohorts. Every cohort before age 25 has an average best of 120 wRC+ or better, so most of the players in those cohorts are going to put up at least one season in the range of Chase Utley over the last 2 years, which is pretty good. After that the difference between the average of the first full season and the peak shrinks down to 10 to 20 wRC+, well within one standard deviation, so the peak looks more like a season where luck pushed a player above average rather than a change in expected performance level. That’s why we saw players in the cohorts after 24 seem to be at peak and only decline after entering the league. WAR behaves similarly:

 photo 1stWARvsMaxWAR_zpsd7bc79b6.jpg

 

Again, 19 and 20 year olds are few and far between, but seriously, an average best season of 5 to 6 WAR is pretty staggering, as last year only 12 position players made it to 6 WAR or better. On average the cohorts mostly show up around 1.5 WAR in their first season, and again the older cohorts probably are a little better in their first year. The best season averages are again much better, with a downward slope that starts to flatten out in the mid to late 20s, and I think it is easier to see on this chart than the first. On average players enter the league at about the same level as hitters and as overall producers, but those who can manage that at a younger age (before 25) generally go on to higher performance levels than the players who debut older.

Next I am going to show three regression outputs to try and explain what I think is important to remember for aging of players.  I will try to explain what I am doing so that if you don’t have a background in regression analysis you can still get the point.  If you do have a regression background, know that I am focusing on a couple of key ingredients so they are not intended to be perfect models.  Mostly I am trying to use data to illustrate a point.

 photo REG1_zps32536599.jpg

 

So first I went back to all data and ran this OLS specification with wRC+ as the dependent variable. I was looking at two things. We expect age to affect players in a nonlinear fashion (aging CURVE), so I put in an age and an age squared term, and I did the same for experience, where the 1st year in the big leagues is 1, the 2nd is 2, etc. The AL and NL dummies are probably not necessary, since league is already controlled for in wRC+, but I had them in dummy variable form so I just went ahead and stripped that part out. Then I added interaction terms where I multiplied age and experience to see if the combination of the two is important rather than them acting independently. The only term that came back insignificant was experience squared, which gave experience a purely linear relationship to hitting performance and also shows why this would be a bad model to lean on in predicting player performance.
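As a rough sketch of what one row of that specification could look like, here is a small Python example. The function and column names are hypothetical, the league dummies are left out, and only a single age-times-experience interaction is shown, which is an assumption about the exact set of interaction terms.

```python
def design_row(age, experience):
    """Regressors for one player-season, under an assumed specification.

    experience is 1 for a player's first full season, 2 for the second, etc.
    The league dummies mentioned in the text are left out here.
    """
    return {
        "age": age,
        "age_sq": age ** 2,
        "exp": experience,
        "exp_sq": experience ** 2,
        "age_x_exp": age * experience,  # interaction term
    }


def cohort_rows(debut_age, seasons):
    """Rows for a generic player who debuts at debut_age and plays `seasons` years."""
    return [design_row(debut_age + year - 1, year) for year in range(1, seasons + 1)]


for row in cohort_rows(debut_age=21, seasons=3):
    print(row)
```

For a player debuting at 21, age times experience goes 21, 44, 69 over the first three seasons, so the interaction column grows much faster than either input on its own.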

The coefficient for experience is 17.4, so the model is saying each year of experience helps the player’s wRC+ increase by an average of that amount. The other factors, age and the age/experience interaction, are negative and work against that, but this strong positive experience coefficient makes it so that if you model out a generic player of any cohort, they get better at hitting for an unreasonable amount of time before the negative coefficients catch up, which they eventually do because age*experience as a multiplier is getting bigger faster. For the age 21 cohort, the first year a player would be predicted to decline would therefore be year 13 at age 33, and for the 27 cohort year 10 at age 36, going against everything we know.

This is, I think, mostly due to survivor bias (I have discussed this before). Let me show you what causes this with another regression output. In this one I intentionally bias the sample by only including players who have 10 or more full seasons. This reduces my original number of players from 2,054 down to 390, so about 19% of position players that get a full season end up with 10 or more for their career according to this set of players, and they have an inordinate effect on a regression of the whole group.

 photo 10plusyearREG_zps46d068c3.jpg

 

In the first regression there were 11,379 observations (player years), but 5,097 came from this group of players that made it 10+ years.  That means 19% of the players are making up almost 45% of data being used!  They are also in general the best players, which is why they stuck around for so long and thus made it look like experience was a huge positive above.  Within just these players you see that effect is still strong with an experience coefficient of 14.6, but it is no longer linear as experience squared is now a significant negative showing the curve I would expect of experience.  Experience, at least in my expectation, should be beneficial to a player, but have diminishing returns (less effect in each year of experience) and this model shows that.  If you play this model out for the same cohorts I did before it does a better job of showing the peak in the mid 20s, but then continuing production for a lot longer than we would expect for an average player.  That’s fine, I just wanted to show why it is hard to tell how the general player ages because of the undue power of the players who stick around for so long.
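The arithmetic behind those shares is quick to check. This Python snippet uses only the counts quoted above; the per-player averages are simple derived figures, not numbers reported in the regression output.

```python
# Counts taken from the text above.
all_players = 2054
long_career_players = 390      # 10 or more full seasons
all_player_years = 11379
long_career_player_years = 5097

share_of_players = long_career_players / all_players
share_of_observations = long_career_player_years / all_player_years

# Average number of player-years contributed by each group.
years_per_long_career = long_career_player_years / long_career_players
years_per_other = (all_player_years - long_career_player_years) / (
    all_players - long_career_players
)

print(f"Share of players: {share_of_players:.1%}")
print(f"Share of observations: {share_of_observations:.1%}")
print(f"Player-years per 10+ season player: {years_per_long_career:.1f}")
print(f"Player-years per other player: {years_per_other:.1f}")
```

Each long-career player contributes roughly 13 rows to the full regression against fewer than 4 for everyone else, which is where the outsized pull comes from.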

Finally, I want to show you one more regression and discuss some things I think are important for aging in baseball players.  In this one I focused on differencing of wRC+ (e.g. year 2 minus year 1) and created a variable called sustained.  Sustained is a dummy variable that shows years in which a player was better than a previous wRC+ level in two consecutive years.  So if a player had a wRC+ of 100, then 112 the next year and 108 the next it was sustaining higher performance.  Also, since I am using differences in wRC+ instead of the values themselves all 1st year player data is gone since there is nothing to difference it from.  This could be considered as biasing data again, but since we are looking at aging curves players need to stick in the league to see anything so I am doing a study only on those players rather than one and dones.  Here is the output, then more discussion:

 photo REGlogit_zpscaff049a.jpg
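For readers who want to see how a variable like sustained could be coded, here is a minimal Python sketch. It reflects one reading of the definition given above (two straight seasons above an earlier wRC+ level), and the function name and indexing are assumptions rather than the exact construction behind the output.

```python
def sustained_flags(wrc_plus_by_season):
    """Flag seasons that complete a two-year run above an earlier level.

    Assumed reading of the definition: season i gets a 1 when both season
    i-1 and season i beat the wRC+ posted in season i-2, otherwise 0.
    The first two seasons have no flag because there is nothing to compare.
    """
    flags = []
    for i in range(2, len(wrc_plus_by_season)):
        baseline = wrc_plus_by_season[i - 2]
        held = wrc_plus_by_season[i - 1] > baseline and wrc_plus_by_season[i] > baseline
        flags.append(1 if held else 0)
    return flags


# The example from the text: 100, then 112, then 108.
print(sustained_flags([100, 112, 108]))  # [1]
# A player who improves once and then falls back below the old level.
print(sustained_flags([100, 112, 95]))   # [0]
```

Under this reading, a one-year spike that is given back the next season does not count, which is the distinction between a lucky year and a new performance level.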

 

Sustained is now the dependent variable, and it is a binary variable, so I had to move to a logit model. That means the coefficients are now hard to interpret directly, since they are log odds of the sustained outcome rather than actual units of wRC+ as before. This model does show what I believe to be the case after breaking all of the aging curve into age cohorts. It does not show age or age squared as significant; it is showing that experience matters and that the interaction of experience and age matters. Players who can get major league experience benefit most from getting that experience younger. There is an obvious endogeneity issue here, in that it may be the other way around: players that can get to the majors younger are better players. I think there is truth in both statements though.
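Since log odds are not intuitive, here is a short Python sketch of how a logit coefficient translates into odds and probabilities. The coefficient and baseline used are hypothetical numbers chosen for illustration, not values from the output above.

```python
import math


def log_odds_to_probability(log_odds):
    """Convert a logit model's linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a regressor."""
    return math.exp(coefficient)


# Hypothetical numbers for illustration only, not taken from the output above.
example_coefficient = 0.5
baseline_log_odds = 0.0   # log odds of 0 means a 50% chance

before = log_odds_to_probability(baseline_log_odds)
after = log_odds_to_probability(baseline_log_odds + example_coefficient)

print(f"Odds ratio: {odds_ratio(example_coefficient):.2f}")
print(f"Probability before: {before:.3f}, after: {after:.3f}")
```

The same coefficient moves the probability by different amounts depending on the starting point, which is why the raw coefficients cannot be read as units of wRC+.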

Yes, a player who can handle playing at the major league level at a younger age is likely better and should have a higher expected peak. On top of that, though, the model here is showing that the experience for such a player may also matter. Playing against better competition makes players better; this is a commonly held belief, and there is research to back it up if you want to go over to Google Scholar and read some formal pieces on the topic. For an anecdotal example let’s look at a couple of players. Jose Guillen came up at 21 and muddled around for several years, posting 82, 83, 67, and 88 wRC+ numbers from 1997 through 2000, only got 145 and 259 plate appearances the next two years, and then finally put up a 138 wRC+ followed by three more above average seasons. Around the same time there was a guy named Travis Lee who was not in the majors until 23 and posted a 102 wRC+ as a rookie. He hung around for a while with a peak of 112 wRC+ in 2003, but had a pretty unspectacular career.

Would Travis Lee have been able to put up an 82 wRC+ a couple of years before his 102 at age 23? I have no idea, but it is possible that if he had, and had gotten two more years of experience before his 1998 rookie season, he might have developed very differently. The interaction term of age and experience is therefore very important in my opinion. The model shows that experience is an arc: the probability of reaching new sustained performance levels first increases, peaks, and then decreases. If you look at it in conjunction with the age times experience term and the squared term of age and experience, it shows that the probability of reaching a new and higher level of production is higher for a younger cohort (I’ll forgo posting the numbers for expediency), peaks in the mid 20s, and then drops off fairly quickly. That is what the aging curve probably looks like based on all I have done so far.


Dominant Players (a la XKCD)

With apologies to Randall Munroe:

Dominant players


If you’d like to make your own graph like this one, I’ve pasted the R code I used here.