Everyone knows that a strong farm system is key to the long-term success of a major league organization. A strong system makes it possible for a club to field competitive teams at affordable salaries and stay beneath the luxury tax threshold, but how much value can an organization truly expect from its farm system? How much more value do the best farm systems generate than the worst ones? I decided to take a closer look.
The first thing I did was gather the player information and rankings from Baseball America’s Prospect Handbooks from 2001-14 and enter them into a database. I then found each ranked player’s total fWAR over the next six seasons and added those totals together to find the value that each farm system produced. I chose six seasons so that teams wouldn’t get credit for a player’s non-team-controlled years, since the value produced in those years would not be guaranteed to the player’s current organization. This method reduces the credited value of players who are further away from the majors, since fewer of their six seasons are spent in the big leagues. However, the purpose of this analysis is to measure the value of the entire farm system, not an individual player’s value over the course of his career.
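The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author’s actual database code. The input layout is hypothetical: `rankings` holds (handbook year, organization, player ID) tuples, and `war_by_season` maps each player to a season-to-fWAR dict. The assumption that the six-season window starts with the handbook year itself is also hypothetical.

```python
from collections import defaultdict

def farm_system_value(rankings, war_by_season, window=6):
    """Sum each ranked prospect's fWAR over a fixed window of seasons.

    rankings: iterable of (handbook_year, org, player_id) tuples (hypothetical layout).
    war_by_season: dict mapping player_id -> {season: fWAR}.
    Assumes the window starts with the handbook year itself.
    """
    totals = defaultdict(float)
    for year, org, player in rankings:
        seasons = war_by_season.get(player, {})
        # Seasons outside the window (non-team-controlled years) are ignored.
        totals[(year, org)] += sum(seasons.get(year + k, 0.0) for k in range(window))
    return dict(totals)

# Toy example with made-up IDs and numbers, not real Twins data.
rankings = [(2014, "MIN", "prospect_a"), (2014, "MIN", "prospect_b")]
war = {"prospect_a": {2014: 1.0, 2019: 2.0, 2020: 5.0}, "prospect_b": {2015: 0.5}}
print(farm_system_value(rankings, war))  # {(2014, 'MIN'): 3.5}
```

The 2020 season is dropped because it falls outside the six-season window that starts in 2014.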
Let’s look at the 2014 Minnesota Twins as an example. Below is a list of the thirty players that were ranked and the amount of WAR that each player has produced by season.
Prospects are the lifeblood of any baseball organization. They can provide large amounts of value to their team while earning a fraction of what they could make on the open market. This gives a huge competitive advantage to teams with a superior player development system. Every organization has a different plan for its prospects. The purpose of this research was to determine which development plan yields the most production during a player’s cost-controlled years for each group of players.
The first step in gathering the data was to find every hitter who debuted from 1995-2009. I stopped at 2009 so that the data would cover most of each player’s cost-controlled years. I chose to start in 1995 because it gave me a large sample size while avoiding the strike-shortened 1994 season. Next, I omitted anyone who debuted at the age of 29 or older. Players over 28 are usually not considered prospects, and their clubs would not view them as future building blocks for the organization.
The final step was to eliminate anyone who never exceeded the rookie limits. I omitted these players because any player who could not amass 130 at-bats in his career was probably never considered a serious prospect. If he had been, at least one team would have given him more opportunities to earn a starting job.
To determine a player’s production during his cost-controlled years, I found the season in which each player exceeded his rookie limits and added the next five years of WAR to his total. If the player had major league experience before the season he lost his rookie status, I included those numbers as well. For a player’s minor league plate appearance total, I included all of his plate appearances from the start of his professional career up to and including the year he lost his rookie status.
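The three selection filters can be sketched as a single predicate. This is a simplified, hypothetical version. The `Hitter` record is invented for illustration, and the rookie test uses only the 130 at-bat threshold, ignoring the service-time part of the official rookie limits.

```python
from dataclasses import dataclass

@dataclass
class Hitter:
    """Hypothetical record holding only the fields the filters need."""
    name: str
    debut_year: int
    debut_age: int
    career_ab: int  # career major league at-bats

def is_eligible(h: Hitter, first_year=1995, last_year=2009,
                max_debut_age=28, rookie_ab_limit=130) -> bool:
    """Debut window, debut age of 28 or younger, and exceeding the rookie at-bat limit."""
    return (first_year <= h.debut_year <= last_year
            and h.debut_age <= max_debut_age
            and h.career_ab > rookie_ab_limit)
```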
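A minimal sketch of the two per-player totals follows, assuming season-keyed dicts of major league WAR and minor league plate appearances. The text does not say whether the rookie-exceeding season is the first of the "next five years." This sketch counts it as the first, so treat `window` and that starting point as assumptions.

```python
def cost_controlled_war(mlb_war, rookie_exceed_year, window=5):
    """WAR from every season before the rookie-exceeding season, plus `window`
    seasons starting with it (an assumed reading of "the next five years")."""
    return sum(w for season, w in mlb_war.items()
               if season < rookie_exceed_year + window)

def minor_league_pa(milb_pa, rookie_exceed_year):
    """All minor league PA up to and including the rookie-exceeding season."""
    return sum(pa for season, pa in milb_pa.items()
               if season <= rookie_exceed_year)
```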
I then broke up the data into three player groups: players who attended college, American-born players who did not attend college, and foreign-born players who did not attend college. Throughout the rest of this article, I will simply refer to these groups as college players, high school players, and international players.
Next, I partitioned the data by minor league plate appearances, splitting them into groups of 500. I chose this width because 500 plate appearances is a reasonable proxy for a full season, and it divides the players fairly evenly among the groups.
I’ll start with a simple overview of total player production over the cost-controlled years. The table below shows the median WAR for each grouping. I use the median instead of the average throughout this article because WAR is right-skewed rather than normally distributed. A handful of stars would pull the average well above what a typical player produces.
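The grouping and bucketing rules can be expressed as two small helpers. This is a sketch: the function names are hypothetical. The bucket labels follow the "1001-1500" style used later in the article, and labeling the lowest bucket "0-500" is an assumption.

```python
def player_group(attended_college: bool, born_in_us: bool) -> str:
    """Map a player to the article's three groups."""
    if attended_college:
        return "college"
    return "high school" if born_in_us else "international"

def pa_bin(pa: int, width: int = 500) -> str:
    """Label a minor league PA total with its 500-PA bucket, e.g. '1001-1500'."""
    if pa <= width:
        return f"0-{width}"
    lower = ((pa - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"
```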
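A short Python sketch shows both the median table and why the median was preferred. The rows are assumed to be (group, PA bucket, WAR) tuples, and the five-player sample is made up purely to show how one star skews the mean.

```python
from collections import defaultdict
from statistics import mean, median

def median_war_table(rows):
    """rows: iterable of (group, pa_bucket, war). Returns {(group, pa_bucket): median WAR}."""
    buckets = defaultdict(list)
    for group, pa_bucket, war in rows:
        buckets[(group, pa_bucket)].append(war)
    return {key: median(values) for key, values in buckets.items()}

# Made-up, right-skewed sample: one star among four modest players.
sample = [0.0, 0.5, 1.0, 1.5, 30.0]
print(mean(sample), median(sample))  # the star drags the mean far above the median
```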
Median WAR for All Players
As you can see in the table above, college players need the fewest plate appearances to produce a high level of WAR, but their production declines sharply once they amass more than 2,500 plate appearances. It makes sense that this group is the quickest to develop, because these players have had several more years of amateur competition to hone their skills for professional baseball. This should create a smoother transition and reduce the number of plate appearances needed to become a valued member of the major league club.
High School Observations
Unlike their college counterparts, American high school players need an extra 500 plate appearances before they reach their peak value of 15.4 WAR. However, high school players also have a wider range of success than either college or international players, and they produce more than the other two groups. This result may seem counterintuitive, since it is commonly accepted that high school players are riskier prospects than college players. It is important to remember that this process does not account for all of the high school prospects who never receive an at-bat in the majors. This creates a selection bias: we only look at the players who were good enough to reach the majors in the first place. In other words, if a high school player is good enough to make it to the majors, he’s probably going to be a productive major leaguer.
The international group produces the least. I believe several factors contribute to this result. One of the main factors could be that many of these players have not played as much organized baseball as their counterparts. There could also be a language barrier that makes it more difficult for an organization to teach foreign players than their English-speaking teammates. Of course, that is pure speculation on my part, but I believe it is a reasonable assumption.
Total Player Summary
As the table above shows, the longer a prospect stays in the minor leagues, the less chance he has of making an impact in the major leagues. This makes sense, because a prospect who is outperforming everyone in the minor leagues will be called up to help the major league club much sooner than everyone else. This leads me to believe that this table may not be the most informative for every minor leaguer. Perhaps if we segment the data into Baseball America’s top 100 prospects and everyone else, we will get a more accurate picture of minor league development. It is essential to remember that the more we split the data, the fewer players fall into each group and the less reliable each median becomes. Therefore, we should not take the individual WAR values for each grouping too seriously. It is more important to look at the overall pattern in the tables below before drawing any conclusions about player development.
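The top 100 split can be sketched by adding a segment key and reporting each cell’s sample size next to its median, so thin cells are easy to spot. The row layout and the `min_n=10` reliability threshold are hypothetical choices for illustration; the article does not use a formal cutoff.

```python
from collections import defaultdict
from statistics import median

def segmented_medians(rows, min_n=10):
    """rows: iterable of (is_top100, group, pa_bucket, war).

    Returns {(segment, group, pa_bucket): (n, median_war, enough_players)}, so that
    thin cells produced by further splitting are easy to spot.
    """
    buckets = defaultdict(list)
    for is_top100, group, pa_bucket, war in rows:
        segment = "Top 100" if is_top100 else "Non-Top 100"
        buckets[(segment, group, pa_bucket)].append(war)
    return {key: (len(v), median(v), len(v) >= min_n) for key, v in buckets.items()}
```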
Median WAR for Top 100 Prospects
Top 100 Prospects Summary
Yet again, we see that college players develop the quickest and that high school players take a little longer. College players also show a sharp drop in production after 1,000 plate appearances, but they still yield the highest production of the three groups. International prospects are a bit of a mystery here; there does not seem to be any pattern in their production. I assume this is because baseball development differs greatly among South American, Japanese, Canadian, and other nations’ prospects. I may revisit this issue in the future, but for now I’ll have to make do with what I have.
Median WAR for Non-Top 100 Prospects
Non-Top 100 Prospects Summary
As expected, we see a dramatic drop in WAR across the board, which suggests that Baseball America is usually correct when identifying the most impactful future major league players. Kudos to you, Baseball America. We also observe that these players develop a bit more slowly than their more heralded counterparts. These college players still peak early, but 500 plate appearances later than the top college prospects. High school players now take even longer to develop, peaking at 2.8 WAR in the 2001-2500 plate appearance group, as opposed to 15.4 WAR in the 1001-1500 plate appearance group for the top high school prospects. International players are much more consistent in this table than in the previous one. Unfortunately, they also have the lowest overall median WAR, at 0.1.
So let’s do a quick recap. Usually, the less time a player spends in the minors, the more productive he will be in the majors. High school prospects offer the most production, international prospects offer the least, and college prospects fall somewhere in between. We also observed that college prospects develop the quickest, high school prospects develop a little slower, and international prospects are a bit of a mixed bag. I attributed that last result to combining all foreign-born players into one group instead of splitting them by nation or continent. I hope this article has been informative and that it provides some guidance on when teams should consider calling up their most prized assets.
Baseball is a game driven by stars. They create the most exciting highlight reels, captivating audiences and leaving us all in awe. Eventually, however, every star loses his battle with Father Time. The purpose of this research was to determine when a star player’s production declines to the point where he becomes easily replaceable. To find when this happens, I used a technique called survival analysis.
Survival analysis estimates the probability that an event has occurred by a given point in time. In any survival analysis problem, you need to define three things: the requirements for inclusion in your population, the event itself, and the variables used to predict the time until the event.
For this problem, I included in my population any player who recorded his first season of 4 WAR or higher between 1920 and 1999. For my variables, I used the age at which the player recorded his first star season, body mass index (BMI), offensive runs above average per 150 games, and defensive runs above average per 150 games. The event I chose to predict was the player’s first season below 1 WAR following his star season. The cutoffs for stars and scrubs were fairly arbitrary. I chose them because the FanGraphs glossary loosely defines an All-Star season as 4-5 WAR and a scrub season as 0-1 WAR.
Determining the variables was much more difficult. I wanted variables that would represent a player’s performance, age, and overall health. Age was simple enough to find, but injury histories were hard to come by, so I calculated each player’s BMI from his listed height and weight. Obviously this isn’t a perfect representation, because a player’s weight changes throughout his career, but it’s the best I could do with my limited resources. To limit the number of performance variables, I settled on the offensive runs and defensive runs components of WAR. However, since these are counting statistics, I had to convert them to rate statistics to avoid correlation issues with the age variable in the model. I would have liked to use more offensive variables, but I feared that adding more inputs would make the model too convoluted and hurt the accuracy of the player predictions. Alright, that’s enough preparation; let’s dive into the actual data.
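Turning a career into a survival record (entry season, time until the event, and whether the event was observed) can be sketched as below. Three details are assumptions, not stated in the article:

- Time is counted in seasons played rather than calendar years, which sidesteps gaps such as military service.
- Only a player’s first star season is checked against the 1920-1999 window.
- A star who never drops below 1 WAR is treated as censored at his last season.

```python
def star_to_scrub(career, star=4.0, scrub=1.0, first_year=1920, last_year=1999):
    """career: list of (season, war) pairs.

    Returns (entry_season, seasons_until_scrub, event_observed), or None if the
    player's first star season is missing or falls outside the study window.
    Time is counted in seasons played after the star season (an assumption).
    """
    seasons = sorted(career)
    for i, (season, war) in enumerate(seasons):
        if war >= star:
            if not first_year <= season <= last_year:
                return None
            for j in range(i + 1, len(seasons)):
                if seasons[j][1] < scrub:
                    return season, j - i, True
            # No scrub season before the career ended: treat as censored.
            return season, len(seasons) - 1 - i, False
    return None
```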
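The two derived variables are simple formulas. Below is a sketch using the standard imperial BMI formula (703 × pounds / inches²) and a per-150-games rate. Which span of games the runs are taken over (for example, career to date at the star season) is not specified in the article, so the caller chooses it here.

```python
def bmi(height_in: float, weight_lb: float) -> float:
    """Body mass index from listed height (inches) and weight (pounds)."""
    return 703.0 * weight_lb / height_in ** 2

def per_150_games(runs: float, games: int) -> float:
    """Convert a counting stat (e.g. offensive runs above average) to a per-150-games rate."""
    return 150.0 * runs / games

print(round(bmi(74, 200), 2))  # a 6'2", 200-pound player: 25.68
print(per_150_games(30, 600))  # 30 runs over 600 games: 7.5
```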
Survival Rate Data
As a jumping-off point, I’ll start with a table of the survival rates for my population. Each season shows the percentage of players from the original population who had not yet recorded a scrub season.
Let’s make some quick observations. The data shows that no star player has gone more than 20 seasons without recording a season below 1 WAR. The survival function also appears to decay exponentially. I also found it interesting that over 50% of stars turn into scrubs by their fifth season and that only 17% of stars survive 10 years in the majors before registering a scrub season. This data really helps one appreciate how rare it is for players like Derek Jeter and Adrian Beltre to perform at a consistent level year after year.
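The survival rate for each season can be computed directly from the event times. This sketch follows the definition above, the share of the original population still without a scrub season, which assumes every star eventually records one. If some players were censored, the Kaplan-Meier estimator would be the standard substitute. The `durations` input (seasons until the first scrub season) is a hypothetical layout.

```python
def survival_rates(durations, max_season=20):
    """durations: seasons until each star's first scrub season (all events observed).

    S(t) = share of the original population with no scrub season after t seasons.
    """
    n = len(durations)
    return {t: sum(d > t for d in durations) / n for t in range(1, max_season + 1)}

# Made-up event times for five stars.
print(survival_rates([1, 2, 2, 3, 5], max_season=5))
```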
Hazard Rate Data
Next, we will look at the hazard rate of the players in the population. One purpose of examining the hazard rate is to see how the rate of failure in a population changes over time. To find the hazard rate for each time period, divide the number of events recorded during that period by the number of players who had not yet registered a scrub season at the start of it. The table below shows this calculation for each time period.
As you can see from the table above, the hazard rate generally increases with each passing season. This makes sense, because as players age, their skills decline and their odds of registering a scrub season increase. However, the hazard rates are fairly constant for the first ten years and then rise rapidly. I’m rather surprised that the hazard rates stayed so consistent for the first ten or so years; I would have guessed that the hazard function would increase much more rapidly with each passing season.
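The hazard calculation described above can be sketched as follows, using the same hypothetical `durations` input (seasons until the first scrub season, all events observed). With no censoring, multiplying (1 − h) across seasons 1 through t reproduces the survival rate after t seasons, which is a handy check on both tables.

```python
def hazard_rates(durations, max_season=20):
    """h(t) = scrub seasons recorded in season t / players still at risk entering season t."""
    rates = {}
    for t in range(1, max_season + 1):
        at_risk = sum(d >= t for d in durations)
        if at_risk == 0:
            break  # nobody left in the population
        events = sum(d == t for d in durations)
        rates[t] = events / at_risk
    return rates

# Made-up event times for five stars.
print(hazard_rates([1, 2, 2, 3, 5]))
```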
Determining the Model
It is important to identify the trend of the hazard function, because it helps determine which distribution to use when creating a parametric model. If the hazard rate is constant, an exponential distribution is appropriate. If the hazard rate increases (or decreases) steadily over time, a Weibull distribution is a common choice, since its hazard grows or shrinks as a power of time. Since the hazard function was increasing, I originally attempted to use a Weibull distribution for the model. However, I found that it predicted too many players to fail in the first few seasons, so I decided to try an exponential distribution instead.
I found that the exponential model was more accurate at predicting survival rates in the first ten years but severely underpredicted the number of players who would record a scrub season after ten years. I decided to use the exponential distribution because I believe it is far more useful to accurately predict the first ten years than the last ten, since only 17% of players survive ten years. I also believe that any franchise would be thrilled to get ten years of stardom from a player, and any more production is just an added bonus.
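For reference, here are the standard hazard and survival functions of the two candidate distributions. The symbols are not from the original: λ is the exponential rate, k the Weibull shape, and η the Weibull scale. Setting k = 1 turns the Weibull into the exponential with λ = 1/η, while k > 1 gives a hazard that rises as a power of t.

```latex
\begin{aligned}
\text{Exponential:}\quad & h(t) = \lambda,
  && S(t) = e^{-\lambda t} \\
\text{Weibull:}\quad & h(t) = \frac{k}{\eta}\left(\frac{t}{\eta}\right)^{k-1},
  && S(t) = \exp\!\left[-\left(\frac{t}{\eta}\right)^{k}\right]
\end{aligned}
```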
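To make the exponential model concrete, here is a minimal sketch of an exponential survival regression. Each player’s hazard is exp(x·β), fitted by maximum likelihood with Newton-Raphson and then used to produce survival probabilities for seasons 1 through 10. The model’s form is an assumption: the article does not say which software or parametrization was used. The covariate columns (an intercept plus age, BMI, and offensive and defensive runs per 150 games) are illustrative, and standardizing them can help convergence. With only the intercept column, the estimated hazard reduces to the number of scrub seasons divided by total seasons at risk.

```python
import numpy as np

def fit_exponential_regression(X, durations, events, max_iter=100, tol=1e-10):
    """Newton-Raphson MLE for an exponential model with hazard_i = exp(X_i @ beta).

    X must include a leading column of ones for the intercept; the other columns
    would hold the covariates (hypothetical layout, e.g. age, BMI, Off/150, Def/150).
    Log-likelihood: sum(events * (X @ beta) - durations * exp(X @ beta)).
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(durations, dtype=float)
    d = np.asarray(events, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        rate = np.exp(X @ beta)             # each player's constant hazard
        grad = X.T @ (d - t * rate)         # score vector
        hess = -(X.T * (t * rate)) @ X      # Hessian (negative definite)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                  # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predicted_survival(X, beta, seasons=10):
    """Rows: players; columns: P(no scrub season yet) after 1..seasons seasons."""
    rate = np.exp(np.asarray(X, dtype=float) @ beta)
    return np.exp(-np.outer(rate, np.arange(1, seasons + 1)))
```

Because each player’s hazard is constant in t, every row of the prediction decays exponentially. That is why a model like this can track the flat early part of the hazard table but cannot follow the steep rise after year ten.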
Survival Rate Estimates
Below is a table of each star player from 2000 to 2014 with the year they entered the population, the time until they became a scrub, every variable included in the model and their predicted survival rate for each of their first ten seasons since becoming a star.
After looking at this table, we can draw several conclusions. First, this Mike Trout guy is really good at baseball. Second, age is the main variable in determining the time until failure. The players with the highest survival rates are all under twenty-five, and those with the lowest survival rates are all over thirty. This makes sense, because it is much easier for a twenty-year-old star to remain effective until he is thirty than for a thirty-year-old star to remain effective until he is forty. Older players face more challenges, such as eroding skills, an increased chance of injury, and reduced playing time intended to prevent injuries.
It also appears that offensive stars survive longer than defensive stars. This is probably because defensive skills usually deteriorate faster than offensive skills. I also believe that since defensive statistics are more volatile than offensive statistics, players who derive much of their value from defense are more likely to see their WAR fluctuate from year to year. This makes it more likely that a defensive star could register a scrub season one year and then become a star again the next. And this brings me to my next point.
Things to Keep in Mind
A scrub season does not necessarily mean that a player is finished. If it did, players like Aramis Ramirez, Robinson Cano and Troy Tulowitzki would have had much less productive careers. It is also important to remember that a player enters the population as soon as he records his first star season. He could still improve after that season and outlast his projected survival rate. The main thing to remember is that no model is perfect and no model is meant to replace human decision-making. Models are only meant to improve the decision-making process, and it is my hope that this model has accomplished that goal.