Fantasy Rankings: Why Methodology Matters
By far, the hardest thing about fantasy baseball is that you can’t predict the future. Every year, a Matt Carpenter or a Chris Davis vastly outperforms expectations and wins a fantasy league for somebody, and a Matt Kemp battles injuries all year and makes somebody else tear their hair out. But you learn to deal with that sort of thing, or you take up a less stressful hobby, like Russian roulette. C’est la vie, and all that.
This article, however, is about the second-hardest thing in fantasy baseball: juggling categories. Which is better, Mike Trout’s five-category production, or Miguel Cabrera’s dominance in four categories? How much is it worth to have Billy Hamilton singlehandedly win stolen bases for you while contributing nothing in the other categories? Can you absorb Pedro Alvarez’s batting average hit for the home runs he gives you? Over the years, people have come up with a few different ways to try to answer those questions. Standings Gain Points (SGP) is one popular method. Z-scores are another. There are others, but those are the two I see the most, so they’re the two I’m going to talk about. The point of this article isn’t to compare all of the ranking systems out there and figure out which one is “right.” The point is to call attention to the fact that your choice of ranking system matters, probably more than you think.
Of course, most fantasy ranking systems start with projections. Personally, I like to use composite projections, because I think there’s value in combining projections and smoothing out spots where one system might be exceptionally high or low on a player. You can disagree with the projections – that’s not the point. The point is, you (or your fantasy expert of choice, if you use published rankings) can take the same projections, plug them into different ranking systems, and get substantially different results.
For the purposes of this article, I’m keeping things very simple, perhaps a little too simple. I don’t care about volatility, risk, upside, injuries, etc. I’m assuming that these projections are accurate. And I’m not going to bother with positional adjustment, partly because I’m lazy, partly because these aren’t the rankings I’ll actually draft from, and partly because it doesn’t matter for this exercise. I’m concerned with how using different methods changes players’ rankings relative to each other, not with how much to bump Buster Posey up my draft board because I need a catcher. And I’m looking at rankings, not auction values, because that’s another step I don’t feel like taking right now.
I’m going to look at the shortstop position (specifically the top 14, because I play in a 14-team league) for this article, because I need to narrow things down to a manageable number of players. I’m assuming a standard 5×5 league. And what I’m looking at is SGP (using the formula here), compared to two slightly different ways of calculating z-scores. In all cases, I’m looking at the rankings of each player among shortstops and among all hitters. Really, though, I’m concerned with the overall rankings because I want to see how players move around – the choice to focus on shortstops is just a convenient way to select a handful of players to look at.
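In rough terms, SGP divides each projected stat by the amount of that stat it takes to gain one point in the league standings, then sums across categories. Here’s a minimal sketch of the idea for hitters – the denominators and the baseline team are hypothetical placeholders of my own, not the values from the linked formula; in practice they come from your league’s standings history:

```python
# A minimal SGP sketch for hitters. The denominators are HYPOTHETICAL
# placeholders; real ones come from your league's standings history
# (how much of each stat it takes to gain one point in the standings).
SGP_DENOMS = {"R": 19.0, "HR": 8.5, "RBI": 19.5, "SB": 7.5, "AVG": 0.0017}

# Assumed baseline: the other 13 hitters in an average lineup (also placeholders).
TEAM_AB = 13 * 550
TEAM_H = TEAM_AB * 0.267  # hits at a .267 team average

def sgp(p):
    """p: dict of projected AB, H, R, HR, RBI, SB."""
    total = sum(p[cat] / SGP_DENOMS[cat] for cat in ("R", "HR", "RBI", "SB"))
    # AVG is a rate stat: value it by how much the player moves the team's average.
    delta_avg = (TEAM_H + p["H"]) / (TEAM_AB + p["AB"]) - TEAM_H / TEAM_AB
    return total + delta_avg / SGP_DENOMS["AVG"]

tulo = {"AB": 525, "H": 157, "R": 84, "HR": 28, "RBI": 91, "SB": 3}
print(round(sgp(tulo), 2))  # one number per player; rank players by it
```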
Anyway, on to the fun stuff:
SGP shortstop rankings:
Player Name | AB | H | R | HR | RBI | SB | AVG | ORANK | SSRANK |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Troy Tulowitzki | 525 | 157 | 84 | 28 | 91 | 3 | 0.300 | 19 | 1 |
Hanley Ramirez | 510 | 146 | 81 | 23 | 81 | 16 | 0.287 | 22 | 2 |
Jose Reyes | 573 | 169 | 88 | 12 | 54 | 26 | 0.295 | 32 | 3 |
Jean Segura | 592 | 164 | 77 | 10 | 51 | 37 | 0.277 | 39 | 4 |
Ian Desmond | 568 | 156 | 72 | 20 | 77 | 19 | 0.275 | 41 | 5 |
Elvis Andrus | 612 | 168 | 80 | 5 | 60 | 35 | 0.275 | 47 | 6 |
Everth Cabrera | 575 | 149 | 76 | 4 | 42 | 49 | 0.259 | 50 | 7 |
Ben Zobrist | 580 | 157 | 82 | 15 | 77 | 11 | 0.271 | 71 | 8 |
Starlin Castro | 636 | 177 | 77 | 12 | 58 | 14 | 0.278 | 94 | 9 |
Asdrubal Cabrera | 539 | 141 | 70 | 16 | 68 | 11 | 0.261 | 108 | 10 |
Andrelton Simmons | 578 | 157 | 73 | 14 | 61 | 9 | 0.271 | 115 | 11 |
J.J. Hardy | 577 | 151 | 70 | 23 | 69 | 1 | 0.262 | 116 | 12 |
Alexei Ramirez | 595 | 161 | 63 | 8 | 57 | 21 | 0.270 | 118 | 13 |
Bradley Miller | 522 | 142 | 71 | 14 | 57 | 11 | 0.271 | 119 | 14 |
Looks reasonable. Or, I don’t know – we don’t have anything to compare it to yet. So let’s compare it to z-scores. For this example, I’m going to calculate the average and standard deviation for each category using all players projected for over 300 at bats.
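Concretely, the calculation looks something like the sketch below. The handling of AVG – converting it to hits above what a pool-average hitter would manage in the same at bats, so playing time is weighted properly – is one common convention; it’s an assumption here, not necessarily the exact formula behind these tables:

```python
import statistics

COUNTING = ("R", "HR", "RBI", "SB")

def z_rankings(pool):
    """pool: list of dicts of projected stats (here, all hitters with >300 AB)."""
    means = {c: statistics.mean(p[c] for p in pool) for c in COUNTING}
    sds = {c: statistics.stdev(p[c] for p in pool) for c in COUNTING}

    # Treat AVG as hits above what a pool-average hitter would get in the same AB.
    lg_avg = sum(p["H"] for p in pool) / sum(p["AB"] for p in pool)
    xh = [p["H"] - lg_avg * p["AB"] for p in pool]
    xh_mean, xh_sd = statistics.mean(xh), statistics.stdev(xh)

    ranked = [
        (sum((p[c] - means[c]) / sds[c] for c in COUNTING) + (h - xh_mean) / xh_sd,
         p["name"])
        for p, h in zip(pool, xh)
    ]
    return sorted(ranked, reverse=True)  # (total z-score, name), best first
```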
Z-score shortstop rankings using all players with >300 AB:
Player Name | AB | H | R | HR | RBI | SB | AVG | ORANK | SSRANK |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Troy Tulowitzki | 525 | 157 | 84 | 28 | 91 | 3 | 0.300 | 16 | 1 |
Hanley Ramirez | 510 | 146 | 81 | 23 | 81 | 16 | 0.287 | 22 | 2 |
Jose Reyes | 573 | 169 | 88 | 12 | 54 | 26 | 0.295 | 36 | 3 |
Ian Desmond | 568 | 156 | 72 | 20 | 77 | 19 | 0.275 | 43 | 4 |
Jean Segura | 592 | 164 | 77 | 10 | 51 | 37 | 0.277 | 51 | 5 |
Elvis Andrus | 612 | 168 | 80 | 5 | 60 | 35 | 0.275 | 57 | 6 |
Ben Zobrist | 580 | 157 | 82 | 15 | 77 | 11 | 0.271 | 65 | 7 |
Everth Cabrera | 575 | 149 | 76 | 4 | 42 | 49 | 0.259 | 71 | 8 |
Starlin Castro | 636 | 177 | 77 | 12 | 58 | 14 | 0.278 | 91 | 9 |
Asdrubal Cabrera | 539 | 141 | 70 | 16 | 68 | 11 | 0.261 | 110 | 10 |
J.J. Hardy | 577 | 151 | 70 | 23 | 69 | 1 | 0.262 | 112 | 11 |
Andrelton Simmons | 578 | 157 | 73 | 14 | 61 | 9 | 0.271 | 114 | 12 |
Bradley Miller | 522 | 142 | 71 | 14 | 57 | 11 | 0.271 | 119 | 13 |
Alexei Ramirez | 595 | 161 | 63 | 8 | 57 | 21 | 0.270 | 125 | 14 |
Comparing those two tables, the methods agree on who the top 14 shortstops are, and for the most part the rankings are pretty similar. But Tulowitzki moves up a few spots in the overall rankings, which is no small thing that early in the draft. Segura drops a round or two and swaps spots with Desmond in the shortstop rankings. Andrus moves down the overall rankings a bit. Everth Cabrera moves down the overall rankings quite a lot, going from a mid-round steal to a guy who’s probably merely a decent value at his ADP.
So we learned a few things there, maybe. But when I use z-scores, I don’t think it makes sense to calculate them using every player who sees significant playing time – most of those will probably never be rostered in your fantasy league. I want to compare fantasy-relevant players to other fantasy-relevant players, not waiver wire fodder. So let’s take the top 200 hitters, as determined by the initial z-score rankings, recalculate the average and standard deviation for each category using only those players, and try again.
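In code, that second pass is just the same z-score routine re-run on a trimmed pool – a sketch reusing the hypothetical z_rankings() from the earlier snippet:

```python
def two_pass_rankings(pool, keep=200):
    first = z_rankings(pool)                  # pass 1: full >300 AB pool
    top = {name for _, name in first[:keep]}  # keep the top 200 hitters
    # pass 2: recompute means/SDs and re-rank using only the top 200
    return z_rankings([p for p in pool if p["name"] in top])
```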
Z-score shortstop rankings using the top 200 players:
Player Name | AB | H | R | HR | RBI | SB | AVG | ORANK | SSRANK |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Troy Tulowitzki | 525 | 157 | 84 | 28 | 91 | 3 | 0.300 | 14 | 1 |
Hanley Ramirez | 510 | 146 | 81 | 23 | 81 | 16 | 0.287 | 23 | 2 |
Jose Reyes | 573 | 169 | 88 | 12 | 54 | 26 | 0.295 | 36 | 3 |
Ian Desmond | 568 | 156 | 72 | 20 | 77 | 19 | 0.275 | 49 | 4 |
Jean Segura | 592 | 164 | 77 | 10 | 51 | 37 | 0.277 | 59 | 5 |
Elvis Andrus | 612 | 168 | 80 | 5 | 60 | 35 | 0.275 | 63 | 6 |
Ben Zobrist | 580 | 157 | 82 | 15 | 77 | 11 | 0.271 | 64 | 7 |
Everth Cabrera | 575 | 149 | 76 | 4 | 42 | 49 | 0.259 | 93 | 8 |
Starlin Castro | 636 | 177 | 77 | 12 | 58 | 14 | 0.278 | 94 | 9 |
J.J. Hardy | 577 | 151 | 70 | 23 | 69 | 1 | 0.262 | 108 | 10 |
Asdrubal Cabrera | 539 | 141 | 70 | 16 | 68 | 11 | 0.261 | 111 | 11 |
Andrelton Simmons | 578 | 157 | 73 | 14 | 61 | 9 | 0.271 | 115 | 12 |
Bradley Miller | 522 | 142 | 71 | 14 | 57 | 11 | 0.271 | 119 | 13 |
Jed Lowrie | 538 | 145 | 70 | 15 | 65 | 3 | 0.269 | 126 | 14 |
Again, everything looks pretty similar at first glance. Alexei Ramirez drops off the list in favor of Jed Lowrie, but that’s no big deal. But Tulowitzki moves up another couple spots – he’s pushing first-round value now, even before positional adjustments. Segura and Andrus drop a little further in the overall rankings. Cabrera, who was already worth less using z-scores, is even worse with a smaller player pool. Remember, that rank of 93 is only among hitters – factor in pitchers, and Cabrera, a mid-round steal using SGP, now looks overvalued at his ADP of 106 (though we can’t say that for sure without applying positional adjustments). All things considered, simply changing the size of the player pool had as much of an effect as changing from SGP to z-scores in the first place.
Depending on which ranking method you use, you’re going to place a pretty different value on some of these players (again, with the caveat that I didn’t do positional adjustments). At the top of the shortstop rankings, Tulowitzki could be anywhere from a late second-round pick to a borderline first-rounder. Cabrera’s value swings wildly depending on which system you use – he’s either a player to target fairly early, or a poor value at the point where you’d have to take him. Other players, like Hanley Ramirez or Brad Miller, are remarkably consistent across all three methods, but there’s no way to know how much of that is chance.
The natural thing to wonder now is which of these systems is right. This seems like it should be solvable. I really want there to be an answer to this, a clear way to combine five categories of production into a single overall rank. Unfortunately, I’m not convinced that one exists. People smarter than me have come up with a few different ways to reach that goal, and the results don’t agree with each other. Even if they did, the needs of your team are going to evolve as the draft goes on. When you pick whatever method you prefer and compile your pre-draft rankings, the numbers are going to look pretty absolute, there in black and white in your spreadsheet. But really, they’re more like ballpark estimates, and under a different method they could easily be totally different.
Interesting analysis. I have been using the third method for most of my rankings simply because it makes more sense to me than using SGP in an H2H league, but it seems like I need to think about and compare my methodology a bit more.
H2H leagues have a complicating factor that I haven’t seen accounted for anywhere. On a week-to-week basis, luck obviously plays a huge role; the team with an on-paper advantage in any given category probably only wins that category a little over half the time. Where it gets interesting is that not all categories are equally volatile. For instance, a projected advantage in HR probably holds up a little more often than a comparable projected advantage in AVG. Or at least I think so, given that HR rate stabilizes more quickly than AVG.
I’m not sure how to rank the week-to-week volatility of the other categories – in and of itself, that would be an interesting thing to know for H2H leagues. But after you’ve figured that out, do you give more value to the less volatile stat categories, since an advantage there is more likely to pay off and the more volatile categories are a crapshoot anyway? Or do you just try to get a balanced team, figuring that a smaller advantage in a lot of categories maximizes your chances of winning at least half of them?
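One way to start putting numbers on that would be a toy simulation like the sketch below. Every input is a made-up weekly assumption (roughly 300 team at bats per week, one team projecting slightly ahead of the other in each category), so the output only illustrates the approach, not a real measurement:

```python
import random

TRIALS = 10_000
WEEK_AB = 300  # assumed team at bats in one H2H week

def binom(n, p):
    """Crude binomial draw: count successes in n Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))

hr_hold = avg_hold = 0
for _ in range(TRIALS):
    # Team A projects ~10 HR/week and .275 AVG; team B ~8 HR/week and .265.
    hr_hold += binom(WEEK_AB, 10 / WEEK_AB) > binom(WEEK_AB, 8 / WEEK_AB)
    avg_hold += binom(WEEK_AB, 0.275) > binom(WEEK_AB, 0.265)

print(f"HR edge holds {hr_hold / TRIALS:.0%} of weeks, "
      f"AVG edge holds {avg_hold / TRIALS:.0%}")
```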
I’ve thought a lot about this over the years. I won my 2010 H2H league despite having the worst season-long stats in nearly every category. I would have been dead last in a roto league.
We often hear about players performing better in the first half than the second due to a lack of durability. How does that consistency hold up day to day, though? There are other factors too, like a player’s team facing a string of same-handed pitchers that he struggles against. We hear about that a lot too. Given how many people play in H2H leagues, I’m a little surprised this is largely just ignored. It’s difficult to pick out what is and isn’t relevant, and at the same time, it seems really important.
I like this discussion. Personally, if I had to guess the best categories for an H2H team to excel at (best because they’re less volatile), they would be R, HR, and RBI on the hitting side and K on the pitching side.
My guess is you want to be really good at the less volatile categories and mediocre in the others.
Fun article. I’ve also noticed that seemingly small choices can lead to large differences in player valuations. However, here are two reasons NOT to use any of these methods:
1) Projected stats understate the true dispersion in players’ performance. If you calculate z-scores using the standard deviation of your projected stats, you will overvalue hard-to-predict categories. Instead, use historical standard deviations of actual performance (see the sketch after this list). Doing so will help you avoid paying through the nose for projected pitcher wins that are never realized. (If I understand the SGP method correctly, it can suffer from the same kind of problem.)
2) Projected plate appearances are irrelevant for setting the cutoff, because projected playing time is probably bunk, and because 300 plate appearances of Billy Hamilton might be a lot more valuable than 700 plate appearances of Alexei Ramirez. Instead, use a two-step approach. Calculate z-scores based on the entire population of players with major league projections. Do an initial overall ranking. Then keep only the players who are above whatever your league’s replacement level might be. Now do a second z-score based on that population. You end up with z-scores that measure value relative to the draftable player population, which is exactly what you should care about when determining your rankings.
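Here’s a minimal sketch of what those two suggestions might look like together – hypothetical code, with AVG omitted for brevity and the historical standard deviations invented as placeholders:

```python
import statistics

# Point 1: divide by standard deviations measured from last season's
# ACTUAL stats, not from the projections. These values are placeholders.
HIST_SD = {"R": 22.0, "HR": 9.0, "RBI": 24.0, "SB": 11.0}

def z_hist(pool):
    means = {c: statistics.mean(p[c] for p in pool) for c in HIST_SD}
    scored = [(sum((p[c] - means[c]) / HIST_SD[c] for c in HIST_SD), p["name"])
              for p in pool]
    return sorted(scored, reverse=True)

# Point 2: rank the whole projected population, cut down to the draftable
# pool (your league's replacement level), then z-score again vs. that pool.
def replacement_filtered(pool, draftable=200):
    keep = {name for _, name in z_hist(pool)[:draftable]}
    return z_hist([p for p in pool if p["name"] in keep])
```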
Wouldn’t it be easier to account for point #1 in the projections themselves, by not projecting a wide range of values for the categories that are difficult to predict? It seems that’s what Steamer has done, at least.
More info here:
http://tangotiger.com/index.php/site/article/difference-between-forecasting-results-with-an-without-an-identifier
That’s true, and it’s exactly the source of the problem. Good projection systems like Steamer will project a narrower range of values than will actually be realized. However, when you z-score the stats, you divide by a too-small standard deviation and get back a wide range of values, essentially undoing all that good mean reversion in the projections. It’s similar to what would happen if you decided to z-score stolen bases using a standard deviation calculated only among catchers. You’d end up paying a ton for a catcher with 5 projected SB versus an otherwise identical catcher with only 4 SB.
One thing I’ve considered in the past is using a few seasons of data to find the correlation between projected and actual values for each category, then multiplying the z-score for each category by its correlation. In theory, that would give more weight to the stats that can be projected with some accuracy. I hadn’t considered using last year’s actual performance to get standard deviations, but it’s an interesting idea. I’m not sure if you create bias by mixing actual and projected stats, though – you probably do, because bias is everywhere.
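That correlation-weighting idea is simple enough to sketch. The reliability weights here are invented placeholders standing in for the measured projected-vs-actual correlations:

```python
# Hypothetical projection-vs-actual correlations by category; in practice
# each weight would be measured over a few past seasons of data.
RELIABILITY = {"R": 0.75, "HR": 0.80, "RBI": 0.70, "SB": 0.85, "AVG": 0.45}

def weighted_z(z_by_cat):
    """z_by_cat: dict mapping category -> a player's z-score in it."""
    return sum(RELIABILITY[c] * z for c, z in z_by_cat.items())
```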
Which is really the issue this article was meant to highlight. We have a bunch of ways of ranking players, but they’re mostly based on what seems intuitively logical to the person who came up with them. They give you a bunch of different results, and we don’t really know which one most closely matches the actual value of players in terms of winning a fantasy league.
Stat categories in H2H leagues vary wildly in value because of the aforementioned volatility over the short time period of one week. My data work shows that batting average is almost all luck over a single week. SB is worth almost double HR, and both are very valuable. In pitching, Ks are the most valuable, and WHIP is more stable than ERA. In general, pitching categories are more stable, and thus more valuable over a week, assuming a minimum of 30 IP. I love that this critical info is almost completely ignored – it’s allowed me to win league after league, especially on Yahoo.
Pitchers are way underdrafted; it flies right in the face of conventional wisdom, and it’s perfect. Humans still overrate batting average, even in fantasy – perfect. It’s near useless in an H2H league.