Author: Jon L.

Ranking Batters in Fantasy Leagues with Alternate Stats

March 20, 2014

Draft prep: Framing the problem

So you’re preparing for your fantasy draft. You’re caught up on FanGraphs, checked for recent injuries at Rotoworld, maybe skimmed a few headlines from your other top 11 baseball news sites. Maybe you’ve even downloaded the FanGraphs positional rankings, and are planning to keep the file open during the draft as a reality check against the pre-set rankings of the site your league uses.

But really, what do the guys at FanGraphs know? Sure, they know a lot about baseball, and statistics, and this year’s projections, and a handful of underlying stats that tend to predict future performance. But what they don’t know is whether your league uses OBP instead of AVG, or OPS, SLG, or batters’ strikeouts, or maybe holds and FIP and pitcher fielding percentage. If this is your situation, then I feel your pain. My fantasy league uses eight statistics each for batters and pitchers, three beyond the usual five on each side. (In case you’re curious, the mysterious six are batter hits, K’s, and OPS, and pitcher holds, losses, and complete games.)

These differences matter. If your league uses OBP, Joey Votto turns from a fantasy player who’s solid in four categories (including average, where his impact is limited because he walks all the time) to a guy with a truly elite skill. Maybe it’s easy for you to account for the relative value of a Joey Votto, but how well can you project the 25th through 35th outfielders? Some might be much better or worse in your league. If you have batter strikeouts, as in my league, how do you value Mark Trumbo and his home run power against the elite contact skills of Norichika Aoki?

Generating your own rankings

One answer, and the one I opted for, is to generate rankings based on your own league’s stats. Now, this may sound a bit too work-intensive and time-consuming for most of you (especially those of you with relatively normal priorities), but in reality it wasn’t as time-consuming as I expected.*

First of all, there’s no need to reinvent the wheel. There are lots of projection systems out there that are available to the public, and some of them are quite good. I decided I would simply download all the projections listed on FanGraphs, and average them out. And then, after thinking for a little while about the costs and benefits of that approach, I decided I wouldn’t do that at all, and instead would use the results of just one projection system. But which one should I use? Luckily, that’s yet another bit of analysis we don’t need to bother with, because the Interwebs are full of crazy mathematicians who love baseball and have nothing better to do. After searching for a few articles that evaluate projection systems, like this one and this meta-one, I decided that the forecasts I trusted most (and that were easiest to obtain) were Steamer for batters and the FanGraphs Fans projections for pitchers. (The high accuracy of the latter shocked me at first, but then I realized that fans assimilate the results of all the projection systems into their own player projections, departing from them only as dictated by common sense, inside scoop, and hope.)

Operationalizing the Solution

Here’s where it gets tricky. What advanced data manipulation packages and techniques are best for downloading reams of data from the FanGraphs site into your spreadsheet? Certainly there was no need for me to copy and paste the data 50 players at a time like someone living in the dark ages, was there? No, of course not. And I probably never really did that.

Instead – bear with me if you’re not technically inclined – I hit the gray “Export Data” button to the upper right of my chosen projection page. This involved a lot of loading the correct page, hovering my mouse over the text, and clicking, but in the end it was worth all the work, because 5 minutes of sweat, plus a beer, had finally paid off in spreadsheets full of data.

*If you’re not interested in these details, the fun stuff is posted in a couple of tables towards the end. (I like writing, so this is likely to go on for a while.)

Z-scoring your data points

Z-scoring batter projections is easy. The problem lies in determining what set of players to use in order to calculate means and standard deviations.

This is an important question, at least to the extent that any question in fantasy baseball is important. For example, if you use every hitter in the league, including the guys projected for 8 at-bats, you create the illusion that lots of players bat .220 or score only 4 runs, as opposed to your league’s reality, in which .270 with 70 runs is pretty ordinary. For a little math fun, I compared the results generated using means and standard deviations calculated 500 players deep (the equivalent of a 25-team league that rosters 20 position players) against those from a more reasonable pool. The deeper pool caused huge increases in the variance of runs and RBI’s, so a guy who drove in and scored 100 looked about the same relative to the mean either way (~2+ standard deviations). But it caused smaller increases in the variance of SB’s, HR’s, and OPS, and combined with the lower means, that meant the deep pool overvalued guys who produce in those categories. Martin Prado and Torii Hunter were made sad, whereas Billy Hamilton was elevated to a demigod (or at least a top-40 hitter).
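Here is a minimal Python sketch of the pool-depth effect, using made-up stolen-base projections rather than real Steamer numbers. It assumes population standard deviations (`statistics.pstdev`, the equivalent of Excel's STDEV.P); which variant to use is a judgment call. In this toy example, padding the pool with part-timers drops the mean from 24 to about 10 while widening the spread only a little, so the same 45-steal projection looks much more impressive against the deep pool.

```python
from statistics import mean, pstdev


def z_score(value, pool):
    """Z-score of `value` against the mean and population SD of `pool`."""
    return (value - mean(pool)) / pstdev(pool)


# Hypothetical projected stolen bases, best to worst. The first five stand in
# for a pool of regulars; the deep pool adds part-timers with tiny projections.
regulars = [45, 30, 20, 15, 10]
deep_pool = regulars + [2, 1, 1, 0, 0, 0, 0]

speedster = 45
z_regulars = z_score(speedster, regulars)
z_deep = z_score(speedster, deep_pool)
print(f"vs. regulars: {z_regulars:.2f} SD, vs. deep pool: {z_deep:.2f} SD")
```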

So how do you generate values that represent your player pool?

One method – and a very reasonable one – is to use the final statistics compiled by your league the previous year. With this data, it’s easy to generate per-slot averages based on last year’s performance, and to compare projected performance against it. But I did not choose this method. A more savvy number-cruncher might say that projection systems, while designed to be as accurate as possible for each player, may be systematically biased on the whole, and therefore determining the value of this year’s projections based on last year’s actual statistics is tantamount to comparing apples and oranges.

I was more worried about lazy owners. Any league can have a couple of careless owners who are in it just for fun (the gall!), or who keep BJ Upton when he can’t even see the Mendoza line, because of that one time his cousin shook BJ’s hand at a Jay-Z concert. I know of what I speak. If your goal is to win your league, you want to base your evaluation on the best players available, rather than the happenstance of which Atlanta outfielders spent the whole year on someone’s roster.
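If you'd rather use last season's league totals anyway, per-slot averages for counting stats take only a few lines. This is a sketch with hypothetical team totals, a hypothetical `per_slot_averages` helper, and an assumed 10 active hitter slots per team; rate stats such as AVG or OPS can't be averaged this simply and are left out.

```python
# Hypothetical final hitting totals from last season (two teams for brevity).
team_totals = {
    "Team A": {"R": 980, "HR": 260, "RBI": 950, "SB": 120},
    "Team B": {"R": 900, "HR": 230, "RBI": 880, "SB": 140},
}
HITTER_SLOTS_PER_TEAM = 10  # assumption: active hitter slots per team


def per_slot_averages(totals, slots_per_team):
    """League-wide total in each counting category divided by active slots."""
    n_slots = slots_per_team * len(totals)
    categories = next(iter(totals.values())).keys()
    return {
        cat: sum(team[cat] for team in totals.values()) / n_slots
        for cat in categories
    }


averages = per_slot_averages(team_totals, HITTER_SLOTS_PER_TEAM)
print(averages)  # {'R': 94.0, 'HR': 24.5, 'RBI': 91.5, 'SB': 13.0}
```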

I generated means using very precise data, plus a random stab in the dark. First, I looked up the exact number of players at each position in my league from the previous year. Then I mostly ignored this data. Although it’s true that player values vary greatly between leagues depending on how many players start, and how many are rostered, this is the sort of thing you can keep track of during the draft. Don’t draft another first baseman if you already have three of them and no shortstop, and don’t draft a first baseman just because he’s ranked ahead of a shortstop if there are another seven first basemen ranked close behind.

My league rostered only 123 regulars last year. Not a deep league. I used a lot more than 123 in my calculations in an effort to lower the means a bit, to account for the existence of catchers and second basemen. I then haphazardly created sort variables so I could bring the best 150 to 180 players to the fore, with the goal of getting a fair representation of the quality of players in my league. I tried various formulas like [(HR+1) * R * RBI * (SB+1) * AVG * OPS] (adding 1’s so as not to exclude players projected for 0 HR’s or SB’s) and PA * wOBA. Virtually every one of them produced a good representation of the best hitters projected for regular playing time. In the end, the best way to evaluate the sort is to look at the list and see if the guys near the cutoff are fringe players who are familiar from last year’s waiver wire.
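The two sort variables translate directly into key functions. In this sketch the function names, the sample players, and the default pool size of 165 (the middle of the 150 to 180 range) are all assumptions for illustration; the pool is used only to set means and standard deviations, not to decide who gets ranked.

```python
def product_key(p):
    # (HR+1) * R * RBI * (SB+1) * AVG * OPS; the +1s keep a projection of
    # 0 HR or 0 SB from zeroing out the whole product.
    return (p["HR"] + 1) * p["R"] * p["RBI"] * (p["SB"] + 1) * p["AVG"] * p["OPS"]


def volume_key(p):
    # PA * wOBA rewards both playing time and per-PA quality.
    return p["PA"] * p["wOBA"]


def reference_pool(players, key, size=165):
    """The top `size` players by `key`; used only to set means and SDs."""
    return sorted(players, key=key, reverse=True)[:size]


# Hypothetical projections.
players = [
    {"name": "Regular", "PA": 650, "R": 90, "HR": 25, "RBI": 85, "SB": 10,
     "AVG": 0.280, "OPS": 0.820, "wOBA": 0.350},
    {"name": "Speedster", "PA": 600, "R": 85, "HR": 5, "RBI": 45, "SB": 40,
     "AVG": 0.265, "OPS": 0.700, "wOBA": 0.310},
    {"name": "Part-timer", "PA": 40, "R": 4, "HR": 1, "RBI": 4, "SB": 0,
     "AVG": 0.250, "OPS": 0.700, "wOBA": 0.300},
]
for key in (product_key, volume_key):
    print(key.__name__, [p["name"] for p in reference_pool(players, key, size=2)])
```

With either key, the part-timer falls outside a two-player pool, which is the same eyeball check described above: the cutoff should land on fringe players.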

Calculating projected player values

Once you determine which players you want to include, Excel is happy to instantaneously calculate averages and standard deviations for each stat. Once you have these values, you can re-include the entire player pool, or as much of it as you wish, and the formula for each player in each category is simply (his projected value – the average projected value)/standard deviation.
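The same calculation outside Excel might look like the sketch below: means and standard deviations come from the reference pool only, then every player in the projection file is scored against them. The dictionary layout and function names are hypothetical, and population SD is an assumption.

```python
from statistics import mean, pstdev


def category_params(pool, categories):
    """Mean and population SD of each stat across the reference pool."""
    params = {}
    for cat in categories:
        values = [p[cat] for p in pool]
        params[cat] = (mean(values), pstdev(values))
    return params


def z_scores(players, pool, categories):
    """(projected value - pool mean) / pool SD for every player and category.

    `players` can be the whole projection file, not just the reference pool.
    """
    params = category_params(pool, categories)
    return {
        p["name"]: {cat: (p[cat] - mu) / sd for cat, (mu, sd) in params.items()}
        for p in players
    }


# Hypothetical HR projections: three pool players plus a bench bat outside it.
pool = [{"name": "Low", "HR": 10}, {"name": "Mid", "HR": 20}, {"name": "High", "HR": 30}]
everyone = pool + [{"name": "Bench", "HR": 2}]
z = z_scores(everyone, pool, ["HR"])
print({name: round(cats["HR"], 2) for name, cats in z.items()})
```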

The next challenge is to generate ranks from the Z-scores. The simplest way is to add them together (being sure to subtract the ones where lower numbers are better, such as pitcher walks or batter strikeouts). But here, I discovered another issue. A potential superstar who might not have a full-time job could end up ranked about the same as, or below, a mediocre player who was guaranteed to start. If I wanted my draft rankings to make sense at a glance when I have just 90 seconds to pick a player while eating a sandwich, I needed to distinguish accumulators from guys with potential.
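The simple additive ranking can be sketched as follows, with strikeouts as the assumed lower-is-better category and hypothetical z-scores for a contact hitter and a high-strikeout slugger. Note that it ranks purely on summed totals, so it has exactly the blind spot described above: nothing separates a part-time star from an everyday regular.

```python
# Categories where a lower projection is better; their z-scores are subtracted.
LOWER_IS_BETTER = {"K"}


def total_value(player_z):
    """Sum a player's category z-scores, flipping the sign for 'bad' stats."""
    return sum(-z if cat in LOWER_IS_BETTER else z for cat, z in player_z.items())


def rank_players(all_z):
    """Player names from most to least valuable by summed z-score."""
    return sorted(all_z, key=lambda name: total_value(all_z[name]), reverse=True)


# Hypothetical z-scores for a contact hitter and a high-strikeout slugger.
all_z = {
    "Contact": {"HR": -0.5, "H": 1.0, "K": -1.5},
    "Slugger": {"HR": 2.0, "H": -0.5, "K": 2.5},
}
print(rank_players(all_z))  # ['Contact', 'Slugger']
```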

Ranking performance and potential

It matters whether a player is an okay guaranteed performer or an unpredictable potential star. If I find myself with no second basemen in the 22nd round, I might want to take the best guy who’s pretty much guaranteed 140 days in the starting lineup, like an Anthony Rendon or a Howie Kendrick. If my roster’s pretty much set, I might prefer a hitter who has a better chance to bust out and hit 45 home runs, like Chris Carter (unless I’m in my own league, in which his 80% strikeout rate falls 37 standard deviations below the mean).

What I decided to do was generate two rankings for each batter, one based on projected totals and one based on projections per plate appearance. Luckily, Steamer has already done the work for us by projecting everyone in both ways. For instance, Everth Cabrera is projected as the 479th-best player by wOBA, with 74 runs and 45 stolen bases. At the other extreme, Colorado’s Kyle Parker is projected to be the 50th-best hitter in the league, just ahead of Dustin Pedroia, with a .279 batting average and .465 slugging percentage, despite getting only one plate appearance, and not getting a hit.

At this point, there are 2 sets of columns for each batter: 1 set for his Steamer projections for each relevant stat, and 1 for the associated Z-scores. To this, I added 2 more sets of columns: 1 for per-plate-appearance projections for each stat, and 1 for those associated Z-scores. (Dividing hits by plate appearances rather than at-bats feels unnatural, but that’s what you need to do if your league counts total hits.) Calculating per-PA quality is then easy, as you can just add the Z-scores (or subtract them for negative statistics). But once you have projected rate statistics in your per-PA rankings, it becomes apparent that it doesn’t make sense to include the exact same values in your projected accumulated totals: a .300 average over 50 at-bats does far less for your team’s AVG than the same average over 600.
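A sketch of the per-PA side, assuming hypothetical category lists loosely modeled on the author's league: counting stats (including hits) are divided by PA, rate stats pass through, and each player's rates are z-scored against the reference pool's rates and summed, with strikeouts subtracted. The two sample players are made up; because A beats B in all eight per-PA categories, A's summed score comes out at about +8 and B's at about -8.

```python
from statistics import mean, pstdev

# Assumed category lists, loosely modeled on the author's league.
COUNTING = ["R", "HR", "RBI", "SB", "H", "K"]
RATE = ["AVG", "OPS"]
LOWER_IS_BETTER = {"K"}


def per_pa(player):
    """Counting projections divided by PA; rate stats pass through unchanged."""
    rates = {cat: player[cat] / player["PA"] for cat in COUNTING}
    rates.update({cat: player[cat] for cat in RATE})
    return rates


def per_pa_scores(players, pool):
    """Summed per-PA z-scores, measured against the reference pool's rates."""
    pool_rates = [per_pa(p) for p in pool]
    params = {}
    for cat in COUNTING + RATE:
        values = [r[cat] for r in pool_rates]
        params[cat] = (mean(values), pstdev(values))
    scores = {}
    for p in players:
        rates = per_pa(p)
        total = 0.0
        for cat, (mu, sd) in params.items():
            z = (rates[cat] - mu) / sd
            total += -z if cat in LOWER_IS_BETTER else z
        scores[p["name"]] = total
    return scores


# Hypothetical projections: A beats B in every per-PA category.
a = {"name": "A", "PA": 600, "R": 90, "HR": 30, "RBI": 90, "SB": 10,
     "H": 150, "K": 120, "AVG": 0.280, "OPS": 0.850}
b = {"name": "B", "PA": 600, "R": 60, "HR": 10, "RBI": 60, "SB": 5,
     "H": 130, "K": 150, "AVG": 0.240, "OPS": 0.680}
scores = per_pa_scores([a, b], [a, b])
print(scores)
```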

To keep the rate stats from counting the same way in projected totals as in the per-PA rankings, I weighted their Z-scores by playing time. I multiplied the Z-score for AVG by projected AB’s/average projected AB’s, and you can do the same for OBP, using PA’s. My league uses OPS, a value generated by adding two fractions with different denominators (OBP and SLG), so to weight those Z-scores I multiplied them by projected (AB’s + PA’s)/average projected (AB’s + PA’s). I then added these weighted Z-scores to the other Z-scores for projected totals. The result of adding these weights is that a player who is one standard deviation above average in both AVG and OPS, and who has an average number of AB’s and PA’s, would get +2 from these categories in the variable used to rank projected totals. By the same token, the aforementioned Kyle Parker’s AVG and OPS would get essentially no weight and have virtually no effect on his projected totals, just as in real life his performance is not expected to have any effect on the rate stats of your team.
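The playing-time weighting can be written as a small function. The function name and the pool averages of 550 AB and 600 PA are hypothetical; the first check reproduces the +2 for a player one standard deviation above average in AVG and OPS with average playing time, and the second shows a big rate line over a single plate appearance contributing almost nothing.

```python
def weighted_rate_z(z_avg, z_ops, ab, pa, mean_ab, mean_pa):
    """Scale AVG and OPS z-scores by projected playing time.

    AVG is weighted by AB / mean AB. OPS mixes OBP (a PA denominator) and
    SLG (an AB denominator), so it is weighted by (AB + PA) / mean(AB + PA),
    which equals (AB + PA) / (mean AB + mean PA).
    """
    avg_weight = ab / mean_ab
    ops_weight = (ab + pa) / (mean_ab + mean_pa)
    return z_avg * avg_weight, z_ops * ops_weight


# Hypothetical pool averages of 550 AB and 600 PA.
MEAN_AB, MEAN_PA = 550, 600

# One SD above average in AVG and OPS with exactly average playing time.
print(sum(weighted_rate_z(1.0, 1.0, 550, 600, MEAN_AB, MEAN_PA)))  # 2.0

# A great rate line projected over a single plate appearance barely counts.
print(sum(weighted_rate_z(1.5, 1.5, 1, 1, MEAN_AB, MEAN_PA)))
```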

The Fun Stuff

And that’s about it. Once you have Z-scores, it’s very easy to rank players, to change the formulas to rank them by different systems, or to sort players by certain categories to see who stands out the most.

Two common variations on the traditional 5 stats are to include OBP instead of AVG, or to play in a points league. (For a points league, just change the Z-score weighting to reflect the point system). Here are the top players in these alternate systems using this evaluation method (I threw my own league in too, just for kicks):

Rank	Trad 5	OBP 5	Points	Crazy 8s
1	Miguel Cabrera	Miguel Cabrera	Miguel Cabrera	Miguel Cabrera
2	Mike Trout	Mike Trout	Mike Trout	Mike Trout
3	Carlos Gonzalez	Carlos Gonzalez	Joey Votto	Carlos Gonzalez
4	Yasiel Puig	Paul Goldschmidt	Paul Goldschmidt	Andrew McCutchen
5	Paul Goldschmidt	Jose Bautista	Andrew McCutchen	Troy Tulowitzki
6	Andrew McCutchen	Prince Fielder	Prince Fielder	Adrian Beltre
7	Troy Tulowitzki	Andrew McCutchen	Carlos Gonzalez	Prince Fielder
8	Ryan Braun	Edwin Encarnacion	Troy Tulowitzki	Yasiel Puig
9	Prince Fielder	Jose Abreu	Giancarlo Stanton	Paul Goldschmidt
10	Jose Abreu	Yasiel Puig	Jose Bautista	Edwin Encarnacion
11	Chris Davis	Giancarlo Stanton	Yasiel Puig	Albert Pujols
12	Edwin Encarnacion	Chris Davis	Edwin Encarnacion	Ryan Braun
13	Jose Bautista	Troy Tulowitzki	Ryan Braun	Robinson Cano
14	Adrian Beltre	Ryan Braun	Chris Davis	Adrian Gonzalez
15	Giancarlo Stanton	Joey Votto	Shin-Soo Choo	Jacoby Ellsbury
16	Albert Pujols	Shin-Soo Choo	Jose Abreu	Buster Posey
17	Jacoby Ellsbury	Albert Pujols	David Ortiz	Jose Bautista
18	Wilin Rosario	David Ortiz	Adrian Gonzalez	Joey Votto
19	David Ortiz	Adrian Beltre	Adrian Beltre	Jose Abreu
20	Adam Jones	Evan Longoria	Albert Pujols	Eric Hosmer
21	Joey Votto	Bryce Harper	Anthony Rizzo	Billy Butler
22	Carlos Beltran	Jacoby Ellsbury	Robinson Cano	David Ortiz
23	Shin-Soo Choo	Anthony Rizzo	Evan Longoria	Carlos Beltran
24	Adrian Gonzalez	Carlos Beltran	Buster Posey	Chris Davis
25	Robinson Cano	David Wright	David Wright	Anthony Rizzo
26	Bryce Harper	Matt Holliday	Matt Holliday	Giancarlo Stanton
27	Anthony Rizzo	Adrian Gonzalez	Billy Butler	Shin-Soo Choo
28	Evan Longoria	Robinson Cano	Joe Mauer	Adam Jones
29	Eric Hosmer	Jason Heyward	Freddie Freeman	Jose Reyes
30	Michael Cuddyer	Adam Jones	Carlos Beltran	Allen Craig
31	Carlos Gomez	Billy Butler	Bryce Harper	Matt Holliday
32	David Wright	Freddie Freeman	Allen Craig	Norichika Aoki
33	Matt Holliday	Carlos Gomez	Eric Hosmer	Pablo Sandoval
34	Billy Butler	Eric Hosmer	Pablo Sandoval	David Wright
35	Buster Posey	Justin Upton	Michael Cuddyer	Dustin Pedroia
36	Alex Rios	Wilin Rosario	Jacoby Ellsbury	Michael Cuddyer
37	Matt Kemp	Buster Posey	Alex Gordon	Wilin Rosario
38	Hanley Ramirez	Matt Kemp	Jason Heyward	Joe Mauer
39	Freddie Freeman	Michael Cuddyer	Carlos Santana	Martin Prado
40	Jose Reyes	Jay Bruce	Justin Upton	Bryce Harper

(Note: I evaluated points leagues the same way as the other leagues, generating both a points total and a points/PA score for each player. I scaled the two values to give them approximately equal weight, and ranked players by the mean of the two.)
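The points-league blend might be sketched like this. The point values and function names are hypothetical, and the scaling step is an assumption: the note doesn't say how the two values were brought to equal weight, so this version z-scores the points total and the points per PA across the players supplied and averages the two. In practice you'd compute those means and SDs from your reference pool.

```python
from statistics import mean, pstdev

# Hypothetical point values; substitute your league's scoring.
POINTS = {"R": 1, "HR": 4, "RBI": 1, "SB": 2, "H": 1, "BB": 1, "K": -1}


def projected_points(player):
    """Projected fantasy points from counting-stat projections."""
    return sum(weight * player[cat] for cat, weight in POINTS.items())


def standardize(values):
    """Assumed scaling step: convert a list to z-scores so lists are comparable."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def points_ranking(players):
    """Rank by the mean of standardized points total and points per PA."""
    totals = [projected_points(p) for p in players]
    per_pa = [t / p["PA"] for t, p in zip(totals, players)]
    blended = [
        (zt + zr) / 2 for zt, zr in zip(standardize(totals), standardize(per_pa))
    ]
    order = sorted(range(len(players)), key=lambda i: blended[i], reverse=True)
    return [players[i]["name"] for i in order]


players = [
    {"name": "Regular", "PA": 650, "R": 90, "HR": 25, "RBI": 85, "SB": 10,
     "H": 160, "BB": 60, "K": 110},
    {"name": "Part-timer", "PA": 200, "R": 35, "HR": 12, "RBI": 35, "SB": 2,
     "H": 55, "BB": 30, "K": 40},
    {"name": "Scrub", "PA": 300, "R": 30, "HR": 3, "RBI": 25, "SB": 3,
     "H": 65, "BB": 15, "K": 70},
]
print(points_ranking(players))  # ['Regular', 'Part-timer', 'Scrub']
```

With these made-up projections, the full-time regular ranks first on the blend even though the part-timer scores more points per PA.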

I expected Joey Votto to be a stud in OBP leagues, but Joey Bats (Jose Bautista) actually benefits more. Jason Heyward too. Meanwhile, CarGo is top 3 in every other system, but falls to the bottom half of the first round in a points league. In my own crazy league, Norichika Aoki projects as a contact-hitting top-40 stud, while Mark Trumbo’s contact deficiencies show up in strikeouts and hits, as well as AVG, and he drops to 82nd.

I also thought it would be cool to see which players project to be affected most under different scoring systems. Here are the players with the largest variation in ranks between systems (weighted to prefer higher-ranked and therefore more interesting players):

Player	Trad 5	OBP 5	Points
Billy Hamilton	42	45	166
Joey Votto	21	15	3
Carlos Santana	101	46	39
Carlos Gonzalez	3	3	7
Carlos Gomez	31	33	69
Yasiel Puig	4	10	11
Alex Rios	36	60	90
Jose Bautista	13	5	10
Adam Jones	20	30	46
Rajai Davis	102	115	208
Joe Mauer	67	57	28
Wilin Rosario	18	36	43
Leonys Martin	58	72	121
Jacoby Ellsbury	17	22	36
Ben Zobrist	93	68	45
Starling Marte	45	67	92
Troy Tulowitzki	7	13	8
Matt Carpenter	125	119	62
Jose Abreu	10	9	16
Martin Prado	88	105	53
Josh Willingham	121	71	73
Jean Segura	51	81	96
Jonathan Villar	139	132	220
Pablo Sandoval	52	63	34
Miguel Montero	197	155	110
Ryan Braun	8	14	13
Allen Craig	41	55	32
Yoenis Cespedes	46	47	72
Giancarlo Stanton	15	11	9
Mike Napoli	99	58	89
Mark Teixeira	71	42	59
Drew Stubbs	135	126	197
George Springer	206	184	293
Jason Heyward	48	29	38
Prince Fielder	9	6	6
Shin-Soo Choo	23	16	15
Nick Swisher	107	79	68
Adam Dunn	239	151	230
Coco Crisp	56	51	78
Alfonso Soriano	90	93	133

Billy Hamilton projects to be a one-category stud in any system that ranks stolen bases, but many people doubt whether he’ll be an especially good ballplayer in 2014, and the points system shares their skepticism. Carlos Santana will benefit enormously from any league using deeper measures than AVG, while Adam Dunn jumps from irrelevance to potential rosterability in OBP leagues only. A couple more notable players: Alex Rios is vastly more valuable in leagues with the standard five categories and least valuable in points leagues, and Adam Jones follows a very similar, if somewhat less drastic, pattern.
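The weighting behind the variation ranking isn't spelled out, so the following is only one plausible stand-in: the population SD of each player's log ranks, which treats a move from 3rd to 7th the same as a move from 42nd to 98th and therefore favors movement near the top of the board. It does not reproduce the table's order exactly; with the four sample rows below it puts Joey Votto ahead of Billy Hamilton. The `rank_volatility` name is hypothetical.

```python
import math
from statistics import pstdev


def rank_volatility(ranks):
    """Population SD of log(rank) across scoring systems.

    Working in log space makes a move from 3rd to 7th the same size as a
    move from 42nd to 98th, so changes near the top get more weight.
    """
    return pstdev([math.log(r) for r in ranks])


# Trad 5 / OBP 5 / Points ranks taken from the variation table.
table = {
    "Billy Hamilton": [42, 45, 166],
    "Joey Votto": [21, 15, 3],
    "Carlos Gonzalez": [3, 3, 7],
    "Prince Fielder": [9, 6, 6],
}
for name in sorted(table, key=lambda n: rank_volatility(table[n]), reverse=True):
    print(f"{name:16} {rank_volatility(table[name]):.2f}")
```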

And there you have it – the results of one approach to generating player values for leagues with alternative categories.
