Z-Scores in Sports (a Supporting Argument for zDefense)

by Walter King

April 8, 2015

This is Part 3 of the Player Evaluator and Calculated Expectancy (PEACE) model, an alternative to Wins Above Replacement. This article presents evidence that z-scores can be converted into runs (or points, in other sports) accurately and reliably, and it analyzes the results zDefense has produced.

Recall that zDefense is broken down into 4 components: zFielding, zRange, zOuts, and zDoublePlays. The fielding and range components depend on the accuracy of Calculated Runs Expectancy, which I introduced in Part 1. Outs and double plays, though, use a different technique: they take z-scores for the relevant rate statistics, then multiply by factors of playing time. Here are the equations:

zOuts = [(Player O/BIZ – Positional O/BIZ) / Positional O/BIZ Standard Deviation] * (Player Innings / Team Innings) * (√ Player BIZ / 2)

zDoublePlays = [(Player DP/BIZ – Positional DP/BIZ) / Positional DP/BIZ Standard Deviation] * (Player Innings / Team Innings) * (√ Player BIZ / 2) * Positional DP/BIZ
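To make the playing-time scaling concrete, here is a minimal Python sketch of the two equations above. It reads (√ Player BIZ / 2) as the square root of the player’s balls in zone, then divided by 2; that grouping is an assumption. The function names and the example inputs are hypothetical, chosen for illustration rather than taken from real data.

```python
import math


def rate_z_score(player_rate: float, pos_rate: float, pos_sd: float) -> float:
    """Number of positional standard deviations the player's rate sits above the positional mean."""
    return (player_rate - pos_rate) / pos_sd


def playing_time_factor(player_innings: float, team_innings: float, player_biz: float) -> float:
    """(Player Innings / Team Innings) * (sqrt(Player BIZ) / 2).

    Assumption: the square root applies to BIZ alone, and the result is then halved.
    """
    return (player_innings / team_innings) * (math.sqrt(player_biz) / 2)


def z_outs(player_o_biz, pos_o_biz, pos_o_biz_sd, player_innings, team_innings, player_biz):
    """zOuts: the outs-per-BIZ z-score scaled by playing time."""
    return (rate_z_score(player_o_biz, pos_o_biz, pos_o_biz_sd)
            * playing_time_factor(player_innings, team_innings, player_biz))


def z_double_plays(player_dp_biz, pos_dp_biz, pos_dp_biz_sd, player_innings, team_innings, player_biz):
    """zDoublePlays: same structure as zOuts, multiplied once more by the positional DP/BIZ rate."""
    return (rate_z_score(player_dp_biz, pos_dp_biz, pos_dp_biz_sd)
            * playing_time_factor(player_innings, team_innings, player_biz)
            * pos_dp_biz)


# Hypothetical infielder: 1,160 of 1,450 team innings, 400 balls in zone.
print(round(z_outs(0.80, 0.75, 0.025, 1160, 1450, 400), 2))         # 16.0
print(round(z_double_plays(0.12, 0.10, 0.01, 1160, 1450, 400), 2))  # 1.6
```

Under this reading, quadrupling a player’s BIZ only doubles the playing-time factor, so heavy workloads add credit at a diminishing rate.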

We can set up models in other sports that estimate point differentials using very similar techniques. I’ve developed one for college football and another for the NBA.

For the first model, I used data for every Division I FBS football team from 2000-2014 (1,802 teams), and I defined the relevant statistics and their “weights” as follows:

zPassing = [[Completion Percentage z-score * Completions per Game] + [Passing Yards per Attempt z-score * Passing Attempts per Game]] / 10
zRushing = [Rushing Yards per Attempt z-score * Rushing Attempts per Game] / 10
zTurnovers = [Turnovers per Game z-score]
zPlays = [Number of Offensive Plays per Game z-score]

These 4 components summed make up zOffense, while the same calculations for each team’s opponents produce zDefense.
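The four football components translate directly into code. This is a sketch only: the function names are hypothetical, and the article does not say how the sign of the turnover z-score is handled, so the sketch assumes every z-score is passed in already oriented so that a positive value is good for the offense (the turnover z-score negated, for example). Running the same function on a team’s opponents’ statistics would give the zDefense side.

```python
def z_passing(comp_pct_z, completions_pg, ypa_z, pass_att_pg):
    """zPassing = [(Comp% z-score * Completions/G) + (Yds/Att z-score * Pass Att/G)] / 10"""
    return (comp_pct_z * completions_pg + ypa_z * pass_att_pg) / 10


def z_rushing(ypc_z, rush_att_pg):
    """zRushing = (Rush Yds/Att z-score * Rush Att/G) / 10"""
    return (ypc_z * rush_att_pg) / 10


def z_offense(comp_pct_z, completions_pg, ypa_z, pass_att_pg,
              ypc_z, rush_att_pg, turnovers_z, plays_z):
    """Sum of zPassing, zRushing, zTurnovers, and zPlays.

    Assumption: turnovers_z is already sign-flipped so that fewer turnovers is positive.
    """
    return (z_passing(comp_pct_z, completions_pg, ypa_z, pass_att_pg)
            + z_rushing(ypc_z, rush_att_pg)
            + turnovers_z
            + plays_z)


# Hypothetical team: 20 completions and 32 attempts per game, 40 rushes per game.
print(round(z_offense(1.0, 20, 0.5, 32, 0.8, 40, 0.5, -0.3), 2))  # 7.0
```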

After summing the components, I found that the resulting number, divided by the number of games played, was a very accurate estimator of a team’s average point differential.

Among the 1,802 college football teams, the average difference between zPoints (calculated margin of victory) and actual MOV was just 3.21 points, with a median of 2.77 and a maximum difference of 13.97 points. About 20% of teams’ MOVs were calculated to within 1 point, 53% to within 3 points, 79% to within 5 points, and 99% to within 10 points. The regression model for this dataset can be seen below:
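The accuracy figures above are ordinary absolute-error summaries: the mean, median, and maximum gap between calculated and actual MOV, plus the share of teams within each threshold. As a sketch (the function name and the four sample teams are made up, not the author’s code or data), they could be computed like this:

```python
from statistics import mean, median


def accuracy_summary(calculated, actual, thresholds=(1, 3, 5, 10)):
    """Absolute-error summary of calculated vs. actual margin of victory."""
    if len(calculated) != len(actual) or not calculated:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(c - a) for c, a in zip(calculated, actual)]
    # Fraction of teams whose calculated MOV is within t points of the actual MOV.
    share_within = {t: sum(e <= t for e in errors) / len(errors) for t in thresholds}
    return {
        "mean": mean(errors),
        "median": median(errors),
        "max": max(errors),
        "share_within": share_within,
    }


# Made-up values for four teams, not real data.
summary = accuracy_summary([10.0, -3.0, 5.5, 0.0], [8.0, -2.5, 5.5, 4.0])
print(summary["mean"], summary["median"], summary["max"])  # 1.625 1.25 4.0
print(summary["share_within"])  # {1: 0.5, 3: 0.75, 5: 1.0, 10: 1.0}
```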

http://imgur.com/kUDwbA7

The NBA model has similar results using 6 parts:

z3P (3-point shots) = [3P FG% z-score * 3-point attempts * 3] / 10
z2P (2-point shots) = [2P FG% z-score * 2-point attempts * 2] / 10
zFreeThrows = [FT% z-score * free throw attempts] / 10
zTurnovers = [Turnovers per Minute z-score * League Average Points per Possession] * 2
zORB (offensive rebounds) = [Offensive Rebounds per Minute z-score * League Average Points per Possession]
zDRB (defensive rebounds) = [Defensive Rebounds per Minute z-score * League Average Points per Possession]

As in the football model, these 6 components make up zOffense, while the same calculations for each team’s opponents make up zDefense. I particularly like z3P, z2P, and zFreeThrows because they multiply the z-score by the “weight” of the shot: 3, 2, or 1 point, respectively. This mirrors zRange, which is multiplied by the IF/OF Constant: the average difference in runs between balls hit to the outfield and balls that stay in the infield.

I’ve only done the calculations for the 2013-2014 season, in which teams averaged 1.033 points per possession. To convert to zPoints in this model, add zOffense and zDefense, then divide by 5.
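Putting the six NBA components and the divide-by-5 conversion together gives the following minimal sketch. The names are hypothetical; the 1.033 default is the 2013-2014 league figure above. The article does not state whether attempts are season totals or per-game values, so the function simply uses whatever is passed. As with the football sketch, it assumes every z-score arrives oriented so that positive is good for the team (the turnover z-score negated) and that zDefense is oriented the same way, since the article simply adds the two.

```python
LEAGUE_PPP_2013_14 = 1.033  # league-average points per possession, 2013-2014


def nba_z_offense(fg3_z, fg3a, fg2_z, fg2a, ft_z, fta,
                  tov_per_min_z, orb_per_min_z, drb_per_min_z,
                  ppp=LEAGUE_PPP_2013_14):
    """Sum of z3P, z2P, zFreeThrows, zTurnovers, zORB, and zDRB.

    Assumption: each z-score is oriented so that positive is good for the team.
    """
    z3p = fg3_z * fg3a * 3 / 10        # 3-point shots, weighted by 3 points
    z2p = fg2_z * fg2a * 2 / 10        # 2-point shots, weighted by 2 points
    z_ft = ft_z * fta / 10             # free throws, weighted by 1 point
    z_tov = tov_per_min_z * ppp * 2    # turnovers
    z_orb = orb_per_min_z * ppp        # offensive rebounds
    z_drb = drb_per_min_z * ppp        # defensive rebounds
    return z3p + z2p + z_ft + z_tov + z_orb + z_drb


def nba_z_points(z_offense, z_defense):
    """zPoints = (zOffense + zDefense) / 5, as in the 2013-2014 model."""
    return (z_offense + z_defense) / 5


# Hypothetical team inputs.
off = nba_z_offense(1.0, 25, 0.5, 60, -0.2, 20, 0.5, 0.0, 1.0)
print(round(off, 3))                        # 15.166
print(round(nba_z_points(off, -5.166), 2))  # 2.0
```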

In most seasons, elite teams have an average point differential of around +10, while terrible ones hover around -10. The NBA model’s average difference between the calculated and actual differential was just 1.331 points, with a median of 0.800. 17 of 30 teams were calculated to within 1 point, 25 to within 2, and 29 to within 5 points per game.

The fact that these models can be built on the same general principle (rate-statistic z-scores, multiplied by a factor of playing time, approximate relative points) provides some evidence that similar results are calculable in baseball. This is the basis for zDefense in PEACE. Let’s look at the results.

Most sabermetricians would turn to the Fielding Bible Awards for a list of the best fielders by position in any given year, so we’ll use those results to compare. If we assume that the Fielding Bible is accurate, then we would expect zDefense to produce similar conclusions. Comparing the 2014 winners to the players ranked as the best at their position by zDefense, we can see some overlap. The number in parentheses is the positional ranking of the Fielding Bible Award winner by zDefense.

Position: Fielding Bible Winner (#)…zDefense Winner

C: Jonathan Lucroy (12)…Yadier Molina
1B: Adrian Gonzalez (1)…Adrian Gonzalez
2B: Dustin Pedroia (2)…Ian Kinsler
3B: Josh Donaldson (2)…Kyle Seager
SS: Andrelton Simmons (8)…Zack Cozart
LF: Alex Gordon (1)…Alex Gordon
CF: Juan Lagares (3)…Jacoby Ellsbury
RF: Jason Heyward (1)…Jason Heyward
P: Dallas Keuchel (5)…Hisashi Iwakuma
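A quick tally of the parenthetical ranks in the table shows how close the two lists are. This is just arithmetic on the nine positions listed (Lorenzo Cain’s multi-position award is left out), with the ranks copied from the table:

```python
from statistics import median

# zDefense positional rank of each 2014 Fielding Bible winner, from the table above.
fielding_bible_rank = {
    "C": 12, "1B": 1, "2B": 2, "3B": 2, "SS": 8,
    "LF": 1, "CF": 3, "RF": 1, "P": 5,
}

ranks = list(fielding_bible_rank.values())
exact_matches = [pos for pos, r in fielding_bible_rank.items() if r == 1]
top_three = sum(r <= 3 for r in ranks)

print(exact_matches)                # ['1B', 'LF', 'RF']
print(top_three, "of", len(ranks))  # 6 of 9
print(median(ranks), max(ranks))    # 2 12
```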

The multi-position winner, Lorenzo Cain, was also rated very favorably by zDefense. While most positions don’t have a perfect match, most Fielding Bible winners ranked near the top of their position by zDefense: six of the nine were in the top three, and only Jonathan Lucroy (12th) fell outside the top eight. That isn’t surprising: if the two systems disagreed drastically about who is truly elite, we would suspect one of them of being egregiously inaccurate. Instead, we see many similarities at the top, which provides some solid evidence that zDefense is a valid measure.

As always, feel free to comment with any questions, thoughts, or concerns.

3 Comments


Matt


Interesting article. I have used a similar approach in a custom-scoring dynasty baseball league to rate players’ performance. The default rankings the site uses do not reflect the players’ value under our custom scoring system. By ranking each scoring category using population statistics, I develop a composite player value based on each player’s categorical z-score, which I then use to assist in roster decisions and trades. Seems to work pretty well for me….

Nick

Before I throw out a flurry of questions, I would first like to acknowledge the time and effort applied to your cleverly-named metric. I am thoroughly impressed and excited about the potential of PEACE. Thank you for sharing this with the world. I’m not entirely sure I fully comprehend the methods used to derive each calculation, so please forgive any misconceptions.

I noticed a significant difference in the rank of catchers listed by the two systems. I was expecting something like this when reading the primer. Players in the field respond to situations that are much easier to analyze objectively. The dance between pitcher, catcher, batter and umpire is much more nuanced, so I would expect it to be harder to quantify. However, attributing wild pitches to the catcher seems unfair to me, even though an elite catcher can prevent wild pitches from even appearing in the box score. With the majority of action between the pitcher and catcher, have you seen anything that could refine the model even further? Is pickoff data assigned to the individual fielder, or is it a general PK that leaves you to question who made the throw?

Lucroy is revered for his ability to increase called strikes. Is there any pitch-framing correlation you’ve been able to identify without having to look at PitchF/X data? Regarding pitchers, could a similar method be applied to individual pitch types to weigh a CRE equivalent to each type for a given pitcher? It feels like I’m peeking into a very deep rabbit hole here. Again, thanks for igniting my imagination. I’m excited to see how this information is applied in the future.

Walter King

Reply to Nick

Thanks for your response. You bring up a really good point about crediting wild pitches to catchers rather than pitchers. I’ll definitely look into fixing that as I move forward. As for pickoffs, I currently credit those to the pitcher; I’m not familiar with data sets that record which base a pickoff occurred at (or, more importantly, who applied the tag).

I also have yet to examine pitch framing or PitchF/X data. At the moment, the model only considers pitch blocking and management of baserunners by catchers, but it would be very interesting to include a component of pitch framing.

As far as pitch types, it would probably require a very precise and detailed data set, one that isolates batting and baserunning statistics based on the pitch thrown at the start of each play. Maybe someday we’ll have automated sabermetricians to track these types of things!
