Testing the Eye Test: Part 1
For as long as I can remember, I've been a fan of good defense. Growing up, my favorite player was Andy Van Slyke, and as a Braves fan I've had the privilege of rooting for defensive wizards such as Greg Maddux, Andruw Jones, and now Andrelton Simmons. Advanced defensive statistics are one of the things that drew me into sabermetrics, and I spend entirely too much time obsessing over pitch framing.
Foremost among the new wave of statistics is UZR, Ultimate Zone Rating, which is the metric used to calculate the defensive portion of fWAR. In addition, Fangraphs also carries DRS and FSR, the Fans Scouting Report. While UZR is my preferred metric, I've always been intrigued by FSR. After all, I pride myself on my knowledge of the defensive ability of players on my favorite team, and it makes sense to me that there is a wide population with a pretty good idea of the quality of Chris Johnson's defense (namely, that it sucks, but improved a lot in 2014).
I decided to take a look at the correlation between a player's FSR and the components of his UZR (ARM, DPR, RngR, and ErrR, as well as total UZR). For this exercise, I pulled the defensive stats of every player who qualified (minimum of 900 innings) at a position from 2009-2014 (FSR data is only available for those six seasons on Fangraphs). I then disregarded catchers, as UZR does not cover the position. Likewise, pitchers are left out because they are covered by neither UZR nor FSR. That left me with 761 player seasons across the other seven positions. Here are the correlations between FSR and UZR and its components for those seven positions:
| Position | #   | ARM   | DPR   | RngR  | ErrR  | UZR   |
|----------|-----|-------|-------|-------|-------|-------|
| 1B       | 118 | N/A   | 0.213 | 0.285 | 0.320 | 0.396 |
| 2B       | 117 | N/A   | 0.159 | 0.470 | 0.547 | 0.637 |
| 3B       | 107 | N/A   | 0.154 | 0.632 | 0.261 | 0.673 |
| SS       | 130 | N/A   | 0.363 | 0.428 | 0.344 | 0.592 |
| LF       | 71  | 0.510 | N/A   | 0.526 | 0.186 | 0.664 |
| CF       | 115 | 0.237 | N/A   | 0.493 | 0.071 | 0.548 |
| RF       | 103 | 0.214 | N/A   | 0.541 | 0.067 | 0.613 |
There's a lot to look at there, but first let me draw your attention to one fact: at every position, UZR has a higher correlation with FSR than any one of its components does. That's a big plus for FSR, as it shows the fans don't get so caught up in one area of a position that they ignore how it fits into the whole. It also runs counter to my expectations, as I expected the fans to strongly favor players who avoided making errors (as the Gold Glove voters seem to). Instead, the component with the strongest average correlation is range, with ARM (which is only calculated for outfielders) a distant second. Errors only beat out double play runs, which is an indication of how informed fans have moved away from using errors as the primary way to evaluate defense. Indeed, errors had the strongest correlation of any component at only two positions: 1B and 2B. Further, errors had an extremely weak correlation with FSR in the outfield, with CF and RF featuring almost no relationship at all.
I was also struck by how strong the correlation between FSR and UZR was at every position. With the exception of 1B, every position's correlation between the two metrics was above .5, with four of the seven positions above .6. The correlation between FSR and UZR was strongest at 3B, with LF a close runner-up. 3B also features the strongest correlation between FSR and a component of UZR (in this case, RngR) and the smallest gap between UZR and one of its components. This finding surprised me, as I typically picture range as a CF tracking down a fly ball hit far over his head. That said, the average correlation between RngR and FSR is still higher in the OF (0.520) than in the IF (0.454), despite the strength of the correlation at 3B.
I was also surprised to see the strongest correlation between ARM and FSR in LF rather than RF, which is typically known as the haven for strong arms. I have two theories to explain this incongruity: the first is that this is simply a small-sample quirk. The other is that selection bias in RF creates a situation where the spread between the strongest and weakest arms is simply too small to make a significant difference in the data. Indeed, the range between the highest ARM in RF (Jeff Francoeur's 9.7 in 2010) and the lowest (Curtis Granderson's -7.4 in 2014) was approximately 3 runs smaller than the range in LF between Yoenis Cespedes' 2014 (12.4) and Ryan Braun's 2010 (-7.9).
Overall, this shows the strength of FSR. While it's certainly not the same as UZR, the correlations are strongest between total UZR and FSR, and the components with the strongest correlations appear to generally be appropriate for the position. In Part 2, I will examine which components are over- or under-emphasized by FSR.
When not obsessively reading Fangraphs, Will can usually be found on Reddit. If not, it is fair to assume that he is asleep, at work, or trying to convince his wife that he is not addicted to baseball. Other interests include Seinfeld, Homicide: Life on the Street, and bemoaning the poor framing endemic to the Braves organization.
Good findings. One thing that crosses my mind is that the FSR ratings may very well be influenced by UZR (the FSR is hosted by Tango after all, so a lot of the participants are probably sabermetrically aware). Maybe you could try to factor in the previous three years' worth of UZR to see if that (in part) explains some of the high correlations.
Thanks! And yes, I was wondering the same myself. I'm not sure how I would go about factoring in previous years of UZR, but it's something to consider.