Fun with Game Score: xW, xL, and xND

by Chris Jeske

February 27, 2016

Game Score was first published in the 1988 Bill James Baseball Abstract as Bill James’ “annual fun stat.” Although the stat was created by one of the most prolific sabermetricians of all time and is now published in most box scores, it hasn’t been widely adopted for use in sabermetric analysis and instead remains mostly a stat that is “fun to play around with,” as James wrote 28 years ago.

Generally the game score metric makes it into headlines on two distinct occasions: first, if a pitcher exceeds a score of 100, due to how rare it is for this to occur; and second, as a means to compare whether a no-hitter or perfect game was more or less dominant than other no-hitters or perfect games throughout MLB history.

Over the years, sabermetricians have occasionally used game score as more than simply a “fun stat,” but such examples are few and far between. While the weights are indeed mostly arbitrary and the stat is based on simple counting stats, there is value in the simplicity of Bill James’ version of game score. The simplicity is two-fold: first, game score is easy to calculate; and second, it essentially converts a starting pitcher’s box score into a single number.

GmSc = 50 + 1 * outs recorded + 2 * innings completed after the 4th + 1 * strikeouts – 2 * hits allowed – 4 * earned runs allowed – 2 * unearned runs allowed – 1 * walks
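For readers who prefer code, the formula can be sketched as a small Python function. The function and parameter names are illustrative, and the sketch assumes that “innings completed after the 4th” counts each full inning from the fifth onward.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James' Game Score from a starter's box score line.

    Assumes "innings completed after the 4th" counts each full inning
    from the fifth onward (e.g. 9 complete innings -> 5).
    """
    innings_completed = outs // 3
    innings_after_fourth = max(0, innings_completed - 4)
    return (50
            + outs
            + 2 * innings_after_fourth
            + strikeouts
            - 2 * hits
            - 4 * earned_runs
            - 2 * unearned_runs
            - walks)


# A nine-inning, one-hit shutout with 20 strikeouts and no walks.
print(game_score(outs=27, strikeouts=20, hits=1,
                 earned_runs=0, unearned_runs=0, walks=0))  # 105
```

A starter who is pulled before recording an out and is charged with nothing scores exactly 50 under this sketch, which matches the formula's baseline.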

The results of this formula are as follows: values approaching 100 are outstanding, values around 50 are average, and values approaching 0 are terrible. In rare cases a game score can fall below 0 or exceed 100, but it is designed to rate the quality of a start on essentially a 0-100 scale.

In the following analysis I collected game score data for all games started in the six seasons from 2010 to 2015 and calculated the percentage of times each game score value resulted in a recorded win, loss, or no decision for the starting pitcher. These values serve as the inputs for a basic formula that can be used to calculate expected wins, losses, and no decisions.

Designing xW, xL, and xND weights for each GmSc

I pulled all GmSc data as well as the starting pitcher’s recorded W, L, or ND for each game from 2010 through 2015. There were 14,579 games in this six-year time frame and 29,158 Game Scores recorded (one for each starting pitcher, so two per game).

I then tallied the wins, losses, and no-decisions recorded at each game score value (0 to 100) and divided each tally by the number of times that game score was recorded. This gives the win, loss, and no-decision percentage for each game score. To smooth the results, I applied three-median smoothing once and hanning five times.
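The tally-and-smooth procedure can be sketched roughly as follows. This is not the spreadsheet work behind the published weights; it is a minimal illustration under a few assumptions: hanning is taken to mean a running weighted average with weights 1/4, 1/2, 1/4, endpoints are copied unchanged by both smoothers, and the input is a list of percentages ordered by game score with no gaps. All function names are hypothetical.

```python
from collections import Counter
from statistics import median


def decision_rates(starts):
    """starts: iterable of (game_score, decision), decision in {"W", "L", "ND"}.

    Returns {game_score: {"W": pct, "L": pct, "ND": pct}} for observed scores.
    """
    totals = Counter()
    by_decision = Counter()
    for score, decision in starts:
        totals[score] += 1
        by_decision[(score, decision)] += 1
    return {
        score: {d: by_decision[(score, d)] / n for d in ("W", "L", "ND")}
        for score, n in totals.items()
    }


def median3(values):
    """Running median of three; endpoints copied unchanged (an assumption)."""
    values = list(values)
    if len(values) < 3:
        return values
    middle = [median(values[i - 1:i + 2]) for i in range(1, len(values) - 1)]
    return [values[0]] + middle + [values[-1]]


def hanning(values):
    """Running mean with weights 1/4, 1/2, 1/4; endpoints copied unchanged."""
    values = list(values)
    if len(values) < 3:
        return values
    middle = [
        0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
        for i in range(1, len(values) - 1)
    ]
    return [values[0]] + middle + [values[-1]]


def smooth(values, hanning_passes=5):
    """One pass of median-of-three followed by repeated hanning."""
    out = median3(values)
    for _ in range(hanning_passes):
        out = hanning(out)
    return out


# Toy input, made up purely to show the shape of the data.
toy_starts = [(57, "W"), (57, "ND"), (57, "W"), (57, "L")]
print(decision_rates(toy_starts)[57])  # {'W': 0.5, 'L': 0.25, 'ND': 0.25}
```

In practice each of the three percentage series (win, loss, no-decision) would be passed through `smooth` separately, which is why the smoothed rows do not always sum to exactly 1.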

This resulted in the values listed below, which are the expected win, loss, or no-decision percentage that will be applied to each game score value.

Link to spreadsheet of GmSc xW, xL, xND smoothed weights

The actual results closely match what we would expect: higher game scores result in a high likelihood that the starting pitcher records a win; lower game scores result in a high likelihood that the starting pitcher records a loss; and average game scores have roughly an equal chance to result in a win, loss, or no-decision.

To calculate a starting pitcher’s expected win, loss, and no-decision percentages, simply average the expected win, loss, and no-decision percentages for each game score value that pitcher recorded. The chart below shows what this would look like for a pitcher with three starts and game score values of 57, 65, and 28.

Game	GmSc	xW Pct	xL Pct	xND Pct
1	57	.402	.244	.354
2	65	.567	.133	.298
3	28	.037	.743	.223
Avg (xPct)		.335	.373	.292

Using these expected percentages, it is easy to calculate each starter’s expected wins, losses, and no-decisions by multiplying the average (expected percentage) by the number of games started. For the example above, 3 x .335 = 1.0 xW, 3 x .373 = 1.1 xL, and 3 x .292 = .9 xND.
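The same arithmetic can be checked with a short Python sketch. The weights are the three rows from the example table above; the function name and data layout are illustrative only.

```python
# (xW, xL, xND) weights for the three game scores in the example table.
WEIGHTS = {
    57: (0.402, 0.244, 0.354),
    65: (0.567, 0.133, 0.298),
    28: (0.037, 0.743, 0.223),
}


def expected_record(game_scores, weights):
    """Return (average percentages, expected counts) as (W, L, ND) tuples."""
    n = len(game_scores)
    rows = [weights[gs] for gs in game_scores]
    # Average each column, then scale by games started.
    averages = tuple(sum(col) / n for col in zip(*rows))
    expected = tuple(avg * n for avg in averages)
    return averages, expected


averages, expected = expected_record([57, 65, 28], WEIGHTS)
print([round(a, 3) for a in averages])  # [0.335, 0.373, 0.292]
print([round(e, 1) for e in expected])  # [1.0, 1.1, 0.9]
```

Because the expected count is the average multiplied by the number of starts, it is equivalent to summing the per-start percentages directly.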

Below is a table of all starting pitchers with at least 10 games started in 2015. You can sort by the difference to evaluate the luckiest and unluckiest starters in terms of wins and losses. Darker red shading indicates a pitcher was lucky and darker blue indicates a pitcher was unlucky. You will need to download or copy the data to be able to manipulate it on your own.

Link to spreadsheet of 2015 SP GmSc xW, xL, xND
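If you would rather rank pitchers in code than in a spreadsheet, a rough sketch follows. It uses only a handful of the rounded actual and expected records quoted in this article, so the differences are whole numbers rather than the decimal or percentage values in the spreadsheet, and the sort key (wins above expectation, then losses) is my own choice for illustration.

```python
# (actual W, actual L, expected W, expected L), rounded as quoted in this article.
RECORDS = {
    "Collin McHugh": (19, 7, 12, 10),
    "Michael Wacha": (17, 7, 12, 9),
    "Shelby Miller": (6, 17, 13, 10),
    "Corey Kluber": (9, 16, 15, 8),
    "Jordan Zimmermann": (13, 10, 13, 11),
}


def win_loss_diff(record):
    """Actual minus expected wins and losses."""
    wins, losses, x_wins, x_losses = record
    return wins - x_wins, losses - x_losses


def luck_key(name):
    """Sort key: most wins above expectation first, then fewest extra losses."""
    diff_w, diff_l = win_loss_diff(RECORDS[name])
    return (-diff_w, diff_l)


ranked = sorted(RECORDS, key=luck_key)
for name in ranked:
    diff_w, diff_l = win_loss_diff(RECORDS[name])
    print(f"{name}: {diff_w:+d} W, {diff_l:+d} L")
```

Positive win differences and negative loss differences indicate good luck; the reverse indicates bad luck.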

The Lucky

Collin McHugh outperformed his expected wins more than any other starter on both a percentage and counting basis. His actual record of 19-7 was much better than his expected record of 12-10.
Nathan Eovaldi was among the luckiest starters in outperforming both his expected wins and expected losses. His actual record of 14-3 was much better than his expected record of 6-7.
Drew Hutchison, like Eovaldi, was among the luckiest starters in outperforming both his expected wins and expected losses. His actual record of 13-5 was much better than his expected record of 8-13.
Michael Wacha outperformed his expected wins with the fifth highest win percentage difference. His actual record of 17-7 was much better than his expected record of 12-9.
Colby Lewis also outperformed his expected wins. His actual record of 17-9 was much better than his expected record of 12-12.

The Unlucky

Chris Bassitt was the unluckiest starter in all of baseball on a rate basis last year as he led all starters in win percentage difference and loss percentage difference. In 13 starts, Bassitt had a record of 1-8 which was much worse than his expected record of 5-4.
Shelby Miller was the unluckiest starter on a counting basis as he significantly underperformed both his expected wins and expected losses. His actual record of 6-17 was much worse than his expected record of 13-10.
Corey Kluber was nearly as unlucky as Miller and had the highest difference between his actual losses and expected losses of all starters. His actual record of 9-16 was much worse than his expected record of 15-8.
Scott Kazmir was among the unluckiest starters in underperforming wins. His actual record of 7-11 was much worse than his expected record of 12-10.
Max Scherzer was among the unluckiest starters in underperforming losses. His actual record of 14-12 was much worse than his expected record of 18-7.
Jesse Chavez was also among the unluckiest starters in underperforming losses. His actual record of 7-15 was much worse than his expected record of 9-10.
Three potential fantasy sleepers also appear near the top of the unlucky list. In 16 starts, Raisel Iglesias had a 3-7 record compared to his expected record of 7-5. In 17 starts, Kevin Gausman had a record of 3-7 compared to an expected record of 6-6. Lastly, in 20 starts, Justin Verlander had a record of 5-8 compared to an expected record of 9-6.

Other Outliers

Ivan Nova took a decision in all 17 of his starts to finish with an actual record of 6-11 compared to his expected record of 5-7.
Chase Anderson was nearly the opposite of Nova as he recorded a decision in only 12 of his 27 starts with an actual line of 6-6 compared to his expected record of 9-10.
Kyle Hendricks also recorded seven more no decisions than expected. In 32 starts, his actual record of 8-7 compared to an expected record of 12-11.

The Closest Match

Jordan Zimmermann was the starter whose actual win, loss, and no-decision percentages had the smallest absolute difference from his expected percentages, giving him the dubious distinction of being this system’s most accurately evaluated starter. His actual line of 13-10 compares favorably with his expected line of 13-11. It looks even closer when looking at decimal values: actual wins 13, expected wins 12.6; actual losses 10, expected losses 10.5; actual no decisions 10, expected no decisions 9.9.

If you are interested in how well the xW, xL, and xND percentages correlate year to year, the answer is not well at all. Comparing the expected win and loss percent in year one to the actual win and loss percent in year two shows practically no correlation. The expected percentages and the calculations used above are much more useful when limited to evaluating past performance.

That said, there is one way that the expected percentages are predictive. In all cases I looked at over the past three years, the outliers (both lucky and unlucky) regressed toward the mean in such a way that no one showed up on the same over- or underperforming list two years in a row. Thus, the featured lucky pitchers (who gained extra wins or avoided deserved losses) will be hard-pressed to match their luck again this year, while the unlucky pitchers (who took extra losses or missed out on deserved wins) should fare better this year.
