How Well Did the FanGraphs Playoff Odds Work?

by Sean Dolinar

September 30, 2014

Playoff odds [technically postseason probabilities] are among the more fan-accessible advanced stats. They range from 0% to 100% and tell the fan the probability that a certain team will reach the MLB postseason. They are determined by a Monte Carlo simulation that runs the baseball season thousands of times [10,000 times specifically for FanGraphs]. If a team reaches the postseason in 5,000 of those 10,000 simulations, the team is given a 50% probability of making the postseason. FanGraphs runs these simulations every day, so the daily playoff odds can be collected and graphed to show the story of a team’s season.
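As a rough illustration of the idea, here is a minimal Monte Carlo sketch. It is not the FanGraphs model: it assumes each team has a fixed per-game win probability, treats every game as an independent coin flip instead of playing out a real schedule, and sends the top `n_spots` teams by wins to the postseason. The function name, team labels, and win probabilities are all made up for the example.

```python
import random

def simulate_playoff_odds(win_probs, n_spots, n_games=162, n_sims=10_000, seed=0):
    """Estimate each team's playoff probability by simulating many seasons.

    win_probs: dict of team name -> assumed per-game win probability.
    n_spots:   number of teams that reach the postseason (assumed: top n by wins).
    """
    rng = random.Random(seed)
    made_it = {team: 0 for team in win_probs}
    for _ in range(n_sims):
        # Simplification: each game is an independent coin flip against an
        # average opponent, so league-wide wins need not equal losses.
        wins = {
            team: sum(rng.random() < p for _ in range(n_games))
            for team, p in win_probs.items()
        }
        # Rank by wins; a random secondary key breaks ties without bias.
        ranked = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        for team in ranked[:n_spots]:
            made_it[team] += 1
    # Share of simulated seasons in which each team reached the postseason.
    return {team: count / n_sims for team, count in made_it.items()}

# Hypothetical four-team division with a single playoff spot.
odds = simulate_playoff_odds({"A": 0.58, "B": 0.52, "C": 0.48, "D": 0.42}, n_spots=1)
for team, prob in odds.items():
    print(f"{team}: {prob:.1%}")
```

A team that reaches the postseason in 5,000 of the 10,000 simulated seasons comes out at 50%, matching the description above.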

Above is a composite graph of three different types of teams. The Dodgers were identified as a good team early in the season, and their playoff odds stayed high because of consistently good play. The Brewers started their season off strong but had two steep drop-offs in early July and early September. Even though the Brewers had more wins than the Dodgers, the FanGraphs playoff odds never valued the Brewers more than the Dodgers. The Royals started slow and had a strong finish to secure their first postseason berth since 1985. All these seasons are different, and their stories are captured by the graph. Generally, this is how fans will remember their team’s season: by the storyline.

Since the playoff odds change every day and become either 100% or 0% by the end of the season, the projections need to be compared to the actual results at the end of the season. A playoff probability of 85% means that teams with the given parameters will make the postseason 85% of the time.

I gathered the playoff odds for the entire 2014 season from FanGraphs and put the predictions into buckets in 10% increments of playoff probability. If the odds are accurate, 20% of all the predictions in the 20% bucket should belong to teams that go on to the postseason. The same applies to every bucket: 0%, 10%, 20%, etc.
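The bucketing can be sketched in a few lines of Python. This is an illustration, not the script used for the article: it assumes each daily prediction is assigned to the nearest 10% bucket, and the sample predictions are made up, not FanGraphs data.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (probability, made_postseason) pairs into 10% buckets.

    Returns {bucket_percent: (n_predictions, observed_rate)}.
    """
    buckets = defaultdict(list)
    for prob, made_postseason in predictions:
        # Assumption: each prediction goes to the nearest 10% bucket.
        bucket = int(prob * 10 + 0.5) * 10
        buckets[bucket].append(1 if made_postseason else 0)
    return {
        bucket: (len(outcomes), sum(outcomes) / len(outcomes))
        for bucket, outcomes in sorted(buckets.items())
    }

# Made-up daily predictions for illustration only, not FanGraphs data.
toy = [(0.18, False), (0.22, False), (0.21, True), (0.19, False), (0.20, False),
       (0.91, True), (0.88, True)]
for bucket, (n, rate) in calibration_table(toy).items():
    print(f"{bucket:3d}% bucket: {n} predictions, {rate:.0%} reached the postseason")
```

If the odds are well calibrated, the observed rate in each bucket should land close to the bucket's label once there are enough predictions in it.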

Above is a chart comparing the buckets to the actual results. Since this uses only one year of data and only 10 teams made the playoffs, the results don’t quite match up to the buckets. The overall pattern is encouraging, but I would insist on looking at multiple years before drawing any real conclusions. The results for any given year are subject to the ‘stories’ of the 30 teams that played that season. For example, the 2014 season did not have a team like the 2011 Red Sox, who failed to make the postseason after having a > 95% playoff probability. This is colloquially considered an epic ‘collapse’, but the 95% prediction not only implies there’s a chance the team might fail, it PREDICTS that 5% of such teams will fail. So there would be nothing wrong with the playoff odds model if ‘collapses’ like that of the Red Sox only happened once in a while.

The playoff probability model relies on an expected winning percentage. Unlike a binary outcome such as making the postseason, winning percentage is a more continuous quantity, which makes the model easier to evaluate. For the most part, teams stay around their initial predicted winning percentage and come really close to the prediction by the end of the season. Not every prediction is correct, but if there are enough good predictions, the predictive model is useful.

Teams also aren’t static: they can become worse by trading away players at the trade deadline or improve by acquiring the good players who were traded. There are also factors like injuries or player improvement that the prediction system can’t account for because they are unpredictable by definition. The following line graph allows you to pick a team and check how it did relative to its predicted winning percentage. Some teams, like the Pirates, are spot on, but a few, like the Orioles, are really far off.

The residual distribution [the actual values – the predicted values] should be a normal distribution centered around 0 wins. The following graph shows the residual distribution in number of wins. The teams in the middle had actual results close to their predicted values, while the values on the edges of the distribution are more extreme deviations. You would expect the teams that improved to balance out the teams that got worse. However, the graph is skewed toward the teams that became much worse, implying that there is some mechanism that makes bad teams lose more often. This is where attitude, trades, and changes in strategy would come into play. I would go so far as to say this is evidence that the soft skills of a team, like chemistry, break down.
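One way to put numbers on "centered around 0" and "skewed" is to compute the mean and skewness of the residuals. The sketch below assumes you already have predicted and actual win totals per team; the function name is illustrative and the five sample values are made up, not the 2014 results.

```python
from statistics import fmean

def residual_summary(predicted_wins, actual_wins):
    """Return (residuals, mean, skewness) for actual minus predicted wins."""
    residuals = [a - p for p, a in zip(predicted_wins, actual_wins)]
    mean = fmean(residuals)
    # Population moments; skewness is the third moment over variance^1.5.
    m2 = fmean((r - mean) ** 2 for r in residuals)
    m3 = fmean((r - mean) ** 3 for r in residuals)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return residuals, mean, skewness

# Made-up win totals for illustration only, not the 2014 results.
predicted = [81, 85, 90, 78, 76]
actual = [83, 87, 91, 80, 65]
residuals, mean, skewness = residual_summary(predicted, actual)
print(residuals, round(mean, 2), round(skewness, 2))
```

A mean near 0 with skewness near 0 would match the expectation of balanced over- and under-performers. A clearly negative skewness means a few teams fell far short of their projection, which is the pattern described above.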

Since I don’t have access to more years of FanGraphs projections or other projection systems, I can’t do a full evaluation of the team projections. More years of playoff odds should yield probability buckets that reflect the expectation much better than a single year. This would allow for more than 10 different paths to the postseason to be present in the data. In the absence of this, I would say the playoff odds and predicted win expectancy are on the right track and a good predictor of how a team will perform.


10 Comments


rusty

11 years ago

Two methodological thoughts:
– It looks like you took each day’s Playoff Odds number — the residuals for each team won’t be independent (e.g. the Royals sitting in the 20% bucket for much of the season) because they’re based in part on a team performance projection. More years of data should reduce the standard error, but any test statistic should take this into account.
– In looking at the difference between predicted and actual wins, I don’t understand why the difference isn’t centered on 0. I assume FanGraphs adjusts their projections so that the total pool of wins adds up to 2430, which should match the real-life total.

In any case, following up to determine the success of a predictive model is a very important part of the work, so kudos to you for getting started!

Sean Dolinar (FanGraphs Staff)

11 years ago

Reply to rusty

I took each day’s playoff odds and gathered them together. Ideally, I’d like to look at the numbers from a certain point in the season to make sure that the predictive power is uniform throughout the season (which I doubt it is). This season had a lot of teams surge late in the season, like the O’s, KC, and PIT, while MIL and ATL failed to make the playoffs. If we had a season like 2013, where there weren’t as many swings, it might fit better.

The ‘residuals’ aren’t really residuals because I’m comparing the percentage of teams that made the playoffs to the bucket value. I need to learn some more rigorous methodology.

D Myers

11 years ago

Reply to Sean Dolinar

The Orioles did not surge “late in the season”. They started with the lead before the all-star break and completed a successful west coast swing (6-4 vs. A’s, Angels and Mariners) immediately after. July is, to my mind, not “late in the season”.

Sean Dolinar (FanGraphs Staff)

11 years ago

Reply to D Myers

I’ll admit that ‘surge’ and ‘late in the season’ aren’t well defined terms. Their playoff odds weren’t very high at the beginning of the season. Sustained second-half success might be a better way to describe what the O’s did. Their record consistently improved since July.

My overall point was that the O’s, Royals, and Pirates had a lot of days where their playoff odds were not very high, which affects the evaluation of playoff odds.

tz

11 years ago

Reply to Sean Dolinar

What if you split the season into three two-month buckets:

April + May – when the pre-season predictions and roster composition still carry a lot of weight

June + July – when actual results start to change the odds most significantly (actual W/L, injuries, etc.)

August + September – when teams have decided whether they are “in”, “out”, or “undecided” about making roster moves aimed at the postseason.

11 years ago

Reply to tz

By the way, how did you gather the daily playoff odds?

Was there a single place on the site having all this, or did you have to cut and paste the updated odds daily?

Sean Dolinar (FanGraphs Staff)

11 years ago

Reply to tz

I scraped them off all the individual pages using a Python script Sunday night after all the games were over. FanGraphs has a previous day/calendar option to see the playoff odds for every day. But I tested to see if this was worth pursuing by doing exactly what you suggested.

I could do the same thing for BP by collecting daily, but I’d have to decide that before the season.

The two-month bucket idea would probably be a good approach to see the difference over time, especially because teams aren’t completely random; they change through rational decisions at the trade deadline, etc. I would think you would see a lot more 10%-60% at the beginning of the season vs. a lot of 0%-5% or 90%-100% at the end of the season.

D Myers

11 years ago

Oh… somebody at FanGraphs (I haven’t been able to find the article) pointed out before the season started that the Orioles had the toughest schedule. Interesting…

DavidKB

11 years ago

This is a nice look at the success of the playoff odds. One hypothesis on the skew is that teams who are out of the running in late season may underperform because there is nothing to perform for, whereas teams that are in the hunt will try to maximize their wins. For the bottom-most teams there is even the perverse incentive of an earlier draft pick, though that’s somewhat less impactful in baseball than in other sports.

Mike Pozar

11 years ago

The bucketing is a great way of evaluating the quality of the playoff probabilities. A couple other thoughts though:
– The residual distribution can wind up looking close to a normal distribution regardless of how good the model is; you could predict every team to go 81-81 and still have a normal distribution for the residual. The goal is to minimize the difference between actual and predicted (e.g. minimize standard deviation or RMSE) as much as possible; as we get better at modeling/predicting teams’ records, we should expect the range to get smaller.
– It’s quite a leap to say the skew is evidence of team chemistry breaking down. This is much more likely due to teams being sellers at the deadline, calling up prospects and giving them more playing time, shutting down regulars with small injuries, etc.
