Applying Peta’s Wagering Methodology in 2020
For those unfamiliar with Joe Peta’s groundbreaking 2013 book Trading Bases, the author is a successful financial analyst and former Wall Street trader. After he was seriously injured in a traffic accident, his long and painful recovery included employing his professional skills to develop a baseball wagering methodology. His book is about more than that, though, including observations about the 2008 economic meltdown and sports wagering writ large. Peta’s anecdotes alone make it worth the read: imagine being hit by a NYC ambulance and then being billed by the city for the ride to the hospital.
At its highest level, the Peta methodology uses a team’s previous-season performance, adjusted for cluster luck (a regression of OBP/SLG/ISO to arrive at “hits per run”) and WAR, together with its projected WAR for the upcoming season. The resulting estimate of a team’s season win total is then used to identify and capitalize on inefficiencies between the model’s estimates and wagering lines.
Peta’s work produces two products: a season-long projection of wins (the long game) and the ability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field. While conceptually straightforward, it is time-consuming to operate, requiring familiarity with Excel (particularly the ability to link sheets). In lieu of Peta’s regression calculation of cluster luck, I utilized FanGraphs’ calculation of BaseRuns, convinced of its utility as a proxy after reading a 2019 article at samkonmodels.com arguing it was one of a number of comparable and readily available such calculations.
The Model’s 2020 Season
When MLB management and labor finally got their collective act together, the model’s projection of team wins was compared to the Vegas line just prior to Opening Day. This is shown in the table below, as well as each team’s strength of schedule:
Team (SOS) | Model Wins | FanDuel | Dif |
---|---|---|---|
Angels (5) | 31.0 | 31.5 | -0.5 |
Astros (19) | 35.5 | 35.5 | 0.0 |
Athletics (18) | 32.6 | 33.5 | -0.9 |
Blue Jays (8.5) | 29.0 | 27.5 | 1.5 |
Braves (16) | 33.2 | 33.5 | -0.3 |
Brewers (20) | 31.2 | 30.5 | 0.7 |
Cardinals (27) | 31.4 | 31.5 | -0.1 |
Cubs (23) | 30.6 | 31.5 | -0.9 |
D-backs (8.5) | 29.1 | 31.5 | -2.4 |
Dodgers (26) | 37.1 | 37.5 | -0.4 |
Giants (4) | 26.3 | 25.5 | 0.8 |
Indians (29) | 31.4 | 33.5 | -2.1 |
Mariners (2) | 26.1 | 24.5 | 1.6 |
Marlins (3) | 27.1 | 24.5 | 2.6 |
Mets (12.5) | 29.8 | 32.5 | -2.7 |
Nationals (21.5) | 31.9 | 33.5 | -1.6 |
Orioles (1) | 23.2 | 20.5 | 2.7 |
Padres (12.5) | 29.7 | 30.5 | -0.8 |
Phillies (11) | 29.3 | 31.5 | -2.2 |
Pirates (10) | 25.6 | 25.5 | 0.1 |
Rangers (7) | 27.6 | 28.5 | -0.9 |
Rays (21.5) | 33.7 | 33.5 | 0.2 |
Red Sox (14.5) | 30.0 | 30.5 | -0.5 |
Reds (25) | 31.5 | 31.5 | 0.0 |
Rockies (6) | 28.2 | 26.5 | 1.7 |
Royals (17) | 25.8 | 24.5 | 1.3 |
Tigers (14.5) | 25.0 | 21.5 | 3.5 |
Twins (30) | 32.8 | 34.5 | -1.7 |
White Sox (28) | 30.3 | 31.5 | -1.2 |
Yankees (24) | 33.9 | 37.5 | -3.6 |
Total | 900 | 906 | |
Opening Day Concerns
In writing this article, it was instructive to look back at my notes from the previous July. At the time, there were several concerns:
- The methodology is built around WAR. Projected WARs had to be adjusted to a 60-game season, essentially cutting the original WARs down to 37%. How would they hold up?
- COVID impacted more than the total number of games. Articles from both Sports Illustrated and The Athletic highlighted the variance in each division’s strength of schedule. For example, the SI piece noted the Orioles as having the league’s most difficult schedule, as seen in the table above.
- Finally, would there be serious performance impacts for players who contracted COVID? Recall that Nick Markakis initially decided to opt out after hearing just how seriously sick his friend Freddie Freeman had been.
Ultimately, I decided the last thing I needed to do was start “thinking for myself.” It is one thing to use another identified calculation like BaseRuns to account for cluster luck, but it’s quite another to start tinkering to account for strength of schedule or performance impacts following contracting COVID. Instead, I addressed these concerns through conservative application of the model.
Applying the Numbers
In a 162-game season, Peta suggests that season win totals deviating from the line by more than four games represent “unrepeatable results” and are therefore worth a wager. In a 60-game season, that would be a deviation of 1.48. The concerns outlined above made my wagering approach even more conservative, so I only looked at teams with deviations greater than two. That left eight clubs:
Team | Proj | FanDuel | Dif |
---|---|---|---|
D-backs | 29.1 | 31.5 | -2.4 |
Indians | 31.4 | 33.5 | -2.1 |
Marlins | 27.1 | 24.5 | 2.6 |
Mets | 29.8 | 32.5 | -2.7 |
Orioles | 23.2 | 20.5 | 2.7 |
Phillies | 29.3 | 31.5 | -2.2 |
Tigers | 25.0 | 21.5 | 3.5 |
Yankees | 33.9 | 37.5 | -3.6 |
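The proration and the filter can be sketched in a few lines. This is only an illustration of the arithmetic described above, not Peta's spreadsheet; the function and variable names are made up for the example, and the numbers are copied from the Opening Day table.

```python
# Sketch: prorate a 162-game quantity to 60 games, then filter the table.
FULL_SEASON_GAMES = 162
SHORT_SEASON_GAMES = 60

def prorate(value, short=SHORT_SEASON_GAMES, full=FULL_SEASON_GAMES):
    """Scale a full-season quantity (WAR, win threshold) to the short season."""
    return value * short / full

# team: (model wins, FanDuel line), copied from the Opening Day table
projections = {
    "Angels": (31.0, 31.5), "Astros": (35.5, 35.5),
    "Athletics": (32.6, 33.5), "Blue Jays": (29.0, 27.5),
    "Braves": (33.2, 33.5), "Brewers": (31.2, 30.5),
    "Cardinals": (31.4, 31.5), "Cubs": (30.6, 31.5),
    "D-backs": (29.1, 31.5), "Dodgers": (37.1, 37.5),
    "Giants": (26.3, 25.5), "Indians": (31.4, 33.5),
    "Mariners": (26.1, 24.5), "Marlins": (27.1, 24.5),
    "Mets": (29.8, 32.5), "Nationals": (31.9, 33.5),
    "Orioles": (23.2, 20.5), "Padres": (29.7, 30.5),
    "Phillies": (29.3, 31.5), "Pirates": (25.6, 25.5),
    "Rangers": (27.6, 28.5), "Rays": (33.7, 33.5),
    "Red Sox": (30.0, 30.5), "Reds": (31.5, 31.5),
    "Rockies": (28.2, 26.5), "Royals": (25.8, 24.5),
    "Tigers": (25.0, 21.5), "Twins": (32.8, 34.5),
    "White Sox": (30.3, 31.5), "Yankees": (33.9, 37.5),
}

def find_outliers(table, cutoff):
    """Return {team: model minus line} where the gap exceeds the cutoff."""
    return {
        team: round(model - line, 1)
        for team, (model, line) in table.items()
        if abs(model - line) > cutoff
    }

peta_threshold = prorate(4)                 # four games over 162, prorated
outliers = find_outliers(projections, 2.0)  # the more conservative cutoff

print(f"Prorated threshold: {peta_threshold:.2f} games")
for team, diff in sorted(outliers.items()):
    side = "Over" if diff > 0 else "Under"
    print(f"{team}: {diff:+.1f} ({side})")
```

The same `prorate` helper covers the WAR adjustment mentioned earlier, since `prorate(1)` is the 0.37 scaling factor.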
Two of the outliers, the Tigers and Yankees, are good examples of the interplay between WAR and BaseRuns, arriving at win projections starkly different from those in Vegas:
- The 2020 FanGraphs Opening Day WAR suggested that the Detroit Tigers would hit better than in 2019, combined with what seemed to be a terrible case of offensive cluster luck that had cost them 50 runs.
- Conversely, the New York Yankees’ 2020 FanGraphs Opening Day WAR suggested only a modest increase from 2019 (improved pitching offset by a loss in offense), with the team’s overall 2019 cluster luck having benefitted them to the tune of 70 runs.
Long-Term Model Performance
Money management is a key part of Peta’s methodology. Drawing from the work of thoroughbred handicapper Andrew Beyer, Peta suggests reserving 10% of a bankroll for the long plays. Using a total bankroll of $2,000, we arrive at eight bets on the outliers at $25 each. As seen in the table below, those eight wagers would have resulted in seven wins and a profit of 65% on the year.
Team | Proj | FanDuel | Dif | Actual Wins | Bet | Result | Odds | Profit |
---|---|---|---|---|---|---|---|---|
D-backs | 29.1 | 31.5 | -2.4 | 25 | Under | W | -105 | $ 23.81 |
Indians | 31.4 | 33.5 | -2.1 | 35 | Under | L | -115 | $ (25.00) |
Marlins | 27.1 | 24.5 | 2.6 | 31 | Over | W | 105 | $ 26.25 |
Mets | 29.8 | 32.5 | -2.7 | 26 | Under | W | -110 | $ 22.73 |
Orioles | 23.2 | 20.5 | 2.7 | 25 | Over | W | 105 | $ 26.25 |
Phillies | 29.3 | 31.5 | -2.2 | 28 | Under | W | -120 | $ 20.83 |
Tigers | 25 | 21.5 | 3.5 | 23 | Over | W | -165 | $ 15.15 |
Yankees | 33.9 | 37.5 | -3.6 | 33 | Under | W | -120 | $ 20.83 |
Total | | | | | | | | $ 130.85 |
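For readers who want to check the dollar column, here is a minimal sketch of how American odds turn a $25 stake into a profit or loss. The helper name is hypothetical; the odds and results are taken from the table above.

```python
def profit_on_win(stake, american_odds):
    """Profit (excluding the returned stake) when a bet at American odds wins."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100
        return stake * 100 / abs(american_odds)
    # Underdog: risk 100 to win odds
    return stake * american_odds / 100

STAKE = 25.0
LONG_PLAY_BANKROLL = 200.0  # 10% of the $2,000 bankroll

# (team, odds, won?) copied from the table above
season_bets = [
    ("D-backs", -105, True), ("Indians", -115, False),
    ("Marlins", 105, True), ("Mets", -110, True),
    ("Orioles", 105, True), ("Phillies", -120, True),
    ("Tigers", -165, True), ("Yankees", -120, True),
]

total_profit = sum(
    profit_on_win(STAKE, odds) if won else -STAKE
    for _, odds, won in season_bets
)
return_pct = total_profit / LONG_PLAY_BANKROLL * 100

print(f"Profit: ${total_profit:.2f} ({return_pct:.0f}% of the long-play bankroll)")
```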
A Daily Grind
While we have identified deviations from the sportsbooks’ lines, those investments would take months to come to fruition. As the season unfolded, I applied those projections to individual games with the goal of uncovering daily opportunities created by the difference between the model’s projections and the oddsmakers’ lines.
This is a capital accrual methodology, the complete antithesis of what Danny Ocean said in Ocean’s Eleven:
“Play long enough, you never change the stakes, the house takes you. Unless, when that perfect hand comes along, you bet big, and then you take the house.”
If you can never wager against your favorite team, this methodology may not be for you. You must be prepared to bet on the team that does not necessarily have the better chance to win. This money management follows the tenets of the Kelly Criterion, used with great success by thoroughbred handicappers such as Beyer and Mark Cramer. The basic concept is that the wager amount increases only in relation to the perceived advantage.
The Trading Bases Methodology (Individual Game Version)
Recall that the Peta methodology is based on the utilization of a team’s previous season performance adjusted for cluster luck, WAR, and the team’s upcoming season projected WAR. Peta’s work allows for a projection of total season wins as well as the capability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field.
For example, FanGraphs’ Depth Charts currently project Clayton Kershaw to make 29 starts with a WAR of 3.7. But what if Kershaw could start all 162 games? He’d be worth more than 20 WAR! A similar calculation can also be made for hitters. Corey Seager is currently projected to play 151 games with a WAR of 5.4. Projecting him to play every game would increase his WAR to 5.8 on the year.
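The "what if he played every game" step is a straight proration. A minimal sketch, with a hypothetical function name and the projections quoted above:

```python
def full_season_war(projected_war, projected_games, season_games=162):
    """Prorate a player's projected WAR as if he appeared in every game."""
    return projected_war * season_games / projected_games

kershaw = full_season_war(3.7, 29)   # 29 projected starts
seager = full_season_war(5.4, 151)   # 151 projected games

print(f"Kershaw starting every game: {kershaw:.1f} WAR")
print(f"Seager playing every game: {seager:.1f} WAR")
```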
This calculation, plus a 4% advantage to the home club, produces a win probability for each team. Comparing it with the sportsbook’s implied win probability (the money line, or ML, converted to its percentage equivalent) uncovers potential opportunities. Additionally, when utilizing a strategy based on the Kelly Criterion, the wager amount increases in proportion to the perceived advantage.
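Converting a money line to its percentage equivalent uses the standard American-odds formula. The sketch below is illustrative only: the function name is made up, and the prices of -133 and +113 are hypothetical examples, not lines quoted in this article.

```python
def implied_probability(american_odds):
    """Convert an American money line to the win probability it implies."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win odds
    return 100 / (american_odds + 100)

# Hypothetical two-way market (prices assumed for illustration)
favorite = implied_probability(-133)
underdog = implied_probability(113)
overround = favorite + underdog  # anything above 1.0 is the book's take

print(f"Favorite: {favorite:.1%}, underdog: {underdog:.1%}")
print(f"Both sides sum to {overround:.1%}")
```

Because the two implied probabilities sum to more than 100%, the model's estimate has to beat the line by more than the book's margin before a bet shows any edge.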
Applying the Numbers
This is a somewhat simplified description. There are also in-season adjustments accounting for what actually transpires as the season progresses (as adjusted for base runs). Let’s look at two examples from the 2020 season.
On September 5th, the Reds (Anthony DeSclafani) were set to face the Pirates (Trevor Williams) at PNC Park. With over half the season in the books, the model’s projection of each team’s total runs scored and runs allowed for the year had changed twice to account for reality and changes in team WAR.
On Opening Day, the Reds were projected to score 772 runs and allow 706 on the year, good for a win percentage of 54% (normalized to 31.5 wins). By September, that had changed to 724 runs scored and 633 runs allowed — a win percentage of 56%, or 33.6 wins. Meanwhile, the Pirates moved from being projected for 745 runs scored and 853 allowed on Opening Day (a 44% win percentage, normalized to 25.6 wins). Come September that projection had moved to 611 runs scored, 881 runs allowed, 34% of games won, and just 20 victories on the year.
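The article does not spell out how runs become a win percentage. One common approach is the Pythagorean expectation, and an exponent of 1.83 happens to reproduce the percentages quoted above; both the formula and the exponent are assumptions for this sketch rather than a confirmed description of the spreadsheet.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean expectation: RS^x / (RS^x + RA^x). Exponent is an assumption."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

# Full-season run projections quoted in the text
reds_opening = pythagorean_win_pct(772, 706)
reds_september = pythagorean_win_pct(724, 633)
pirates_opening = pythagorean_win_pct(745, 853)
pirates_september = pythagorean_win_pct(611, 881)

for label, pct in [
    ("Reds, Opening Day", reds_opening),
    ("Reds, September", reds_september),
    ("Pirates, Opening Day", pirates_opening),
    ("Pirates, September", pirates_september),
]:
    print(f"{label}: {pct:.0%}")
```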
If you made no other adjustments, a team with a 56% winning percentage would be expected to defeat a team with a 34% win percentage more than 70% of the time they face each other. However, as discussed above, the Peta model adjusts those numbers further to account for the lineup and home team. The lineup rolled out by Cincinnati on this day would, if played every day, have scored seven fewer runs over the course of the season while allowing 45 more. The Pirates lineup, meanwhile, would have scored an additional 22 runs while allowing just an additional three. This reduces the Reds’ projected winning chance from 70% to 67%. Folding in the fact that the Pirates are the home team, the model’s final assessment gives the Reds a 64% probability of winning.
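The head-to-head figures are consistent with Bill James's log5 formula applied to each team's win percentage. That is an assumption on my part for this sketch, as is the 1.83 Pythagorean exponent used to turn the lineup-adjusted runs into percentages. The home-field step is left out because the text does not say exactly how the 4% is applied.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean expectation (formula and exponent assumed for this sketch)."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

def log5(team_pct, opponent_pct):
    """Log5 estimate of the chance that the first team beats the second."""
    team_edge = team_pct * (1 - opponent_pct)
    opponent_edge = opponent_pct * (1 - team_pct)
    return team_edge / (team_edge + opponent_edge)

# No adjustments: 56% team against a 34% team
unadjusted = log5(0.56, 0.34)

# Lineup adjustments from the text, applied to the September run projections
reds_pct = pythagorean_win_pct(724 - 7, 633 + 45)
pirates_pct = pythagorean_win_pct(611 + 22, 881 + 3)
lineup_adjusted = log5(reds_pct, pirates_pct)

print(f"Unadjusted: {unadjusted:.0%}")
print(f"After lineup adjustments: {lineup_adjusted:.0%}")
```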
The money line also made the Reds the favorite, just less of one than the model. Given these projected odds and the ML, applying the money management principles discussed above results in a 40-basis-point wager (four-tenths of 1% of bankroll) on the Reds, who went on to win the game, 6-2. Take note in the table that the model’s line adds to 100%, while the ML adds up to 104%, the difference being the book’s take.
Teams | Model Win % | ML | Difference |
---|---|---|---|
Reds | 64 | 57 | 7 |
Pirates | 36 | 47 | -11 |
The model allows the handicapper to hunt for advantage, regardless of which team has the better chance to win. Let’s consider this game on September 8th between the Rays (Ryan Yarbrough) and the Nationals (Aníbal Sánchez) in Washington:
Teams | Model Win % | ML | Difference |
---|---|---|---|
Rays | 52 | 60 | -8 |
Nats | 48 | 42 | 6 |
The ML made the Rays a heavy favorite, and the model favored them to win as well, just less so. However, while applying the money management principles discussed above resulted in another 40-basis-point wager, this time it saw advantage in the underdog Nationals, who indeed won the game, 5-3.
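To show how a Kelly-style rule picks a side, here is a sketch of the textbook full-Kelly fraction using the model and ML percentages from the two tables above. The function name is hypothetical. Full Kelly produces stakes far larger than the 40 basis points wagered here, so the actual sizing was clearly scaled down in a way this sketch does not attempt to reproduce.

```python
def kelly_fraction(model_prob, implied_prob):
    """Full-Kelly stake (fraction of bankroll) for a bet priced at implied_prob.

    implied_prob is the book's percentage, vig included. Returns 0.0 when the
    model sees no advantage at that price.
    """
    net_odds = (1 - implied_prob) / implied_prob  # profit per unit staked on a win
    fraction = (net_odds * model_prob - (1 - model_prob)) / net_odds
    return max(fraction, 0.0)

# (team, model win probability, ML implied probability) from the two tables
games = [
    ("Reds", 0.64, 0.57), ("Pirates", 0.36, 0.47),
    ("Rays", 0.52, 0.60), ("Nats", 0.48, 0.42),
]

for team, model_prob, implied_prob in games:
    stake = kelly_fraction(model_prob, implied_prob)
    verdict = f"full Kelly {stake:.1%}" if stake > 0 else "no bet"
    print(f"{team}: edge {model_prob - implied_prob:+.0%}, {verdict}")
```

Only the Reds and the Nationals come back with a positive fraction, which matches the sides played in the two examples.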
Daily Performance
I ended up modeling 508 games (including playoffs) from the 2020 season. Approximately three-quarters of those games were playable, meaning there was a perceived advantage between the model’s estimate and the money line. The overall win percentage came out to be 49%, with a profit of 9%, or $156 on a $1,800 bankroll.
Bankroll | End Amt | # Games | # Plays | Wins | Losses | No bet | Profit | Profit % | Win % |
---|---|---|---|---|---|---|---|---|---|
$1,800 | $1,956 | 508 | 380 | 188 | 192 | 128 | $156.00 | 9% | 49% |
Candidly, the model was probably capable of even better performance; it just needed a smarter human. As discussed above, the model needs to be adjusted throughout the season to account for actual results. I flat out failed to adjust correctly at the end of the first quarter, and as a result the second quarter was the only one in which the model experienced a negative return. Backing out 2Q, the model’s performance increased to a win percentage of 52% and profit of 13%.
Practical Lessons
Here are some things I learned from this maiden effort:
- I thought I was ready – I wasn’t. Prior to the start of the 2020 season, I had not done nearly enough individual game practice utilizing 2019 contests. This made the opening weeks of the season a struggle. I also had not developed the results tracking sheet, which turned out to be an enormous undertaking, especially when you are struggling to find a rhythm to modeling as many as 15 games a day.
- Even when I found a groove in terms of churning out projections, it was still a grind. It turns out major league managers have an annoying habit of working to their schedule and not releasing their lineups in what I considered a timely manner. A secondary issue was each manager’s lineup tendencies. David Ross of the Cubs was like clockwork, as at least seven of his nine starters could be penciled in even before his lineup was posted. On the flip side, Kevin Cash of the Rays was a nightmare, offering a lineup mix-and-match adventure every day. I rooted for the Dodgers in the World Series in part because Cash’s daily lineup changes drove me nuts during the season.
- Projections were only part of the battle, as staying on top of the changing money line was its own challenge. It was important to be aware of each sportsbook’s take, which can vary widely, as this January 2020 New York Post article illustrates. A 60-game season results in 900 wins, yet add up the FanDuel line and you get 906 wins. The worst example is the online system in the District of Columbia, GamebetDC, which has a whopping 8% take on individual baseball games. It’s impossible to find wagering opportunities under those circumstances.
Special thanks to FanGraphs, which I discovered after joining an Ottoneu league. I quickly became a member, and I could not have done this research without FG.
Phillies phan. Some people want to be astronauts, I want to win my fantasy league and not sports bet like a total putz.
Excellent work here. I personally had much less betting volume in the 2020 season (adjusted for a 60-game season) than in previous years. Doubleheaders, lineup variability, and sportsbooks moving away from listed pitchers added a lot of uncertainty to modeling games. I was also more patient prior to adjusting my projections with in-season data; i.e., I relied on my projections longer and with higher weight.
Although my handicapping process differs from Peta’s, his methodology for converting runs for/runs against into win% for a game is my Rosetta Stone translation to compare to the money line price.
Thank you for the kind words. I consider FG and its readers to be a very savvy bunch so I was a little nervous about putting my work out there. I understand exactly what you are saying about 2020. In the end, I decided to skip 7 inning doubleheaders and it took a while for me to figure out how to address bullpen games.
I was an enthusiastic thoroughbred handicapper before three children came to dominate my weekends and I was surprised by just how much of the book’s money management drew directly from handicapping giants like Beyer and Cramer.
I took the “easy way out” and skipped doubleheaders as well as those oddball bullpen games. As far as I’m concerned, there’s no shame in passing when confidence is low!
i was mean
Bill, i’ve been using Peta’s methodology for betting purposes for a few years. It worked great in 2014-2015.
Let me know if you might be interested in collaborating for 2021. You are right in that it is a GRIND, and some managers are the worst in getting lineups out in a reasonable manner.
Thanks Scott. Will keep your offer in mind.
Outstanding work! Well researched and well written. Good luck in 2021. Keep us posted.
Great article – curious as to how any of you using Peta’s approach (i am a devotee) are converting last year’s 60-game sample size into a full season’s/162 games worth of WAR to do the year-over-year WAR change calculations. I’ve tried a few ways so far and the numbers just don’t seem to pass the eye test. Is converting the 60 game sample size to 162 the best we can do, or are we just stuck with challenging data this year beyond our control?
Thanks for the kind words. You are right, last year I noticed when the 60-game season was announced FG took its original WAR projections and simply multiplied them by .37. So here we are, another year again needing to make a significant WAR adjustment year over year. For 2021 I have taken the team 2020 WAR, divided by 60 to arrive at a “WAR per game” and then multiplied by 162.
As I mentioned in the article, last year I didn’t want to get too far away from the Peta model by making some sort of potentially dangerous adjustment of my own. But like you, I wasn’t completely comfortable with the required WAR adjustment. So I utilized the results more cautiously. Peta suggests teams with outliers greater than 4 from the Vegas line (in a 162 game season) are worth considering. I may require a larger number before betting.
BTW, I know what you mean about the eye test. The Dodgers look like a stupendous team this year. And I have run the #’s multiple times already looking for errors, and just can’t find one.
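Team | Wins | Team | Wins | Team | Wins |
---|---|---|---|---|---|
Twins | 92 | Yankees | 105 | Astros | 92 |
White Sox | 89 | Blue Jays | 91 | Angels | 85 |
Indians | 86 | Rays | 90 | Athletics | 83 |
Royals | 73 | Orioles | 79 | Rangers | 75 |
Tigers | 56 | Red Sox | 76 | Mariners | 65 |
Braves | 94 | Brewers | 83 | Dodgers | 114 |
Mets | 86 | Reds | 81 | Padres | 97 |
Phillies | 78 | Cardinals | 76 | Giants | 78 |
Marlins | 67 | Cubs | 72 | Diamondbacks | 65 |
Marlins | 67 | Pirates | 60 | Rockies | 60 |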
Great stuff, Bill. Any thoughts on blending/weighting 2019 WAR into the model, or would that, as you noted, stray too far away and essentially become “guesswork”? Something that has really stood out to me so far is trying to calculate a full season’s worth of runs scored and cluster luck based on 60 games. The Astros immediately stood out as converting 60 games to 162 assumes them continuing to play roughly .500 ball the rest of the way, which seems “off” but without real data I guess we do the best we can.
Thanks again for a great article and for responding!
Though I enjoy the math behind Peta’s work, I can’t say I am much of a mathematician. One of the reasons I liked utilizing FG base runs as the source for determining cluster luck was my concern over calculating Peta’s triple regression. I hadn’t done regressions since the mid-80s and barely scraped by in the class. Besides, the nice folks at FG are kind enough to calculate base runs for you every day.
So I do admire folks like you who can blend and weight WARs across two seasons, but I doubt my own ability to calculate it. And as you noted, I don’t personally want to stray too far from the model. I will say that despite my Opening Day concerns, the model held up last season working in reverse (going from 162 to 60).
After reading your note I did another season run and the Astros currently project at 92 wins, enough to win the West. But are the Dodgers really a 114-win team? For me the best approach may be a more cautious use of the model when putting down real money, looking for deviations greater than +/-4 for possible wagering opportunities.