As a Red Sox fan, I got very excited on Opening Day when Dustin Pedroia hit two home runs. One of the big questions of this offseason was whether he has upper-single-digit homer power or upper-teens homer power. Of course, as a thinking baseball fan, my head tells me to avoid getting overly excited about a small sample size. But does the two-HR outbreak actually tell us nothing? I think the expectations going into the season, combined with Pedroia’s performance in his first game, make this a perfect situation to use Bayes’ Theorem.
To elaborate, I think Pedroia’s expectations going into this season have a bimodal distribution. Over his 2008-2012 seasons, he averaged 16 HR per year. Over his last two seasons, he averaged 8 HR per year. Was this due to a real decline, or due to injuries that sapped his power? While someone like Mike Trout might have a nice normally-distributed expectation around 35 HR, I expected Pedroia to have an either/or season: he’d either get back to 2008-2012 production, or continue as an 8-HR guy.
Now for a review of Bayes’ Theorem: it tells you how to update your prior beliefs given an observation. The formula for this is P(A|B) = P(B|A)*P(A)/P(B), where A and B are events, P(A) and P(B) are the probabilities of those events, and P(A|B) or P(B|A) should be read as “Probability of A given B,” or “Probability of B given A,” respectively. Specifically, in this case, A is “Dustin Pedroia is a 16-HR guy”, and B is “Dustin Pedroia hit 2 HR in his first game of the season”. I had a preseason belief about P(A), but I want to update it given that event B has occurred.
As implied above, I’m going to simplify Pedroia’s season outcomes into two possible outcomes: He is an 8-HR guy, or a 16-HR guy. Before the season, I’m going to guess that I had about a 50-50 belief that he was either one. Another assumption I’m going to make, to make the math easier, is that a season will see 640 plate appearances. You can make your own assumptions, but this is a demonstration of how much Bayes’ Theorem helps us update beliefs based on just one observation.
We need to determine three quantities to do our calculation now:
1. P(A)—probability that Pedroia is a 16-HR guy
2. P(B|A)—probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, given that he is a 16-HR guy
3. P(B)—probability that we would see Pedroia hit 2 HR in his first 5 plate appearances (taking our 50-50 chance that he’s a 16 or 8-HR guy as a given)
1. Probability that Pedroia is a 16-HR guy
Easy. By assumption, P(A) is 50%.
2. Probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, given that he’s a 16-HR guy
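Tougher, but we can use a binomial probability model. The chance of exactly 2 HR in 5 plate appearances is 5C2*P(HR)^2*(1-P(HR))^3, where 5C2 = 10 is the number of ways to choose which 2 of the 5 plate appearances end in a homer. With 16 HR in 640 plate appearances, P(HR) is 1/40, and 1-P(HR) is 39/40. This turns out to be .00579, so P(B|A) = 0.579%.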
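For anyone who wants to check the arithmetic, here is a minimal Python sketch of that binomial calculation. The function name `prob_hr_in_pa` and its parameters are my own labels for illustration, and the 640-PA season is the same simplifying assumption as above.

```python
from math import comb

def prob_hr_in_pa(k, n, season_hr, season_pa=640):
    """Binomial chance of exactly k HR in n PA for a hitter whose
    true HR rate is season_hr / season_pa (assumed constant per PA)."""
    p = season_hr / season_pa
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(B|A): 2 HR in 5 PA, given that he is a 16-HR guy
likelihood_16 = prob_hr_in_pa(2, 5, 16)
print(f"{likelihood_16:.5f}")  # 0.00579
```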
3. Probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, with preseason assumptions
This is the weighted average over all his possible season outcomes: the probability of 2 HR in 5 PA given that he is a 16-HR guy, times the chance that he’s a 16-HR guy, PLUS the probability of 2 HR in 5 PA given that he is an 8-HR guy, times the chance that he’s an 8-HR guy. The same calculation as in number 2 can be done for the case where he’s an 8-HR guy (P(HR) = 1/80), yielding a 0.150% chance that he’d hit 2 HR in 5 PA. Given our calculation in the above paragraph, and our preseason assumption that it’s 50-50 that he’s an 8 or 16-HR guy, that gives us a weighted average P(B) = 0.365%.
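The weighted average can be sketched in a few lines of Python. The names `binom_pmf`, `prior`, and `likelihood` are illustrative choices of mine, and the 50-50 prior and 640-PA season are the assumptions stated earlier.

```python
from math import comb

def binom_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

SEASON_PA = 640                 # simplifying assumption from the text
prior = {16: 0.5, 8: 0.5}       # preseason 50-50 belief (an assumption)

# Chance of 2 HR in 5 PA under each "type" of Pedroia
likelihood = {hr: binom_pmf(2, 5, hr / SEASON_PA) for hr in prior}

# P(B): each likelihood weighted by how much we believed in that type
p_b = sum(prior[hr] * likelihood[hr] for hr in prior)

print(f"{likelihood[8]:.5f}")   # 0.00150
print(f"{p_b:.5f}")             # 0.00365
```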
So now we can mash all of those numbers into Bayes’ equation, and we find that .50*.00579/.00365 = .794, or 79.4%! Turns out that my Red Sox-loving lizard brain was not wrong! If you believed preseason that there was a 50%-50% chance that Pedroia would return to his 2008-2012 form, you should rationally update your beliefs to 80%-20% on the minuscule sample size of just two home runs in five plate appearances! Another note is that we should be forward-looking: since he has nearly a full season of plate appearances remaining, it might be rational to think that he’s likely to be an 18-HR guy, now that he has 2 in the bag.
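Since the 50-50 prior is just my guess, it can be useful to see how the answer moves with a different starting belief. The sketch below wraps the whole update in one hypothetical function, `posterior_16`; the two-outcome model and the 640-PA season are the same assumptions as before, and the alternative priors in the loop are arbitrary examples.

```python
from math import comb

def two_hr_in_five_pa(season_hr, season_pa=640):
    """Binomial chance of exactly 2 HR in 5 PA at a rate of season_hr / season_pa."""
    p = season_hr / season_pa
    return comb(5, 2) * p**2 * (1 - p)**3

def posterior_16(prior_16):
    """Bayes update: P(16-HR guy | 2 HR in first 5 PA), in a world where
    he can only be a 16-HR guy or an 8-HR guy."""
    like_16 = two_hr_in_five_pa(16)
    like_8 = two_hr_in_five_pa(8)
    evidence = prior_16 * like_16 + (1 - prior_16) * like_8  # P(B)
    return prior_16 * like_16 / evidence

print(f"{posterior_16(0.50):.3f}")  # 0.794

# Plug in your own preseason belief to see how far one game moves it
for prior in (0.25, 0.75):
    print(prior, round(posterior_16(prior), 3))
```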
This method could be adapted to a continuous expectation of outcomes, allowing a chance that Pedroia might be something besides an 8HR guy or a 16HR guy (although you and I know that that is clearly absurd).
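As a rough illustration of that adaptation, here is a sketch that replaces the two outcomes with a grid of possible season totals. Everything about the prior here is a placeholder assumption: the 4-to-24 HR range and the flat weighting are chosen only to show the mechanics, not to describe what anyone actually believed about Pedroia.

```python
from math import comb

SEASON_PA = 640                       # same simplifying assumption as in the text
candidates = list(range(4, 25))       # hypothetical grid: 4 through 24 HR
prior = [1 / len(candidates)] * len(candidates)  # flat prior, purely illustrative

def likelihood(season_hr, k=2, n=5):
    """Binomial chance of exactly k HR in n PA for a hitter with this season total."""
    p = season_hr / SEASON_PA
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Bayes' Theorem on the grid: prior times likelihood, then normalize
unnormalized = [pr * likelihood(hr) for pr, hr in zip(prior, candidates)]
evidence = sum(unnormalized)          # plays the role of P(B)
posterior = [u / evidence for u in unnormalized]

prior_mean = sum(pr * hr for pr, hr in zip(prior, candidates))
posterior_mean = sum(po * hr for po, hr in zip(posterior, candidates))
print(f"prior mean: {prior_mean:.1f} HR, posterior mean: {posterior_mean:.1f} HR")
```

A finer grid, or a prior with two humps near 8 and 16, would slot into the same three steps without changing the code's structure.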