Probabilistic Pitch Framing (part 2)
by The Kudzu Kid
September 25, 2013

This is part two of a three-part series detailing a method of judging pitch framing based on the prior probability of the pitch being called a strike. In part 1, we motivated the method. Here in part 2, we will formalize it.

The formula we’ll use for judging catcher framing is pretty simple on its face. For each pitch delivered, we calculate a value

    IsCalledStrike - prob(CalledStrike)

Here, IsCalledStrike is simply 1 if the pitch is called a strike, and 0 otherwise. The second term is the probability that the pitch would have been called a strike, absent any information about the catcher’s involvement. So a called strike on a pitch that had only a 20% chance of being called one is worth +0.8, while a called ball on a pitch that had a 90% chance is worth -0.9. We add up these values for every called ball or strike that a catcher receives, and report the resulting number. Since this method is essentially identical to defensive plus/minus, I’ve taken to calling it Catcher Plus/Minus (CPM), although someone reading this can probably come up with something better.

I should mention the following: it has been brought to my attention that this method has been developed before. However, I can’t find it written up anywhere on the web. So you are welcome to consider this the documentation of an existing method, if you’d like.

Pitch F/X hands us the first term above; we’ll have to work a bit to get the second. There are many ways to approximate these probabilities; we will follow the lead of Matthew Carruth and use a generalized additive model to build them (more on this later). Now we could simply build this model for all pitches:

[Figure: called-strike probability plot for all pitches]

and that would probably be pretty good, but let’s instead also include two more pieces of information: the ball-strike count and the handedness of the batter. This will give us 24 different probability plots like the one above, one for each combination of the 12 possible counts and the two batter hands.

Now, you might rightly object that this is willfully ignoring tons of information that MLB is giving us.
We know so much about these pitches: the pitch type, the horizontal and vertical break, the handedness of the pitcher, the top of the batter’s strike zone, the wind, the stadium, the home plate umpire … any and all of these bits of info could paint a better picture of the strike zone. We have to use all the information at our disposal, don’t we?

Well, we should … but it turns out it’s, um, really hard. The data get really thin when you drill down too far, and we can’t do anything decent with that many variables without building a fully Bayesian model. Now, again, we should do this, but building such a model is much more difficult than the few lines of R it takes to build a GAM fit*. And while these other variables probably do influence the strike zone, the assumption here is that they do not influence it as much as handedness and count. On the other hand, if someone does want to do this and plug the numbers into the CPM formula, well, that would be just great.

* Set up a data frame called pitches with fields CalledStrike, px, and pz. Then:

    library(mgcv)

    # Binomial GAM with a logit link: one smooth term for horizontal
    # location (px) and one for vertical location (pz)
    fit <- gam(CalledStrike ~ s(px) + s(pz), family = binomial, data = pitches)

    # To get the probability at a point (my_x, my_z):
    my_point <- data.frame(px = my_x, pz = my_z)

    # type = "response" applies the inverse logit, exp(x) / (1 + exp(x)),
    # so the result is a probability rather than a log-odds value
    final_prob <- predict(fit, my_point, type = "response")

Anyhow! We can use this to judge pitch framing, but more importantly, we can make some animated gifs! Let’s take yet another look at how handedness affects the strike zone:

[Animated GIF: strike zone by batter handedness]

We can also look at how the count affects the strike zone. I for one was stunned at how much of an effect it had, but then again, I’m easily stunned.

[Animated GIF: strike zone by count]

Heck, we can even look at the different strike zones for sliders and four-seam fastballs, even though we’ve decided not to use pitch type:

[Animated GIF: strike zone for sliders vs. four-seam fastballs]

Aw, hell, there is a difference. Well, as we’ve already discussed, the way I’ve decided to go is definitely not the best way to compute these probabilities.
But it’s pretty good, computationally tractable, and will hopefully give us decent results for catcher framing. I guess we’ll find out in part 3.
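
In the meantime, for anyone who wants to play along at home, here is a minimal sketch of the summing-up step. It reads the per-pitch value as the called-strike indicator minus the prior strike probability (the plus/minus reading), and it assumes you already have a probability for each pitch from whatever model you like. The catcher and prob column names are my own placeholders, and the numbers are made up purely for illustration; they are not real Pitch F/X data.

```r
# Toy data: each row is one called pitch (a called ball or a called strike).
# `catcher` and `prob` are hypothetical column names; the values are invented
# for illustration and do not come from any real model or game.
pitches <- data.frame(
  catcher      = c("A", "A", "A", "B", "B"),
  CalledStrike = c(1, 0, 1, 0, 1),
  prob         = c(0.20, 0.90, 0.95, 0.10, 0.50)
)

# Per-pitch value: 1 or 0 for the actual call, minus the prior probability
# that the pitch would be called a strike.
pitches$value <- pitches$CalledStrike - pitches$prob

# Add up the per-pitch values for each catcher to get a CPM total.
cpm <- tapply(pitches$value, pitches$catcher, sum)
print(cpm)
```

In this toy example, catcher A steals one unlikely strike (+0.8) but loses a likely one (-0.9) and ends up just below zero, while catcher B comes out ahead. With real data, the prob column would be filled in from the fitted model for the matching count and batter handedness.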