Graphical wOBA by Count
I am a big fan of graphs and baseball. FanGraphs got me excited because putting complex data into reasonably easy-to-understand graphs helps open sabermetrics up to more fans. I’m a big fan of statistical analysis, but after a while, a table full of numbers just starts running together and stops making sense. That’s what makes graphs such an effective tool.
I’ve dabbled in graphs myself. When people were creating WAR graphs to compare Hall of Famers, I made a sample graph showing cumulative WAR by age on Tom Tango’s The Book Blog:
[Graph: cumulative WAR by age]
Of course, soon afterward FanGraphs came out with a far better-looking one, saving me the headache of figuring out how to automate it.
Here is my latest foray into the world of graphs, looking at wOBA by count:
[Graph: wOBA by count]
Let me explain the mess you see above. The horizontal X-axis shows the number of pitches: the first pitch is all the way to the left, and a full count is all the way to the right.
The vertical Y-axis shows the wOBA of all plate appearances that pass through that count. Since every plate appearance goes through the first pitch, its wOBA is the league average, .323. The higher a count sits on the graph, the more likely the hitter is to do something good. As you can see, the best count for hitters is 3-0 and the worst is 0-2: on 3-0 the average hitter is a 2003 Barry Bonds, and on 0-2 he’s batting more like Aaron Cook.
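A minimal sketch of the “through the count” calculation, assuming a hypothetical pitch-sequence encoding (not Baseball-Reference’s format): 'B' is a ball, 'S' a strike, 'F' a foul with two strikes (the count does not change), and 'X' any other pitch that ends the plate appearance, such as a ball in play. Each plate appearance also carries its wOBA value (its linear weight, 0 for an out). Averaging those values over every plate appearance that reached a count gives the wOBA through that count, which is why 0-0 always lands on the overall average. The function names and the tiny sample are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def counts_passed_through(pitches):
    """List every ball-strike count one plate appearance reached.

    Hypothetical encoding: 'B' ball, 'S' strike, 'F' foul with two
    strikes (count unchanged), 'X' any other PA-ending pitch.
    """
    balls = strikes = 0
    reached = [(0, 0)]  # every PA goes through the first pitch
    for p in pitches:
        if p == "X":
            break
        if p == "F":
            continue
        if p == "B":
            balls += 1
        elif p == "S":
            strikes += 1
        else:
            raise ValueError(f"unknown pitch code {p!r}")
        if balls == 4 or strikes == 3:
            break  # walk or strikeout: no new count is reached
        reached.append((balls, strikes))
    return reached


def woba_through_counts(pas):
    """Mean wOBA value of all PAs that passed through each count.

    pas: iterable of (pitch_sequence, woba_value) pairs, where woba_value
    is the PA's linear weight (0 for an out). IBB and other PAs outside
    the wOBA denominator (sacrifice bunts, for instance) should be
    filtered out beforehand so the mean matches the usual ratio.
    """
    values = defaultdict(list)
    for pitches, value in pas:
        for count in counts_passed_through(pitches):
            values[count].append(value)
    return {count: mean(vals) for count, vals in values.items()}


# Tiny made-up sample, only to show the mechanics.
sample = [("X", 0.5), ("BBBB", 0.7), ("SSS", 0.0), ("BSX", 2.0)]
through = woba_through_counts(sample)
```

Every plate appearance counts once at each count it reaches, so the 0-0 value is simply the mean over the whole sample.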
The area of each circle is proportional to the number of times that count occurred. There were 185,524 PA in 2010, so the first pitch is the biggest; there were only 9,244 3-0 counts, so that one is the smallest.
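Because the circles encode frequency by area rather than by width, the radius has to scale with the square root of the count. A small sketch using the two totals quoted above; `bubble_radius` and `r_max` are illustrative names, not settings from Excel or Illustrator.

```python
import math


def bubble_radius(n, n_max, r_max=1.0):
    """Radius of a circle whose area is proportional to n.

    The most frequent count (n_max) gets radius r_max; area scales
    linearly with n, so the radius scales with sqrt(n / n_max).
    """
    return r_max * math.sqrt(n / n_max)


first_pitch = 185_524  # 2010 plate appearances (every PA sees 0-0)
three_oh = 9_244       # 2010 plate appearances that reached 3-0

r = bubble_radius(three_oh, first_pitch)
# The 3-0 circle has about 5% of the 0-0 circle's area,
# but a bit over a fifth of its radius.
print(f"3-0 radius relative to 0-0: {r:.3f}")
```

Scaling the radius linearly with n instead would shrink the 3-0 circle to roughly a quarter of a percent of the 0-0 circle’s area, which exaggerates the difference.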
Each circle is also a pie graph of its own, showing what happened at that count. Blue is a ball, red is a strike, and gray means the plate appearance ended. As you can see, with 2 strikes another strike ends the plate appearance, so those pies show only balls and ended plate appearances.
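The pie slices can be tallied by looking at the next count-changing pitch from each count. This is a sketch under a hypothetical encoding ('B' ball, 'S' strike, 'F' two-strike foul that leaves the count unchanged and is skipped, 'X' any other pitch that ends the plate appearance); the function names are made up for illustration.

```python
from collections import Counter, defaultdict


def transitions(pitches):
    """Yield (count, event) for each count-changing pitch of one PA.

    event is 'ball', 'strike', or 'ended'. Hypothetical encoding:
    'B', 'S', 'F' (two-strike foul, skipped), 'X' (other PA-ending pitch).
    """
    balls = strikes = 0
    for p in pitches:
        if p == "F":
            continue  # the count does not change, so no slice of the pie
        count = (balls, strikes)
        if p == "X":
            yield count, "ended"
            return
        if p == "B":
            balls += 1
        elif p == "S":
            strikes += 1
        else:
            raise ValueError(f"unknown pitch code {p!r}")
        if balls == 4 or strikes == 3:
            yield count, "ended"  # walk or strikeout
            return
        yield count, "ball" if p == "B" else "strike"


def pie_slices(sequences):
    """Share of ball / strike / ended outcomes from each count."""
    tallies = defaultdict(Counter)
    for seq in sequences:
        for count, event in transitions(seq):
            tallies[count][event] += 1
    return {
        count: {event: n / sum(t.values()) for event, n in t.items()}
        for count, t in tallies.items()
    }


slices = pie_slices(["X", "BBBB", "SSS", "BSX"])
```

With two strikes a non-foul strike ends the plate appearance, so "strike" never appears in those slices; by the same logic, "ball" never appears in three-ball counts.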
So What?
I made this graph for my own use. It’s a nice, easy reference tool for tracking what’s happening on each pitch. I can follow along and see whether a batter’s chances went up or down, and (very roughly) how likely the at-bat is to end on each pitch. Ideally I would make one for each team, so you could get one for your own team and use it while you’re watching games, or even one for each player, so you could compare and contrast Vladimir Guerrero with Kevin Youkilis, or the Twins with the Yankees, etc.
And there’s a good chance that there are things that you can think of to use this graph for, so please let me know what they are in the comments.
The graph was initially made in Excel to get the bubble positions and sizes, then imported into Adobe Illustrator to add the pie graphs, connecting lines, etc.
Editor’s Note: You can find more of Joshua Maciel’s work on his blog: Henkakyuu
I'm an expat who has been living in Japan since 2003, doing sales and marketing work. More of my work is available on Henkakyuu, my personal blog. Also, feel free to inspire me to use Twitter more often: @henkakyuu
Good stuff, Joshua. And I’m loving the blog.
Keep the good stuff coming!
They’ll keep coming as long as I keep having ideas. Feel free to let me know if there’s anything you want created graphically.
And I made a horrible mistake. The updated image is here:
http://img41.imageshack.us/img41/495/splitcount20101205.png
I forgot to remove IBB from the denominator, which is why 3-1 and 3-0 are about the same.
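For reference, the standard FanGraphs definition takes intentional walks out of both the numerator and the denominator: (wBB×uBB + wHBP×HBP + w1B×1B + w2B×2B + w3B×3B + wHR×HR) / (AB + BB − IBB + SF + HBP), where uBB = BB − IBB. A sketch of that ratio with a flag that reproduces the mistake; the weights are round placeholders, not the 2010 constants, and the stat line is made up.

```python
def woba(stats, weights, subtract_ibb=True):
    """wOBA = weighted events / (AB + BB - IBB + SF + HBP).

    stats: totals keyed by 'AB', 'BB' (including IBB), 'IBB', 'HBP',
    'SF', '1B', '2B', '3B', 'HR'. weights: linear weights keyed by
    'uBB', 'HBP', '1B', '2B', '3B', 'HR'. subtract_ibb=False reproduces
    the mistake of leaving IBB in the denominator.
    """
    unintentional_bb = stats["BB"] - stats["IBB"]
    numerator = (
        weights["uBB"] * unintentional_bb
        + weights["HBP"] * stats["HBP"]
        + weights["1B"] * stats["1B"]
        + weights["2B"] * stats["2B"]
        + weights["3B"] * stats["3B"]
        + weights["HR"] * stats["HR"]
    )
    walks_in_denominator = unintentional_bb if subtract_ibb else stats["BB"]
    denominator = stats["AB"] + walks_in_denominator + stats["SF"] + stats["HBP"]
    return numerator / denominator


# Placeholder weights, NOT the 2010 values.
WEIGHTS = {"uBB": 0.7, "HBP": 0.7, "1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}
# A made-up line through a three-ball count that includes one intentional walk.
line = {"AB": 10, "BB": 2, "IBB": 1, "HBP": 0, "SF": 0,
        "1B": 2, "2B": 1, "3B": 0, "HR": 0}

correct = woba(line, WEIGHTS)                    # 3.75 / 11
buggy = woba(line, WEIGHTS, subtract_ibb=False)  # 3.75 / 12
```

In 2010 an intentional walk still took four pitched balls, so every IBB passed through a three-ball count; leaving them in the denominator as zero-value plate appearances drags those counts down.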
Really like the graph. What about something based on BB% for hitters? Is there a certain count that batters with a good eye are better at getting to? Do they make better decisions in all counts, or in certain counts in particular?
I did a quick set of graphs comparing Albert Pujols (patient) and Miguel Tejada (not), but the difference wasn’t immediately apparent. I put the spreadsheet up on Google Docs so you can play with it if you like. Just copy-paste any player’s data from baseball-reference.com and you can compare players yourself (I set the individual sheets on the same row so that when you flip between pages, the graph stays in the same place and you can see how it changes):
https://docs.google.com/leaf?id=0B_hKIaAw27e_MjcwZmY0NzktNjg2OC00NjIyLWE2ZjMtYjFjNWI2YzVmODYz&hl=en&authkey=CJLktYQK
If you really want to look at the difference in approach, I think the best way may be to see the differences between the wOBA at different counts, rather than seeing what the actual wOBA through each count is. That way you could see if patient hitters lose less by taking a first strike, or if they gain more from going 1-1, or whatever. The important thing is to find a way to deal with the fact that patient hitters are typically better hitters, and will have a resulting better wOBA on batted balls.
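The “differences between counts” idea can be sketched as follows: for each count, take the change in wOBA-through caused by the next ball and by the next strike, then subtract one hitter’s changes from another’s. An additive talent gap (one hitter simply being better at every count) cancels out in those differences. The input dictionaries and function names are hypothetical.

```python
def count_deltas(through):
    """wOBA change caused by the next ball and by the next strike.

    through: dict mapping (balls, strikes) to wOBA through that count.
    Returns (ball_delta, strike_delta) per count; None where the next
    count is missing (e.g. a fourth ball or third strike ends the PA).
    """
    deltas = {}
    for (b, s), value in through.items():
        after_ball = through.get((b + 1, s))
        after_strike = through.get((b, s + 1))
        deltas[(b, s)] = (
            None if after_ball is None else after_ball - value,
            None if after_strike is None else after_strike - value,
        )
    return deltas


def compare_hitters(through_a, through_b):
    """Hitter A's deltas minus hitter B's, count by count.

    A constant additive gap (A hits better everywhere by the same
    amount) cancels out, which is the point of comparing changes.
    """
    da, db = count_deltas(through_a), count_deltas(through_b)
    return {
        count: tuple(
            None if x is None or y is None else x - y
            for x, y in zip(da[count], db[count])
        )
        for count in da.keys() & db.keys()
    }
```

A positive strike-delta difference at 0-0 would mean hitter A loses less by taking a first strike, which is the kind of question raised above.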
Interesting. I feel like announcers (especially this past year) are constantly reminding us how important the 1-1 count is, implying that the swing in advantage is biggest between 2-1 and 1-2. According to your graph, however, the largest swing in wOBA appears to come on a 2-1 count, between 3-1 and 2-2, by a fairly large margin.
Great stuff!
There are a lot more 1-1 counts than 2-1 counts, and I think that’s one of the major reasons why. Even though the swing is smaller, the net change from 1-1 is probably bigger (I’d have to do the math to make sure). Not saying the announcers are right, but taking a stab at why they’d say that. Either that, or it’s just one of those old adages that has no basis in reality.
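The hunch above can be framed as a quick calculation: the total effect of the pitch at a count is its swing (value through the ball-side count minus value through the strike-side count) times the number of plate appearances that reach it. This sketch uses the corrected OBP splits quoted later in the thread as a stand-in for wOBA; the frequencies of 1-1 and 2-1 are not given here, so it reports a break-even ratio rather than a verdict.

```python
# OBP through each count, from the corrected splits quoted in the
# comments; used here only as a stand-in for wOBA.
OBP_THROUGH = {(2, 1): 0.401, (1, 2): 0.243, (3, 1): 0.590, (2, 2): 0.305}


def swing(through, count):
    """Gap between taking a ball and taking a strike at `count`."""
    b, s = count
    return through[(b + 1, s)] - through[(b, s + 1)]


def total_swing(through, count, n_reached):
    """Swing weighted by how many PAs reach the count."""
    return swing(through, count) * n_reached


swing_1_1 = swing(OBP_THROUGH, (1, 1))  # .401 - .243
swing_2_1 = swing(OBP_THROUGH, (2, 1))  # .590 - .305
# 1-1 carries more total weight only if it is reached more than this
# many times as often as 2-1 (roughly 1.8 to 1).
break_even = swing_2_1 / swing_1_1
```

Plugging in the actual 2010 frequencies of 1-1 and 2-1 for `n_reached` would settle the question for that season.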
I remember some story in Moneyball about the difference between 2-1 and 1-2 being the biggest of any single outcome. Although back then they presumably would have been talking about OBP or something, rather than wOBA.
Are we really at the point where announcers are quoting antiquated, wrong-headed numbers from Moneyball? Where can the rest of us find these announcers?
Wow, also it would ruuule if FanGraphs started using something like this to do pitch-by-pitch live game graphs.
With OBP it looks pretty much the same (a larger gap between 3-1 and 2-2 than between 2-1 and 1-2). I don’t remember the passage from Moneyball, but if you find it, I’ll check it out.
What I’d personally love to see FanGraphs do is create a graph like this that lets you compare players. That way you could see how Pujols performs against league average, or how Francoeur does.
If anyone can do it, Dave Allen can with his R skills.
Great stuff! Looking forward to more.
This is so cool. You made a very complicated set of numbers into an easy to read and understand, and cool, graph. I would love to see more graphs like this.
I also agree with fang2415; it would be amazing if this could be combined with win expectancy to get pitch-by-pitch WE figures.
Joshua,
Very cool. I think you could improve this graph by differentiating the strikes. A strike could be a swing and a miss, a foul, or looking. Breaking out those events will add a lot of richness.
Matt, unfortunately Baseball-Reference doesn’t have that information. If you know of somewhere that has it available, I’d love to see it, though.
Joshua,
These are great. Did you see that Tango linked to you over at The Book Blog?
One comment: I would much prefer if you kept the ball/strike divisions even when they ended an at bat. (Sorry if this has been mentioned, I haven’t read all the comments.)
That’s likely an issue of the data set, but it’d be really nice to know what % of at bats in a particular count ended in strike outs, what % ended with walks, and what % ended with a ball in play.
Is it possible to do that? It would make an already fascinating visual even more informative.
Sorry, just to be clear: for all counts with 2 strikes, you want me to keep the red portion of the graph that shows how many strikeouts there were, and likewise the blue portion for walks on 3-ball counts? I can do that. Give me a little bit.
Patrick,
Here’s the graph you requested (I think):
http://img220.imageshack.us/img220/9473/splitcount20101214.png
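The breakdown requested above can be tallied directly from pitch sequences: for each count, divide the strikeouts, walks, and balls in play that ended on that count by the number of plate appearances that reached it. A sketch under a hypothetical encoding ('B' ball, 'S' strike, 'F' two-strike foul, 'X' ball in play or any other ending that is not a strikeout or walk); the function names are made up.

```python
from collections import Counter


def tally_endings(sequences):
    """Count PAs through each count and how the ones ending there ended.

    Hypothetical encoding: 'B' ball, 'S' strike, 'F' foul with two
    strikes, 'X' ball in play (or any other non-K, non-BB ending).
    """
    through, endings = Counter(), Counter()
    for seq in sequences:
        balls = strikes = 0
        through[(0, 0)] += 1
        for p in seq:
            count = (balls, strikes)
            if p == "F":
                continue
            if p == "X":
                endings[count, "in play"] += 1
                break
            if p == "B" and balls == 3:
                endings[count, "walk"] += 1
                break
            if p == "S" and strikes == 2:
                endings[count, "strikeout"] += 1
                break
            if p == "B":
                balls += 1
            elif p == "S":
                strikes += 1
            else:
                raise ValueError(f"unknown pitch code {p!r}")
            through[(balls, strikes)] += 1
    return through, endings


def ending_shares(sequences, count):
    """Share of PAs through `count` that ended there, by outcome."""
    through, endings = tally_endings(sequences)
    n = through[count]
    if not n:
        return {}  # nobody reached this count
    return {
        outcome: endings[count, outcome] / n
        for outcome in ("strikeout", "walk", "in play")
    }
```

Whatever is left over (1 minus the three shares) is the share of plate appearances that moved on to another count.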
I don’t have a copy of Moneyball handy, but the passage is quoted at http://www.fantasybaseballcafe.com/forums/viewtopic.php?t=363213. Lewis’s phrasing doesn’t make it completely clear, but it looks like they might be talking about BA (oddly). I could actually see that being the case since you’re discounting the much higher likelihood of a walk after the second ball.
I can’t see how that’s true.
Through a 2-1 Count: .268/.401/.441
Through a 1-2 Count: .192/.243/.297
Through a 3-1 Count: .355/.689/.651
Through a 2-2 Count: .195/.201/.303
I just don’t know any way that the quote in question makes any sense. Unless the baseball reference numbers are wrong.
Yikes, made some mistakes with my copy-paste there…
Through a 2-1 Count: .268/.401/.441
Through a 1-2 Count: .192/.243/.297
Through a 3-1 Count: .289/.590/.505
Through a 2-2 Count: .207/.305/.332
So BA varies by .076 after 1-1 vs .082 after 2-1;
OBP varies by .158 after 1-1 vs .285 after 2-1;
and SLG by .144 after 1-1 vs .173 after 2-1.
So the BA difference is pretty small; I could see it changing that much since 2003 or whenever. Or maybe DePodesta was counting some split that we aren’t. But clearly, if you’re looking at OBP, that third ball buys you an extra zillion points over any other pitch.
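The differences above are easy to re-derive from the corrected splits; a small check (the numbers are the AVG/OBP/SLG lines quoted in the comment, nothing else is assumed):

```python
# AVG/OBP/SLG through each count, from the corrected comment above.
SPLITS = {
    "2-1": (0.268, 0.401, 0.441),
    "1-2": (0.192, 0.243, 0.297),
    "3-1": (0.289, 0.590, 0.505),
    "2-2": (0.207, 0.305, 0.332),
}


def gap(ball_side, strike_side):
    """Per-stat difference between the two counts a pitch can lead to."""
    return tuple(
        round(b - s, 3) for b, s in zip(SPLITS[ball_side], SPLITS[strike_side])
    )


after_1_1 = gap("2-1", "1-2")
after_2_1 = gap("3-1", "2-2")
for stat, x, y in zip(("BA", "OBP", "SLG"), after_1_1, after_2_1):
    print(f"{stat}: {x:.3f} after 1-1 vs {y:.3f} after 2-1")
```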
Amazing graph. Love it.
PS – you can reply directly to people’s comments, you don’t have to start a new comment thread. Makes reading your replies to people much easier.
Fits with the old-school adage that the longer the at-bat, the more in favor of the hitter it becomes.
@Lee, I don’t know why, but for this thread there is no “reply” button for me. Maybe I broke it?
I see a few here making the mistake of thinking hitters get exceptionally better when the count is in their favor. That’s mostly an artifact of the way we calculate hitting statistics. Swing and miss on a 3-0 count, and what is the statistical impact on BA or wOBA? Nothing, right? But swing and miss on a 3-2 count and the impact is non-zero. What I think we want to know here is whether there is a difference in the probability of getting a hit by count (where taking a ball or strike is part of the sample space), and whether that probability goes up with count leverage for the batter.
It appears to me the graphs are simply calculating the result of AB that end on various counts. Let’s not perpetuate the myth that taking a strike or two is the worst thing a hitter can do.
All that aside, nice effort and interesting way to present the data.
Peter, they are not “on the count”, they are “through the count”. The impact of the first ball is eliminating the possibility of an 0-2 count, which raises the wOBA of the at-bat. When you go from 2-1 to 3-1, you get a huge increase because of the change between 3-1 and 2-2 (walk vs. strikeout). Hitters get better outcomes when the count is in their favor, and it isn’t an artifact of the way we calculate things.
Joshua,
Did you include fouls on 2-strike counts, i.e. a 2-strike pitch that ended in neither a ball nor a ball in play? It seems to me that 2-strike counts should be significantly more common because of fouled-off pitches. How many times have we seen a guy like Manny foul off 10 straight pitches (seemingly)? Or is this offset by guys like Tejada and Juan Uribe, who start swinging the moment they step into the batter’s box and (presumably) end up in fewer of those situations?
Never mind, the edited graph you put up answers my question. This is honestly one of the cooler graphs I have seen, so thanks!
Really interesting graph. I’d like to make sure that I’m understanding it though.
“Through” the count. So for 0-0, every at bat goes through 0-0, so it is at the league average.
And for 1-0, the wOBA listed is for every count with at least one ball?
Is that correct?
If that is correct, it might be interesting to put the wOBA for each count on each circle. Just for comparison and more data is usually more interesting.
You’ve got it, filihok. That graph already exists here:
http://img97.imageshack.us/img97/6922/splitcount20101214bb.png
The grey numbers are for the wOBA on that count (the grey portion of the circle), and the walks/strikeouts were added in.
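The grey numbers use a different filter than the circles’ positions: only the plate appearances that ended on a count, not every one that passed through it. A sketch under a hypothetical encoding ('B' ball, 'S' strike, 'F' two-strike foul, 'X' any other pitch that ends the plate appearance), with each plate appearance carrying its wOBA value; the function name and sample are illustrative.

```python
from collections import defaultdict
from statistics import mean


def woba_on_counts(pas):
    """Mean wOBA of the PAs that *ended* on each count.

    pas: iterable of (pitch_sequence, woba_value). Hypothetical encoding:
    'B' ball, 'S' strike, 'F' two-strike foul, 'X' other PA-ending pitch.
    """
    values = defaultdict(list)
    for pitches, value in pas:
        balls = strikes = 0
        for p in pitches:
            if p == "F":
                continue
            ends = p == "X" or (p == "B" and balls == 3) or (p == "S" and strikes == 2)
            if ends:
                values[(balls, strikes)].append(value)
                break
            if p == "B":
                balls += 1
            elif p == "S":
                strikes += 1
            else:
                raise ValueError(f"unknown pitch code {p!r}")
    return {count: mean(vals) for count, vals in values.items()}


on = woba_on_counts([("X", 0.5), ("BBBB", 0.7), ("SSS", 0.0), ("BSX", 2.0)])
```

In this made-up sample only the first-pitch ball in play ended at 0-0, so its “on” value is .500, while the “through” value at 0-0 averages all four plate appearances.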