There are 12 “states” of the count in baseball: 0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2, 3-0, 3-1, 3-2. In addition, there are 3 “states” in which a plate appearance can end: strikeout, walk, and ball in play. Because every pitch moves a plate appearance from one of these states to another, MLB plate appearances lend themselves wonderfully to analysis with Markov chains.
Every pitch thrown in MLB can be classified as a swinging strike, called strike, ball, foul, or ball in play. Each of these classifications has a defined effect in each count. For example, a swinging strike in an 0-1 count leads to an 0-2 count, and a foul in a 2-2 count leads to another 2-2 count.
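The rules in the paragraph above can be written down as a small function. This is only a sketch: the name `advance`, the outcome labels, and the `(balls, strikes)` tuple representation are illustrative choices, and special cases such as foul bunts with two strikes are ignored.

```python
# Hypothetical helper: a count is a (balls, strikes) tuple; terminal states
# are the strings "K" (strikeout), "BB" (walk), and "IP" (ball in play).
def advance(count, outcome):
    """Return the state that follows `count` after one pitch with the given outcome."""
    balls, strikes = count
    if outcome == "in_play":
        return "IP"
    if outcome == "ball":
        return "BB" if balls == 3 else (balls + 1, strikes)
    if outcome in ("called_strike", "swinging_strike"):
        return "K" if strikes == 2 else (balls, strikes + 1)
    if outcome == "foul":
        # A foul adds a strike only when the batter has fewer than two.
        return (balls, strikes + 1) if strikes < 2 else (balls, strikes)
    raise ValueError(f"unknown outcome: {outcome}")

print(advance((0, 1), "swinging_strike"))  # (0, 2)
print(advance((2, 2), "foul"))             # (2, 2)
```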
Using PITCHf/x plate discipline statistics and a little algebra, it is possible to calculate the chance of each of these outcomes on any given pitch. Called strikes, swinging strikes, and balls are easy enough to calculate, but fouls and balls in play are trickier, because both have the same requirements: the batter must swing and must make contact. To separate fouls from balls in play, then, we need to find how many pitches a pitcher allowed to be contacted, and then subtract the number of pitches that were put into play; what remains are the fouls. The number of balls in play is easily found, since every batter faced by a pitcher either strikes out, walks, is hit by a pitch, or puts the ball in play.
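One way the algebra might look is sketched below. The formulas are a reconstruction from the definitions of the plate discipline stats, not necessarily the exact ones used here; they assume every taken pitch in the zone is a called strike and every taken pitch out of the zone is a ball. The function name, the `fair_share` parameter (the fraction of contacted pitches that are put in play), and the input rates are all made up for illustration.

```python
def pitch_probabilities(zone, z_swing, o_swing, z_contact, o_contact, fair_share):
    """Split one pitch into five outcomes; all inputs are fractions between 0 and 1."""
    out = 1.0 - zone  # share of pitches thrown outside the zone
    contact = zone * z_swing * z_contact + out * o_swing * o_contact
    return {
        "called_strike": zone * (1.0 - z_swing),
        "ball": out * (1.0 - o_swing),
        "swinging_strike": zone * z_swing * (1.0 - z_contact)
                           + out * o_swing * (1.0 - o_contact),
        "foul": contact * (1.0 - fair_share),
        "in_play": contact * fair_share,
    }

# Made-up rates for an illustrative pitcher, not any real player's numbers.
probs = pitch_probabilities(zone=0.50, z_swing=0.65, o_swing=0.30,
                            z_contact=0.87, o_contact=0.65, fair_share=0.50)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.3f}")
```

By construction the five probabilities add up to 1, so every pitch lands in exactly one category.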
Unfortunately for the Markov process, major league players do not behave the same way in every count. In different counts, pitchers are more or less likely to throw the ball in the zone, and hitters are more or less likely to swing. This must be accounted for, or the simulation will bear only a passing resemblance to the game actually played on the field. Using BaseballSavant, I found the rate at which pitchers throw in and out of the zone in every count, and then created an index stat like wRC+, where 100 is average and 110 is 10% more than average. For example, 3-0 counts have a Zone index of 129, and 0-2 counts have a Zone index of just 62. I did the same thing for Z-swing% and O-swing%. One caveat is that the Zone% numbers I got from BaseballSavant do not match those found in the PITCHf/x plate discipline stats. However, since these index stats are all relative to league average, it should not make a difference.
| Count | ZONE+ | ZSWING+ | OSWING+ |
|---|---|---|---|
| 0-0 | 110 | 61 | 53 |
| 0-1 | 88 | 112 | 98 |
| 0-2 | 62 | 131 | 117 |
| 1-0 | 113 | 91 | 82 |
| 1-1 | 99 | 119 | 115 |
| 1-2 | 75 | 134 | 135 |
| 2-0 | 121 | 91 | 80 |
| 2-1 | 115 | 123 | 120 |
| 2-2 | 95 | 137 | 152 |
| 3-0 | 129 | 18 | 19 |
| 3-1 | 128 | 114 | 106 |
| 3-2 | 122 | 139 | 169 |
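As a rough illustration of how such an index could be applied, the sketch below scales a player's overall rate by the index for a given count. Multiplying by index/100 and capping the result at 1 are assumptions made for this example; the helper name `rate_in_count` and the sample Zone% are made up.

```python
# ZONE+ values copied from the table above for a few counts (100 = league average).
ZONE_PLUS = {"0-0": 110, "0-2": 62, "3-0": 129, "3-2": 122}

def rate_in_count(overall_rate, index):
    """Scale an overall rate by a count index, capped to the range [0, 1]."""
    return min(1.0, max(0.0, overall_rate * index / 100.0))

overall_zone = 0.48  # made-up overall Zone% for an illustrative pitcher
for count, index in ZONE_PLUS.items():
    print(count, round(rate_in_count(overall_zone, index), 3))
```

The cap matters mostly for the swing indexes: an O-swing index of 169 applied to a free swinger could otherwise push a rate past 100%.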
Once we have all this data for a pitcher, we can use a Markov chain to essentially simulate an infinite number of plate appearances for him. Every plate appearance starts at 0-0. By knowing the chances of all the per-pitch results, we can estimate how often the pitcher would get into 1-0 and 0-1 counts, and how often the first pitch would be put into play. From 1-0, we can estimate how many counts become 2-0 or 1-1 or balls in play, and from 0-1, we can estimate how many become 0-2 or 1-1 or balls in play. Simulating in this way, every plate appearance will eventually lead to a strikeout, walk, or ball in play.
For every pitcher who qualified for the ERA title in 2014, I imported his Zone%, Z-swing%, O-swing%, Z-contact%, O-contact%, TBF, K, BB, and HBP (the last 4 only to calculate fair/foul%). Using these, I created a transition matrix for each pitcher that shows the probabilities of moving to any state of the count from any other given count. For example, here is Clayton Kershaw’s 2014 transition matrix.
| From / To | 0-0 | 0-1 | 0-2 | 1-0 | 1-1 | 1-2 | 2-0 | 2-1 | 2-2 | 3-0 | 3-1 | 3-2 | K | BB | IP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0-0 | 0 | 0.546 | 0 | 0.344 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.110 |
| 0-1 | 0 | 0 | 0.471 | 0 | 0.350 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.180 |
| 0-2 | 0 | 0 | 0.207 | 0 | 0 | 0.395 | 0 | 0 | 0 | 0 | 0 | 0 | 0.221 | 0 | 0.177 |
| 1-0 | 0 | 0 | 0 | 0 | 0.542 | 0 | 0.290 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.168 |
| 1-1 | 0 | 0 | 0 | 0 | 0 | 0.509 | 0 | 0.283 | 0 | 0 | 0 | 0 | 0 | 0 | 0.208 |
| 1-2 | 0 | 0 | 0 | 0 | 0 | 0.240 | 0 | 0 | 0.317 | 0 | 0 | 0 | 0.238 | 0 | 0.204 |
| 2-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.564 | 0 | 0.260 | 0 | 0 | 0 | 0 | 0.175 |
| 2-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.541 | 0 | 0.225 | 0 | 0 | 0 | 0.234 |
| 2-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.283 | 0 | 0 | 0.231 | 0.246 | 0 | 0.241 |
| 3-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.664 | 0 | 0 | 0.298 | 0.038 |
| 3-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.567 | 0 | 0.203 | 0.229 |
| 3-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.332 | 0.242 | 0.144 | 0.282 |
| K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| BB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| IP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
The left column represents the count before a given pitch is thrown. The top row represents the state after that pitch has been thrown (IP stands for a ball in play). The intersection of any row and column is the chance of that particular transition occurring. So, for 2014 Kershaw, there was a 54.6% chance that he would get ahead of a batter 0-1, a 34.4% chance he would fall behind 1-0, and an 11.0% chance the batter would put the first pitch into play. Since the transition matrix shows the probabilities associated with throwing one pitch, raising the matrix to the second power simulates throwing 2 pitches. Similarly, taking the limit of the matrix as the power grows without bound simulates throwing an infinite number of pitches, after which a plate appearance is certain to be over. This is why the limit of Kershaw’s matrix (shown below) only has non-zero probabilities in the last 3 columns; after an infinite number of pitches, a plate appearance will have finally reached a conclusion of a strikeout, walk, or ball in play.
| From / To | 0-0 | 0-1 | 0-2 | 1-0 | 1-1 | 1-2 | 2-0 | 2-1 | 2-2 | 3-0 | 3-1 | 3-2 | K | BB | IP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.285 | 0.041 | 0.674 |
| 0-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.369 | 0.023 | 0.608 |
| 0-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.530 | 0.014 | 0.455 |
| 1-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.243 | 0.082 | 0.675 |
| 1-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.341 | 0.046 | 0.613 |
| 1-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.505 | 0.029 | 0.466 |
| 2-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.202 | 0.197 | 0.602 |
| 2-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.295 | 0.111 | 0.594 |
| 2-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.459 | 0.069 | 0.471 |
| 3-0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.136 | 0.515 | 0.349 |
| 3-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.205 | 0.326 | 0.469 |
| 3-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.362 | 0.216 | 0.422 |
| K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| BB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| IP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
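The limit can be approximated by squaring the one-pitch matrix repeatedly, as in the minimal sketch below. It uses only the non-zero entries of Kershaw’s one-pitch matrix as printed in this article (rounded to three decimals, so rows may sum to 0.999 or 1.001); the helper names and the choice of seven squarings (the 128th power) are illustrative assumptions, not the method actually used to produce the tables.

```python
STATES = ["0-0", "0-1", "0-2", "1-0", "1-1", "1-2", "2-0", "2-1",
          "2-2", "3-0", "3-1", "3-2", "K", "BB", "IP"]

# Non-zero entries of Kershaw's 2014 one-pitch matrix, copied from the article.
KERSHAW = {
    "0-0": {"0-1": 0.546, "1-0": 0.344, "IP": 0.110},
    "0-1": {"0-2": 0.471, "1-1": 0.350, "IP": 0.180},
    "0-2": {"0-2": 0.207, "1-2": 0.395, "K": 0.221, "IP": 0.177},
    "1-0": {"1-1": 0.542, "2-0": 0.290, "IP": 0.168},
    "1-1": {"1-2": 0.509, "2-1": 0.283, "IP": 0.208},
    "1-2": {"1-2": 0.240, "2-2": 0.317, "K": 0.238, "IP": 0.204},
    "2-0": {"2-1": 0.564, "3-0": 0.260, "IP": 0.175},
    "2-1": {"2-2": 0.541, "3-1": 0.225, "IP": 0.234},
    "2-2": {"2-2": 0.283, "3-2": 0.231, "K": 0.246, "IP": 0.241},
    "3-0": {"3-1": 0.664, "BB": 0.298, "IP": 0.038},
    "3-1": {"3-2": 0.567, "BB": 0.203, "IP": 0.229},
    "3-2": {"3-2": 0.332, "K": 0.242, "BB": 0.144, "IP": 0.282},
    "K": {"K": 1.0}, "BB": {"BB": 1.0}, "IP": {"IP": 1.0},
}

def to_matrix(sparse):
    """Expand the sparse dictionary into a dense 15x15 list of lists."""
    index = {state: i for i, state in enumerate(STATES)}
    matrix = [[0.0] * len(STATES) for _ in STATES]
    for src, row in sparse.items():
        for dst, p in row.items():
            matrix[index[src]][index[dst]] = p
    return matrix

def multiply(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def limit(matrix, squarings=7):
    """Approximate the limit by repeated squaring (matrix to the power 2**squarings)."""
    for _ in range(squarings):
        matrix = multiply(matrix, matrix)
    return matrix

P_LIMIT = limit(to_matrix(KERSHAW))
k, bb, ip = P_LIMIT[0][12], P_LIMIT[0][13], P_LIMIT[0][14]
print(f"xK% = {k:.3f}, xBB% = {bb:.3f}, in play = {ip:.3f}")
```

Because the inputs are rounded, the top row should land close to, but not necessarily exactly on, the 0.285, 0.041, and 0.674 shown in the limit matrix above.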
Now, to predict Kershaw’s K% and BB%, we need only look at the top row, since all plate appearances begin with an 0-0 count. Starting from an 0-0 count, we estimate Kershaw has a 28.5% chance to strike out any given batter and a 4.1% chance to walk him. Kershaw in 2014 actually had a 31.9% strikeout rate and a 4.1% walk rate.
This method produces a strong r-squared of .86 when comparing xK% with actual K%. Unfortunately, r-squared drops to .54 when comparing xBB% with actual BB%.
I then imported the same statistics for batters, because there really is no reason why this method should not work equally well for both pitchers and hitters. It actually seems to work better as a whole on batters, with an r-squared of .81 for batters’ strikeouts and .77 for batters’ walks.
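For readers who want to check a fit like this themselves, here is a small sketch that computes r-squared as the squared Pearson correlation, which is an assumption about how the figures above were obtained. The five data points are the xK% and 2014 K% values for Hughes, Kershaw, Price, Sale, and Zimmermann from the player table in this article; a five-player sample will not reproduce the full-sample .86.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# xK% and 2014 K% for Hughes, Kershaw, Price, Sale, Zimmermann.
expected_k = [19.1, 28.5, 25.6, 31.3, 23.2]
actual_k = [21.8, 31.9, 26.9, 30.4, 22.8]
print(round(r_squared(expected_k, actual_k), 3))
```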
If there are any players in particular you’re interested in, I have included the full list of all qualified pitchers and position players, with both their expected and actual strikeout and walk rates.
| Player | xK% | 2014 K% | xBB% | 2014 BB% |
|---|---|---|---|---|
| Hughes | 19.1 | 21.8 | 1.8 | 1.9 |
| Kershaw | 28.5 | 31.9 | 4.1 | 4.1 |
| Price | 25.6 | 26.9 | 3.7 | 3.8 |
| Sale | 31.3 | 30.4 | 6.1 | 5.7 |
| Zimmermann | 23.2 | 22.8 | 3.3 | 3.6 |
| Scherzer | 28.5 | 27.9 | 6.4 | 7 |
| Bumgarner | 24.9 | 25.1 | 5.1 | 4.9 |
| Lackey | 22 | 19.7 | 3.7 | 5.6 |
| Kluber | 28.1 | 28.3 | 4.9 | 5.4 |
| Strasburg | 26.8 | 27.9 | 5.6 | 5 |
| Samardzija | 24.4 | 23 | 4.8 | 4.9 |
| Hamels | 24.8 | 23.9 | 5.6 | 7.1 |
| McCarthy | 20.1 | 20.9 | 4.4 | 3.9 |
| Cueto | 23.6 | 25.2 | 6.8 | 6.8 |
| Wood | 24.9 | 24.5 | 6.3 | 6.5 |
| Kennedy | 24.8 | 24.5 | 7.5 | 8.3 |
| Greinke | 25.4 | 25.2 | 6.4 | 5.2 |
| Odorizzi | 24.1 | 24.2 | 9 | 8.2 |
| Hutchison | 23.1 | 23.4 | 7.2 | 7.6 |
| Teheran | 21.4 | 21 | 5.2 | 5.8 |
| Harang | 20.8 | 18.4 | 6 | 8.1 |
| Eovaldi | 17.9 | 16.6 | 4.9 | 5 |
| Felix | 26 | 27.2 | 6.8 | 5 |
| Dickey | 21.9 | 18.9 | 6.3 | 8.1 |
| Fat Bartolo | 17.4 | 17.8 | 4 | 3.5 |
| Kazmir | 20.9 | 21.1 | 6.5 | 6.4 |
| Wainwright | 21.4 | 19.9 | 5.1 | 5.6 |
| Wheeler | 25.2 | 23.6 | 9.8 | 9.9 |
| Ventura | 21.2 | 20.3 | 6.9 | 8.8 |
| Fister | 17.8 | 14.8 | 4.4 | 3.6 |
| Chen | 17.8 | 17.6 | 5.9 | 4.5 |
| Norris | 20 | 20.2 | 8.4 | 7.6 |
| Lester | 22.6 | 24.9 | 8.1 | 5.4 |
| Richards | 24.9 | 24.2 | 8 | 7.5 |
| Porcello | 18.3 | 15.4 | 4.6 | 4.9 |
| Shields | 20.8 | 19.2 | 6.5 | 4.7 |
| Lewis | 18.8 | 17.5 | 5.7 | 6.3 |
| Simon | 18.3 | 15.5 | 5.5 | 6.8 |
| Iwakuma | 18.6 | 21.7 | 5.5 | 3 |
| Lynn | 20.3 | 20.9 | 8.6 | 8.3 |
| Wood | 18.3 | 18.7 | 8.4 | 9.7 |
| Hammel | 22.5 | 22.1 | 7.7 | 6.2 |
| Noesi | 18.9 | 16.8 | 6.3 | 7.6 |
| Verlander | 18.5 | 17.8 | 6.8 | 7.3 |
| Miller | 17.9 | 16.6 | 6.9 | 9.6 |
| Young | 18.2 | 15.7 | 7.5 | 8.7 |
| Koehler | 20.3 | 19.1 | 6.6 | 8.8 |
| Archer | 22.9 | 21 | 8.1 | 8.8 |
| Roark | 19 | 17.3 | 5.9 | 4.9 |
| Haren | 19.1 | 18.7 | 7.6 | 4.6 |
| Peavy | 18.4 | 18.5 | 7.4 | 7.4 |
| Ross | 25.2 | 24 | 9 | 8.9 |
| Niese | 17.9 | 17.6 | 5.3 | 5.7 |
| Tillman | 17.5 | 17.2 | 7.9 | 7.6 |
| Cobb | 22.6 | 21.9 | 8.4 | 6.9 |
| Danks | 19.4 | 15.1 | 7.1 | 8.7 |
| Garza | 18.3 | 18.5 | 7.4 | 7.4 |
| Santana | 22.1 | 21.9 | 7.3 | 7.7 |
| Quintana | 20.4 | 21.4 | 9.1 | 6.3 |
| Alvarez | 15.7 | 14.4 | 3.9 | 4.3 |
| Liriano | 27.3 | 25.3 | 10.8 | 11.7 |
| Volquez | 20.5 | 17.3 | 7.1 | 8.8 |
| Guthrie | 16.1 | 14.4 | 6.3 | 5.7 |
| Buchholz | 18.7 | 17.9 | 7.3 | 7.3 |
| Gray | 20.5 | 20.4 | 7.7 | 8.2 |
| Burnett | 21.5 | 20.3 | 8.8 | 10.3 |
| Collmenter | 16.4 | 16 | 6.9 | 5.4 |
| Vargas | 19.2 | 16.2 | 6.9 | 5.2 |
| Lohse | 17.4 | 17.3 | 6.9 | 5.5 |
| de la Rosa | 19.2 | 18.1 | 10 | 8.7 |
| Leake | 16.7 | 18.2 | 6.9 | 5.5 |
| Vogelsong | 17.9 | 19.4 | 9.6 | 7.4 |
| Cosart | 18 | 15 | 8.4 | 9.5 |
| Weaver | 19.2 | 19 | 8.2 | 7.3 |
| Hudson | 16.1 | 15.2 | 5.8 | 4.3 |
| Feldman | 15.6 | 14 | 8.3 | 6.5 |
| Kuroda | 17.3 | 17.8 | 8 | 4.3 |
| Hernandez | 17.4 | 14.5 | 9 | 10.1 |
| Buehrle | 14.7 | 13.9 | 6.2 | 5.4 |
| Keuchel | 18.5 | 18.1 | 8.3 | 5.9 |
| Peralta | 16.5 | 18.4 | 9.5 | 7.3 |
| Elias | 19.5 | 20.6 | 11.1 | 9.2 |
| Miley | 18.3 | 21.1 | 10.3 | 8.7 |
| Kendrick | 15.1 | 14 | 7.3 | 6.6 |
| Wilson | 21 | 19.8 | 13.1 | 11.2 |
| Gibson | 15.8 | 14.1 | 8.5 | 7.5 |
| Stults | 14.5 | 14.5 | 8.7 | 5.9 |
| Gallardo | 16.3 | 17.9 | 11.1 | 6.6 |
| McCutchen | 19.8 | 17.7 | 11.7 | 13 |
| V-Mart | 13.9 | 6.6 | 8.3 | 10.9 |
| Abreu | 23.7 | 21.1 | 6.5 | 8.2 |
| Stanton | 27.6 | 26.6 | 12.9 | 14.7 |
| Trout | 27.9 | 26.1 | 12.4 | 11.8 |
| Bautista | 19.8 | 14.3 | 12 | 15.5 |
| Rizzo | 23.2 | 18.8 | 9.5 | 11.9 |
| E5 | 20.3 | 15.1 | 9.5 | 11.4 |
| Brantley | 10.9 | 8.3 | 8 | 7.7 |
| Cabrera | 17.6 | 17.1 | 7.1 | 8.8 |
| Beltre | 16.6 | 12.1 | 6.7 | 9.3 |
| Puig | 17.5 | 19.4 | 10.4 | 10.5 |
| Werth | 24 | 18 | 11.4 | 13.2 |
| Freeman | 18.7 | 20.5 | 12.3 | 12.7 |
| Morneau | 11.8 | 10.9 | 5.7 | 6.2 |
| Posey | 15.5 | 11.4 | 7.5 | 7.8 |
| Cruz | 22 | 20.6 | 7 | 8.1 |
| Kemp | 24.8 | 24.2 | 7.9 | 8.7 |
| Ortiz | 16.7 | 15.8 | 11.1 | 12.5 |
| Lucroy | 18.3 | 10.8 | 6.4 | 10.1 |
| Gomez | 19.4 | 21.9 | 7.1 | 7.3 |
| Harrison | 17.9 | 14.7 | 3.8 | 4 |
| Upton | 27 | 26.7 | 8.2 | 9.4 |
| Altuve | 9 | 7.5 | 3.3 | 5.1 |
| Han-Ram | 16.2 | 16.4 | 9.9 | 10.9 |
| Duda | 25.3 | 22.7 | 11.5 | 11.6 |
| Rendon | 17.9 | 15.2 | 8.7 | 8.5 |
| Cano | 12.2 | 10.2 | 6.6 | 9.2 |
| Holliday | 14.3 | 15 | 9.5 | 11.1 |
| Marte | 25.2 | 24 | 6.3 | 6.1 |
| Smith | 20 | 16.7 | 11.6 | 13.2 |
| LaRoche | 19.8 | 18.4 | 12.5 | 14 |
| Walker | 15.2 | 15.4 | 9 | 7.9 |
| Cabrera | 13.5 | 10.8 | 7.1 | 6.9 |
| Santana | 22.6 | 18.8 | 14.2 | 17.1 |
| Gonzalez | 19.3 | 17 | 6.1 | 8.5 |
| Donaldson | 19.9 | 18.7 | 10.5 | 10.9 |
| Frazier | 22 | 21.1 | 8.4 | 7.9 |
| Fowler | 20.8 | 21.4 | 13.2 | 13.1 |
| Seager | 18.7 | 18 | 9.7 | 8 |
| Gordon | 22.9 | 19.6 | 9.9 | 10.1 |
| Carter | 32.4 | 31.8 | 9.1 | 9.8 |
| Peralta | 19 | 17.8 | 8.7 | 9.2 |
| Valbuena | 24.7 | 20.7 | 8.8 | 11.9 |
| Span | 14.3 | 9.7 | 5.6 | 7.5 |
| Calhoun | 19.7 | 19.4 | 6.3 | 7.1 |
| Castro | 18 | 17.6 | 7.3 | 6.2 |
| Yelich | 22.9 | 20.8 | 10.9 | 10.6 |
| Pence | 20.8 | 18.4 | 8.6 | 7.3 |
| Jones | 20 | 19.5 | 5.2 | 2.8 |
| Gomes | 23 | 23.2 | 5.6 | 4.6 |
| Eaton | 20.7 | 15.4 | 5.3 | 8 |
| Pujols | 14.7 | 10.2 | 5.7 | 6.9 |
| Braun | 19.9 | 19.5 | 6 | 7.1 |
| Chisenhall | 20.2 | 18.6 | 5.2 | 7.3 |
| Dozier | 25.9 | 18.2 | 8.6 | 12.6 |
| Moss | 27.8 | 26.4 | 9.7 | 11.6 |
| Blackmon | 16.3 | 14.8 | 5.7 | 4.8 |
| Carpenter | 25.1 | 15.7 | 9.9 | 13.4 |
| Ozuna | 27.8 | 26.8 | 6.8 | 6.7 |
| Adams | 19 | 20.2 | 5.6 | 4.6 |
| Hunter | 16 | 15.2 | 4.6 | 3.9 |
| Ramirez | 13.9 | 14.1 | 4.7 | 4 |
| Dunn | 30.9 | 31.1 | 14.1 | 13.9 |
| Zobrist | 17.6 | 12.8 | 9.4 | 11.5 |
| Gardner | 25.6 | 21.1 | 9.6 | 8.8 |
| Plouffe | 19.7 | 18.7 | 9.3 | 9.1 |
| Davis | 21.6 | 22.2 | 7.6 | 5.8 |
| Gillaspie | 14.9 | 15.4 | 6.7 | 7.1 |
| Byrd | 29.4 | 29 | 4.3 | 5.5 |
| Heyward | 18 | 15.1 | 9.7 | 10.3 |
| Desmond | 27.4 | 28.2 | 6.9 | 7.1 |
| Kendrick | 19.9 | 16.3 | 5.7 | 7.1 |
| Ellsbury | 14 | 14.6 | 7.9 | 7.7 |
| Cespedes | 20.6 | 19.8 | 5.4 | 5.4 |
| Markakis | 16.1 | 11.8 | 8.1 | 8.7 |
| Utley | 15.8 | 12.8 | 8.5 | 8 |
| Suzuki | 15.9 | 9.1 | 6.8 | 6.8 |
| Prado | 18.2 | 14 | 6.9 | 4.5 |
| Murphy | 13.4 | 13.4 | 6.4 | 6.1 |
| Sandoval | 12.1 | 13.3 | 4.9 | 6.1 |
| Mauer | 23.5 | 18.5 | 9.2 | 11.6 |
| Choo | 26.7 | 24.8 | 9.9 | 11 |
| Reyes | 12.7 | 11.1 | 5.6 | 5.8 |
| Granderson | 25.3 | 21.6 | 10.1 | 12.1 |
| Aoki | 11 | 8.9 | 8 | 7.8 |
| Rollins | 21.4 | 16.4 | 8.2 | 10.5 |
| McGehee | 16 | 14.8 | 8.5 | 9.7 |
| Kinsler | 11.3 | 10.9 | 5.7 | 4 |
| Loney | 12.7 | 12.3 | 7.2 | 6.3 |
| Pedroia | 19.3 | 12.3 | 6 | 8.4 |
| Solarte | 14.6 | 10.8 | 7.9 | 9.9 |
| Teixeira | 24.2 | 21.5 | 10.3 | 11.4 |
| Longoria | 20.3 | 19 | 6.2 | 8.1 |
| Jones | 20.4 | 21.2 | 8.9 | 8.4 |
| Headley | 21.9 | 23 | 10.9 | 9.6 |
| Navarro | 18 | 14.6 | 5.9 | 6.2 |
| Ramirez | 13.2 | 12.3 | 4.8 | 3.7 |
| Crisp | 18.3 | 12.3 | 8.9 | 12.3 |
| Freese | 24.9 | 24.3 | 7.3 | 7.4 |
| Hosmer | 17.4 | 17 | 7.7 | 6.4 |
| Jennings | 22.2 | 19.9 | 8.3 | 8.7 |
| Gordon | 20.5 | 16.5 | 4.5 | 4.8 |
| Butler | 17.3 | 15.9 | 5.6 | 6.8 |
| de Aza | 24.6 | 22.5 | 6.4 | 7.4 |
| Crawford | 24.8 | 22.9 | 7.4 | 10.5 |
| Rios | 18.7 | 17.9 | 7.1 | 4.4 |
| Wright | 18.7 | 19.3 | 7 | 7.2 |
| Davis | 34.1 | 33 | 10 | 11.4 |
| Aybar | 11 | 9.7 | 4.7 | 5.6 |
| Cabrera | 16.7 | 17.5 | 7.2 | 8 |
| Montero | 19.5 | 17.3 | 7.5 | 10 |
| Castellanos | 23.9 | 24.2 | 6.7 | 6.2 |
| Escobar | 14.8 | 13.4 | 4.6 | 3.7 |
| Martin | 20.5 | 19.6 | 5.5 | 6.7 |
| Howard | 30.1 | 29.3 | 9.8 | 10.3 |
| McCann | 16.9 | 14.3 | 7 | 5.9 |
| Ackley | 19.9 | 16.6 | 5.8 | 5.9 |
| Revere | 15.1 | 7.8 | 3.9 | 2.1 |
| Perez | 14 | 14 | 3.4 | 3.6 |
| Hardy | 24.8 | 18.3 | 5 | 5.1 |
| Viciedo | 20.2 | 21.7 | 6.5 | 5.7 |
| Lowrie | 13.6 | 14 | 7.3 | 9 |
| Mercer | 19.5 | 16 | 5.5 | 6.3 |
| Escobar | 10.9 | 11.3 | 8.9 | 8.1 |
| Parra | 14.8 | 17.4 | 7.3 | 5.6 |
| Bogaerts | 26.3 | 23.2 | 6.8 | 6.6 |
| Jackson | 23.4 | 22 | 8 | 7.2 |
| LeMahieu | 16.5 | 18 | 6 | 6.1 |
| Castro | 27 | 29.5 | 8.1 | 6.6 |
| Andrus | 18.5 | 14 | 8.5 | 6.7 |
| Hechavarria | 13.2 | 15 | 4 | 4.5 |
| Hill | 17.7 | 17 | 7.4 | 5.2 |
| Kipnis | 22.3 | 18 | 7.6 | 9 |
| Johnson | 26 | 26 | 3.9 | 3.8 |
| Bruce | 26.1 | 27.3 | 8.5 | 8.1 |
| Hamilton | 20.4 | 19.1 | 6 | 5.6 |
| Brown | 15.2 | 17.8 | 8.4 | 6.6 |
| Infante | 14.6 | 11.8 | 6.2 | 5.7 |
| Jeter | 12.1 | 13.7 | 5.5 | 5.5 |
| Upton | 29.3 | 29.7 | 7.4 | 9.8 |
| Simmons | 11.1 | 10.4 | 5.1 | 5.6 |
| Segura | 15.6 | 12.6 | 4.3 | 5 |
| Craig | 21.6 | 22.4 | 7.2 | 6.9 |
| Dominguez | 21.9 | 20.6 | 5.2 | 4.8 |
| Cozart | 15.3 | 14.5 | 5.3 | 4.6 |
One advantage of this method over any of the many regression-based estimates using plate discipline stats is that it can be further tailored to each player. The reason for this is that ZONE+, ZSWING+, and OSWING+ are all league-average indexes, and some players’ talents are just not captured by league averages. For example, Dustin Pedroia’s expected strikeout rate (19.3%) is nowhere near his actual strikeout rate (12.3%). Presumably, Pedroia has swing tendencies in certain counts that are markedly different from the average hitter’s. By examining these swing tendencies, it is likely possible to predict Pedroia’s yearly strikeout rates with much greater accuracy, as those tendencies are probably part of his approach at the plate year after year. Still, as preliminary research into this area, I think these results as a whole are very promising.
Great Job!!!! I love it.
This is interesting. Good work.
Is there such a thing as not fat Bartolo??
Captain – Did you miss the part where a Markov chain requires that movement from state to state be independent of the method of achieving that movement? In plain language, for a Markov chain to give acceptable results, what happens at an 0-1 count must not depend on whether the first pitch was a called strike, a swinging strike, or a foul ball. This is definitely not the case.
I am aware that a true Markov Chain involves memoryless states, and I should have made note of this in the article. I suppose the batter-pitcher matchup is more of a semi-Markov process, but I don’t know that that is considered a real mathematical model. In any case, I don’t think the sequencing effects you reference are significant in the aggregate. If they are, though, perhaps this method could indicate those pitchers who are better at sequencing their pitches effectively.
Actually, I think the biggest value of your model is as a kind of raw estimate of “sequence-neutral” expected outcomes for K% and BB%. If you sort the pitchers by (K% minus xK%), the leaders have a combination of great command and multiple plus pitches, while the laggards are typically guys with either just one pitch (Dickey) or not enough command to take advantage of sequencing (Volquez).
And on the flipside, hitters like Pedroia and Carpenter tend to outperform the Markov chain model, most likely because of their superior ability to stay alive on 2-strike counts. I’m not sure what to call the flipside of sequencing (insequenceableness?) but it feels like a good explanatory metric in its own right.
Bottom line, I just love this framework. I can see it being useful for many purposes beyond creating estimates for K and BB rates.
Thanks for the kind words. Even I keep finding new uses for it that I hadn’t intended. The framework can also be used to estimate pitches per plate appearance, along with the chance that a pitcher is in a certain count at any given time. For example, the model predicts that if Madison Bumgarner is about to throw a pitch, there is a 7.7% chance that the count is 0-2. That information could possibly be combined with wOBA by count or something else of that nature.
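As a rough sketch of how those two quantities could be computed, the code below follows the probability of being in each count pitch by pitch and adds up the expected number of pitches thrown in each one. It uses the count-to-count entries of Kershaw’s one-pitch matrix from the article rather than Bumgarner’s (whose matrix is not shown), so it will not reproduce the 7.7% figure; the function name and the 200-pitch cutoff are illustrative assumptions.

```python
# Count-to-count entries of Kershaw's 2014 one-pitch matrix from the article.
# Transitions into K, BB, and IP are left out because they end the plate appearance.
Q = {
    "0-0": {"0-1": 0.546, "1-0": 0.344},
    "0-1": {"0-2": 0.471, "1-1": 0.350},
    "0-2": {"0-2": 0.207, "1-2": 0.395},
    "1-0": {"1-1": 0.542, "2-0": 0.290},
    "1-1": {"1-2": 0.509, "2-1": 0.283},
    "1-2": {"1-2": 0.240, "2-2": 0.317},
    "2-0": {"2-1": 0.564, "3-0": 0.260},
    "2-1": {"2-2": 0.541, "3-1": 0.225},
    "2-2": {"2-2": 0.283, "3-2": 0.231},
    "3-0": {"3-1": 0.664},
    "3-1": {"3-2": 0.567},
    "3-2": {"3-2": 0.332},
}

def expected_visits(q, start="0-0", max_pitches=200):
    """Expected number of pitches thrown in each count during one plate appearance."""
    visits = {count: 0.0 for count in q}
    current = {start: 1.0}  # chance of being in each count before the next pitch
    for _ in range(max_pitches):
        following = {}
        for count, p in current.items():
            visits[count] += p
            for nxt, step in q[count].items():
                following[nxt] = following.get(nxt, 0.0) + p * step
        current = following
    return visits

visits = expected_visits(Q)
pitches_per_pa = sum(visits.values())
share_0_2 = visits["0-2"] / pitches_per_pa
print(f"pitches per PA: {pitches_per_pa:.2f}")
print(f"share of pitches thrown at 0-2: {share_0_2:.1%}")
```

The same visit counts could be weighted by any per-count value, such as wOBA by count, to get a per-plate-appearance expectation.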
This is sort of to Peter’s point. Empirically the fair/foul% depends on the count. More foul balls happen with two strikes (choking up). So it didn’t surprise me that the average xK% exceeds the true K% for both batters and pitchers in your data.
But otherwise the implementation of a structural model of walks and strikeouts is well done. It is a major refinement of what I’ve used, a linear approximation of walks and strikeouts using the same variables (O-Swing%, etc.).
You’re correct that the chance for a foul ball is higher with two strikes, but the effect is very small. Pitches contacted in 0-2 counts are 52% foul and 48% fair, compared to the overall 50/50 split in all counts (and that is the most extreme example). Implementing this information produces negligible changes, moving expected strikeout rates by two tenths of a percentage point at most.
The real difference seems to be between pitches in and out of the zone. Contact made on pitches in the zone is only 47% foul, while on pitches out of the zone it is 55% foul. I’m currently working on adding this to the model.
This is good stuff. Are you able to take this and simulate at-bats between a pitcher and hitter? If you can post an example, that would be awesome. Keep it up!