Author Archive

53 Things About a 53-Second Finnish Baseball Video

With no baseball being played on this Monday night as I write this, I thought I’d throw this out for a quick fix.  Granted, this is baseball as it’s played in Finland:

 

Below is a second-by-second recap of all the glorious action.

{note – because the Stone-Age author doesn’t know how to post GIFs into an article, you’ll have to pause the video yourself to freeze the action for each of the 53 seconds}

0:01 – Dude in the white-striped uniform way off the plate, obviously trying to avoid catcher’s interference because of the dude in the orange-and-blue uniform.

0:02 – Orange-and-blue apparently spots the pitcher striding towards the pitcher’s mound, which I guess in Finnish is the “tikli”.

0:03 – There’s a “ski” on the back of the hitter’s jersey, so he must be Sami Haapakoski.  Not likely to be another Polish guy on a Finnish baseball team.

0:04 – And he’s got his hands backwards.  (I’d love to see how he holds a light bulb to screw it in)

0:05 – And now the catcher flips the ball up in the air!  A combination hidden-ball trick/quick-pitch.

0:06 – First baseman charging in…Sami charging at the offering, which can only mean…

0:07 – A line drive over the first baseman’s head.  Well played Sami!

0:08 – Sami now runs down the THIRD-BASE LINE!!!! (being half-Polish myself I have no more capacity to joke).  This means that the runner who’s already there (Jeano Segurannen) has to start running to second.

0:09 – What’s with the water hazard inside the park?  I guess with this being Finnish baseball, they’ve replaced right field with a right fjord.

0:10 – I like the greenery in right fjord.  Gives it a Wrigley-like ambiance (this is the Obligatory 2016 Cubs Reference™ for this article)

0:11 – Crowd going wild, screaming for Sami to run the bases the right way and not blow a well-earned ground-rule double.

0:12 – Or maybe it’s a ground-rule triple if it gets stuck in the poison ivy.  Not sure.

0:13 – Love the hustle on the guy in right fjord.  Plays the game the right way, he does.

0:14 – And emerging from behind a tree there’s an umpire, checking to see if the ball lodged in the poison ivy for a triple or into the water for a double….what, the ball’s IN PLAY??!?

0:15 – Yep. The right fjorder (Jonni Damonen) swiftly tosses a relay to one of his fellow outfjorders.

0:16 – Unfortunately, Ryän Raburninnen isn’t known for having the best “handle” in this sport

0:17 – Average water temperatures in Finland are colder than anywhere in the continental USA.  That’s because they’re measured in degrees Celsius.

0:18 – Look, there’s Jeano rounding the bases the right way

0:19 – Poor right fjorder takes his second plunge in the last five seconds.  Someone please fire up a sauna for ol’ Jonni.

0:20 – And there’s Sami flying like a Finn right behind him.  All this fumbling of the frigid fjord-frozen ball in right fjord has allowed them to finally move forward again.

0:21 – Nice flip by the right fjorder.  Maybe they should move him to second base, wherever the hell they put that in Finland.

0:22 – Nice use of the split screen for the fielding and baserunning portions of the play.  Might catch on for MLB telecasts if they ever tried it.

0:23 – Here comes Sami to his jubilant teammates….

0:24 – …PSYCH!!…

0:25 – …running up the third-base line without him

0:26 – The right fjorder pulls his hypothermic body up Tallinn’s Hill, his efforts having been to no avail.

0:27 – Why are they running out there with their bats?  I am so thoroughly confused.

0:28 – Led Zeppelin, the official sponsor of the third-base warning track.

0:29 – Those uniforms make these guys look like a NASCAR pit crew.  Waiting for one of them to hand Sami a champagne bottle to spray the place.

0:30 – Some guy in a blue jacket is taking a stroll in from left field, apparently oblivious to all the mayhem.

0:31 – This part of the field is also used for the Finnish Capture The Flag League.

0:32 – Finnish vodka is excellent.  Just ask the camera guy.

0:33 – Guy in blue jacket has a helmet on.  Must be from a different pit crew.

0:34 – Ebullient Finnish yelling.

0:35 – This part of the field was formerly used by the local Finnish Basketball Association team.  The team disbanded once it was discovered that someone forgot to put up an actual basket.

0:36 – The one guy with a green helmet comes towards the camera with his bat in ready position.  Must be the team’s enforcer.

0:37 – “HAYYYYY!!!”

0:38 – Another yell sounding like “BASEBALLLL!!!!”

0:39 – Coach about to give Sami a water bottle for all his efforts with the bat and on the basepaths (both clockwise and counterclockwise)

0:40 – Fun fact: one of those long Finnish words on Sami’s uni means “this space available for sale”.  I forgot exactly which one it was.

0:41 – At least Sami holds the water bottle correctly.

0:42 – How come there’s no left fjord?

0:43 – Fuzzy blue feet can only mean one thing — a mascot!  Wonder who/what they have for mascots in Finland?

0:44 – It’s the love child of these two!  Sweet!

0:45 – Not sure what that thing is over the bleachers behind home plate (home Frisbee?).  Looks vaguely aerodynamic.

0:46 – Someone obviously has a job that includes coordinating handtowels to these guys’ uniforms.  The age of specialization is not merely a North American phenomenon.

0:47 – Because Finnish baseballs are often contaminated with fjord-borne bacteria, used handtowels are the souvenir of choice.

0:48 – Eriko is like… what?

0:49 – Ignoring the two kids waving for the towel in the front, Sami fires a Hail Mary pass for the blonde in the top row.

0:50 – Notice all the parkas and heavy winter clothing on these fans.  Although the average game-time temperature in Finland is about 17°C, the temperature on this evening was only 10°C, which is just 10 degrees above the freezing point of the right fjorder’s uniform.

0:51 – Nobody bothered to man the lemonade stand in left field just past the bleachers.  Guy in the blue jacket probably just walked off with the lemons.

0:52 – Can the Finnish president override a vimpelin veto?

0:53 – Fun fact:  the official logo of Superpesis, the major league of Finnish baseball, is basically the same as the NBC peacock.

Thank you for watching, and have a nice day.


Comprehensive Contact Quality Model Using MLBAM Batted-Ball Data (Version 0.0)

Contact quality is a recurring sabermetric theme.  Much discussion over the last decade has centered around how we interpret Voros McCracken’s groundbreaking analysis, where he showed that the majority of variance in a pitcher’s ERA was driven by the rates at which he recorded strikeouts, walks, and home runs allowed.  This led to the conclusion by many that the batting average on balls in play (excluding homers) was largely outside of a pitcher’s control, and further research has probed the influence of team defense, home ballpark, and other outside factors on differences in BABIP.

Nevertheless, pitchers like Dallas Keuchel and Chris Young seem to have above-average success in “pitching to contact”,  even after allowing for outside factors.  To better understand such outliers from the standard fielding-independent pitching model, I have developed a new bottom-up  framework to analyze the quality of contact allowed, using the newly-available batted-ball data from MLB Advanced Media (via Baseball Savant).  This model takes all batted balls (including homers) and calculates the expected run value based upon how hard the ball was hit (“exit velocity”) and the estimated angle at which it left the bat (“vertical angle”).  In addition to the contact quality model, I’ve also developed a parallel model to estimate the defense-independent expected run value from batted-ball data (yes, contact quality and defense-independent run value are two different things.)

Relationship to FIP

The key difference between the Comprehensive Contact Quality Model and FIP is the integration of expected home runs allowed into the analysis.   Various metrics such as xFIP have attempted to account for the volatility in HR% by normalizing this rate as a fixed percentage of fly balls allowed.  A different perspective is to treat home runs as one extreme in a broad spectrum of contact quality:

           Swinging strike < Foul tip < Weakly-hit fair ball < Well-hit fair ball

This spectrum ranks how well the hitter has “squared up” on the ball, with better-struck balls further to the right. Home runs can be considered a subset of well-hit fair balls, where the likelihood of actually becoming a four-bagger depends primarily upon the distance travelled, which itself is a function of exit velocity, vertical angle, and a host of other factors.   So, when we talk about a pitcher’s ability to limit the long ball, what we’re really talking about is his ability (if any) to prevent the ball from being hit hard at an optimum angle to leave the park.

With that brief introduction, let’s outline the framework for valuing the contact quality on any batted ball.  First, for balls hit in the air:

Step 1  – Estimate the Probability of a Home Run

For this first iteration of the model, I made the following simplifying assumptions:

  • Exactly 1/30 of all outfield fly balls are hit in each MLB ballpark
  • The direction of these balls is distributed 20% LF to LC, 30% LC to CF, 30% CF to RC, and 20% RC to RF
  • Outfield dimensions are as currently posted in Wikipedia

Since distance in the MLBAM data is measured to the assumed landing point, we also need to adjust for the height of the outfield wall.  To do this, I used Dr. Alan Nathan’s excellent trajectory calculator to estimate the complete distance traveled by a ball that is W feet above the ground when it passes over the outfield wall, where W is the height of the wall.  Note that this distance will be farther for line drives than it will be for high flies, so the necessary distance for a home run will depend upon both the listed distance to the wall and the vertical angle of the batted ball.

[Caution – next section is somewhat technical; you can safely skip and not miss the gist of this article]

One problem with the MLBAM data found on Baseball Savant is that batted-ball angles are only available for home runs.  For other batted balls, we can use the fact that we have both the batted-ball velocity and distance to back-solve for the vertical angle:

1.  Make a grid of distance = f(exit vel, angle), using the default settings in Dr. Nathan’s trajectory calculator:

(Key values shown below – columns are vertical angle, rows are exit velocity)

MPH     0     5    10    15    20    25    30    35    40    50    60
 60    49    79   111   138   159   173   182   186   185   169   137
 65    54    91   129   159   182   198   207   210   208   188   152
 70    60   105   148   182   207   223   232   234   231   208   166
 75    66   120   169   207   233   249   258   259   254   227   180
 80    72   136   192   233   260   276   284   284   277   246   194
 85    79   155   217   260   288   304   311   309   301   265   207
 90    87   175   244   289   317   332   338   334   324   283   220
 95    95   198   272   318   346   361   365   360   347   302   233
100   105   223   302   349   376   389   392   385   370   320   245
105   115   249   332   380   406   418   419   410   393   338   256
110   127   277   363   411   436   446   445   434   415   355   268
115   141   307   394   442   466   474   471   458   437   371   278
120   156   338   426   472   495   502   497   482   458   387   288

2.  Distance peaks at a certain “optimal” vertical angle then decreases.  This means that there are 2 possible solutions for the vertical angle when doing a lookup based upon distance and exit velocity.  Lacking any other information, I used the batted-ball type recorded by the Baseball Scoresheet stringers to guide which value to use:

LD uses lower of the two angles, PU uses higher of the two, FB uses mean of the two

This becomes our estimate of vertical angle on the batted ball.
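
The lookup described in steps 1 and 2 can be sketched roughly as follows. This is only an illustration, not the code behind the article: it uses the 85 MPH row of the grid above, interpolates linearly between grid points (my assumption), and the function names are made up for the example.

```python
# Vertical angles (degrees) and carry distances (feet) for an 85 MPH exit
# velocity, copied from the grid above.
ANGLES = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60]
DIST_85 = [79, 155, 217, 260, 288, 304, 311, 309, 301, 265, 207]


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be increasing. Clamps at the ends."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def candidate_angles(distance, angles=ANGLES, dists=DIST_85):
    """Return the (low, high) angles that produce `distance`, or None if the
    ball went farther than this exit velocity can carry it."""
    peak = dists.index(max(dists))
    if distance > dists[peak]:
        return None
    # Rising branch: distance grows with angle up to the optimal angle.
    low = _interp(distance, dists[:peak + 1], angles[:peak + 1])
    # Falling branch: reversed so the distances are increasing for _interp.
    high = _interp(distance, dists[peak:][::-1], angles[peak:][::-1])
    return low, high


def estimate_angle(distance, batted_ball_type, angles=ANGLES, dists=DIST_85):
    """LD takes the lower angle, PU the higher, FB the mean of the two."""
    pair = candidate_angles(distance, angles, dists)
    if pair is None:
        return None
    low, high = pair
    if batted_ball_type == "LD":
        return low
    if batted_ball_type == "PU":
        return high
    return (low + high) / 2


print(estimate_angle(288, "LD"))  # lower solution for a 288-foot liner
print(estimate_angle(288, "PU"))  # higher solution for a 288-foot popup
```

A full version would first interpolate between exit-velocity rows as well; the single-row lookup here just shows where the two solutions come from.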

[End of technical note]

Now, for each of the 30 MLB ballparks, we can use the combination of distance and vertical angle to estimate the probability of a homer, using the pull/center/opposite mix assumed above (note – version 0.0 of this model does not reflect batted ball direction).  After averaging across all ballparks, we get a grid of home run probabilities for any outfield fly ball (rows are distance in feet, columns are vertical angle):

Dist       0       5      10      15      20      25      30      35      40      50      60  Actual
300     0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%
310     0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.1%    0.0%
320     0.0%    0.0%    0.0%    0.0%    0.1%    0.1%    0.1%    0.1%    0.2%    0.2%    0.3%    0.1%
330     0.0%    0.0%    0.1%    0.2%    0.3%    0.4%    0.5%    0.7%    1.0%    1.4%    1.7%    0.6%
340     0.0%    0.0%    0.2%    0.6%    1.3%    2.5%    3.9%    5.0%    6.0%    7.4%    8.2%    1.7%
350     0.0%    0.0%    0.9%    4.4%    8.1%   11.1%   13.2%   14.7%   15.9%   17.6%   18.5%    3.2%
360     0.0%    0.1%    6.0%   13.4%   18.7%   22.1%   24.3%   25.9%   27.1%   28.8%   29.8%   11.1%
370     0.0%    0.4%   15.3%   24.0%   30.4%   34.0%   36.3%   38.0%   39.2%   41.0%   42.0%   15.3%
380     0.0%    2.7%   25.7%   36.8%   43.0%   46.3%   48.3%   49.9%   51.0%   52.8%   53.8%   29.1%
390     0.0%   10.0%   36.1%   50.0%   55.2%   58.7%   61.3%   63.2%   64.8%   67.0%   68.3%   40.7%
400     0.0%   19.0%   47.8%   63.5%   70.1%   74.5%   77.5%   79.8%   81.5%   84.0%   85.3%   54.1%
410     0.0%   28.1%   61.8%   80.1%   86.6%   90.0%   91.7%   92.9%   93.7%   94.8%   95.4%   76.3%
420     0.0%   37.6%   78.0%   92.2%   95.2%   96.4%   97.0%   97.5%   97.9%   98.2%   98.3%   90.3%
430     0.0%   50.1%   88.9%   96.6%   97.7%   98.3%   98.7%   98.8%   98.9%   99.0%   99.1%   94.8%
440     0.0%   64.6%   93.5%   98.0%   99.0%   99.4%   99.5%   99.6%   99.7%   99.7%   99.8%   96.7%
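
To make the averaging in Step 1 concrete, here is a rough sketch of how a single fly ball could be scored against a set of parks. Everything specific in it is an assumption for illustration: the fence is treated as running linearly between the five posted distances, direction is treated as uniform within each of the four segments, the park dimensions are invented, and the wall-height adjustment is left out entirely.

```python
# Direction mix from the simplifying assumptions:
# LF to LC, LC to CF, CF to RC, RC to RF.
SEGMENT_WEIGHTS = [0.20, 0.30, 0.30, 0.20]


def fraction_cleared(distance, d_start, d_end):
    """Share of one fence segment cleared by a ball carrying `distance` feet,
    assuming the fence distance runs linearly from d_start to d_end."""
    lo, hi = min(d_start, d_end), max(d_start, d_end)
    if distance >= hi:
        return 1.0
    if distance <= lo:
        return 0.0
    return (distance - lo) / (hi - lo)


def park_hr_probability(distance, fence):
    """fence: posted distances [LF, LC, CF, RC, RF] in feet for one park."""
    return sum(
        weight * fraction_cleared(distance, fence[i], fence[i + 1])
        for i, weight in enumerate(SEGMENT_WEIGHTS)
    )


def average_hr_probability(distance, parks):
    """Each park counts equally, matching the 1/30-per-park assumption."""
    return sum(park_hr_probability(distance, f) for f in parks) / len(parks)


# Hypothetical park dimensions, not taken from any real stadium.
example_park = [330, 375, 400, 375, 330]
print(park_hr_probability(380, example_park))
```

In the actual model the required distance also depends on vertical angle through the wall-height adjustment, which is why the grid above varies across columns; this sketch only captures the direction-weighted averaging.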

Step 2 – Estimate BABIP if Not a Home Run

One big benefit of hitting the ball over the fence is that there is virtually no chance of making an out.  For balls hit in the air to the outfield, however, there are typically three guys whose goal it is to catch the ball in order to get the batter out.  Now while a little bit of extra loft on a hard-hit OF fly can improve the chance of a dinger, for balls that stay in play the relationship between BABIP and vertical angle is essentially linear (using first-half 2015 data):

    BABIP if hit in the air to OF = .9698 – .0256 * MIN(37.5, angle)

We will use this in conjunction with the next step to determine the run value of a non-homer fly/popup/liner.
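
As a quick check on the Step 2 formula, here it is as a small function. The function name is mine; the coefficients and the 37.5 degree cap are the ones given above.

```python
def babip_air_outfield(angle_deg):
    """BABIP for a ball hit in the air to the outfield, per the linear fit
    above; the angle is capped at 37.5 degrees."""
    return 0.9698 - 0.0256 * min(37.5, angle_deg)


for angle in (10, 20, 30, 37.5, 45):
    print(angle, round(babip_air_outfield(angle), 4))
```

The cap means anything hit steeper than 37.5 degrees gets the same near-zero BABIP, since the linear fit would otherwise go negative.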

Step 3 – Estimate Expected Run Value If A Hit (Non-HR)

For balls not caught by the outfielder, the chances for an extra-base hit vary by vertical angle and also increase for higher exit velocities.  Regressing the first-half 2015 data (using hits to the outfield only) results in this estimate:

RV if hit to OF =  -1.06 + 0.0206*velocity – 0.00006*velocity^2 + 0.0223*angle – 0.000318*angle^2

We can now calculate the contact-quality run value as:

     CQRV = (1.38 x HR Probability) + (RV if hit to OF x (1 – HR Probability))
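
The two formulas above translate directly into code. This sketch simply transcribes them as printed, with function names of my own choosing; the home run probability is passed in rather than looked up, and the Step 2 BABIP term is not included because it does not appear in the CQRV formula as written.

```python
HR_RUN_VALUE = 1.38  # run value of a home run used in the CQRV formula


def rv_if_hit_to_of(velocity, angle):
    """Expected run value of a non-HR hit to the outfield (velocity in MPH,
    angle in degrees), per the Step 3 regression."""
    return (-1.06 + 0.0206 * velocity - 0.00006 * velocity ** 2
            + 0.0223 * angle - 0.000318 * angle ** 2)


def cqrv_air(velocity, angle, hr_probability):
    """Contact-quality run value for a ball hit in the air."""
    return (HR_RUN_VALUE * hr_probability
            + rv_if_hit_to_of(velocity, angle) * (1 - hr_probability))


# Example: 100 MPH at 20 degrees with an assumed 50% home run probability.
print(round(cqrv_air(100, 20, 0.5), 4))
```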

Contact Quality Run Values for Ground Balls

For ground balls, the expected run value increases with increasing exit velocity.  We can estimate the CQRV directly from the following regression equation:

CQRV = 0.35-0.0174*velocity+0.00014*velocity^2, if velocity > 65; else CQRV = -0.19

Note that the expected run value is set to -0.19 for velocity less than 65 MPH.  This is because the run expectancy actually improves for grounders hit at a very low speed (basically dribblers and slow rollers).  Because this is a model of contact quality, we are not going to penalize the pitcher for poor batted-ball luck when the actual quality of contact is low.
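
The ground-ball rule is simple enough to write out in a few lines. The function name is an assumption; the coefficients and the 65 MPH floor come from the equation above.

```python
def cqrv_ground_ball(velocity):
    """Contact-quality run value for a ground ball (velocity in MPH).
    Below the 65 MPH threshold the value is held flat at -0.19."""
    if velocity > 65:
        return 0.35 - 0.0174 * velocity + 0.00014 * velocity ** 2
    return -0.19


for mph in (50, 65, 80, 100):
    print(mph, round(cqrv_ground_ball(mph), 3))
```

The quadratic works out to roughly -0.19 at 65 MPH, so the two pieces join up without a noticeable jump.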

This leads us to a discussion of the last key feature of the model….

Contact Quality vs. Expected Batted-Ball Result

The CQ model is designed to produce higher run values for better quality of contact.   However, as discussed in Tony Blengino’s enlightening series on batted-ball outcomes, real-life BABIP doesn’t improve continuously with higher batted-ball velocity, but instead actually decreases over the stretch between balls hit relatively shallow and balls hit to the deeper parts of the outfield.  The CQ model calculates BABIP as a function of vertical angle in order to avoid rewarding pitchers for the better-struck balls that fall into the “donut hole” near the depths where outfielders normally position themselves.

I chose vertical angle to model BABIP for the CQ framework because of its close relationship to hang time, which in turn is a key component of the likelihood of the outfielder making the putout.  In reality, batted-ball location also plays an important role in determining whether a fielder can range into position to catch the ball.  To model this more realistic BABIP, I estimated what proportion of balls hit a certain distance would be reachable by one of the three outfielders, given a certain amount of hang time (note – hang time can be estimated by Dr. Nathan’s trajectory calculator based upon exit velocity and vertical angle).    For example, an arc 320 feet from home plate is roughly 502 feet long from foul line to foul line.   If we assume that each outfielder can cover 52 feet in 3.0 seconds, then we can draw a circle with a 52 foot radius from each fielder’s initial position and estimate the overlap between the arc and these circles to be about 237 feet.  So we assign a 47% chance (237 divided by 502) of catching a fly ball hit 320 feet with a 3.0 second hang time.  If we increase the hang time to 4.0 seconds, the coverage circles now have an 87 foot radius, and 479 feet of the arc are covered, for a 95% chance of an out.
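
The arithmetic in that example can be reproduced in a few lines. This sketch only covers the final ratio: the covered lengths (237 and 479 feet) are taken from the text rather than derived, since deriving them would require the fielders' starting positions, which are not specified here. The function names are hypothetical.

```python
import math


def foul_line_arc_length(distance_ft):
    """Length of the 90-degree arc from foul line to foul line at a given
    distance from home plate."""
    return math.pi / 2 * distance_ft


def catch_probability(covered_ft, distance_ft):
    """Share of the arc that falls inside the fielders' coverage circles."""
    return min(1.0, covered_ft / foul_line_arc_length(distance_ft))


print(int(foul_line_arc_length(320)))        # arc length in feet, truncated
print(round(catch_probability(237, 320), 2))  # 3.0 second hang time
print(round(catch_probability(479, 320), 2))  # 4.0 second hang time
```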

Here is how the more realistic BABIP varies based upon both batted-ball distance and hang-time.  Note the “donut hole” for balls hit around 300 feet with hang times in the neighborhood of 4 seconds.

           1.0            1.5            2.0            2.5            3.0            3.5            4.0            4.5            5.0
200    1.000    1.000    1.000    1.000    1.000    1.000    0.711    0.400    0.005
210    1.000    1.000    1.000    1.000    1.000    0.925    0.589    0.318          –
220    1.000    1.000    1.000    1.000    1.000    0.761    0.523    0.217          –
230    1.000    1.000    1.000    1.000    0.889    0.666    0.485    0.161          –
240    1.000    1.000    1.000    0.960    0.772    0.618    0.353    0.136          –
250    1.000    1.000    0.971    0.828    0.696    0.528    0.254    0.061          –
260    1.000    0.932    0.857    0.757    0.646    0.403    0.180          –          –
270    0.919    0.863    0.802    0.717    0.555    0.314    0.120          –          –
280    0.886    0.838    0.783    0.678    0.468    0.258    0.073          –          –
290    0.884    0.834    0.762    0.598    0.419    0.217    0.035          –          –
300    0.918    0.823    0.721    0.579    0.413    0.218    0.038          –          –
310    0.956    0.853    0.741    0.588    0.414    0.211    0.020          –          –
320    0.941    0.916    0.857    0.663    0.470    0.263    0.059          –          –
330    0.943    0.919    0.891    0.807    0.556    0.330    0.104          –          –
340    0.962    0.936    0.908    0.869    0.714    0.444    0.205    0.029          –
350    1.000    0.967    0.931    0.883    0.830    0.576    0.315    0.118          –
360    1.000    1.000    0.985    0.911    0.843    0.726    0.434    0.212    0.043
370    1.000    1.000    1.000    0.977    0.870    0.783    0.559    0.317    0.144
380    1.000    1.000    1.000    1.000    0.933    0.799    0.691    0.428    0.248
390    1.000    1.000    1.000    1.000    1.000    0.856    0.712    0.525    0.339
400    1.000    1.000    1.000    1.000    1.000    0.956    0.759    0.603    0.420
410    1.000    1.000    1.000    1.000    1.000    1.000    0.866    0.716    0.487
420    1.000    1.000    1.000    1.000    1.000    1.000    1.000    0.749    0.574
430    1.000    1.000    1.000    1.000    1.000    1.000    1.000    0.827    0.637
440    1.000    1.000    1.000    1.000    1.000    1.000    1.000    0.944    0.704

This neatly explains why fly balls hit at 85 MPH often result in an out, while line drives hit that hard are most often base hits.

Angle          0      5     10     15     20     25     30
Distance      79    155    217    260    288    304    311
Hang Time    0.7    1.4    2.2    2.9    3.5    4.1    4.5
BABIP                     1.000  0.669  0.229  0.032      –

If we substitute the hang-time based BABIP for the vertical-angle based BABIP used in the CQ model, we obtain a batted-ball-data expected run value that is more realistic and truly fielder-independent.  Unfortunately, this metric (let’s call it BBRV) doesn’t do as well as CQRV in measuring the actual quality of contact, since it rewards a pitcher allowing an 85 MPH/25 degree angle fly (.032 expected BABIP) more than a pitcher who gives up a 75 MPH/25 degree bloop (.537 expected BABIP).

In short, we can see that fielding-independent pitching consists of two parts:  contact quality allowed, and batted-ball luck.

Some Actual Results…

Well, with all that said, what does CQRV version 0.0 tell us about pitchers so far in 2015?

First, let’s look at the actual run expectancy above average allowed on batted balls (using linear weights).  Here are the top 5 and bottom 5 through the first half of 2015:

Sonny Gray         (18.9)
Zack Greinke         (18.8)
Dallas Keuchel          (16.3)
Jacob deGrom          (12.1)
Chris Young          (11.5)
Ian Kennedy            20.4
CC Sabathia            21.5
Kyle Lohse            21.7
Kyle Kendrick            22.2
James Shields            23.0

No real surprises for those who’ve followed this year’s FIP/BABIP outliers (though Greinke’s never been this successful on batted balls – maybe he’s the guy who’s heisted Kyle Lohse’s secret formula for contact management.)

Now, let’s look at CQRV:

Pitcher             CQRV   Expected Run Value   Actual Run Value
Sonny Gray         (8.8)                (9.2)             (18.9)
Brad Ziegler       (7.1)                (6.5)             (11.2)
Clayton Kershaw    (6.6)                (3.2)                7.3
Brandon Maurer     (6.4)                (6.1)             (10.0)
Alex Wilson        (6.0)                (6.9)              (3.3)
Kyle Lohse          13.7                 12.9               21.7
Jerome Williams     14.3                 16.0               18.7
Phil Hughes         16.9                 16.4               17.5
Josh Collmenter     17.6                 18.6               14.0
Kyle Kendrick       23.3                 22.5               22.2

The only mildly interesting name in the bottom five is Phil Hughes, who has returned to allowing a high HR% after conquering the gopher ball in 2014.  In the top five, we see saber-fave Brad Ziegler, whose ridiculous .177 BABIP/0.45 HR/9 combo is driven far more by low contact quality than by batted ball/defensive luck.  We also see two very surprising names at #4 and #5.   Brandon Maurer has allowed a .238 BABIP along with just 1 HR in 44 innings, thanks to a career high 27% soft hit percentage alongside a career low 21% hard hit percentage.  Alex Wilson has likewise improved his contact management numbers (25% soft hit/21% hard hit) to drive a .270 BABIP with just 2 longballs allowed.

Finally, it’s interesting to note Clayton Kershaw’s numbers.  Despite having a BABIP north of .300 for the first time since his rookie season, Kershaw has been well above average in terms of stifling contact quality.  But, between having fewer fly balls than average dying in the outfield “donut holes” (3 runs) and other batted-ball/defensive factors (10 runs), Kershaw has been a few runs worse than average on balls in play. (Not that he needs any help to remain brilliant).

Conclusion

I have chosen to call this version 0.0 of the CCQM framework because in essence this is as much a “proof of concept” as a potential tool.   Two key areas will require continuous research and review to fully power up this model.

First, the raw data used to develop the model is new and evolving.  As more MLBAM data becomes publicly available, there will be a more robust historical track record of fundamental physical stats behind every play made, which will improve the reliability of the model.

Second, the framework itself needs to be tested further to make sure that any variables that truly affect contact quality are considered.  For example, I consciously chose to not include batted-ball direction as a factor for this first version of the model in order to avoid extra complexity.  In effect, this was equivalent to a null hypothesis that pitchers cannot influence batted-ball direction.  It would be foolish not to test the validity of this assumption for future iterations of the model to see if there are pitchers who consistently show the ability to improve their performance by influencing the batted-ball direction, all other factors being equal.

My hope is that the CCQM framework sparks a fresh round of discussions on the whole notion of contact quality, leveraging this whole new generation of metrics at our disposal.