Introducing BERA: Another ERA Estimator to Confuse You All

Coming up with BERA… like its [almost] namesake might say, it was 90% mental, and the other half was physical. OK, maybe he’d say something more along the lines of “what the hell is this…” but that’s beside the point. By BERA, I mean BABIP-estimating ERA (or something like that… maybe one of you can come up with something fancier). It’s an ERA estimator that’s along the lines of SIERA, only it’s simpler and, dare I say, better.

You know, I started out not knowing where I was going, so I was worried I might not get there. As you may recall, I’ve been pondering pitcher BABIPs for a little while here (see article 1 and article 2). My focus thus far had been on explaining big-picture, long-term BABIP stuff in terms of batted ball data, but one question that remained was how well this info could be used to predict future BABIPs. After monkeying around with that question, though, I saw that SIERA’s BABIP component could be improved upon, so I set to work on coming up with BERA. In doing so, I definitely piggybacked off of FIP and a little of what SIERA had already done. You can observe a lot just by watching, you know. I’m also a believer in “less is more” (except for when it comes to the size of my articles, obviously), so I tried to go for the best compromise of simplicity and accuracy that I could.

Read the rest of this entry »


Is Rebuilding Worth It?

Every year, the least competitive MLB teams decide whether to commit to “going for it” the next season or to take a step back, wait for some of their cost-controlled young players to develop into big league contributors, and then invest money in the team a year or two down the road. If the situation is dire, the media and baseball executives alike will start kicking the tires on an all-out rebuild of the organization. In this case, teams trade away every expensive, though often productive, veteran for young prospects who can hopefully help form a more competitive and sustainable team in a few years, in part due to a higher production-to-salary ratio. The judgment is that investing money in the major league portion of the organization will not yield worthwhile results in the upcoming seasons, a choice that leads to declining attendance and television ratings. The reasoning goes that the money would be better spent on the draft and on developing the players acquired through trades of the more expensive players on the team. These plans, often publicly announced, usually have estimated times to completion of three to five years, often matching, within a year or two, the length of a new baseball executive’s contract. I set out to measure the results of this strategy as it applies to total revenue, as well as how it works out in terms of return on investment.

Read the rest of this entry »


BABIP and Innings Pitched (Plus, Explaining Popups)

In my last post on explaining pitchers’ BABIPs by way of their batted ball rates, I was very careful to say that it was applicable in the long run, as it’s hard to be accurate over a small number of innings pitched, due to all the “noise” in BABIP (Batting Average on Balls In Play). For that reason, I only used pitchers with a qualifying number of innings pitched (IP) in the calculations. After writing the post, I did some messing around with the data, to find out just how much of an effect IP had on the predictability of BABIP.

Hold on to your propeller beanies, fellow stat geeks: the correlation between xBABIP and BABIP went from 0.805 when the minimum IP was set to 1500, to 0.632 at a 200 IP minimum, down to 0.518 at 50 IP.  OK, maybe it’s not that surprising.  Still, I thought I’d better show you how confident you can be in my xBABIP formula’s accuracy when you take the pitcher’s innings pitched into account.
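One way to put those correlations in perspective is to square them, which, for a simple one-variable linear fit, gives the share of the variation in BABIP that xBABIP accounts for. The snippet below is only a sketch of that arithmetic using the three correlations quoted above; the function name is made up for illustration, and it assumes a plain linear relationship.

```python
# Correlations between xBABIP and BABIP quoted above, keyed by minimum IP.
correlations = {1500: 0.805, 200: 0.632, 50: 0.518}


def variance_explained(r):
    """Square a correlation coefficient (assumes a simple linear fit)."""
    return r * r


for min_ip, r in correlations.items():
    share = variance_explained(r)
    print(f"min IP {min_ip:>4}: r = {r:.3f}, r^2 = {share:.3f}")
```

Read this way, the drop from a 1500 IP minimum to a 50 IP minimum takes the share of variation explained from roughly 65% down to roughly 27%.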

The formula, again: xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235

And remember, that formula is primarily meant to be a backwards-looking estimator of “true,” defense-neutral BABIP.  My next article will (probably) discuss another formula I’ve come up with that’s more forward-looking.
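To make the formula concrete, here is a minimal sketch of it as a function. It assumes the rates are entered as fractions (0.19 rather than 19) and that IFFB% is infield flies per fly ball, so that FB%*IFFB% works out to popups per batted ball. The function name and the sample pitcher's rates are made up for illustration.

```python
def xbabip(ld_rate, fb_rate, iffb_rate):
    """xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235, with rates as fractions.

    Assumption: iffb_rate is infield flies per fly ball, so
    fb_rate * iffb_rate is the popup rate per batted ball.
    """
    return 0.4 * ld_rate - 0.6 * fb_rate * iffb_rate + 0.235


# Hypothetical pitcher: 19% line drives, 36% fly balls, 10% of flies are popups.
example = xbabip(ld_rate=0.19, fb_rate=0.36, iffb_rate=0.10)
print(f"xBABIP = {example:.3f}")
```

For those made-up rates, the line drive term adds 0.076 and the popup term subtracts 0.0216, giving an xBABIP of about .289.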

Read the rest of this entry »


Projecting BABIP Using Batted Ball Data

Hi everybody, this is my first post here. Today, I’ll be sharing some of my BABIP research with you. There will probably be several more in the near future.

Now, I don’t know about you, but Voros McCracken’s famous thesis stating that pitchers have practically no control over their batting average on balls in play (BABIP) always seemed counterintuitive to me, ever since I heard it about 10 years ago. Basically, my thought this whole time was that if an Average Joe were pitching to an MLB lineup, the hitters would rarely be fooled by the pitches, and would be crushing most of them, making it very tough on the fielders. Think Home Run Derby (only with a lot more walks). Now, the worst MLB pitcher is a lot closer in ability to the best pitcher than he is to an Average Joe, but there still must be a spectrum amongst MLB pitchers relating to their BABIP, I figured. After crunching some numbers, I have to say that intuition hasn’t completely failed me.

This is going to be a long article, so if you want the main point right here, right now, it’s this: in the long run, about 40% or more of the difference in pitchers’ BABIPs can be explained by two factors that are independent of their team’s defense: how often batters hit infield fly balls and line drives off of them. It is more difficult to predict on a yearly basis, where I can only say that those factors can predict over 22% of the difference. Line drive rates are fairly inconsistent, but pop fly rates are among the more predictable pitching stats (about as much as K/BB). I’ll explain the formula at the very end of the article.

Read the rest of this entry »


Part II: Curveball Velocity, Location, or Movement: What is more important?

Stated in the simplest terms possible, the goal of pitching is to get batters out without allowing runs to score. There are three ways any given pitch can get a batter out. A pitch can be swung at and missed, taken for a called strike, or batted in such a way that the batted ball does not result in the batter reaching base. Batted balls involve the defence and are therefore less directly related to the pitch’s effectiveness at getting outs. That leaves us with swinging strikes and called strikes as the two best ways to measure a pitch’s effectiveness.

In Part I of my research on curveballs, I looked at what makes a curveball effective from a swinging strike perspective. I used an outcome variable that I like to call ratio of effectiveness. Ratio of effectiveness is simply the ratio of swinging strikes to home runs hit. In Part II of my research, I will look at the effectiveness of curveballs from a called strike perspective. This work will aim to answer two basic questions: 1) are curveballs taken for strikes more often than fastballs? And 2) what are the characteristics of curveballs most often taken for strikes?
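For anyone who wants to see the ratio of effectiveness spelled out, here is a rough sketch. It assumes the ratio is swinging strikes divided by home runs over the same group of pitches; the function name, the sample counts, and the choice to return None when there are no home runs are all illustrative assumptions rather than details from the research.

```python
def ratio_of_effectiveness(swinging_strikes, home_runs):
    """Swinging strikes per home run for a group of pitches.

    Assumption: returns None when no home runs were hit, since the
    ratio is undefined in that case.
    """
    if home_runs == 0:
        return None
    return swinging_strikes / home_runs


# Made-up counts: 120 swinging strikes and 4 home runs on a set of curveballs.
print(ratio_of_effectiveness(120, 4))
```

A higher value means more swinging strikes for every home run allowed.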

Are curveballs taken for strikes more often than fastballs?

Read the rest of this entry »


Infield Fly Proposal

117 years ago, in response to an epidemic of infielders intentionally dropping popups to attempt double plays instead, the National League adopted the infield fly rule, and with some minor adjustments, the rule has survived to the present. Like many remedies from the 1800s, the intent (protecting the offense from chicanery) was good, but the implementation (calling their batter automatically out) was fraught with problems.

First, and most obviously in light of recent events, even when the defense can’t make the play, the rule intended to protect the offense punishes them by giving the defense the out anyway. Second, any time a fly ball can be intentionally dropped for a good shot at a double play, the offense should be protected from that, but because the play requires calling the batter automatically out, the rule as written can’t be invoked liberally. Third, and related to the second, the umpires have to make a judgment call based on the trajectory of the ball, the position of the fielder, environmental factors, and anything else they consider relevant to determining “ordinary effort”. That leads to late calls and inconsistent application.

Read the rest of this entry »


Part I: Curveball Velocity, Location, or Movement: What is more important?

The curveball is often used as an ‘out’ pitch. This implies either it is difficult to hit or is often taken for a called strike. I was interested in exploring both of those possibilities, and as such, I have decided to present research addressing both. Part I, presented below, addresses the questions of how difficult the curveball is to hit and what makes it difficult to hit.

Earlier this week, I shared some research about the relative importance of velocity, location, and movement with respect to major league fastballs. The approaches I used to answer the curveball problem were very similar to the approaches I described previously. Again, I used the 2011 MLB season as my dataset, and included only pitches to right-handed batters. Since curveballs are thrown far less frequently than fastballs, this time I included both right- and left-handed pitchers to increase my sample size. Another reason I wanted to include lefties is that I wanted to know if the direction of the horizontal break mattered.

Is a curveball more difficult to hit than a fastball?

Read the rest of this entry »


What is More Important for a Fastball: Velocity, Location, or Movement?

Velocity, location, and movement are all unquestionably important when we try to compare ‘good’ pitches to ‘bad’ pitches. My particular interest lies in how important each is. I’ve often wondered: can a 98 mph cutting fastball be thrown right down the middle and still have little chance of being hit for a home run? Or conversely, is an 88 mph fastball that’s straight as an arrow still likely to get a swinging strike if it paints the bottom outside corner of the zone?

Read the rest of this entry »


Comparing 2011 Pitcher Forecasts

This article is the second of a two-part series evaluating 2011 baseball player forecasts. The first looked at hitters and found that forecast averages outperform any particular forecasting system. For pitchers, it appears as though the results are somewhat reversed. Structural forecasts that are computed using “deep” statistics (k/9, hr/fb%, etc.) seem to have done particularly well.

As with the other article, I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast and it can mask an otherwise good forecasting approach. For example, Fangraphs Fan hitter projections are often quite biased, but are very good at predicting numbers when this bias is removed.
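To show what “with and without bias” can look like in practice, here is a minimal sketch. It assumes bias means the average forecast error and that removing it means subtracting that average from every forecast before computing RMSE; the exact procedure used in the article may differ. The function names and ERA numbers are made up for illustration.

```python
import math


def rmse(forecasts, actuals):
    """Root mean squared error between paired forecasts and actuals."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bias(forecasts, actuals):
    """Average forecast error (negative means forecasts run too low)."""
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)


def rmse_without_bias(forecasts, actuals):
    """RMSE after shifting every forecast by the average error (assumed method)."""
    b = bias(forecasts, actuals)
    return rmse([f - b for f in forecasts], actuals)


# Made-up ERA forecasts that are all 0.30 too optimistic.
forecasts = [3.2, 3.9, 4.4, 3.5]
actuals = [3.5, 4.2, 4.7, 3.8]

print(f"RMSE with bias:    {rmse(forecasts, actuals):.2f}")
print(f"Bias:              {bias(forecasts, actuals):.2f}")
print(f"RMSE without bias: {rmse_without_bias(forecasts, actuals):.2f}")
```

In this toy case the forecasts get every pitcher's relative standing exactly right but miss by the same amount each time, so the raw RMSE looks poor while the bias-removed RMSE is essentially zero.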

Read the rest of this entry »


Comparing 2011 Hitter Forecasts

This article is an update to the article I wrote last year on Fangraphs.

This year, I’m going to look at the forecasting performance of 12 different baseball player forecasting systems. I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast and it can mask an otherwise good forecasting approach. For example, Fangraphs Fan projections are often quite biased, but are very good at predicting numbers when this bias is removed.

Read the rest of this entry »