Archive for December, 2014

Prepping You for an Ottoneu Initial Auction

If you are a fantasy baseball fan and read FanGraphs, you are probably contemplating joining an Ottoneu league, if you have not already. Three years ago, I was in the same boat. I had experience with snake drafts and was looking for something different. Ottoneu provides that: it is a dynasty format with minor league players, and owners must win each player through an auction. I took the plunge three years ago and was not prepared for the initial auction, so I wanted to share my experiences with some advice sprinkled in.

For the initial auction, make sure you can devote the ENTIRE DAY and probably some of the night. With 12 people and 40 players to roster, that is 480 auctions. Not to mention the inevitable mistakes (wrong Raul Mondesi) and an owner’s internet problems (it will happen to at least one person) and someone showing up late. Oh and those sweet, sweet bathroom breaks. This was my first auction, and if you have never done a real one before, you are in for a treat.

In a snake draft, if you pick ninth, there is only a slight chance you will get the stud you want. In an auction, you WILL get him. Just be prepared. If you pick Trout, for example, be prepared to be in a bidding war with several other owners. Popularity comes into play here. There are some first-rounders who just aren’t as popular because they play for a team that everyone in your league hates, or they play on the west coast and your league is filled with east coast snobs. Once you have a player identified, keep bidding until you win him. You are thinking, “But what about the budget? It’s only $400 and there are 40 positions on my roster!” Do not worry about that now. Now is your chance to get the player YOU WANT. Not the player who is the best available based on your random draft position.

So now, you got the player you wanted. The power is coursing through your veins. You feel like you have already won. Snap out of it! That is only one spot filled. You have to start 22 players in total and bench another 18! That stud represents less than 5% of your total starting lineup. A thought will cross your mind that you just spent a big portion of your payroll on less than 5% of your team! What have you done?! If he gets hurt, you are screwed, right? Wrong. Do not freak out and sit out bidding for the next several picks to let everyone else’s remaining budgets catch up. Why? Because you will be surprised at the caliber of players won for a buck or two during the final rounds, after everyone has spent most of their money.

It is within the next few rounds where you can still get top-notch talent, but for a lot less than what the first-round players went for. Popularity comes into play again here. Ottoneu owners favor young players because they can be kept year-to-year. Use this to your advantage. It is in these rounds where you can get older studs at a discount. In addition, other owners’ pocketbooks are still stinging from their first-round purchases. You can get some very good, post-peak players for around half of what players went for in the first round. Three years ago, I got Beltre and Konerko for less than $20 each. Within these next few rounds, still go after players you want, knowing that the final price will not hit nearly the same astronomical levels as the first round.

After 3 or 4 rounds, you have a nice nucleus of top-notch talent on your squad. This is when you can start looking at the tiers of players within your positional rankings. Also, start paying attention to how many owners are actively bidding up a player you want. You will see many owners bidding small amounts, looking for a bargain. Ignore them. Pay attention to how many owners are bidding as if they really want the player. If several owners are actively bidding, bow out, and go after the next player within the same tier. That next player should have one less owner bidding on him and he should come cheaper.

During the later rounds, start focusing on those high-upside starters, sleeper types. Do not bid on players who might fall to the waiver wire. If your sleepers do not pan out, the waiver wire will have plenty of serviceable players. Furthermore, it is better to see what a player is doing in April before bidding on him. Again, you will be amazed at the players won for a few bucks during the last rounds. This is where you will start prospecting. Remember, they are prospects! Do not spend a ton on prospects. The flameout rate is just too high. Another losing endeavor is taking the heir apparent to an older star you got. I took Mike Olt thinking that once Beltre hangs them up, Olt will take over and not miss a beat. Minor league prospects change hands more than a $20 bill at a white elephant gift exchange. This, combined with their high flameout rate, makes it a doomed strategy. Focus on guys with an ETA of current year + 1 and don’t spend more than a few dollars on each one.

Finally, keep some powder dry for free agents. Every year, players come out of nowhere and surprise everybody. There are fewer owners actively bidding during the regular season than during the initial auction, so the player should cost less. Throughout the season, you can monitor players off to fast starts and check FanGraphs to know when those small sample sizes can start to be taken seriously.


Jon Niese Is Changing It Up

Mets southpaw Jon Niese has something interesting going on and if the trend continues, he might not be so average in 2015.

One thing I enjoy doing is comparing 1st and 2nd half splits to spot anomalies and possible mid-season skill growth of players.  Niese’s 2014 splits stood out remarkably on two of my favorite metrics:  First pitch strike % (F-Strike) and Swinging Strike % (SwStr).

Niese has a career 61.4% F-Strike and 7.8% SwStr. In last year’s first half he was struggling along with a 59.2% F-Strike and 5.8% SwStr. Then something changed. In the 2nd half his F-Strike soared to an elite-level 67.8%. SwStr rebounded to 8.6%. What happened?

A solid changeup happened.



Alex Wood Poised to Be NL Version of Chris Sale

As the Braves enter the 2015 season, the roster is expected to be extremely pitching-heavy. In John Hart’s first offseason as the Braves’ President of Baseball Operations, he has made two huge moves by sending outfielders Justin Upton and Jason Heyward to the Padres and the Cardinals, respectively. While the return in the Upton trade was primarily made up of lower level prospects at least a few years away from helping the big club, the Braves did land RHP Shelby Miller in the Heyward/Cardinals trade. Although the Braves lost two key pieces to the offense, the pitching staff could be very strong, especially if Miller can bounce back to 2013 form (3.06 ERA, 3.67 FIP). Miller will join a rotation that returns three starters in Julio Teheran, Mike Minor, and Alex Wood. 

Minor enters the 2015 season in a very similar situation to Miller’s. After a breakout year in 2013 (3.21 ERA, 3.5 WAR), Minor was worth just 0.2 wins in 25 starts in 2014. While Teheran is considered to be the budding star ($32M contract extension and a 3.2 WAR for his age 23 season), Wood could be the pitcher that steals the show.

Following a three-year stint at the University of Georgia (one year of which was shortened by Tommy John surgery), the Braves drafted Wood in the 2nd round of the 2012 draft. Wood would go on to make his MLB debut less than one year later, after starting just 23 minor league games. Wood would appear in 16 games as a reliever, before transitioning into a starting role in the second half of the 2013 season. Wood’s final numbers for his age-22 season looked very similar to those of another funky-delivering 22-year-old southpaw, Chris Sale:

Pitcher    Inn      K/9      BB/9     GB%      FIP
Sale       71       10.01    3.42     49.7%    3.12
Wood       77.2     8.92     3.13     49.1%    2.65

Unlike Wood, Sale spent the entire 2011 season working out of the bullpen, and had even pitched in 23 innings in the 2010 season, joining Chicago’s bullpen less than two months after signing with the White Sox. Following Sale’s strong 2011 season, the White Sox decided it was time to move Sale to the rotation. The Braves had similar plans for Wood in 2014, although the club planned to limit his innings in his first full year with Atlanta, and Wood would only make 24 starts for the season, along with 11 more relief appearances. For the age-23 seasons, the numbers look very similar, yet again:

Pitcher    Inn      K/9      BB/9     GB%      FIP
Sale       192      9.0      2.39     44.9%    3.27
Wood       171.2    8.91     2.36     45.9%    3.25

Another key factor for both pitchers is the durability later in the season. While many young pitchers wear down later in the year, both pitchers got stronger as the year went on. Sale’s strikeout numbers increased (9.41 to 10.64), and his walks and FIP saw a significant decrease as well. The same would hold true for Wood, as his K/9 rate went from a first half 8.61 to a second half 9.21, with a decrease in walks and a 3.05 second half FIP. The midseason move to the pen (to help limit innings) could have given Wood a fresher arm for his full-time return to the rotation in the second half, but whatever the reason may be, he was one of the top NL starters down the stretch.

As for Sale, his improving performance went well beyond the second half of the 2012 season, as the lefty would go on to post back-to-back 5+ WAR seasons over the next two years, as well as finishing in the top 5 for the AL Cy Young Award in both 2013 and 2014. The key for Sale was his ability to continue to improve his strikeout numbers, while also cutting down on his walks. Sale’s 1.97 BB/9 over the last two years has been good for 7th among all American League starters. During that same time frame, Sale’s 10.06 K/9 rate trailed only Yu Darvish and Max Scherzer. Only 14 pitchers were able to top Wood’s 3.25 FIP as well as his 3.78 K/BB in 2014. If he can push the FIP closer to 3.00, as well as the K/BB over 4.00, we could be looking at another Chris Sale.
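For readers who want to check the K/BB figures, here is a minimal Python sketch. It assumes K/BB can be taken as K/9 divided by BB/9 (the innings cancel), uses the age-23 rates from the table above, and the function name is just illustrative.

```python
def k_per_bb(k9, bb9):
    """Strikeout-to-walk ratio from per-nine-inning rates (the innings cancel)."""
    return k9 / bb9

# Age-23 rates from the table above.
wood_2014 = round(k_per_bb(8.91, 2.36), 2)
sale_2012 = round(k_per_bb(9.0, 2.39), 2)

# K/9 Wood would need for a 4.00 K/BB if his walk rate held at 2.36 (an assumption).
needed_k9 = round(4.00 * 2.36, 2)

print(wood_2014, sale_2012, needed_k9)
```

Under that assumption, Wood's rates reproduce the 3.78 K/BB cited above, and holding his walk rate steady he would need roughly half a strikeout more per nine innings to clear 4.00.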


The Ghosts of Designated Hitters Past and Designated Hitters Yet to Come

Among the flurry of deals announced over the past month and a half, a couple raised eyebrows:

(That’s how the transactions were listed on mlb.com. I have no idea why Billy Butler, who started 108 games at DH in 2014 and 35 at first base, is listed as a DH, but Kendrys Morales, who started 71 games at DH and 26 at first, is listed as a first baseman.)

The logic behind these signings made sense superficially: The A’s signed Butler, who was the Royals’ DH in 2014, because Oakland DHs hit a middle-infielder-esque .215/.294/.343 last year. The Royals signed Morales to take Butler’s place. What was a little more surprising was the money: three years for $30 million for Butler, two years for $15.5 million plus an $11 million mutual option/$1.5 million buyout in 2017 for Morales.

The reason that’s surprising is that both were below-average hitters in 2014. Butler had a wRC+ of 79 as a DH, while Morales’s was 62. Among the eleven players with at least 200 plate appearances at DH last season, Morales’s wRC+ ranked eleventh and Butler’s ninth.

That’s the thing about designated hitters: They play the ultimate You had one job… position. All they’re supposed to do is hit. There will never be a Derek-versus-Ozzie, bat-versus-glove debate about DHs. They’re all bat. And while wRC+ doesn’t encompass baserunning contributions, DHs are generally plodders like Butler, who went from first to third on a single only once in 31 opportunities last year, so that doesn’t differentiate them. (There have been only 12 seasons in which a player’s gotten more than 15 stolen bases as a DH, and five of those were by one guy, Paul Molitor.)

(Another DH fun fact: The American League adopted the designated hitter after the 1972 season, when the league batted .239/.306/.343, equating to a .297 wOBA. The worst season since then? 2014: .253/.316/.390, .312 wOBA.)

Designated hitters are paid to hit, not to field and run. So why do DHs who are below-average hitters stay on rosters, much less sign multi-year free agent contracts? Before I try to answer that, here’s another tidbit about designated hitters. This is a list of the number of American League players, by position, who qualified for the batting championship last year (i.e., 502 plate appearances):

  • Catchers: 1
  • First basemen: 4
  • Second basemen: 8
  • Shortstops: 9
  • Third basemen: 7
  • Left fielders: 5
  • Center fielders: 7
  • Right fielders: 5
  • Designated hitters: 1

Salvador Perez was the only player to amass 502 plate appearances as a catcher. That’s understandable, given the demands of the position. But why was David Ortiz the only player to get 502 plate appearances as a DH? It’s clearly not the physical strain of being a designated hitter. So let’s lower the bar a bit and count the number of players, by position, to get 400 plate appearances–regulars, if not batting title qualifiers:

  • Catchers: 9
  • First basemen: 10
  • Second basemen: 11
  • Shortstops: 11
  • Third basemen: 11
  • Left fielders: 7
  • Center fielders: 10
  • Right fielders: 7
  • Designated hitters: 4

Yikes. That makes it look even worse. There were fewer regular designated hitters than there were regulars at any other position. Has this always been the case: unremarkable hitters who aren’t even regulars?

To answer this question, the chart below shows, for every season since the advent of the DH in 1973, the percentage of teams with a DH with 502 plate appearances, the percentage with a DH with 400 plate appearances, and the aggregate OPS+ for all DHs. (I chose the percentage of teams, rather than the number of DHs, to account for the increase in American League teams from 12 in 1973 to 14 in 1976 and 15 in 2013. And I used OPS+ because that’s the only relative metric I could find with splits data going all the way back to 1973.)

*Strike-shortened year; playing time data prorated.
Source: baseball-reference.com, using the Play Index Split Finder.

As you can see, there were roughly three eras for DHs:

  • 1973-1993: Teams trying to figure out how to optimize the position, with playing time and performance fluctuating, including a nadir of a 96 OPS+ in 1985
  • 1994-2007: Slightly fewer regular DHs but the position generating the most offense in its 42-year history
  • 2008-2014: Reduced offense and sharply fewer full-time DHs

Let’s examine those three eras in detail. When the DH was first implemented, American League teams relied heavily on aging sluggers. From 1973 to 1976, the DHs with the most plate appearances were Tommy Davis (in his age 34-37 seasons), Tony Oliva (34-37), Frank Robinson (37-40), Deron Johnson (34-37), Willie Horton (30-33), and Rico Carty (33-36).

This began to shift with Hal McRae, whom the Royals acquired via a trade with the Reds at the end of the 1972 season, when he was 27. He started 134 games at DH in 1973-75, and was a full-time DH for the remainder of his career. He received MVP votes in four seasons. In 1978, 29-year-old Angel Don Baylor finished seventh in the MVP vote, primarily as a DH (102 starts at DH, 56 in the field) and he won the MVP the next year starting 97 games in the outfield and 65 at DH. There were still plenty of old DHs by the late 1970s — Horton was 36 in 1979 when he became one of only two DHs in history to play 162 games at the position — but it wasn’t the exclusive province of old guys.

Still, there were variations in play. The year 1980 is the only non-strike-shortened season in which no DH qualified for the batting title, and that was largely because of a changing of the guard: Carty retired, Lee May and Mitchell Page neared the end of their careers, Horton played his last season, and Rusty Staub inexplicably got 40% of his plate appearances as a 1B/OF.

As you can see by the chart, the performance of DHs took off after 1993, a year during which four former MVPs and/or future Hall of Famers provided 400+ plate appearances of subpar performance as a DH: George Brett (.265/.311/.431, 95 OPS+), Andre Dawson (.266/.308/.432, 94 OPS+), Dave Winfield (.258/.313/.406, 90 OPS+), and George Bell (.217/.243/.363, 59 OPS+). Brett, 40, and Bell, 33, were in their last seasons, while Winfield, 41, and Dawson, 38, were in their last years as regulars.

That led to another changing of the guard in 1994, and fourteen straight years in which DH OPS+ was 105 or higher, accounting for half of the 28 such seasons in the DH’s 42-year history. Of the nine seasons during which DHs had a combined OPS+ greater than 110, six occurred during 1994-2007. This was the heyday of Edgar Martinez, the greatest DH, of course, but Ortiz (4), Chili Davis (3), Travis Hafner (3), Frank Thomas (3), Ellis Burks (2),  Jose Canseco (2), Juan Gonzalez (2), and Jim Thome (2) all had multiple seasons with an OPS+ of 125 or more as a regular DH during those years.

(I know what you’re thinking: Hmm, 1994 to 2007: PEDs. Yes, but the statistics I’ve used throughout this analysis — wRC+ and OPS+ — are relative figures. The league average, every year, is 100. When DHs compiled a 114 OPS+ in 1998-99, it meant they were 14% better than the inflated averages of the time, a level never attained before or since. So unless there’s some reason DHs were more chemically enhanced, or benefited more from such enhancement, than other players, it’s not a PED thing. And no, it’s not because of the alleged career-prolonging properties of PEDs, as only five of the 21 seasons of OPS+ over 124 cited above were amassed by players older than 35. Nine of the player-seasons were compiled by hitters in their 20s, and five occurred in 2006 and 2007, after the implementation of MLB’s drug policy.)

That brings us to 2008-2014. The average OPS+, which was 104 in the 1973-93 period and 109 during 1994-2007, has receded to 106 over the past seven seasons. More strikingly, while the percentage of teams employing a full-time DH (400+ plate appearances) has declined steadily, from 43% in the first 21 years to 38% in the next 14 to 36% in the past seven, the percentage qualifying for the batting title has nosedived, from 27% over the first 35 years of the DH to 16% in the seven years since. In 2008, Thome was the only player who qualified for the batting title as a DH, as was Butler in 2012 and Ortiz in 2014. Prior to those seasons, the only times that happened were in the nascent DH seasons of 1980 (noted above) and 1976 (Carty).

So what’s happening now, and how might it inform the Butler and Morales contracts? I think that the decline in DH performance relative to the league and the decline of full-time DHs are related, because they both stem from the construction of pitching staffs in general, and the modern bullpen in particular. In 1973, the first year of the DH, teams commonly carried 10-11 pitchers on their 25-man rosters. Now they usually have 12, sometimes as many as 13. That leaves less room for a full-time player who can’t play in the field and more need for positional flexibility. As Dave Cameron wrote nearly five years ago:

Teams are choosing to increase their flexibility, even if it comes at the expense of some production. Increasingly, teams want the option to use the DH spot as a pseudo off day for their regulars, or as a fall back plan if their banged-up position player is unable to acceptably field his position. With the move towards 12 man pitching staffs, limited bench sizes put a premium on roster flexibility, and teams are reacting by devaluing players who can’t play the field.

In 2014, eight players played at least 15 games at DH (the extreme right side of the defensive spectrum) and 15 games at catcher, second base, or shortstop (the extreme left side). Breaking that combination down by the eras I defined above, it works out to:

  • 1973-1993: 91 occurrences, or 4.3 players per season
  • 1994-2007: 50 occurrences, or 3.6 players per season
  • 2008-2014: 42 occurrences, or 6.0 players per season
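The per-season rates in that list are just the occurrence counts divided by the number of seasons in each era. A minimal Python sketch of the arithmetic, using only the counts listed above (the names are illustrative):

```python
# Occurrence counts from the list above: players with 15+ games at DH
# and 15+ games at catcher, second base, or shortstop.
eras = {
    "1973-1993": (1973, 1993, 91),
    "1994-2007": (1994, 2007, 50),
    "2008-2014": (2008, 2014, 42),
}

def players_per_season(first_year, last_year, occurrences):
    """Average number of such players per season, counting both end years."""
    seasons = last_year - first_year + 1
    return occurrences / seasons

rates = {era: round(players_per_season(*vals), 1) for era, vals in eras.items()}
for era, rate in rates.items():
    print(f"{era}: {rate} players per season")
```

Counting both end years gives 21, 14, and 7 seasons, which is what makes the most recent era stand out despite having the fewest total occurrences.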

Positional flexibility allows teams to get maximum utility from scarce roster spots, but it doesn’t boost batting by DHs. The eight players in 2014 who played at least 15 games at second, short, or catcher as well as DH were J.P. Arencibia (64 wRC+), Alberto Callaspo (68 wRC+), Logan Forsythe (80 wRC+), John Jaso (121 wRC+), Derek Jeter (73 wRC+), Josmil Pinto (101 wRC+), Dioner Navarro (98 wRC+), and Sean Rodriguez (99 wRC+): One good hitter, three average hitters, and four lousy ones. That doesn’t help the aggregate numbers for designated hitters. Add to that the “DH Penalty,” i.e. the observation that hitters tend to perform worse at DH than when playing in the field — which Mitchel Lichtman calculates in this article to be about 14 points in wOBA — and we can expect increased positional flexibility to erode the offensive contributions of designated hitters.* Jaso, an extreme example, hit .298/.362/.488 in 50 games as a catcher but only .208/.293/.296 in 35 games as a DH.

The DH will remain an offensive position, obviously. And there are obvious risks in drawing conclusions based on just the past seven seasons of data, which admittedly include three above-average years for DHs in aggregate. But given modern roster construction, it’s hard to see DHs consistently generating an outsized contribution to offense as they did in years past. That doesn’t make the below-average performance of Butler and Morales tolerable, but it does make it less of an outlier than it would’ve been previously.

 

*Lichtman’s data indicate that position players who sometimes were DHs didn’t suffer a greater DH penalty than players like Ortiz or Butler, who rarely play in the field. But as he stated in the above-cited article,

I expected that the penalty would be greater for position players who occasionally DH’d rather than DH’s who occasionally played in the field. That turned out not to be the case, but given the relatively small sample sizes, the true values could very well be different.


2015 Fantasy: More Starting Pitching Busts

Starting pitching is half of the fantasy baseball equation and when you take them in the early rounds you cannot afford to strike out.  Here are three starting pitchers you should be letting others draft along with seven other names you should consider as alternatives.



The Cubs, the Red Sox and a Blank Check

The Cubs and Red Sox are doing interesting things for their ambitions. Boston overhauled their young and unpredictable club with two of the top free agents on the board while the Cubs’ rotation makeover coincides with a slew of young offensive talent already in place. Neither team is yet finished as the outcomes of Scherzer, Shields and whatever we’re calling San Diego still loom toward the New Year. Regardless, their intent for 2015 and beyond is the same: win.

But these two teams are interesting for another reason. It’s not often that clubs expected to contend in one year also happen to have top-10 selections in that year’s amateur draft, but that’s exactly what will happen this coming June. The former club of Epstein and Hoyer will select 7th overall. Their current club will select 9th, and if all goes according to plan, each club’s 2016 selections figure to fall well out of protection.

But rising from this is a fascinating opportunity. It’s a very rare opportunity requiring the unique but exact convergence of factors surrounding these two teams, swinging the cost/benefit ratio to an extreme. The Red Sox intend to be at the top of the standings this season and based on what they’ve done, chances are good that they will. The Cubs are not far behind and as opined by Dave Cameron may be just a leap or two from the same goal. Whatever happens, each of the next two seasons project to be followed by two of the strongest free agent classes in history. 2015 should include Justin Upton, Jordan Zimmermann, Jason Heyward and more. The 2016 elite is likely to be headlined by Stephen Strasburg. There is going to be a hefty number of qualifying offers and little reason to care.

Focusing upon the next two years is important. The current collective bargaining agreement is in effect until December 1st, 2016, specifying the rules that govern draft bonus allotments and the penalties for their violation. Summarized below:

  • 0-5% overage: 75% tax on the overage
  • 5-10%: 75% tax on the overage and loss of 1st round pick in subsequent draft
  • 10-15%: 100% tax on the overage and loss of 1st and 2nd round picks in subsequent draft
  • 15% or higher: 100% tax on the overage and loss of 1st round picks in subsequent two drafts

Note: If a team lacks the selection subject to penalty due to a prior penalty levied from draft overages, the team will be penalized in the next draft in which said selection is conveyed.
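The tiers summarized above can be sketched as a small Python function. This is only an illustration of the summary, not the CBA's actual language: the function name is hypothetical, and how an overage of exactly 5% or 10% is treated is an assumption, since the summary does not spell it out.

```python
def draft_overage_penalty(pool, spent):
    """Return (tax_owed, picks_lost) for bonus spending against an allotted pool.

    Tiers follow the summary above. Treatment of the exact 5% and 10%
    boundaries is an assumption made for this sketch.
    """
    overage = max(0.0, spent - pool)
    pct = overage / pool * 100
    if pct == 0:
        return 0.0, "none"
    if pct <= 5:
        return 0.75 * overage, "none"
    if pct <= 10:
        return 0.75 * overage, "1st-round pick in next draft"
    if pct < 15:
        return 1.00 * overage, "1st- and 2nd-round picks in next draft"
    return 1.00 * overage, "1st-round picks in next two drafts"

# Hypothetical example: a $6.0M pool and $9.0M in bonuses (a 50% overage).
tax, picks = draft_overage_penalty(6_000_000, 9_000_000)
print(tax, picks)
```

In that made-up example the club would owe a dollar of tax for every dollar of overage and lose its first-round picks in the next two drafts, which is the maximum penalty.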

To date, no team has spent beyond the 5% threshold and thus no team has been penalized a selection.

But then no team has been in this particular position before, a situation perhaps too unlikely to have been considered during CBA negotiations. Under the rules above, teams are pressured either to adhere to slot value or to strategize by shifting their allotments in favor of two or three top talents. The Cubs and Red Sox will have no such limitations this June. They can spend with complete and utter impunity.

Part of how this is possible is due to the language of the CBA and the impending sequence of events. A theoretical chronology:

1)     The 2015 draft begins

2)     Boston or Chicago picks who it wants and spends as much as it wants

3)     Boston or Chicago receives the maximum penalty, including tax and forfeiture of 2016 and 2017 1st-round selections

4)     2015 free agent class – Either team signing a QO free agent forfeits its next highest 2016 selection (2nd round)

5)     2016 free agent class – Either team signing a QO free agent forfeits its next highest 2017 selection (2nd round)

6)     Draft Penalty completed

Why would they do this? Because of who they are and because there is every incentive for doing so. Consider the maximum penalty: forfeiture of 2016 and 2017 first-round picks. For these two clubs alone, it’s entirely probable that none of these picks project to exist. Notice I said project, which is a critical distinction, because at the time of the 2015 draft their 2016-17 selections are officially still in place.

The implication: what if they can be sacrificed? The Cubs and Red Sox each operate at the highest levels of revenue and, at their current win-curve trajectory, are virtually guaranteed to be major players on the free agent market in each of the next two years. As a demonstration, both have already signed major targets this off-season. It’s easy then to imagine either team having to relinquish their top selections anyway, except either club can decide that in June as opposed to November. Given enough information by then, they’d have to ask themselves: What do they have to lose?

Let’s assume each team performs toward the fringe of the playoffs and we assign them the 23rd selection in 2016 and 2017. In Boston’s case we could argue this will be even lower, or a few spots higher for the Cubs if you think they aren’t quite playoff ready, but as a middle ground the 23rd selection is a good place to start. Keep in mind that by June, enough games will have been played to know this with some certainty. Let’s also assume that the first-round selections in each year will be lost to FA compensation. This isn’t an exact process since picks will be added or removed to a varying degree, but using 2014’s values will give us a rough estimate going from one year to the next:

2014          7th      9th      23rd
Round         ($M)     ($M)     ($M)
1             3.30     3.08     1.95
2             1.19*    1.13     0.90
3             0.68     0.66     0.53
4             0.47     0.46     0.39
5             0.35     0.34     0.29
6             0.26     0.26     0.22
7             0.20     0.19     0.17
8             0.16     0.16     0.15
9             0.15     0.15     0.14
10            0.14     0.14     0.14
Total         5.71     6.57     2.93
% Decrease    -48.7    -55.5

*Boston forfeits 2nd round selection (Hanley Ramirez)
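The totals in the table can be reproduced with a short Python sketch. It assumes the 7th-slot total excludes Boston's forfeited 2nd-round pick and the 23rd-slot total excludes the 1st-round pick assumed lost to free agent compensation, which is how the printed totals work out. The names are illustrative.

```python
# 2014 slot values in $M for rounds 1-10, copied from the table above.
slots = {
    "7th":  [3.30, 1.19, 0.68, 0.47, 0.35, 0.26, 0.20, 0.16, 0.15, 0.14],
    "9th":  [3.08, 1.13, 0.66, 0.46, 0.34, 0.26, 0.19, 0.16, 0.15, 0.14],
    "23rd": [1.95, 0.90, 0.53, 0.39, 0.29, 0.22, 0.17, 0.15, 0.14, 0.14],
}
# Rounds (1-indexed) left out of each pool: Boston's forfeited 2nd-round
# pick at 7th, and the assumed loss of the 1st-round pick at 23rd.
forfeited = {"7th": {2}, "9th": set(), "23rd": {1}}

def pool_total(position):
    """Sum the slot values for the rounds a team still holds."""
    values = slots[position]
    return round(sum(v for rnd, v in enumerate(values, start=1)
                     if rnd not in forfeited[position]), 2)

def pct_change(new, old):
    """Percentage change from old to new, rounded to one decimal."""
    return round((new - old) / old * 100, 1)

totals = {pos: pool_total(pos) for pos in slots}
print(totals)
print(pct_change(totals["23rd"], totals["7th"]))
print(pct_change(totals["23rd"], totals["9th"]))
```

Working from the rounded values shown, the drop from the 9th slot comes out near -55.4 rather than the table's -55.5. The small difference presumably comes from unrounded slot values.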

In addition to forgoing the top 30 prospects, dwindling bonus pools severely damage teams’ ability to pay for any talent at all. By employing this strategy, Boston and Chicago can essentially punt drafts in which they might have expected to extract little value in the first place. In exchange, they get free rein to obtain as much talent as they wish in the coming draft – and the talent will be there. Even as restrictions pressure draftees to sign close to slot, talent still falls due to signability, particularly when coming from high school. In the scenario above, a team is looking at one seven-figure talent, maybe two if slots can be shifted. Compare that to what they might obtain with 40 limit-free selections.

Just as important, where these teams select has a significant influence on realistic return. Where top ten selections can result in impact talents, selecting early in each round is an opportunity to grab a falling talent well before other teams consider themselves able. Within current strategies, teams have to be cautious of the round in which they decide to risk on higher prospects as the ability to pay is tied directly to their selections in other rounds. But if money is no object, the Cubs and Red Sox can simply pick whomever they want whenever they want, in which case having the higher position becomes a huge advantage.

This isn’t foolproof. It would have to be a “calculated risk” decided upon almost the day-of. For one thing, it’s impossible to predict exactly what a free agent class will have to offer. If several projected free agents instead sign extensions, it becomes more difficult to justify devaluing your top selection. By June, teams should have a better sense of the picture ahead but it won’t be crystal clear. These teams will have to be reasonably confident not only that targets will exist, but that they’ll have a reasonable desire to sign them.

For another thing, at a certain point the cost in payable tax becomes a bit unwieldy. Perhaps the key then isn’t to sign as many top prospects as possible but rather enough to make up for impending losses in the two subsequent drafts. Because their pools are relatively large both teams will be partly insulated, but past that you’re paying double what you normally would per prospect. That requires confidence that the talent available is worth the additional costs, something not often expressed by teams prior to the current CBA.

That’s an argument to be made however, simply because the successful development of a few can exponentially result in surplus value. Should you prefer a direct measure of dollars, studies such as this one routinely demonstrate the windfalls in appropriately identifying and obtaining draft talent regardless of where they’re picked. In today’s league with today’s prices, that’s as tempting an idea as ever and if you wonder whether teams still place premiums on potential, look no further than the international market. Furthermore, the value of a prospect is predicated not on “Will he produce major-league value?” but rather “Can he?” The extractable value of potential in trades should be evident as I write this.

But the third and perhaps the most critical obstacle is the league itself and whether it takes the power to reject bonus agreements. This is suggested in the CBA document linked above, where the “uniform player contract” specifies required-approval by the Office of the Commissioner. Whether this suggests the Office would actually exercise its right of refusal isn’t clear. The only precedent as far as I know is MLB’s refusal to allow a $6M bonus to Matt Purke, a unique situation in which MLB had control of the Rangers’ finances. Particularly controversial deals have drawn little more than ire. Strict stipulation of penalties in the current CBA implies a team’s right to accept said penalties should it choose to do so. For the Commissioner to explicitly prevent a team from exercising this right is bordering on breach and may be actionable or subject to a grievance by the club or by the MLBPA. This is where the issue gets a little messy and comes down to debates beyond the scope of this article.

I won’t hesitate to call this what it is: a gambit. Strategies like this enliven the game and introduce an element of danger that can either pay off or egg face. Regardless, it highlights yet another flaw to the system in place. In one sense, this strategy is a novel way of playing within the rules, which is at the very heart of high competition. In another sense, it goes against sportsmanship in that this strategy is available only to teams who can realistically devalue their top selections, i.e. teams operating with enough capital to consistently invest at the top of the open market.

But rules are rules. The Cubs and Red Sox have the chance to align their playoff ambitions with a prospect bonanza not yet seen. They’ll have their pick among the elite and after that, should the dominoes fall along the way, they can – and should – take full advantage.

Jonathan Aicardi is a researcher with UCSF in the study of glioblastoma and the proprietor of Another Mariners Blog! Because apparently the world needed another one.


2015 Fantasy Bust: Johnny Cueto

I was planning on covering several overvalued starting pitchers in this next article but after analyzing Reds ace Johnny Cueto, I realized I might have enough material to fill an encyclopedia.


The Effectiveness of the Speed and Movement of a Four-Seam Fastball

Introduction

A few weeks ago I posted a proposal for a regression analysis for an econometrics class I am taking, with the promise that I would post the full analysis once it was complete. Well, it’s been completed, and here is the full analysis, as promised. It’s a lot of words, so if you don’t care much for how a Probit model works or how to perform a t-test, I will go ahead and tell you my findings now. I found that the speed of a four-seam fastball does help determine the outcome of the pitch: the faster the pitch, the lower the quality of contact. I also found that the movement of a four-seam fastball is statistically insignificant: a four-seam fastball can have zero movement and the outcome will be the same for that pitch. This could be because a four-seam fastball just doesn’t move that much relative to other pitches; I’m not sure, though. Also, the model I created has very low goodness-of-fit measures, which means the speed and movement of a four-seam fastball play only a small part in determining the outcome of the pitch. This makes sense: baseball is a complicated game and a lot of variables go into determining an outcome. Without adding even more words to this post, below is the paper in its entirety.

It could easily be said Major League Baseball is in an arms race. Teams have been putting a greater emphasis on finding and developing pitchers who can throw a baseball faster than their peers. Indeed, the average velocity of a fastball has increased every year from 2004 to 2013, with a slight downtick in 2014. From 1990 to 1999, 37 pitchers threw 25 percent or more of their fastballs at 95 MPH or faster; in 2013, 149 pitchers did so. From 2003 to 2008, seven pitchers threw a fastball 100 MPH or faster 20 or more times in a season; from 2009 to 2013, 38 pitchers did so. Teams are trying to find flame-throwers because they believe the faster a ball travels towards home plate, the harder it is for a hitter to make the type of contact resulting in a hit. On the other hand, other factors that are not emphasized, such as the amount of movement on a fastball, may play a role. When a pitcher throws a fastball, it moves. Just as some pitchers can throw a fastball with more velocity, some pitchers can throw a fastball with more movement than others. The relationship between velocity and contact should be the same for movement: the more movement there is, the harder it is to make good contact.

Due to this assumed relationship between velocity, movement, and outcome, I would like to answer the following questions: is it more difficult to hit a fast-moving four-seam fastball than one moving more slowly? Also, is it more difficult to hit a four-seam fastball the more movement it has? My hypothesis is therefore twofold: a faster pitch will be more difficult to hit than a slower pitch, and the more movement a pitch has, the harder it will be to hit. A pitch is difficult to hit if the batter swings and fails to make contact, which results in a strike, or if the contact made is poor and the ball put in play results in an out.

The body of this paper is organized into six categories: the economic model, the econometric model, the data, the procedures of estimation and inference, the empirical results, and the conclusion. The economic model section explains the composition of the independent variables, the dependent variables, and the error term. It also explains the assumptions as well as provides a general framework for the type of model required for the estimation. The econometric model lays out the functional form of the economic model by formalizing the variables and creating the equations; it also establishes a method to test the statistical significance of the independent variables. The data section explains how the data was gathered, any issues that had to be resolved, and any hesitations about the quality of the data. The procedures of estimation and inference section describes the tools, software, and the specific models chosen to derive the results, why they were chosen, and the characteristics of the model. The empirical results section reports the means of the independent variables, the discrete profile for the outcomes, the parameter estimates, interval estimates, the value of the test statistics, and the goodness of fit measures; it also puts the parameters into the equations. Finally, the conclusion section analyzes the implications from the empirical results and offers possible explanations for the results.

The Economic Model

Independent Variables

A pitcher can throw many types of pitches. The pitcher can try to deceive the batter by throwing a pitch that has a lot of movement, such as a curveball or slider, or a pitch that is slower than it looks like it will be when it leaves the pitcher’s hand, such as a change-up. But no pitcher tries to deceive a hitter when throwing the four-seam fastball. When a pitcher throws a four-seam fastball he is simply trying to throw it as hard and as accurately as he can. And this is what teams are searching for: the maximum velocity of a pitcher’s four-seam fastball, and the higher the velocity, the better. Even though a pitcher is not trying to induce movement when he throws a four-seam fastball, the ball still moves horizontally and vertically, which can affect the outcome of the pitch, just as velocity can. This means there will be two independent variables: velocity, measured in MPH, and total movement, which is horizontal plus vertical movement, measured in inches.

Dependent Variables

The dependent variables will be all of the possible per-pitch outcomes that involve the batter attempting to hit the pitch by swinging his bat; this excludes pitches an umpire calls a strike or a ball. These two outcomes are excluded because the batter did not swing his bat, which means the speed or movement of the pitch having any effect on avoiding contact, or inducing poor contact, cannot be discerned.

In addition, because the outcomes are per-pitch, walks and strikeouts are excluded because those outcomes are already accounted for. More specifically, if the batter walks, then he did not swing at the pitch, and it is therefore excluded. If the batter strikes out by swinging and missing, which is accounted for with the swinging-strike outcome, or by being called out by the umpire, then it is excluded because the batter did not swing his bat.

The included outcomes are: swinging strike, foul ball, ground-out, pop-out, fly-out, line-out, single, double, triple, and home run. The difference between a pop-out and a fly-out is who catches the ball: if an infielder catches a ball in the air then it is a pop-out, if an outfielder catches a ball in the air then it is a fly-out. Many types of outs have been included because each type of out can indicate what type of contact was made. For example, if the contact was poor, then the result will either be a ground-out or a pop-out. If the contact was solid, but the batter still made an out, then the result will be a line-out or a fly-out. If the contact did not result in an out, then it will be assumed the contact was good.

From a pitcher’s perspective, the outcomes are ranked as follows, from most to least desirable: swinging strike, pop-out, ground-out, fly-out, line-out, foul, single, double, triple, and home run. This ranking also reflects a continuous spectrum of contact from softest to hardest. An argument can be made that a swinging strike does not belong on the spectrum because no contact was made. But no contact is still a type of contact; it is the absence of contact, which is the lowest quality of contact and the lowest point on the contact spectrum.

Error Term

The error term will capture the sequencing of the previous pitches, the count, the base-out state, the location of the pitch, and the quality of the defense.

Each pitch will be context neutral; the pitches that preceded it will not be accounted for. This can affect the outcome of the pitch because the absolute speed of the pitch may not matter as much if the previous pitches that a batter has seen in an at bat have been much slower than the four-seam fastball.

The count of the at bat can affect the outcome of the pitch because batters know, in some counts, pitchers are more likely to throw a four-seam fastball. In this case, the batter may be anticipating the four-seam fastball, which will give the batter an advantage. The base-out state can affect the outcome of the pitch because it can dictate what pitch a pitcher is more likely to throw. The location can also affect the outcome of the pitch because some locations are more difficult for a batter to reach with his bat when he swings. Also, pitchers generally know there are certain locations where most hitters of a certain handedness have difficulty hitting a four-seam fastball if thrown in the particular location, and the location is less sensitive to speed and movement.

The quality of the defense can affect the outcome of the pitch as well because it can turn hits into outs, if the defense is good, or it can turn outs into hits, if the defense is poor. This can cause the ranking of outcomes to be less predictive of the type of contact made for each outcome. For example, a ground ball that gets past an infielder is a single. But the contact made was the type of contact consistent with the contact for a ground-out, not a single. Since a ground-out is ranked third and a single is ranked seventh, the difference in quality of contact between the two outcomes is substantial.

Estimation Methods

Since the dependent variable can take only one of ten possible values the relationship between the independent and dependent variable is not linear and the Ordinary Least Squares model would not be appropriate for our purposes. The best type of model to predict one of the possible outcomes for a pitch given an initial value of velocity and movement is a Limited Dependent Variable model. A Limited Dependent Variable model is used when the value of the dependent variable is restricted to a range of possible outcomes that can be ranked in a meaningful manner. The estimation of the relationship between the independent and dependent variable requires the method to take into account the restriction and ranking. This model was chosen because the range of possible outcomes is restricted and the values are discrete—each pitch can only result in one of ten possible outcomes—and the outcomes are ordered by their value to the pitcher. Also, the relationship between velocity, movement, and the outcome of the pitch requires the ranking of the outcomes to be accounted for because it is assumed velocity and movement influence the type of outcome.

Since the outcomes are also ranked by type of contact, an outcome occurs only if the contact for a particular outcome is greater than the contact required for the outcome located below it and less than the contact required for the outcome located above it. For example, if the contact made was greater than the contact required for a ground-out, but less than the contact required for a line-out, the outcome would most likely be a fly-out.

This type of reasoning implies interval estimates will need to be created for each outcome. Each interval estimate will have a lower limit and an upper limit; if the value the model calculates, given an initial value of velocity and movement, lies between the upper and lower limit, then the outcome the interval estimate represents will be the outcome to most likely occur.
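
The interval logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff values are hypothetical placeholders rather than estimates from this paper, and `classify` is a made-up helper name. Each cutoff plays the role of a shared limit between two neighboring outcomes.

```python
from bisect import bisect_right

# Outcomes ordered from least to most desirable for the pitcher.
OUTCOMES = ["home run", "triple", "double", "single", "foul",
            "line-out", "fly-out", "ground-out", "pop-out", "swinging strike"]

# Hypothetical cutoffs between neighboring outcomes (9 cutoffs for 10 outcomes).
CUTOFFS = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0]

def classify(value):
    """Return the outcome whose interval contains the model value."""
    # bisect_right counts how many cutoffs lie at or below the value,
    # which is exactly the position of the matching outcome.
    return OUTCOMES[bisect_right(CUTOFFS, value)]

print(classify(3.7))  # fly-out: above the line-out cutoff, below the ground-out cutoff
print(classify(0.2))  # home run: below every cutoff
```

With these made-up cutoffs, a value of 3.7 lands between the line-out and ground-out regions, which mirrors the fly-out example in the text.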

The Econometric Model

Regression Equation   

Formalizing the independent variables, dependent variables, and error term results in the following equations:

Oi= β1 + β2*V + β3*M + ε                           (1)

Where ε ~ N(0, σ²)                                       (2)

The left side of equation 1 contains the dependent variable, outcome, and the subscript i represents the type of outcome. The right side of equation 1 has two parts: a structural component and a random component. The structural component contains the independent variables, where β1 is the intercept, β2 is the estimated coefficient for velocity, V is velocity in MPH, β3 is the estimated coefficient for movement, and M is horizontal movement plus vertical movement in inches. The random component is the error term, ε; it is the residual that cannot be explained by the variables in the model. The error term is assumed to be normally distributed with a mean of 0, which is indicated by equation 2.

Interval Estimates

Because the outcomes are ordered, the value of equation 1 determines the outcome by where it falls relative to the limits of the neighboring outcomes. If the value of equation 1 is greater than the upper limit of the next less desirable outcome for the pitcher, and less than the lower limit of the next more desirable outcome, the result is the outcome that lies between those two. This can be said in terms of quality of contact as well: if the contact a particular amount of velocity and movement is likely to induce is weaker than the minimum contact required for the neighboring outcome with harder contact, but stronger than the maximum contact associated with the neighboring outcome with softer contact, the pitch results in the outcome between them. This means equation 1 can be used to create an interval estimate for a particular outcome:

LOf < β1 + β2*V + β3*M + ε < -LOc = Oi                (3)

LOf is the upper limit for outcome f and -LOc is the lower limit for outcome c. Outcome f’s quality of contact is located immediately above the maximum amount of contact required for outcome i and outcome c’s quality of contact is located immediately below the minimum amount of contact required for outcome i. With that being said, interval estimates can be created for all of the outcomes and can be written as:

OSS if Oi > LPO (4)
OPO if -LSS > Oi > LGO (5)
OGO if -LPO > Oi > LFO (6)
OFO if -LGO > Oi > LLO (7)
OLO if -LFO > Oi > LFL (8)
OFL if -LLO > Oi > LSG (9)
OSG if -LFL > Oi > LDB (10)
ODB if -LSG > Oi > LTP (11)
OTP if -LDB > Oi > LHR (12)
OHR if -LTP > Oi (13)

To make sense of equations 4 through 13, the outcomes have been assigned the categorical values and subscripts in Table 1.

Table 1: Categorical Values & Subscripts

Outcome         | Value | Subscript
Swinging Strike | 10    | SS
Pop Out         | 9     | PO
Ground Out      | 8     | GO
Fly Out         | 7     | FO
Line Out        | 6     | LO
Foul            | 5     | FL
Single          | 4     | SG
Double          | 3     | DB
Triple          | 2     | TP
Home Run        | 1     | HR

Using equation 3 together with equations 4 through 13, the interval estimates can be written out for each outcome:

LPO < β1 + β2*V + β3*M + ε = OSS                             (13)

LGO < β1 + β2*V + β3*M + ε < -LSS = OPO             (14)

LFO < β1 + β2*V + β3*M + ε < -LPO = OGO             (15)

LLO < β1 + β2*V + β3*M + ε < -LGO = OFO             (16)

LFL < β1 + β2*V + β3*M + ε < -LFO = OLO              (17)

LSG < β1 + β2*V + β3*M + ε < -LLO = OFL             (18)

LDB < β1 + β2*V + β3*M + ε < -LFL = OSG             (19)

LTP < β1 + β2*V + β3*M + ε < -LSG = ODB             (20)

LHR < β1 + β2*V + β3*M + ε < -LDB = OTP             (21)

-LTP > β1 + β2*V + β3*M + ε= OHR                           (22)

Hypothesis Testing

Once the estimates for the coefficients are reported, their level of significance can be tested. To do this a null and alternative hypothesis was created:

Ho: β2 = 0, β3 = 0                      (23)

H1: β2 ≠ 0, β3≠ 0                        (24)

Equation 23 is the null hypothesis; it states the coefficients for velocity and movement equal 0. If a coefficient is 0, changing the corresponding variable does not change the predicted outcome or quality of contact. Equation 24 is the alternative hypothesis; it states the coefficients for velocity and movement are not equal to 0, which means they do influence the outcome and quality of contact. The next step in hypothesis testing is calculating a test statistic. Since the error terms are assumed to have a standard normal distribution and to be homoscedastic (all of the error terms have the same variance), the t-test will be used for the test statistic. The next step is to establish a rejection region. Because the alternative hypothesis is “not equal to,” a two-tail test needs to be used. This is done with the following equation:

t(α/2, N-3) < t < t(1-α/2, N-3)                           (25)

Where α is the level of significance, N is the number of observations, and N-3 is the degrees of freedom; 3 is subtracted because three degrees of freedom have been used by the intercept and the two coefficients. The rejection region has two parts: one located in the lower tail of the curve, the other located in the upper tail of the curve. The space to the left of t(α/2, N-3) is the lower tail and the space to the right of t(1-α/2, N-3) is the upper tail. Equation 25 describes the region between the two critical values, so the null hypothesis can be rejected in two cases: if t is less than t(α/2, N-3), or if t is greater than t(1-α/2, N-3). If either of these is true, the test statistic lies beyond a critical value in one of the rejection regions, which means the null hypothesis can be rejected in favor of the alternative hypothesis. But if neither is true, the test statistic lies between the two critical values in the acceptance region, which means the null hypothesis cannot be rejected and the coefficient being tested could be 0, making it statistically insignificant.
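
The decision rule can be sketched in Python as follows. This is a minimal sketch under one assumption worth flagging: with a very large number of degrees of freedom the t distribution is nearly identical to the standard normal, so the critical value is taken from the normal distribution. The function names and the two t statistics at the bottom are made up for illustration.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Upper critical value of a two-tail test (normal approximation to t)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(t_stat, alpha=0.05):
    """True if t_stat falls in the lower or upper rejection region."""
    c = critical_value(alpha)
    return t_stat < -c or t_stat > c

print(round(critical_value(0.05), 3))  # 1.96
print(reject_null(2.5))    # True: beyond the upper critical value
print(reject_null(-1.2))   # False: between the two critical values
```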

Data

The data was collected from www.BaseballSavant.com. This website maintains the PITCHf/x database, which contains data on every pitch thrown from the 2008 through 2014 seasons, recorded using high-speed cameras located in every Major League ballpark. Since the data from 2008 to 2009 has some classification issues, those years are excluded from the data sets; thus the data sets are from seasons 2010 to 2014. Each data set has approximately 21,000 observations. Since there are five data sets, the total number of observations is approximately 105,000.

The website allows for many types of filters to be used when searching for data, but the filters used for our purposes are pitch type, pitch result, batted ball result, and at-bat result. The filters for pitch result do not include the type of outcome resulting from the ball being put in play. To get those results the filters for at-bat result had to be used. This resulted in the inclusion of data that was supposed to be excluded. For example, if a four-seam fastball was thrown during an at-bat, but the batter did not swing, then it needs to be excluded, but if the at-bat ended with one of the selected at-bat filters then it was included in the data set. All lines of data containing this type of issue had to be removed from the data sets.

Also, the data on movement came in two components: horizontal movement and vertical movement. Some of the values for horizontal and vertical movement were negative and some were positive. Horizontal movement is positive if the pitch moves towards the right side of home plate, and negative if the pitch moves towards the left side of home plate, from the catcher’s perspective. Vertical movement is positive if the pitch drops less than it would from gravity alone, and negative if the pitch drops more than it would from gravity alone. If a pitch had one type of movement that was positive and another type of movement that was negative, the two values would partly cancel when added together and would not properly reflect total movement. To prevent this from occurring, the absolute value was taken for each type of movement and the two absolute values were then added together.
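
A small Python sketch of this calculation, with hypothetical argument names and made-up movement values, shows why the absolute values matter:

```python
def total_movement(horizontal_in, vertical_in):
    """Total movement in inches: the sum of the absolute components."""
    return abs(horizontal_in) + abs(vertical_in)

# A made-up pitch: 6 inches toward the left side of the plate,
# and 9 inches less drop than gravity alone would produce.
print(total_movement(-6.0, 9.0))  # 15.0
print(-6.0 + 9.0)                 # 3.0, what a plain sum would report
```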

Since a Limited Dependent Variable model is being used, a new variable had to be created. This variable captures the ranking of each outcome by assigning a numerical value to each type of outcome. Since each outcome was ranked from least to most desirable from the perspective of the pitcher, the least desirable outcome, a home run, was assigned the value of one, and the most desirable outcome, a swinging strike, was assigned the value of ten. Also, a variable had to be created indicating the year from which the data originated. Since there are five years’ worth of data, the variable could take on one of five possible values—1 through 5. This was done because all of the data was combined when put into the program. Having a variable indicating year allowed for a dummy variable to be created in the program so different data sets could be created and regressions could be run on each data set, and then all the data sets combined.
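
The two created variables could be built along these lines. This is a sketch only: the field names `outcome` and `season` are hypothetical, and numbering the seasons so that 2010 is year 1 is an assumption, since the text says only that the variable takes the values 1 through 5.

```python
# Rank of each outcome from the pitcher's perspective (1 = least desirable).
OUTCOME_RANK = {
    "home run": 1, "triple": 2, "double": 3, "single": 4, "foul": 5,
    "line-out": 6, "fly-out": 7, "ground-out": 8, "pop-out": 9,
    "swinging strike": 10,
}

def year_index(season, first_season=2010):
    """Map a season to 1..5, assuming 2010 is the first season."""
    return season - first_season + 1

def encode(row):
    """Attach the ordinal rank and the year index to one pitch record."""
    return {"rank": OUTCOME_RANK[row["outcome"]],
            "year": year_index(row["season"])}

print(encode({"outcome": "foul", "season": 2012}))  # {'rank': 5, 'year': 3}
```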

Procedures of Estimation and Inference

The program used to run the regression was SAS, version 9.3. The procedure used to estimate the mean, standard deviation, and the minimum and maximum values for the independent variables was the MEANS procedure. The procedure used to estimate the intercept, coefficients, and interval estimates was the QLIM procedure. The QLIM procedure fits Limited Dependent Variable models, and can use either the Binary Probit or Logit model, or the Ordinal Probit or Logit model. The Binary Probit or Logit model is used when the dependent variable assumes only one of two values. Since the dependent variable has ten possible values, the Binary model was not appropriate for our purposes. The Ordinal Probit or Logit model allows for a dependent variable to assume more than two values and the values can be ranked in either ascending or descending order, which was most appropriate for our purposes. The difference between the Ordinal Probit and Ordinal Logit model is that the Ordinal Logit model assumes the error term has a standard Logistic distribution, and the Ordinal Probit model assumes the error term has a standard Normal distribution. Error terms can be assumed to have a standard normal distribution if the dependent variable is influenced by an unobserved continuous variable and the possibilities for the unobserved continuous variable are infinite, even if the possibilities are bounded between a minimum and maximum value.

The outcome of a pitch can be thought of as a proxy for quality of contact—the softer the contact the better the outcome for the pitcher and vice versa. Even though the model has ten dependent categorical ordinal outcomes—which by definition means it is not continuous—it measures a single variable at a distance, which is quality of contact. Quality of contact can be thought of as being continuous: it is a spectrum of infinite possibilities bounded between two values—no contact and perfect contact. Even though perfect contact is a nebulous concept, it still acts as a boundary that cannot be surpassed. This means quality of contact meets the criteria for having error terms that have a standard normal distribution, which means the Ordinal Probit model is the model most appropriate for our purposes.

The purpose of the Ordinal Probit model is to estimate the probability an observation will fall into one of the categorical outcomes. The central idea behind the Ordinal Probit model is there is an unobserved continuous variable underlying the dependent variable, which influences the ordering of the dependent variable. The unobserved continuous variable is quality of contact, which is assumed to determine the outcome, and it is assumed velocity and movement of a pitch influence quality of contact.

The Ordinal Probit model creates upper and lower threshold values that partition the continuous variable into a series of regions, each corresponding to one of the ordinal categories. These upper and lower thresholds create intervals; each interval corresponds to the range of contact required for a particular type of outcome. Quality of contact lies on a continuous spectrum from no contact to perfect contact, and each outcome occupies a region along that spectrum. Each outcome has two threshold values. If the quality of contact worsens and crosses one threshold of a particular outcome, the result becomes the outcome ranked immediately below it; that threshold is the outcome’s lower limit. If the quality of contact improves and crosses the other threshold, the result becomes the outcome ranked immediately above it; that threshold is the outcome’s upper limit.
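
How thresholds turn a continuous value into outcome probabilities can be sketched with the standard ordered probit formula, in which the probability of a category is the difference between the normal CDF evaluated at its two thresholds. The sketch below uses four categories and made-up cutoffs to stay short; it is not the ten-outcome model estimated in this paper, and `category_probabilities` is a hypothetical helper name.

```python
from statistics import NormalDist

def category_probabilities(index, cutoffs):
    """P(category k) = Phi(c_k - index) - Phi(c_{k-1} - index)."""
    phi = NormalDist().cdf
    # Pad with 0 and 1 so the lowest and highest categories are open-ended.
    cdf_at_cutoffs = [0.0] + [phi(c - index) for c in cutoffs] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_at_cutoffs, cdf_at_cutoffs[1:])]

cutoffs = [-1.0, 0.0, 1.0]  # hypothetical thresholds, 4 ordered categories
probs = category_probabilities(0.0, cutoffs)
print([round(p, 4) for p in probs])  # [0.1587, 0.3413, 0.3413, 0.1587]

# Raising the index shifts probability toward the highest category.
shifted = category_probabilities(0.5, cutoffs)
print(shifted[-1] > probs[-1])  # True
```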

The Ordinal Probit model relaxes the constraint that the effect of the independent variables is constant across different predicted values of the dependent variable. The model assumes an S-shaped curve. In each tail section of the curve the dependent variable responds slowly to changes in the independent variables, and as it moves closer towards the middle of the curve, the dependent variable responds faster. This implies that as the probability of a particular outcome occurring approaches .5, changes in velocity and movement cause relatively large changes in the probability of that outcome occurring. As the probability of a particular outcome occurring approaches 0 or 1, changes in velocity and movement induce relatively small changes in the probability of that outcome occurring.
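
The S-shaped response can be seen directly from the standard normal CDF. In this sketch the step size of 0.1 and the two starting points are arbitrary illustrative choices, not values from the model:

```python
from statistics import NormalDist

phi = NormalDist().cdf

def change_in_probability(start, step=0.1):
    """Change in cumulative probability when the index moves by `step`."""
    return phi(start + step) - phi(start)

middle = change_in_probability(0.0)  # probability starts at 0.5
tail = change_in_probability(2.5)    # probability starts near 1
print(round(middle, 4), round(tail, 4))  # 0.0398 0.0015
```

The same move along the index changes the probability by roughly 25 times more in the middle of the curve than in the tail.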

This cascading effect on outcome probabilities has some intuition behind it: if the probability of an outcome occurring approaches 0, the probability of the outcomes furthest away (either below its lower limit or above its upper limit, depending on the type of contact) must be approaching 1. This means that as the probability of a particular outcome decreases by a certain amount, that amount is allocated disproportionately among the outcomes in a particular direction in descending order, with the outcome ranked immediately above or immediately below receiving the biggest increase in probability and the outcome furthest away receiving the smallest. Another way to put it is that as velocity and movement change, contact moves along its spectrum, changing the probability of each outcome occurring; some probabilities increase and some decrease. If the probability of an outcome decreases, the probability of the outcome located immediately below or above increases the most and the probability of the outcome located furthest away increases the least, with the probabilities of all the intermediate outcomes increasing or decreasing disproportionately with their distance from the original outcome.

For example, a home run and swinging strike are on opposite ends of the contact spectrum. If the probability of a home run occurring approaches 0, and the probability of a swinging strike occurring approaches 1, the amount of velocity and movement (and therefore contact) required for the two outcomes is substantially different because the probability of anything occurring in between must be approaching 0, but not at the rate at which the probability of home run contact is approaching 0. As velocity and movement change towards the amount of velocity and movement required to induce the type of contact resulting in a home run, the probabilities of the outcomes located between swinging strike and home run will increase, with the probability of the outcome located immediately below swinging strike, pop-out, increasing the most, and the outcome located immediately below pop-out, ground-out, increasing the second most, and so on, with the probability of a home run occurring increasing the least. As velocity and movement continue to change and contact moves along its spectrum towards the type of contact required for a home run, the probabilities of each outcome change, with the outcomes closest to a swinging strike increasing the most until, eventually, the allocation of probability is reversed and the probability of a home run occurring approaches 1 and the probability of a swinging strike occurring approaches 0.

Empirical Results

Discrete Response Profile & Means            

Table 2 is the discrete response profile for seasons 2010 to 2014. It reports the frequency of each outcome and the percent the frequency represents of all the outcomes.

Index | Outcome         | Frequency | % of Total
1     | Home Run        | 6         | 0.01%
2     | Triple          | 196       | 0.18%
3     | Double          | 2,252     | 2.09%
4     | Single          | 8,112     | 7.52%
5     | Foul            | 50,835    | 47.12%
6     | Line Out        | 2,891     | 2.68%
7     | Fly Out         | 10,435    | 9.67%
8     | Ground Out      | 12,198    | 11.31%
9     | Pop Out         | 4,072     | 3.77%
10    | Swinging Strike | 16,881    | 15.65%
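
As a quick arithmetic check, the percentage column can be recomputed from the frequencies as they are listed in Table 2. The sketch below assumes each percentage is simply the frequency divided by the sum of the listed frequencies, rounded to two decimals:

```python
# Frequencies copied from Table 2.
FREQUENCIES = {
    "Home Run": 6, "Triple": 196, "Double": 2252, "Single": 8112,
    "Foul": 50835, "Line Out": 2891, "Fly Out": 10435,
    "Ground Out": 12198, "Pop Out": 4072, "Swinging Strike": 16881,
}

total = sum(FREQUENCIES.values())
shares = {name: round(100 * n / total, 2) for name, n in FREQUENCIES.items()}

print(shares["Foul"])             # 47.12
print(shares["Swinging Strike"])  # 15.65
```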

Table 3 contains the amount of observations for each variable, the mean, standard deviation, and the minimum and maximum values for seasons 2010 to 2014.

Variable | N       | Mean       | Std Dev   | Min  | Max
Velocity | 107,880 | 91.9668966 | 2.9209241 | 78   | 104.1
Movement | 107,880 | 13.1161436 | 3.2585848 | 0.29 | 44.41

Parameter Estimates

Table 4 contains the parameter estimates for data from the 2010 to 2014 seasons. It contains the estimate, standard error, t value, and p value for each of the parameters. The standard error indicates how precisely the estimate represents the population value. The t and p values test for statistical significance; both are computed under the assumption that the null hypothesis is true, that is, that the parameter equals 0. The t value is the estimate divided by its standard error: the larger its absolute value, the stronger the evidence that the parameter is different from 0. The p value is the probability of observing a t value at least that extreme if the null hypothesis were true. The lower the p value, the stronger the evidence against the null hypothesis and the more likely the parameter is statistically different from 0.

Parameter | Estimate  | S.E.     | t Value | Pr > |t|
Intercept | 2.175107  | 0.134481 | 16.17   | < .0001
Velocity  | 0.017939  | 0.001114 | 16.1    | < .0001
Movement  | -0.001804 | 0.000998 | -1.81   | 0.0706
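
The t and p columns of Table 4 can be reproduced from the estimates and standard errors. The sketch below assumes the t value is the estimate divided by its standard error, and it approximates the t distribution with the standard normal, which is reasonable only because the sample has more than 100,000 observations. The helper name `t_and_p` is made up.

```python
from statistics import NormalDist

# (estimate, standard error) as reported in Table 4.
ESTIMATES = {
    "Intercept": (2.175107, 0.134481),
    "Velocity": (0.017939, 0.001114),
    "Movement": (-0.001804, 0.000998),
}

def t_and_p(estimate, std_error):
    """t ratio and two-sided p value under a normal approximation."""
    t = estimate / std_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

for name, (est, se) in ESTIMATES.items():
    t, p = t_and_p(est, se)
    print(f"{name}: t = {t:.2f}, p = {p:.4f}")
```

The t ratios round to the values in the table, and the p value for movement comes out within rounding distance of the reported 0.0706.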

Hypothesis Testing

Since the standard errors and t values have been reported, the significance of the coefficients can be tested. Using the null and alternative hypotheses from equations 23 and 24 and a significance level of 5 percent, equation 25 can be written as:

t(0.025, 107,877) < 16.10 < t(0.975, 107,877), i.e. -1.960 < 16.10 < 1.960             (26)

t(0.025, 107,877) < -1.81 < t(0.975, 107,877), i.e. -1.960 < -1.81 < 1.960             (27)

Equation 26 is the test hypothesis for velocity. Since 16.10 is greater than 1.960, the condition in equation 26 does not hold: the t value for velocity lies to the right of the upper critical value, in the rejection region. The null hypothesis for velocity is therefore rejected at the 5 percent level of significance, which means velocity is statistically different from 0 and influences the quality of contact and the outcome of the pitch, holding movement constant.

Equation 27 is the hypothesis test for movement. Since -1.81 lies between -1.960 and 1.960, the t value for movement does not fall in either rejection region, and the null hypothesis cannot be rejected at the 5 percent significance level. The estimate for movement is therefore not significantly different from 0. If the coefficient for movement were 0, the quality of contact and the outcome of the pitch would not change with movement, holding velocity constant. This means movement can be removed from equations 1, 3, and 13 through 22.

Regression Equation

Since estimates for the parameters have been calculated and their level of significance has been determined, the values can be plugged into equation 1 to get:

Oi = 2.175107 + .017939*V + ε                      (26)

Movement has been removed because its estimate is not significantly different from 0. Also, the error term remains unknown because its precise value cannot be determined using a Limited Dependent Model. The error term takes on a range of values depending on the value of the independent variables and the value of the upper and lower limit of the outcome.

Interval Estimates

Table 5 contains the interval estimates for seasons 2010 to 2014 for each type of outcome. It gives the lower limit, upper limit, standard error, t value, and p value, and the upper limit minus the lower limit, which gives the size of the interval.

Outcome            Lower Limit   Upper Limit   Upper – Lower
Home Run           -             0.900493      -
Triple             0.900493      1.798627      0.898134
Double             1.798627      2.175107      0.37648
Single             2.175107      2.5067        0.331593
Foul               2.5067        3.975195      1.468495
Line Out           3.975195      4.043806      0.068611
Fly Out            4.043806      4.304515      0.260709
Ground Out         4.304515      4.664113      0.359598
Pop Out            4.664113      4.811196      0.147083
Swinging Strike    4.811196      -             -

Limit       S.E.       t Value   Pr > |t|
0.900493    0.086475   10.41     < .0001
1.798627    0.088013   20.44     < .0001
2.5067      0.088104   28.45     < .0001
3.975195    0.088125   45.11     < .0001
4.043806    0.088165   45.87     < .0001
4.304515    0.08818    48.81     < .0001
4.664113    0.088204   52.88     < .0001
4.811196    0.088235   54.53     < .0001

The limit 2.175107 is the intercept; its standard error, t value, and p value are reported in Table 4.
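
The interval sizes in Table 5 can be recomputed from the limits themselves, since each bounded outcome sits between two consecutive limits. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative, and the ordering of outcomes is taken from the table header.

```python
# Limits reported in Table 5, in ascending order.
limits = [0.900493, 1.798627, 2.175107, 2.5067, 3.975195,
          4.043806, 4.304515, 4.664113, 4.811196]

# Outcomes bounded on both sides, in the same order as the gaps
# between consecutive limits.
bounded = ["Triple", "Double", "Single", "Foul", "Line Out",
           "Fly Out", "Ground Out", "Pop Out"]

# Width of each interval = upper limit minus lower limit.
widths = {name: round(upper - lower, 6)
          for name, lower, upper in zip(bounded, limits, limits[1:])}

widest = max(widths, key=widths.get)
for name, width in widths.items():
    print(f"{name:11s} {width:.6f}")
print("Widest interval:", widest)
```

Home run and swinging strike are left out because each is open on one side, so no finite width exists for them.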

Movement can be removed from equations 13 through 22 and the values from Tables 4 and 5 can be plugged into the equations to get:

4.811196 < 2.175107 + .017939*V + ε = OSS                                                        (27)

4.664113 < 2.175107 + .017939*V + ε < 4.811196 = OPO                           (28)

4.304515 < 2.175107 + .017939*V + ε < 4.664113 = OGO                          (29)

4.043806 < 2.175107 + .017939*V + ε < 4.304515 = OFO                           (30)

3.975195 < 2.175107 + .017939*V + ε < 4.043806 = OLO                           (31)

2.506700 < 2.175107 + .017939*V + ε < 3.975195 = OFL                           (32)

2.175107 < 2.175107 + .017939*V + ε < 2.506700 = OSG                          (33)

1.798627 < 2.175107 + .017939*V + ε < 2.175107 = ODB                           (34)

0.900493 < 2.175107 + .017939*V + ε < 1.798627 = OTP                           (35)

0.900493 > 2.175107 + .017939*V + ε = OHR                                                    (36)
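
Read together, the ten inequalities above amount to a lookup: compute the score from velocity and the error term, then find which pair of limits it falls between. A short Python sketch of that lookup follows. The function names are hypothetical, and because the equations use strict inequalities, the handling of a score exactly equal to a limit (assigned here to the higher interval) is an assumption.

```python
from bisect import bisect_right

# Limits from the equations above, in ascending order.
LIMITS = [0.900493, 1.798627, 2.175107, 2.5067, 3.975195,
          4.043806, 4.304515, 4.664113, 4.811196]

# Outcomes ordered from the lowest interval to the highest.
OUTCOMES = ["Home Run", "Triple", "Double", "Single", "Foul",
            "Line Out", "Fly Out", "Ground Out", "Pop Out",
            "Swinging Strike"]

def latent_score(velocity_mph, error=0.0):
    """Score from the fitted equation; the error term is unknown in
    practice, so it is an explicit argument here."""
    return 2.175107 + 0.017939 * velocity_mph + error

def classify(score):
    """Return the outcome whose interval contains the score.

    Assumption: a score exactly on a limit goes to the higher interval.
    """
    return OUTCOMES[bisect_right(LIMITS, score)]

print(classify(latent_score(91.97)))             # error term ignored
print(classify(latent_score(91.97, error=1.0)))  # hypothetical error value
```

Passing different values for `error` shows how much of the predicted outcome depends on the unobserved term rather than on velocity.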

Goodness of Fit Measures

Goodness of fit measures describe how well the model fits the observations. The measures typically summarize the discrepancy between the observed values and the values expected under the model. Since a linear regression model was not used, the goodness of fit measures are not the ones typically expected, such as the coefficient of determination, R². Table 6 contains the reported measures for the data from the 2010-2014 seasons.

Measure Value
Likelihood Ratio (R) 259.29
Upper Bound of R (U) 350699
Aldrich-Nelson 0.0024
Cragg-Uhler 1 0.0024
Cragg-Uhler 2 0.0025
Estrella 0.0024
Adjusted Estrella 0.0022
McFadden’s LRI 0.0007
Veall-Zimmermann 0.0031
McKelvey-Zavoina 0.0027

The most useful of these measures is McFadden’s LRI because it is analogous to R². It is bounded between 0 and 1 and, in theory, can equal 1, meaning the model is a perfect fit for the data, even though most models that are a good fit fall in the range of .2 to .4 (vii). All of the other measures except for the Likelihood Ratio (R) and Upper Bound of R (U) are similar to McFadden’s LRI: they are attempts to simulate R².
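
Several of the values in Table 6 can be recovered from just three numbers: the likelihood ratio R, its upper bound U, and the sample size N. The Python sketch below uses the commonly cited definitions of these pseudo-R² measures. That the software behind Table 6 uses exactly these forms is an assumption, although the results agree with the table to four decimals. Adjusted Estrella and McKelvey-Zavoina are omitted because they need inputs that are not reported here.

```python
import math

N = 107_880   # number of pitches (sample size from the summary table)
R = 259.29    # Likelihood Ratio (R), Table 6
U = 350_699   # Upper Bound of R (U), Table 6

# Commonly cited definitions (assumed to match the reporting software).
measures = {
    "Aldrich-Nelson": R / (R + N),
    "Cragg-Uhler 1": 1 - math.exp(-R / N),
    "Cragg-Uhler 2": (1 - math.exp(-R / N)) / (1 - math.exp(-U / N)),
    "Estrella": 1 - (1 - R / U) ** (U / N),
    "McFadden's LRI": R / U,
    "Veall-Zimmermann": R * (U + N) / (U * (R + N)),
}

for name, value in measures.items():
    print(f"{name:17s} {value:.4f}")
```

Every measure is driven by how small R is relative to U or N, which is why all of them end up close to zero together.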

Conclusion

Since the estimated coefficient for velocity is positive, the greater the velocity, the lower the quality of contact, meaning a desirable outcome for the pitcher is more likely to occur. This supports the first part of the hypothesis. But the estimate for movement was not significantly different from 0, which does not support the second part of the hypothesis. A pitcher is not trying to induce movement when he throws a four-seam fastball, and the movement that does occur is relatively small compared to pitches in which a pitcher is trying to induce movement. Indeed, a four-seam fastball rotates backwards, which keeps the ball straight and limits the movement. This relatively small amount of movement may not do much to deceive a hitter and cause him to either swing and miss or make poor contact. It would be interesting to see if the amount of movement in pitches in which a pitcher is trying to induce movement leads to lower quality of contact.

According to Table 2, it appears to be difficult for a pitcher to get a hitter to swing and miss at a four-seam fastball. Hitters make contact 84.35 percent of the time, and swing and miss 15.65 percent of the time. It also appears to be difficult for a hitter to make the type of contact required to avoid making an out: only 9.8 percent of the outcomes resulted in a hit. When a hitter does make an out, the type of contact is mostly poor: 54.97 percent of the outs are ground-outs and pop-outs. The outs requiring a bit more solid contact, line-outs and fly-outs, make up 45.03 percent of all the outs. It also appears the most frequent outcome is a foul. Fouls can be good for a pitcher if they result in strikes, but a foul will only result in a strike if the count has fewer than two strikes. With two strikes, a foul is good for the hitter because he gets to see another pitch.

Since the interval for the foul is the largest and the intercept is the lower limit for the outcome immediately above it—the single—it is easy to see the model predicts the most likely outcome to be a foul. This makes sense because it was the outcome that occurred most often by a wide margin. But given the ambiguity of the foul in terms of value to the pitcher and hitter, and the quality of contact required to cause a foul, any type of positive analysis will be ambiguous. A statement cannot be made about the value of this outcome except the value changes from the pitcher to the hitter depending on the count.

Since the goodness of fit measure is rather low, the model is not a good fit for the data. This result does not mean the model is not predictive. Rather, it means there are other variables influencing the quality of contact and the outcome of the pitch that are not included in the model. In some ways, this makes sense: baseball is a complicated game and the outcome of a four-seam fastball depends on much more than just velocity and movement. Things such as the location of the pitch, the sequencing of the previous pitches, the handedness of the pitcher and batter, the base/out state, and the count play a large part in determining the outcome of the pitch. If some of these variables were included in the model then its predictive power and goodness of fit would have most likely increased.

Taking the average fastball velocity from Table 3, 91.97 MPH, plugging it into equation 26 and ignoring the error term, the value is 3.82, which falls in the interval for foul, as expected. But for a pitch to result in a swinging strike on velocity alone, it would need to travel around 147 MPH, or 19 standard deviations above the mean. This doesn’t fit very well with reality: no pitcher will ever throw a pitch at 147 MPH, and plenty of hitters swing and miss at four-seam fastballs with velocity around the mean. Velocity is not the only determinant; it has only a small influence over the outcome of the pitch. This supports the conclusion that the model does not fit the data very well and that the error term is probably rather large relative to the estimated coefficient for velocity.
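
The arithmetic behind these figures can be retraced in a few lines. This sketch assumes the second and third numbers in the velocity row of the summary table are the mean and standard deviation, and it sets the error term to zero throughout; the variable names are illustrative.

```python
# Fitted values from the regression equation and the swinging-strike limit.
intercept = 2.175107
beta_velocity = 0.017939
swinging_strike_limit = 4.811196

# Velocity summary statistics (assumed to be mean and standard deviation).
mean_velocity = 91.9668966
sd_velocity = 2.9209241

# Score for an average fastball with the error term ignored.
score_at_mean = intercept + beta_velocity * mean_velocity

# Velocity at which the score alone reaches the swinging-strike limit.
required_velocity = (swinging_strike_limit - intercept) / beta_velocity

# Distance of that velocity from the mean, in standard deviations.
sds_above_mean = (required_velocity - mean_velocity) / sd_velocity

print(f"Score at mean velocity: {score_at_mean:.2f}")
print(f"Velocity needed: {required_velocity:.1f} MPH")
print(f"Standard deviations above mean: {sds_above_mean:.1f}")
```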

In the extremely competitive environment of major league baseball, where teams seek out the smallest advantage to give them an edge over their competitors, it makes sense for them to put a greater emphasis on velocity. It does have an influence on generating favorable outcomes for the pitcher. Therefore the trend in baseball is likely to continue, and velocity is going to continue to increase.


Trouble With the Aging Curve

Ever since I became enamored with the baseball statistical community, I’ve tried to gather as much information as I could. I registered on several websites dedicated to the analysis of baseball statistics, such as baseballprospectus.com, FanGraphs.com and HardballTimes.com. I read every book and article I could get my hands on and even tried my hand at producing my own research and analysis in order to achieve two goals in my life: 1. Publish my research and become a savvy baseball analytical mind; and 2. Work within a baseball organization.

My first basic analysis came in the form of three-year projections in order to try my hand at fantasy baseball. Personally, I’m proud to say that my first dip into the analytical waters was fruitful, as my projections helped me win my league 3 times out of 5 attempts[1]. But, after many years of keeping my projections and questions to myself, I’ve finally felt compelled to start more serious research and publish my questions and results online to share with people interested in these topics. So, without further ado, I give you my first serious publication.

***

Many readers will find that writers, commentators and analysts highly value a player before he reaches his age 30 season. But, once players pass this mark, they begin to gradually decline: their production falters, they’re prone to getting injured more than once within the same season, and their speed begins to abandon them. In other words, the shine begins to disappear and is replaced by a shell of the player we, the fans, and managers value. Furthermore, I’ve often read that players even peak at the age of 27, this being the season where a player will give his (all-time) best performance before beginning that slow decline into retirement.

Now, I have two problems with this:

  1. What stats determine that a player’s best season is his age 27 season?
  2. Does this peak age season vary for every position or are all players subjected to the same aging curve?

To answer the first question, I used player statistics from 1960 through 2013 and looked specifically at power numbers – slugging percentage, isolated power and on-base plus slugging[2]. I then calculated each player’s age from his birthday, taking how old he would be on June 30th of a given season as his age-season. Once I had this, I began running histograms in order to determine the lowest performance, highest performance, mean and first and third quartiles.
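
The age-season rule described above, a player's age as of June 30th of the season, can be sketched as a small Python function. The function name and the example birthdates are hypothetical and only illustrate the cutoff.

```python
from datetime import date

def season_age(birth_date, season_year):
    """Age a player has reached by June 30th of the given season."""
    cutoff = date(season_year, 6, 30)
    age = cutoff.year - birth_date.year
    # A birthday after June 30th has not happened yet at the cutoff.
    if (birth_date.month, birth_date.day) > (cutoff.month, cutoff.day):
        age -= 1
    return age

# Hypothetical birthdates one day apart fall into different age-seasons.
print(season_age(date(1987, 6, 30), 2014))
print(season_age(date(1987, 7, 1), 2014))
```

The two example players are born a day apart but are grouped into different age-seasons for 2014, which is the trade-off of any fixed cutoff date.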

For this analysis, I only used the data for players who were between 20 and 35 years of age during any given season. What I found, starting with SLG, was that players – power-wise – don’t reach their peak at 27 but after they turn 30. A player’s SLG increases gradually as he gets older until he reaches his age 31-32 season. A player will have a mean SLG of 0.437 at age 27, while, during his age-32 season, the mean SLG will be 0.447 – ten points higher, or an increase of 2.3%.

So, as we can see, SLG-wise, a player will show a better performance past his 30th birthday. But maybe I am biased. Maybe if I check ISO, I will find different results.

What I found were very similar results. A player’s isolated power, again on the mean, didn’t peak at age 27, when it was 0.159. Nor did it peak during the age-32 season, but a year earlier, during the age 31 season. During that season, ISO was 0.167, while the next season it began to decline, to 0.165. ISO increases by 5.0% from age 27 to age 31.

Finally, I decided to take a look at OPS to see if I could find a similar pattern. Again, players’ mean OPS peaks during their age 32 season, going from 0.784 at their age 27 season to 0.801 by the time they’re 32. It’s not much of an increase (2.2%) but it’s something.
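
For readers who want to retrace the percentages, here is a minimal Python sketch of the arithmetic, using only the mean values quoted above. The dictionary name is illustrative.

```python
# Mean values quoted in the text: (age-27 mean, peak-age mean).
mean_by_age = {
    "SLG": (0.437, 0.447),  # peak at age 32
    "ISO": (0.159, 0.167),  # peak at age 31
    "OPS": (0.784, 0.801),  # peak at age 32
}

# Percentage increase from the age-27 mean to the peak-age mean.
pct_change = {stat: round((peak - base) / base * 100, 1)
              for stat, (base, peak) in mean_by_age.items()}

for stat, change in pct_change.items():
    print(f"{stat}: +{change}%")
```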

What I can determine, then, is that a player’s power begins to develop once he hits 27 years of age and will gradually increase right up to when he turns 32. But, after this, his power performance will begin to decline, though not by much.

Another thing that I concluded from looking at these three histograms is that, even though there are gradual increases every season, player performance – power-wise – will be fairly consistent from one season to the next. Save for the early seasons (21-25, when a player is still developing), there are no surprising jumps in power[3] from one age to the next. Therefore, though we might prefer younger players for cost control reasons, when we need power production, we can’t fully disregard an older player’s power performance. Chances are they will still produce the same.

***

Having checked how power changes as a player ages, I come to my second question: Does the aging curve differ across positions? Well, in football – or soccer for Americans – we have four major positions: striker, midfielder, defender and goalkeeper. Through statistical analysis by Arsenal F.C.’s data department, Arsene Wenger, Arsenal’s manager, found that a player’s decline varies with the position he plays on the field. That is to say, a striker will age differently than a goalkeeper, and a defender will age differently from both.

And, as we all know, work at different positions takes a different toll on a player’s body. Catchers will become more fatigued as a season rolls by than players at any other position; shortstops, as well, have a demanding position that requires more physical effort. We expect different results from each of the three outfield positions. So, it would be natural that players at different positions age differently on the power curve[4].

What I found out was that my thoughts were correct: positioning on the diamond does affect a player’s power performance but not by much. These are the results based on the mean:

Position Peak Age SLG
Catcher 33 0.413
First Base 31 0.451
Second Base 35 0.390
Third Base 34 0.417
Shortstop 35 0.389
Left Field 32 0.441
Center Field 32 0.433
Right Field 32 0.447

 

As we can see from the data, first basemen will usually be the first position players to peak. After them, the three outfield positions will peak at age 32. Catchers will then follow suit. Finally, the hot corner will peak at 34 and the middle infield will produce more power by the time they turn 35 than any of their previous years.

What we can conclude from this table is the following: because first base demands power more than defense, players there will tend to flex their muscles more often than not, whilst players at primarily defensive positions such as catcher, second base and shortstop will develop more power later in their careers than when they start off. Outfielders, on the other hand, tend to produce power throughout their careers.

The position that does surprise me is the hot corner. I would have expected third basemen to peak earlier in their careers because most players at the position are power hitters. Then again, there are many good defensive third basemen who aren’t big power players (I’m looking at you Juan Uribe).

***

After reviewing all the numbers, I can safely conclude that as players age, power doesn’t decline. On the contrary, power increases, though not by very much. Furthermore, the gradual increase in power at the plate will vary by position, much like a football – soccer – player’s performance will vary according to his position. Therefore, though we may like young players because of their hustle, cost-control and energy, it doesn’t hurt to carry a few veterans in the lineup, if not to mentor the young ones, then to provide some pop.

 

[1] A small sample size, I admit, but nevertheless, a positive achievement as it encouraged me to delve deeper into baseball analytics.

[2] I didn’t look at OBP as I believe that this stat has more to do with a player’s ability at identifying pitch types, though in retrospect, this can also become better as a player ages and gains more experience.

[3] Though there are many outliers as you can see.

[4] I have charts and charts of histograms for each position measuring SLG, ISO and OPS, but I don’t want to oversaturate the article with information.


Big Winners of the Offseason So Far: AL

As all of baseball convened in San Diego this past week, there were a lot of holes to fill. There are some teams that have been very active in free agency and trades over the past weeks and this article means to look at three teams in the American League that have enhanced their rosters over that span of time.

These teams did not make the playoffs in 2014 and they added players that may make them playoff caliber teams in 2015.

CHICAGO WHITE SOX
2014 Regular Season Record (73-89)

There has been a lot of pressure on the White Sox to build a winner as the Detroit Tigers and Kansas City Royals have made the World Series in the past three years and the Cleveland Indians made the playoffs in 2013. The White Sox made a couple big splashes this offseason to boost their profile in the AL Central.

A bit before the Winter Meetings, they inked Adam LaRoche to bolster their weak lineup and provide left-handed power to match Jose Dariel Abreu’s right-handed power in the middle of the lineup. LaRoche has averaged 27 home runs per 162 games in his career and twice in the past three years has had an OPS over .800. LaRoche may not be an All-Star caliber player, but, other than an awful 2011, LaRoche has consistently been a strong performer with an OPS+ of 114 for his career.

The White Sox have an ace in Chris Sale, with a 9.8 K/9 and a 2.76 ERA since entering the league in 2010. He had a 2.17 ERA over 26 starts last season, but the Sox needed a second top pitcher to complement Sale in the rotation. They found one by moving prospect Marcus Semien, along with other minor league prospects, for Jeff Samardzija. The 29-year-old veteran has struck out 200 or more batters in each of the past two years and posted a sub-3.00 ERA last season. His ERA went up and his strikeouts went down as he moved from the Cubs in the National League to the Athletics in the American League, but his WHIP dropped below 1.00 and he struck out 99 while walking only 12. The White Sox now have two top-25 starters coming into the 2015 season, as Sale will be a top-5 starter and Samardzija will sit comfortably in the 22-24 range.

The White Sox needed some help in the bullpen, as Zach Putnam or Jake Petricka was set to be the closer for 2015, so they dipped into their pockets and signed two former All-Stars to multi-year contracts. Zach Duke, a former All-Star starter, signed a bit before the Winter Meetings and has a 2.20 ERA in his last 88 appearances; the White Sox needed a left-handed relief option, as the entire bullpen was right-handed before Duke signed. The big splash for the White Sox, though, was signing former Yankee All-Star closer David Robertson. Since 2011, Robertson has a 12.3 K/9, and from 2011 to 2013 his ERA was never higher than 2.67. He has only 46 saves in his MLB career, as he was the setup man for Mariano Rivera before 2014. But Robertson had 39 saves last year, and has seen his BB/9 go from 4.7 in 2011 to a 2.8 average from 2012 to 2014. Duke will provide the left-handed relief help that the White Sox lacked, and Robertson will be the All-Star caliber closer that the White Sox have been without since Bobby Jenks left.

 

TORONTO BLUE JAYS
2014 Regular Season Record (83-79)

The Blue Jays play in the most active division and have been active in the market. They signed a Gold Glove caliber catcher, an MVP candidate at third base, and freed up space on the roster for a top prospect.

Russell Martin is a highly underrated player who is very strong in intangibles, like his blocking of pitches and elite game calling skills, and will bring his veteran experience to Toronto. Martin’s game calling abilities are well known; his catching abilities will enhance the entire Blue Jays staff, as he led a Pirates staff to back-to-back playoffs with top five ERAs in each season. Martin may never steal double-digit bases again, as he did each season from 2006 to 2009, but he had an .832 OPS last year and hit 39 home runs in his two previous seasons in the AL East, both with the Yankees. His .402 OBP in 2014 may overstate his abilities, since he had a .332 OBP over the previous five seasons, but he should score many more than 45 runs as a top of the lineup hitter with three MVP candidates behind him. Martin may be in a lineup with MVP caliber talent, but he could end up being the most vital piece of a playoff run for the Blue Jays.

Josh Donaldson is the newest MVP candidate in the Blue Jays lineup, adding to the already formidable combination of Edwin Encarnacion and Jose Bautista. The Blue Jays had to trade three prospects and starting third baseman Brett Lawrie to get Donaldson, but Donaldson is well worth the investment. He has been the starting third baseman for the Athletics for two years, and over that time he hit 53 home runs and was a top-10 MVP finisher in both 2013 and 2014. Donaldson broke out in 2013 with an .883 OPS and 64 XBH and had a bit of a letdown in 2014; he still finished with 29 home runs and 98 RBI, even though he struck out 20 more times and saw his OPS drop to .798. There are not many power hitting third basemen in baseball, and the Blue Jays are fortunate to have Donaldson, a top five 3B option.

The Blue Jays had a couple of needs this offseason: filling the gap in the outfield left by free agents Melky Cabrera and Colby Rasmus, and finding a place in the rotation for top prospect Daniel Norris. By trading fifth starter J.A. Happ for Michael Saunders, and allowing Norris to slide into the rotation, the Blue Jays filled both. Norris is the #25 ranked prospect according to MLB.com, with a 2.53 ERA last year and a 10.7 K/9 over his three minor league seasons. He may struggle a bit early in the season, but with his power fastball and a strong slider/changeup combination he could have an impact similar to that of 2014 rookie star Marcus Stroman later in the year.

Saunders was a bit undervalued in Seattle, but has a very interesting profile. He slots into the bottom of the projected Blue Jays lineup and has a slightly better profile than the man he is replacing, Colby Rasmus. Saunders is a very good defensive outfielder who has also had two seasons with more than 10 home runs and 10 steals, while posting three consecutive seasons with an OPS above league average. In the only season where Saunders had 500 or more at bats, 2012, he posted 19 home runs and 21 steals; his OBP has risen from .306 in 2012 to .341 in 2014, so there is potential for Saunders to be even better with more opportunity in Toronto. Saunders was obtained for a very movable piece in Happ; if the Blue Jays are able to fill a major need in the outfield and only have to give up a fifth starter to do so, this would be a huge victory for them.

 

BOSTON RED SOX
2014 Regular Season Record (71-91)

The 2013 champion Red Sox bore no resemblance to the 2014 team that finished last in the AL East. As the Red Sox are a financial juggernaut, they were able to flex their muscles, adding two former All-Stars and then trading for two All-Star pitchers in San Diego.

Pablo Sandoval has been an instrumental part of three Giants World Series titles and, after the disappointment of Will Middlebrooks, will bring his talents to the Red Sox in 2015. Much has been written about Sandoval’s streaky play and his free swinging ways, but Sandoval is a .294 hitter over his seven MLB seasons and averaged 44 extra base hits over the past four seasons. The switch hitting Sandoval will get a serious boost from the left side by hitting doubles off of the Green Monster; this is a needed boost, as Sandoval has not had 30 or more doubles in a season since back-to-back 30 double seasons in 2009 and 2010. Only once in his career has Sandoval had more than 80 RBI, and only twice has he had 20 or more home runs; Sandoval’s value comes from his postseason experience, and he is a top 15 3B in a weak 3B crop.

Rick Porcello was a top prospect coming through the Tigers system, but never really broke through as a stable pitching option until his 15 win 2014 season, where he had a 3.43 ERA. The Red Sox need a lot of pitching help, as they finished 10th in the AL in ERA, and Porcello’s ground ball tendencies may fit the Red Sox well. Xander Bogaerts will be more prepared at shortstop this season and Dustin Pedroia’s defense up the middle will absolutely suit Porcello’s skills. Porcello is coming off of his first 200 inning season and has seen his WHIP go from 1.41 in his first four seasons to 1.25 in the last two seasons. He has seen his K:BB ratio rise over 3 as well, and he is only 26 years old going into his seventh MLB season. That experience should be great for him coming into the grinder that is the AL East. Porcello has a career FIP that is 30 points less than his career ERA, showing that the talent is there; look for him to break through as an All-Star caliber pitcher this year.

Hanley Ramirez was the top hitter available and has been one of the most polarizing players over the past five seasons. Coming into 2010, he was the top fantasy baseball prospect, but saw his OPS go from .853 in 2010 to .742 combined in 2011 and 2012; he then posted a .907 OPS in 2013 and 2014, including a white hot 1.040 OPS in 88 games of 2013. Ramirez has twice before been a 50+ SB player and led the NL in BA in 2009, so the talent is there. But Ramirez has averaged only 121 games played since 2010 and has had two seasons where he played in fewer than 100 games.

Ramirez will also move to left field this season, which should be a very interesting move for fantasy purposes; had Ramirez stayed at third, or even shortstop, he might have been a third round pick, but as an outfielder his value is very questionable. There is a chance that Ramirez has less wear and tear in the outfield and becomes a top-10 hitter again, but a .282/.358/.467 slash line in the outfield is not worthy of a top-10 OF spot. A lot will be expected from Ramirez, but this may be the season that he is able to play 150 games of All-Star caliber play in the outfield, regaining his reputation as an MVP candidate.