It’s baseball season. Which means it’s fantasy baseball season. Which means I have to keep reminding myself that, even though it’s already been a month and a half, that’s still a pretty short time in the long rhythm of the season, and every performance has to be viewed with skepticism. Ryan Zimmerman sporting a 0.293 On Base Percentage (OBP)? He’s not likely to end up there. On the other hand, Jake Odorizzi with an Earned Run Average (ERA) under 2.10? He’s good, but not that good. I try to avoid making trades in the first few months (although with several players on my team on the Disabled List, I may have to break my own rule) because I know that in small samples, big fluctuations in statistical performance don’t really tell us much about actual player talent.
One of the big lessons I’ve learned from following baseball and the revolution in sports analytics is that one of the most powerful forces in player performance is regression to the mean. This is the tendency of outlying performances, over repeated measurements, to move back toward the mean, at both the individual and the population-wide level. There’s nothing magical about it; it’s just simple statistical truth.
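A quick simulation makes the point concrete. All the numbers below (a .260 league-average rate, the talent spread, the noise level) are illustrative assumptions, not real data: when noise is large relative to true talent differences, the hottest performers over one stretch fall back toward the pack in the next.

```python
import random

random.seed(42)

N = 1000          # players
TALENT_SD = 0.02  # spread of true talent around a .260 "league average"
NOISE_SD = 0.04   # sampling noise over a stretch of plate appearances

# Each player's observed rate = true talent + independent noise per period
talent = [random.gauss(0.260, TALENT_SD) for _ in range(N)]
period1 = [t + random.gauss(0, NOISE_SD) for t in talent]
period2 = [t + random.gauss(0, NOISE_SD) for t in talent]

# Take the top 5% of performers in period 1...
cutoff = sorted(period1, reverse=True)[N // 20]
hot = [i for i in range(N) if period1[i] >= cutoff]

# ...and see how the same players do in period 2
mean = lambda xs: sum(xs) / len(xs)
print(f"hot players, period 1:  {mean([period1[i] for i in hot]):.3f}")
print(f"same players, period 2: {mean([period2[i] for i in hot]):.3f}")
print(f"league mean:            {mean(period1):.3f}")
```

The period-2 average of the period-1 stars lands much closer to the league mean, with no change in anyone’s underlying talent.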
And as I lift my head up from ESPN and look around, I’ve started to wonder if regression to the mean might be affecting another interest of mine, and not for the better. I wonder if a lack of understanding of regression to the mean might be a problem in our search for ways to achieve better health.
–comment from a chat at FanGraphs, September 24, 2014
So this comment caught my eye. Ever since I began following sites like BaseballProspectus.com and FanGraphs.com, and reading things like Moneyball, I’ve found myself thinking about efficiency and unappreciated or unexplored resources in different situations.
I realize this was a throwaway line in a baseball chat. But it piqued my interest because it seems to point out something that’s maybe underappreciated and understudied about how sports teams go about their business–specifically, the kinds of things they do to keep their athletes healthy.
My question is, does this represent a potential source of “Found Research” data that could help the rest of us reach wellness?
This article originally appeared on my blog “Biotech, Baseball, Big Data, Business, Biology…”
It’s Fourth of July weekend in Seattle as I write this. Which means it’s overcast. This was predictable, just as it’s predictable that for the two months after July 4th the Pacific Northwest will be beautiful, sunny and warm. Mostly.
Too bad forecasting so many other things–baseball, earthquakes, health outcomes–isn’t nearly as easy. But that doesn’t mean people have given up. There’s a lot to be gained from better forecasting, even if the improvement is just by a little bit.
And so I was eager to see the results from a recent research competition in health forecasting. The challenge, which was organized as a crowdsourcing competition, was to find a classifier for whether and how rheumatoid arthritis (RA) patients will respond to a specific drug treatment. The winning methods are able to predict drug response to a degree significantly better than chance, which is a nice advance over previous research.
And imagine my surprise when I saw that the winning entries also have an algorithmic relationship to tools that have been used for forecasting baseball performance for years.
The best predictor was a first cousin of PECOTA.
The MLB draft is coming up and with any luck I’ll get this posted by Thursday and take advantage of web traffic. I can hope! (ed. note: nope) Anyway, on Tuesday at FanGraphs I read a fascinating portrayal of the draft process, laying out the nuts and bolts of how organizations scout for the draft. The piece, written by Tony Blengino (whose essays are rapidly becoming one of my favorite parts of this overall terrific baseball site), describes all the behind-the-scenes work that happens to prepare a major league organization for the Rule 4 draft. Blengino describes the dedication scouts show in following up on all kinds of prospects at the college and high school levels: what they do, how much they need to travel, and especially how much ground they often need to cover to try and lay eyes on every kid in their area.
One neat insight for me was Blengino’s one-word description of most scouts as entrepreneurs. You could think of them almost as founders of a startup, with the kids they scout as the product the scouts are trying to sell to upper layers of management in the organization. As such, everything they can do to get a better handle on a kid’s potential can feed into the pitch to the scouting director.
I respect and envy scouts’ drive to keep looking for the next big thing, the next Jason Heyward or Mike Trout. As Blengino puts it, scouts play “one of the most vital, underrated, and underpaid roles in the game.” One might argue that in MLB, unlike the NFL or NBA, draft picks are typically years away from making a contribution (so how important can they be?), but numerous studies have shown that the draft presents an incredible opportunity for teams in building and sustaining success. In fact, given that so much of an organization’s success hinges on figuring out which raw kids will be able to translate tools and potential into talent, one could make the argument (and others have) that scouting is a huge potential market inefficiency for teams to exploit. Although I’ll have a caveat later. In any case, every team wants to optimize the quality of talent entering its minor league system because, as we say in genomic data analysis, “garbage in, garbage out.”
As I was reading this piece, I started thinking about ways to try and create more efficiencies. And I started thinking about Big Data.
This post originally appeared on my blog Biotech, Baseball, Big Data, Business, Biology…
The world would be a simpler place, although maybe a much more boring and predictable one, if every aspect of performance could be measured directly. My completely unoriginal thought here is that one of the reasons sports appeal to so many people is because they provide clarity. In a confusing, complex world where the NSA is sucking up our information like a Dyson vacuum sucks feathers in a henhouse, and we’re told this is for our own good, clarity can be refreshing.
The simple view of an athlete’s performance is that all the accolades (or jeers), all the milestones (or flops), all the accumulated statistical totals (or lack thereof) are because of that athlete’s ability: his or her drive, passion, training, and natural ability. And that performance is measured via the statistics each sport collects and chooses to honor and promote. Performance is right there, what more do you need? What more could you want?
But much as we might find the simple view intuitive and appealing, it’s also incorrect. Not only are some of those statistics, at best, clearly crude proxies for true ability, they are also often (always?) dependent upon context. Where and to whom did that quarterback throw all those touchdown passes? Which coach directed that basketball player during the prime of her career to play in a style that complemented (or confounded) her natural tendencies and strengths? What elements of that slugger’s personal life were in shambles the year he broke into the major leagues and thrived/struggled, and what difference did it make?*
If sports analysis is moving in any direction, I like to think it’s moving towards a nuanced, humble view of sports performance that accepts the statistics, the measured performance, the team won-loss record, as proxies at best, distant cousins twice-removed from what we are most curious about: who’s good? How good? Was/were he/she/they ever the best? What does this record mean in absolute terms, if such a Platonic thing could ever exist? We might try to find better ways to measure performance and context, but we’ll always be approaching the asymptote, never quite getting there.
And if athletic performance is so hard to measure, how much harder is it to measure those whose actions are another step yet away from the statistics, the solid measurements produced on the field?
What makes a good manager or coach, and how can we tell?
This is a topic of endless debate, and for good reason. Although they are not often paid like it, there is among many a feeling that the manager or head coach is one of the key elements underlying athletic and team success. As Bum Phillips said of Bear Bryant, “He could take his’n and beat your’n, or take your’n and beat his’n.” This is maybe the ultimate expression of the belief that a coach is what makes the team what it is.
Whether that’s true or not is the question, though. It’s clearly not simple. There have been efforts in the sports analysis community to try and figure out how much coaches and managers matter, although sometimes these efforts suffer a little too much from retrospective analysis. For example: “these managers’ teams had winning records, therefore they are better managers. Let’s look at the traits they have in common and say those are the traits that let us classify managers into good and bad.” These kinds of analyses are, I believe, over-fitting the data, and it’s often not long before contrary examples pop up.
So what to do? Well, a working paper put out by the National Bureau of Economic Research showed one possible way (thanks to @freakonomics and @marketplaceAPM).
The methodology may not be completely applicable to the sporting environment, or even to most business environments, but sports, I think, is a closer match than most because of the nature of player and coach movement (as we’ll see in a bit).
This study, by researchers at Stanford and the University of Utah, attempted to answer the question of how much bosses are worth to employee performance. And the method they used, frankly, was based on brute force. They first had to find a business situation that would offer them a huge sample size (23,878 workers, 1940 bosses, and 5,729,508 worker-day measurements of productivity) and a clearly quantifiable and electronically captured measure of productivity: technology-based services (TBS). Think of jobs like retail clerks or call center operators where specific actions are repeated and logged; the specific business that was studied remains nameless as a condition of the research. And the third characteristic that made this work is that this particular company also moves employees from boss to boss on a regular basis — in general once or twice a year.
Let me digress for just a second to expand on why this is so important. In clinical research the gold standard is the double-blind, placebo-controlled trial. Which this was not. But it’s a good deal more rigorous than an anecdotal, under-powered observational study. Essentially, their study design is a retrospective, (effectively) randomized crossover study. This allows each individual’s performance to be compared both within the period of time he or she is working for a given boss and across different bosses. The accumulation of so many data points allowed the researchers to build statistical models that could isolate the effect of specific bosses on performance, even given the vast amounts of noise inherent in the day-to-day performance of these employees.
In addition, their model is designed to discover, a priori, which bosses are best rather than relying on any information from the company under study. In other words, factors such as won-loss records and championships and media savvy don’t enter into the equation. Whether the company or the researchers are going back to their data and corroborating it now with surveys and opinions of the employees, bosses and upper management I don’t know, but that would be fascinating, wouldn’t it?
To give a very high-level summary of their work: they created a mixed model of human capital as the product of talent and effort. Each of those two elements was then further broken down into components that are and are not under the influence of one’s boss. Next, estimation methods were used to approximate the relative effect of different components within the model, including those due to the boss, based on the shape of the overall dataset.**
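To see how regular rotation lets boss effects be teased out at all, here’s a minimal sketch in the same spirit, not the paper’s actual estimator: simulate workers who are reshuffled among bosses each period, then recover the boss effects with a dummy-variable regression. All parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_bosses, n_periods = 200, 10, 6

ability = rng.normal(0, 1.0, n_workers)      # worker fixed effects
boss_effect = rng.normal(0, 0.5, n_bosses)   # boss fixed effects (what we want back)

# Each period every worker is randomly reassigned to a boss,
# mimicking the company's regular rotation of employees.
rows = []
for t in range(n_periods):
    assignment = rng.integers(0, n_bosses, n_workers)
    for i in range(n_workers):
        out = ability[i] + boss_effect[assignment[i]] + rng.normal(0, 1.0)
        rows.append((i, assignment[i], out))

# Dummy-variable regression: output ~ worker dummies + boss dummies
# (drop boss 0 as the reference category for identifiability).
n_obs = len(rows)
X = np.zeros((n_obs, n_workers + n_bosses - 1))
y = np.zeros(n_obs)
for r, (i, j, out) in enumerate(rows):
    X[r, i] = 1.0
    if j > 0:
        X[r, n_workers + j - 1] = 1.0
    y[r] = out

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
est = np.concatenate([[0.0], coef[n_workers:]])   # boss effects relative to boss 0
true = boss_effect - boss_effect[0]
print(np.corrcoef(est, true)[0, 1])               # high correlation = effects recovered
```

Because each worker serves under several bosses, the worker and boss effects are separately identified, which is exactly why the rotation in the studied company matters so much.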
They uncovered several possible effects in their analysis, the primary one being that a top boss can produce about a 10% increase in the productivity of his or her group relative to the worst bosses. They also found that a good boss seemed to affect worker retention, and that there was a small but significant effect of pairing good workers with good bosses.
Generalizing the specific findings directly in any way to sports management is completely unwarranted. There are several key differences between the situation they analyzed and the team environment that should not be overlooked: the diversity of actions taken by any individual athlete in a team setting (as opposed to rote, repetitive work like taking reservations in a call center); the effect of peers, which is likely greater in a sports environment than in the work situation described in this study (where workers were generally autonomous in their tasks); and the fact that athletes often get different bosses by moving between establishments (teams), whereas the current study examined a single company.
However, what I think is worth exploring is whether a similar methodology could be applied to sports teams. Let me just say that I will not attempt the exploration myself; I’m just pointing out the possibilities. So anyone hoping for a big take-home message can stop now. Sorry for taking five minutes of your life!
Here are what I see as the requirements of a sport that would allow generation of a large and diverse enough dataset.
1) Specific measurements of output. As discussed in the way-too-long-winded introduction above, one thing sports has plenty of is measurements of output. Except soccer. What do people measure in soccer? YouTube video highlights of great runs followed by missed kicks?***
2) A large number of transitions of coaches/managers and players between situations. Fortunately in this age of free agency, trades and hot seats, there are routinely numerous changes of players and coaches/managers every season. Also fortunately, players and coaches/managers often get multiple chances with different teams and situations.
3) Enough data. This is a tough one. Off the top of my head it seems baseball and basketball are really the only sports that have enough granularity, a long enough season, and sufficient numbers of teams and players to make this work. Maybe hockey. American football, probably not.
However, it seems worth a try. To explore this idea further with baseball as an example, one could isolate a single component of performance, such as a hitting statistic. Since we would want to measure something that is both generally agreed upon as positive and also quick to stabilize, the better to reflect the effect of coaching, one could pick strikeout rate (stabilizes at around 60 plate appearances (PA)), walk rate (120 PA), or singles rate (290 PA). It should be stated up front that this approach captures only the manager’s effect on that particular skill. The entire analysis would probably need to be repeated for each of several offensive statistics to create a composite and granular picture of how a given manager influences the players under his direction.
One could then use individual game performance as the time component of the model, collect data on that specific metric over time, and relate it to which managers a given player had and for how long. The null hypothesis would be that managers have no effect; the result we would look for is a sign that specific managers make a substantial difference in the performance of the players under their instruction, compared to those same players before and after being on that manager’s team.
Is there enough data for a signal to be seen? You know, in my day job as a genomics researcher this is probably the main question I get from scientists wanting to perform an experiment: is the number of experimental subjects big enough? And my answer is always the unsatisfying, “We won’t know for sure until we do the experiment and compare the natural variation to the effect size.” Same answer here.
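That said, you can at least sketch the experiment before running it. Here’s a toy power simulation: how often would a manager who shifts his hitters’ strikeout rate by a given amount be distinguishable from chance? The league rate, effect sizes, and roster/PA counts are all assumptions for illustration, and the “test” is a crude z-test rather than the full model described above.

```python
import numpy as np

rng = np.random.default_rng(7)

def detection_rate(effect, n_players=30, pa=600, sims=2000, base=0.20):
    """Fraction of simulated seasons in which a manager who shifts every
    player's strikeout rate by `effect` is distinguishable from chance.
    Compares group mean K rates with a simple two-sided z-test."""
    control = rng.binomial(pa, base, (sims, n_players)) / pa
    treated = rng.binomial(pa, base + effect, (sims, n_players)) / pa
    diff = treated.mean(axis=1) - control.mean(axis=1)
    se = np.sqrt(2 * base * (1 - base) / (pa * n_players))
    return np.mean(np.abs(diff) > 1.96 * se)

for effect in (0.0, 0.005, 0.01, 0.02):
    print(f"K-rate shift of {effect:+.3f}: detected in "
          f"{detection_rate(effect):.0%} of simulated seasons")
```

The pattern you’d expect: a two-point shift in strikeout rate is detectable almost every season, a half-point shift almost never, which is exactly the “natural variation versus effect size” trade-off.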
And why bother? Well, I go back to what I said earlier about approaching the asymptote and trying to learn more. Not just in sports, but in so many other parts of life, there are elements that right now are in the realm of intuition, anecdote and subjectivity. Who’s a good CEO? Which public policy interventions do good and, perhaps more important, are the most cost-effective? Wired magazine just ran a nice article about the use of controlled trials to measure the actual effect of public policy interventions in the developing world. In our search to make the world a better and more understandable place, we owe it to ourselves to keep asking questions and to keep trying to come up with ways to answer them.
*Notice here, by the way, that you can take either situation–thriving or failing–and make up a completely believable story in your head about how that player’s personal life played a role in his performance. How he rose above the conflict, or the field was his refuge, or his anger or frustration fed his on-field performance. Alternatively, how he’s a tragic figure, his potential derailed by drugs/philandering/emotions, making him an all too human and very sympathetic figure. This is because our minds are programmed to make up stories, to find cause and effect, to indulge in the narrative fallacy. Be careful of that. It will screw up your thinking faster than anything else.
**Just for fun, here’s one of the equations from their model.
This roughly translates as: an individual worker (i)’s output (q) at time t equals the ability of the mean worker (alpha sub zero), plus the specific worker’s innate ability (alpha sub i), plus a set of variables outside the worker-boss interaction (X sub it times capital beta), plus the ability of an average boss (d sub 0t) divided by team size (N sub jt) raised to the theta power (where theta is related to public versus private time with the boss), plus the ability of the current boss (d sub jt) divided by team size raised to the theta power. This equation captures the current boss’s effect; a longer version tries to take into account the effect of past bosses and the persistence of boss effects.
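Putting that verbal description into symbols, the equation looks roughly like this (my reconstruction from the gloss above, so the notation follows the description rather than the paper’s exact typography):

```latex
q_{it} = \alpha_0 + \alpha_i + X_{it}\beta
       + \frac{d_{0t}}{N_{jt}^{\theta}}
       + \frac{d_{jt}}{N_{jt}^{\theta}}
```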
***I am reminded of one of the fine haikus inspired by the 1994 World Cup tournament in the US, source sadly lost to me although if anyone remembers it, let me know:
“Run, run, run, run, run
Run, run, run, run, run, run, run
Run, run, pass, shoot, miss.”
This post originally appeared in slightly different form on my blog: Biotech, Baseball, Big Data, Business, Biology…
Baseball players break down. Their performances fluctuate. As a group there are some interesting generalities with respect to how pitching, hitting and fielding change with age. But the error bars are huge. There are many things we still don’t know about baseball players, about why one prospect hits the ground running and another flames out. And we also don’t know if there is any way to know, since the task of putting together the skills needed to play major league baseball may be one of the most complex of the major sports, and understanding complexity is hard.
But it seems worthwhile to give it a try.
The Mystery of the Missing Ligament
Let’s talk about R.A. Dickey for a minute. Not because he’s a highly interesting human being, although he is. And not because he’s a knuckleballer, which is fun and interesting due to rarity and the entertaining sight of six-foot athletes flailing at baseballs traveling with the flight path of a drunken small-nosed bat. But rather because he was drafted in 1996 in the 1st round by the Texas Rangers, and only during his physical workup was it discovered that he was missing a key ligament in his arm. The Ulnar Collateral Ligament (UCL), to be exact. Without which, it is assumed, a pitcher cannot pitch.
Well, except that he did. This point can’t be emphasized enough. Pitching without a UCL is thought to be akin to trying to play tailback for the Seahawks without an Anterior Cruciate Ligament (ACL) in your knee. And yet he pitched, and pitched well, for years without a UCL. R.A. Dickey got his UCL replaced and then knocked around the major and minor leagues for several years, eventually learned how to throw a knuckleball, and has now pitched successfully in the majors for several years more.
A story like this illustrates two points. One, we may be making assumptions that aren’t always supported by the data—for example, that the UCL is required for pitching. And two, you can learn a lot just by looking and measuring.
Measure by Measure
What should be measured and how? I think an area to look into might be the tools being developed now to support self-measurement. The quantified-self movement has gained enough prominence that magazines like Newsweek are running profiles. For people in the movement, the motivation stems from a desire to better understand themselves: to have a data-driven view of what is going on in their bodies and minds. The goals are often better health, weight loss, mood tracking, athletic prowess, raising the levels of good indicators and lowering the levels of bad ones.
One of the distinguishing elements of how this is being done is granularity. Apps on a smartphone, portable electronic devices, and logging tools can capture data at intervals ranging from several times a day up to a more or less continuous stream. Even tests and procedures that might normally be performed once a year at an annual physical become fair game for more frequent monitoring, as long as you have the money to pay for the testing. The open question is whether collecting all of this data will reveal new insights. Or, to put it graphically: if you tested a metric infrequently and saw an apparently stable trend, would more frequent testing trace out the same flat line, or would it reveal structure that the sparse sampling had missed?
This example is borrowed from the site of Ginger.io, a company that is developing tools for continual measurement of health-related metrics, among other things.
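The difference between those two possibilities is easy to demonstrate with made-up data: a metric with a genuine monthly cycle looks flat when sampled once a year. Everything here, including the 30-day period and the noise level, is a hypothetical illustration.

```python
import math
import random

random.seed(1)

# A hypothetical health metric with a genuine 30-day cycle plus noise.
def metric(day):
    return 100 + 5 * math.sin(2 * math.pi * day / 30) + random.gauss(0, 1)

days = range(720)                      # two years of daily values
daily = [metric(d) for d in days]
annual = [daily[0], daily[360]]        # what a once-a-year physical would see

# The sparse series looks flat; the dense series shows the oscillation.
spread_annual = max(annual) - min(annual)
spread_daily = max(daily) - min(daily)
print(f"range seen with annual sampling: {spread_annual:.1f}")
print(f"range seen with daily sampling:  {spread_daily:.1f}")
```

Two annual readings can easily land on the same part of the cycle and suggest nothing is happening, while daily sampling exposes the full swing.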
Where baseball comes into this: I believe MLB teams are continually searching for new ways to gain an advantage in building a quality team. You know, that extra 2%. A baseball team has vast resources, and those resources are focused on getting the most out of the several hundred baseball players who comprise the major and minor league talent of the team. There are trainers, and doctors, and team dieticians, and masseuses, and coaches. What would it take to add a technological and analytical group dedicated to gathering data on the players and seeing whether any of this information provides additional retrospective or prospective insight into individual performance?
Here is where an enterprising team could probably reach out to a few different groups for help in setting this up. One would be device and software manufacturers who are building tools in this space. I’ve written before about EmotionSense and have also learned recently about Ginger.io (HT to @Dshaywitz). Another highly interested party would be the nearest medical school and those researchers looking into patient-reported outcome (PRO) techniques and patient monitoring efforts. If an MLB team doesn’t already have its own high-powered statistical analysis group (or even if it does), it could reach out to suppliers of software tools for analyzing large-scale datasets and finding patterns, like Ayasdi or Google.
I could also see other professional sports teams as viable partners. Many MLB teams are in the same city as NFL, NBA, NHL, and/or MLS franchises. To spread the investment costs as well as provide control groups for each other, it would be useful to collaborate with these other franchises to learn more about the effects of sports training in general.
A speculative area for data collection and analysis could be in genomics, transcriptomics and proteomics. Michael Snyder of Stanford University has been demonstrating for some years now how a program of monitoring personal molecular information about one’s health, along with other more conventional measures, provides new insights into health and disease.
The metrics should also include the conventional. Going back to the example of R.A. Dickey, wouldn’t it be useful to perform elbow and shoulder scans for every player on major and minor league rosters on at least a yearly basis? So often in sports you hear the term “typical wear and tear” when describing an elbow or shoulder or knee. My question is, how do you know it’s typical? Until you have a large, well-defined baseline that you follow for years under the rigorous conditions that baseball players are subjected to, how can you know what real wear and tear is? And if you did know, wouldn’t that help you in making decisions about training and protecting your own players, to say nothing of evaluating free agents? One of the truisms of baseball is that every team knows more about their own players than anyone else, leading to information asymmetry in trading and signing. It seems an imperative for each team to reduce or reverse that asymmetry if at all possible.
An additional area where personal monitoring could help is understanding on-field performance. I’ve already touched on how MLB could use various kinds of GPS and positioning sensors to more accurately measure defense, for example, so I won’t elaborate further, except to point out that Chip Kelly is bringing this approach to the Philadelphia Eagles, and it will be interesting to see whether we get reports on the effectiveness of using GPS to monitor his NFL players’ movements.
Another benefit of building a baseline for different kinds of metrics in your team would be helping to detect the possibility of doping. This seems to be in the news right now for some reason, so let me just say that if a team began collecting, analyzing and storing biological samples on a regular basis, this would help in detecting those who are taking performance-enhancing substances. This isn’t a new idea; the World Anti-Doping Agency is advocating this approach already. However, I think MLB could take it to a high level of rigor and quality. Would this have to be negotiated? Sure. But there is probably no better time than now to see if such an agreement can be forged between the union and the MLB owners.
Essentially, by taking samples from enough players over time, as well as from healthy, age- and ethnicity-matched volunteers as a control group, an MLB team could build up a comprehensive profile of what normal is with respect to known indicators of performance enhancement such as hematocrit levels, not just as an average but on an individual basis. With this kind of data, a rapid, unusual change in specific metabolites could provide grounds for more intensive investigation. When an athlete comes up with a positive test, a standard argument has been that he or she has always had an unusually high level of the tested substance. Well, you know, the only way to know that for sure is to have a record dating back years that demonstrates outlier status, or not, for that athlete and that test. Continual sampling is almost certain to deter many would-be attempts to use performance-enhancing substances.
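A sketch of what the individual-baseline idea might look like in code. The hematocrit values and the 3-sigma cutoff are hypothetical illustrations, not a real anti-doping standard:

```python
import statistics

def flag_anomaly(history, new_value, z_cutoff=3.0):
    """Compare a new biomarker reading (e.g., hematocrit %) to a player's
    own longitudinal baseline rather than a population average.
    Returns the z-score and whether it exceeds the cutoff."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    z = (new_value - mu) / sigma
    return z, abs(z) > z_cutoff

# Hypothetical player: years of stable hematocrit readings around 44%
history = [43.8, 44.1, 44.5, 43.9, 44.2, 44.0, 44.3, 43.7, 44.4, 44.1]

print(flag_anomaly(history, 44.3))  # within this player's normal range
print(flag_anomaly(history, 49.0))  # abrupt jump -> grounds for a closer look
```

The point is that a reading that looks unremarkable against a league-wide range can still be a glaring outlier against the player’s own multi-year record, and vice versa.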
This would be invasive. No doubt about it. Which is why there should also be stringent controls on data and better maintenance of privacy than we’ve seen so far in the Biogenesis saga. However, there is also probably no better time to negotiate these kinds of tests as baseball strives to clean its image again.
Too much data?
Of course, collecting all this data provides no guarantee of actually finding something specifically useful and actionable for any given MLB team. As Nate Silver has pointed out many times in his columns and book, given enough data you can find a correlation for almost anything. However, one thing is certain: you can’t find new things if you don’t look, and trying to apply concepts of the quantified self to MLB teams would lead to a whole lot of cross-discipline interaction and innovative thinking, which a forward-looking team might be able to parlay into the next big market inefficiency in baseball.