Author Archive

Dominican Major Leaguers and the Provinces They Hail From

It shouldn’t come as any great surprise to a typical baseball fan that Dominican players play an outsized role in Major League Baseball today. In fact, the Dominican Republic, whose population is only about 3.3% that of the United States, supplies MLB with upwards of 10% of its players. Major League Baseball and baseball fans are better off because of this. After all, who wants to live in a baseball world without Nelson Cruz or Fernando Tatis Jr.?

With this point in mind, the following takes a look at players from the Dominican Republic: more specifically, where in the D.R. they were born and when they made their way to MLB. What follows is split into three brief sections: a description of the data used, some insights into the growth of the D.R.’s influence in MLB, and finally some map-based depictions of the players’ provinces of birth within the Dominican Republic.

Using Decision Trees To Classify Yu Darvish Pitch Types

Last year, I wrote a post that outlined the application of a k-nearest neighbors algorithm to pitch classification. This post is, in some ways, an extension of that one, as pitches will again be classified using a machine learning model. However, as one might have presumed from this post’s title, the learner of choice here is a decision tree. Additionally, instead of classifying pitches thrown over the course of a single game, this time I will aim to classify pitches thrown by a single pitcher over the course of an entire season.

What follows is divided into three sections: a brief conceptual explanation of decision tree learners, a description of the data and the steps taken to train the decision tree model, and finally a run-through of the model’s results. I am not an expert on machine learning, but I believe that this is an interesting exercise that (very, very basically) highlights a powerful model using interesting baseball data. The work to support this post was conducted in the scripting language R, with the direction of the book Machine Learning with R by Brett Lantz.