Shape+ v2.0: Isolating Pitch Quality With Relative Physics and Additive Modeling


Last year, I finished Shape+ v1.0, my attempt at a novel approach to pitch modeling, and submitted it to the FanGraphs Community Blog. Since then, I have identified several issues with the original framework. In short, v1.0 included a random effect for “PitcherID,” which allowed the model to implicitly learn pitcher identity and lean on it as a crutch when generating predictions.

Instead of evaluating a pitch on its own merits as I intended, the model would defer to the random effect, which soaked up all information not explicitly encoded in the model’s other terms. The random effect also prevented the model from generalizing: a pitcher who was not in the training data would not receive an intercept from the random effect and therefore could not be scored.

To fix this, I took a different approach. Shape+ v2.0 is scaled to a normal distribution, with 100 representing league-average pitch shape and each 10 points representing one standard deviation. The new Shape+ remains a stable and sticky metric, with a year-over-year R² of 0.881 between 2024 and 2025 scores, and it stabilizes after 50–80 pitches depending on pitch type.
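The year-over-year R² above is, in its simplest form, the squared correlation between matched scores in consecutive seasons. Below is a minimal base-R sketch, assuming scores are matched on pitcher and pitch type; the article does not specify the matching unit or any minimum-pitch filter, and the data frames, column names, and values here are hypothetical.

```r
# Hypothetical pitcher/pitch-type Shape+ tables for two seasons
s24 <- data.frame(
  pitcher = c(1, 1, 2, 3),
  pitch_type = c("FF", "SL", "SI", "CH"),
  shape_plus = c(104, 97, 110, 92)
)
s25 <- data.frame(
  pitcher = c(1, 1, 2, 4),
  pitch_type = c("FF", "SL", "SI", "FF"),
  shape_plus = c(106, 95, 108, 101)
)

# Keep only pitcher/pitch-type pairs that appear in both seasons
paired <- merge(s24, s25, by = c("pitcher", "pitch_type"),
                suffixes = c("_2024", "_2025"))

# Year-over-year R^2 from a simple linear fit (equal to the squared Pearson r)
yoy_r2 <- summary(lm(shape_plus_2025 ~ shape_plus_2024, data = paired))$r.squared
```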

Model v2.0

The biggest shift, at least conceptually, is that v2.0 no longer evaluates pitches in a vacuum. In a real game, hitters aren’t deciding to swing based on isolated physics; they are using context (pitcher tendencies, counts, sequencing, how previous pitches have moved). Since hitters process pitch shape relatively, I wanted the model to work the same way.

To capture this relative movement, I trained two Ordinary Least Squares (OLS) models to estimate what a pitch’s VAA and HAA should be given its velocity and plate location. Subtracting that expectation isolates the residual, which lets the model learn which specific shapes hitters struggle to perceive the most, and why.

Unlike the four-layer hierarchical mixed-effects framework used in v1.0, v2.0 is a single-layer Generalized Additive Model (GAM). By using a random-effect smooth for pitch type, I was still able to isolate the marginal contribution of each physical feature. I also elected to exclude spin rate from the model, since the existing features already act as proxies that implicitly capture it.

Features included in v2.0 are as follows:

Physical Characteristics
• release_speed: Velocity
• release_extension: Distance from the rubber at release
• release_pos_z: Vertical release height
• rel_x_sym: Horizontal release point, mirrored for handedness
• pfx_z: Vertical break
• pfx_x_sym: Raw horizontal break, mirrored for handedness

Engineered Features
• ivb_diff: Induced vertical break relative to pitch type average
• hb_diff_sym: Horizontal break relative to pitch type average, mirrored for handedness
• vaa_resid: Vertical approach angle residual, i.e., VAA relative to the OLS expectation for its velocity and plate height
• haa_resid_sym: Horizontal approach angle residual, i.e., HAA relative to the OLS expectation for its velocity and horizontal plate location, mirrored for handedness

Tensor Interactions
• ti(release_speed, pfx_z): The interaction between velocity and vertical break
• ti(release_speed, vaa_resid): The interaction between velocity and vertical approach angle

Categorical Variables
• p_throws: Pitcher handedness
• pitch_type: Uses a random effect to set a unique baseline for each pitch type

Data Processing

I used 2023 and 2024 MLB Statcast data to train this model. To prevent data leakage and test the model’s predictive ability, I held out the 2025 season for validation.

To prepare the raw tracking data for modeling, the following steps were taken:

• Standardized units: I converted raw vertical and horizontal break from feet to inches.
• Filtered noise: I removed rows with missing values, malformed rows, and pitches thrown by position players.
• Calculated approach angles: VAA and HAA are not included among the raw Statcast columns, so I calculated them myself.
• Mirrored horizontal data: I flipped horizontal release and movement profiles for lefties so that positive values represent arm-side run, and negative represent glove-side sweep.
• Calculated flight time: I calculated each pitch’s flight time to the plate.
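The processing steps above can be sketched in R as follows. This is an illustrative sketch, not the author’s pipeline: it assumes Statcast’s vx0/vy0/vz0 and ax/ay/az columns (velocity and acceleration measured at y = 50 ft, in feet and seconds) and uses a common constant-acceleration formula for VAA and HAA at the front of the plate. The `prep_pitches()` helper and its `position_players` argument (a vector of pitcher IDs to drop) are hypothetical. `flight_time_50` is the time from the 50-ft mark to the front of the plate; a full release-to-plate time would also need the segment from release (roughly 60.5 ft minus release_extension) to 50 ft.

```r
Y0 <- 50          # y (ft) at which Statcast reports vx0/vy0/vz0 and ax/ay/az
Y_PLATE <- 17 / 12  # y (ft) of the front edge of home plate

prep_pitches <- function(df, position_players = character(0)) {
  # Filter noise: drop rows missing any required field, then (hypothetical) position players
  needed <- c("pitcher", "p_throws", "pfx_x", "pfx_z", "release_pos_x",
              "vx0", "vy0", "vz0", "ax", "ay", "az")
  df <- df[complete.cases(df[, needed, drop = FALSE]), ]
  df <- df[!(df$pitcher %in% position_players), ]

  # Standardize units: Statcast break is reported in feet
  df$pfx_x <- df$pfx_x * 12
  df$pfx_z <- df$pfx_z * 12

  # Constant-acceleration kinematics from y = 50 ft to the front of the plate
  vy_f <- -sqrt(df$vy0^2 - 2 * df$ay * (Y0 - Y_PLATE))  # vy stays negative (toward plate)
  t <- (vy_f - df$vy0) / df$ay
  vz_f <- df$vz0 + df$az * t
  vx_f <- df$vx0 + df$ax * t
  df$flight_time_50 <- t
  df$vaa <- -atan(vz_f / vy_f) * 180 / pi  # degrees; negative for downward-moving pitches
  df$haa <- -atan(vx_f / vy_f) * 180 / pi

  # Mirror horizontal data for left-handers; confirm which sign means "arm side"
  # against a known pitch, since that depends on the raw coordinate convention
  lefty <- df$p_throws == "L"
  df$pfx_x_sym <- ifelse(lefty, -df$pfx_x, df$pfx_x)
  df$rel_x_sym <- ifelse(lefty, -df$release_pos_x, df$release_pos_x)
  df$haa_sym <- ifelse(lefty, -df$haa, df$haa)
  df
}
```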

Location Neutrality

To strip location from the model, I first calculated the mean horizontal and vertical break for each pitch type. Then I built two OLS models to predict the expected approach angle for every pitch in the dataset:

• vaa_anchor_model <- lm(vaa ~ release_speed + plate_z, data = mlb_train)
• haa_anchor_model <- lm(haa_sym ~ release_speed + plate_x_sym, data = mlb_train)

These models establish the league-average approach angle for a given velocity and plate location. Subtracting the OLS model’s expected angle from the observed angle isolates how unusual a pitch’s angle is relative to its speed and location. The same idea applies to ivb_diff and hb_diff_sym: measuring how much more (or less) movement a pitch has than its pitch-type average lets the model evaluate movement in relative terms.
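A minimal sketch of the location-neutral features, reusing the two anchor formulas from the article on simulated data. The simulated columns and coefficients are hypothetical; residuals are computed as observed minus expected, and the movement differences subtract each pitch type’s mean break (base R’s `ave()` defaults to the group mean).

```r
set.seed(42)
n <- 500
# Hypothetical stand-in for the real training data
mlb_train <- data.frame(
  pitch_type = sample(c("FF", "SI", "SL"), n, replace = TRUE),
  release_speed = rnorm(n, 92, 4),
  plate_z = rnorm(n, 2.5, 0.8),
  plate_x_sym = rnorm(n, 0, 0.8),
  pfx_z = rnorm(n, 10, 5),
  pfx_x_sym = rnorm(n, 3, 6)
)
mlb_train$vaa <- with(mlb_train, -12 + 0.05 * release_speed + 1.2 * plate_z + rnorm(n, 0, 0.5))
mlb_train$haa_sym <- with(mlb_train, 0.01 * release_speed + 1.5 * plate_x_sym + rnorm(n, 0, 0.5))

# Anchor models, as written in the article
vaa_anchor_model <- lm(vaa ~ release_speed + plate_z, data = mlb_train)
haa_anchor_model <- lm(haa_sym ~ release_speed + plate_x_sym, data = mlb_train)

# Approach-angle residuals: observed minus expected
mlb_train$vaa_resid <- mlb_train$vaa - predict(vaa_anchor_model, newdata = mlb_train)
mlb_train$haa_resid_sym <- mlb_train$haa_sym - predict(haa_anchor_model, newdata = mlb_train)

# Movement relative to the pitch-type average
mlb_train$ivb_diff <- mlb_train$pfx_z - ave(mlb_train$pfx_z, mlb_train$pitch_type)
mlb_train$hb_diff_sym <- mlb_train$pfx_x_sym - ave(mlb_train$pfx_x_sym, mlb_train$pitch_type)
```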

Once these features were engineered, I fit a separate linear model of delta run expectancy on count and location. The resulting residual (the portion of a pitch’s run value that cannot be explained by location or count) became the target variable for the v2.0 model. This framework forces the model to explain pitch success using only physics.
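The target construction can be sketched the same way. The article does not say how count and location enter the context model, so the specification below (count as a factor, quadratic terms in plate_x and plate_z) is an assumption, as are the simulated data; the residual from this fit plays the role of the v2.0 target.

```r
set.seed(7)
n <- 1000
# Hypothetical pitches with count and location
pitches <- data.frame(
  balls = sample(0:3, n, replace = TRUE),
  strikes = sample(0:2, n, replace = TRUE),
  plate_x = rnorm(n, 0, 0.8),
  plate_z = rnorm(n, 2.5, 0.8)
)
pitches$count <- factor(paste(pitches$balls, pitches$strikes, sep = "-"))
pitches$delta_run_exp <- with(pitches,
  0.02 * balls - 0.03 * strikes + 0.05 * plate_x^2 + rnorm(n, 0, 0.1))

# Context model: count plus (assumed) quadratic location terms
context_model <- lm(delta_run_exp ~ count + poly(plate_x, 2) + poly(plate_z, 2),
                    data = pitches)

# Run value left over after removing count and location: the modeling target
pitches$rv_resid <- residuals(context_model)
```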

Model Engine

Because the training dataset was so large (approximately 1.42 million pitches), I opted for a BAM (Big Additive Model) to handle the computational load. A BAM is a computationally efficient version of a Generalized Additive Model designed specifically for massive datasets.

Key components of the BAM framework are as follows:

• Splines: These allow the model to learn complex, non-linear relationships between features and run expectancy without forcing them into more rigid forms.
• Tensors: These let the model learn multi-dimensional interactions between specific terms, e.g., how the impact of VAA changes with velocity.
• Random Effects: Including a random effect for pitch type allows the model to learn that different pitch types have fundamentally different rules for success.
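The article does not name its software, but the terms it describes (splines, `ti()` tensor interactions, a random effect for pitch type, and a BAM engine) map directly onto R’s mgcv package. Assuming mgcv, a model with that structure might look like the sketch below, fit on simulated data with only a subset of the listed features; the column values and the simulated target are hypothetical.

```r
library(mgcv)  # provides bam(), s(), and ti()

set.seed(1)
n <- 3000
sim <- data.frame(
  release_speed = rnorm(n, 92, 4),
  pfx_z = rnorm(n, 10, 6),
  vaa_resid = rnorm(n, 0, 0.8),
  hb_diff_sym = rnorm(n, 0, 3),
  p_throws = factor(sample(c("L", "R"), n, replace = TRUE)),
  pitch_type = factor(sample(c("FF", "SI", "SL", "CH"), n, replace = TRUE))
)
# Hypothetical target standing in for the count/location-adjusted run value
sim$rv_resid <- with(sim, -0.002 * (release_speed - 92) - 0.01 * vaa_resid + rnorm(n, 0, 0.2))

fit <- bam(
  rv_resid ~ p_throws +                                              # handedness (fixed effect)
    s(release_speed) + s(pfx_z) + s(vaa_resid) + s(hb_diff_sym) +    # main-effect splines
    ti(release_speed, pfx_z) + ti(release_speed, vaa_resid) +        # interaction-only tensors
    s(pitch_type, bs = "re"),                                        # random effect per pitch type
  data = sim,
  method = "fREML"
)
```

Using `ti()` rather than `te()` keeps the interaction surfaces separate from the main-effect smooths, so each term can be read on its own. For datasets in the millions of rows, mgcv’s `bam()` also offers `discrete = TRUE` (with `method = "fREML"`) to speed up fitting further.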

Establishing Baseline for Scoring

Once training was complete, I computed the mean and standard deviation of the model’s predicted run values using only the 2023–2024 data. Freezing these constants keeps the scale fixed regardless of how future data fluctuates. Scores are then standardized so that 100 is league average and each 10-point increase represents one standard deviation above average.
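The scaling step amounts to a z-score computed with constants frozen on the 2023–2024 predictions. A minimal sketch; `make_shape_scaler()` is a hypothetical helper, and it assumes a higher predicted run value is worse for the pitcher, so the z-score is negated (drop the minus sign if the target is oriented the other way).

```r
# Build a scoring function whose mean and SD are frozen on baseline predictions
make_shape_scaler <- function(baseline_pred) {
  mu <- mean(baseline_pred)
  sigma <- sd(baseline_pred)
  function(pred) {
    # Negated z-score: lower predicted run value -> higher Shape+
    100 - 10 * (pred - mu) / sigma
  }
}
```

For example, with baseline predictions `c(-0.1, 0, 0.1)`, a pitch predicted at -0.1 scores 110 and one predicted at 0.1 scores 90.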

Results and Validation

To most accurately test the model’s predictive performance, Shape+ v2.0 was evaluated against the unseen 2025 Statcast data. I examined its descriptive and predictive correlations with several metrics including SIERA, xFIP, xwOBA, xERA, xBA, xSLG, K%, and SwStr%.

• Predictive Error: RMSE = 0.226, MAE = 0.121
• Agreement with Stuff+: On unseen 2025 data, Shape+ showed a 0.757 correlation with Stuff+. (Note: It is unknown whether Stuff+ used 2025 data in training.)
• Run Prevention: The model showed statistically significant negative correlations with SIERA (r = -0.499 descriptively, r = -0.478 predictively), xERA (r = -0.431 descriptively, r = -0.391 predictively), and xFIP (r = -0.430 descriptively, r = -0.433 predictively).
• Whiff Generation: Shape+ also showed positive correlations with K% (r = 0.475 descriptively, r = 0.501 predictively) and SwStr% (r = 0.428 descriptively, r = 0.474 predictively).
• Contact Quality: Shape+ correlated negatively with xSLG (r = -0.488 descriptively, r = -0.464 predictively), xBA (r = -0.488 descriptively, r = -0.466 predictively), and xwOBA (r = -0.469 descriptively, r = -0.427 predictively).
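For reference, the error and correlation figures above can be computed with a few lines of base R. This sketch assumes “descriptive” means 2025 Shape+ against 2025 outcomes and “predictive” means 2024 Shape+ against 2025 outcomes; the article does not define the terms, and the helper names are hypothetical.

```r
# Error metrics for pitch-level predictions
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mae <- function(obs, pred) mean(abs(obs - pred))

# Pearson correlation plus the p-value from cor.test()
corr_with_p <- function(x, y) {
  ct <- cor.test(x, y, method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}

# Hypothetical usage on a pitcher-level table:
# descriptive: corr_with_p(pitchers$shape_2025, pitchers$siera_2025)
# predictive:  corr_with_p(pitchers$shape_2024, pitchers$siera_2025)
```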

Limitations

Shape+ is not without limitations. The most notable constraints are listed below:

• Extrapolation: Because the model relies on splines to map non-linear relationships, it can produce unpredictable grades at the extreme boundaries of the dataset, meaning it struggles to accurately evaluate rare or unique pitch shapes it has not seen in training.
• Biomechanical Tradeoffs: In its current state, the model ignores human physiological limits/constraints. (Example: The model might suggest a pitcher could improve a pitch by adding x amount of induced vertical break, but it cannot currently account for the fact that doing so may require adjustments that affect other variables.)
• Location and Command Agnostic: This is an intentional “shortcoming.” The model does not account for location, command, or sequencing, in order to better isolate the pitch shape’s impact on run prevention.
• Linear Anchor Models: The VAA and HAA baselines are built using simple linear models, which don’t fully capture how those variables actually behave. For most pitches this works fine, but it can introduce some bias for more extreme shapes.
• Additive Assumption: The model treats location and count as things that can be cleanly removed from run value. In reality, pitch movement and location interact, so some of that context is likely still baked into the target.
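One way to act on the extrapolation limitation is to flag pitches whose features fall outside the bulk of the training data before trusting their grades. This is a simple range check, not something the article describes; the helper name and the 0.5%/99.5% cutoffs are assumptions, and passing a training subset for a single pitch type (e.g., only sinkers) makes the check pitch-type specific.

```r
# TRUE for each row of `new` with any feature outside the training quantile range
flag_extrapolation <- function(train, new, features, probs = c(0.005, 0.995)) {
  outside <- sapply(features, function(f) {
    lims <- quantile(train[[f]], probs = probs, na.rm = TRUE)
    new[[f]] < lims[1] | new[[f]] > lims[2]
  })
  # sapply() returns a plain vector when `new` has one row; force a rows x features matrix
  outside <- matrix(outside, nrow = nrow(new))
  rowSums(outside) > 0
}
```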

Case Study

Now for the fun part! Let’s take a deeper look at Mariners All-Star starter Bryan Woo. Below we can see his Shape+ scores by pitch type, and a quick glance at his Stuff+ scores for reference.

While many are already familiar with Woo’s elite, flat-VAA four-seamer, the Cal Poly product’s sinker grades out a tick better by Shape+. To see why, we can look at a more granular breakdown of the sinker’s 106 Shape+.

Deconstructing the components of Woo’s sinker score shows that velocity, release height, and horizontal run are its primary drivers, while its flat VAA actually hurts the score a bit. Release height and velocity are difficult to alter, but by predicting Shape+ for 30 hypothetical Woo sinkers (identical except for horizontal run, which ranges from 0 to 25 inches), we can see exactly how horizontal run changes the model’s grade.

Adding two inches of horizontal run alone could take Woo’s sinker from a 106 to a 108 Shape+!
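The hypothetical-sinker sweep described above boils down to copying one pitch, varying a single feature, and predicting. A minimal sketch: `sweep_feature()` works with any model that has a `predict()` method (an mgcv fit included), but the toy linear model, the base sinker’s values, and the column names below are hypothetical stand-ins, not Woo’s actual data or the Shape+ model. The returned predictions are run values; converting them to Shape+ would use the frozen 2023–2024 mean and standard deviation.

```r
# Repeat one base pitch, vary a single feature, and predict each copy
sweep_feature <- function(model, base_row, feature, values) {
  grid <- base_row[rep(1, length(values)), , drop = FALSE]
  grid[[feature]] <- values
  grid$pred <- predict(model, newdata = grid)
  grid
}

# Toy model on simulated sinkers (hypothetical)
set.seed(3)
train <- data.frame(release_speed = rnorm(300, 94, 2), pfx_x_sym = runif(300, 8, 20))
train$rv_resid <- with(train, -0.004 * pfx_x_sym + rnorm(300, 0, 0.05))
toy_model <- lm(rv_resid ~ release_speed + pfx_x_sym, data = train)

# 30 copies of a hypothetical base sinker, horizontal run from 0 to 25 inches
base_sinker <- data.frame(release_speed = 95, pfx_x_sym = 14)
curve <- sweep_feature(toy_model, base_sinker, "pfx_x_sym", seq(0, 25, length.out = 30))
```

Note that the toy training data only span 8 to 20 inches of run, so the ends of this curve are extrapolated, which mirrors the caveat below.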

This provides a good opportunity to highlight both a necessary caveat and a strength of the model. The left side of the curve (from about 0 to 8 inches) represents an area of extrapolation. Because sinkers with such little movement do not exist in the dataset, the model is forced to guess. However, when dealing with pitch shapes present in the training data, the model provides reliable grades that can in theory be applied to development.

More broadly, we can also look at what the model says makes sinkers, changeups, and curveballs effective.

Finally, we can pull the top 10 and bottom 10 pitchers in 2025 by Shape+ (minimum 1,500 pitches).
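A leaderboard like this can be built by aggregating pitch-level grades to the pitcher level. The article does not say how pitch-type grades are combined, so this sketch assumes a simple pitch-weighted mean; the `leaderboard()` helper and column names are hypothetical.

```r
# Pitch-weighted mean Shape+ per pitcher, with a minimum pitch count
leaderboard <- function(pitches, min_pitches = 1500, n_top = 10) {
  agg <- aggregate(shape_plus ~ pitcher, data = pitches, FUN = mean)
  counts <- aggregate(shape_plus ~ pitcher, data = pitches, FUN = length)
  names(counts)[2] <- "n"
  agg <- merge(agg, counts, by = "pitcher")
  agg <- agg[agg$n >= min_pitches, ]
  agg <- agg[order(-agg$shape_plus), ]  # best first
  list(top = head(agg, n_top), bottom = tail(agg, n_top))
}
```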

Conclusion

Shape+ v2.0 isolates a pitch’s pure physical quality by training exclusively on the run-value residual that remains after stripping away location and count context. By moving to a single-layer Generalized Additive Model (GAM), the metric can no longer rely on a pitcher’s identity to evaluate quality. Instead, it grades a pitch’s movement and approach angle relative to what is expected for its pitch type, velocity, and location, rather than judging those metrics in a vacuum. Ultimately, this makes Shape+ an objective, physics-based tool for pitch design and arsenal evaluation.




