Machine Learning and Statistical Prediction of Fastball Velocity
Journal of Biomechanics
journal homepage: www.elsevier.com/locate/jbiomech
ARTICLE INFO

Keywords: Calibration; Root mean square error; Kinetic chain; Baseball; Performance; Pitching

ABSTRACT

In recent years, fastball velocity has become one of the most important factors for success among baseball pitchers. The purposes of this study were (1) to develop statistical and machine learning models of fastball velocity, (2) to identify the strongest predictors of fastball velocity, and (3) to compare the models’ prediction performances. Three-dimensional biomechanical analyses were performed on high school (n = 165) and college (n = 62) baseball pitchers. A total of 16 kinetic and kinematic predictors from the entire pitching sequence were included in regression and machine learning models. All models were internally validated through ten-fold cross-validation. Model performance was evaluated through root mean square error (RMSE) and calibration with 95% confidence intervals. Gradient boosting machines demonstrated the best prediction performance [RMSE: 0.34; Calibration: 1.00 (95% CI: 0.999, 1.001)], while regression demonstrated the greatest prediction error [RMSE: 2.49; Calibration: 1.00 (95% CI: 0.85, 1.15)]. Maximum elbow extension velocity (relative influence: 19.3%), maximum humeral rotation velocity (9.6%), maximum lead leg ground reaction force resultant (9.1%), trunk forward flexion at release (7.9%), and the time difference between maximum pelvis rotation velocity and maximum trunk rotation velocity (7.8%) demonstrated the greatest influence on pitch velocity. Gradient boosting machines demonstrated better calibration and reduced RMSE compared to regression. The influence of lead leg ground reaction force resultant and trunk and arm kinematics on pitch velocity demonstrates the interdependent relationship of the entire kinetic chain during the pitching motion. Coaches, players, and performance professionals should focus on the identified metrics when designing pitch velocity improvement programs.
* Corresponding author at: 1 Medical Center Blvd, Winston Salem, NC 27103, USA.
E-mail address: [email protected] (K.F. Nicholson).
https://doi.org/10.1016/j.jbiomech.2022.110999
Accepted 10 February 2022
Available online 15 February 2022
0021-9290/© 2022 Elsevier Ltd. All rights reserved.
K.F. Nicholson et al. Journal of Biomechanics 134 (2022) 110999
has been related to the prioritization of velocity during scouting and signing of professional contracts (Arthur). Naturally, this has resulted in increased focus on pitch velocity development at all standards of play (Arthur).

Pitching a baseball is a series of complex, multifaceted movements that produce high forces throughout the entire kinetic chain (Aguinaldo and Escamilla, 2019). The kinetic chain consists of the transmission of forces generated in the lower extremity, through the trunk, to the upper extremity, and ultimately translates to ball propulsion (Naito and Maruyama, 2008; Naito et al., 2017). Disruption or inefficiency within the pitching sequence, such as timing errors or suboptimal alignment, can lead to decreased ball velocity (Seroyer et al., 2010). While a majority of baseball research has focused on elucidating causes of injury (Bullock et al., 2020), only a few studies have examined how mechanics or timing relate to pitch velocity (Stodden et al., 2005). Many of these studies investigate pitching mechanics in the context of arm kinetics and pitch velocity (Bullock et al., 2021), while very few examine how mechanics influence velocity as an isolated endpoint (Naito et al., 2017; Sgroi et al., 2015). Additionally, the majority of the literature consists of correlational studies with one or two variables and small sample sizes (Sabick et al., 2004; Post et al., 2015). However, when considering the entire kinetic chain, full body mechanics, and pitching cycle sequencing, there are hundreds of variables that could influence pitch velocity (Aguinaldo and Escamilla, 2019; Naito and Maruyama, 2008; Naito et al., 2017).

Currently, there is a lack of quality investigations evaluating the relationship and influence of multiple biomechanical variables on pitch velocity production. Additionally, research suggests that a minimum of 40% of mechanical issues can be corrected once they are appropriately identified (Fleisig et al., 2018); however, there are no suggestions on which modifiable kinematic, kinetic, and spatiotemporal variables are the most influential in relation to pitch velocity. Statistical prediction modelling combines multiple predictors to estimate an outcome, such as a risk or probability (Moons et al., 2009). More recently, machine learning (ML) has been proposed as a way to better decipher higher-order interactions (Ogundimu et al., 2016; Collins et al., 2016). ML has recently been found to improve prediction performance for baseball pitching arm strain (Nicholson et al., 2022). Therefore, the purposes of this study were (1) to develop statistical and ML models of fastball velocity, (2) to identify the strongest predictors of fastball velocity, and (3) to compare the models’ prediction performances.

2. Methods

2.2. Participants

multicomponent force plates (AMTI, Watertown, Massachusetts) embedded in the Perfect Mound (Porta-Pro Mounds Inc, Sauget, Illinois; Fig. 1), which was built to MLB specifications. Force plate data were collected at 1200 Hz. Pitchers were allowed to wear their cleats. Ball velocity was recorded with a TrackMan device (TrackMan, Vedbæk, Denmark).

Each pitcher [age: 16.7 (3.2) years, body mass index: 24.4 (1.2) kg/m2, left-handed: 21%, high school: 80%] went through their normal pregame warm-up period, consisting of dynamic and static stretching and throwing to at least 60 m (197 feet). The number of warm-up throws was not specified, to best simulate individual practice conditions. Pitchers then threw pitches to a target at a regulation distance (18.4 m; 60 feet and 6 in.). Only the fastball data were analysed for this study. Four fastballs were recorded, with three fastballs used for data analysis. Pitch selection was based on data quality. Data were processed and variables were calculated with Visual3D (C-Motion, Inc., Germantown, Maryland). Pitching models were defined using the PitchTrak (Motion Analysis Corporation, Santa Rosa, California) model (Aguinaldo et al., 2007), and segment coordinate systems were defined according to the International Society of Biomechanics recommendations (Wu et al., 2005). Variables included in model development were extracted from pitching reports using Visual3D scripts.

2.4. Statistical analyses

Descriptive statistics are reported as mean (standard deviation) or as a percentage for pitcher anthropometrics, handedness, and pitching kinematics and kinetics. All data were investigated for missingness prior to analyses. Missing data were < 1% for all variables, and a complete case analysis was performed. All analyses were performed in R version 4.0.2.
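The collinearity screen described under predictor reduction (Section 2.6: terms with a correlation above 0.70 and a VIF of 10 or above were excluded) can be illustrated with a minimal, dependency-free Python sketch. It assumes the candidate predictors are held as equal-length lists of floats in a dict; `pearson`, `invert`, `vif_scores`, and `flag_collinear` are hypothetical helpers, not the study's R code. The sketch uses the standard identity that each predictor's VIF equals the corresponding diagonal element of the inverse of the predictor correlation matrix.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for a 16 x 16 matrix)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(data):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    names = list(data)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def flag_collinear(data, r_cut=0.70, vif_cut=10.0):
    """Return (pairs with |r| > r_cut, predictors with VIF >= vif_cut)."""
    pairs = [(a, b) for a, b in combinations(data, 2)
             if abs(pearson(data[a], data[b])) > r_cut]
    high_vif = [name for name, v in vif_scores(data).items() if v >= vif_cut]
    return pairs, high_vif
```

The sketch only flags candidates; deciding which member of a flagged pair to drop was, per Section 2.6, informed by the literature and expert opinion, which no script can replace.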
An a priori sample size calculation was performed to reduce the chance of overfitting using the pmsampsize package (Riley et al., 2020). Prediction model sample size calculations for continuous outcomes require the outcome mean and standard deviation, the anticipated R2, the anticipated number of parameters included in the model or the number of included participants, and the anticipated shrinkage to reduce optimism bias (Riley et al., 2020). The mean fastball pitch velocity was 36.1 m/s with a standard deviation of 3.41 m/s. A total of 227 participants were included in the study. From previous literature, an R2 of 0.89 was used with a shrinkage of 0.90 (Stodden et al., 2005). Therefore, a maximum of 22 parameters could be included in a prediction model with a reduced chance of overfitting.

2.6. Predictor reduction

Because of the a priori sample size calculations, the number of biomechanical predictors had to be reduced to create accurate, stable models. All biomechanical variables from setup through release were considered. Terms were chosen based on a review of the literature (Bullock et al., 2020; Bullock et al., 2021) and clinical and expert opinion. Variables that are not directly modifiable, such as joint moments, were not considered for inclusion. Collinearity was assessed prior to model development, and terms with a correlation above 0.70 and a variance inflation factor (VIF) of 10 or above were excluded from model consideration. Reduction resulted in the inclusion of 16 predictor variables (Table 1). Because of the aforementioned predictor selection process, a data-driven predictor reduction process was performed through principal component analysis as a sensitivity analysis. All kinematic predictors were included in the first principal component analysis, while the kinetic predictors were included separately. The second principal component analysis included both kinematic and kinetic predictors. The first principal component analysis resulted in three components, the second resulted in three components included in the

2.7. Statistical model

Linear regression models to predict fastball pitch velocity were developed with all 16 predictor variables. Linearity was not assumed, and all predictors were assessed for non-linearity with restricted cubic splines using the R package rms. A restricted cubic spline is a piecewise cubic polynomial joined at specific knots throughout the data and constrained to be linear beyond the outermost knots. Knots are placed at quantiles of the predictor, and each segment between successive knots is assessed for a potential non-linear relationship. Because the segments are joined at each successive knot, different non-linear relationships can be assessed throughout the entire range of the data. Non-linearity was determined through Akaike’s Information Criterion (AIC), residuals, visual inspection, and biological plausibility (Durrleman and Simon, 1989). To reduce the risk of overfitting, an elastic net was performed. Elastic net is a penalized regression method that accounts for multicollinearity and shrinks the coefficients to reduce the risk of overfitting (Philp et al., 2020). Further, elastic net allows for predictor selection and incorporates these results into the overall model, and has been found to have improved results compared to more traditional statistical methods (Philp et al., 2020). To find the optimal alpha and lambda shrinkage parameters, a ten-fold cross-validation with ten iterations per fold was performed, with the smallest root mean square error (RMSE) used to determine the best tuning parameters. Internal validation was then performed with a ten-fold cross-validation to reduce optimism bias. The R package caret was used to perform the elastic net and cross-validation.

2.8. Machine learning models

Three ML models (Random Forest, Support Vector Machine Regression, and Gradient Boosting Machine) were developed to predict fastball pitch velocity. All ML models incorporated the same predictors used to
develop the linear regression model. An iterative grid search process was used to find the best hyperparameter tuning for each ML model. RMSE was used to determine optimal hyperparameter tuning (Appendix 1). All ML models were internally validated through ten-fold cross-validation. The performance of each fold was averaged to obtain overall prediction performance. The R packages randomForest, gbm, kernlab, and e1071 were used for ML model analyses.

Table 1
Predictor variables.

| Category | Predictor |
|---|---|
| Kinematic | Elbow angle at foot strike (°) |
| Kinematic | Elbow angle at maximum shoulder external rotation (°) |
| Kinematic | Maximum elbow extension velocity (°/s) |
| Kinematic | Shoulder abduction at foot strike (°) |
| Kinematic | Maximum shoulder external rotation (°) |
| Kinematic | Shoulder abduction at release (°) |
| Kinematic | Maximum humeral rotation velocity (°/s) |
| Kinematic | Hip-shoulder separation (°) |
| Kinematic | Trunk forward flexion at release (°) |
| Kinematic | Lateral trunk tilt at release (°) |
| Kinematic | Maximum pelvis rotation velocity (°/s) |
| Kinematic | Maximum trunk rotation velocity (°/s) |
| Kinetic | Maximum rear leg ground reaction force resultant |
| Kinetic | Maximum lead leg ground reaction force resultant |
| Spatiotemporal | Time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity (s) |
| Spatiotemporal | Stride length (% height) |

2.9. Sensitivity analyses

Sensitivity analyses included: 1) principal component analysis with kinematic predictors; 2) principal component analysis with both kinematic and kinetic predictors; 3) high school pitchers only (all predictors included); 4) the exclusion of maximum humeral rotation velocity and maximum elbow extension velocity; and 5) preprocessing of the predictors through centering, scaling, and removal of outliers. Sensitivity analyses were performed on the linear regression model and the best performing ML model. Risk of overfitting and optimism bias were reduced by performing elastic net and ten-fold cross-validation for all sensitivity analyses.

2.10. Model performance

The performance of the linear regression and ML models was assessed by calculating the RMSE and calibration. Calibration measures the agreement between the observed and predicted outcomes (Van Calster et al., 2019). Calibration was reported as the calibration slope with 95% confidence intervals (95% CIs) and calibration plots.

The reporting of this study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guideline (Collins et al., 2015).
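To make the restricted cubic spline description in Section 2.7 concrete, here is a minimal Python sketch of the spline basis in the truncated-power form described by Harrell (the kind of basis rms builds for an rcs() term), including division by the squared range of the outer knots as a normalisation. It is illustrative only: the function names are hypothetical, and placing four knots at the 5th, 35th, 65th and 95th percentiles is an assumption (a common default for four knots), because the number of knots used in the study is not reported here.

```python
def quantile(values, p):
    """Linear-interpolation quantile (the same rule as R's default, type 7)."""
    s = sorted(values)
    h = (len(s) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def rcs_basis(x, knots):
    """Return [x, f_1(x), ..., f_{k-2}(x)] for a restricted cubic spline.

    Each f_j is cubic between knots but the combination is forced to be
    linear beyond the last knot (and is zero below the first knot)."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")
    scale = (t[-1] - t[0]) ** 2  # keeps the terms on roughly the scale of x

    def pos3(u):
        return max(u, 0.0) ** 3  # truncated cube (u)_+^3

    terms = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        terms.append(term / scale)
    return terms


if __name__ == "__main__":
    # Hypothetical trunk forward flexion at release values (degrees).
    flexion = [25.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0, 38.0, 40.0, 44.0]
    knots = [quantile(flexion, q) for q in (0.05, 0.35, 0.65, 0.95)]
    for value in (27.0, 34.0, 42.0):
        print(value, [round(term, 4) for term in rcs_basis(value, knots)])
```

Each basis column enters the regression as its own coefficient, so a predictor modelled with k knots uses k − 1 parameters; this is why spline terms count against the parameter budget set by the a priori sample size calculation.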
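The elastic net tuning step in Section 2.7 (alpha and lambda chosen by the smallest ten-fold cross-validated RMSE) can be sketched in plain Python. This is a minimal cyclic coordinate-descent fit of the glmnet-style elastic net objective plus a k-fold grid search; it is not the caret code the authors used. The function names, the fixed shuffle seed, and the omission of predictor standardisation (glmnet standardises by default) are assumptions made for brevity.

```python
import random
from math import sqrt


def soft_threshold(z, g):
    """Lasso shrinkage operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def fit_enet(X, y, lam, alpha, n_iter=1000, tol=1e-8):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2
    + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1).
    Assumes no predictor column is constant."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centred columns
    ss = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    resid = [v - y_mean for v in y]  # residual of the centred fit (b = 0 at start)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # inner product of column j with the partial residual (b_j added back)
            z = sum(c * r for c, r in zip(cols[j], resid)) / n + ss[j] * b[j]
            new = soft_threshold(z, lam * alpha) / (ss[j] + lam * (1 - alpha))
            step = new - b[j]
            if step != 0.0:
                resid = [r - c * step for r, c in zip(resid, cols[j])]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b


def predict(model, X):
    b0, b = model
    return [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def rmse(obs, pred):
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def kfold_rmse(X, y, lam, alpha, k=10, seed=1):
    """Mean held-out RMSE over k folds (assumes len(y) >= k)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit_enet([X[i] for i in train], [y[i] for i in train], lam, alpha)
        scores.append(rmse([y[i] for i in test], predict(model, [X[i] for i in test])))
    return sum(scores) / k


def grid_search(X, y, alphas, lambdas, k=10):
    """Return (cv_rmse, alpha, lambda) with the smallest cross-validated RMSE."""
    return min((kfold_rmse(X, y, lam, a, k), a, lam)
               for a in alphas for lam in lambdas)
```

Repeated cross-validation, such as the ten repetitions described in Section 2.7, amounts to averaging `kfold_rmse` over several seeds. In caret a comparable search is usually written with `method = "glmnet"` and `trainControl(method = "repeatedcv", number = 10, repeats = 10)`; whether the study used exactly this configuration is not stated.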
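Section 2.10 reports RMSE and the calibration slope with a 95% CI. A minimal Python sketch of both metrics follows, assuming the calibration slope is the slope of a least-squares regression of observed on predicted velocity. The function names are hypothetical, and the confidence interval uses a normal (Wald) approximation rather than a t quantile, which is an assumption; with the study's sample size the two are close.

```python
from math import sqrt
from statistics import NormalDist, fmean


def rmse(observed, predicted):
    """Root mean square error, in the outcome's units (m/s for velocity)."""
    return sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def calibration_slope(observed, predicted, level=0.95):
    """Slope of observed ~ predicted with an approximate confidence interval.

    A slope of 1 means the predictions spread exactly as much as the outcomes;
    a slope below 1 means predictions are too extreme (a sign of overfitting),
    and a slope above 1 means they are too narrow."""
    n = len(observed)
    mx, my = fmean(predicted), fmean(observed)
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(predicted, observed))
    se = sqrt(sse / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to t
    return slope, (slope - z * se, slope + z * se)
```

Both metrics are most informative when computed on held-out predictions, for example the out-of-fold predictions from cross-validation, so that they describe internally validated rather than apparent performance.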
3. Results

A total of 227 pitchers who threw 787 pitches were included in this study (Table 2).

3.1. Linear regression model

Final model RMSE was 2.49 m/s, calibration was 1.00 (95% CI: 0.85, 1.15), and R2 was 0.45 (Table 3; Fig. 2). Maximal humeral rotation velocity [0.001 (95% CI: 0.0006, 0.003), p = 0.019], trunk forward flexion at release [0.08 (95% CI: 0.03, 0.13), p = 0.003], lateral trunk tilt at release [−0.05 (95% CI: −0.10, −0.00), p = 0.033], maximum lead leg GRF resultant [1.40 (95% CI: 0.18, 2.62), p = 0.025], maximal trunk rotation velocity [0.01 (95% CI: 0.00, 0.01), p = 0.012], and maximal elbow extension velocity [(839–1585 deg/s): 0.001 (95% CI: 0.0001, 0.01), p < 0.001; (≥1586 deg/s): −0.001 (95% CI: −0.01, −0.0001), p = 0.024] were significant predictors in the fastball pitch velocity linear regression model. For the full model, please refer to Appendix 2.

3.2. Machine learning models

The gradient boosting machine demonstrated the smallest RMSE and the most precise calibration compared to all other ML models (Table 3; Fig. 3). The random forest model demonstrated the largest RMSE and calibration slope (random forest model plot in Appendix 3; support vector machine regression plot in Appendix 4). In the gradient boosting model, the predictors with the highest influence on fastball pitch velocity were maximum elbow extension velocity (relative influence: 19.3%), maximum humeral rotation velocity (9.6%), maximum lead leg GRF resultant (9.1%), trunk forward flexion at release (7.9%), and time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity (7.8%).

Table 3
Statistical and Machine Learning Model Performance.

| Predictive Model | Root Mean Square Error | Calibration Slope (95% Confidence Interval) |
|---|---|---|
| Generalized Regression | 2.49 | 1.00 (0.85, 1.15) |
| Random Forest | 1.49 | 1.49 (1.40, 1.57) |
| Gradient Boosting Machine | <0.001 | 1.00 (0.999, 1.001) |
| Support Vector Machine Regression | 0.34 | 1.08 (1.07, 1.09) |

Fastball velocity root mean square error is reported as m/s.

3.3. Sensitivity analyses

3.3.1. Principal component analysis

The kinematics-only and the kinematic and kinetic regression models demonstrated the same RMSE (3.0), calibration [1.00 (95% CI: 0.75, 1.25)], and R2 (0.27), and both performed worse than the original linear regression model. Both also reported the same gradient boosting machine RMSE (<0.001) and calibration [1.00 (95% CI: 0.999, 1.001)], which were also similar to the original gradient boosting model.

3.3.2. High school

When only high school pitchers were included, the linear regression model RMSE (2.5), calibration [1.00 (95% CI: 0.84, 1.16)], and R2 (0.33) were similar to the original linear regression model. The gradient boosting machine RMSE (<0.001) and calibration [1.00 (95% CI: 0.999, 1.001)] were also similar to the original gradient boosting model. Velocity predictors with the greatest influence in the high school gradient boosting machine model were maximum humeral rotation velocity (14.1%), hip shoulder separation (11.5%), trunk forward flexion at release (9.4%), maximum trunk rotation velocity (9.4%), maximum elbow extension velocity (9.1%), and elbow angle at maximum shoulder external rotation (7.0%).
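The relative influence percentages reported in Sections 3.2 and 3.3.2 come from the fitted gradient boosting machine. To show where such numbers come from, here is a toy least-squares gradient boosting sketch with depth-1 trees (stumps): each round fits a stump to the current residuals (the negative gradient of squared-error loss) and adds a shrunken copy to the ensemble, and each predictor's influence is the total squared-error reduction of the splits made on it, normalised to 100%. To our understanding this mirrors how gbm summarises relative influence, but this is a simplified illustration, not the gbm package: tree depth, number of trees, learning rate, the absence of subsampling, and all names are assumptions.

```python
def best_stump(X, r):
    """Best single split of the rows for residuals r.

    Returns (gain, feature, threshold, left_mean, right_mean), where gain is
    the reduction in squared error, or None if no split is possible."""
    n, p = len(X), len(X[0])
    total = sum(r)
    base = total * total / n
    best = None
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += r[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue  # cannot split between tied values
            right_sum = total - left_sum
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - base
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / pos, right_sum / (n - pos))
    return best


def gradient_boost(X, y, n_trees=100, learning_rate=0.1):
    """Least-squares boosting with stumps; returns (f0, stumps, relative influence %)."""
    p = len(X[0])
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps, influence = [], [0.0] * p
    for _ in range(n_trees):
        resid = [yi - fi for yi, fi in zip(y, pred)]  # negative gradient of squared loss
        split = best_stump(X, resid)
        if split is None:
            break
        gain, j, thr, left, right = split
        influence[j] += gain
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        pred = [fi + (learning_rate * left if row[j] <= thr else learning_rate * right)
                for fi, row in zip(pred, X)]
    total = sum(influence)
    rel = [100.0 * (v / total) for v in influence] if total > 0 else influence
    return f0, stumps, rel


def predict(model, row):
    f0, stumps, _ = model
    return f0 + sum(left if row[j] <= thr else right for j, thr, left, right in stumps)
```

Because a stump splits on one predictor at a time, this toy cannot represent interactions; deeper trees (gbm's `interaction.depth`) are what allow boosting to model the higher-order interactions discussed in Section 4.4.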
BW = Body Weight, H = Height. Results are reported as mean (standard deviation). + Denotes non-linear variables.

maximal humeral rotation velocity, elbow extension velocity, trunk forward flexion at release, time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity, and maximum lead leg GRF resultant. As hypothesized, the ML models demonstrated improved prediction RMSE compared to the linear regression model; however, contrary to the hypothesis, only the gradient boosting machine model demonstrated improved calibration compared to the linear regression model. The sensitivity analysis including only high school pitchers demonstrated similar RMSE and calibration results, but lead leg GRF resultant was not an influential variable, and hip shoulder separation and maximal trunk rotation velocity were of increased influence. The analysis excluding maximal humeral rotation and elbow extension velocity resulted in greater influence of trunk and hip mechanics and arm position on pitch velocity.

Fig. 2. Regression Model Calibration Plot for Fastball Pitching Velocity. The blue line depicts perfect calibration, the red line reports actual calibration, and the grey band reports the 95% confidence interval. Points on the blue line would have a root mean square error of zero. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.2. Influence of humeral rotation and elbow extension

Maximal humeral rotation and elbow extension velocity demonstrated the greatest influence on pitch velocity within the primary models. Similarly, previous kinematic analyses have observed that shoulder internal rotation velocity (i.e., humeral rotation) was the greatest contributor to ball velocity within tennis, water polo, cricket, and baseball (Naito and Maruyama, 2008; Naito et al., 2017; Miyashita et al., 2010). Elbow extension velocity has also been positively associated with throwing velocity in baseball pitchers (Naito and Maruyama, 2008; Naito et al., 2017). The influence of shoulder and elbow kinematics on pitch velocity is not surprising due to the summation of centrifugal forces from the hips and trunk (Naito et al., 2017). Torque generation and transfer through appropriate pelvic and trunk rotation allows for high velocity shoulder rotation and elbow extension (Naito and Maruyama, 2008; Naito et al., 2017; Aguinaldo et al., 2007). For example, previous pitching models have observed that greater humeral rotation and elbow extension velocity are dependent on improved trunk rotation velocity (Naito and Maruyama, 2008; Naito et al., 2017). While these results suggest that humeral rotation velocity and elbow extension velocity demonstrate the greatest association with pitch velocity, the interdependent relationship between upper extremity joint velocity, shoulder and elbow angles, GRF resultant, and the entire kinetic chain may limit the ability to individually modify these biomechanical variables. Further research is required to understand the direct modifiability, or non-modifiability, of shoulder rotation velocity and elbow extension velocity without subsequent interventions on the lower extremity, trunk, and other upper extremity mechanics.

4.3. Influence of ground reaction force and kinematic timing

Maximal lead leg GRF resultant, time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity, and trunk location in the frontal plane at release were also identified as influential predictors of pitch velocity. As stated previously, force generation begins with the lower extremity, transfers through the trunk, and ultimately results in ball propulsion through the hand (Aguinaldo and Escamilla, 2019). These results support previous literature which has postulated that the lower extremity provides a stable base for force transfer and for trunk and upper extremity rotation, to produce optimal pitch velocity (Naito and Maruyama, 2008; Naito et al., 2017). Lead leg GRF resultant has been positively associated with pitch velocity at the high school and collegiate baseball levels (Naito et al., 2017; Kageyama et al., 2014). Forces generated from the lead leg rely on proper kinematic sequencing
to transfer energy (Aguinaldo and Escamilla, 2019). Following foot strike, the pelvis should rotate, followed by trunk rotation. Improper timing, or smaller separation between maximum pelvis rotation and maximum trunk rotation, will impact ball velocity. The linear regression results suggest that every 1 ms increase in separation is associated with a 0.01 m/s increase in ball velocity. Lateral trunk tilt and forward flexion at release have been inversely associated with pitch velocity in high school pitchers (Oyama et al., 2013). Using the linear regression results for interpretation, we see that while lateral trunk tilt was inversely associated with pitch velocity, this study observed a relationship between trunk forward flexion and pitch velocity that contrasts with that previously reported (Bullock et al., 2021). These contrasting results may be due to the inclusion of the entire kinetic chain within our study compared to previous isolated trunk and arm biomechanical analyses. Additional full kinetic chain studies are required to understand the repeatability of these results.

Fig. 3. Gradient Boosting Machine Calibration Plot for Fastball Pitching Velocity. The blue line depicts perfect calibration, while the red line reports actual calibration. Points on the blue line would have a root mean square error of zero. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.4. Improved machine learning model performance

The ML models produced decreased RMSE compared to the linear regression model. ML is designed to deliver increased flexibility to evaluate non-linear relationships and higher-order interactions (Bzdok et al., 2018). While non-linear transformations were included within the linear regression, it is difficult to evaluate the complex interaction of multiple predictors with that approach, potentially explaining the discrepancy in prediction error (Bzdok et al., 2018). The gradient boosting ML model demonstrated the smallest error and best calibration for predicting fastball velocity compared to all ML and linear regression models. Gradient boosting machines are an ensemble method, which aggregates individual tree learners into one comprehensive model (Friedman, 2001). Gradient boosting machines have demonstrated improved prediction performance in higher-order interaction scenarios compared to other ML models (Touzani et al., 2018). As pitching is a series of complex interrelated steps (Aguinaldo and Escamilla, 2019; Naito et al., 2017), this ML approach may be able to better quantify these interdependent relationships. Conversely, the random forest model had the largest error, aside from the linear regression model, and the worst calibration compared to all models. While random forest models are also an aggregated ensemble method built from random bootstrapped samples, their aggregation is performed at the end stage, compared to the forward, stage-wise fitting process of a gradient boosting machine (Friedman, 2001). These differences in how the ensemble ML methods aggregate their trees may explain the differences observed in prediction performance.

4.5. Sensitivity analyses

When models were created with only high school pitchers, GRF resultant was no longer an influential variable. Hip shoulder separation, maximum trunk rotation velocity, and elbow angle at maximum external rotation increased in influence, and elbow extension velocity was less influential. These results suggest that high school pitchers may utilize more trunk and upper extremity mechanics to generate ball velocity. However, this may represent a less efficient pitching technique. It has been shown that older pitchers are able to generate forces in the distal extremities and more effectively transfer these forces up the kinetic chain (Oyama et al., 2014; Aguinaldo and Chambers, 2009). High
school and collegiate pitchers produce similar GRFs despite collegiate pitchers throwing with higher pitch velocity (Nicholson et al., 2019). Collegiate pitchers can optimally coordinate body segments and transfer energy up the kinetic chain utilizing the GRFs. High school pitchers may lack the physical maturity or skill necessary for optimal energy transfer and thus rely on trunk and arm mechanics to produce velocity. It has been shown that in the high school pitcher, hip shoulder separation has increased influence on elbow stress (Nicholson et al., 2022). It is possible that utilizing GRF resultant to produce pitch velocity, instead of hip shoulder separation, represents a more efficient and safer pitching technique. Separate models for each playing level may be warranted for elucidating the pitching mechanics with the most influence on velocity among various skill levels.

When the most influential predictors, maximal humeral rotation velocity and elbow extension velocity, were excluded, trunk and pelvis rotation velocity and lead leg maximum GRF resultant were observed to have the highest influence on pitch velocity. Pelvis and trunk rotation produce up to 55% of the total force generated for ball propulsion during the pitching motion (Aguinaldo and Escamilla, 2019). Further, initial force production and energy transfer are developed through the lower extremities (Naito and Maruyama, 2008). These findings highlight the potential implications of lead leg GRF resultant and pelvis and trunk biomechanical interventions during pitch velocity training. Shoulder abduction at release and elbow angle at maximum shoulder external rotation were also influential. Elbow and shoulder alignment and timing may have an interdependent relationship with force transfer from the trunk to produce high arm rotation velocities (Naito and Maruyama, 2008). However, this is only speculative, and further inquiry is needed to understand this relationship.

4.6. Limitations

As with all studies, there were limitations. External validation was not performed, decreasing the generalizability of these findings. Further validation is needed to assess the influence of specific predictors and overall model performance in separate data. While many ML methods suggest splitting data into training and testing sets, this decreases the power and precision of these models (Collins et al., 2015; Collins et al., 2014). Given the collected sample size, our a priori sample size calculation did not allow for a random or temporal split. To counteract this potential limitation, internal validation through ten-fold cross-validation was performed (Collins et al., 2015). Further, our R2 was lower than the R2 of 0.89 from previous literature; as a result, our a priori sample size calculation overestimated the number of parameters our data could support. Researchers are encouraged to use our R2 for future a priori sample size calculations. High school and collegiate pitchers were included in these analyses. As these different standards of play may have disparate pitching skill levels, a sensitivity analysis including only high school pitchers was performed. The similar RMSE and calibration results suggest that the inclusion of pitchers from both competition levels did not inherently bias these results. However, the different significant predictors among models suggest that separate models for each playing level would be ideal. Unfortunately, the sample size of college pitchers inhibited the development of a college-only model. While predictors from the entire kinetic chain were included, not all pitching biomechanical variables could be included due to power. Despite incorporating multicollinearity testing, literature review, and stakeholder involvement in choosing the included predictors, other associations between biomechanical variables not captured within these models may have influenced pitch velocity, which decreases the generalizability of these models.

5. Conclusion

Maximal humeral rotation velocity, elbow extension velocity, trunk forward flexion at release, time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity, and maximum lead leg GRF resultant were observed to have the greatest influence on baseball pitch velocity. ML models demonstrated improved prediction performance compared to linear regression. The influence of lead leg GRF and trunk and arm kinematics on pitch velocity demonstrates the interdependent relationship of the entire kinetic chain during the pitching motion. Coaches, players, and performance professionals should consider the entire body when designing pitch velocity improvement programs. Other researchers are encouraged to externally validate these models and assess their transportability to determine model generalizability and usefulness.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbiomech.2022.110999.

References

Aguinaldo, A.L., Chambers, H., 2009. Correlation of throwing mechanics with elbow valgus load in adult baseball pitchers. Am. J. Sports Med. 37 (10), 2043–2048.
Aguinaldo, A.L., Buttermore, J., Chambers, H., 2007. Effects of upper trunk rotation on shoulder joint torque among baseball pitchers of various levels. J. Appl. Biomech. 23 (1), 42–51.
Aguinaldo, A., Escamilla, R., 2019. Segmental power analysis of sequential body motion and elbow valgus loading during baseball pitching: comparison between professional and high school baseball players. Orthop. J. Sports Med. 7 (2), 2325967119827924. https://doi.org/10.1177/2325967119827924.
Arthur, R. The New Science of Hitting. FiveThirtyEight. https://fivethirtyeight.com/features/the-new-science-of-hitting/.
Bullock, G.S., Uhan, J., Harriss, E.K., Arden, N.K., Filbay, S.R., 2020. The relationship between baseball participation and health: a systematic scoping review. J. Orthop. Sports Phys. Ther. 50 (2), 55–66.
Bullock, G.S., Menon, G., Nicholson, K., Butler, R.J., Arden, N.K., Filbay, S.R., 2021. Baseball pitching biomechanics in relation to pain, injury, and surgery: a systematic review. J. Sci. Med. Sport 24 (1), 13–20.
Bzdok, D., Altman, N., Krzywinski, M., 2018. Statistics versus machine learning. Nat. Methods 15 (4), 233–234.
Collins, G.S., de Groot, J.A., Dutton, S., Omar, O., Shanyinde, M., Tajar, A., Voysey, M., Wharton, R., Yu, L.-M., Moons, K.G., Altman, D.G., 2014. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med. Res. Method. 14 (1). https://doi.org/10.1186/1471-2288-14-40.
Collins, G.S., Reitsma, J.B., Altman, D.G., Moons, K.G., 2015. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J. Brit. Surgery 102 (3), 148–158.
Collins, G.S., Ogundimu, E.O., Altman, D.G., 2016. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat. Med. 35 (2), 214–226.
Durrleman, S., Simon, R., 1989. Flexible regression models with cubic splines. Statist. Med. 8 (5), 551–561.
Fleisig, G.S., Diffendaffer, A.Z., Ivey, B., Aune, K.T., 2018. Do baseball pitchers improve mechanics after biomechanical evaluations? Sports Biomech. 17 (3), 314–321.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Statist. 29 (5), 1189–1232. https://doi.org/10.1214/AOS/1013203451.
Gray, R., 2010. Expert baseball batters have greater sensitivity in making swing decisions. Res. Q. Exerc. Sport 81 (3), 373–378.
Howelldeshell, B., 2021. Inside the Numbers. http://media.hometeamsonline.com/photos/baseball/UPSTATESTORM/Probability_Of_Playing_Professionally.pdf (accessed May 24, 2021).
Kageyama, M., Sugiyama, T., Takai, Y., Kanehisa, H., Maeda, A., 2014. Kinematic and kinetic profiles of trunk and lower limbs during baseball pitching in collegiate pitchers. J. Sports Sci. Med. 13 (4), 742–750.
Miyashita, K., Kobayashi, H., Koshida, S., Urabe, Y., 2010. Glenohumeral, scapular, and thoracic angles at maximum shoulder external rotation in throwing. Am. J. Sports Med. 38 (2), 363–368. https://doi.org/10.1177/0363546509347542.
Moons, K.G.M., Royston, P., Vergouwe, Y., Grobbee, D.E., Altman, D.G., 2009. Prognosis and prognostic research: what, why, and how? BMJ 338, b375.
Naito, K., Maruyama, T., 2008. Contributions of the muscular torques and motion-dependent torques to generate rapid elbow extension during overhand baseball pitching. Sports Eng. 11 (1), 47–56. https://doi.org/10.1007/S12283-008-0002-3.
K.F. Nicholson et al. Journal of Biomechanics 134 (2022) 110999
Naito, K., Takagi, T., Kubota, H., Maruyama, T., 2017. Multi-body dynamic coupling mechanism for generating throwing arm velocity during baseball pitching. Hum. Mov. Sci. 54, 363–376.
NCAA, 2021. NCAA Sports Sponsorship and Participation Rates Database. https://www.ncaa.org/about/resources/research/ncaa-sports-sponsorship-and-participation-rates-database (accessed May 24, 2021).
Nicholson, K.F., Hulburt, T.C., Kimura, B.M., Aguinaldo, A., 2019. Relationship between ground reaction force and throwing arm kinetics in high school and collegiate baseball pitchers. ISBS Proc. Archive 37 (1), 316.
Nicholson, K.F., Collins, G.S., Waterman, B.R., Bullock, G.S., 2022. Machine Learning and Statistical Prediction of Pitching Arm Kinetics. Am. J. Sports Med. 50 (1), 238–247.
Ogundimu, E.O., Altman, D.G., Collins, G.S., 2016. Adequate sample size for developing prediction models is not simply related to events per variable. J. Clin. Epidemiol. 76, 175–182.
Oyama, S., Yu, B., Blackburn, J.T., Padua, D.A., Li, L., Myers, J.B., 2013. Effect of excessive contralateral trunk tilt on pitching biomechanics and performance in high school baseball pitchers. Am. J. Sports Med. 41 (10), 2430–2438.
Oyama, S., Yu, B., Blackburn, J.T., Padua, D.A., Li, L., Myers, J.B., 2014. Improper trunk rotation sequence is associated with increased maximal shoulder external rotation angle and shoulder joint force in high school baseball pitchers. Am. J. Sports Med. 42 (9), 2089–2094.
Participation Statistics, 2021. https://members.nfhs.org/participation_statistics (accessed May 24, 2021).
Philp, F., Al-shallawi, A., Kyriacou, T., Blana, D., Pandyan, A., 2020. Improving predictor selection for injury modelling methods in male footballers. BMJ Open Sport Exercise Med. 6 (1), e000634. https://doi.org/10.1136/bmjsem-2019-000634.
Pitch Velocities Are Already Up, 2021. https://blogs.fangraphs.com/fastball-velocities-are-already-up/ (accessed May 24, 2021).
Post, E.G., Laudner, K.G., McLoda, T.A., Wong, R., Meister, K., 2015. Correlation of shoulder and elbow kinetics with ball velocity in collegiate baseball pitchers. J. Athletic Training 50 (6), 629–633.
Riley, R.D., et al., 2020. Calculating the sample size required for developing a clinical prediction model. BMJ 368.
Sabick, M.B., Torry, M.R., Lawton, R.L., Hawkins, R.J., 2004. Valgus torque in youth baseball pitchers: a biomechanical study. J. Shoulder Elbow Surg. 13 (3), 349–355.
Seroyer, S.T., Nho, S.J., Bach, B.R., Bush-Joseph, C.A., Nicholson, G.P., Romeo, A.A., 2010. The kinetic chain in overhand pitching: its potential role for performance enhancement and injury prevention. Sports Health 2 (2), 135–146.
Sgroi, T., et al., 2015. Predictors of throwing velocity in youth and adolescent pitchers. J. Shoulder Elbow Surg. 24 (9), 1339–1345.
Stodden, D.F., Fleisig, G.S., McLean, S.P., Andrews, J.R., 2005. Relationship of biomechanical factors to baseball pitching velocity: within pitcher variation. J. Appl. Biomech. 21 (1), 44–56.
Touzani, S., Granderson, J., Fernandes, S., 2018. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 158, 1533–1543.
Van Calster, B., McLernon, D.J., Van Smeden, M., Wynants, L., Steyerberg, E.W., 2019. Calibration: the Achilles heel of predictive analytics. BMC Med. 17 (1), 1–7.
Wu, G., van der Helm, F.C.T., Veeger, H.E.J., Makhsous, M., Van Roy, P., Anglin, C., Nagels, J., Karduna, A.R., McQuade, K., Wang, X., Werner, F.W., Buchholz, B., 2005. ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion—Part II: shoulder, elbow, wrist and hand. J. Biomech. 38 (5), 981–992.