Strategic swinging model: building a model that optimizes swinging decisions in baseball

Authors

  • Henry Zhan, Middlebury College
  • Alex Lyford, Middlebury College

DOI:

https://doi.org/10.47611/jsr.v14i1.2856

Keywords:

baseball, statistics, machine learning, data analysis, sabermetrics

Abstract

The most important battle in baseball is the one between the pitcher and the batter, because hitting the ball hard and far can drastically change the outcome of a game. In this research, we build a strategic swinging model that can help Major League Baseball (MLB) hitters decide, before a pitch is thrown, whether they should swing at it. We used random forest classifiers to produce probabilistic predictions of pitch type and pitch location, and we estimated how favorable each pitch would be for the hitter based on his past batting data. We evaluated the model by calculating how much better its decisions performed than the scenario in which the batter did the opposite of what the model recommended. The model outperformed any random swinging strategy. However, most above-average hitters make better swing decisions than the model does.
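
The decision rule and the "opposite decision" evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the expected-value rule, the function names, the pitch labels, and every number below are assumptions chosen for illustration. The hand-written probability table stands in for the output of the random forest classifiers.

```python
from typing import Dict, Tuple

# A pitch is described by (pitch type, location zone); both labels are hypothetical.
Pitch = Tuple[str, str]


def expected_swing_value(pitch_probs: Dict[Pitch, float],
                         swing_value: Dict[Pitch, float]) -> float:
    """Probability-weighted value of swinging, summed over the predicted pitches."""
    return sum(p * swing_value.get(pitch, 0.0) for pitch, p in pitch_probs.items())


def should_swing(pitch_probs: Dict[Pitch, float],
                 swing_value: Dict[Pitch, float],
                 take_value: float = 0.0) -> bool:
    """Recommend a swing when its expected value exceeds the value of taking."""
    return expected_swing_value(pitch_probs, swing_value) > take_value


def gain_over_opposite(decision: bool,
                       realized_swing_value: float,
                       realized_take_value: float) -> float:
    """Value of the recommended action minus the value of the opposite action."""
    chosen = realized_swing_value if decision else realized_take_value
    opposite = realized_take_value if decision else realized_swing_value
    return chosen - opposite


# Assumed stand-in for the classifiers' predicted probabilities before the pitch.
pitch_probs = {
    ("fastball", "in_zone"): 0.50,
    ("fastball", "out_of_zone"): 0.10,
    ("slider", "in_zone"): 0.15,
    ("slider", "out_of_zone"): 0.25,
}

# Assumed per-pitch value of swinging, as if estimated from the hitter's past data.
swing_value = {
    ("fastball", "in_zone"): 0.08,
    ("fastball", "out_of_zone"): -0.05,
    ("slider", "in_zone"): 0.01,
    ("slider", "out_of_zone"): -0.09,
}

ev = expected_swing_value(pitch_probs, swing_value)
decision = should_swing(pitch_probs, swing_value)
gain = gain_over_opposite(decision, realized_swing_value=0.08, realized_take_value=-0.01)
print(f"expected swing value: {ev:.4f}, swing: {decision}, gain over opposite: {gain:.2f}")
```

Averaging `gain_over_opposite` over many pitches would give one possible summary of how much better the recommendations are than their opposites; whether the paper aggregates its results this way is not stated in the abstract.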

Published

02-28-2025

How to Cite

Zhan, H., & Lyford, A. (2025). Strategic swinging model: building a model that optimizes swinging decisions in baseball. Journal of Student Research, 14(1). https://doi.org/10.47611/jsr.v14i1.2856

Issue

Vol. 14 No. 1 (2025)

Section

Research Articles