---
title: "Synthetic Estimated Averages Matchup (SEAM) methodology for context-rich player matchup evaluations"
output: html_document
---
We develop the SEAM (synthetic estimated average matchup) method for describing batter versus pitcher matchups in baseball, both numerically and visually. We first estimate the distribution of balls put into play by a batter facing a pitcher, which we call the empirical spray chart distribution. Many individual matchups have too few observations for this distribution to be reliable in predicting future outcomes. To alleviate this concern, we construct synthetic versions of the batter and pitcher under consideration. The weights governing how much influence these synthetic players have on the overall spray chart distribution are chosen to minimize expected mean squared error. We provide a Shiny app that allows users to visualize and evaluate any batter-pitcher matchup that has occurred or could have occurred in the last five years. By estimating spray densities for any input matchup, the app provides a tool that could be used to determine defensive alignments, lineup construction, or pitcher selection. The method computes spray densities quickly enough that the app displays the visualizations for any input almost instantly. SEAM therefore offers computationally fast distributional interpretations of dependent matchup data.
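The blending step described above can be illustrated with a minimal Python sketch. It is not the SEAM implementation: the simulated ball-in-play locations, the Gaussian kernel and its bandwidth, the assumed squared-bias values, and the inverse-MSE weighting heuristic in `inverse_mse_weights` are all assumptions made for illustration, and the weights SEAM actually uses are derived differently.

```python
import numpy as np

def kde_on_grid(points, grid, bandwidth):
    """Gaussian kernel density estimate of 2-D points, evaluated at each grid row."""
    diffs = grid[:, None, :] - points[None, :, :]        # shape (G, n, 2)
    sq = (diffs ** 2).sum(axis=2) / bandwidth ** 2       # shape (G, n)
    kernel = np.exp(-0.5 * sq) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1)                           # shape (G,)

def inverse_mse_weights(sample_sizes, assumed_bias_sq, variance_scale=1.0):
    """Heuristic weights: inverse of (assumed squared bias + variance_scale / n)."""
    n = np.asarray(sample_sizes, dtype=float)
    mse = np.asarray(assumed_bias_sq, dtype=float) + variance_scale / n
    w = 1.0 / mse
    return w / w.sum()

def blend(densities, weights):
    """Convex combination of density estimates evaluated on a common grid."""
    return np.tensordot(weights, np.stack(densities), axes=1)

rng = np.random.default_rng(0)
# Hypothetical ball-in-play locations (x, y) in feet; simulated, not real data.
real = rng.normal([0, 250], 60, size=(15, 2))            # sparse real matchup
synth_batter = rng.normal([10, 240], 60, size=(400, 2))  # synthetic batter vs. pitcher
synth_pitcher = rng.normal([-10, 260], 60, size=(400, 2))  # batter vs. synthetic pitcher

xs = np.linspace(-300, 300, 61)
ys = np.linspace(0, 500, 51)
gx, gy = np.meshgrid(xs, ys)
grid = np.column_stack([gx.ravel(), gy.ravel()])
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])

samples = (real, synth_batter, synth_pitcher)
densities = [kde_on_grid(p, grid, bandwidth=40.0) for p in samples]

# Assumption: the real matchup is unbiased but noisy; the synthetic ones are
# less noisy but carry some bias because they borrow from other players.
weights = inverse_mse_weights([len(p) for p in samples], [0.0, 0.02, 0.02])
seam_like = blend(densities, weights)
```

With these assumed inputs the sparse real matchup receives the smallest weight, and that weight grows as more balls in play are observed for the matchup.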
Baseball outcomes have been assumed to be independent and identically distributed (iid) realizations in the literature, for example in *Hierarchical Bayesian modeling of hitting performance in baseball* by Jensen et al. (2009). The iid assumption may be reasonable in prediction contexts that involve long time frames, such as those of Jensen et al. (2009) and *In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies* by Brown (2008). However, the assumption is not appropriate over short time frames, when the quality of the batters and pitchers involved can vary widely.
We address the sparsity of the batter-pitcher matchup data used to create spray chart distributions by developing and aggregating synthetic batters and pitchers whose characteristics are similar to those of the batter and pitcher under study. Our synthetic player creation methodology is inspired by the notion of similarity scores as discussed in *The politics of glory: how baseball’s Hall of Fame really works* by Bill James (1994) and *Introducing PECOTA* by Nate Silver (2003). Unlike the similarity scores presented in James (1994) and Silver (2003), ours are constructed using a nearest neighbor approach based on the underlying characteristics of the batters and pitchers under study rather than on their observed statistics.
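As a rough illustration of a nearest neighbor similarity score, the Python sketch below standardizes a small table of player characteristics, finds the `k` players closest to a target, and converts distances into normalized similarity weights. The feature columns, the example values, the Euclidean metric, and the exponential similarity transform are all hypothetical choices for this sketch and are not taken from the SEAM method itself.

```python
import numpy as np

def nearest_neighbors(features, target_index, k):
    """Return indices and normalized similarity weights of the k nearest players."""
    # Standardize each characteristic so no single column dominates the distance.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    dist = np.linalg.norm(z - z[target_index], axis=1)
    dist[target_index] = np.inf                 # exclude the target player itself
    idx = np.argsort(dist)[:k]
    sim = np.exp(-dist[idx])                    # assumed similarity transform
    return idx, sim / sim.sum()

# Hypothetical batter characteristics (one row per batter):
# average launch angle (degrees), average exit velocity (mph), hard-hit rate.
batters = np.array([
    [12.0, 89.5, 0.40],
    [12.5, 89.0, 0.42],
    [ 5.0, 85.0, 0.30],
    [20.0, 93.0, 0.48],
    [11.0, 90.0, 0.39],
])

idx, weights = nearest_neighbors(batters, target_index=0, k=2)
```

Balls in play from the selected neighbors could then be pooled, with these weights, to form a synthetic version of the target batter. The same construction applies to pitchers with pitch-level characteristics in place of batted-ball ones.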