This project involves analyzing cricket player performance using various datasets related to batting, bowling, match results, and player statistics. The goal is to perform data preprocessing, feature engineering, and model training to predict and evaluate player performance scores.
The project is structured as follows:
Cricket-Player-Performance/
│
├── Data/
│ ├── Batsman_Data.csv
│ ├── Bowler_data.csv
│ ├── Ground_Averages.csv
│ ├── ODI_Match_Results.csv
│ ├── ODI_Match_Totals.csv
│ └── WC_players.csv
│
├── README.md
├── requirements.txt
└── Player_Performance_Prediction.py
- Data/: Directory containing CSV files for different datasets used in the analysis.
- README.md: This file, containing project overview, setup instructions, and usage details.
- requirements.txt: File listing dependencies required to run the project.
- Player_Performance_Prediction.py: Python script with code for data analysis, preprocessing, modeling, and evaluation.
Ensure you have Python 3.x installed, along with the libraries listed in requirements.txt.
- Clone the repository:
  git clone https://github.com/Ganesh2409/Cricket-Player-Performance.git
  cd Cricket-Player-Performance
- Install dependencies:
  pip install -r requirements.txt
To run the performance analysis script:
python Player_Performance_Prediction.py
- The script starts by loading the datasets (Batsman_Data.csv, Bowler_data.csv, etc.) using pandas for initial exploration.
- Handling Null Values: Checks for null values in each dataset and performs necessary operations.
- Date Separation: Splits the 'Start Date' into day, month, and year columns across datasets.
- Dropping Irrelevant Columns: Removes unnecessary columns ('Unnamed: 0', 'Start Date', 'Year') from datasets.
- Encoding Categorical Columns: Converts categorical data into numeric form using Label Encoding.
- Iterative Imputation: Uses an iterative imputer to handle missing values in match_results_df and match_total_df.
- Inner Joins: Merges the datasets (batsman_df, bowler_df, ground_avg_df, match_results_df, match_total_df, players_df) on common columns to create a master dataset (master_df_after_join).
- Statistical Analysis: Calculates batting average, bowling average, strike rate, economy rate, and more.
- Data Normalization: Applies Min-Max Scaling to normalize selected performance metrics.
- Outlier Detection: Identifies and removes outliers using the Z-score method.
- Linear Regression Model: Splits data into training and testing sets, scales features, trains a Linear Regression model, and evaluates its performance using metrics like MAE, MSE, RMSE, and R-squared.
- Mean Absolute Error: 0.026260716919972147
- Mean Squared Error: 0.0010815997971432808
- Root Mean Squared Error: 0.032887684581667964
- R-squared: 0.9402293334532789
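The preprocessing, feature-engineering, and modelling steps listed above can be sketched end to end on a small synthetic table. This is a minimal illustration and not the code in Player_Performance_Prediction.py: the toy data, the column names (Opposition, Runs, BF), the choice of scaled metrics, the Z-score threshold of 3, and the formula for the Score target are all assumptions made for the example. It will not reproduce the metric values reported above.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Hypothetical toy data; the real script reads the CSV files in Data/.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Start Date": pd.date_range("2015-01-01", periods=n, freq="7D").strftime("%Y-%m-%d"),
    "Opposition": rng.choice(["v India", "v Australia", "v England"], size=n),
    "Runs": rng.integers(0, 120, size=n).astype(float),
    "BF": rng.integers(1, 130, size=n).astype(float),  # balls faced, always >= 1
})
df.loc[rng.choice(n, size=10, replace=False), "Runs"] = np.nan  # simulate missing values

# Date separation: split 'Start Date' into day, month, and year, then drop it.
dates = pd.to_datetime(df["Start Date"])
df["Day"], df["Month"], df["Year"] = dates.dt.day, dates.dt.month, dates.dt.year
df = df.drop(columns=["Start Date"])

# Label encoding: map each category to an integer code.
df["Opposition"] = LabelEncoder().fit_transform(df["Opposition"])

# Iterative imputation: estimate each missing value from the other columns.
imputed = IterativeImputer(random_state=0).fit_transform(df)
df = pd.DataFrame(imputed, columns=df.columns)

# Statistical analysis: strike rate = runs per 100 balls faced.
df["Strike Rate"] = 100 * df["Runs"] / df["BF"]

# Min-Max scaling of the selected metrics to the range [0, 1].
metrics = ["Runs", "Strike Rate"]
df[metrics] = MinMaxScaler().fit_transform(df[metrics])

# Z-score outlier removal: keep rows within 3 standard deviations on every metric.
z = (df[metrics] - df[metrics].mean()) / df[metrics].std(ddof=0)
df = df[(z.abs() < 3).all(axis=1)].copy()

# Hypothetical target: a weighted blend of the scaled metrics plus noise.
df["Score"] = 0.6 * df["Runs"] + 0.4 * df["Strike Rate"] + rng.normal(0, 0.02, size=len(df))

# Train/test split, feature scaling fitted on the training set only, then regression.
X, y = df.drop(columns=["Score"]), df["Score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)
pred = model.predict(scaler.transform(X_test))

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.4f}  MSE={mse:.6f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```

Fitting the scaler on the training split only keeps information from the test rows out of the model, which makes the reported error metrics a fairer estimate of performance on unseen data.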
This project provides insights into cricket player performance using data analysis and machine learning techniques. It aims to help cricket analysts and enthusiasts understand the factors influencing player performance scores.