This repository contains a data science project focused on predicting the quality of wine based on various chemical properties. Using machine learning, we're exploring the relationship between the characteristics of wine and its perceived quality.
The primary goal of this project is to build a predictive model that can accurately estimate wine quality based on its chemical composition. By analyzing the data, we aim to:
- Understand the factors that influence wine quality.
- Develop robust machine learning models.
- Gain experience with a typical data science workflow.
- Learn more about MLOps.
The project uses the following tools:
- Python - Core language for analysis and modeling.
- Pandas - Data manipulation.
- NumPy - Numerical computing.
- MLflow - Experiment tracking & model management.
- Dagshub - Versioning and collaboration for MLOps.
- Flask - For building web APIs.
The dataset used in this project is the Wine Quality dataset. It contains important chemical features such as:
- Fixed Acidity
- Volatile Acidity
- Citric Acid
- Residual Sugar
- Chlorides
- Free Sulfur Dioxide
- Total Sulfur Dioxide
- pH
- Sulphates
- Alcohol
- Quality (target variable)
You can find the dataset here.
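As a rough illustration of a first look at these columns, the sketch below loads a tiny inline sample with Pandas and reports data types and missing values. The two rows are made up for illustration, and the lowercase column names and comma delimiter are assumptions; the real file may differ. `summarize` is a hypothetical helper, not part of this repository.

```python
import io

import pandas as pd

# Two made-up rows for illustration only; they are NOT taken from the real dataset.
# Column names (lowercase, space-separated) and the comma delimiter are assumptions.
SAMPLE_CSV = """fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,pH,sulphates,alcohol,quality
7.0,0.50,0.10,2.0,0.08,10.0,30.0,3.4,0.6,9.5,5
8.0,0.30,0.40,,0.07,15.0,50.0,3.2,0.7,10.5,6
"""


def summarize(df, target="quality"):
    """Collect a few basic facts that are useful during exploration."""
    return {
        "n_rows": len(df),
        "feature_columns": [c for c in df.columns if c != target],
        "missing_per_column": df.isna().sum().to_dict(),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)

print(df.dtypes)                      # data type of each column
print(summary["missing_per_column"])  # count of missing values per column
print(df.describe())                  # summary statistics for numeric columns
```

To run this against the actual file, replace the `io.StringIO(SAMPLE_CSV)` argument with the path to the downloaded CSV and adjust the delimiter if needed.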
This project follows a well-defined workflow for building and deploying the model:
- Data Loading and Exploration: Load data using Pandas, explore data types, missing values, and summary statistics.
- Model Selection & Training: Select ML models (Logistic Regression, ElasticNet, etc.) and train using the data.
- Model Evaluation: Evaluate the models using MSE, RMSE, and R² Score for performance analysis.
- MLOps: Implement practices such as Experiment Tracking with MLflow and Version Control with Dagshub.
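To make the evaluation step concrete, here is a minimal sketch of how MSE, RMSE, and R² are computed from true and predicted quality scores. It uses plain Python for clarity and is not the code from this repository; `regression_metrics` is a hypothetical helper, and the example values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    # Residual sum of squares: squared gap between each truth and prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)

    # Total sum of squares: spread of the true values around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Made-up quality scores and predictions, for illustration only.
metrics = regression_metrics([5, 6, 7, 5], [5.5, 6.0, 6.5, 5.0])
print(metrics)
```

Lower MSE and RMSE indicate smaller prediction errors, while an R² closer to 1 means the model explains more of the variance in quality.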
- Clone the repository:
    git clone https://github.com/ArpitKadam/data-science-project-on-Wine-Quality.git
    cd data-science-project-on-Wine-Quality
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # For Windows use `venv\Scripts\activate`
- Install the dependencies:
    pip install -r requirements.txt
- Don't forget to update your Dagshub credentials:
  Change the credentials in the following file:
    data-science-project-on-Wine-Quality
    └── src
        └── datascienceproject
            └── components
                └── model_evaluation.py
  Replace the placeholders in these lines:
    os.environ["MLFLOW_TRACKING_URI"] = '[YOUR_MLFLOW_TRACKING_URI]'
    os.environ["MLFLOW_TRACKING_USERNAME"] = '[YOUR_MLFLOW_TRACKING_USERNAME]'
    os.environ["MLFLOW_TRACKING_PASSWORD"] = '[MLFLOW_TRACKING_PASSWORD]'
- Run the scripts:
    python main.py
- Run the Flask App:
    python app.py
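For the credentials step above, one quick sanity check before running the pipeline is to confirm that the three MLflow variables are set and no longer hold bracketed placeholders. The sketch below is a hypothetical helper, not part of this repository; it assumes any value wrapped in square brackets is still a placeholder.

```python
import os

# Variable names taken from the credentials snippet in model_evaluation.py.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_settings(env=None):
    """Return the required variable names that are unset, empty, or placeholders.

    `env` is any mapping of names to values; it defaults to os.environ.
    Assumption: a value wrapped in [square brackets] is an unedited placeholder.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or (value.startswith("[") and value.endswith("]")):
            missing.append(name)
    return missing


problems = missing_mlflow_settings()
if problems:
    print("Set these before running main.py:", ", ".join(problems))
else:
    print("MLflow tracking variables look configured.")
```

Exporting the values in your shell instead of editing them into the source file also keeps the password out of version control.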
Check out the live demo here.
Contributions to this project are welcome! If you have any ideas for improvements, bug fixes, or new features, please feel free to:
- Fork the repository.
- Create a branch for your changes.
- Submit a pull request.
This project is licensed under the MIT License.
Have questions or feedback? Reach out to me by email or visit my GitHub.