Heart-Disease-Prediction-Using-PCA 🫀

Description:

This repository uses Principal Component Analysis (PCA) for feature reduction and predictive modeling to forecast heart disease risk. It covers data analysis, a PCA implementation, and machine learning algorithms for predicting heart disease and understanding the factors that contribute to heart health.

Medical Phenomenon 🩺

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year. CVDs are a group of disorders of the heart and blood vessels and include coronary heart disease, cerebrovascular disease, rheumatic heart disease and other conditions. Four out of five CVD deaths are due to heart attacks and strokes, and one third of these deaths occur prematurely in people under 70 years of age. Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

Individuals at risk of CVD may demonstrate raised blood pressure, glucose, and lipids as well as overweight and obesity. These can all be easily measured in primary care facilities. Identifying those at highest risk of CVDs and ensuring they receive appropriate treatment can prevent premature deaths. Access to essential noncommunicable disease medicines and basic health technologies in all primary health care facilities is essential to ensure that those in need receive treatment and counselling.

Dataset

The dataset contains the medical records of 304 patients who had heart failure, collected during their follow-up period. Each patient profile has 12 clinical features.

Source file: https://github.com/PraveenHurakadli/Heart-Disease-Prediction-Using-PCA/blob/main/heart.csv

Attribute Information:

| Attribute | Description |
| --- | --- |
| Age | Age of the patient [years] |
| Sex | Gender of the patient [M: Male, F: Female] |
| ChestPain | Chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic] |
| RestingBP | Blood pressure [mm Hg] (normal blood pressure: 120/80 mm Hg) |
| Cholesterol | Serum cholesterol level in blood [mg/dL] (normal level for adults: below 200 mg/dL) |
| FastingBS | Fasting blood sugar [mg/dL] (normal: below 100 mg/dL; 100-125 mg/dL indicates prediabetes) |
| RestingECG | Resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria] |
| MaxHR | Maximum heart rate achieved [numeric value between 60 and 202] |
| ExerciseAngina | Exercise-induced angina [Y: Yes, N: No] |
| Oldpeak | Oldpeak = ST depression [numeric value] |
| ST_Slope | The slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping] |
| HeartDisease | Output class [1: heart disease, 0: normal] |

Procedural Overview

The steps below outline how Principal Component Analysis (PCA) is combined with machine learning algorithms in a heart disease prediction project.

Steps:

1. Data Collection and Preprocessing:

Gather a dataset containing features relevant to heart health (e.g., age, blood pressure, and cholesterol levels). Handle missing values, encode categorical variables, and normalize or standardize the data.
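As a minimal sketch of this step, the snippet below imputes, scales, and one-hot encodes a handful of columns with scikit-learn. The column names come from the attribute table, but the rows are made up for illustration and are not taken from heart.csv. The median imputation strategy is likewise an assumption, not necessarily what the repository does.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up sample rows (illustrative only, NOT from heart.csv).
df = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54, 39],
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "ChestPain": ["ATA", "NAP", "ATA", "ASY", "NAP", "TA"],
    "RestingBP": [140, 160, 130, 138, 150, np.nan],  # one missing value
    "Cholesterol": [289, 180, 283, 214, 195, 339],
    "MaxHR": [172, 156, 98, 108, 122, 170],
    "ExerciseAngina": ["N", "N", "N", "Y", "N", "N"],
})

numeric_cols = ["Age", "RestingBP", "Cholesterol", "MaxHR"]
categorical_cols = ["Sex", "ChestPain", "ExerciseAngina"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Keeping the preprocessing in a single transformer object makes it easy to reuse the same fitted steps on new data later.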

2. Exploratory Data Analysis (EDA):

Perform descriptive statistics, visualizations, and correlation analysis to understand the dataset. Assess feature importance and relationships to gain insights.
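The descriptive part of this step could look roughly like the following pandas sketch. The data frame is filled with random numbers as a stand-in for heart.csv, so the printed statistics carry no medical meaning. Only the calls are the point.

```python
import numpy as np
import pandas as pd

# Random stand-in data (NOT heart.csv); column names follow the attribute table.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(29, 78, n),
    "RestingBP": rng.normal(132, 18, n).round(),
    "Cholesterol": rng.normal(240, 50, n).round(),
    "MaxHR": rng.normal(140, 25, n).round(),
    "HeartDisease": rng.integers(0, 2, n),
})

summary = df.describe()  # count, mean, std, quartiles per column
class_balance = df["HeartDisease"].value_counts(normalize=True)
corr = df.corr()         # pairwise Pearson correlations

# Features ranked by absolute correlation with the target.
target_corr = (
    corr["HeartDisease"]
    .drop("HeartDisease")
    .sort_values(key=abs, ascending=False)
)

# Per-class feature means, a quick way to spot group differences.
group_means = df.groupby("HeartDisease").mean()

print(summary.loc[["mean", "std"]])
print(class_balance)
print(target_corr)
```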

3. Feature Selection and PCA:

Identify the features relevant for predicting heart disease. Apply PCA to reduce the dimensionality of the dataset while retaining most of its variance. Determine the number of principal components to keep, using the cumulative explained variance or a scree plot.
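One way to pick the number of components is to keep the smallest number whose cumulative explained variance reaches a chosen threshold. The sketch below assumes a 95% threshold and uses synthetic data from `make_classification` in place of the encoded heart-disease features. Neither choice comes from the repository.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix (assumption).
X, _ = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_redundant=4, random_state=0
)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components to inspect the full variance spectrum.
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

threshold = 0.95  # assumed target, not taken from the repository
# Index of the first component where the cumulative ratio reaches the threshold.
k = int(np.searchsorted(cumulative, threshold) + 1)

X_reduced = PCA(n_components=k).fit_transform(X_scaled)
print(f"components kept: {k}, cumulative variance: {cumulative[k - 1]:.3f}")
```

Plotting `pca_full.explained_variance_ratio_` against the component index gives the scree plot mentioned above.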

4. Split Data for Training and Testing:

Divide the dataset into training and testing sets (e.g., 70-30 or 80-20 split).
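A hedged sketch of an 80-20 split with scikit-learn follows. Stratifying on the label keeps the class ratio similar in both parts. The data is synthetic, and the `random_state` values are arbitrary choices for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease features and labels (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, weights=[0.45, 0.55], random_state=0
)

# 80% training, 20% testing; stratify=y preserves the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(f"positive rate: train={y_train.mean():.2f}, test={y_test.mean():.2f}")
```

A common precaution is to fit scalers and PCA on the training part only, so that no information from the test set leaks into the model.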

5. Model Selection and Training:

Choose appropriate machine learning algorithms (e.g., Logistic Regression, Random Forest, SVM) for classification. Fit the model on the training data.
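The following sketch fits the three example classifiers named above, each behind the same scaling and PCA steps. The data is synthetic, and `n_components=8` is an arbitrary illustrative value. The printed accuracies therefore say nothing about the results on heart.csv.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

fitted = {}
scores = {}
for name, clf in candidates.items():
    # Scaling and PCA are fitted on the training data only, inside the pipeline.
    model = make_pipeline(StandardScaler(), PCA(n_components=8), clf)
    fitted[name] = model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```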

6. Model Evaluation:

Evaluate the model's performance using the testing data (accuracy, precision, recall, F1-score, ROC curve, etc.). Use cross-validation to assess model robustness.
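The metrics listed above are all available in `sklearn.metrics`. This sketch computes them for a single pipeline on synthetic data and adds a 5-fold cross-validation on the training set. The model choice and fold count are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(
    StandardScaler(), PCA(n_components=8), LogisticRegression(max_iter=1000)
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1, for ROC AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Cross-validation refits the whole pipeline on each fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```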

7. Tuning and Optimization:

Fine-tune hyperparameters of the models to improve performance (e.g., GridSearchCV or RandomizedSearchCV).
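As an illustration, `GridSearchCV` can tune the number of PCA components together with the classifier settings when both sit in one pipeline. The grid values below are arbitrary examples on synthetic data, not the settings used in the repository.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", SVC()),
])

# Keys follow the "<step name>__<parameter>" convention of scikit-learn pipelines.
param_grid = {
    "pca__n_components": [4, 8, 12],
    "clf__C": [0.1, 1, 10],
    "clf__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

`RandomizedSearchCV` takes the same pipeline and samples from the grid instead of trying every combination, which helps when the grid is large.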

8. Prediction and Interpretation:

Make predictions on new/unseen data using the trained model. Interpret the results and assess the factors contributing to heart disease prediction.
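Because PCA mixes the original features, interpretation usually means mapping the components back to them. The sketch below predicts for one hypothetical patient and then inspects the PCA loadings. It also projects the logistic regression coefficients back onto the standardized features. The data and the labeling rule are invented for the example, and the feature values are already on a standardized scale.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]

# Invented data: random features and a made-up rule for the label.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)
y = (X["Age"] - X["MaxHR"] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=3)),
    ("clf", LogisticRegression()),
]).fit(X, y)

# Hypothetical new patient (values on the same scale as the invented data).
new_patient = pd.DataFrame([[0.5, 0.1, -0.3, -1.2, 0.8]], columns=features)
proba = model.predict_proba(new_patient)[0, 1]
label = model.predict(new_patient)[0]
print(f"predicted class: {label}, P(heart disease) = {proba:.3f}")

# Loadings: how strongly each original feature contributes to each component.
pca = model.named_steps["pca"]
clf = model.named_steps["clf"]
loadings = pd.DataFrame(
    pca.components_.T,
    index=features,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)
print(loadings.round(2))

# Coefficients expressed per standardized original feature:
# logit = (x_scaled - pca.mean_) @ coef_original + intercept
coef_original = pca.components_.T @ clf.coef_.ravel()
print(pd.Series(coef_original, index=features).round(2))
```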

Performance Results

KNN gives an accuracy of: 87%

Random Forest gives an accuracy of: 86%

Support Vector Classifier gives an accuracy of: 86%

Gradient Boosting Classifier gives an accuracy of: 82%