This repository contains the exercises that are part of the blog series published here. The series covers the basics of how and when to perform data preprocessing, an essential step in any machine learning project in which you get your data ready for modeling. Preprocessing comes into play between importing and cleaning your data and fitting your machine learning model. I explored how to standardize your data so that it is in the right form for your model, create new features to make the most of the information in your dataset, and select the best features to improve your model fit. Finally, I practised preprocessing by getting a dataset on UFO sightings ready for modeling.
- Introduction to Data Preprocessing
- Standardizing Data
- What is Feature Engineering?
- What is Feature Selection?
- Case Study
Data preprocessing comes right after you have cleaned up your data and done some exploratory data analysis. It is the step where we prepare the data for modeling. Most machine learning models in Python need numerical input, so non-numeric columns have to be converted first. Check out more about it here.
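To illustrate the point about numerical input, here is a minimal sketch of one-hot encoding, one common way to turn a text column into numbers. It uses only the standard library; the `one_hot_encode` helper and the sample `shapes` values are hypothetical and are not taken from the blog series.

```python
def one_hot_encode(values):
    """Return the sorted categories and a 0/1 indicator row for each value."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


# Hypothetical categorical column (made-up values for illustration).
shapes = ["disk", "light", "disk", "triangle"]
categories, encoded = one_hot_encode(shapes)

for shape, row in zip(shapes, encoded):
    print(shape, row)
```

Each row ends up with exactly one `1`, in the position of its category. In practice a library routine would usually do this for you; the hand-written version is only meant to show what the transformation does.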
Standardization is a preprocessing method used to transform continuous data to make it look normally distributed. Check out more about it here.
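As a rough sketch of what this can look like, the snippet below shows two common transformations: a log transform, which reduces right skew, and z-score scaling, which centers a column at 0 with unit variance. Scaling alone does not change the shape of a distribution, which is why the two are often combined. The function names and the sample values are assumptions made for illustration, and only the standard library is used.

```python
import math
import statistics


def log_normalize(values):
    """Apply the natural log to strictly positive values to reduce right skew."""
    return [math.log(value) for value in values]


def standard_scale(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot scale a constant column")
    return [(value - mean) / std for value in values]


# Hypothetical, heavily skewed column (made-up values for illustration).
durations = [1.0, 10.0, 100.0, 1000.0]
scaled = standard_scale(log_normalize(durations))
print(scaled)
```

The log step requires strictly positive inputs, so columns containing zeros or negative numbers would need a different treatment.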
Feature engineering is the process of creating new features from existing ones. Check out more about it here.
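A small example may help here. The sketch below derives `month` and `year` features from a date string, a typical case of building new columns out of an existing one. The row layout, the `date` key, and the `%Y-%m-%d` format are assumptions made for illustration, not the format used in the series.

```python
from datetime import datetime


def add_date_features(rows):
    """Return copies of the rows with 'month' and 'year' derived from 'date'."""
    engineered = []
    for row in rows:
        parsed = datetime.strptime(row["date"], "%Y-%m-%d")
        engineered.append({**row, "month": parsed.month, "year": parsed.year})
    return engineered


# Hypothetical rows (made-up values for illustration).
rows = [
    {"date": "1995-11-03", "city": "hypothetical city a"},
    {"date": "2002-06-21", "city": "hypothetical city b"},
]
engineered_rows = add_date_features(rows)
print(engineered_rows[0])
```

The original rows are left unchanged; the new features are added to copies, which keeps the raw data available for other experiments.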
Feature selection is the method of choosing which of the existing features to use for modeling. It doesn't create new features. Check out more about it here.
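One simple selection strategy, sketched below, is to drop a feature when it is almost perfectly correlated with one that has already been kept, since it adds little new information. This is only one of several possible approaches and is not necessarily the one used in the series. The `drop_correlated` helper, the `0.95` threshold, and the sample columns are all assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for a constant column; treated as 0 here
    return cov / (norm_x * norm_y)


def drop_correlated(features, threshold=0.95):
    """Keep each feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical columns: 'minutes' is just 'seconds' divided by 60.
features = {
    "seconds": [60, 120, 300, 600],
    "minutes": [1, 2, 5, 10],
    "latitude": [3.0, 1.0, 4.0, 1.5],
}
selected = drop_correlated(features)
print(list(selected))
```

Because features are checked in insertion order, the first of two redundant columns is the one that survives. Note that no new columns are created; the result is a subset of the input.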
In the final blog of this series, we will walk through the entire preprocessing workflow on a dataset of UFO sightings. Each row in this dataset contains information such as the location, the type of sighting, the number of seconds and minutes the sighting lasted, a description of the sighting, and the date the sighting was recorded. Check out more about it here.