This is my final project in the Big Data Analytics Course (CIE 427) at UST - Zewail City.

dina-adel/Analysis-of-HMDA-dataset-using-Apache-Spark

Analysis of the Home Mortgage Disclosure Act (HMDA) dataset using Apache Spark

The Home Mortgage Disclosure Act (HMDA) is a US federal law that requires banks to collect their mortgage data and disclose it to the public, to help ensure that they are not discriminating or misusing their funds. The dataset can be found here. In this project, using PySpark, I worked on:

  • Analyzing the HMDA public dataset (collected in 2014) to investigate the fairness of the mortgage approval criteria.
  • Building several supervised machine learning models in PySpark, such as decision trees, naive Bayes, and logistic regression, to predict whether a loan request will be accepted.
  • Building a very simple recommendation system. This system checks the information the applicant provides and assesses it. It then either recommends some modifications to the application or tells the applicant to proceed.
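
As a rough illustration of the fairness analysis in the first item, one common starting point is to compare approval rates across applicant groups. The sketch below is not the project's code: it uses plain Python instead of PySpark so that it runs without a Spark installation, the records are toy values invented for illustration, and the field names `group` and `approved` are hypothetical stand-ins for whatever columns the real dataset provides.

```python
from collections import defaultdict

# Toy records invented for illustration only; NOT taken from the HMDA data.
# The field names "group" and "approved" are hypothetical.
applications = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]


def approval_rates(records):
    """Return the fraction of approved applications for each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        totals[record["group"]] += 1
        if record["approved"]:
            approved[record["group"]] += 1
    return {group: approved[group] / totals[group] for group in totals}


rates = approval_rates(applications)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A gap between groups in such a table is only a first signal. It does not by itself show discrimination, since the groups may also differ in income, loan amount, or other factors that a fuller analysis would need to account for.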
