This is a collaborative project done for the 2024 Europe Datathon hosted by Citadel and Correlation One.
In this Datathon, we are given several datasets and asked to clean and analyze them, while also formulating a question that the team aims to answer using the data. We are given a week to submit a report detailing all of our work, from exploring the data to building the model.
For this project, we have written a report, which can be found in the `Team_8_report.pdf` file. This PDF is compiled from the `Team_8_report.typ` file using the Typst app.
To get started, create a virtual environment in Python (we worked on this project using Python 3.10.12) and install all the dependencies in `requirements.txt`.
To do this, activate your virtual environment and run `pip install -r requirements.txt`.
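The setup steps above can also be scripted. The following is a minimal sketch, not part of the project itself: it uses the standard-library `venv` module to create the environment and then calls `pip` from inside it. The environment directory `.venv` and the helper names `venv_python`, `pip_install_command`, and `create_env_and_install` are assumptions made for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the Python interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, requirements: Path) -> list[str]:
    """Build the command that installs the requirements into the environment."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def create_env_and_install(
    env_dir: Path = Path(".venv"),  # hypothetical directory name, not from the project
    requirements: Path = Path("requirements.txt"),
) -> None:
    """Create the virtual environment, then install the dependencies into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(pip_install_command(env_dir, requirements), check=True)
```

Calling `create_env_and_install()` from the repository root has the same effect as creating the environment by hand and running the `pip` command, without needing to activate the environment first, because the environment's own interpreter is invoked directly.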
To run the project, type `make all` into the terminal. This will initiate all the steps to clean and process the data. Note that many of the Jupyter notebooks do not produce any output, but contain key work for our project.
Our work is organized into a few main folders:
- Cleaning: Contains the code for processing the given and external datasets
- Dataset: Contains the given and external datasets
- EDA: Contains most of our work other than data cleaning, including all of the analysis and the models for this project
- Udataset: Contains all the processed datasets produced by the notebooks in Cleaning or EDA
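As a quick sanity check after downloading the repository, the folder layout listed above can be verified with a short script. This is only a sketch: the folder names come from the list above, while the helper name `missing_folders` is an assumption made for illustration.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("Cleaning", "Dataset", "EDA", "Udataset")


def missing_folders(repo_root: Path) -> list[str]:
    """Return the expected folders that are not present under repo_root."""
    return [name for name in EXPECTED_FOLDERS if not (repo_root / name).is_dir()]
```

Running `missing_folders(Path("."))` from the repository root should return an empty list if all four folders are present.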
Details of all our group members can be found in `Team_8_report.pdf`. However, the information is also available below:
- Charles Dubois-Veltman (Durham University): University email | Personal email | Github | LinkedIn
- Jeremy Mariani (Durham University): University email | Personal email | Github | LinkedIn
- Nam Le (Durham University): University email | Personal email | Github | LinkedIn
- Michal Pluta (Durham University): University email | Personal email | Github | LinkedIn
All of our files are available on GitHub. Simply clone this repository to get all the files.