Created by Marc-Antoine Nadeau, Jessie Kurtz, and Baicheng Peng
We investigate and compare the performance of the following models for emotion classification on the GoEmotions dataset:
- Naive Bayes
- Random Forest (RF)
- SR
- XGBoost
The GoEmotions dataset, developed by Google Research, contains over 58,000 English Reddit comments, each labeled with one or more of 27 emotion categories or marked as neutral. This project focuses on classifying these emotions and evaluating performance using several algorithms and configurations.
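Because a single comment can carry several labels, the labels are commonly encoded as a multi-hot vector with one slot per class before training. The sketch below assumes the 28-class label set (27 emotions plus neutral) in the order shown. `EMOTIONS` and `to_multi_hot` are hypothetical names, and the project's own `PrepareDataset.py` scripts may order or encode the labels differently.

```python
# GoEmotions label set: 27 emotions plus "neutral" (28 classes in total).
# The ordering here is an assumption; the project's scripts may use another.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]
INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def to_multi_hot(labels):
    """Turn a comment's label names into a 28-dimensional 0/1 vector."""
    vector = [0] * len(EMOTIONS)
    for name in labels:
        vector[INDEX[name]] = 1
    return vector


# Hypothetical comment annotated with two emotions at once.
vec = to_multi_hot(["joy", "gratitude"])
print(sum(vec), vec[INDEX["joy"]], vec[INDEX["gratitude"]])  # 2 1 1
```

A single-label classifier would keep only one slot set to 1; the multi-hot form lets a model predict every emotion present in a comment.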
To run each model, follow the corresponding steps:
- Navigate to the `src/BERT` folder and run the `PrepareDataset.py` file.
- Navigate to the `src/Distilled-GPT2` folder and run the `PrepareDataset.py` file.
- Navigate to the `src/Naive-Bayes` folder and run the `PrepareDatasetNB.py` file.
- Navigate to the `src/Baseline_model` folder and run the `RandomF.py` file.
- Navigate to the `src/GPT2` folder and run the `PrepareDataset.py` file.
- Navigate to the `src/Word2Vec` folder and run the `word2vec.py` file.
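The steps above can also be chained in a single helper. This is a minimal sketch assuming a hypothetical `run_all.py` placed at the repository root, and assuming each script should be launched from inside its own folder, as the steps suggest. By default it only prints the commands; passing `--run` executes them in order and stops at the first failure.

```python
import subprocess
import sys
from pathlib import Path

# (folder, script) pairs taken from the steps above.
STEPS = [
    ("src/BERT", "PrepareDataset.py"),
    ("src/Distilled-GPT2", "PrepareDataset.py"),
    ("src/Naive-Bayes", "PrepareDatasetNB.py"),
    ("src/Baseline_model", "RandomF.py"),
    ("src/GPT2", "PrepareDataset.py"),
    ("src/Word2Vec", "word2vec.py"),
]


def build_commands(repo_root):
    """Return (working_directory, argv) pairs, one per model."""
    root = Path(repo_root)
    return [(root / folder, [sys.executable, script]) for folder, script in STEPS]


def run_all(repo_root=".", dry_run=True):
    """Run each script from inside its own folder, mirroring the manual steps."""
    for cwd, argv in build_commands(repo_root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError at the first failing script.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```

Usage: `python run_all.py` to preview the commands, then `python run_all.py --run` to execute them. Using `sys.executable` keeps every script on the same interpreter (and virtual environment) as the helper.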
Any resulting figures, graphs, and CSV files are saved in their respective folders:
- `Results-BERT`: contains the results for the BERT LLM.
- `Results-Distilled-GPT2`: contains the results for the Distilled-GPT2 LLM.
- `Results-GPT2`: contains the results for the GPT2 LLM.
- `Results-Naive-Bayes`: contains the results for the Naive Bayes model.
- `Results-Word2Vec`: contains the results for the Word2Vec model.
- `Results-RandomF`: contains the results for the baseline model (RF).
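To gather the outputs afterwards, the helper below scans the repository for files inside the results folders listed above. It is a sketch under two assumptions: the location of the results folders is not stated, so it searches the whole tree, and figures are assumed to be saved as PNG files. The names `group_results` and `collect_results` are hypothetical.

```python
from pathlib import Path, PurePosixPath

# Folder names taken from the results list above.
RESULT_DIRS = {
    "Results-BERT",
    "Results-Distilled-GPT2",
    "Results-GPT2",
    "Results-Naive-Bayes",
    "Results-Word2Vec",
    "Results-RandomF",
}


def group_results(file_paths, suffixes=(".csv", ".png")):
    """Group file paths by the Results-* folder that directly contains them."""
    grouped = {}
    for raw in file_paths:
        path = PurePosixPath(raw)
        # Keep only files sitting directly in a results folder with a wanted suffix.
        if path.parent.name in RESULT_DIRS and path.suffix.lower() in suffixes:
            grouped.setdefault(path.parent.as_posix(), []).append(path.name)
    return {folder: sorted(names) for folder, names in grouped.items()}


def collect_results(repo_root="."):
    """Walk repo_root recursively and group every matching result file."""
    root = Path(repo_root)
    files = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return group_results(files)


if __name__ == "__main__":
    for folder, names in collect_results().items():
        print(folder, names)
```

Keeping the grouping logic separate from the directory walk makes it easy to adjust `suffixes` (for example, adding `".pdf"`) without touching the filesystem code.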
Make sure you have the following Python libraries installed:
- numpy
- matplotlib
- torch
- seaborn
- pandas
- transformers
- scikit-learn
You can install them using pip:
pip install numpy matplotlib torch seaborn pandas transformers scikit-learn
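Before running the scripts, a short check can confirm that the libraries are importable. Note that scikit-learn is imported under the module name `sklearn`. The helper `missing_packages` is hypothetical and only covers the seven libraries listed above; a script that needs anything else will still report it when it runs.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "torch": "torch",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "transformers": "transformers",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    # find_spec locates a top-level module without importing it.
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is missing, the script prints a `pip install` command restricted to those packages.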