This repository contains the datasets for classifying stress in text-based social media articles from Reddit and Twitter. The datasets were introduced in the paper "Stress Detection from Social Media Articles: New Dataset Benchmark and Analytical Study".
Status: Accepted for oral presentation at IEEE WCCI 2022 (IJCNN track).
We construct four high-quality datasets from text articles collected from Reddit and Twitter. Each article carries a binary class label: '0' denotes a Stress Negative article and '1' denotes a Stress Positive article. Annotation was performed with an automated DNN-based strategy described in the study.
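As an illustration of the binary label scheme, here is a minimal sketch of reading labelled rows and checking the class balance. It assumes, hypothetically, that a dataset is stored as CSV with a header row and columns named `text` and `label`; the actual file names, formats, and column names are not specified here, so adjust `text_col` and `label_col` accordingly. The helper names `load_labelled_rows` and `class_distribution` are illustrative only.

```python
import csv
import io
from collections import Counter

# Label scheme from the dataset description: 0 = Stress Negative, 1 = Stress Positive.
LABEL_NAMES = {0: "Stress Negative", 1: "Stress Positive"}


def load_labelled_rows(csv_text: str, text_col: str = "text", label_col: str = "label"):
    """Parse CSV text into (text, label) pairs.

    ASSUMPTION: the data has a header row with hypothetical columns named
    `text` and `label`; change `text_col`/`label_col` to match the real files.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        label = int(record[label_col])
        # Reject anything outside the documented 0/1 scheme.
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label {label!r}; expected 0 or 1")
        rows.append((record[text_col], label))
    return rows


def class_distribution(rows):
    """Count how many articles fall into each named class."""
    counts = Counter(label for _, label in rows)
    return {LABEL_NAMES[k]: counts.get(k, 0) for k in sorted(LABEL_NAMES)}


if __name__ == "__main__":
    # Toy, made-up rows used only to illustrate the format; not taken from the datasets.
    sample = "text,label\nexams tomorrow and I can't sleep,1\nlovely walk in the park,0\n"
    rows = load_labelled_rows(sample)
    print(class_distribution(rows))  # {'Stress Negative': 1, 'Stress Positive': 1}
```

Checking the class distribution before training is a cheap way to spot label imbalance, which affects the choice of evaluation metric.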
Each dataset is described below:
- Reddit Title: Consists of the titles of articles collected from both stress-related and non-stress-related subreddits on Reddit.
- Reddit Combi: Consists of each article's title and body text combined into a single text sequence, collected from both stress-related and non-stress-related subreddits on Reddit.
- Twitter Full: Consists of stress-related and non-stress-related tweets collected from Twitter.
- Twitter Non-Advert: Consists of a denoised version of the Twitter Full dataset.
Further details about the datasets are given in the study, which can be cited as follows:
@INPROCEEDINGS{rastogi2022stress,
author={Aryan Rastogi and Liu Qian and Erik Cambria},
booktitle={2022 IEEE World Congress on Computational Intelligence (WCCI)},
title={Stress Detection from Social Media Articles: New Dataset Benchmark and Analytical Study},
year={2022},
volume={},
number={},
pages={},
doi={},
ISSN={},
month={}
}