Audio Classification on a Noisy Dataset with Multi-Stage Semi-Supervised Learning
Given a wav file (of variable length), predict its corresponding label(s); each wav can belong to multiple classes.
The original dataset can be found on Kaggle: https://www.kaggle.com/c/freesound-audio-tagging-2019/data.
To save time on data preprocessing, we also use the preprocessed dataset (raw wav data converted to numpy matrices via log-mel transformation): https://www.kaggle.com/daisukelab/fat2019_prep_mels1
The dataset consists of both curated data (with accurate labels) and noisy data (labeled, but with no guarantee that the labels are correct). The noisy set is much larger than the curated set.
Our code implements multiple models (CNN, CNN+LSTM, ResNet); for simplicity, experiments are run with the CNN model by default.
Since CNN-type models only accept fixed-length input, while the audio in our dataset has variable length, we cut long input audio into fixed-length segments (padding where necessary) and use the average of the per-segment predictions as the final prediction for the original audio clip, as sketched below.
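A minimal sketch of this segment-and-average inference, assuming a PyTorch model that maps a batch of `(1, n_mels, seg_len)` inputs to per-class logits (the function name and segment length are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def predict_clip(model, logmel, seg_len=128):
    """Average fixed-length segment predictions over a variable-length clip.

    logmel: tensor of shape (n_mels, n_frames) for one audio clip.
    seg_len: number of frames per segment (illustrative value).
    """
    n_mels, n_frames = logmel.shape
    # pad the clip so its length is a multiple of seg_len
    pad = (-n_frames) % seg_len
    logmel = F.pad(logmel, (0, pad))
    # split into (n_segments, 1, n_mels, seg_len) CNN inputs
    segments = logmel.unfold(1, seg_len, seg_len).permute(1, 0, 2).unsqueeze(1)
    with torch.no_grad():
        probs = torch.sigmoid(model(segments))   # (n_segments, n_classes)
    return probs.mean(dim=0)                     # final clip-level prediction
```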
Stage 0: Train the model on roughly selected noisy data (i.e. mels_trn_noisy_best50s.pkl in https://www.kaggle.com/daisukelab/fat2019_prep_mels1). Details on how the noisy data is roughly selected can be found in https://www.kaggle.com/daisukelab/creating-fat2019-preprocessed-data
Stage 1: Starting from Model 0, which was trained in Stage 0, we train the model again on the curated dataset.
Stage 2 (data filtering): Using Model 1, which was trained in Stage 1, we select the part of the noisy data whose labels we are confident are correct. At the end of this operation we get: 1. labeled data (the curated data plus the noisy data whose labels we are confident in) and 2. unlabeled data (the noisy data whose labels we are not confident in).
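One plausible way to implement this filtering (the exact confidence rule and the threshold are assumptions, not the repo's method; `predict_clip` is the segment-averaging sketch from above):

```python
import numpy as np

def split_noisy_by_confidence(model, noisy_x, noisy_y, threshold=0.7):
    """Split noisy samples into 'trusted' (kept with labels) and 'unlabeled'.

    noisy_x: list of log-mel clips; noisy_y: (n, n_classes) binary labels.
    A clip is trusted when Model 1 assigns every annotated class a
    probability above `threshold` (illustrative rule).
    """
    trusted, unlabeled = [], []
    for x, y in zip(noisy_x, noisy_y):
        probs = predict_clip(model, x).numpy()   # segment-averaged inference
        pos = y > 0
        # confident iff all annotated classes get a high predicted probability
        if pos.any() and probs[pos].min() >= threshold:
            trusted.append((x, y))
        else:
            unlabeled.append(x)
    return trusted, unlabeled
```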
Stage 2 (training): Both the labeled data {x_l, y_l} and the unlabeled data {x_u} are used in this stage. Before the input data is fed into the classifier, a stochastic data augmentation is applied. Here we use SpecAugment as the augmentation.
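A minimal SpecAugment-style sketch using frequency and time masking only (no time warping); the mask sizes and counts are illustrative:

```python
import torch

def spec_augment(logmel, max_freq_mask=16, max_time_mask=24, n_masks=2):
    """Randomly zero out frequency bands and time spans of a (n_mels, n_frames) clip."""
    x = logmel.clone()
    n_mels, n_frames = x.shape
    for _ in range(n_masks):
        # frequency mask: zero a random band of up to max_freq_mask mel bins
        f = torch.randint(0, max_freq_mask + 1, (1,)).item()
        f0 = torch.randint(0, max(1, n_mels - f), (1,)).item()
        x[f0:f0 + f, :] = 0.0
        # time mask: zero a random span of up to max_time_mask frames
        t = torch.randint(0, max_time_mask + 1, (1,)).item()
        t0 = torch.randint(0, max(1, n_frames - t), (1,)).item()
        x[:, t0:t0 + t] = 0.0
    return x
```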
The loss function consists of two parts:
1. For {x_l, y_l}, the BCE loss is calculated.
2. For both {xl} and {xu} will do stochastic augmentation by 2 times: Take xl for example
, where fθ refers to the classifier and g refers to data augmentation function. Then the squared difference loss will be calculated on the model outputs: . The main idea of this loss is to regularize the network such that it generates about the same outputs for the same data input that undergoes data augmentation.
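Putting the two parts together, a hedged sketch of the Stage 2 objective, assuming `model` outputs logits and `g` is a batch-wise stochastic augmentation such as SpecAugment (the weight `lam` on the consistency term is an assumption):

```python
import torch
import torch.nn.functional as F

def stage2_loss(model, x_l, y_l, x_u, g, lam=1.0):
    """Stage 2 objective: supervised BCE + consistency regularization.

    x_l, y_l: labeled batch; x_u: unlabeled batch (no targets used).
    g: batch-wise stochastic augmentation; lam: illustrative weight
    on the consistency term.
    """
    # part 1: BCE loss on the labeled data only
    sup = F.binary_cross_entropy_with_logits(model(g(x_l)), y_l)

    # part 2: two independent augmentations of every input are passed
    # through f_theta; the squared difference of the outputs is penalized
    x_all = torch.cat([x_l, x_u], dim=0)
    p1 = torch.sigmoid(model(g(x_all)))
    p2 = torch.sigmoid(model(g(x_all)))
    cons = ((p1 - p2) ** 2).mean()

    return sup + lam * cons
```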
Since only the curated data (i.e. mels_train_curated.pkl in the FAT2019 dataset) has reliable labels, evaluation is done on this data. We split mels_train_curated.pkl into three parts: curated training data, curated validation data, and curated testing data, in an 8:1:1 ratio.
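For reference, the 8:1:1 split can be produced with two calls to scikit-learn's `train_test_split` (the variable names and seed are illustrative; `x_curated`/`y_curated` are assumed to hold the clips and labels loaded from mels_train_curated.pkl):

```python
from sklearn.model_selection import train_test_split

# 80% train, then split the remaining 20% evenly into validation and test
x_train, x_rest, y_train, y_rest = train_test_split(
    x_curated, y_curated, test_size=0.2, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=42)
```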
The evaluation metric we use is label-weighted label-ranking average precision (lwlrap), the official metric of the competition.
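For clarity, a self-contained numpy sketch of lwlrap (written for illustration; the repo may use a different implementation):

```python
import numpy as np

def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision.

    truth:  (n_samples, n_classes) binary ground-truth label matrix
    scores: (n_samples, n_classes) predicted scores
    """
    n_samples, n_classes = truth.shape
    precisions = np.zeros_like(scores, dtype=float)
    for i in range(n_samples):
        pos = np.flatnonzero(truth[i] > 0)
        if len(pos) == 0:
            continue
        # rank of each class when sorted by descending score (0 = top)
        ranks = np.empty(n_classes, dtype=int)
        ranks[np.argsort(-scores[i])] = np.arange(n_classes)
        hit = np.zeros(n_classes, dtype=bool)
        hit[ranks[pos]] = True       # ranking positions holding true labels
        cum_hits = np.cumsum(hit)    # true labels retrieved at or above each rank
        precisions[i, pos] = cum_hits[ranks[pos]] / (ranks[pos] + 1.0)
    # each class is weighted by its share of all positive labels
    labels_per_class = truth.sum(axis=0)
    weights = labels_per_class / labels_per_class.sum()
    per_class_lrap = precisions.sum(axis=0) / np.maximum(labels_per_class, 1)
    return float((per_class_lrap * weights).sum())
```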
Here are the results for each stage:
Stage | Validation | Testing |
---|---|---|
Stage 0 | 0.285 | 0.282 |
Stage 1 | 0.828 | 0.791 |
Stage 2 | 0.836 | 0.816 |
- Make sure to set the correct data paths in config.ini. All the data we use can be found at https://www.kaggle.com/c/freesound-audio-tagging-2019/data and https://www.kaggle.com/daisukelab/fat2019_prep_mels1
- One-click run: `python3 runme.py`. The order in which all the code is executed can be found in this script.