This repo contains the code for the project "Using Self-Supervised Learning to Classify Audiovisual Aerial Scenes with Remote Sensing Data". We use the vision transformer paradigm to generate embeddings that align the audio and images of the SoundingEarth dataset, and then evaluate the model's ability to classify audiovisual scenes from the ADVANCE dataset. Most of the code is based on this other repo.
- You can run this project either locally or on Colab notebooks (check the notebooks folder).
- Download the SoundingEarth and ADVANCE datasets (for the images use this link and for the spectrograms this link).
- Create an account on wandb. We're using this platform to log our experiments.
- If you run this locally:
  - Look at the `config.py` file and adjust the DataRoot; also check `dataloaders.py` to make sure the paths to the data folders are correct (see the sketch after this list).
  - Configure wandb in your local setup.
  - Use the `environment_v4.yml` file to set up your local virtual environment with the necessary libraries.
  - To run the training script use `python train.py`, and to run the classifier use `python advance.py`.
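For reference, here is a minimal sketch of the kind of settings to double-check before launching `train.py`. Only DataRoot is named in `config.py` in this README; the path, the wandb project name, and the call below are illustrative assumptions, not the repo's exact code.

```python
import wandb

# Assumed layout: config.py exposes a DataRoot-style variable pointing at the
# extracted SoundingEarth data (adjust to your own machine).
DataRoot = "/path/to/SoundingEarth"

# wandb.init is the standard Weights & Biases call to start a logged run;
# the project name here is a placeholder.
run = wandb.init(project="soundingearth-vit", config={"data_root": DataRoot})
```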
- If you run this on Colab:
  - Load the datasets into your Google Drive or another place that you can access from Colab (see the mounting sketch after this list).
  - The training notebook is `01_embeddings_soundingearth_with_vit_base.ipynb`, the ADVANCE classifier is `02_advance_classifier.ipynb`, and the EDA is in `03_evaluate_results_for_vit_models_in_advance.ipynb`.
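If the data lives on Google Drive, the usual first cell in each notebook mounts the drive so the notebooks can reach it. The folder name below is a placeholder, not a path from this repo:

```python
# Mount Google Drive inside Colab so the notebooks can read the datasets.
from google.colab import drive

drive.mount("/content/drive")
data_root = "/content/drive/MyDrive/soundingearth"  # placeholder path, adjust to your upload
```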
- The `lib` folder is where you find the models (`lib/models`) and the loss functions (`lib/loss_functions.py`).
You can download the model weights from the Releases tab.
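A minimal sketch of loading a downloaded checkpoint with PyTorch; the file name and the checkpoint layout are assumptions, so match them to the actual release assets and the model classes in `lib/models`:

```python
import torch

# Load the released weights onto the CPU; the file name below is a placeholder.
state_dict = torch.load("vit_base_soundingearth.pth", map_location="cpu")

# Instantiate the matching architecture from lib/models (class name depends on the repo),
# then restore the weights:
# model = ...
# model.load_state_dict(state_dict)
```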