Conformer-NTM

Note: This repository is actively maintained and updated; the code and documentation are subject to change. We recommend checking back regularly and referring to the latest version of the repository for the most up-to-date information.

**Update information (9/11/2023):** In the paper I mention that I use all steps of the NTM addressing mechanism, but for the reported results I used only w_c, i.e., the content-based addressing mechanism, for both the reading and writing operations.
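For readers unfamiliar with content-based addressing, the sketch below shows how w_c is typically computed in an NTM: a cosine similarity between a controller-emitted key and each memory slot, sharpened by a key-strength scalar and normalized with a softmax. This is an illustrative NumPy sketch, not the repository's code; the variable names (`memory`, `key`, `beta`) are assumptions.

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Sketch of NTM content-based addressing (w_c).

    memory: (N, M) array of N memory slots of width M
    key:    (M,) query key emitted by the controller
    beta:   key strength; larger values sharpen the distribution
    Returns attention weights over the N slots (non-negative, sum to 1).
    """
    eps = 1e-8  # avoid division by zero for all-zero rows
    # cosine similarity between the key and each memory row
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    # softmax with key strength beta (shifted for numerical stability)
    scores = beta * sims
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Toy example: 4 slots, width-3 keys; slot 0 matches the key exactly.
M = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.]])
k = np.array([1., 0., 0.])
w = content_addressing(M, k, beta=5.0)
# w sums to 1, and the slot most similar to the key receives the largest weight
```

The same weights w_c would then drive both the read and the write heads, per the update note above.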

This repo contains the code and model for our paper:

Carlos Carvalho, Alberto Abad, “Memory-augmented conformer for improved end-to-end long-form ASR” in Proc. INTERSPEECH, 2023.

Overview

In this work, we propose a new architecture, Conformer-NTM, which combines a memory-augmented neural network (MANN), based on the neural Turing machine (NTM), with a conformer for E2E ASR. We demonstrate that the external memory is relevant for enhancing the performance of the E2E ASR system on long utterances. We also observe that Conformer-NTM becomes more effective as the length distribution of the test data moves further away from that of the training data.
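For context, the NTM-style read and write operations that such an external memory performs, given addressing weights (e.g. the w_c described above), can be sketched as follows. This is a minimal illustrative sketch, not the repository's implementation; all variable names are assumptions.

```python
import numpy as np

# Toy memory: 4 slots of width 3, plus fixed addressing weights
rng = np.random.default_rng(0)
memory = rng.standard_normal((4, 3))
w = np.array([0.7, 0.1, 0.1, 0.1])  # addressing weights over the 4 slots

# Read: the read vector is the weighted sum of the memory rows,
# which the decoder/encoder side can then consume.
r = w @ memory  # shape (3,)

# Write: erase then add, both gated by the same weights.
erase = np.full(3, 0.5)  # per-component erase gate in [0, 1]
add = np.ones(3)         # content to add
memory = memory * (1.0 - np.outer(w, erase)) + np.outer(w, add)
```

Strongly addressed slots (large w_i) are read from and overwritten the most, which is what lets the memory retain context across a long utterance.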

The full architecture is illustrated in the figure below:

[Figure: Conformer-NTM architecture]

The main results are summarized in this table:

[Table: main results]

Requirements

Usage

Note: The espnet2 directory provided in this repository includes custom modifications for training the Conformer-NTM, with additional memory-related functionalities and configurations specific to this work. Please make sure to use the espnet2 directory from this repository when setting up your environment to reproduce or extend our experiments. For any further questions, feel free to ask!

Pretrained models

You can download the pretrained models from the Hugging Face Model Hub:

Make sure to follow the instructions provided in the repository to use the pretrained model in your own ASR tasks.

Citation

Please cite our paper if you use Conformer-NTM.

@inproceedings{conformer-ntm,
    title={{Memory-augmented conformer for improved end-to-end long-form ASR}},
    author={Carlos Carvalho and Alberto Abad},
    booktitle={Proceedings of the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH)},
    year={2023},
}

Acknowledgments
