Paper accepted at SemDial 2024.
This repository contains the code for PairwiseTurnGPT, an end-of-turn detection model trained on the Switchboard corpus. PairwiseTurnGPT is based on TurnGPT but modifies the model and the spoken-dialogue processing to effectively model speaker interactions in overlapping utterances.
This allows us to exploit phenomena such as backchannels, interruptions, and overlaps when predicting end-of-turns in spoken dialogue, without the need for acoustic features.
The model itself is based on the GPT-2 architecture but uses a dual-transformer approach to model each speaker's utterances separately.
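Below is a minimal sketch of the dual-stream idea, assuming a Hugging Face GPT-2 backbone. The fusion step (simple addition), the lack of weight sharing between the two stacks, and the tiny configuration are illustrative choices only, not the actual PairwiseTurnGPT implementation (see `pairwisegpt/gpt.py` for that).

```python
# Illustrative sketch of a dual-stream GPT-2: one transformer per speaker,
# fused before the language-model head. Not the real PairwiseTurnGPT model.
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model


class DualStreamLM(nn.Module):
    """Two GPT-2 stacks, one per speaker, fused before the LM head."""

    def __init__(self, config: GPT2Config):
        super().__init__()
        self.stream_a = GPT2Model(config)   # speaker A's token stream
        self.stream_b = GPT2Model(config)   # speaker B's token stream
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids_a, input_ids_b):
        # Encode each speaker's (time-aligned) token stream separately.
        hidden_a = self.stream_a(input_ids=input_ids_a).last_hidden_state
        hidden_b = self.stream_b(input_ids=input_ids_b).last_hidden_state
        # Fuse the two streams (here: simple addition) and predict the next
        # token / end-of-turn token for speaker A.
        return self.lm_head(hidden_a + hidden_b)


if __name__ == "__main__":
    cfg = GPT2Config(n_layer=2, n_head=2, n_embd=64)  # tiny config for the demo
    model = DualStreamLM(cfg)
    ids_a = torch.randint(0, cfg.vocab_size, (1, 8))
    ids_b = torch.randint(0, cfg.vocab_size, (1, 8))
    print(model(ids_a, ids_b).shape)  # torch.Size([1, 8, 50257])
```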
For easy comparison, the original TurnGPT model has been cloned into this repository in the `gptonly` folder. The `pairwisegpt` folder contains the code for PairwiseTurnGPT.

Note: the repository state that generated the results for the paper is available on the `old` branch. The `master` branch contains refactored code for easier readability and maintainability.
The code was run and tested on an Ubuntu-based Linux machine with conda.

For a GPU environment, run the following (downgrade CUDA if necessary):
```
conda create -n ENV_NAME python=3.12
conda activate ENV_NAME
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
conda install --file package_list.txt
pip install -r requirements.txt
```
For a CPU environment, run the following:

```
conda create -n ENV_NAME python=3.12
conda activate ENV_NAME
conda install pytorch torchvision torchaudio cpuonly -c pytorch
conda install --file package_list.txt
pip install -r requirements.txt
```
In either environment, then install the local packages in editable mode:

```
cd data && pip install -e .
cd .. && pip install -e .
```
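Once installed, a quick sanity check along these lines can confirm the environment resolved correctly (this assumes `requirements.txt` pulls in Hugging Face transformers, which the GPT-2-based models rely on):

```python
# Optional sanity check for the freshly created environment.
import torch
import transformers  # assumed to be installed via requirements.txt

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # False is expected for the CPU environment
```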
There are two main folders: `gptonly`, for TurnGPT, and `pairwisegpt`, for PairwiseTurnGPT. Their contents are listed below, noting where code is reused unmodified from the original TurnGPT.

Each contains:
- tokenizer.py: written for the original TurnGPT
- train.py: contains entry code
- trainer.py: contains main train and testing loops
- model.py: contains model and tokenizer initialisation
- gpt.py: contains model architecture modified from original Hugging Face version
- evaluate.py: miscellaneous code for experiments related to generation and artificial removal of overlaps
- generate.py: written for the original TurnGPT, adapted for dual stream tokens
- utils.py: miscellaneous code with functions from original TurnGPT
- config.py: contains the named configurations used for training runs (see the illustrative sketch after this list)
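For orientation, a named run might look roughly like the hypothetical sketch below; the field names and values here are invented for illustration and will not match the actual entries in config.py.

```python
# Hypothetical shape of a named run; the real entries in config.py differ.
EXAMPLE_RUNS = {
    "experiment1": {
        "pretrained_model": "gpt2",   # base checkpoint to fine-tune
        "learning_rate": 1e-5,
        "batch_size": 4,
        "max_epochs": 10,
        "keep_overlaps": True,        # e.g. whether overlap phenomena are retained
    },
}


def get_run(name: str) -> dict:
    """Look up the hyperparameters for a --result-run name."""
    return EXAMPLE_RUNS[name]
```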
The data folder contains the necessary code for loading and preprocessing the data:
- dialog_dm.py: main entry point for loading data
- spoken_dm.py: contains the main data module for loading and tokenizing the spoken dialogues
- aligned_processor.py: contains the processor that aligns the spoken dialogues in a dual-channel manner (see the sketch after this list)
- serialised_processor.py: contains the processor that serialises the spoken dialogues in a single-channel manner
- switchboard/switchboard.py: adapted from the original TurnGPT, calling additional functions for the pairwise approach
- switchboard/utils.py: adapted from the original TurnGPT with the pairwise setup and retaining dialogue phenomena
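To make the difference between the two processors concrete, the toy example below contrasts a serialised (single-channel) and an aligned (dual-channel) view of an overlapping backchannel. The token layout and padding symbol are illustrative only, not the processors' exact output format.

```python
# Speaker B backchannels "yeah" while speaker A is still talking.

# Serialised / single-channel view: one interleaved stream, so the overlap
# must be flattened into a strict turn ordering.
serialised = ["A:", "so", "we", "went", "there", "B:", "yeah", "A:", "last", "week"]

# Aligned / dual-channel view: two time-aligned streams, one per speaker,
# padded so the backchannel keeps its true temporal position.
aligned = {
    "speaker_a": ["so", "we", "went", "there", "last", "week"],
    "speaker_b": ["<pad>", "<pad>", "<pad>", "yeah", "<pad>", "<pad>"],
}

for tok_a, tok_b in zip(aligned["speaker_a"], aligned["speaker_b"]):
    print(f"{tok_a:>6} | {tok_b}")
```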
The `common/metrics` folder contains the necessary code for calculating metrics (only for PairwiseTurnGPT).
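As a rough illustration of the kind of scoring involved, end-of-turn prediction can be evaluated as binary classification over candidate positions. The snippet below computes precision, recall, and F1 by hand; it is only a sketch of what such metrics code might do, not the actual `common/metrics` implementation.

```python
# Toy end-of-turn evaluation: 1 = turn shift at this position, 0 = no shift.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]   # gold labels
y_pred = [0, 1, 1, 0, 1, 0, 0, 0]   # model predictions

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```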
Switchboard data can be acquired from https://www.inf.ed.ac.uk/resources/corpora/. Alternatively, a download script is provided at data/switchboard/extract_swb.bash to download the data.
More information can be found in data/README.md and data/switchboard/README.md.
Interact with the data subfolder directly to understand how the data is loaded and tokenized.
To reproduce the runs described in the paper, look through `config.py` in each folder and pass the run name to the training script. Generally, training is run with:

```
python gptonly/train.py --result-run RUN_NAME
```

Available runs for `gptonly`: experiment1.

```
python pairwisegpt/train.py --result-run RUN_NAME
```

Available runs for `pairwisegpt`: experiment1, experiment2, experiment3.
Optionally, custom runs can be achieved by setting appropriate arguments to `train.py`, which can be accessed via `python gptonly/train.py --help` or `python pairwisegpt/train.py --help`.