For the speaker diarization track, only the first channel of each recording is used.
The main stages:
- We use the Kaldi toolkit's implementation. Please install Kaldi and run `ln -s /export/kaldi/utils/ utils` and `ln -s /export/kaldi/step/ step`.
- Stage 1 is data preparation and stage 2 is voice activity detection (VAD).
- When using the `VBx` toolkit for diarization, please convert the segments file to `.lab` format. Use `scripts/segment_to_lab.sh` to change the file format.
- Speaker diarization consists of speaker embedding extraction and speaker embedding clustering. In our baseline system, the `VBx` toolkit is used to extract the speaker embeddings.
- For speaker embedding clustering, the code produces the hypothesis RTTM for each audio file in `wav.scp`.
- We obtain the reference RTTM from the ground-truth transcripts.
- We use the `dscore` toolkit to compute the DER results.
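
To make the segments-to-`.lab` conversion step concrete, here is a minimal sketch of what such a conversion could look like. It is not the contents of `scripts/segment_to_lab.sh`. It assumes the Kaldi segments layout `<segment-id> <recording-id> <start> <end>` (times in seconds) and assumes the `.lab` files hold one `<start> <end> sp` line per speech region, one file per recording. The function name `segments_to_lab` and the demo data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split a Kaldi-style segments file into one .lab file per recording.
# Assumed input line:  <segment-id> <recording-id> <start-sec> <end-sec>
# Assumed output line: <start-sec> <end-sec> sp
set -euo pipefail

segments_to_lab() {
  local segments_file=$1 out_dir=$2
  mkdir -p "$out_dir"
  awk -v out="$out_dir" 'NF >= 4 {
    file = out "/" $2 ".lab"
    # Truncate each output file the first time it is seen, so reruns do not append.
    if (!(file in seen)) { printf "" > file; close(file); seen[file] = 1 }
    printf "%.3f %.3f sp\n", $3, $4 >> file
    # Close after every write to avoid running out of open file handles.
    close(file)
  }' "$segments_file"
}

# Demo with hypothetical segments for two recordings.
demo_dir=$(mktemp -d)
cat > "$demo_dir/segments" <<'EOF'
rec1-0000 rec1 0.00 1.50
rec1-0001 rec1 2.25 4.00
rec2-0000 rec2 0.50 3.75
EOF
segments_to_lab "$demo_dir/segments" "$demo_dir/lab"
cat "$demo_dir/lab/rec1.lab"
```

Check the output against a `.lab` file known to work with your VBx checkout before relying on this layout.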
Download the model from the path. Then move the `exp` directory into our `speaker` directory and move `ResNet101_16kHz` to `speaker/VBx/models`.
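
The two moves above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `place_models`, the download directory, and the repository root are hypothetical, and the demo runs on throwaway directories rather than the real download.

```shell
#!/usr/bin/env bash
# Sketch only: put the downloaded model files where the baseline expects them.
set -euo pipefail

# place_models <download_dir> <repo_root>
# Moves exp/ into <repo_root>/speaker/ and ResNet101_16kHz into
# <repo_root>/speaker/VBx/models/. Both argument paths are assumptions.
place_models() {
  local download_dir=$1 repo_root=$2
  mkdir -p "$repo_root/speaker/VBx/models"
  mv "$download_dir/exp" "$repo_root/speaker/"
  mv "$download_dir/ResNet101_16kHz" "$repo_root/speaker/VBx/models/"
}

# Demo on temporary directories standing in for the download and the checkout.
work=$(mktemp -d)
mkdir -p "$work/download/exp" "$work/download/ResNet101_16kHz" "$work/repo"
place_models "$work/download" "$work/repo"
ls "$work/repo/speaker"
```

If `speaker/exp` already exists, `mv` would nest the new directory inside it, so remove or rename the old one first.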