This might be quite specific to my use case, but I'm hoping someone else has an answer.

I use faster-whisper with base.en and other models, with an `initial_prompt` set to something like `- Hello, this is the first speaker.`, which is surprisingly effective at speaker-turn tracking, or basic diarization. I'm transcribing live audio streams (in 30s sequential chunks), so other diarization methods aren't really helpful, and I'm dealing with quite limited hardware.

I'd like to switch to one of the distil-whisper models, particularly large-v3, but I've noticed that despite a similar `initial_prompt`, I can't get these distil models to generate the `-` tokens. I don't think they are being suppressed either; I get the same result with `suppress_tokens=None` set in faster-whisper.

Is that perhaps to do with the distilled training data, or am I missing something?
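For reference, a minimal sketch of the kind of calls I mean (the audio path and compute settings are placeholders, and it assumes faster-whisper can resolve a distil-large-v3 checkpoint, e.g. a CTranslate2 conversion):

```python
from faster_whisper import WhisperModel

PROMPT = "- Hello, this is the first speaker."

# Working setup: base.en picks up the "- " speaker-turn markers from the prompt.
base = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, _ = base.transcribe("chunk_000.wav", initial_prompt=PROMPT)
print("".join(s.text for s in segments))

# Same prompt with a distil checkpoint; suppress_tokens=None disables
# faster-whisper's default suppression list ([-1]) to rule suppression out,
# yet the "-" markers still don't appear in the output.
distil = WhisperModel("distil-large-v3", device="cpu", compute_type="int8")
segments, _ = distil.transcribe(
    "chunk_000.wav",
    initial_prompt=PROMPT,
    suppress_tokens=None,
)
print("".join(s.text for s in segments))
```

The prompt is the only thing steering those hyphen markers; there's no other diarization step in the pipeline.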
Replies: 1 comment

Sorry to bother @sanchit-gandhi, any quick thoughts on this? I've not found `initial_prompt` to make much impact on transcriptions with the distil-whisper models at all, but large-v3 gives much improved transcriptions over base.en, with similar execution times on my hardware.