This might be quite specific to my use case, but I'm hoping someone else has an answer.

I use faster-whisper with base.en and other models, with an `initial_prompt` set to something like `- Hello, this is the first speaker.`, which is surprisingly effective at speaker-turn tracking, or basic diarization. I'm transcribing live audio streams (in 30s sequential chunks), so other diarization methods aren't really helpful, and I'm dealing with quite limited hardware.

I'd like to switch to one of the distil-whisper models, particularly large-v3, but I've noticed that despite a similar `initial_prompt`, I can't get these distil models to generate the `-` tokens. I don't think they are being suppressed either; I get the same result with `suppress_tokens=None` set in faster-whisper.

Is that perhaps to do with the distilled training data, or am I missing something?
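For reference, a minimal sketch of the kind of calls I mean (the audio path and compute settings are placeholders, and it assumes faster-whisper can resolve a distil-large-v3 checkpoint, e.g. a CTranslate2 conversion):

```python
from faster_whisper import WhisperModel

PROMPT = "- Hello, this is the first speaker."

# Working setup: base.en picks up the "- " speaker-turn markers from the prompt.
base = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, _ = base.transcribe("chunk_000.wav", initial_prompt=PROMPT)
print("".join(s.text for s in segments))

# Same prompt with a distil checkpoint; suppress_tokens=None disables
# faster-whisper's default suppression list ([-1]) to rule suppression out,
# yet the "-" markers still don't appear in the output.
distil = WhisperModel("distil-large-v3", device="cpu", compute_type="int8")
segments, _ = distil.transcribe(
    "chunk_000.wav",
    initial_prompt=PROMPT,
    suppress_tokens=None,
)
print("".join(s.text for s in segments))
```

The prompt is the only thing steering those hyphen markers; there's no other diarization step in the pipeline.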
Replies: 1 comment

Sorry to bother @sanchit-gandhi, any quick thoughts on this? I've not found `initial_prompt` to make much impact on transcriptions with the distil-whisper models at all, but large-v3 gives much improved transcriptions over base.en, with similar execution times on my hardware.