Real-time streaming Fast FullSubNet (LSTMCell) #67
Comments
More details on the model latency:
How did you calculate your RTFs?
fronx changed the title twice on Feb 16, 2024: first appending "RTF >1.8?" to it, then reverting to "Real-time streaming Fast FullSubNet (LSTMCell)".
Good news: I updated my operating system to Sonoma 14.3.1 and that fixed it, without any further code changes. Now the processing time is consistently between 13 ms and 15 ms.
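To relate a processing time like this to a real-time factor (RTF), here is a minimal sketch. It assumes RTF means processing time per hop divided by the audio duration of that hop. The hop length (256 samples) and sample rate (16 kHz) are hypothetical values chosen for illustration, not settings confirmed in this thread, and `real_time_factor` is an illustrative helper rather than anything from the repo.

```python
def real_time_factor(processing_ms, hop_length, sample_rate):
    """Processing time for one hop divided by the audio duration of that hop.

    Values below 1.0 mean the model keeps up with incoming audio.
    """
    hop_ms = 1000.0 * hop_length / sample_rate  # audio covered by one hop, in ms
    return processing_ms / hop_ms


# Hypothetical settings (assumption): 16 kHz audio, hop of 256 samples = 16 ms per hop.
rtf_best = real_time_factor(13.0, hop_length=256, sample_rate=16000)
rtf_worst = real_time_factor(15.0, hop_length=256, sample_rate=16000)

print(f"RTF range: {rtf_best:.4f} to {rtf_worst:.4f}")
```

Under these assumed settings, both values stay below 1.0. With a shorter hop, the same processing time would give a proportionally higher RTF.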
For potential future reference, here's the torch.profiler output of a single inference run:
And here's a pretty timeline view:
I'm trying to run Fast FullSubNet in a real-time audio streaming context.
I've successfully trained a model that seems to work reasonably well in a non-streaming context: https://github.com/fronx/FullSubNet/releases/tag/fast118
However, the latency of running it that way is too high. I've tried turning down the hop length, but that just leads to choppy, unintelligible noise. So I looked around, and apparently the structure of the code needs to change quite a bit for streaming to work?
I'm happy to execute the change and contribute it to this repo, but I might need a little bit more guidance so I don't go off track. I know how to program, but I'm still fairly new to audio ML.
Gathered instructions
For reference, to have everything in one place, here are instructions I gathered from older issues:
Questions
- […] (`[B, 1, F, T]`) to an array of samples. Is that necessary? Wouldn't that require completely retraining the model from scratch?
- […] `LSTM` to looping over `LSTMCell` require retraining?

Thanks in advance for any hints you can provide. It would be nice if we could get this repo into usable shape for streaming inference in a way that's shareable with the world. 🤩
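On the `LSTM` versus `LSTMCell` question, here is a minimal sketch in plain PyTorch, not code from this repo. It assumes a single-layer, unidirectional `nn.LSTM`; the layer sizes are hypothetical. It copies the trained weights into an `nn.LSTMCell` and checks that stepping frame by frame gives the same output as the offline pass. It covers only the recurrent layer and says nothing about whether other parts of the model can stream without retraining.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration; not the real model's dimensions.
input_size, hidden_size = 8, 16
batch, steps = 2, 5

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)  # offline: whole sequence at once
cell = nn.LSTMCell(input_size, hidden_size)                # streaming: one frame per call

# Reuse the existing weights. A single-layer nn.LSTM stores them with an "_l0" suffix.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(batch, steps, input_size)

with torch.no_grad():
    offline_out, _ = lstm(x)  # initial hidden and cell states default to zeros

    # Streaming loop: carry (h, c) across frames instead of resetting them.
    h = torch.zeros(batch, hidden_size)
    c = torch.zeros(batch, hidden_size)
    frames = []
    for t in range(steps):
        h, c = cell(x[:, t, :], (h, c))
        frames.append(h)
    streaming_out = torch.stack(frames, dim=1)

outputs_match = torch.allclose(offline_out, streaming_out, atol=1e-5)
print("outputs match:", outputs_match)
```

In a real streaming setup, `h` and `c` would be kept between audio callbacks, so each new frame continues from the previous state.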