Support for streaming inference #20

ojus1 · 2024-09-03T10:43:54Z

Is it possible to use pretrained weights for predicting codes in a chunk-wise fashion (streaming input audio)?

MrWaterZhou · 2024-11-12T08:06:47Z

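A sliding window over the codes should be enough:
```python
import torch
import torchaudio


def sliding_window(data, window_size=21, step=7):
    # Overlapping windows of `window_size` codes, advancing `step` codes each time.
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]


# `output_ids` (predicted codes) and `decode` (codes -> waveform) come from your own pipeline.
id_list = sliding_window(output_ids)
pcm_list = []
for i, ids in enumerate(id_list):
    audio_hat = decode(ids)
    if i == 0:
        # first chunk: keep the first two segments
        pcm_list.append(audio_hat[:, :, :2048 * 2])
    elif i < len(id_list) - 1:
        # middle chunks: keep only the centre segment
        pcm_list.append(audio_hat[:, :, 2048:2048 * 2])
    else:
        # last chunk: keep everything after the first segment
        pcm_list.append(audio_hat[:, :, 2048:])
pcm = torch.cat(pcm_list, dim=-1)
torchaudio.save('stream_test.wav', pcm[0].cpu(), 24000)
```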
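To check the stitching logic without a model, here is a minimal sketch that swaps the real decoder for a toy one. `toy_decode`, `stitch`, and the `hop` parameter are hypothetical names introduced only for this illustration; the toy decoder just repeats each code `hop` times, so it says nothing about audio quality at the chunk boundaries. The trimming assumes `window_size == 3 * step`, as in the defaults above.

```python
def sliding_window(data, window_size=21, step=7):
    """Overlapping windows of `window_size` items, advancing `step` each time."""
    return [data[i:i + window_size]
            for i in range(0, len(data) - window_size + 1, step)]


def toy_decode(ids, hop=4):
    """Hypothetical stand-in for a codec decoder: each code becomes `hop` samples."""
    return [code for code in ids for _ in range(hop)]


def stitch(ids, window_size=21, step=7, hop=4):
    """Decode window by window and keep only the non-overlapping part of each."""
    windows = sliding_window(ids, window_size, step)
    keep = step * hop  # samples produced by one step of codes
    out = []
    for i, window in enumerate(windows):
        pcm = toy_decode(window, hop)
        if len(windows) == 1:
            out.extend(pcm)                 # single window: keep everything
        elif i == 0:
            out.extend(pcm[:2 * keep])      # first: first two thirds
        elif i < len(windows) - 1:
            out.extend(pcm[keep:2 * keep])  # middle: centre third only
        else:
            out.extend(pcm[keep:])          # last: last two thirds
    return out


ids = list(range(35))  # 35 = 21 + 2 * 7, so the windows tile the input exactly
assert stitch(ids) == toy_decode(ids)
```

Note that with this `range` bound, any trailing codes that do not fill a whole step are dropped (a 36-code input still yields only three windows), so in a real stream you would probably want to pad or flush the tail separately.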
