A sliding window should be enough:
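```python
import torch
import torchaudio

# `decode` and `output_ids` come from the surrounding model code (not shown here).

def sliding_window(data, window_size=21, step=7):
    # Overlapping windows of `window_size` ids, advancing `step` ids at a time.
    # Trailing ids that do not fill a whole window are dropped.
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]

id_list = sliding_window(output_ids)
pcm_list = []
for i, ids in enumerate(id_list):
    audio_hat = decode(ids)
    if i == 0:
        # first chunk: keep samples up to 4096
        pcm_list.append(audio_hat[:, :, :2048 * 2])
    elif i < len(id_list) - 1:
        # middle chunks: keep samples 2048 to 4096
        pcm_list.append(audio_hat[:, :, 2048:2048 * 2])
    else:
        # last chunk: keep samples from 2048 onward
        pcm_list.append(audio_hat[:, :, 2048:])
pcm_list = torch.cat(pcm_list, dim=-1)
torchaudio.save('stream_test.wav', pcm_list[0].cpu(), 24000)
```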
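To sanity-check the trimming without loading a model, here is a minimal sketch in plain Python. It assumes the window is exactly three steps long and that every id decodes to a fixed number of samples (`hop`), which matches the snippet above only if 7 ids correspond to 2048 samples. `fake_decode` and `stitch` are hypothetical helpers made up for this illustration, not part of the repository, and a real decoder will also depend on the neighbouring ids in each window.

```python
def sliding_window(data, window_size=21, step=7):
    # Same windowing as in the snippet above.
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]

def fake_decode(ids, hop=4):
    # Hypothetical stand-in for the real decoder: each id becomes `hop` copies of itself.
    return [tok for tok in ids for _ in range(hop)]

def stitch(ids, window_size=21, step=7, hop=4):
    # Assumes window_size == 3 * step, so each window splits into three segments.
    chunks = sliding_window(ids, window_size, step)
    seg = step * hop  # samples produced by one step of ids
    out = []
    for i, chunk in enumerate(chunks):
        pcm = fake_decode(chunk, hop)
        if len(chunks) == 1:
            out.extend(pcm)              # single window: keep everything
        elif i == 0:
            out.extend(pcm[:2 * seg])    # first chunk: first two segments
        elif i < len(chunks) - 1:
            out.extend(pcm[seg:2 * seg]) # middle chunks: central segment only
        else:
            out.extend(pcm[seg:])        # last chunk: last two segments
    return out

ids = list(range(42))          # 42 ids -> 4 overlapping windows
stitched = stitch(ids)
reference = fake_decode(ids)   # decoding everything in one pass
```

With these toy values the stitched output should match the one-pass `reference`, so each id contributes to the output exactly once. The single-window branch is an addition for short inputs and is not in the snippet above.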
Is it possible to use pretrained weights for predicting codes in a chunk-wise fashion (streaming input audio)?