This repository has been archived by the owner on Jun 27, 2023. It is now read-only.
I want to get the time coordinates of each word in my 'audio.wav' using Python pocketsphinx 0.1.15. I reproduced the official example code from the project page https://pypi.org/project/pocketsphinx/, which works well for 'goforward.raw'.
When I use my 'audio.wav', the output of ps.segments(detailed=True) is not so bad, but when using the AudioFile class (as in the official example) the result is very inaccurate: neither the time coordinates (the audio is 2.52 s long) nor the number of segments is even close to correct.
What is wrong? What should I do to get correct time coordinates?
import os.path
# This is just to have audio info
import wave
import contextlib
from pocketsphinx import (Pocketsphinx, AudioFile, LiveSpeech)
# my own ps model and other resources
from utils.utilities import (get_mexconf, get_data_path)

# get the file and print audio properties
wav = os.path.join(get_data_path(), 'audio.wav')
with contextlib.closing(wave.open(wav, 'r')) as f:
    rate = f.getframerate()
    frames = f.getnframes()
    duration = frames / float(rate)
    print('rate', rate, 'frames', frames, 'duration', duration)

# This part seems to work getting segments
segments = get_segments(wav)
print(segments)

# set up my asr models and my audio
config = get_mexconf()
config['audio_file'] = wav
audio = AudioFile(**config)

# This part is copy paste from official example #
# Frames per Second
fps = 100
config['frate'] = fps
for phrase in audio:
    print('-' * 28)
    print('| %5s | %3s | %4s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)
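One way to sanity-check the table printed by the loop above is to convert each segment's frame indices to seconds and compare them with the file's duration. The sketch below is only an illustration and not part of pocketsphinx: `Segment`, `to_seconds` and `out_of_range` are hypothetical names, and the sample segments are made-up values used to exercise the helper. It assumes a decoder frame rate of 100 frames per second, as in the example. Note that assigning `config['frate']` after `AudioFile(**config)` has been called cannot affect the object that was already constructed, because the keyword arguments were unpacked at call time.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical stand-in for the objects yielded by phrase.seg()."""
    word: str
    start_frame: int
    end_frame: int


def to_seconds(seg: Segment, frate: int = 100) -> Tuple[float, float]:
    """Convert decoder frame indices to seconds (frate = frames per second)."""
    return seg.start_frame / frate, seg.end_frame / frate


def out_of_range(segments: List[Segment], duration: float,
                 frate: int = 100) -> List[Segment]:
    """Return the segments whose end time lies beyond the audio duration."""
    return [s for s in segments if to_seconds(s, frate)[1] > duration]


# Made-up segments, only to illustrate the check against a 2.52 s file.
example = [
    Segment("<s>", 0, 12),
    Segment("word", 13, 80),
    Segment("</s>", 300, 410),
]
suspicious = out_of_range(example, duration=2.5231875)
print([s.word for s in suspicious])
```

With real data, the list passed to `out_of_range` would be built from the `word`, `start_frame` and `end_frame` attributes already used in the loop; any segment it returns ends after the audio does and points to a mismatch between what was decoded and what the `wave` module measured.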
There is a bug when getting utterances.
Related Stack Overflow question
rate 16000 frames 40371 2.5231875
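These numbers give a rough bound on what the decoder can plausibly report. The following is a small worked sketch, assuming the 100 frames-per-second rate used in the example; the helper name `expected_decoder_frames` is hypothetical. 40371 samples at 16000 Hz last about 2.52 s, so no segment should end much past frame 252.

```python
def expected_decoder_frames(n_samples: int, sample_rate: int,
                            frate: int = 100) -> int:
    """Approximate number of decoder frames covering the audio.

    n_samples / sample_rate is the duration in seconds; multiplying by the
    decoder frame rate (frames per second) gives the frame count.
    """
    duration = n_samples / sample_rate
    return int(duration * frate)


duration = 40371 / 16000  # about 2.52 seconds, matching the printed value
frames = expected_decoder_frames(40371, 16000)
print(duration, frames)
```

If segments are reported far beyond that frame, one thing worth checking is whether the decoder received the same samples that the `wave` module measured, for example whether the sample rate or the file contents differ from what the model expects. That is a hypothesis to test, not a confirmed diagnosis.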
Here is my python code:
This is the config:
Causes this output: