lmrescore failure and missing Gr.fst when run the training/run.sh #1668

Open
dyustc opened this issue Nov 28, 2024 · 3 comments

Comments

@dyustc

dyustc commented Nov 28, 2024

Hi, dear authors,

I followed the training recipe in the vosk-api/training folder and got a trained model, but its accuracy is poor.
I also ran into the errors below and found some differences in model structure compared with the pretrained models, so I wonder whether I did something wrong.
I used the latest official Kaldi repo and built it successfully with CUDA enabled.

  1. In the decode stage, steps/lmrescore_const_arpa.sh fails; the log is below (maybe a specific version of Kaldi is needed?). A WER result is still produced at the end, so I guess only the rescoring step failed.
    [Screenshot 2024-11-28 19:05:28]

  2. I intended to get a model structure similar to "vosk-model-en-us-0.22-lgraph", but there are some differences. This is my exp/chain/tdnn folder:
    [Screenshot 2024-11-28 19:09:14]

  • First, unlike the pretrained models, I got an HCLG.fst but no HCLr.fst or Gr.fst, which I suppose I need for a runtime graph.
  • Second, I can't find a model.conf file. I tried to collect all the parameters used during training, but that may not be enough, so I just copied the one from "vosk-model-en-us-0.22-lgraph". It worked, but I'm not sure it suits my own trained model.
  • I mapped the exp/chain/extractor folder to the ivector folder. I'm not sure this is correct, but the files are similar.
  3. When I run the Python script test_simple.py on the resulting model, all output words are upper case and the transcription is not very accurate. Since I only ran the demo run.sh, the training data may be insufficient, so this may be expected; I can attach the audio if necessary (it is good-quality speech with clear pronunciation). I also get a warning that runtime graphs are not supported, as mentioned above.

So could you help with this? Am I missing some steps in training, or is there some tweak I should make after training?
Many thanks~
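The layout questions in this post (which graph files exist, where model.conf and the ivector files go) can be checked mechanically. The following is a minimal sketch, assuming the directory layout described in the "Model structure" notes of the Vosk documentation; the file lists and the function name `missing_model_files` are illustrative assumptions and should be verified against a published model such as vosk-model-en-us-0.22-lgraph.

```python
from pathlib import Path

# ASSUMPTION: expected layout taken from the Vosk "Model structure" notes;
# compare with a published model before relying on this list.
COMMON_FILES = [
    "am/final.mdl",
    "conf/mfcc.conf",
    "conf/model.conf",
    "graph/phones/word_boundary.int",
    "ivector/final.dubm",
    "ivector/final.ie",
    "ivector/final.mat",
    "ivector/splice.conf",
    "ivector/global_cmvn.stats",
    "ivector/online_cmvn.conf",
]
STATIC_GRAPH = ["graph/HCLG.fst"]                     # static decoding graph
LOOKAHEAD_GRAPH = ["graph/HCLr.fst", "graph/Gr.fst"]  # dynamic (lookahead) graph


def missing_model_files(model_dir, lookahead=True):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    graph_files = LOOKAHEAD_GRAPH if lookahead else STATIC_GRAPH
    expected = COMMON_FILES + graph_files
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "model" is a hypothetical path to the assembled model directory.
    for rel in missing_model_files("model", lookahead=True):
        print("missing:", rel)
```

Running it against a directory built only from a static-graph recipe would be expected to report HCLr.fst and Gr.fst as missing, which matches the situation described above.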

@dyustc changed the title from "lmrescore failure and HCLG.fst" to "lmrescore failure and missing Gr.fst when run the training/run.sh" on Nov 28, 2024
@nshmyrev
Collaborator

> in the decode stage, steps/lmrescore_const_arpa.sh would fail, and here is the log (maybe there needs to be a specific version of kaldi?) But there is a WER result at last. I guess just the rescored version failed.

The "bad option --project_output" error means you have an OpenFst version mismatch. We recommend using our branches for training; they include fixes for the version mismatch.

> First, compared to the pretrained models, I got a HCLG.fst, but not a HCLr.fst and Gr.fst, supposed I need a runtime graph.

You need to run the mkgraph_lookahead.sh script to build a dynamic graph instead of a static one.
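As a rough illustration of that step, the sketch below only assembles the command line for Kaldi's utils/mkgraph_lookahead.sh. It assumes the script's usual argument order (language directory, model directory, optional ARPA language model, output graph directory) and a self-loop scale of 1.0, which is the common setting for chain models. The helper name `lookahead_graph_command` and the data/lang_test and lgraph paths are hypothetical; check the script's own usage message before running it.

```python
import shlex


def lookahead_graph_command(lang_dir, model_dir, graph_dir,
                            arpa_lm=None, self_loop_scale=1.0):
    """Build the argv list for utils/mkgraph_lookahead.sh (ASSUMED usage:
    [options] <lang-dir> <model-dir> [<arpa-lm>] <graph-dir>)."""
    cmd = [
        "utils/mkgraph_lookahead.sh",
        "--self-loop-scale", str(self_loop_scale),
        lang_dir,
        model_dir,
    ]
    if arpa_lm is not None:
        cmd.append(arpa_lm)  # optional ARPA language model
    cmd.append(graph_dir)    # output directory, expected to receive HCLr.fst and Gr.fst
    return cmd


if __name__ == "__main__":
    # exp/chain/tdnn comes from the thread; the other two paths are assumptions.
    cmd = lookahead_graph_command("data/lang_test", "exp/chain/tdnn",
                                  "exp/chain/tdnn/lgraph")
    print(shlex.join(cmd))  # review, then run from the recipe directory
```

The command is printed rather than executed so that it can be reviewed first; passing the list to `subprocess.run(cmd, check=True)` from the recipe directory would run it.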

> Secondly, I don't find the model.conf file, I tried to collect all the params during training, but maybe not enough. So I just copied the one from "vosk-model-en-us-0.22-lgraph", it worked, but not sure it fits right into my own trained model.

That is OK, you can copy an existing one.

> all the output words are upper case also not very precise

An accurate model requires a lot of training data. I'm not sure how much you used or what language it was.
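On the upper-case output quoted above: a recognizer emits words as they are spelled in its lexicon, so if the training transcripts are upper case, the results will be too. Short of retraining with lower-case text, one workaround is to normalize the JSON result after decoding. The sketch below assumes the result is the JSON string returned by the recognizer's `Result()` call, with a "text" field and, when word output is enabled, a "result" list of per-word entries; the helper name `lowercase_result` is hypothetical.

```python
import json


def lowercase_result(result_json):
    """Parse a recognizer result string and lower-case its words."""
    result = json.loads(result_json)
    result["text"] = result.get("text", "").lower()
    # Per-word entries are only present when word-level output is enabled.
    for entry in result.get("result", []):
        entry["word"] = entry["word"].lower()
    return result


if __name__ == "__main__":
    sample = '{"text": "ONE TWO THREE"}'  # hypothetical recognizer output
    print(lowercase_result(sample)["text"])
```

This changes only the casing of the output; it has no effect on recognition accuracy.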

@dyustc
Author

dyustc commented Dec 10, 2024

Hi @nshmyrev, thanks for the quick response, but I got stuck on a CUDA mismatch problem when I tried to install Kaldi. I tried both the 'main' and 'vosk' branches of alphakaldi, and there is a CUDA bug in the setup for the kaldi/src folder. I am using NVCC version 12.6:

Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

It crashes during the makefile setup every time, while the latest official Kaldi passes, so this prevents me from following the steps you suggested above.
Is this another tool version mismatch problem?
[Screenshot 2024-12-10 17:31:52]

@nshmyrev
Collaborator

Hm, something with new cuda. I need to update the codebase then.
