No outputs #11
Thanks for reporting this. If you can do the following:
Thanks
I'm on 0.17; I checked by running "whisper-ctranslate2 --version".
Yes, the transcription appears in the terminal.
OK, I've just tried this and noticed something. By the way, I decided to run it on a 2-minute FLAC audio file to speed things up. I ran the program using "whisper-ctranslate2 [the audio file] --model tiny": it didn't work. Then I ran it with large-v2 and, to my surprise, it worked. Then I tried large-v2 again and it worked again. Then I went back to tiny and it stopped working. Then I tried base: it doesn't work. Then I finally tried large-v2 again and it worked. But previously even large-v2 was not working.
Hello. I'm unable to reproduce this problem on my Windows machine. My only question is whether you have tried running inference on CPU vs. GPU, and whether this makes any difference. Thanks
Hi, I have the same problem: the transcription appears on screen right up to the end of the audio, but no files are produced.
Do you have a file that you can share so I can try to reproduce it? Thanks
Hi, I found that this only happens on GPU; it produces output when I add "--device CPU".
Hi Jordimas, I am having similar issues. The first is that nothing gets output unless the output type and location are set (though perhaps that is by design?). The second is that unless I add "--device CPU", no data is returned; I just go back to the command prompt. This is true for a short, clear WAV, a longer MP4, English and Japanese. I have an RTX 2080 Super with the current Studio driver (531.61). I am able to use basic Whisper installations with CUDA, as well as Const-me, etc. Is there something I need to set up here or in the NVIDIA Control Panel? For a test video we can use the same one I shared before.
whisper-ctranslate2.exe --language ja --model "large-v2" --device CPU --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"
Change to CUDA and it fails.
whisper-ctranslate2.exe --language ja --model "large-v2" --device CUDA --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"
The base model, etc. also fail. NVIDIA Control Panel reports that I have the NVIDIA CUDA 12.1.107 driver. The GPU has a compute capability of 7.5.
No, this is not by design. By design it outputs all formats and writes them to your current directory. @rsmith02ct, is it possible to create a separate ticket for this issue? It's different from the other one. Thanks
I think it's exactly the same problem.
I had this same problem. I was unable to pin it down specifically to whisper-ctranslate2, but the problem is exactly the same as yours. It displays the translation. There are no errors. No output files are written. It does write output if I choose a very small file (a minute or two long), but longer files mysteriously produce no outputs. I do not know enough about the code itself to know whether it makes sense that longer files would not produce outputs while shorter files do.
I can confirm that I have the same problem. No output file is created. My command line is (in PowerShell):
Could you try running this on a clip that is only one or two minutes long and see if it works? That seems to work for me, which may help narrow down the cause if it is a reproducible pattern.
Hi @Zacharie-Jacob, I ran an additional test on a 4-minute clip with the same command line:
Here are the results:
I repeated the same test with
I can confirm that converting to 16 kHz mono works, but a lot of information is lost and the output is very different from the CPU result on the original file.
Hmm, here I don't see any text in the cmd terminal window when --cuda is enabled (and there's no text output). When set to CPU it works fine on every file I've given it, in English and Japanese. I'm using an NVIDIA RTX 2080 Super with the current Studio driver and the CUDA SDK also installed (Windows 11).
In my environment, I can trigger the bug almost every time. It prints the full transcription in the command line, but nothing is written to the current directory and there is a Windows error.
# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\whisper_ctranslate2.py
def main():
    ...
    for audio_path in audio:
        result = Transcribe().inference(
            ...
            output_format,
            output_dir,
            audio_path,
        )
        # writer = get_writer(output_format, output_dir)
        # writer(result, audio_path)
# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\transcribe.py
class Transcribe:
    ...
    def inference(
        ...
        output_format,
        output_dir,
        audio_path,
    ):
        ...
        result = dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )
        from .writers import get_writer

        writer = get_writer(output_format, output_dir)
        writer(result, audio_path)
        # return result
The detailed process of my debugging
Environment
Trigger the bug
Set the breakpoint
# whisper_ctranslate2\whisper_ctranslate2.py
for audio_path in audio:
    result = Transcribe().inference(...)
    print(result)  # some operation. Setting a breakpoint here and hovering the mouse over result triggers a `python has stopped working` error
Analysis (unconfirmed)
class Transcribe:
    ...
    def inference(...):
        list_segments = []
        last_pos = 0
        accumated_inc = 0
        all_text = ""
        ...
        return dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )
I suspect list_segments:
list_segments = [
    { },
    ...
]
Some failed attempts
ucrtbase.dll
In Windows Event Viewer, we can see that the crash seems to be related to Writers
Thanks for investing time in this, @runw99. Regarding memory, Python uses reference counting, so it should free the variable when it goes out of scope. Here is an article that explains how memory works in Python: https://rushter.com/blog/python-garbage-collector/ You can actually check the reference count by doing: import sys
I have no idea why this happens, but I do not believe it is due to the variable going out of scope (it's recycled).
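To make the reference-counting point concrete, here is a minimal CPython sketch. The `Model` class is a hypothetical stand-in, not the real ctranslate2 model object. It shows that `sys.getrefcount` rises by one when a second name is bound, and that once the last reference is dropped the object is freed immediately, which a `weakref` callback makes visible.

```python
import sys
import weakref


class Model:
    """Hypothetical stand-in for a loaded model; not the real ctranslate2 object."""

    def __init__(self, name: str) -> None:
        self.name = name


def refcount_demo():
    events = []
    model = Model("tiny")
    # getrefcount counts its own argument too, so compare counts
    # rather than relying on absolute values.
    before = sys.getrefcount(model)
    alias = model  # a second name bound to the same object
    after = sys.getrefcount(model)
    # The callback runs when the object is actually deallocated.
    ref = weakref.ref(model, lambda _: events.append("freed"))
    del alias
    del model  # last strong reference gone: CPython frees the object here
    return after - before, ref() is None, events


print(refcount_demo())
```

Because CPython frees objects as soon as their count reaches zero, a crash "when the model is unloaded" happens at a predictable point: wherever the last reference to the model disappears.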
Thanks for your reply. The article you mentioned helped me review garbage collection in Python and learn something new. I have never encountered such a bug before, and I am curious about its causes and solutions. Looking forward to the follow-up. Thank you again for the patient answer; this project really saves me a lot of effort when running a big model.
I ran 355 files, ranging in length from 10 to 120 minutes.
@rsmith02ct reported that my standalone compile doesn't have this bug [it doesn't use the CLI from this repo].
Faster-whisper converts audio to the same format using the PyAV library, while OpenAI's Whisper uses ffmpeg.
I second @rsmith02ct: I too have noticed that when I convert audio with Audacity, the results are better than with ffmpeg.
Same problem. 1.0 could produce outputs, but it frequently misses large chunks of dialogue.
I have the same problem. I can see all the text in PowerShell as it transcribes and translates, then when it's done, nothing. No SRT files are generated.
whisper-ctranslate2 "file name here.mp4" --device cuda --device_index 0 --vad_filter true --vad_min_speech_duration_ms 50 --vad_min_silence_duration_ms 2000 --vad_max_speech_duration_s 10 --condition_on_previous_text False --language Japanese --task translate --output_format srt --model large-v2
@Qel0droma Are you using a GPU?
Yes
No luck getting any kind of output, using a 16 kHz WAV that I use for testing Const-me Whisper and whisper.cpp; the expected result is a 10-minute translation.
Or try with default params:
Then I followed the instructions and deleted the "old cache files":
Try enabling debug logging:
Same with
Now, after reading some Python docs, I see that "true" is often written as "True":
OK, trying some other stuff:
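On the "true" vs "True" point: openai-whisper's CLI parses boolean options with a small `str2bool` helper that accepts only the exact strings "True" and "False". The sketch below reproduces that idea with `argparse`. I am assuming whisper-ctranslate2 follows the same convention, so check `whisper-ctranslate2 --help` for your version. The `--vad_filter` option here is only an example name, and `example-cli` is a hypothetical program.

```python
import argparse


def str2bool(value: str) -> bool:
    # Case-sensitive on purpose: only "True" and "False" are accepted.
    mapping = {"True": True, "False": False}
    if value not in mapping:
        raise ValueError(f"Expected one of {set(mapping)}, got {value!r}")
    return mapping[value]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example-cli")
    # Hypothetical option mirroring a flag used in this thread.
    parser.add_argument("--vad_filter", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--vad_filter", "True"]).vad_filter)
    # parser.parse_args(["--vad_filter", "true"]) exits with a usage error
```

`argparse` turns the `ValueError` raised by the type function into a usage error and exits with status 2, so a lowercase "true" fails at argument parsing rather than silently becoming a boolean.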
Hi, I think it's the same issue as SYSTRAN/faster-whisper#71, which I can now reproduce on Windows. When the output files are missing, you can verify that the process crashed with a non-zero exit code:
The process crashes when the model is unloaded, but only when the transcription triggered the temperature fallback. If you disable the temperature fallback, it should work without issue. Try adding this option on the command line:
The crash seems to happen only on Windows. @jordimas In the meantime, you could slightly change the code to ensure the
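The exit-code check mentioned in this comment can also be scripted. Below is a minimal Python sketch: it launches a command, reports its exit code, and checks whether an expected output file exists. The command and the `audio.srt` path are placeholders. I'm assuming the output file is named after the input file's stem, as openai-whisper does, so adjust it to your setup.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def run_and_check(cmd: List[str], expected_output: Path) -> Tuple[int, bool]:
    """Run a command; return (exit code, whether the expected output file exists)."""
    completed = subprocess.run(cmd)
    return completed.returncode, expected_output.exists()


if __name__ == "__main__":
    # Placeholder command that simply exits with code 3; replace it with e.g.
    # ["whisper-ctranslate2", "audio.flac", "--output_format", "srt"]
    code, has_output = run_and_check(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        Path("audio.srt"),
    )
    if code != 0:
        print(f"Process exited with code {code}; it probably crashed.")
    print(f"Output file present: {has_output}")
```

A zero exit code combined with a missing file would point to a different problem than the crash described here.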
Win 11:
Going to try on another OS tomorrow.
Thank you, this fixes my problem. Yes, I am on Windows. Unfortunately, that setting was particularly useful, as it prevents the translation from falling into ruts. I will have to make do with a combination of other settings for now.
Even using
It looks like it is linked to the general use of temperature, perhaps? I was under the impression that you could have no temperature increment while still using temperature and best_of, but it looks like I get intermittent missing outputs if I use any temperature settings at all, other than just setting the fallback to None.
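Some context on what "the fallback" means. In openai-whisper's CLI, the list of decoding temperatures is built from `--temperature` and `--temperature_increment_on_fallback`. When the increment is None, only the single starting temperature is used. Otherwise, the decoder retries a segment at increasing temperatures, up to 1.0, when it fails the quality thresholds. The sketch below reproduces that schedule in plain Python (openai-whisper itself uses `numpy.arange`). I'm assuming whisper-ctranslate2 builds its schedule the same way.

```python
from typing import List, Optional


def temperature_schedule(temperature: float, increment: Optional[float]) -> List[float]:
    """Temperatures tried in order; a sketch of openai-whisper's fallback schedule."""
    if increment is None:
        # No fallback: each segment is decoded once at the given temperature.
        return [temperature]
    if increment <= 0:
        raise ValueError("increment must be positive")
    steps = []
    i = 0
    while True:
        t = temperature + i * increment
        # Small epsilon so 1.0 itself is included despite float rounding.
        if t > 1.0 + 1e-6:
            break
        steps.append(round(t, 6))
        i += 1
    return steps


print(temperature_schedule(0.0, 0.2))
print(temperature_schedule(0.0, None))
```

With an increment of 0.2, a hard segment can be decoded up to six times. Setting the increment to None removes the retries entirely, which is consistent with the observation that the crash only appears when a fallback was triggered.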
Thanks a lot for looking into this issue. I was trying to gather more evidence before reporting it to the CTranslate2 issue tracker, but it's great that you are looking at this. Based on the feedback in this thread, and the fact that I do not even have a Windows box with CUDA to test on, I do not know whether it's worth doing a fix in whisper-ctranslate2 or just waiting for the issue to be fixed in CTranslate2. I
Just to see, I made a local change to ensure the model was unloaded after the outputs were written. This sort of works: if it was going to crash, the files are written before it crashes. But if you pass multiple files to be processed, it still crashes when the model is unloaded, so:
Assuming the crash currently occurs with
diff --git a/src/whisper_ctranslate2/transcribe.py b/src/whisper_ctranslate2/transcribe.py
index ca53fac..c422037 100644
--- a/src/whisper_ctranslate2/transcribe.py
+++ b/src/whisper_ctranslate2/transcribe.py
@@ -187,7 +187,7 @@ class Transcribe:
last_pos = segment.end
pbar.update(increment)
- return dict(
+ return model, dict(
text=all_text,
segments=list_segments,
language=language_name,
diff --git a/src/whisper_ctranslate2/whisper_ctranslate2.py b/src/whisper_ctranslate2/whisper_ctranslate2.py
index 1ff8335..58862a8 100644
--- a/src/whisper_ctranslate2/whisper_ctranslate2.py
+++ b/src/whisper_ctranslate2/whisper_ctranslate2.py
@@ -514,7 +514,7 @@ def main():
return
for audio_path in audio:
- result = Transcribe().inference(
+ model, result = Transcribe().inference(
audio_path,
model_dir,
cache_directory,
@@ -531,6 +531,7 @@ def main():
)
writer = get_writer(output_format, output_dir)
writer(result, audio_path, writer_args)
+ model = None
if verbose:
print(f"Transcription results written to '{output_dir}' directory")
So it's not that helpful to try to work around it from whisper-ctranslate2. Hopefully it can be resolved upstream.
You could load the model once and then use the same model instance to transcribe each file. This should work around the issue and also be more efficient than reloading the model each time.
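Here is a minimal sketch of that suggestion. The model loader, the per-file transcription and the writer are passed in as callables so the control flow is easy to see. In the real CLI these would be, for example, constructing `faster_whisper.WhisperModel(...)` and calling the project's output writers. The names `transcribe_all`, `load_model`, `transcribe_file` and `write_outputs` are hypothetical. The point is that one model instance is reused for every file, and the reference to it is dropped only after every output has been written.

```python
from typing import Any, Callable, Dict, Iterable, List


def transcribe_all(
    audio_paths: Iterable[str],
    load_model: Callable[[], Any],
    transcribe_file: Callable[[Any, str], Dict[str, Any]],
    write_outputs: Callable[[Dict[str, Any], str], None],
) -> List[str]:
    """Load the model once, transcribe and write every file, then release the model."""
    model = load_model()
    written = []
    try:
        for path in audio_paths:
            result = transcribe_file(model, path)
            # Write before moving on, so a later crash cannot lose this file's outputs.
            write_outputs(result, path)
            written.append(path)
    finally:
        # Drop our reference only after all files are processed, so the
        # model is unloaded at most once, at the very end.
        del model
    return written
```

With faster-whisper, `model.transcribe(path)` returns a `(segments, info)` pair in which `segments` is a lazy generator. The `transcribe_file` callable would therefore need to consume it (for example, into a list) before the result is written.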
Is there a good workaround for this? Not having access to temperature at all results in substantially worse model results.
Hello @guillaumekln. Do you have a timeline for releasing OpenNMT/CTranslate2#1201? If it's going to take more than a week, I can release a version that changes the structure of the code (although my preference is to get this fixed upstream). Thanks, Jordi
Hi, this change does not fix the issue, according to user reports in SYSTRAN/faster-whisper#71. I have a hard time debugging this issue as I don't typically develop on Windows. For now, I suggest that you update the code to keep the model alive until all transcriptions are complete.
I will then merge https://github.com/Softcatala/whisper-ctranslate2/pull/44/files in the next few hours. This should fix the issue. Feedback from anyone who can test it is welcome, since I do not have a Windows box handy either. Thanks
Version 0.2.6 should fix this.
I installed 0.2.7 and, sure enough, this fixed the problem for me. I had been forced to use --device cpu for a while, which is significantly slower than cuda on my 3080. Thank you.
I am currently having the same issue on 0.2.7.
And then it exits. CPU works.
@iGerman00 See whether this works for you: https://github.com/Purfview/whisper-standalone-win
I also have a similar problem, but in my case there is no useful output, and the return code is not 0.
(whisper) PS D:\BaiduNetdiskDownload> pip list
Package Version
------------------- ----------
av 10.0.0
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 3.3.2
colorama 0.4.6
coloredlogs 15.0.1
ctranslate2 3.23.0
faster-whisper 0.10.0
filelock 3.13.1
flatbuffers 23.5.26
fsspec 2023.12.2
huggingface-hub 0.19.4
humanfriendly 10.0
idna 3.6
mpmath 1.3.0
numpy 1.26.2
onnxruntime 1.16.3
packaging 23.2
pip 23.3.1
protobuf 4.25.1
pycparser 2.21
pyreadline3 3.4.1
PyYAML 6.0.1
requests 2.31.0
setuptools 68.2.2
sounddevice 0.4.6
sympy 1.12
tokenizers 0.15.0
tqdm 4.66.1
typing_extensions 4.9.0
urllib3 2.1.0
wheel 0.41.2
whisper-ctranslate2 0.3.4
(whisper) PS D:\BaiduNetdiskDownload> whisper-ctranslate2.exe aaa.mp4 --model small --language zh --verbose True
stream 0, timescale not set
Detected language 'Chinese' with probability 1.000000
(whisper) PS D:\BaiduNetdiskDownload>
Does this problem still exist? I am seeing it, so I think it does...
It works perfectly now.
I spent half an hour running the large-v2 model on a 25-minute video. At the end of the process, there were no outputs.
The command I used: whisper-ctranslate2 [the video file] --model large-v2 --output_format srt --output_dir .\ --word_timestamps True --no_speech_threshold 0.2 --logprob_threshold None
GPU: GTX 1060 (6 GB VRAM model)
Average VRAM used by whisper-ctranslate2 during the process: varies from 2.5 to 4.5 GB
Windows 10
Edit: I tried the tiny model. It doesn't work either. No outputs.