Log for Standalone Faster-Whisper & Faster-Whisper-XXL
### Note: It's not a comprehensive log.
## r239.1 [XXL]:
New features/args:
`--batched`
`--unmerged`
`--batch_size`
`--multilingual`
`--hotwords`
`--rehot`
`--ignore_dupe_prompt`
New `vad_method`'s:
`silero_v5`
`silero_v4_fw`
`silero_v5_fw`
Removed experimental options that were not useful:
no_speech_strict_lvl
nullify_non_speech
prompt_max
reprompt's first option
## r194.5 [XXL]:
Change: Enable auto/default prompt presets for translation.
## r194.4 [XXL]:
Fix: --sentence was splitting digits containing dot.
Fix: auditok
Fix: Error with empty audio and pyannote_v3 vad
Fix: Disable prompt_reset_on_no_end when HST is triggered
Removed experimental condition in HST algo
## r194.2 [XXL]:
Bugfix: Broken outputs when diarize/sentence and batch input [194.1].
Fix: --sentence was splitting words on hyphens.
Fix: --sentence was splitting digits containing comma.
Change: --sentence affects all output formats except json [previously only srt & vtt].
Improvements and bugfixes for json input.
Added "One Click Transcribe" tool: https://github.com/Purfview/whisper-standalone-win/discussions/337
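The `--sentence` fixes above (digits containing a comma, hyphenated words) can be illustrated with a minimal sketch. This is not the tool's actual splitter; `split_candidates` is a hypothetical helper that only assumes a break is allowed after punctuation when whitespace follows it.

```python
import re

def split_candidates(text: str) -> list[str]:
    """Hypothetical splitter, not the tool's real code.

    Breaks after '.', ',', '!' or '?' only when whitespace follows,
    so "1,000", "3.5" and "well-known" stay in one piece.
    """
    parts = re.split(r"(?<=[.,!?])\s+", text.strip())
    return [p for p in parts if p]

pieces = split_candidates("It costs 1,000 dollars, or 3.5 percent. A well-known fact!")
for piece in pieces:
    print(piece)
```

Numbers and hyphenated words survive because the pattern never matches punctuation that is directly followed by another character.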
## r194.1 [XXL]:
New feature: CUDA support for `pyannote_v3` and `pyannote_onnx_v3` VADs
New feature: Selective output formats, for example: `--output_format srt json`
New feature: `--diarize` will auto-activate `--sentence` and all output formats will be affected except json
New `--diarize` options: `pyannote_v3.0`, `pyannote_v3.1`, `reverb_v1`, `reverb_v2`
New diarization args: `--num_speakers`, `--min_speakers`, `--max_speakers`, `--diarize_dump`
Bugfix: Bug in alternative subtitle writer with max_line_width [r189.1].
Bugfix: Bug in --sentence subtitle writer if "word" is space.
Change: .cache dir is bound to exe dir instead of cwd.
pyinstaller updated to 6.11.1
CTranslate2 updated to 4.4.0 [+ Compute Capability 5.0 support]
onnxruntime_gpu updated to 1.18.0 [CUDA 12.x cuDNN 8.x]
Python updated to 3.10.11
## r193.1 [XXL]:
New feature: Speaker Diarization with `--diarize`.
New feature: `large-v3-turbo` model autodownload.
New feature: JSON output has additional key: 'language'.
New feature: `--postfix` works without --language.
Bugfix: Rare bug in alternative subtitle writer if "words" is empty [r189.1].
Change: `--vad_alt_method` shortened to `--vad_method`.
Change: Windows 7 is no longer supported.
## r192.3.4 [XXL]:
New feature: Write intermediate files to temp folder. [ignored on dump]
New feature: Can take JSON files as input to generate subtitles from it according to the settings.
New alternative VAD method : `pyannote_v3` [previous same named option renamed to "pyannote_onnx_v3"]
New arg: `--nullify_non_speech` [for WIP experiments]
## r192.3.3 [XXL]:
CTranslate2 updated to v4.2.0
Transcription on CPU is up to ~26% faster.
## r192.3.2 [XXL]:
Various improvements and bugfixes to `--vad_alt_method`
## r192.3:
Bugfix: 'one_word' was broken in r192.2.
## r192.2:
'max_line_width' and 'max_line_count' can now be used without '--sentence'.
## r192.1.1 [XXL]:
`--ff_mdx_kim2`: Preprocess audio with MDX23 Kim vocal v2 model.
`--vad_alt_method`: Alternative VAD methods - "silero_v3", "silero_v4", "pyannote_v3", "auditok", "webrtc".
## r192.1:
Bugfix: It was unnecessarily going through ffmpeg routines. [introduced in r189.1]
## r189.1:
Supports `distil-large-v3` model.
Switched auto prompt for Serbian to the Latin alphabet.
Auto-disable VAD when --clip_timestamps is in use.
JSON output has additional 'text' key with all text aggregated.
Fixed --task=translate and --postfix bug.
--highlight_words now supports --max_line_width and --max_line_count
Changed patience default.
Adjusted the score algo of `--hallucination_silence_threshold`, and added new `--hallucination_silence_th_temp`
Auto pseudo-vad th offsets for "large-v3", can be disabled with --v3_offsets_off.
`--language_detection_threshold`
`--language_detection_segments`
`--ff_track` - Audio track selection.
`--ff_fc` - Front-Center channel selection.
`--ff_dump` - Additionally prevents deletion of some intermediate audio files.
`--vad_dump` - Writes VAD timings to srt file.
Updated PyInstaller to v6.5.0
Fix Linux exec issue with forward compatibility. v2
## r186.1:
Fix quirks with "The last progress update is passed to the title bar." [r160.11]
Fix Linux exec issue with forward compatibility.
## r182.2:
Bugfix: PR705 was incorporated a bit incorrectly.
Bugfix: Bug with the custom prompt presets and task "translate".
## r182.1:
Includes PR705 & PR706
Excludes PR694 [aka CUDA12]
Skip micro chunks.
New args:
`--hallucination_silence_threshold`
`--clip_timestamps`
Updated PyAV to v11.0.0 [ffmpeg 6.0]
Updated PyInstaller to v6.4.0
Updated CTranslate2 to v3.24.0
## r172.4:
Bugfix: The rnndn filters were not working with some CPUs.
Improved: `--sentence` doesn't treat ellipses (...) as the end of sentence, unless ellipses are at the end of a segment.
Changed default of `--initial_prompt` to `auto`.
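The ellipsis rule for `--sentence` noted in this release can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: `ends_sentence` and its `is_last_in_segment` parameter are hypothetical names.

```python
def ends_sentence(word: str, is_last_in_segment: bool) -> bool:
    """Hypothetical check, not the tool's real code.

    An ellipsis closes a sentence only at the end of a segment;
    '.', '!' and '?' close a sentence anywhere.
    """
    w = word.rstrip()
    if w.endswith("...") or w.endswith("…"):
        return is_last_in_segment
    return w.endswith((".", "!", "?"))

print(ends_sentence("Well...", False))
print(ends_sentence("Well...", True))
```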
## r172.3:
Better error handling for the audio filters.
## r172.2:
Increased "recursion" limit from 1000 to 10000.
## r172.1:
Supports "Distil-Whisper" models:
`distil-large-v2`
`distil-medium.en`
`distil-small.en`
New args:
`--max_new_tokens`
`--chunk_length`
## r167.5:
Additional option for `--prompt_reset_on_no_end`
Various audio filters:
`--ff_dump`
`--ff_mp3`
`--ff_sync`
`--ff_rnndn_sh`
`--ff_rnndn_xiph`
`--ff_fftdn`
`--ff_tempo`
`--ff_gate`
`--ff_speechnorm`
`--ff_loudnorm`
`--ff_silence_suppress`
`--ff_lowhighpass`
## r167.4:
Bugfix: Bug when the number of CPU cores couldn't be detected.
Updated psutil to v5.9.7
## r167.3:
Bugfix: Illogical "Avoid computing higher temperatures on no_speech". [It could cause bad hallucination loops]
https://github.com/openai/whisper/pull/1903
https://github.com/SYSTRAN/faster-whisper/pull/625
## r167.2:
Fix: Commit #165 was missing in `r167.1` release.
## r167.1:
`large-v3` now uses the official model.
Updated PyInstaller to v6.3.0
Updated CTranslate2 to v3.23.0
## r160.12:
Segments triggered by the common hallucinations in hallucinations_list are no longer added to the prompt.
New parameters :
`--max_comma_cent`
`--min_dist_to_end`
## r160.11:
Bugfix: `--prompt_reset_on_temperature` was broken. Regression since r139.
Improved `--initial_prompt` default preset; there is also an alternative experimental `auto` preset.
Improved `--sentence` with the ignore words list, for words like `Mr.`, etc.
The last progress update is passed to the title bar.
New parameters :
`--prompt_max`
`--reprompt` [enabled by default]
`--prompt_reset_on_no_end` [enabled by default]
## r160.10:
Bugfix: The final line disappeared if the final word wasn't punctuated when using `--sentence`.
## r160.9:
Bugfix: The final word could disappear in some combination of the new parameters from r160.8.
New parameter : --standard_asia
## r160.8:
Improved --skip to work with all output folders.
New parameters :
`--sentence`
`--standard`
`--max_comma`
`--max_gap`
`--max_line_width`
`--max_line_count`
## r160.7:
Bugfix: Wildcard input could fail if brackets were in a folder's name.
Improved `--one_word` with the second option.
Improved `--skip` to work with all output formats. If output is "all" - checks only "srt".
New `--check_files` argument.
Added a catch for invalid `audio` input.
Turns off word_timestamps if output format is "text".
Exposed options :
`--prefix`
`--suppress_blank`
`--without_timestamps`
`--max_initial_timestamp`
## r160.6:
Bugfix: Autodownload wasn't working for `large-v3` [ regression in r160.5 ]
## r160.5:
large-v3 uses the fp16 model by default. [Removed other v3 options]
New switch: --one_word
Updated PyInstaller to v6.2.0
## r160.4:
Bugfix: -prompt=None wasn't working.
UTF-8 chars are now supported in SE's "console".
## r160.3:
'large-v3' model.
'Cantonese' language.
## r160.2:
Failed release with a code typo.
Was online for ~15 mins...
## r160.1:
Bugfix: Fixed bug on macOS -> https://github.com/Purfview/whisper-standalone-win/issues/86