The goal of this project is to train a Listen, Attend, and Spell (LAS) model on LibriSpeech 960h with SpecAugment, following the pipeline in the SpecAugment paper, and then run it in real time on the single GPU available here.
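For reference, here is a minimal PyTorch sketch of the SpecAugment masking step (frequency masking plus time masking) on a log-mel spectrogram. The function name and default parameters are placeholders, loosely modeled on the paper's LibriSpeech policies; time warping is omitted since the paper reports it contributes the least of the three deformations.

```python
import torch

def spec_augment(mel: torch.Tensor, freq_mask_param: int = 27,
                 time_mask_param: int = 100, n_freq_masks: int = 2,
                 n_time_masks: int = 2) -> torch.Tensor:
    """Mask random frequency bands and time spans of a (time, n_mels)
    log-mel spectrogram. Defaults are placeholders loosely based on the
    paper's LibriSpeech policies; time warping is omitted."""
    mel = mel.clone()
    n_frames, n_mels = mel.shape
    for _ in range(n_freq_masks):
        f = torch.randint(0, freq_mask_param + 1, (1,)).item()      # mask width
        f0 = torch.randint(0, max(1, n_mels - f + 1), (1,)).item()  # mask start
        mel[:, f0:f0 + f] = 0.0
    for _ in range(n_time_masks):
        t = torch.randint(0, time_mask_param + 1, (1,)).item()      # mask length
        t0 = torch.randint(0, max(1, n_frames - t + 1), (1,)).item()
        mel[t0:t0 + t, :] = 0.0
    return mel
```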
Encoder:
- two layers of 3x3 convolution with stride 2
- three or four layers of bidirectional LSTM, each followed by a projection layer and a batch normalization layer (the projection layer reduces the channel dimension, projecting the concatenated forward/backward outputs down to a smaller size as in LSTMs with recurrent projection, Sak et al. 2014; it is not a temporal subsampling; see the sketch after this list)
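A minimal PyTorch sketch of this encoder, with caveats: the layer sizes (n_mels, conv_channels, lstm_hidden, proj_dim) are placeholders rather than the papers' exact values, and the projection is implemented here as a standalone linear layer after each BiLSTM rather than an in-cell LSTMP projection.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Conv frontend + stacked BiLSTMs with projection and batch norm,
    per the notes above. Layer sizes are placeholder assumptions."""
    def __init__(self, n_mels: int = 80, conv_channels: int = 32,
                 lstm_hidden: int = 1024, proj_dim: int = 512,
                 num_lstm_layers: int = 4):
        super().__init__()
        # Two 3x3 convolutions with stride 2: 4x downsampling in both
        # time and frequency.
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_channels, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(conv_channels, conv_channels, 3, stride=2, padding=1),
            nn.ReLU(),
        )
        conv_out_dim = conv_channels * ((n_mels + 3) // 4)
        # Each BiLSTM is followed by a linear projection (channel
        # reduction) and a batch normalization layer.
        self.lstms = nn.ModuleList()
        self.projs = nn.ModuleList()
        self.norms = nn.ModuleList()
        in_dim = conv_out_dim
        for _ in range(num_lstm_layers):
            self.lstms.append(nn.LSTM(in_dim, lstm_hidden,
                                      bidirectional=True, batch_first=True))
            self.projs.append(nn.Linear(2 * lstm_hidden, proj_dim))
            self.norms.append(nn.BatchNorm1d(proj_dim))
            in_dim = proj_dim

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels)
        x = self.conv(mel.unsqueeze(1))             # (B, C, T/4, F/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        for lstm, proj, norm in zip(self.lstms, self.projs, self.norms):
            x, _ = lstm(x)
            x = proj(x)
            x = norm(x.transpose(1, 2)).transpose(1, 2)
        return x                                    # (B, T/4, proj_dim)
```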
Decoder:
- one or two LSTM layers
- attention as in the second of the Listen, Attend, and Spell papers linked below (https://arxiv.org/pdf/1902.01955.pdf); a decoder sketch follows this list
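A matching decoder sketch: a single LSTM layer with additive (Bahdanau-style) content-based attention, run with teacher forcing. Whether that paper uses exactly this attention variant, and all sizes here, are assumptions.

```python
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    """Single-layer LSTM decoder with additive attention. A sketch only:
    the papers' exact attention variant and sizes may differ."""
    def __init__(self, vocab_size: int, enc_dim: int = 512,
                 dec_dim: int = 512, attn_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dec_dim)
        self.lstm = nn.LSTMCell(dec_dim + enc_dim, dec_dim)
        # Additive attention: score = v^T tanh(W_q q + W_k k)
        self.w_query = nn.Linear(dec_dim, attn_dim)
        self.w_key = nn.Linear(enc_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)
        self.out = nn.Linear(dec_dim + enc_dim, vocab_size)

    def forward(self, enc: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # enc: (B, T_enc, enc_dim); targets: (B, T_dec) token ids
        # (teacher forcing during training).
        b = enc.size(0)
        h = enc.new_zeros(b, self.lstm.hidden_size)
        c = enc.new_zeros(b, self.lstm.hidden_size)
        context = enc.new_zeros(b, enc.size(2))
        keys = self.w_key(enc)                      # precompute (B, T_enc, attn_dim)
        logits = []
        for step in range(targets.size(1)):
            y = self.embed(targets[:, step])
            h, c = self.lstm(torch.cat([y, context], dim=-1), (h, c))
            scores = self.v(torch.tanh(keys + self.w_query(h).unsqueeze(1)))
            attn = torch.softmax(scores, dim=1)     # (B, T_enc, 1)
            context = (attn * enc).sum(dim=1)       # (B, enc_dim)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)           # (B, T_dec, vocab_size)
```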
References:
- SpecAugment: https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html
- Listen, Attend, and Spell: https://arxiv.org/pdf/1508.01211.pdf and https://arxiv.org/pdf/1902.01955.pdf
- Very Deep Convolutional Networks for End-to-End Speech Recognition: https://arxiv.org/pdf/1610.03022.pdf
- Sequence-to-Sequence Models Can Directly Translate Foreign Speech: https://arxiv.org/pdf/1703.08581.pdf