# einygpt

An implementation of a GPT-esque LLM, built primarily with einops and trained on the TinyStories dataset. It incorporates a KV cache and grouped-query attention (GQA) to support efficient inference.
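
As a rough illustration of how GQA looks when written with einops, here is a minimal sketch; the function name, tensor shapes, and head layout are assumptions for this example, not the repo's actual code:

```python
# Illustrative GQA sketch with einops (not the repo's actual implementation).
# Query heads are split into groups that share a single key/value head.
import torch
from einops import einsum, rearrange

def grouped_query_attention(q, k, v):
    # q: (batch, seq, n_q_heads, d_head); k, v: (batch, seq, n_kv_heads, d_head)
    group = q.shape[2] // k.shape[2]  # query heads per key/value head
    q = rearrange(q, "b s (h g) d -> b s h g d", g=group)
    scores = einsum(q, k, "b sq h g d, b sk h d -> b h g sq sk") / q.shape[-1] ** 0.5
    causal = torch.triu(
        torch.ones(scores.shape[-2], scores.shape[-1], dtype=torch.bool, device=scores.device),
        diagonal=1,
    )
    weights = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
    out = einsum(weights, v, "b h g sq sk, b sk h d -> b sq h g d")
    return rearrange(out, "b s h g d -> b s (h g) d")
```

Because several query heads share each key/value head, the cached keys and values shrink by the grouping factor relative to standard multi-head attention.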

Training a 6.9-million-parameter model on an RTX 4090 with the GPT2Tokenizer achieves results in line with the findings of the TinyStories paper, reaching a perplexity of 1.0001 on the validation set. Additionally, a 4.3-million-parameter model trained with its own byte-pair encoding (BPE) tokenizer and GQA with 4 groups achieves comparable perplexity. Both models produce stories with a logical flow and a good grasp of grammar. You can compare their outputs side by side in this notebook.
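
For a rough idea of what training a custom BPE tokenizer involves (the repo may well roll its own rather than use this library), the Hugging Face `tokenizers` package can fit one directly on the dataset text; the vocabulary size and file paths below are placeholders:

```python
# Hypothetical example using the Hugging Face `tokenizers` library;
# vocab_size and file names are placeholders, not the repo's settings.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()  # byte-level merges, GPT-2 style
trainer = trainers.BpeTrainer(vocab_size=8192, special_tokens=["<|endoftext|>"])
tokenizer.train(files=["tinystories_train.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```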

You can find the models on Hugging Face here.
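
For readers unfamiliar with the KV cache mentioned above, the idea is to store each layer's keys and values so a decoding step only computes attention inputs for the newest token. A minimal sketch, with illustrative names rather than the repo's API:

```python
# Minimal per-layer KV cache sketch (illustrative, not the repo's API).
import torch

class KVCache:
    """Append this step's keys/values and return the full cached prefix."""

    def __init__(self):
        self.k, self.v = None, None

    def update(self, k_new, v_new):
        # k_new, v_new: (batch, new_tokens, n_kv_heads, d_head)
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=1)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=1)
        return self.k, self.v
```

During generation, each forward pass then feeds only the newest token, and attention reads the cached keys/values for all earlier positions, so per-step cost grows linearly with sequence length rather than quadratically.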
