# einygpt

An implementation of a GPT-esque LLM, built primarily with einops and trained on the TinyStories dataset. It incorporates a KV cache and grouped-query attention (GQA) to support efficient inference.
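
As a rough illustration of how GQA looks when written with einops, here is a minimal sketch; the function name, tensor shapes, and head layout are assumptions for this example, not the repo's actual code:

```python
# Illustrative GQA sketch with einops (not the repo's actual implementation).
# Query heads are split into groups that share a single key/value head.
import torch
from einops import einsum, rearrange

def grouped_query_attention(q, k, v):
    # q: (batch, seq, n_q_heads, d_head); k, v: (batch, seq, n_kv_heads, d_head)
    group = q.shape[2] // k.shape[2]  # query heads per key/value head
    q = rearrange(q, "b s (h g) d -> b s h g d", g=group)
    scores = einsum(q, k, "b sq h g d, b sk h d -> b h g sq sk") / q.shape[-1] ** 0.5
    causal = torch.triu(
        torch.ones(scores.shape[-2], scores.shape[-1], dtype=torch.bool, device=scores.device),
        diagonal=1,
    )
    weights = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
    out = einsum(weights, v, "b h g sq sk, b sk h d -> b sq h g d")
    return rearrange(out, "b s h g d -> b s (h g) d")
```

Because several query heads share each key/value head, the cached keys and values shrink by the grouping factor relative to standard multi-head attention.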

Training a 6.9-million-parameter model on an RTX 4090 with the GPT2Tokenizer achieves results in line with the findings of the TinyStories paper, reaching a perplexity of 1.0001 on the validation set. Additionally, a 4.3-million-parameter model trained with its own byte-pair encoding (BPE) tokenizer and GQA with 4 groups achieves comparable perplexity. Both models produce stories with a logical flow and a good grasp of grammar. You can compare their outputs side by side in this notebook.
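
For a rough idea of what training a custom BPE tokenizer involves (the repo may well roll its own rather than use this library), the Hugging Face `tokenizers` package can fit one directly on the dataset text; the vocabulary size and file paths below are placeholders:

```python
# Hypothetical example using the Hugging Face `tokenizers` library;
# vocab_size and file names are placeholders, not the repo's settings.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()  # byte-level merges, GPT-2 style
trainer = trainers.BpeTrainer(vocab_size=8192, special_tokens=["<|endoftext|>"])
tokenizer.train(files=["tinystories_train.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```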

You can find the models on Hugging Face here.
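
For readers unfamiliar with the KV cache mentioned above, the idea is to store each layer's keys and values so a decoding step only computes attention inputs for the newest token. A minimal sketch, with illustrative names rather than the repo's API:

```python
# Minimal per-layer KV cache sketch (illustrative, not the repo's API).
import torch

class KVCache:
    """Append this step's keys/values and return the full cached prefix."""

    def __init__(self):
        self.k, self.v = None, None

    def update(self, k_new, v_new):
        # k_new, v_new: (batch, new_tokens, n_kv_heads, d_head)
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=1)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=1)
        return self.k, self.v
```

During generation, each forward pass then feeds only the newest token, and attention reads the cached keys/values for all earlier positions, so per-step cost grows linearly with sequence length rather than quadratically.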
