[Suggestion] Guide to Access the Best Quantized LLM Models on Hugging Face #193
Comments
I'm a fresher and this is my first time contributing to a project on GitHub; I'd like to take this issue.
What topics do you expect the guidelines to cover?
@lingeshsathyanarayanacm You could guide users to first understand their own system specs and what level of quantized and non-quantized models they can run. For this you would need multiple people with different hardware specs to contribute by sharing their experience running models. In my case I have the RTX 4060 8GB VRAM variant, so I can run several model sizes (7B, 8B, 13B, even 22B), but the level of quantization matters: to run a 22B model on my system I need the lowest possible quant. I tried Codestral 22B with the IQ2_XS quant and it could barely fit in my VRAM (as far as I remember it also used a small amount of my 780M iGPU's memory). It was a bit slow, but it ran and, surprisingly, gave correct answers to a lot of coding questions :-). For lower-weight models like 7B and 13B I can easily go up to Q5_K_M or Q5_K_S, which is honestly good quality.

Similarly, you would have to collect other people's experience about what level of quant they can fit and actually use, so it is genuinely usable and not crawling along at 1-2 t/s. You can gather this from Reddit, where many people share their specs and which models they can use comfortably. You could create categories like fast / usable / not usable, and then write a guide for how much VRAM is required to run a given model size at a given quantization level, so people understand their limitations and download models accordingly.
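To make the "what fits in my VRAM" idea above concrete, here is a minimal sizing sketch. The bits-per-weight figures are rough approximations for common llama.cpp quant types, and the 1.2x overhead factor for KV cache and runtime buffers is an assumption, not a measured value; a real guide would replace these with community-reported numbers.

```python
# Rough VRAM estimate for a quantized model:
# params (billions) * bits-per-weight / 8 gives weight size in GB,
# multiplied by an assumed ~1.2x overhead for KV cache and activations.

QUANT_BITS = {  # approximate bits per weight for common llama.cpp quants
    "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q5_K_S": 5.5,
    "Q4_K_M": 4.8, "Q4_K_S": 4.6, "IQ3_XS": 3.3, "IQ2_XS": 2.4,
}

def estimated_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Very rough estimate of VRAM needed to fully offload a quantized model."""
    bits = QUANT_BITS[quant]
    weight_gb = params_billion * bits / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead

# Example: an 8 GB card (e.g. RTX 4060) with a 22B model at two quant levels
for quant in ("IQ2_XS", "Q5_K_M"):
    need = estimated_vram_gb(22, quant)
    verdict = "fits" if need <= 8 else "does not fit"
    print(f"22B @ {quant}: ~{need:.1f} GB -> {verdict} in 8 GB VRAM")
```

With these assumed numbers, 22B at IQ2_XS comes out around 8 GB (barely fitting, matching the Codestral experience above), while 22B at Q5_K_M is far beyond an 8 GB card.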
I think we can provide users with clear guidance on where to find the best optimized and quantized versions of open-weight large language models (LLMs) on Hugging Face. Many developers release quantized versions of popular LLMs to improve performance and efficiency, and it's crucial to help users find these resources easily.
Not many people have GPUs powerful enough to run even an fp16 8B model, so they will be truly disappointed if they don't know about quantized models and what exactly to download, and from where. You could point them to a video and/or suggest some popular Hugging Face profiles for downloading quantized models. I mainly use:
bartowski
quantized models - https://huggingface.co/bartowski
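For a guide, it might also help to show how to find and fetch these quants programmatically. Below is a minimal sketch using the `huggingface_hub` library; the profile name follows the bartowski example above, but the repo id and file name in the download call are hypothetical placeholders, not real model files.

```python
# Sketch: list GGUF repos from a Hugging Face profile and download one quant file.
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# List a few GGUF repositories published under the profile
for model in api.list_models(author="bartowski", search="GGUF", limit=5):
    print(model.id)

# Download a single quant file from one repo. The repo_id and filename here are
# placeholders -- browse the repo's "Files" tab on Hugging Face to pick the
# quant level that fits your VRAM (see the sizing sketch earlier in the thread).
path = hf_hub_download(
    repo_id="bartowski/SOME-MODEL-GGUF",   # hypothetical repo id
    filename="SOME-MODEL-Q5_K_M.gguf",     # hypothetical file name
)
print("downloaded to:", path)
```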