
[Suggestion] Guide to Access the Best Quantized LLM Models on Hugging Face #193

Open
Greatz08 opened this issue Dec 31, 2024 · 3 comments

@Greatz08

I think we should give users clear guidance on where to find the best optimized and quantized versions of open-weights large language models (LLMs) on Hugging Face. Many developers release quantized versions of popular LLMs to improve performance and efficiency, and it's crucial to help users find these resources easily.

Not many people have GPUs powerful enough to run even an fp16 8B model, so they will be truly disappointed if they don't know that quantized models exist, what they are, and exactly where to download them. The guide could point them to a video and suggest some popular Hugging Face profiles that publish quantized models; for example, I mostly use bartowski's quants - https://huggingface.co/bartowski
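As a minimal sketch of what that part of the guide could show, the snippet below pulls a single GGUF quant file with the official huggingface_hub library. The repo_id and filename are illustrative assumptions, not a specific recommendation; readers would browse the uploader's repo and pick whichever quant level fits their VRAM.

```python
# Sketch: download one GGUF quant file from a Hugging Face repo.
# repo_id and filename below are assumed examples for illustration only --
# substitute the repo and quant level (e.g. Q4_K_M, IQ2_XS) that fits your GPU.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",   # assumed example repo
    filename="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",     # assumed example quant file
)
print(f"Downloaded to: {model_path}")
```

The same file can also be fetched from a terminal with the `huggingface-cli download` command if someone prefers not to write Python.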

@lingeshsathyanarayanacm

I'm a fresher and this is my first time contributing to a project on GitHub; I'd like to take this issue.

@lingeshsathyanarayanacm

What topics do you expect the guidelines to cover?

@Greatz08
Author

Greatz08 commented Jan 8, 2025

@lingeshsathyanarayanacm You could start by guiding users to understand their own system specs and what level of quantized and non-quantized models their hardware can actually run. For that you will need contributions from multiple people with different hardware, sharing their experience running models.

In my case, I have the RTX 4060 8GB VRAM variant, so I can run several model sizes (7B, 8B, 13B, even 22B), but the level of quantization matters a lot. To run a 22B model on my system I have to use the lowest quant available: I tried Codestral 22B with the IQ2_XS quant and it could barely fit in my VRAM (as far as I remember it also used a small amount of my 780M iGPU's memory). It was a little slow, but it ran and, surprisingly, gave correct answers to a lot of coding questions :-). Smaller models like 7B and 13B I can run up to Q5_K_M or Q5_K_S easily, which is honestly good quality.

Similarly, you would need to gather other people's experience: up to what quant level a given model still fits on their hardware and remains usable, rather than generating at 1-2 t/s. You can find a lot of this on Reddit, where many people share their specs and which models they can run comfortably. You could create categories like fast / usable / not usable, and then build a guide showing how much VRAM is needed for each model size and quant level, so people can understand their limitations and download models accordingly.

The main purpose of this guide would be to collect as much experience from different people as possible and present it in a way that is easy for everyone to understand (technical and non-technical), so readers can save the time we spent testing different quant levels ourselves :-) and know exactly where to download those models from. You can add much more, but even this much would help a lot of people in my opinion.
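To make the "what fits in my VRAM" part concrete, here is a rough back-of-the-envelope sketch I'd suggest as a starting point (the bits-per-weight values and the overhead allowance are my own approximations, not figures from any particular runtime): weight memory is roughly parameter count times bits per weight divided by 8, plus some headroom for the KV cache and runtime overhead. Actual usage varies with context length and the inference backend.

```python
# Rough sketch: estimate whether a quantized model's weights fit in VRAM.
# The bits-per-weight numbers and overhead_gb are approximations / assumptions,
# meant only to give readers a ballpark before they download anything.

BITS_PER_WEIGHT = {   # approximate effective bits per weight for common GGUF quants
    "fp16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "IQ2_XS": 2.3,
}

def estimated_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Very rough VRAM estimate: weight size plus a flat allowance for KV cache/overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

# Example: a 22B model at different quant levels vs. an 8 GB card (like an RTX 4060)
for quant in ("Q5_K_M", "Q4_K_M", "IQ2_XS"):
    need = estimated_vram_gb(22, quant)
    verdict = "fits" if need <= 8 else "does not fit"
    print(f"22B @ {quant}: ~{need:.1f} GB needed -> {verdict} in 8 GB")
```

A table built from real user reports (the fast / usable / not usable categories above) would be more trustworthy than this formula, but a calculator like this could sit alongside it so readers can sanity-check models and quants that nobody has reported on yet.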
