
long context #360

Open
pseudotensor opened this issue Jun 30, 2023 · 18 comments

Comments

arnocandel commented Jul 5, 2023

^

import transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# NTK-aware scaled RoPE: monkey-patch LLaMA's rotary embedding so the model
# can attend beyond its 2048-token training context without fine-tuning.
old_init = transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__

def ntk_scaled_init(self, dim, max_position_embeddings=2048, base=10000, device=None):
    # The patch is just these three lines:
    max_position_embeddings = 16384
    a = 8  # alpha (scaling factor)
    base = base * a ** (dim / (dim - 2))  # NTK base-change formula

    old_init(self, dim, max_position_embeddings, base, device)

transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__ = ntk_scaled_init
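
For context, a minimal usage sketch (mine, not from the comment): the patch must run before the model is instantiated, since the rotary embedding builds its frequency table at construction time. The model name, prompt, and generation settings below are placeholders.

# Hypothetical usage of the patch above: instantiate the model only after patching.
model_name = "huggyllama/llama-7b"  # placeholder LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

long_prompt = "..."  # e.g. a ~10k-token document followed by a question
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    generation_config=GenerationConfig(max_new_tokens=256, do_sample=False),
)
print(tokenizer.decode(output[0], skip_special_tokens=True))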

arnocandel commented:

#516 (comment): 70B with 16K context

arnocandel commented:

huggingface/text-generation-inference#529: TGI will soon have it too.

arnocandel commented:

c1ed68c

pseudotensor commented Aug 28, 2023

Things I’m Learning While Training SuperHOT | kaiokendev.github.io
https://kaiokendev.github.io/til#extending-context-to-8k

TheBloke/Llama-2-70B-chat-GPTQ · Hugging Face
https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ

NTK-Aware Scaled RoPE allows LLaMA models to have extended (8k+) context size without any fine-tuning and minimal perplexity degradation. : LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

Llama 2 is here - get it on Hugging Face
https://huggingface.co/blog/llama2

jquesnelle/scaled-rope
https://github.com/jquesnelle/scaled-rope

(Experimental) Add support to NTK RoPE scaling by Panchovix · Pull Request #118 · turboderp/exllama
https://github.com/turboderp/exllama/pull/118/files

Reddit image preview
https://preview.redd.it/2qdj7itsb39b1.png?width=662&format=png&auto=webp&s=464052174151b6ae8b6a9ce42b8f1acc9acabd35

How Long Can Open-Source LLMs Truly Promise on Context Length? | LMSYS Org
https://lmsys.org/blog/2023-06-29-longchat/

DachengLi1/LongChat: Official repository for LongChat and LongEval
https://github.com/DachengLi1/LongChat

Extending context size via RoPE scaling · ggerganov/llama.cpp · Discussion #1965
ggerganov/llama.cpp#1965

HuggingFace models have max_position_embeddings set incorrectly · Issue #359 · facebookresearch/llama
meta-llama/llama#359

Summary post for higher context sizes for this week. For context up to 4096, NTK RoPE scaling is pretty viable. For context higher than that, keep using SuperHOT LoRA/Merges. : LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/14ojd7s/summary_post_for_higher_context_sizes_for_this/
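
Side note, not from the link above: transformers 4.31+ exposes linear and dynamic-NTK RoPE scaling directly on LLaMA via the rope_scaling config, which replaces the monkeypatch earlier in this thread. A hedged sketch (checkpoint name and scaling factor are placeholders):

import torch
from transformers import AutoModelForCausalLM

# Built-in dynamic NTK RoPE scaling (transformers >= 4.31); no monkeypatch needed.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                       # placeholder checkpoint
    rope_scaling={"type": "dynamic", "factor": 4.0},  # ~4x the trained context window
    torch_dtype=torch.float16,
    device_map="auto",
)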

Output garbled in llama2 model · Issue #510 · vllm-project/vllm
vllm-project/vllm#510

Stay on topic with Classifier-Free Guidance : LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/14p6p0g/stay_on_topic_with_classifierfree_guidance/

Add Classifier-Free Guidance sampling · Issue #24536 · huggingface/transformers
huggingface/transformers#24536
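
Aside (my addition): the feature requested in huggingface/transformers#24536 later landed as a classifier-free-guidance logits processor, so on recent transformers versions something along these lines should work; the exact argument names are from memory and may differ by version.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Classifier-free guidance at generation time: guidance_scale > 1 sharpens
# the model's adherence to the prompt (assumes a recent transformers release).
model_name = "gpt2"  # placeholder; any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Summarize what RoPE scaling does, staying strictly on topic:", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60, guidance_scale=1.5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))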

tau/scrolls · Datasets at Hugging Face
https://huggingface.co/datasets/tau/scrolls/viewer/gov_report/train?row=0

Quantized LLama2 70B GPTQ 4-bit · Issue #516 · h2oai/h2ogpt
#516

Request: NTK rope support · Issue #479 · vllm-project/vllm
vllm-project/vllm#479

Add support for LLaMA-2 by zhuohan123 · Pull Request #505 · vllm-project/vllm
vllm-project/vllm#505

lmsys/longchat-13b-16k · Hugging Face
https://huggingface.co/lmsys/longchat-13b-16k

[2302.05507] Long-Context Language Decision Transformers and Exponential Tilt for Interactive Text Environments
https://arxiv.org/abs/2302.05507

LongChat/longeval at longeval · DachengLi1/LongChat
https://github.com/DachengLi1/LongChat/tree/longeval/longeval

RoPE scaling support? · Issue #464 · vllm-project/vllm
vllm-project/vllm#464

[2212.10947] Parallel Context Windows for Large Language Models
https://arxiv.org/abs/2212.10947

[2307.03172] Lost in the Middle: How Language Models Use Long Contexts
https://arxiv.org/abs/2307.03172

pseudotensor/LongChat: Official repository for LongChat and LongEval
https://github.com/pseudotensor/LongChat

openchat/openchat · Hugging Face
https://huggingface.co/openchat/openchat

How Long Can Open-Source LLMs Truly Promise on Context Length? | LMSYS Org
https://lmsys.org/blog/2023-06-29-longchat/#evaluation-toolkits-longeval

[2306.05685] Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
https://arxiv.org/abs/2306.05685

Training:
https://github.com/DachengLi1/LongChat#longchat-1
https://huggingface.co/togethercomputer/LLaMA-2-7B-32K

Data:
https://huggingface.co/datasets/togethercomputer/Long-Data-Collections
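
To make the LongEval links above concrete, here is a rough retrieval sanity check in the spirit of its line-retrieval task (my own sketch, not code from those repos; the model name, filler size, and prompt format are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal long-context retrieval check: hide a known key-value line deep in a
# long filler prompt and see whether the model can read it back.
model_name = "lmsys/longchat-13b-16k"  # placeholder long-context checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

filler = "\n".join(f"line {i}: REGISTER_{i} holds value {i * 7}" for i in range(1, 1500))
question = "\nWhat value does REGISTER_1234 hold? Answer with the number only:\n"
prompt = filler + question

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"prompt length: {inputs.input_ids.shape[1]} tokens")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
answer = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("model answered:", answer.strip(), "| expected:", 1234 * 7)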
