Skip to content

Commit

Permalink
transformers==4.37, yi & yuan2 & vicuna (intel#11805)
Browse files Browse the repository at this point in the history
* transformers==4.37

* added yi model

* added yi model

* xxxx

* delete prompt template

* / and delete
  • Loading branch information
JinheTang authored Aug 15, 2024
1 parent f43da2d commit 2fbbb51
Show file tree
Hide file tree
Showing 4 changed files with 27 additions and 25 deletions.
12 changes: 6 additions & 6 deletions python/llm/example/GPU/HuggingFace/LLM/vicuna/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Vicuna
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models.
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) and [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) as reference Vicuna models.

## 0. Requirements
To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
Expand Down Expand Up @@ -109,7 +109,7 @@ python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROM
```

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Vicuna model (e.g. `lmsys/vicuna-13b-v1.3` and `eachadea/vicuna-7b-1.1`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'lmsys/vicuna-13b-v1.3'`.
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Vicuna model (e.g. `lmsys/vicuna-13b-v1.5` and `eachadea/vicuna-7b-v1.5`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'lmsys/vicuna-13b-v1.5'`.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.

Expand All @@ -118,7 +118,7 @@ Arguments info:
> Please select the appropriate size of the Vicuna model based on the capabilities of your machine.
#### Sample Output
#### [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3)
#### [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5)
```log
Inference time: xxxx s
-------------------- Prompt --------------------
Expand All @@ -130,10 +130,10 @@ What is AI?
### Human:
What is AI?
### Assistant:
AI, or Artificial Intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception,
AI stands for Artificial Intelligence. It refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception
```

#### [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1)
#### [eachadea/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)
```log
Inference time: xxxx s
-------------------- Prompt --------------------
Expand All @@ -145,5 +145,5 @@ What is AI?
### Human:
What is AI?
### Assistant:
AI, or artificial intelligence, refers to the ability of a machine or computer program to mimic human intelligence and perform tasks that would normally require human intelligence to
AI stands for "Artificial Intelligence." It refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual per
```
6 changes: 3 additions & 3 deletions python/llm/example/GPU/HuggingFace/LLM/vicuna/generate.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,8 @@

if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Vicuna model')
parser.add_argument('--repo-id-or-model-path', type=str, default="lmsys/vicuna-13b-v1.3",
help='The huggingface repo id for the Vicuna (e.g. `lmsys/vicuna-13b-v1.3` and `eachadea/vicuna-7b-1.1`) to be downloaded'
parser.add_argument('--repo-id-or-model-path', type=str, default="lmsys/vicuna-13b-v1.5",
help='The huggingface repo id for the Vicuna (e.g. `lmsys/vicuna-13b-v1.5` and `lmsys/vicuna-7b-v1.5`) to be downloaded'
', or the path to the huggingface checkpoint folder')
parser.add_argument('--prompt', type=str, default="What is AI?",
help='Prompt to infer')
Expand Down Expand Up @@ -57,7 +57,7 @@
# enabling `use_cache=True` allows the model to utilize the previous
# key/values attentions to speed up decoding;
# to obtain optimal performance with IPEX-LLM INT4 optimizations,
# it is important to set use_cache=True for vicuna-v1.3 models
# it is important to set use_cache=True for vicuna-v1.5 models
output = model.generate(input_ids,
use_cache=True,
max_new_tokens=args.n_predict)
Expand Down
20 changes: 15 additions & 5 deletions python/llm/example/GPU/HuggingFace/LLM/yi/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Yi
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model.
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) and [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-1.5-6B-Chat) as reference Yi models.

## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
Expand Down Expand Up @@ -112,7 +112,7 @@ python ./generate.py

In the example, several arguments can be passed to satisfy your requirements:

- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Yi model (e.g. `01-ai/Yi-6B`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'01-ai/Yi-6B'`.
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Yi model (e.g. `01-ai/Yi-6B` and `01-ai/Yi-6B-Chat`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'01-ai/Yi-6B-Chat'`.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.

Expand All @@ -122,8 +122,18 @@ In the example, several arguments can be passed to satisfy your requirements:
```log
Inference time: xxxx s
-------------------- Prompt --------------------
AI是什么?
What is AI?
-------------------- Output --------------------
AI是什么?
人工智能(Artificial Intelligence),英文缩写为AI。它是研究、开发用于模拟、延伸和扩展人的智能的理论、方法、技术及
What is AI?
Artificial Intelligence (AI) is the simulation of human intelligence in machines. AI is the science and engineering of making intelligent machines, especially intelligent computer programs.
```

#### [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
```log
Inference time: xxxx s
-------------------- Prompt --------------------
What is AI?
-------------------- Output --------------------
What is AI?
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning, and self-
```
14 changes: 3 additions & 11 deletions python/llm/example/GPU/HuggingFace/LLM/yi/generate.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,21 +21,13 @@
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Refer to https://huggingface.co/01-ai/Yi-6B-Chat#31-use-the-chat-model
YI_PROMPT_FORMAT = """
<|im_start|>system
You are a helpful assistant. If you don't understand what the user means, ask the user to provide more information.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Yi model')
parser.add_argument('--repo-id-or-model-path', type=str, default="01-ai/Yi-6B",
parser.add_argument('--repo-id-or-model-path', type=str, default="01-ai/Yi-6B-Chat",
help='The huggingface repo id for the Yi model to be downloaded'
', or the path to the huggingface checkpoint folder')
parser.add_argument('--prompt', type=str, default="AI是什么?",
parser.add_argument('--prompt', type=str, default="What is AI?",
help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict')
Expand All @@ -60,7 +52,7 @@

# Generate predicted tokens
with torch.inference_mode():
prompt = YI_PROMPT_FORMAT.format(prompt=args.prompt)
prompt = args.prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
# ipex_llm model needs a warmup, then inference time can be accurate
output = model.generate(input_ids,
Expand Down

0 comments on commit 2fbbb51

Please sign in to comment.