Apply suggestions from code review
Co-authored-by: Pedro Cuenca <[email protected]>
dacorvo and pcuenca committed Nov 6, 2023
1 parent 2710add commit c2b62dd
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions inferentia-llama2.md
@@ -37,7 +37,7 @@ Fortunately, 🤗 `optimum-neuron` offers a [very simple API](https://huggingfac
**input_shapes)
```

-This deserves a little explaination:
+This deserves a little explanation:
- using `compiler_args`, we specify on how many cores we want the model to be deployed (each neuron device has two cores), and with which precision (here `float16`),
- using `input_shape`, we set the static input and output dimensions of the model. All model compilers require static shapes, and neuron makes no exception. Note that the
`sequence_length` not only constrains the length of the input context, but also the length of the KV cache, and thus, the output length.
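For context on the hunk above, which only shows the tail of the export call, here is a minimal sketch of what the full call could look like, assuming the `optimum-neuron` `NeuronModelForCausalLM` export API; the model id, core count, and shape values are illustrative assumptions, not values taken from this commit:

```
from optimum.neuron import NeuronModelForCausalLM

# Illustrative values: 24 Neuron cores, fp16 precision,
# batch size 1 and a 2048-token static sequence length.
compiler_args = {"num_cores": 24, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 1, "sequence_length": 2048}

# export=True compiles the checkpoint for Neuron devices with the
# compiler options and static shapes given above.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    **compiler_args,
    **input_shapes)
```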
@@ -55,7 +55,7 @@ Even better, you can push it to the [Hugging Face hub](https://huggingface.co/mo
```
>>> model.push_to_hub(
"a_local_path_for_compiled_neuron_model",
repository_id="my-neuron-repo",
repository_id="Llama-2-7b-hf-neuron-latency",
use_auth_token=True)
```
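A usage note on the pushed artifacts (not part of this commit): once the compiled model is on the hub, it can be pulled back and used without recompiling. The repository path below is a hypothetical placeholder:

```
from optimum.neuron import NeuronModelForCausalLM

# Hypothetical namespace/repo; substitute the repository the compiled model was pushed to.
model = NeuronModelForCausalLM.from_pretrained("my-user/Llama-2-7b-hf-neuron-latency")
```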

@@ -143,8 +143,7 @@ To evaluate the models, we generate tokens up to a total sequence length of 1024

### Encoding time

-The encoding time is the time in seconds required to process the input tokens and generate the first output token,
-
+The encoding time is the time in seconds required to process the input tokens and generate the first output token.
It is a very important metric, as it corresponds to the latency directly perceived by the user when streaming generated tokens.

We test the encoding time for increasing context sizes, 256 input tokens corresponding roughly to a typical Q/A usage,
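To make the metric concrete, here is a rough sketch of how the encoding time (time to the first generated token) could be measured for a single context size; the model repository, tokenizer, and prompt are illustrative assumptions, not the benchmark code behind the post:

```
import time

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# Hypothetical compiled model and tokenizer, for illustration only.
model = NeuronModelForCausalLM.from_pretrained("my-user/Llama-2-7b-hf-neuron-latency")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Repeat a short prompt to approximate the desired context size.
prompt = "Tell me about AWS Inferentia2. " * 16
inputs = tokenizer(prompt, return_tensors="pt")

# Generating a single new token times prompt processing plus the first
# output token, i.e. the encoding time described above.
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=1)
encoding_time = time.perf_counter() - start
print(f"Encoding time: {encoding_time:.2f}s for {inputs['input_ids'].shape[1]} input tokens")
```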
