Commit 6b66db4: Regenerate manual

jart committed Jan 3, 2024
1 parent 1f1c53f
Showing 1 changed file with 35 additions and 21 deletions: llama.cpp/main/main.1.asc
@@ -330,35 +330,49 @@ OPTIONS
 --nocompile
         Never compile GPU support at runtime.

-        If ~/.llamafile/ggml-cuda.dll already exists on the file system
-        (or .so for UNIX and .dylib for MacOS), then it'll be linked as-
-        is without question. Otherwise, llamafile will fall back to CPU
-        inference.
+        If the appropriate DSO file already exists under ~/.llamafile/
+        then it'll be linked as-is without question. If a prebuilt DSO is
+        present in the PKZIP content of the executable, then it'll be
+        extracted and linked if possible. Otherwise, llamafile will skip
+        any attempt to compile GPU support and simply fall back to using
+        CPU inference.
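The lookup order in the new wording can be sketched in shell. This is an illustration under assumptions, not llamafile's actual code: the ggml-cuda base name and the per-OS extensions are taken from the previous wording of this page.

```shell
# Sketch of the DSO fallback described above. Assumptions: the base
# name "ggml-cuda" and the per-OS extensions from the older text.
case "$(uname -s)" in
  Darwin)               ext=dylib ;;  # MacOS
  CYGWIN*|MINGW*|MSYS*) ext=dll   ;;  # Windows
  *)                    ext=so    ;;  # UNIX
esac
dso="$HOME/.llamafile/ggml-cuda.$ext"
if [ -e "$dso" ]; then
  echo "link as-is: $dso"   # already present under ~/.llamafile/
else
  echo "try PKZIP extraction, else fall back to CPU inference"
fi
```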

 --gpu GPU
         Specifies which brand of GPU should be used. Valid choices are:

         -- AUTO: Use any GPU if possible, otherwise fall back to CPU
            inference (default)

-        -- AMD: Use AMD GPU. The AMD ROCm SDK must be installed and the
-           HIP_PATH environment variable must be defined. If an AMD GPU
-           could not be used for any reason, then a fatal error will be
-           raised.

         -- APPLE: Use Apple Metal GPU. This is only available on MacOS
            ARM64. If Metal could not be used for any reason, then a
            fatal error will be raised.

-        -- NVIDIA: Use NVIDIA GPU. If an NVIDIA GPU could not be used
+        -- AMD: Use AMD GPUs. The AMD HIP ROCm SDK should be installed
+           in which case we assume the HIP_PATH environment variable has
+           been defined. The set of gfx microarchitectures needed to run
+           on the host machine is determined automatically based on the
+           output of the hipInfo command. On Windows, llamafile release
+           binaries are distributed with a tinyBLAS DLL so it'll work
+           out of the box without requiring the HIP SDK to be installed.
+           However, tinyBLAS is slower than rocBLAS for batch and image
+           processing, so it's recommended that the SDK be installed
+           anyway. If an AMD GPU could not be used for any reason, then
+           a fatal error will be raised.
+
+        -- NVIDIA: Use NVIDIA GPUs. If an NVIDIA GPU could not be used
            for any reason, a fatal error will be raised. On Windows,
            NVIDIA GPU support will use our tinyBLAS library, since it
-           works on stock Windows installs. If both MSVC and CUDA are
-           installed beforehand, and llamafile is run for the first time
-           on the x64 command prompt, then llamafile will use NVIDIA's
-           faster cuBLAS library instead. On Linux and other systems,
-           the CUDA SDK must always be installed, so that native GPU
-           support can be compiled on the fly.
+           works on stock Windows installs. However, tinyBLAS goes
+           slower for batch and image processing. It's possible to use
+           NVIDIA's closed-source cuBLAS library instead. To do that,
+           both MSVC and CUDA need to be installed and the llamafile
+           command should be run once from the x64 MSVC command prompt
+           with the --recompile flag passed. The GGML library will then
+           be compiled and saved to ~/.llamafile/ so the special process
+           only needs to happen a single time.
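The one-time cuBLAS switch depends on both toolchains being reachable. A minimal sketch of that precondition follows; treating "installed" as "on PATH" is an assumption for illustration, and llamafile's real detection logic may differ.

```shell
# Check whether the cuBLAS path described above is even possible:
# both MSVC's cl and CUDA's nvcc must be reachable (assumed meaning
# of "installed"; llamafile's actual detection may differ).
if command -v cl >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  echo "cuBLAS possible: run llamafile once with --recompile"
else
  echo "tinyBLAS fallback: works on stock installs, slower for batch work"
fi
```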

         -- DISABLED: Never use GPU and instead use CPU inference. This
            setting is implied by -ngl 0.
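Put together, typical invocations of the options above might look like this; the model filename is a hypothetical placeholder, and the commands are printed rather than run so the sketch is self-contained.

```shell
# Hypothetical model file; the flags are those documented above.
model=model.gguf
echo "llamafile --gpu NVIDIA -ngl 35 -m $model -p 'hello'"  # require an NVIDIA GPU
echo "llamafile --gpu DISABLED -m $model -p 'hello'"        # CPU inference only
```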

 -ngl N, --n-gpu-layers N
         Number of layers to store in VRAM.
@@ -596,8 +610,7 @@ EXAMPLES
weights:

      llamafile \
-       -m wizardcoder-python-13b-v1.0.Q8_0.gguf \
-       --temp 0 -r '}\n' -r '```\n' \
+       -m wizardcoder-python-13b-v1.0.Q8_0.gguf --temp 0 -r '}\n' -r '```\n' \
        -e -p '```c\nvoid *memcpy(void *dst, const void *src, size_t size) {\n'

Here's a similar example that instead utilizes Mistral-7B-Instruct
@@ -691,9 +704,10 @@ BUGS
and print a backtrace.

 PROTIP
-     NVIDIA users need to pass the -ngl 35 flag to enable GPU acceleration.
-     It's not enabled by default since it sometimes needs to be tuned for
-     system hardware and model architecture.
+     The -ngl 35 flag needs to be passed in order to use GPUs made by NVIDIA
+     and AMD. It's not enabled by default since it sometimes needs to be
+     tuned based on the system hardware and model architecture, in order to
+     achieve optimal performance, and avoid compromising a shared display.

SEE ALSO
     llamafile-quantize(1), llamafile-perplexity(1), llava-quantize(1),
