Commit 6b66db4: Regenerate manual

jart committed Jan 3, 2024
1 parent 1f1c53f
Showing 1 changed file with 35 additions and 21 deletions: llama.cpp/main/main.1.asc
@@ -330,35 +330,49 @@ OPTIONS
 --nocompile
         Never compile GPU support at runtime.

-        If ~/.llamafile/ggml-cuda.dll already exists on the file system
-        (or .so for UNIX and .dylib for MacOS), then it'll be linked as-
-        is without question. Otherwise, llamafile will fall back to CPU
-        inference.
+        If the appropriate DSO file already exists under ~/.llamafile/
+        then it'll be linked as-is without question. If a prebuilt DSO is
+        present in the PKZIP content of the executable, then it'll be
+        extracted and linked if possible. Otherwise, llamafile will skip
+        any attempt to compile GPU support and simply fall back to using
+        CPU inference.
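The lookup order in the new wording can be sketched in shell. This is an illustration under assumptions, not llamafile's actual code: the ggml-cuda base name and the per-OS extensions are taken from the previous wording of this page.

```shell
# Sketch of the DSO fallback described above. Assumptions: the base
# name "ggml-cuda" and the per-OS extensions from the older text.
case "$(uname -s)" in
  Darwin)               ext=dylib ;;  # MacOS
  CYGWIN*|MINGW*|MSYS*) ext=dll   ;;  # Windows
  *)                    ext=so    ;;  # UNIX
esac
dso="$HOME/.llamafile/ggml-cuda.$ext"
if [ -e "$dso" ]; then
  echo "link as-is: $dso"   # already present under ~/.llamafile/
else
  echo "try PKZIP extraction, else fall back to CPU inference"
fi
```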

 --gpu GPU
         Specifies which brand of GPU should be used. Valid choices are:

         -- AUTO: Use any GPU if possible, otherwise fall back to CPU
            inference (default)

-        -- AMD: Use AMD GPU. The AMD ROCm SDK must be installed and the
-           HIP_PATH environment variable must be defined. If an AMD GPU
-           could not be used for any reason, then a fatal error will be
-           raised.

         -- APPLE: Use Apple Metal GPU. This is only available on MacOS
            ARM64. If Metal could not be used for any reason, then a
            fatal error will be raised.

-        -- NVIDIA: Use NVIDIA GPU. If an NVIDIA GPU could not be used
+        -- AMD: Use AMD GPUs. The AMD HIP ROCm SDK should be installed
+           in which case we assume the HIP_PATH environment variable has
+           been defined. The set of gfx microarchitectures needed to run
+           on the host machine is determined automatically based on the
+           output of the hipInfo command. On Windows, llamafile release
+           binaries are distributed with a tinyBLAS DLL so it'll work
+           out of the box without requiring the HIP SDK to be installed.
+           However, tinyBLAS is slower than rocBLAS for batch and image
+           processing, so it's recommended that the SDK be installed
+           anyway. If an AMD GPU could not be used for any reason, then
+           a fatal error will be raised.
+
+        -- NVIDIA: Use NVIDIA GPUs. If an NVIDIA GPU could not be used
            for any reason, a fatal error will be raised. On Windows,
            NVIDIA GPU support will use our tinyBLAS library, since it
-           works on stock Windows installs. If both MSVC and CUDA are
-           installed beforehand, and llamafile is run for the first time
-           on the x64 command prompt, then llamafile will use NVIDIA's
-           faster cuBLAS library instead. On Linux and other systems,
-           the CUDA SDK must always be installed, so that native GPU
-           support can be compiled on the fly.
+           works on stock Windows installs. However, tinyBLAS goes
+           slower for batch and image processing. It's possible to use
+           NVIDIA's closed-source cuBLAS library instead. To do that,
+           both MSVC and CUDA need to be installed and the llamafile
+           command should be run once from the x64 MSVC command prompt
+           with the --recompile flag passed. The GGML library will then
+           be compiled and saved to ~/.llamafile/ so the special process
+           only needs to happen a single time.
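The one-time cuBLAS switch depends on both toolchains being reachable. A minimal sketch of that precondition follows; treating "installed" as "on PATH" is an assumption for illustration, and llamafile's real detection logic may differ.

```shell
# Check whether the cuBLAS path described above is even possible:
# both MSVC's cl and CUDA's nvcc must be reachable (assumed meaning
# of "installed"; llamafile's actual detection may differ).
if command -v cl >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  echo "cuBLAS possible: run llamafile once with --recompile"
else
  echo "tinyBLAS fallback: works on stock installs, slower for batch work"
fi
```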

         -- DISABLED: Never use GPU and instead use CPU inference. This
            setting is implied by -ngl 0.
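Put together, typical invocations of the options above might look like this; the model filename is a hypothetical placeholder, and the commands are printed rather than run so the sketch is self-contained.

```shell
# Hypothetical model file; the flags are those documented above.
model=model.gguf
echo "llamafile --gpu NVIDIA -ngl 35 -m $model -p 'hello'"  # require an NVIDIA GPU
echo "llamafile --gpu DISABLED -m $model -p 'hello'"        # CPU inference only
```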

 -ngl N, --n-gpu-layers N
         Number of layers to store in VRAM.
@@ -596,8 +610,7 @@ EXAMPLES
weights:

      llamafile \
-       -m wizardcoder-python-13b-v1.0.Q8_0.gguf \
-       --temp 0 -r '}\n' -r '```\n' \
+       -m wizardcoder-python-13b-v1.0.Q8_0.gguf --temp 0 -r '}\n' -r '```\n' \
        -e -p '```c\nvoid *memcpy(void *dst, const void *src, size_t size) {\n'

Here's a similar example that instead utilizes Mistral-7B-Instruct
@@ -691,9 +704,10 @@ BUGS
and print a backtrace.

 PROTIP
-     NVIDIA users need to pass the -ngl 35 flag to enable GPU acceleration.
-     It's not enabled by default since it sometimes needs to be tuned for
-     system hardware and model architecture.
+     The -ngl 35 flag needs to be passed in order to use GPUs made by NVIDIA
+     and AMD. It's not enabled by default since it sometimes needs to be
+     tuned based on the system hardware and model architecture, in order to
+     achieve optimal performance, and avoid compromising a shared display.

SEE ALSO
     llamafile-quantize(1), llamafile-perplexity(1), llava-quantize(1),
