
bug: some models fail to load when multiple GPUs are selected #1458

Open · 1 of 3 tasks · Tracked by #3908
thonore75 opened this issue Sep 15, 2024 · 11 comments
Labels: category: model running · needs info · os: Windows · type: bug


thonore75 commented Sep 15, 2024

Will be fixed by


Original Bug report:

Jan version

0.5.3

Describe the Bug

I imported many models, and some of them fail to load when both of my graphics cards (RTX 3060, 12 GB) are selected.
If I unselect one of the GPUs, the model loads.

It would be great if the models list could indicate whether a model supports multi-GPU.

Steps to Reproduce

  1. Go to Settings -> Advanced Settings
  2. In "Choose device(s)", select 2 GPUs
  3. Go to "My Models"
  4. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> NOT loaded!
  5. Go to Advanced Settings
  6. Unselect one GPU in "Choose device(s)"
  7. Go to "My Models"
  8. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> loaded!
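
To isolate whether this is a Jan issue or an upstream loader issue, the same two-GPU load can be reproduced outside Jan. A minimal sketch using llama-cpp-python (the model path, split ratios, and context size below are assumptions, not what Jan actually passes to the engine):

```python
from llama_cpp import Llama

# Try the same load outside Jan, splitting layers across both GPUs.
# Path and split ratios are placeholders; adjust to the local setup.
llm = Llama(
    model_path="models/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf",
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # even split across the two RTX 3060s
    n_ctx=4096,               # modest context; the 128k default may OOM
)
print(llm("Hello", max_tokens=8))
```

If this also fails with a cudaMalloc out-of-memory error, the problem likely sits in how the backend sizes its per-device buffers rather than in Jan itself.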

Screenshots / Logs

No response

What is your OS?

  • [ ] MacOS
  • [x] Windows
  • [ ] Linux
thonore75 added the "type: bug" label on Sep 15, 2024
imtuyethan self-assigned this on Sep 18, 2024

imtuyethan commented Sep 19, 2024

Tested on

114 (windows-dev-tensorRT-LLM)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)

Results

  • I was able to run Mistral 8x7B Instruct Q4 (~24GB) with 2 GPUs turned on.
Screen.Recording.2024-09-19.at.4.20.40.PM.mov
  • I was able to run Aya 23 35B Q4 (~20GB) when using 2 GPUs as well:
Screen.Recording.2024-09-19.at.5.07.32.PM.mov
Screen.Recording.2024-09-19.at.4.23.42.PM.mov

Here are my app logs:

Screenshot 2024-09-19 at 4 29 43 PM

imtuyethan assigned louis-jan and unassigned imtuyethan on Sep 19, 2024
imtuyethan added the "category: model running" and "needs info" labels on Sep 19, 2024
imtuyethan commented

Quick check @thonore75: which models cannot run on your end?


thonore75 commented Sep 19, 2024

Here are the models I can launch with 1 GPU but not with 2:

  • CodeLlama-13b-Instruct-hf.Q8_0
  • CodeLlama-70b-Instruct-hf.i1-IQ4_XS
  • gpt4all-13b-snoozy-q4_0
  • gpt4all-falcon-newbpe-q4_0
  • Meta-Llama-3.1-8B-Claude-F16
  • Meta-Llama-3.1-8B-Instruct-128k-Q4_0
  • Meta-Llama-3.1-8B-Instruct.Q4_0
  • mistral-7b-openorca2.Q4_0.gguf
  • Nous-Hermes-2-Mistral-7B-DPO.Q4_0
  • orca-2-7b.Q4_0
  • Phi-3-mini-4k-instruct.Q4_0

app.log


thonore75 commented Sep 19, 2024

After my tests, I tried to play a video you posted here (in Google Chrome), but it would not play.
Jan was running with no model loaded; my last test was a model that had failed to load.
After I stopped Jan, I was able to play your videos.

thonore75 commented

Jan Compatibility.xlsx
app - 1_GPU_1.log
app - 2_GPUs.log
app - CPU.log
app - 1_GPU_0.log

I did some extra tests!
For each tested configuration, the log was cleared beforehand, so each run has its own separate log (see the GPU-pinning sketch after the list below).
4 tested configurations:

  • CPU
  • GPU 0 selected
  • GPU 1 selected
  • GPU 0 & 1 selected

Some models sometimes failed to load right after a loading failure with a previously tested model, but after successfully loading a known-good model, the previously failing model loads again.
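
That pattern suggests VRAM from a failed load is not always released before the next attempt. A quick way to check between loads (a sketch; assumes nvidia-smi is on the PATH):

```python
import subprocess

# Print per-GPU memory usage between model loads (requires nvidia-smi).
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,memory.used,memory.free",
     "--format=csv,noheader,nounits"],
    text=True,
)
for line in out.strip().splitlines():
    idx, used, free = (int(v.strip()) for v in line.split(","))
    print(f"GPU {idx}: {used} MiB used, {free} MiB free")
```

If memory.used stays high after a failed load while no model is running, the next load starts from a deficit, which would match the behavior described above.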

imtuyethan commented

Possibly related to janhq/jan#3558


louis-jan commented Sep 25, 2024

  1. Regarding the failed case of CodeLlama-70b-Instruct-hf.i1-IQ4_XS: the log shows a VRAM-related OOM (it's a big model, so that makes sense):

2024-09-19T10:26:30.142Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17827.31 MiB on device 0: cudaMalloc failed: out of memory
2024-09-19T10:26:30.272Z [CORTEX]::Error: llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model

  2. The same happens for Llama-3.1-8B-Instruct, which is weird:

2024-09-19T10:13:40.646Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5056.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 5301633024
llama_new_context_with_model: failed to allocate compute buffers
2024-09-19T10:13:40.729Z [CORTEX]::Error: llama_init_from_gpt_params: error: failed to create context with model '*****Meta-Llama-3.1-8B-Instruct.Q4_0.gguf'
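
Worth noting: in both traces the failing allocation is on device 0 only, and the two figures in the second trace agree with each other, so the allocator really is requesting a ~5 GiB compute buffer on a single 12 GB card:

```python
# The buffer size reported in bytes matches the MiB figure in the same log.
print(f"{5301633024 / 1024**2:.2f} MiB")  # -> 5056.03 MiB
```

A compute buffer that large for an 8B Q4_0 model points at a very large context setting; that is a guess from the numbers, not something confirmed by the logs.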

louis-jan commented

If you have some time, please investigate, @vansangpfiev.

thonore75 commented

If needed, I can perform some extra tests with more and newer models.

freelerobot transferred this issue from janhq/jan on Oct 13, 2024
freelerobot commented

Related: #1165

freelerobot moved this from Investigating to Planning in Jan & Cortex on Oct 15, 2024
gabrielle-ong commented

Related issues
#1391
#1679
