
bug: some models fail to load when multiple GPUs are selected #1458

Open · 1 of 3 tasks · Tracked by #3908
thonore75 opened this issue Sep 15, 2024 · 11 comments
Labels: category: model running · needs info · os: Windows · type: bug


thonore75 commented Sep 15, 2024

Will be fixed by


Original Bug report:

Jan version

0.5.3

Describe the Bug

I imported many models, and some of them fail to load when both of my graphics cards (RTX 3060, 12 GB) are selected.
If I unselect one of the GPUs, the model loads.

It would be great if the models list could indicate whether a model supports multi-GPU.

Steps to Reproduce

  1. Go to Settings -> Advanced Settings
  2. In "Choose device(s)", select 2 GPUs
  3. Go to "My Models"
  4. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> NOT loaded!
  5. Go to Advanced Settings
  6. Unselect one GPU in "Choose device(s)"
  7. Go to "My Models"
  8. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> loaded!
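
To isolate whether this is a Jan issue or an upstream loader issue, the same two-GPU load can be reproduced outside Jan. A minimal sketch using llama-cpp-python (the model path, split ratios, and context size below are assumptions, not what Jan actually passes to the engine):

```python
from llama_cpp import Llama

# Try the same load outside Jan, splitting layers across both GPUs.
# Path and split ratios are placeholders; adjust to the local setup.
llm = Llama(
    model_path="models/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf",
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # even split across the two RTX 3060s
    n_ctx=4096,               # modest context; the 128k default may OOM
)
print(llm("Hello", max_tokens=8))
```

If this also fails with a cudaMalloc out-of-memory error, the problem likely sits in how the backend sizes its per-device buffers rather than in Jan itself.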

Screenshots / Logs

No response

What is your OS?

  • [ ] MacOS
  • [x] Windows
  • [ ] Linux
thonore75 added the "type: bug" label on Sep 15, 2024
imtuyethan self-assigned this on Sep 18, 2024

imtuyethan commented Sep 19, 2024

Tested on

114 (windows-dev-tensorRT-LLM)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)

Results

  • I was able to run Mistral 8x7B Instruct Q4 (~24GB) with 2 GPUs turned on.
Screen.Recording.2024-09-19.at.4.20.40.PM.mov
  • I was able to run Aya 23 35B Q4 (~20GB) when using 2 GPUs as well:
Screen.Recording.2024-09-19.at.5.07.32.PM.mov
Screen.Recording.2024-09-19.at.4.23.42.PM.mov

Here are my app logs:

Screenshot 2024-09-19 at 4 29 43 PM

imtuyethan assigned louis-jan and unassigned imtuyethan on Sep 19, 2024
imtuyethan added the "category: model running" and "needs info" labels on Sep 19, 2024
imtuyethan commented

Quick check @thonore75: which models cannot run on your end?


thonore75 commented Sep 19, 2024

Here are the models I can launch with 1 GPU but not with 2:

  • CodeLlama-13b-Instruct-hf.Q8_0
  • CodeLlama-70b-Instruct-hf.i1-IQ4_XS
  • gpt4all-13b-snoozy-q4_0
  • gpt4all-falcon-newbpe-q4_0
  • Meta-Llama-3.1-8B-Claude-F16
  • Meta-Llama-3.1-8B-Instruct-128k-Q4_0
  • Meta-Llama-3.1-8B-Instruct.Q4_0
  • mistral-7b-openorca2.Q4_0.gguf
  • Nous-Hermes-2-Mistral-7B-DPO.Q4_0
  • orca-2-7b.Q4_0
  • Phi-3-mini-4k-instruct.Q4_0

app.log


thonore75 commented Sep 19, 2024

After my tests, I tried to play a video you posted here (in Google Chrome), but it would not play.
Jan was running with no model loaded; my last test was a model that had failed to load.
After I stopped Jan, I was able to play your videos.

thonore75 commented

Jan Compatibility.xlsx
app - 1_GPU_1.log
app - 2_GPUs.log
app - CPU.log
app - 1_GPU_0.log

I did some extra tests!
For each tested configuration, the log was cleared beforehand, so each run has its own separate log (see the GPU-pinning sketch after the list below).
4 tested configurations:

  • CPU
  • GPU 0 selected
  • GPU 1 selected
  • GPU 0 & 1 selected

Some models sometimes failed to load right after a loading failure with a previously tested model, but after successfully loading a known-good model, the previously failing model loads again.
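
That pattern suggests VRAM from a failed load is not always released before the next attempt. A quick way to check between loads (a sketch; assumes nvidia-smi is on the PATH):

```python
import subprocess

# Print per-GPU memory usage between model loads (requires nvidia-smi).
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,memory.used,memory.free",
     "--format=csv,noheader,nounits"],
    text=True,
)
for line in out.strip().splitlines():
    idx, used, free = (int(v.strip()) for v in line.split(","))
    print(f"GPU {idx}: {used} MiB used, {free} MiB free")
```

If memory.used stays high after a failed load while no model is running, the next load starts from a deficit, which would match the behavior described above.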

imtuyethan commented

Possibly related to janhq/jan#3558


louis-jan commented Sep 25, 2024

  1. Regarding the failed case of CodeLlama-70b-Instruct-hf.i1-IQ4_XS: the log shows a VRAM-related OOM (it's a big model, so that makes sense):

2024-09-19T10:26:30.142Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17827.31 MiB on device 0: cudaMalloc failed: out of memory
2024-09-19T10:26:30.272Z [CORTEX]::Error: llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model

  2. The same happens for Llama-3.1-8B-Instruct, which is weird:

2024-09-19T10:13:40.646Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5056.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 5301633024
llama_new_context_with_model: failed to allocate compute buffers
2024-09-19T10:13:40.729Z [CORTEX]::Error: llama_init_from_gpt_params: error: failed to create context with model '*****Meta-Llama-3.1-8B-Instruct.Q4_0.gguf'
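
Worth noting: in both traces the failing allocation is on device 0 only, and the two figures in the second trace agree with each other, so the allocator really is requesting a ~5 GiB compute buffer on a single 12 GB card:

```python
# The buffer size reported in bytes matches the MiB figure in the same log.
print(f"{5301633024 / 1024**2:.2f} MiB")  # -> 5056.03 MiB
```

A compute buffer that large for an 8B Q4_0 model points at a very large context setting; that is a guess from the numbers, not something confirmed by the logs.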

louis-jan commented

If you have some time, please investigate, @vansangpfiev.

thonore75 commented

If needed, I can perform some extra tests with more and newer models.

freelerobot transferred this issue from janhq/jan on Oct 13, 2024
freelerobot commented

Related: #1165

freelerobot moved this from Investigating to Planning in Jan & Cortex on Oct 15, 2024
gabrielle-ong commented

Related issues
#1391
#1679
