planning: Cortex Hardware API #1165

Closed
4 of 11 tasks
Tracked by #3908
dan-menlo opened this issue Sep 8, 2024 · 9 comments
Assignees
Labels
category: hardware management Related to hardware & compute needs pm Needs product level decisions

Comments

@dan-menlo
Contributor

dan-menlo commented Sep 8, 2024

Goal

  • We should have a very clear Eng Spec for the Cortex Hardware API in Sprint 23

Key Functionality

  • Hardware Detection
    • Cortex can list all available hardware
  • Hardware Activation
    • Cortex has a clear CLI and API to select active hardware
    • Cortex can activate specific hardware (e.g. CPU-only, or specific GPU)
  • Hardware -> Engines
    • Engines initialize using activated Hardware
    • List of active hardware is passed down to the engine (e.g. llama.cpp or TensorRT-LLM)
    • How does this interact with ngl settings?
  • Hardware Usage Detection
    • Cortex can detect free RAM or VRAM
  • Hardware Fallback

Tasklist

  • Design API (e.g. GET /hardware)
  • Design CLI (e.g. cortex hardware list?)

Functionality

  • Hardware Detection
  • Hardware Activation
  • Hardware Usage Detection (e.g. RAM, VRAM)

Cortex & Jan Integration

Previous Issues

Appendix

UX Goal

Cortex.cpp's Hardware API should enable us to do this in Jan
[image]

@dan-menlo dan-menlo converted this from a draft issue Sep 8, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Hardware Selection & Error Handling epic: Cortex Active Hardware Selection & Error Handling Sep 8, 2024
@dan-menlo dan-menlo added the needs pm Needs product level decisions label Sep 8, 2024
@dan-menlo
Contributor Author

dan-menlo commented Sep 8, 2024

@louis-jan I'm assigning this to you in Sprint 20, as this has a significant CLI and API design component.

  • Can discuss later in the week if you want/can take on the implementation of it
  • May be a good exercise for you to gain deep understanding of our C++ codebase

EDIT: adding @nguyenhoangthuan99 for implementation

@dan-menlo dan-menlo moved this to Scheduled in Jan & Cortex Sep 8, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Active Hardware Selection & Error Handling epic: Cortex Hardware Selection API Sep 8, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Hardware Selection API epic: Cortex Hardware Selection and Model Compatibility API Sep 8, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Hardware Selection and Model Compatibility API epic: Cortex Hardware Selection API Sep 8, 2024
@freelerobot freelerobot added the category: hardware management Related to hardware & compute label Sep 9, 2024
@dan-menlo
Contributor Author

@louis-jan @nguyenhoangthuan99 I am going to move this to Sprint 21, as I think you guys should land the Model Folder and model.yaml first.

@dan-menlo dan-menlo changed the title epic: Cortex Hardware Selection API epic: Cortex Hardware API Sep 17, 2024
@nguyenhoangthuan99
Contributor

nguyenhoangthuan99 commented Sep 17, 2024

The hardware detection serves two main purposes:

  • Installing the correct version of the engine.
  • Running a model that fits the available resources.

To achieve these goals and to make debugging easier, as well as to help users choose the appropriate model, the hardware API/CLI should provide the following information:

  • Operating System (OS)
  • Number of CPU threads
  • Amount of free RAM
  • Presence of AVX instructions (a set of CPU instructions that can accelerate certain computations)
  • GPU information, including:
    • GPU ID
    • GPU name
    • GPU architecture
    • GPU driver version
    • CUDA driver version (for NVIDIA GPUs)
    • Compute capability (for NVIDIA GPUs)
    • Free VRAM (video memory)

Example response body:

{
  "os": "windows",
  "arch": "amd64",
  "suitable_avx": "avx2",
  "free_memory": 8192,
  "gpu_info": [
    {
      "id": "0",
      "name": "NVIDIA GeForce RTX 3090",
      "arch": "ampere",
      "driver_version": "552.12",
      "cuda_driver_version": "12.4",
      "compute_cap": "8.6",
      "free_vram": 8192
    }
  ]
}

Note: getting the free VRAM information from C++ is challenging and requires further investigation (the current approach is to parse the output of the nvidia-smi command).
This information would allow the system to make informed decisions about which engine version to install and which models can run efficiently on the user's hardware. It also provides valuable data for debugging. cc @louis-jan for recommendations from the Jan app side for easier integration
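The nvidia-smi parsing approach mentioned above could look roughly like this — a minimal C++ sketch assuming the standard `--query-gpu=index,name,memory.free --format=csv,noheader,nounits` output format; the `GpuInfo` struct and function name are hypothetical, not Cortex's actual code:

```cpp
#include <sstream>
#include <string>

struct GpuInfo {
  int id;
  std::string name;
  int free_vram_mib;
};

// Parse one line of `nvidia-smi --query-gpu=index,name,memory.free
// --format=csv,noheader,nounits`, e.g. "0, NVIDIA GeForce RTX 3090, 8192".
GpuInfo ParseNvidiaSmiLine(const std::string& line) {
  std::stringstream ss(line);
  std::string id_str, name, free_str;
  std::getline(ss, id_str, ',');
  std::getline(ss, name, ',');
  std::getline(ss, free_str, ',');
  // Trim the leading space nvidia-smi emits after each comma.
  auto trim = [](std::string& s) { s.erase(0, s.find_first_not_of(' ')); };
  trim(name);
  trim(free_str);
  return GpuInfo{std::stoi(id_str), name, std::stoi(free_str)};
}
```

A Vulkan- or NVML-based query would avoid shelling out entirely, at the cost of extra dependencies.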

@louis-jan
Contributor

louis-jan commented Sep 17, 2024

From Jan, we just expect some information for selecting the corresponding engine version / settings, such as CPU instructions / GPUs.

But we need to gather comprehensive hardware information for debugging, including CPU, GPU, RAM, OS, and connected monitors (as issues like projector connections have been known to impact performance).

Structure

To make user support easier, the hardware information should be grouped for quick lookup; a mix of flattened and grouped structures can be visually overwhelming.

E.g. the support engineer has to scroll to the bottom of the file to see os

✅✅
{
  "arch":  "", 
  "free_memory": "", 
  "gpus": [
   {},
   {},
   {}
  ], 
  "os":""
}
{
  "device": {
    "arch":  "", 
    "free_memory": "", 
    "os": ""
  },
  "gpus": [
   {},
   {},
   {}
  ]
}
{
  "cpu": {
    "arch":  "x64", 
    "cores": "4", 
    "model": "Intel Core i9 12900K",
    "instructions": [ "AVX512", "FMA", "SSE" ]
  },
  "os": {
    "version":  "10.2", 
    "name": "Windows 10 Pro"
  },
  "power": {
     "battery_life": 80,
     "charging_status": "charged",
     "is_power_saving": false
  },
  "ram": {
    "total":  "16", 
    "available": "12", 
    "type": "DDR4" // better model name?
  },
  "storage": {
      "total": 512,
      "available": 256,
      "type": "SSD" // better model name?
  },
  "gpus": [
   {},
   {},
   {}
  ], 
  "monitors": [
  ]
  }

Consistent from system to system

Different devices should yield the same output format (e.g. for GPU driver info); we should not have a different response body structure per GPU family.

E.g.

"graphics": [
   {
      "id": "0",
      "name": "NVIDIA GeForce RTX 3090",
      "driver_version": "552.12",
      "cuda_driver_version": "12.4",
      "compute_cap": "8.6",
      "free_vram": 8192
   },
  {
      "id": "1",
      "name": "AMD Radeon RX 6800 XT",
      "driver_version": "5.0.2?",
      "cuda_driver_version": "?",
      "compute_cap": "?",
      "free_vram": 8192
   }
]
"graphics": [
   {
      "id": "0",
      "name": "NVIDIA GeForce RTX 3090",
      "version": "12.4",
      "additional_information": { "driver_version": "552.12", "compute_cap": "8.6" },
      "free_vram": 8192,
      "total_vram": 8192
   },
   {
      "id": "1",
      "name": "AMD Radeon RX 6800 XT",
      "version": "6.1",
      "free_vram": 8192,
      "total_vram": 8192,
      "additional_information": { "rocm_git_revision": "0d0a7a10c1a3" }
   }
]

Should we try to gather anything that could affect performance?

  • Connected monitors?
  • RAM model / bus?
  • Power Saving mode?

Request

It would be beneficial to have filter query support, allowing clients to poll only for the data they need, e.g. ?filters=gpu,cpu
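A `?filters=gpu,cpu` query value could be split with a small helper like this sketch (`ParseFilters` is a hypothetical name; an empty value is treated as "no filter", i.e. return everything):

```cpp
#include <set>
#include <sstream>
#include <string>

// Parse a comma-separated filter value such as "gpu,cpu" into a set of
// section names. An empty result means no filtering was requested.
std::set<std::string> ParseFilters(const std::string& value) {
  std::set<std::string> out;
  std::stringstream ss(value);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) out.insert(item);
  }
  return out;
}
```

The handler would then serialize only the top-level sections ("cpu", "gpus", ...) whose names appear in the set.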

@nguyenhoangthuan99 @dan-homebrew

@freelerobot freelerobot removed their assignment Oct 14, 2024
@dan-menlo dan-menlo modified the milestones: v1.0.3, v1.0.2 Oct 14, 2024
@nguyenhoangthuan99
Contributor

nguyenhoangthuan99 commented Oct 14, 2024

Cortex Hardware Management Specifications

1. Hardware Detection

Task 1.1: Implement hardware detection function

  • Create an API that returns a JSON object with the following structure:
    {
      "cpu": {
        "arch": "string",
        "cores": "string",
        "model": "string",
        "instructions": ["string"]
      },
      "os": {
        "version": "string",
        "name": "string"
      },
      "ram": {
        "total": "string",
        "available": "string",
        "type": "string"
      },
      "storage": {
        "total": number,
        "available": number,
        "type": "string"
      },
      "gpus": [
        {
          "model": "string",
          "vram": "string",
          "driver_version": "string"
        }
      ],
      "power": {
        "battery_life": number,
        "charging_status": "string",
        "is_power_saving": boolean
      },
      "monitors": [
        {
          "resolution": "string",
          "refresh_rate": number
        }
      ]
    }
  • Add a new command to the Cortex CLI: cortex hardware list
  • Add an API to list hardware: GET /hardware

Task 1.2: Implement platform-specific detection

  • Develop separate modules for Windows, Linux, and macOS to gather hardware information
  • Use appropriate system calls and libraries

2. Hardware Activation

Cortex is stateless, so it's necessary to persist activated hardware in the database so it can be reused.

Task 2.1: Design database schema

  • Create a Hardware table in cortex.db with the following schema:
    CREATE TABLE Hardware (
      id TEXT PRIMARY KEY,
      type TEXT NOT NULL,
      name TEXT NOT NULL,
      is_active BOOLEAN DEFAULT 0,
      properties TEXT -- properties of the hardware as a JSON-dumped string
    );

We can decide which hardware entries should be placed in the database.

Task 2.2: Implement CLI for hardware activation

  • Add a new command to the Cortex CLI: cortex hardware activate <type> <name>
  • Implement logic to update the is_active field in the database

Task 2.3: Create API for hardware activation

  • Implement a RESTful API endpoint: POST /api/v1/hardware/activate
  • Accept JSON payload with type and name fields
  • Update database and return activation status

3. Hardware -> Engines Integration

Task 3.1: Modify engine initialization

  • Update engine initialization code to accept a list of active hardware
  • Implement a function to query active hardware from the database
  • If no hardware is activated, use the default settings, as in the current logic.

Task 3.2: Implement hardware passing to engines

  • Query the database for the hardware the engines should use; we can reuse cortex ps for the running model
    [image]

Task 3.3: Handle ngl settings

  • Implement logic to automatically set ngl to 0 when running on CPU-only
  • Create a model compatibility API to determine appropriate ngl settings for different hardware configurations
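The ngl logic could be sketched as a simple heuristic: ngl = 0 on CPU-only, otherwise as many layers as fit in free VRAM. Everything here (the per-layer size estimate, the function name) is an illustrative assumption, not Cortex's actual model-compatibility logic:

```cpp
#include <algorithm>

// Hypothetical heuristic for ngl (num GPU layers): 0 when running CPU-only,
// otherwise as many layers as fit in free VRAM given an estimated per-layer
// size. Sizes are in MiB.
int EstimateNgl(bool gpu_active, long free_vram_mib, long layer_size_mib,
                int total_layers) {
  if (!gpu_active || layer_size_mib <= 0) return 0;  // CPU-only: ngl = 0
  long fit = free_vram_mib / layer_size_mib;
  return static_cast<int>(std::min<long>(fit, total_layers));
}
```

A real implementation would also reserve headroom for the KV cache and context buffers rather than spending all free VRAM on layers.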

4. Hardware Usage Detection

Task 4.1: Implement RAM and VRAM detection

  • Create functions to detect free RAM and VRAM
  • Use platform-specific methods (e.g., Windows API, /proc/meminfo on Linux, sysctl on macOS)
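For the Linux `/proc/meminfo` path, the parsing step might look like this sketch (the function name is hypothetical; on Linux the input would come from reading `/proc/meminfo`, while Windows and macOS need their own backends):

```cpp
#include <sstream>
#include <string>

// Parse the MemAvailable field (in kB) from /proc/meminfo-style text.
// Returns -1 if the field is missing.
long ParseMemAvailableKb(const std::string& meminfo) {
  std::istringstream in(meminfo);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("MemAvailable:", 0) == 0) {  // line starts with the key
      std::istringstream fields(line.substr(13));
      long kb = -1;
      fields >> kb;
      return kb;
    }
  }
  return -1;
}
```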

Task 4.2: Implement GPU-specific detection

  • Create a module for NVIDIA GPU detection using nvidia-smi
  • Implement a placeholder module for AMD GPU detection using rocm-smi
  • Research and implement Vulkan-based VRAM detection as a fallback method

5. Hardware Fallback

Task 5.1: Implement fallback logic

  • Create a HardwareFallbackManager class to handle fallback scenarios
  • Implement logic to fall back to CPU if GPU inference fails
  • Implement detailed error logging for hardware-related failures
  • Create clear, user-friendly error messages for terminal output

Task 5.2: Implement RAM check for CPU fallback

  • Add a check for available RAM before falling back to CPU
  • Implement appropriate error handling and user notification if insufficient RAM is available
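The RAM check before CPU fallback might be as simple as this sketch (sizes in MiB; the function name and error string are hypothetical placeholders for the real error-handling path):

```cpp
#include <string>

// Before falling back from GPU to CPU, verify the model fits in free
// system RAM; otherwise surface a clear, user-friendly error.
std::string DecideCpuFallback(long model_size_mib, long free_ram_mib) {
  if (free_ram_mib >= model_size_mib) return "fallback_to_cpu";
  return "error: insufficient RAM for CPU fallback";
}
```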

Additional Tasks

Task 6.1: Documentation

  • Create comprehensive documentation for the new hardware management features
  • Include examples and use cases in the documentation

Task 6.2: Testing

  • Develop unit tests for each new function and class
  • Create integration tests to ensure proper interaction between hardware detection, activation, and engine initialization

@dan-menlo
Contributor Author

@nguyenhoangthuan99 This is well drafted. I have a few clarifications

1. Hardware Detection

  • I agree with this direction
  • Out of curiosity, why are we detecting monitors? (how is this relevant?)

2. Hardware Activation

We should specify that Cortex by default activates all hardware compute units, as part of setup/installation.

Will we need to define Hardware as an individual compute unit (e.g. compute + RAM)?

  • CPUs (+ RAM)
  • GPUs (+ VRAM)

There are some open questions on my side:

  • Can the CPU + RAM ever be deactivated?

3. Hardware -> Engines Integration

We may need to specify the logic if there are multiple active GPUs

  • e.g. which GPU has priority (e.g. Nvidia preferred, biggest capacity)
  • We should pick a smart default, as it will greatly improve user experience

@dan-menlo
Contributor Author

dan-menlo commented Oct 15, 2024

@nguyenhoangthuan99 There are a couple of out-of-scope but related issues that I'd like to also put down ideas for:

Can cortex run and cortex model start take in a hardware param?

  • We should enable users to specify which GPU to use when running a model
  • We likely need to use the index from cortex hardware list
  • We need to think through this more clearly - I think my idea is incorrect
cortex run gorilla --hardware 1,2,3

Model Compatibility

@nguyenhoangthuan99
Contributor

nguyenhoangthuan99 commented Oct 15, 2024

1. Hardware Detection

We need to gather comprehensive hardware information for debugging, including CPU, GPU, RAM, OS, and connected monitors (as issues like projector connections have been known to impact performance).

As Louis mentioned earlier, when using GPUs to render high-resolution content, high-frequency monitors can also affect GPU performance. Monitors are an additional hardware component that could impact performance, so we should notify users about this.

2. Hardware Activation

In my opinion, we should only allow activation of GPUs (or NPUs in the future when Cortex supports them). CPUs and RAM are always used to save and process buffers, even if we offload everything to the GPU.

3. Hardware -> Engines Integration

This part should fall back to the Model Compatibility API. Choosing which hardware to run depends mostly on the model, not just the engines. The best scenario, I think, would be to have a threshold for available VRAM (e.g., 3GB, 4GB, etc., depending on which model to run). Among activated hardware, we should choose the ones that have enough available VRAM. If more than one GPU is suitable, prefer NVIDIA GPUs.

Can cortex run and cortex model start take in hardware params?

I believe we have to implement this and provide an API for it. This is an important feature that makes Cortex more dynamic, eliminating the need to manually set environment variables. This API should, like hardware activation, only allow running with GPU (and NPU). However, it will require users to set hardware of the same type in a list (e.g., only NVIDIA GPUs).

Model Compatibility

Model compatibility will depend on activated hardware. Here are some ideas:

If users don't specify which hardware to use, we can choose hardware by:

  1. Ensuring the model can offload at least 30% of NGLs (Num GPU Layers)
  2. Running the model on the most powerful hardware: NVIDIA > AMD > (NPU) > CPU. If criterion 1 cannot be satisfied, fall back to the next available option. CPU will be the last option.

If users specify hardware:
Calculate NGLs and context length to fit that hardware and return recommendations.
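The selection rules above could be sketched as follows: require that at least 30% of the model's layers fit in a GPU's free VRAM, prefer NVIDIA over AMD among suitable GPUs, tie-break by larger free VRAM, and otherwise fall back to CPU. The struct, names, and arithmetic here are illustrative assumptions, not the final design:

```cpp
#include <string>
#include <vector>

struct ActiveGpu {
  std::string vendor;   // "nvidia" or "amd"
  long free_vram_mib;
};

// Pick a device per the rules above. layer_size_mib is an estimated
// per-layer footprint; total_layers is the model's layer count.
std::string PickDevice(const std::vector<ActiveGpu>& gpus,
                       long layer_size_mib, int total_layers) {
  // ceil(30% of layers), in integer arithmetic, times per-layer size.
  long need_mib = ((total_layers * 3 + 9) / 10) * layer_size_mib;
  const ActiveGpu* best = nullptr;
  for (const auto& g : gpus) {
    if (g.free_vram_mib < need_mib) continue;  // can't offload 30% of layers
    if (!best) { best = &g; continue; }
    bool better_vendor = g.vendor == "nvidia" && best->vendor != "nvidia";
    bool same_vendor = g.vendor == best->vendor;
    if (better_vendor || (same_vendor && g.free_vram_mib > best->free_vram_mib))
      best = &g;
  }
  return best ? best->vendor : "cpu";  // last resort: CPU
}
```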

@dan-menlo
Contributor Author

Marking as complete. Implementation epic: #1568

@github-project-automation github-project-automation bot moved this from In Progress to Review + QA in Jan & Cortex Oct 31, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Jan & Cortex Nov 1, 2024
@gabrielle-ong gabrielle-ong removed this from the v1.0.2 milestone Nov 5, 2024

6 participants