planning: Cortex Hardware API #1165
@louis-jan I'm assigning this to you in Sprint 20, as this has a significant CLI and API design component.
EDIT: adding @nguyenhoangthuan99 for implementation
@louis-jan @nguyenhoangthuan99 I am going to move this to Sprint 21, as I think you guys should land the Model Folder and …
The hardware detection serves two main purposes:
To achieve these goals, make debugging easier, and help users choose an appropriate model, the hardware API/CLI should provide the following information:
Example return body:
Note: The commenter mentions that getting the free VRAM information from C++ is challenging and requires further investigation (the current approach is to parse it from the output of …).
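As a sketch of the parsing approach mentioned above, assuming the tool being parsed is `nvidia-smi` (the thread does not name it explicitly), the free-VRAM query can be isolated into a pure parsing function so it is testable without a GPU. The helper names here are hypothetical, not Cortex's actual implementation:

```python
import subprocess


def parse_free_vram(output: str) -> list[int]:
    """Parse nvidia-smi CSV output: one integer (MiB) per line, one line per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_free_vram_mib() -> list[int]:
    """Query free VRAM per GPU (MiB) by shelling out to nvidia-smi.

    Hypothetical helper: assumes nvidia-smi is available on PATH.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_vram(out)
```

Keeping the subprocess call separate from the parsing keeps the brittle part (shelling out) small, which matters if the eventual C++ implementation takes the same parse-the-CLI-output route.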
From Jan, we expect to just have some sort of information to select corresponding engine versions / settings, such as CPU instructions / GPUs. But we need to gather comprehensive hardware information for debugging, including CPU, GPU, RAM, OS, and connected monitors (as issues like projector connections have been known to impact performance).

Structure
To make user support easier, the hardware information should be grouped for quick lookup; a mix of flattened and grouped structures can be visually overwhelming. E.g. the supporter has to scroll to the bottom of the file to see …
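The grouped structure argued for above might look like the following. This is a hypothetical sketch; the field names and values are assumptions for illustration, not Cortex's actual schema:

```python
# Hypothetical grouped response body for a hardware endpoint. Every
# top-level key is one section a supporter can jump straight to.
hardware_info = {
    "cpu": {"model": "AMD Ryzen 9 5950X", "cores": 16, "instructions": ["AVX2", "FMA"]},
    "gpus": [
        {"name": "NVIDIA RTX 3090", "driver": "535.54",
         "vram_total_mib": 24576, "vram_free_mib": 20480},
    ],
    "ram": {"total_mib": 65536, "available_mib": 40000},
    "os": {"name": "Ubuntu", "version": "22.04"},
    "monitors": [{"resolution": "3840x2160", "refresh_hz": 144}],
}

# Grouping makes lookup a single key access instead of a scroll:
print(sorted(hardware_info.keys()))
```

With a flat structure, related fields (e.g. total and free VRAM) can end up far apart in the file; grouping keeps each concern in one place.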
Consistent from system to system
Different devices should produce the same output format, such as GPU driver info. There should not be a different response body structure per GPU family. E.g. …
Try to gather anything that could affect performance.
Request
It would be beneficial to have filter query support, allowing clients to poll only for the data they need. E.g. …

@nguyenhoangthuan99 @dan-homebrew
Cortex Hardware Management Specifications

1. Hardware Detection

Task 1.1: Implement hardware detection function
Task 1.2: Implement platform-specific detection
2. Hardware Activation

Cortex is stateless, so it's mandatory to save activated hardware in the DB in order to reuse it.

Task 2.1: Design database schema
We can decide which hardware should be placed in the database.

Task 2.2: Implement CLI for hardware activation
Task 2.3: Create API for hardware activation
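One possible shape for the activated-hardware persistence from Task 2.1, sketched with SQLite. The table and column names are assumptions for illustration, not Cortex's actual schema:

```python
import sqlite3

# Minimal sketch of an activated-hardware table. Activation toggles a
# flag rather than deleting rows, so the full inventory survives
# restarts even though Cortex itself is stateless.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hardware (
        uuid      TEXT PRIMARY KEY,          -- stable device identifier
        type      TEXT NOT NULL,             -- 'gpu', 'npu', ...
        name      TEXT NOT NULL,
        activated INTEGER NOT NULL DEFAULT 0
    )
""")
conn.execute("INSERT INTO hardware VALUES ('gpu-0', 'gpu', 'NVIDIA RTX 3090', 1)")
conn.execute("INSERT INTO hardware VALUES ('gpu-1', 'gpu', 'AMD RX 6800', 0)")

# On startup, Cortex would re-read the activated set:
active = conn.execute("SELECT uuid FROM hardware WHERE activated = 1").fetchall()
print(active)
```

A stable `uuid` per device matters here: ordinal GPU indices can change across reboots or driver updates, so activation state should key on something sturdier.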
3. Hardware -> Engines Integration

Task 3.1: Modify engine initialization
Task 3.2: Implement hardware passing to engines

Task 3.3: Handle …
@nguyenhoangthuan99 This is well drafted. I have a few clarifications.

1. Hardware Detection
2. Hardware Activation

We should specify that Cortex by default activates all hardware compute units as part of setup/installation. Will we need to define …
There are some open questions on my side:
3. Hardware -> Engines Integration

We may need to specify the logic if there are multiple active GPUs.
@nguyenhoangthuan99 There are a couple of out-of-scope but related issues that I'd like to also put down ideas for:
1. Hardware Detection

We need to gather comprehensive hardware information for debugging, including CPU, GPU, RAM, OS, and connected monitors (as issues like projector connections have been known to impact performance). As Louis mentioned earlier, when using GPUs to render high-resolution content, high-frequency monitors can also affect GPU performance. Monitors are an additional hardware component that could impact performance, so we should notify users about this.

2. Hardware Activation

In my opinion, we should only allow activation of GPUs (or NPUs in the future, when Cortex supports them). CPUs and RAM are always used to save and process buffers, even if we offload everything to the GPU.

3. Hardware -> Engines Integration

This part should fall back to the Model Compatibility API. Choosing which hardware to run on depends mostly on the model, not just the engines. The best scenario, I think, would be to have a threshold for available VRAM (e.g. 3 GB, 4 GB, etc., depending on which model to run). Among activated hardware, we should choose the ones that have enough available VRAM. If more than one GPU is suitable, prefer NVIDIA GPUs.

Can cortex run and cortex model start take hardware params?

I believe we have to implement this and provide an API for it. This is an important feature that makes Cortex more dynamic, eliminating the need to manually set environment variables. Like hardware activation, this API should only allow running with GPUs (and NPUs). However, it will require users to set hardware of the same type in a list (e.g. only NVIDIA GPUs).

Model Compatibility

Model compatibility will depend on activated hardware. Here are some ideas: If users don't specify which hardware to use, we can choose hardware by:
If users specify hardware: …
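The selection rule described above (enough free VRAM for the model, NVIDIA preferred when several GPUs qualify) can be sketched as follows. The dict fields (`vendor`, `vram_free_mib`) are assumed names for illustration, not Cortex's actual API:

```python
def pick_gpus(activated_gpus: list[dict], vram_needed_mib: int) -> list[dict]:
    """Select GPUs for a model: keep those with enough free VRAM,
    and prefer NVIDIA GPUs when more than one vendor qualifies."""
    suitable = [g for g in activated_gpus if g["vram_free_mib"] >= vram_needed_mib]
    nvidia = [g for g in suitable if g["vendor"] == "nvidia"]
    return nvidia or suitable


gpus = [
    {"name": "RX 6800", "vendor": "amd", "vram_free_mib": 14000},
    {"name": "RTX 3090", "vendor": "nvidia", "vram_free_mib": 20000},
    {"name": "GTX 1650", "vendor": "nvidia", "vram_free_mib": 3500},
]
# With a 4 GB threshold, only the RTX 3090 is both suitable and NVIDIA:
print([g["name"] for g in pick_gpus(gpus, 4000)])
```

The threshold itself would come from the Model Compatibility side (it depends on the model being started), which is exactly why this logic falls back to that API rather than living purely in the engines.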
Marking as complete. Implementation epic: #1568
Goal
Key Functionality
ngl settings?

Tasklist

GET /hardware
cortex hardware list

Functionality
Cortex & Jan Integration
Previous Issues
Appendix
UX Goal
Cortex.cpp's Hardware API should enable us to do this in Jan.