generated from osmoscraft/osmosmemo-template
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
3173bf6
commit 27d5d75
Showing
4 changed files
with
350 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
--- | ||
layout: post | ||
--- | ||
# 三千预算本地 70b 大模型 - V2EX | ||
- URL: [原文](https://www.v2ex.com/t/1102193) | ||
- Added At: 2025-01-03 05:32:42 | ||
- [Link To Text](_posts/2025-01-03-三千预算本地-70b-大模型---v2ex_raw.md) | ||
|
||
## TL;DR | ||
本文总结了一台机器的硬件和软件配置,以及其在不同模型下的性能测试结果。测试结果显示,AMD Instinct MI50 的性能不如预期,尤其是在大模型推理和并发性能方面。与 NVIDIA 2080 Ti 相比,MI50 的性能较差,尤其是在小模型推理和并发性能方面。文章还指出,MI50 的显存大小和带宽不足,导致性能瓶颈。此外,ROCm 的支持不足,也导致了软件兼容性问题。 | ||
|
||
## Summary | ||
1. **硬件配置**: | ||
- **显卡**:三张 AMD Instinct MI50,单精算力 13 TFLOPS,显存 16G HBM 2,带宽 1TB/s | ||
- **CPU**:E5-2666V3,10 核心 20 线程,全核睿频 3.2 GHz | ||
- **内存**:128G DDR3 RDIMM | ||
- **主板**:精粤 X99-TI D3 PLUS | ||
- **固态硬盘**:凯侠 XG6 1TB | ||
- **电源**:玄武 500k ATX 电源 + DELL 750W 服务器电源 | ||
- **机架**:开放式机架 | ||
|
||
2. **软件配置**: | ||
- **操作系统**:PVE | ||
- **容器**:LXC | ||
- **显卡驱动**:AMDGPU | ||
- **ROCm 版本**:6.2.4 | ||
- **编译工具**:llama.cpp | ||
|
||
3. **性能测试**: | ||
- **Llama 3.3 70b**:prefill 52.73 t/s,decode 11.56 t/s | ||
- **Qwen 2.5 72b**:prefill 75.72 t/s,decode 9.85 t/s | ||
- **QwQ Preview 32b**:prefill 141.30 t/s,decode 20.65 t/s | ||
- **Dolphin Mistral Nemo 12b**:prefill 482.98 t/s,decode 35.92 t/s | ||
|
||
4. **并发性能测试**: | ||
- **Llama 3.3 70b**:B=1,prefill 74.46 t/s,decode 10.26 t/s;B=2,prefill 78.58 t/s,decode 13.07 t/s;B=4,prefill 76.52 t/s,decode 12.13 t/s | ||
- **Dolphin Mistral Nemo 12b**:B=1,prefill 493.28 t/s,decode 33.45 t/s;B=2,prefill 474.35 t/s,decode 49.12 t/s;B=4,prefill 442.91 t/s,decode 55.89 t/s;B=8,prefill 393.13 t/s,decode 47.56 t/s | ||
- **QwQ Preview 32b**:B=1,prefill 140.37 t/s,decode 17.80 t/s;B=2,prefill 142.07 t/s,decode 26.88 t/s;B=4,prefill 136.37 t/s,decode 32.62 t/s;B=8,prefill 126.70 t/s,decode 9.75 t/s | ||
|
||
5. **结论**: | ||
- MI50 的性能不如预期,尤其是在大模型推理和并发性能方面 | ||
- 2080 Ti 的性能优于 MI50,尤其是在小模型推理和并发性能方面 | ||
- MI50 的显存大小和带宽不足,导致性能瓶颈 | ||
- ROCm 的支持不足,导致软件兼容性问题 |
Oops, something went wrong.