Add new summaries

yiGmMk · Jan 3, 2025 · 27d5d75 · 27d5d75
1 parent 3173bf6
commit 27d5d75
Show file tree

Hide file tree

Showing 4 changed files with 350 additions and 0 deletions.
diff --git a/_posts/202501/2025-01-03-三千预算本地-70b-大模型---v2ex.md b/_posts/202501/2025-01-03-三千预算本地-70b-大模型---v2ex.md
@@ -0,0 +1,44 @@
+---
+layout: post
+---
+# 三千预算本地 70b 大模型 - V2EX
+- URL: [原文](https://www.v2ex.com/t/1102193)
+- Added At: 2025-01-03 05:32:42
+- [Link To Text](_posts/2025-01-03-三千预算本地-70b-大模型---v2ex_raw.md)
+
+## TL;DR
+本文总结了一台机器的硬件和软件配置，以及其在不同模型下的性能测试结果。测试结果显示，AMD Instinct MI50 的性能不如预期，尤其是在大模型推理和并发性能方面。与 NVIDIA 2080 Ti 相比，MI50 的性能较差，尤其是在小模型推理和并发性能方面。文章还指出，MI50 的显存大小和带宽不足，导致性能瓶颈。此外，ROCm 的支持不足，也导致了软件兼容性问题。
+
+## Summary
+1. **硬件配置**：
+   - **显卡**：三张 AMD Instinct MI50，单精算力 13 TFLOPS，显存 16G HBM 2，带宽 1TB/s
+   - **CPU**：E5-2666V3，10 核心 20 线程，全核睿频 3.2 GHz
+   - **内存**：128G DDR3 RDIMM
+   - **主板**：精粤 X99-TI D3 PLUS
+   - **固态硬盘**：凯侠 XG6 1TB
+   - **电源**：玄武 500k ATX 电源 + DELL 750W 服务器电源
+   - **机架**：开放式机架
+
+2. **软件配置**：
+   - **操作系统**：PVE
+   - **容器**：LXC
+   - **显卡驱动**：AMDGPU
+   - **ROCm 版本**：6.2.4
+   - **编译工具**：llama.cpp
+
+3. **性能测试**：
+   - **Llama 3.3 70b**：prefill 52.73 t/s，decode 11.56 t/s
+   - **Qwen 2.5 72b**：prefill 75.72 t/s，decode 9.85 t/s
+   - **QwQ Preview 32b**：prefill 141.30 t/s，decode 20.65 t/s
+   - **Dolphin Mistral Nemo 12b**：prefill 482.98 t/s，decode 35.92 t/s
+
+4. **并发性能测试**：
+   - **Llama 3.3 70b**：B=1，prefill 74.46 t/s，decode 10.26 t/s；B=2，prefill 78.58 t/s，decode 13.07 t/s；B=4，prefill 76.52 t/s，decode 12.13 t/s
+   - **Dolphin Mistral Nemo 12b**：B=1，prefill 493.28 t/s，decode 33.45 t/s；B=2，prefill 474.35 t/s，decode 49.12 t/s；B=4，prefill 442.91 t/s，decode 55.89 t/s；B=8，prefill 393.13 t/s，decode 47.56 t/s
+   - **QwQ Preview 32b**：B=1，prefill 140.37 t/s，decode 17.80 t/s；B=2，prefill 142.07 t/s，decode 26.88 t/s；B=4，prefill 136.37 t/s，decode 32.62 t/s；B=8，prefill 126.70 t/s，decode 9.75 t/s
+
+5. **结论**：
+   - MI50 的性能不如预期，尤其是在大模型推理和并发性能方面
+   - 2080 Ti 的性能优于 MI50，尤其是在小模型推理和并发性能方面
+   - MI50 的显存大小和带宽不足，导致性能瓶颈
+   - ROCm 的支持不足，导致软件兼容性问题