[train_rlhf_llama] When using vLLM, the speedup is not as large as expected #197

tkqie commented Dec 31, 2024

When running the train_rlhf_llama.sh script with LLaMA2-7B, the performance gain from using vLLM as the inference backend is not significant compared to Megatron (roughly a 20% improvement). According to other reports, the expected improvement should be around 200%. How can it be improved?

adoda (Collaborator) commented Jan 7, 2025

In our tests we observe roughly a 2x improvement. Please compare your setup with the configuration and performance data in our documentation to see where they differ.
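
One way to check whether generation itself is the bottleneck (as opposed to training or weight synchronization) is to benchmark raw vLLM throughput outside the RLHF loop, using prompt counts and response lengths similar to the rollout phase. The sketch below uses the public vLLM Python API; the model path, batch size, and sampling settings are placeholders and should be adjusted to match what train_rlhf_llama.sh actually uses:

```python
# Standalone vLLM throughput check (model path, batch size, and sampling
# settings are placeholders -- adjust to match the RLHF rollout configuration).
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=1.0, top_p=0.9, max_tokens=512)

# Use prompts comparable in number and length to one rollout batch.
prompts = ["Write a short story about a robot learning to paint."] * 64

start = time.time()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.time() - start

generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"Generated {generated_tokens} tokens in {elapsed:.1f}s "
      f"({generated_tokens / elapsed:.1f} tokens/s)")
```

If raw generation throughput looks healthy but the end-to-end gain is still small, the remaining time is likely spent in the training or model-synchronization stages rather than inference, which would explain a modest overall speedup.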
