
Training VRAM #32

Open
RongPiKing opened this issue Dec 24, 2024 · 6 comments

Comments

@RongPiKing

How much VRAM does training require?
I run out of memory even on a single 80G H100, using CogVideoX-5b-I2V as the initial weights; adding the is_train_lora flag still runs out of memory.

@SHYuanBest
Member

1. With DeepSpeed ZeRO-2 and full-parameter fine-tuning, a single 80G card may not be enough (you can try turning on the low_vram and vae.enable_tiling options; see the sketch below).
2. With DeepSpeed ZeRO-2 and full-parameter fine-tuning, two 80G cards run fine, even without low_vram or vae.enable_tiling (ZeRO-2 spreads part of the memory load across the cards).
3. If you switch to LoRA fine-tuning, as I recall it only needs about 50G of VRAM.
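
For reference, a minimal sketch of what the VAE memory-saving option above typically looks like in a diffusers-style script. The hub id THUDM/CogVideoX-5b-I2V matches the checkpoint named in the question; the low_vram and is_train_lora flags belong to this repo's own training script, so only the generic diffusers calls are shown here.

```python
import torch
from diffusers import AutoencoderKLCogVideoX

# Load the VAE from the CogVideoX-5b-I2V weights mentioned above
# (hub id assumed to be THUDM/CogVideoX-5b-I2V).
vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", subfolder="vae", torch_dtype=torch.bfloat16
)

# enable_tiling() encodes/decodes frames in spatial tiles instead of all at
# once, cutting peak VRAM at a small cost in speed; this is the effect the
# vae.enable_tiling option in point 1 is after.
vae.enable_tiling()

# enable_slicing() additionally processes the batch one sample at a time.
vae.enable_slicing()
```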

@RongPiKing
Author

Got it, thank you. With the batch size set to 1, roughly how many steps would I need to train?

@SHYuanBest
Member

> Got it, thank you. With the batch size set to 1, roughly how many steps would I need to train?

Thanks for your interest. That really depends on the experimental results; I haven't tried training with bs=1 yet.

@RongPiKing
Author

I see that in your paper the batch size is 80 with 1.8k steps. If I train for roughly 80 × 1.8k steps, is that likely to work?

@SHYuanBest
Member

There are two variables at play here, so it's hard for me to judge (a quick arithmetic sketch follows the list):

  1. The paper's data is both larger in quantity and higher in quality than what we have open-sourced so far (from this angle, a small bs would need fewer than 80 × 1.8k steps).
  2. A large bs gives a better gradient direction than a small bs (from this angle, a small bs would need more than 80 × 1.8k steps).
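
A back-of-the-envelope sketch of the arithmetic being discussed, under the simplifying assumption that "equivalent training" means seeing the same total number of samples. It also shows gradient accumulation, a standard way to recover the large-batch gradient direction from point 2 on a single card; the variable names are illustrative, not from the repo.

```python
# Paper setting quoted above: batch size 80, 1.8k optimizer steps.
paper_bs, paper_steps = 80, 1_800
total_samples = paper_bs * paper_steps          # 144,000 sample-updates

# Naive equivalent at batch size 1: same number of samples seen.
steps_at_bs1 = total_samples // 1               # 144,000 optimizer steps

# Gradient accumulation: average gradients over N micro-batches before each
# optimizer step, so the *effective* batch matches the paper's on one GPU.
per_device_bs = 1
grad_accum_steps = 80
effective_bs = per_device_bs * grad_accum_steps  # 80, same as the paper
optimizer_steps = total_samples // effective_bs  # back to 1,800 steps

print(steps_at_bs1, effective_bs, optimizer_steps)
```

Accumulation trades wall-clock time for the large-batch gradient quality, which is why it is a common workaround when VRAM forces a per-device batch of 1.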

@RongPiKing
Author

Understood, thank you for your answer.
