Provide Training File #56
Sorry, but this work was done during my internship at Tencent AI Lab. Since I have left the company, I can no longer access the data files stored on their servers. Nevertheless, I have found the training CSV files on the internet, such as:
However, I cannot guarantee the quality or authenticity of these links, as they are unofficial sources.
Hi, as described in the paper, the sparse-to-dense flow prediction and the ControlNet are trained together in Stage 1. May I ask how many videos are used for training in that stage? And could you provide some guidelines for selecting the training videos from the original WebVid-10M?
We trained the model for approximately 100,000 iterations using the WebVid-10M dataset, with a batch size of 8 (one per A100 GPU). This means a total of about 800,000 video clips were used for training. No specific video selection was applied; the model was trained directly on the entire WebVid-10M dataset.
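For concreteness, here is a rough sketch of what that Stage-1 recipe could look like in PyTorch. All module names, batch keys, and hyperparameters below are illustrative placeholders, not the actual MOFA-Video training code:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the Stage-1 setup described above: the sparse-to-dense (S2D)
# flow network and the flow ControlNet are optimized jointly while the base
# video diffusion UNet stays frozen. Every name here (s2d_net, controlnet,
# unet, webvid_loader, batch keys) is a hypothetical placeholder.

def train_stage1(s2d_net, controlnet, unet, webvid_loader, num_iters=100_000):
    unet.requires_grad_(False)                        # base UNet kept frozen in Stage 1
    params = list(s2d_net.parameters()) + list(controlnet.parameters())
    optimizer = torch.optim.AdamW(params, lr=1e-5)

    data_iter = iter(webvid_loader)                   # batch size 8, one clip per GPU
    for step in range(num_iters):                     # 100k steps * 8 clips ~= 800k clips seen
        batch = next(data_iter)
        dense_flow = s2d_net(batch["first_frame"], batch["sparse_flow"])
        control = controlnet(batch["noisy_latents"], batch["timesteps"], dense_flow)
        noise_pred = unet(batch["noisy_latents"], batch["timesteps"],
                          added_cond=control)          # placeholder conditioning interface
        loss = F.mse_loss(noise_pred, batch["noise"])  # standard diffusion noise-prediction loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```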
Thanks for your reply! I would also like to check whether the S2D module directly adopts the CMP pre-trained weights or is fine-tuned starting from these weights.
We observe no significant performance gap between the following two choices: (1) directly adopting the CMP pre-trained weights for the S2D module, or (2) fine-tuning the S2D module starting from those weights.
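In code, the two options differ only in whether the CMP-initialized parameters keep receiving gradients. A minimal sketch (the checkpoint path and the way the weights are stored are assumptions, not taken from the repository):

```python
import torch

# Initialize the S2D module from a CMP pre-trained checkpoint, then either
# freeze it (option 1) or fine-tune it with the rest of Stage 1 (option 2).
# The path and the flat state-dict layout are illustrative assumptions; real
# CMP checkpoints may wrap the weights under an extra key.

def load_s2d_from_cmp(s2d_net, cmp_ckpt_path="cmp_pretrained.pth", finetune=True):
    state = torch.load(cmp_ckpt_path, map_location="cpu")
    s2d_net.load_state_dict(state, strict=False)   # adopt the CMP weights
    s2d_net.requires_grad_(finetune)               # finetune=False -> keep them fixed
    return s2d_net
```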
Got it! By the way, as shown in Figure 2, the feature interactions between the warped features and the denoising UNet features take place in the decoder, which seems different from the encoder-side feature interactions shown in Figure 3.
Sorry, I may have misunderstood the model structure before. In other words, does the "warp" part of the MOFA-Adapter illustrated in Figure 2 mainly consist of two encoders (the reference encoder and the fusion encoder)?
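For what it's worth, my reading of that warp step is roughly the following sketch: reference-encoder features are warped by the predicted dense flow and then fused with the UNet features at the matching resolution. The tensor layout and the simple additive fusion are my own assumptions from the figures, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

# Warp a reference feature map (B, C, H, W) by a dense flow field (B, 2, H, W,
# in pixels) and fuse it with the UNet feature map of the same resolution.
def warp_and_fuse(ref_feat, unet_feat, flow):
    b, _, h, w = ref_feat.shape
    # Build a sampling grid displaced by the flow, then normalize to [-1, 1].
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(ref_feat.device)   # (H, W, 2)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)                # (B, H, W, 2)
    grid[..., 0] = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    warped = F.grid_sample(ref_feat, grid, align_corners=True)
    return unet_feat + warped                                          # simple additive fusion
```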
Nice work!!
Could you please provide the file '/apdcephfs/share_1290939/0_public_datasets/WebVid/metadata/metadata_2048_val.csv' referenced in your training code? The WebVid dataset cannot be found in the official GitHub repository.
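If you do locate one of the unofficial WebVid metadata CSVs mentioned earlier in this thread, a small validation split like metadata_2048_val.csv can usually be read along these lines. The column names follow the public WebVid metadata format and may differ from the exact file the authors used:

```python
import pandas as pd

# Load a WebVid-style metadata CSV; column names are an assumption based on
# the public WebVid release (e.g. videoid, name, duration, page_dir, contentUrl).
meta = pd.read_csv("metadata_2048_val.csv")
print(len(meta), "clips")
print(meta.columns.tolist())

# Typical usage: map each row to a (video URL / path, caption) pair.
samples = [(row["contentUrl"], row["name"]) for _, row in meta.iterrows()]
```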