
Provide Training File #56

Open
ALEX13679173326 opened this issue Nov 20, 2024 · 10 comments

Comments

ALEX13679173326 commented Nov 20, 2024

Nice work!!
Could you please provide the file '/apdcephfs/share_1290939/0_public_datasets/WebVid/metadata/metadata_2048_val.csv' referenced in your training code? The WebVid dataset can no longer be found in its official GitHub repository.
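
(For context: the requested file is a WebVid-style metadata CSV listing clip IDs, captions, and source URLs. Below is a minimal sketch of loading such a file with pandas; the column names `videoid`, `contentUrl`, `duration`, `page_dir`, and `name` are assumptions based on the commonly circulated WebVid CSVs, not on the exact `metadata_2048_val.csv` referenced above.)

```python
# Minimal sketch: inspect a WebVid-style metadata CSV with pandas.
# The column names below are assumptions; check them against the actual file.
import pandas as pd

df = pd.read_csv("metadata_2048_val.csv")  # hypothetical local path
print(df.columns.tolist())  # expected something like ['videoid', 'contentUrl', 'duration', 'page_dir', 'name']
print(len(df), "clips listed")

# Pair each clip id with its caption ('name' is assumed to be the caption column).
samples = list(zip(df["videoid"].astype(str), df["name"]))
print(samples[:3])
```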

MyNiuuu (Owner) commented Nov 20, 2024

Sorry, but this work was done during my internship at Tencent AI Lab. Since I have left the company, I can no longer access the data files stored on their servers.

Nevertheless, I have found the training CSV files on the internet, such as:

However, I cannot guarantee the quality or authenticity of these links, as they are unofficial sources.

tyrink commented Dec 9, 2024

Hi, as described in the paper, the sparse2dense flow prediction and the ControlNet are trained together in stage 1. May I ask how many videos were used for training in that stage? And could you provide some guidelines for selecting the training videos from the original WebVid-10M?

MyNiuuu (Owner) commented Dec 9, 2024

We trained the model for approximately 100,000 iterations using the WebVid-10M dataset, with a batch size of 8 (one per A100 GPU). This means a total of about 800,000 video clips were used for training. No specific video selection was applied; the model was trained directly on the entire WebVid-10M dataset.
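
To spell out the arithmetic behind those numbers, here is a tiny sanity-check sketch (the per-GPU batch of 1 and the 8 A100 GPUs are taken from the reply above; everything else is generic):

```python
# Back-of-the-envelope check of the training scale quoted above.
iterations = 100_000   # ~100k optimization steps
gpus = 8               # A100 GPUs
batch_per_gpu = 1      # one clip per GPU, i.e. effective batch size 8
effective_batch = gpus * batch_per_gpu

clips_seen = iterations * effective_batch
print(f"effective batch size: {effective_batch}")
print(f"video clips sampled over training: {clips_seen:,}")  # 800,000

# WebVid-10M has ~10M clips, so ~800k samples corresponds to roughly 8% of the
# dataset being drawn (with random sampling, some clips may repeat or be unseen).
webvid_size = 10_000_000
print(f"fraction of WebVid-10M sampled: {clips_seen / webvid_size:.1%}")
```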

tyrink commented Dec 10, 2024

Thanks for your reply! I would also like to check whether the S2D module directly adopts the CMP pre-trained weights or is finetuned from those weights.

MyNiuuu (Owner) commented Dec 10, 2024

We observed no significant performance gap between the following two choices (a rough sketch of both is given below):

  1. initialize S2D with the CMP weights and finetune it
  2. directly use the CMP pre-trained weights without finetuning
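
A minimal PyTorch-style sketch of the two options follows. `S2DNet` is a toy placeholder class and the checkpoint path is hypothetical; the real repository may load the CMP weights differently.

```python
# Hedged sketch of the two S2D initialization choices discussed above.
# S2DNet is a toy placeholder, not the repository's actual class.
import torch
import torch.nn as nn

class S2DNet(nn.Module):
    """Placeholder sparse-to-dense flow network (the real one follows CMP)."""
    def __init__(self):
        super().__init__()
        # Toy stand-in: input = sparse flow (2) + mask (1), output = dense flow (2).
        self.backbone = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, x):
        return self.backbone(x)

s2d = S2DNet()

# Both options start from CMP pre-trained weights (checkpoint path is hypothetical):
# cmp_state = torch.load("cmp_checkpoint.pth", map_location="cpu")
# s2d.load_state_dict(cmp_state, strict=False)

FINETUNE_S2D = True  # option 1: finetune S2D; set False for option 2 (frozen CMP weights)

for p in s2d.parameters():
    p.requires_grad = FINETUNE_S2D
if not FINETUNE_S2D:
    s2d.eval()  # option 2: S2D only runs forward to predict dense flow
```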

tyrink commented Dec 10, 2024

Got it! By the way, as shown in Figure 2, the feature interactions between the warped features and the denoising UNet features take place at the decoder part, which seems different from the feature interactions at the encoder part in Figure 3.

MyNiuuu (Owner) commented Dec 11, 2024

The encoder in Figure 3 is part of the ControlNet itself; it is the 'Fusion Encoder' illustrated in Figure 3 and described in the text. The arrow from the ControlNet to the SVD decoder depicted in Figure 2 corresponds to the part labeled in grey as 'To SVD Encoders' in Figure 3.

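Put differently, the fusion encoder inside the ControlNet branch produces multi-scale features that are injected into the frozen SVD UNet as additive residuals, in the usual ControlNet fashion. A very rough, generic sketch of that injection pattern is below; the tensor shapes and number of scales are placeholders, not the repository's actual configuration.

```python
# Generic ControlNet-style injection sketch: residual features produced by the
# adapter's fusion encoder are added to the matching blocks of the frozen SVD UNet.
# Shapes and scale counts are placeholders, not MOFA-Video's real configuration.
import torch

batch, base_ch, height, width = 1, 320, 32, 32
unet_features = [torch.randn(batch, base_ch * 2**i, height // 2**i, width // 2**i)
                 for i in range(3)]                                # stand-in SVD UNet activations
adapter_residuals = [torch.zeros_like(f) for f in unet_features]  # from the fusion encoder

# Each residual is summed with the corresponding UNet feature map, so the frozen
# SVD backbone is steered by the adapter without modifying its own weights.
conditioned = [f + r for f, r in zip(unet_features, adapter_residuals)]
print([tuple(t.shape) for t in conditioned])
```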

tyrink commented Dec 11, 2024

Sorry, I may have misunderstood the model structure before. That is, the 'warp' part of the MOFA-Adapter illustrated in Figure 2 mainly consists of two encoders (the reference encoder and the fusion encoder)?

MyNiuuu (Owner) commented Dec 11, 2024

Yes, the 'warp' part of the MOFA-Adapter illustrated in Figure 2 consists of two encoders: the reference encoder and the fusion encoder.
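
For illustration, the warping between those two encoders can be thought of as standard backward warping of the reference-encoder features with the dense flow predicted by S2D. A generic sketch using torch's `grid_sample` is below; this shows the common pattern, not necessarily the exact implementation in this repository.

```python
# Generic backward-warping sketch: warp reference features with a dense flow field.
# Shapes are placeholders; this illustrates the 'warp' step conceptually.
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) reference-encoder features; flow: (B, 2, H, W) flow in pixels."""
    _, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates (x, y).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(feat)  # (1, 2, H, W)
    coords = base + flow                                               # displaced coordinates
    # Normalize to [-1, 1]; grid_sample expects (B, H, W, 2) in (x, y) order.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                               # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

feat = torch.randn(1, 64, 32, 32)   # toy reference-encoder feature map
flow = torch.zeros(1, 2, 32, 32)    # zero flow: output should equal the input
print(torch.allclose(warp_features(feat, flow), feat, atol=1e-5))
```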

tyrink commented Dec 11, 2024

Thanks a lot!
