
Regressing video quality on generated videos #54

Open · zlenyk opened this issue Jan 15, 2025 · 2 comments

zlenyk commented Jan 15, 2025

Hello, and thank you for open-sourcing such amazing work!
I wanted to check what output I should expect and whether I'm potentially doing something wrong. The quality of videos generated with the 5B autoregressive video2world model is always a little worse than that of its input. I was hoping to get a sort of "infinite generation" by using the output of one generation as the input to the next iteration in a loop.
As a result, after 10-20 generations I start getting complete gibberish. I thought running the diffusion decoder should prevent this effect. Am I misusing the model? Here is how I'm generating videos:

python cosmos1/models/autoregressive/inference/video2world.py \
    --ar_model_dir=Cosmos-1.0-Autoregressive-5B-Video2World \
    --top_p=0.7 \
    --temperature=1.0 \
    --offload_guardrail_models \
    --offload_diffusion_decoder \
    --offload_ar_model \
    --offload_tokenizer \
    --offload_text_encoder_model \
    ...

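For intuition, here is a toy numpy sketch of the failure mode, not the Cosmos pipeline: the blur below is just a hypothetical stand-in for tokenizer/decoder reconstruction loss, and it shows how looping the output back through a lossy round trip compounds error across iterations.

import numpy as np

# Toy stand-in for a lossy encode/decode round trip: a small spatial blur.
# Each pass loses a little detail, and the losses compound when the output
# of one iteration becomes the input of the next.
def lossy_roundtrip(frames):
    blurred = frames.copy()
    # average each pixel with its horizontal neighbours (cheap low-pass filter)
    blurred[:, :, 1:-1] = (frames[:, :, :-2] + frames[:, :, 1:-1] + frames[:, :, 2:]) / 3
    return blurred

rng = np.random.default_rng(0)
frames = rng.random((16, 32, 32))     # pretend "video": 16 frames of 32x32
original = frames.copy()

for i in range(20):
    frames = lossy_roundtrip(frames)  # one iteration's encode/decode
    drift = np.abs(frames - original).mean()
    print(f"iteration {i+1:2d}: mean drift from the original = {drift:.4f}")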

zlenyk commented Jan 17, 2025

My own update: to a large degree (but not 100%), the cause is that sending output videos as input to the next iteration goes through the encoding/decoding process every time.
We could bypass this by passing only tokens (not videos) between iterations and adding a new parameter to control the length of generation. It looks like that was the idea behind the "num_chunks_to_generate" parameter, but in the current implementation it's rather useless.
I do have an implementation that just passes tokens; is this something that would be useful for others?
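For illustration only, a minimal Python sketch of the token-passing idea, where encode_to_tokens, ar_extend, and decode_to_video are hypothetical stubs rather than the actual Cosmos APIs: the sequence is extended chunk by chunk in token space and decoded to pixels only once at the end, so the lossy round trip is not repeated every iteration.

# Hypothetical stubs standing in for the real tokenizer and AR model.
def encode_to_tokens(video):          # tokenizer encode (done once)
    return list(video)

def ar_extend(tokens, num_new=4):     # autoregressive extension in token space
    return tokens + [tokens[-1] + 1 + i for i in range(num_new)]

def decode_to_video(tokens):          # tokenizer/diffusion decode (done once)
    return tokens

tokens = encode_to_tokens(range(8))   # encode the conditioning video once
for _ in range(10):                   # analogue of num_chunks_to_generate
    tokens = ar_extend(tokens)        # keep extending tokens, never re-encode pixels
video = decode_to_video(tokens)       # single decode at the very end
print(len(video), "tokens decoded once at the end")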

@monko9j1

@zlenyk I would be interested to see your implementation, sounds useful!
