Hello, and thank you for open-sourcing such amazing work!
I wanted to check what output I should expect and whether I'm potentially doing something wrong. The quality of videos generated with the 5B autoregressive video2world model is always a little worse than its input. I wanted to see if I could get a sort of "infinite generation" by feeding the output of one generation back in as the input to the next iteration in a loop.
As a result, after 10-20 generations I start getting complete gibberish. I thought that running the diffusion decoder should prevent this effect. Am I misusing this model? Here is how I'm generating videos (in outline; the script and flag names below are placeholders for the repo's video2world inference entry point, not the exact CLI):
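```python
import subprocess
from pathlib import Path

# Placeholder loop: script name and flags are illustrative, not the actual
# Cosmos CLI. The point is that each iteration feeds the previous output
# video back in, so every step goes through the full pipeline again.
current_input = Path("seed.mp4")
for i in range(20):
    out = Path(f"gen_{i:03d}.mp4")
    subprocess.run(
        [
            "python", "video2world_inference.py",  # hypothetical entry point
            "--input_video", str(current_input),   # previous output as input
            "--output_video", str(out),
        ],
        check=True,
    )
    current_input = out  # each round trip re-encodes and re-decodes the video
```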
My own update: to a large degree (but not 100%), this is caused by the fact that sending the output video as input to the next iteration goes through the encoding/decoding process every time.
We could bypass this by passing only tokens (not videos) between iterations and adding a new parameter to control the length of generation. It looks like that was the idea behind the "num_chunks_to_generate" parameter, but in the current implementation it's rather useless.
I do have an implementation that just passes tokens; is this something that would be useful for others? Roughly, the idea looks like this (the function names here are placeholders, not the actual Cosmos API):
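```python
import torch

# Sketch of the token-passing idea; encode(), decode() and
# generate_next_chunk() are hypothetical stand-ins for the real APIs.
# Discrete video tokens stay in memory between iterations, so the lossy
# tokenizer round trip happens only once instead of once per iteration.
def generate_long_video(model, tokenizer, seed_video, num_chunks):
    tokens = tokenizer.encode(seed_video)           # encode once, up front
    chunks = [tokens]
    for _ in range(num_chunks):
        # Condition the AR model directly on the previous chunk's tokens,
        # never converting back to pixels in between.
        tokens = model.generate_next_chunk(tokens)
        chunks.append(tokens)
    all_tokens = torch.cat(chunks, dim=1)           # concatenate along time
    return tokenizer.decode(all_tokens)             # decode once, at the end
```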
@zlenyk I would be interested to see your implementation, sounds useful!