"LT consistency is evaluated by checking the presence of the target character in
selected frames across the test set. Finally, we calculate the ratio of frames containing the target
character to the total number of selected frames as a measure of the accuracy in maintaining long-term
character consistency."
Can you please describe how you choose the "selected frames" and what you use to detect the target character in them? If possible, please describe this metric in more detail.
Thank you.
To select the frames where a character appears, we first pick an image of the main character and use FARL to obtain the character's embedding. We then compute the similarity between this embedding and the FARL embeddings of all other frames, and keep the frames whose similarity is above a certain threshold. Although this is not perfectly precise, it still provides a reasonable way to evaluate consistency.
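A minimal sketch of the frame-selection step, assuming the FARL embeddings have already been computed and are passed in as plain float vectors. The thread does not name the similarity measure, so cosine similarity is an assumption here; the threshold value and the function names `cosine_similarity` and `select_frames` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_frames(
    reference: Sequence[float],
    frame_embeddings: Sequence[Sequence[float]],
    threshold: float,
) -> list[int]:
    """Return 0-based indices of frames whose embedding is similar enough
    to the reference character embedding (hypothetical helper)."""
    return [
        i
        for i, emb in enumerate(frame_embeddings)
        if cosine_similarity(reference, emb) > threshold
    ]


if __name__ == "__main__":
    ref = [1.0, 0.0]  # stand-in for the main character's FARL embedding
    frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # stand-ins for frame embeddings
    print(select_frames(ref, frames, threshold=0.8))
```

In practice the threshold trades recall for precision: a lower value keeps more frames but admits more false matches, which is why the selection is only approximate.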
For short-term consistency, we select all the frames where a character appears. We then compute the similarity between the CLIP embeddings of consecutive selected keyframes, since consecutive keyframes showing the same character usually have high similarity.
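One possible reading of the short-term metric, sketched under assumptions: CLIP embeddings of the selected keyframes are precomputed and given in temporal order, "consecutive" means adjacent in that selected list, similarity is cosine similarity, and the per-pair scores are averaged. The reply does not state how the pairwise scores are aggregated, so the mean is an assumption, and `short_term_consistency` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def short_term_consistency(selected_embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over adjacent pairs of selected keyframes
    (hypothetical aggregation; embeddings assumed to be in temporal order)."""
    if len(selected_embeddings) < 2:
        raise ValueError("need at least two selected keyframes")
    pairs = zip(selected_embeddings, selected_embeddings[1:])
    sims = [cosine_similarity(a, b) for a, b in pairs]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Stand-ins for CLIP embeddings of three consecutive selected keyframes.
    print(short_term_consistency([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```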
For long-term consistency, we count the ground-truth (GT) frames where the character appears and check in how many of those same frames the character also appears in the generated output; the metric is this count divided by the number of GT frames where the character appears. For example, if the character appears in GT frames 1, 2, 3, 4, 5, and a method generates the character in frames 1, 2, 3, 6, then only frames 1, 2, 3 match, so the long-term consistency is 3/5.
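The long-term computation in the example above amounts to a set intersection over frame indices. A minimal sketch, assuming the per-frame presence of the character has already been determined for both the GT and the generated sequence (`long_term_consistency` is a hypothetical name):

```python
from typing import Iterable


def long_term_consistency(gt_frames: Iterable[int], generated_frames: Iterable[int]) -> float:
    """Fraction of GT frames containing the character in which the
    generated output also contains the character."""
    gt = set(gt_frames)
    if not gt:
        raise ValueError("character never appears in the ground truth")
    matched = gt & set(generated_frames)  # frames present in both
    return len(matched) / len(gt)


if __name__ == "__main__":
    # Reproduces the worked example: GT {1..5}, generated {1, 2, 3, 6} -> 3/5.
    print(long_term_consistency([1, 2, 3, 4, 5], [1, 2, 3, 6]))
```

Note that frame 6 does not raise the score: generating the character in frames where the GT does not contain it is simply ignored by this ratio.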
We will improve our writing and present a better paper in the revised version :)