Hello Author,
I have a question regarding the understanding of the code. In the eval_forward function, I noticed that the code concatenates answer_embeds with input_embeds and then feeds the combined features into the LLM. My question is: answer_embeds appears to be the embedding representation of the correct answer, so why is the feature of the correct answer also being input into the model during the inference phase? From my understanding, the model should only receive video features and question features during inference, and should not have access to the answer information. Doesn't directly inputting the answer features into the model lead to answer leakage, thereby affecting the model's inference process?
Here is the relevant code snippet:
/NVlabs/VILA/tree/main/llava/eval/vision_niah_vila/eval_vision_niah.py
```python
def eval_forward(accelerator, model, input_embeds, answer_embeds, pad_id, answer_ids, tokenizer):
    # first append answer_embeds to input_embeds
    prompt_length = input_embeds.shape[1]
    labels_length = answer_embeds.shape[1]
    input_embeds = torch.cat([input_embeds, answer_embeds], dim=1)
```
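For context, this concatenation pattern is the standard teacher-forced likelihood evaluation for causal LMs: the answer tokens are appended so a single forward pass produces logits at every answer position, and the causal attention mask guarantees each position only attends to earlier positions. Below is a minimal NumPy sketch (my own toy illustration, not the repo's code) showing that under a causal mask, the hidden states at the prompt positions, which produce the logits scoring the first answer token, are unchanged no matter which answer is appended:

```python
import numpy as np

def causal_attention(x):
    """Single-head self-attention with a causal mask over rows of x (seq_len, dim)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # hide future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
prompt = rng.normal(size=(5, 8))    # stand-in for input_embeds (video + question)
answer_a = rng.normal(size=(3, 8))  # stand-in for answer_embeds
answer_b = rng.normal(size=(3, 8))  # a different, "wrong" answer

out_a = causal_attention(np.vstack([prompt, answer_a]))
out_b = causal_attention(np.vstack([prompt, answer_b]))

# The prompt positions never attend to the appended answer tokens,
# so their outputs (and hence the logits that predict the first
# answer token) are identical for both answers:
assert np.allclose(out_a[:5], out_b[:5])
```

The prediction of answer token *j* is read from the hidden state at the position just before it, which only sees the prompt plus the ground-truth answer tokens 0..*j*-1; this is the usual way to score a known answer in one pass rather than autoregressively generating it.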