Hello Author,
I have a question regarding the understanding of the code. In the eval_forward function, I noticed that the code concatenates answer_embeds with input_embeds and then feeds the combined features into the LLM. My question is: answer_embeds appears to be the embedding representation of the correct answer, so why is the feature of the correct answer also being input into the model during the inference phase? From my understanding, the model should only receive video features and question features during inference, and should not have access to the answer information. Doesn't directly inputting the answer features into the model lead to answer leakage, thereby affecting the model's inference process?
Here is the relevant code snippet:
/NVlabs/VILA/tree/main/llava/eval/vision_niah_vila/eval_vision_niah.py
```python
def eval_forward(accelerator, model, input_embeds, answer_embeds, pad_id, answer_ids, tokenizer):
    # first append answer_embeds to input_embeds
    prompt_length = input_embeds.shape[1]
    labels_length = answer_embeds.shape[1]
    input_embeds = torch.cat([input_embeds, answer_embeds], dim=1)
```
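For context, this concatenation pattern is the standard teacher-forced likelihood evaluation for causal LMs: the answer tokens are appended so a single forward pass produces logits at every answer position, and the causal attention mask guarantees each position only attends to earlier positions. Below is a minimal NumPy sketch (my own toy illustration, not the repo's code) showing that under a causal mask, the hidden states at the prompt positions, which produce the logits scoring the first answer token, are unchanged no matter which answer is appended:

```python
import numpy as np

def causal_attention(x):
    """Single-head self-attention with a causal mask over rows of x (seq_len, dim)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # hide future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
prompt = rng.normal(size=(5, 8))    # stand-in for input_embeds (video + question)
answer_a = rng.normal(size=(3, 8))  # stand-in for answer_embeds
answer_b = rng.normal(size=(3, 8))  # a different, "wrong" answer

out_a = causal_attention(np.vstack([prompt, answer_a]))
out_b = causal_attention(np.vstack([prompt, answer_b]))

# The prompt positions never attend to the appended answer tokens,
# so their outputs (and hence the logits that predict the first
# answer token) are identical for both answers:
assert np.allclose(out_a[:5], out_b[:5])
```

The prediction of answer token *j* is read from the hidden state at the position just before it, which only sees the prompt plus the ground-truth answer tokens 0..*j*-1; this is the usual way to score a known answer in one pass rather than autoregressively generating it.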