
Confusion about answer_embeds Usage in eval_forward for Inference #185

Open
hshc123 opened this issue Jan 12, 2025 · 0 comments
hshc123 commented Jan 12, 2025

Hello Author,

I have a question about understanding this part of the code. In the `eval_forward` function, the code concatenates `answer_embeds` with `input_embeds` and feeds the combined sequence into the LLM. Since `answer_embeds` appears to be the embedding of the correct answer, why is the correct answer also passed to the model during the inference phase? My understanding is that at inference time the model should only receive the video features and the question features, and should not have access to the answer. Doesn't feeding the answer embeddings directly into the model leak the answer and thereby affect the model's inference?

Here is the relevant code snippet:

/NVlabs/VILA/tree/main/llava/eval/vision_niah_vila/eval_vision_niah.py

```python
def eval_forward(accelerator, model, input_embeds, answer_embeds, pad_id, answer_ids, tokenizer):
    # first append answer_embeds to input_embeds
    prompt_length = input_embeds.shape[1]
    labels_length = answer_embeds.shape[1]
    input_embeds = torch.cat([input_embeds, answer_embeds], dim=1)
```
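For reference, here is a minimal sketch of how this kind of teacher-forced evaluation is commonly scored. It is not the exact logic of `eval_vision_niah.py` (the function name, the `inputs_embeds` forward signature, and the scoring rule are assumptions on my part); it only illustrates why appending the answer embeddings does not have to leak the answer: with causal attention, the logits at each answer position are conditioned only on the prompt and the *preceding* answer tokens, and those logits are then compared against `answer_ids`.

```python
import torch

def teacher_forced_eval_sketch(model, input_embeds, answer_embeds, answer_ids):
    """Hypothetical illustration of teacher-forced scoring, not the repo's code.

    The answer embeddings are appended after the prompt, a single forward pass
    is run, and the model's predictions at the answer positions are checked
    against the ground-truth answer token ids.
    """
    prompt_length = input_embeds.shape[1]
    combined = torch.cat([input_embeds, answer_embeds], dim=1)

    with torch.no_grad():
        # Assumes a HuggingFace-style forward that accepts `inputs_embeds`.
        # Causal masking means position i only attends to positions <= i,
        # so the prediction for each answer token never sees future tokens.
        logits = model(inputs_embeds=combined).logits

    # Logits at positions [prompt_length - 1, seq_len - 1) predict the answer tokens.
    pred_ids = logits[:, prompt_length - 1 : -1].argmax(dim=-1)
    correct = (pred_ids == answer_ids).all(dim=-1)
    return correct
```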
