
About camera_param embedding and "loss is NaN" #127

Open
Bowa529 opened this issue Dec 14, 2024 · 11 comments

Bowa529 commented Dec 14, 2024

Hello, I am replacing the nuScenes dataset with my own dataset for experiments. When I feed camera params into the model as conditions, after training for a few thousand steps I always hit "loss is NaN", and I noticed that model_pred contains many Inf values after the UNet. Could the embedding of the camera params be the cause? My camera parameters differ from those of nuScenes in focal length, principal point, rotation, and translation.

Bowa529 commented Dec 14, 2024

I also printed the inputs, and none of them contain Inf. Feeding only the text prompt and BEV map into the model works fine, but as soon as I also feed my camera params, the problem appears.

flymin (Member) commented Dec 14, 2024

The generalization ability of the camera-param embedding is limited; this is a known issue, as discussed in our work MagicDrive3D. That said, we have tried parameters different from nuScenes and did not observe NaN or Inf (although the results were not satisfactory).

NaN during training can have many causes. You may refer to some previous issues for solutions.

Besides, if you think the camera pose embedding is the key reason: in our latest work, we implemented a "base_token" + "zero_proj" module to mitigate such issues for any token in the sequence embeddings (we did not include this part in the paper, as it may not be useful when training from scratch). Please check
https://github.com/flymin/MagicDriveDiT/blob/d537ecfbf7d83af4518b6509c8b99c4c467c8264/magicdrivedit/models/magicdrive/magicdrive_stdit3.py#L999
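For intuition, the idea behind "base_token" + "zero_proj" can be sketched as below. This is an illustrative reconstruction, not the MagicDriveDiT implementation: the class name, wiring, and the use of numpy instead of torch are all my assumptions; refer to the linked code for the real module.

```python
import numpy as np

class BaseTokenZeroProj:
    # Illustrative sketch (hypothetical, not the repository's code): a learnable
    # base token plus a zero-initialized projection of the conditioning token.
    # At initialization the output equals base_token regardless of the camera
    # input, so out-of-distribution camera params cannot inject large values
    # early in training; the projection only contributes once its weights are
    # updated by gradient descent.
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.base_token = rng.standard_normal(dim).astype(np.float32)
        self.zero_proj = np.zeros((dim, dim), dtype=np.float32)  # zero init

    def __call__(self, cam_token):
        return self.base_token + cam_token @ self.zero_proj
```

Even a very large cam_token leaves the output finite at initialization, which is why zero init tends to stabilize early training with unfamiliar camera parameters.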


Bowa529 commented Dec 17, 2024

When handling two fields of view (FOV), I tried training with the camera parameters for each view separately. One view trains normally, but the other still encounters the "loss is NaN" issue. If the camera intrinsics (e.g., focal length) differ significantly, do you think the camera-parameter encoding process needs to be modified? Or do you have any other suggestions? Thanks for your help.

flymin commented Dec 17, 2024

Please consider this: zero init should help stabilize the training process.



Bowa529 commented Dec 17, 2024

Sure, thank you. I will try the methods you suggested.

Bowa529 closed this as completed Dec 17, 2024
Bowa529 reopened this Dec 17, 2024

Bowa529 commented Dec 17, 2024

Sorry, I still have a question: why do you concatenate the original input with the sine/cosine encodings to form camera_emb in your camera-parameter encoding process? Thanks for your help.


flymin commented Dec 17, 2024

I think you are talking about the Fourier Embedding, which we borrowed from NeRF. You can find the citation (Mildenhall et al., 2020) in our paper.
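For readers unfamiliar with that citation: a NeRF-style Fourier embedding typically looks like the sketch below (illustrative, not MagicDrive's exact code; the function name and frequency schedule are my assumptions). Concatenating the raw input alongside the sin/cos features preserves the exact low-frequency value, so the network is not limited to periodic features of the camera params.

```python
import numpy as np

def fourier_embed(x, num_freqs=4, include_input=True):
    # NeRF-style positional encoding: sin/cos of the input at geometrically
    # increasing frequencies 2^0 ... 2^(num_freqs - 1).
    freqs = 2.0 ** np.arange(num_freqs)       # (L,)
    angles = x[..., None] * freqs             # (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    enc = enc.reshape(*x.shape[:-1], -1)      # (..., D * 2L)
    if include_input:
        # keep the raw value so the exact input survives the encoding,
        # not just its periodic features
        enc = np.concatenate([x, enc], axis=-1)
    return enc
```

With D input dimensions and L frequencies, the output has D + 2·D·L dimensions; the first D entries are the untouched input.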

github-actions bot commented Dec 24, 2024

This issue is stale because it has been open for 7 days with no activity. If you do not have any follow-ups, the issue will be closed soon.
3buffers commented

> I'm sorry that I still have a question: Why do you concatenate the original input with the results after sine and cosine encoding to form camera_emb in your camera parameter encoding process? Thanks for your help.

Did you solve that? Thanks.

3buffers commented

Actually, I think it is a data-type issue. If you replace nuScenes with your own dataset, check the dtype of your bboxes: if you keep float16, the multiplications inside fourier_embedder (specifically in embed_fns) can overflow, turning your box embeddings into Inf or NaN. Casting to a larger dtype such as float32 or float64 should fix it.
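A minimal numpy demonstration of that failure mode (the coordinate value and the 2^7 frequency here are example values I chose, not the actual fourier_embedder schedule):

```python
import numpy as np

# Half precision tops out at 65504; a large bbox coordinate times a high
# Fourier frequency overflows to Inf, and any sin/cos of Inf is then NaN.
coord16 = np.float16(512.0)
freqs16 = np.float16(2.0) ** np.arange(8, dtype=np.float16)  # 1, 2, ..., 128

angles16 = coord16 * freqs16
print(np.isinf(angles16[-1]))        # 512 * 128 = 65536 > 65504 -> True

# The same computation in float32 stays finite.
angles32 = np.float32(512.0) * freqs16.astype(np.float32)
print(np.isfinite(angles32).all())   # True
```

Once a single Inf enters the box or camera embedding, it propagates through the UNet and eventually surfaces as "loss is NaN", which matches the symptoms reported above.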
