Error when trying to train on a custom dataset #6

Open
eypros opened this issue Nov 24, 2024 · 4 comments

Comments

@eypros

eypros commented Nov 24, 2024

I am trying to train a model on a custom dataset that contains only one class (plus the background, of course).

I have modified the pipeline to include the new dataset. For some reason I cannot train the model though.

First of all, how should the masks be created? I have tried the following combinations:
a) a grayscale image with value 255 for positive pixels and 0 for negative pixels
b) a grayscale image with value 1 for positive pixels and 0 for negative pixels.

Neither of them seems to work as expected. When the training does run, the IoU is 1 from the first epoch. When I debugged this, I found that no actual mask was passed (the mask seems to contain only negative pixels), so both the intersection and the union are 0 and only eps remains (IoU = eps/eps = 1).
In the other case the training cannot complete: it throws RuntimeError: CUDA error: device-side assert triggered at loss = criterion(logits, masks).
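The IoU = 1 symptom can be reproduced in isolation. The sketch below assumes a smoothed IoU of the form (intersection + eps) / (union + eps); the function name `smoothed_iou` and the eps value are illustrative and are not taken from the repository.

```python
def smoothed_iou(pred, target, eps=1e-6):
    """IoU over flat sequences of 0/1 labels, smoothed with eps (assumed form)."""
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    return (intersection + eps) / (union + eps)


# Empty prediction and empty target: intersection = union = 0, so IoU = eps / eps.
empty = [0, 0, 0, 0]
print(smoothed_iou(empty, empty))  # 1.0

# With actual foreground pixels the metric is no longer trivially 1.
print(smoothed_iou([1, 1, 0, 0], [1, 0, 0, 0]))  # close to 0.5
```

A perfect score from the first epoch is therefore a hint that the target contains no foreground pixels at all by the time it reaches the metric.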

a) Should I define 2 classes for the model or 1?

b) Additionally, the above error does not seem to be thrown if I use
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) instead of
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) - 1 (but the IoU issue appears then). Why are you subtracting 1 from the masks?

c) How should I create the masks for the model?
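
One possible reading of the `- 1` (an assumption, not confirmed by the maintainer in this thread): ADE20K annotation masks use 0 for unlabeled pixels and 1 to 150 for classes, so subtracting 1 shifts the classes to 0..149 and, because the array is uint8, wraps 0 around to 255, a value often used as ignore_index. The sketch below shows what the same subtraction does to the two binary encodings tried above. The NumPy arrays stand in for the output of `cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)`, and `to_class_indices` is a hypothetical helper.

```python
import numpy as np

# Stand-ins for cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE), which returns uint8.
mask_0_255 = np.array([[0, 255], [255, 0]], dtype=np.uint8)
mask_0_1 = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# uint8 arithmetic wraps around: 0 - 1 becomes 255.
print((mask_0_1 - 1).tolist())    # [[255, 0], [0, 255]]
print((mask_0_255 - 1).tolist())  # [[255, 254], [254, 255]]


def to_class_indices(mask):
    """Map a grayscale mask to 0 = background, 1 = foreground (hypothetical helper)."""
    return (mask > 0).astype(np.int64)


print(to_class_indices(mask_0_255).tolist())  # [[0, 1], [1, 0]]
print(to_class_indices(mask_0_1).tolist())    # [[0, 1], [1, 0]]
```

Under this reading, a target with values in {0, 1}, loaded without the `- 1` and paired with two output classes, would be consistent with a cross-entropy loss. Whether that matches the rest of the repository's data pipeline is not established here.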

@eypros
Author

eypros commented Nov 25, 2024

Just an update: I have also tried masks with 1 as the positive and 255 as the negative value, but it also crashes with an error:
RuntimeError: CUDA error: device-side assert triggered

For some reason no matter what combination I have tried I fail to train my simple model.

a) I have set the number of classes to 1 or 2.
b) I have changed the ignore_index to -100, 0 and 255, to no avail.
c) I have used mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) - 1 or mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) with no difference.

Obviously, I am missing something here.
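
A device-side assert inside a cross-entropy loss is commonly caused by target values that lie outside [0, num_classes - 1] and are not equal to ignore_index. The check below is a sketch of how the label values could be validated before calling the loss. `find_invalid_labels` is a hypothetical helper; in practice the values might come from something like `torch.unique(masks).tolist()`.

```python
def find_invalid_labels(values, num_classes, ignore_index=-100):
    """Return label values that cross-entropy would reject (hypothetical helper)."""
    return sorted(
        {int(v) for v in values if v != ignore_index and not 0 <= v < num_classes}
    )


# 0/255 mask loaded as-is, 2 classes, default ignore_index: 255 is out of range.
print(find_invalid_labels([0, 255], num_classes=2))  # [255]

# Same mask with ignore_index=255: nothing is rejected, but every foreground
# pixel is ignored, so the loss only ever sees background.
print(find_invalid_labels([0, 255], num_classes=2, ignore_index=255))  # []

# 0/255 mask after the uint8 "- 1" (values 255 and 254): 254 is out of range.
print(find_invalid_labels([255, 254], num_classes=2, ignore_index=255))  # [254]

# 0/1 mask, 2 classes: all values are valid.
print(find_invalid_labels([0, 1], num_classes=2))  # []
```

Running the same batch on the CPU instead of the GPU usually turns the opaque device-side assert into a readable error that names the offending target value.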

@RobvanGastel
Owner

I feel this is related to setting up the mask indices correctly for cross entropy. Do you have a similar setup that worked for other segmentation models you have tried? Nothing about the output classes is specific to fine-tuning the DINOv2 model.

@eypros
Author

eypros commented Dec 2, 2024

I am not sure what you mean by setup, to be honest. I used the setup you provided for ADE20k, so this should be fine, I guess.

I also suspect the masks might be the problem but I don't know how to solve the issue.

@RobvanGastel
Owner

It is hard to diagnose a problem like this. Could you provide a small code snippet to reproduce your problem, maybe with dummy input?
