Error when trying to train on a custom dataset #6

eypros · 2024-11-24T22:04:02Z

I am trying to train a model on a custom dataset which contains only 1 class basically (and the background of course).

I have modified the pipeline to include the new dataset. For some reason I cannot train the model though.

First of all, how should the masks be created? I have tried these combinations:
a) grayscale image with 255 value for positive pixels and 0 for negative
b) grayscale image with 1 value for positive pixels and 0 for negative.

Neither seems to work as expected. When training does run, the IoU is 1 from the first epoch. When I debugged this, I found that no actual mask was passed (the mask appears to contain only negative, i.e. background, pixels), so both the intersection and the union are 0 and only eps remains (IoU = eps/eps = 1).
On the other hand, there are cases where training cannot complete: it throws RuntimeError: CUDA error: device-side assert triggered at loss = criterion(logits, masks).
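
The IoU of 1 described here follows directly from the arithmetic when both the prediction and the target are empty. A minimal sketch, assuming the metric has the form (intersection + eps) / (union + eps); the repository's actual metric may differ, and `binary_iou` is a hypothetical helper written for illustration:

```python
import numpy as np


def binary_iou(pred, target, eps=1e-6):
    """IoU for boolean masks, smoothed with eps (assumed form, not the repo's code)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)


empty = np.zeros((4, 4), dtype=bool)
full = np.ones((4, 4), dtype=bool)

# Both masks empty: intersection and union are 0, so the result is eps / eps.
print(binary_iou(empty, empty))  # 1.0

# Empty prediction against a full target: the score drops to nearly 0.
print(binary_iou(empty, full))
```

A perfect score from the first epoch is therefore a sign that no positive pixels reach the metric, not that the model is already correct.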

a) Should I define 2 classes for the model or 1?

b) Additionally, the above error does not seem to be thrown if I use
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) instead of
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) - 1 (but the IoU issue appears then). Why are you subtracting 1 from the masks?

c) How should I create the masks for the model?
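
On question b), one thing that is certain is that cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) returns a uint8 array, so subtracting 1 wraps a value of 0 around to 255. The sketch below only shows what that does to the two encodings tried above, using hand-built arrays as stand-ins for loaded masks. That the subtraction is meant to turn an "unlabeled" 0 into an ignore index of 255, as is common for ADE20k-style labels, is an assumption and is not confirmed in this thread.

```python
import numpy as np

# Stand-ins for what cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) returns: uint8 arrays.
mask_0_255 = np.array([[0, 255], [255, 0]], dtype=np.uint8)
mask_0_1 = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# uint8 arithmetic wraps around, so 0 - 1 becomes 255 instead of -1.
print(np.unique(mask_0_255 - 1).tolist())  # [254, 255]
print(np.unique(mask_0_1 - 1).tolist())    # [0, 255]
```

With the subtraction, a 0/255 mask produces the label 254, which is out of range for a model with 1 or 2 classes, while a 0/1 mask produces only class 0 plus 255. Without the subtraction, a 0/255 mask combined with ignore_index=255 leaves only label 0 in the loss.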


eypros · 2024-11-25T10:51:28Z

Just an update: I have also tried masks with 1 as the positive and 255 as the negative value, but it also crashes with an error:
RuntimeError: CUDA error: device-side assert triggered

For some reason, no matter which combination I try, I fail to train my simple model.

a) I have set the number of classes to 1 or 2.
b) I have changed the ignore_index to -100, 0 and 255, to no avail.
c) I have used mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) - 1 or mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) with no difference.

Obviously, I am missing something here.

RobvanGastel · 2024-11-28T12:12:33Z

I feel this is related to setting up the mask indices correctly for cross entropy. Do you have a similar setup that worked for other segmentation models you have tried? Nothing about the output classes is specific to finetuning the DINOv2 model.
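
For reference, cross entropy with class-index targets expects every target value to lie in [0, num_classes) or to equal ignore_index; on CUDA, an out-of-range value typically surfaces as the device-side assert quoted above. A minimal sketch of a check that could be run on the masks before training, where `check_mask_labels` is a hypothetical helper and not part of this repository:

```python
import numpy as np


def check_mask_labels(mask, num_classes, ignore_index=255):
    """Return the label values in `mask` that cross entropy would reject."""
    # Cast to int64 so a negative ignore_index such as -100 compares safely.
    values = np.unique(np.asarray(mask)).astype(np.int64)
    valid = ((values >= 0) & (values < num_classes)) | (values == ignore_index)
    return values[~valid].tolist()


# Binary mask stored as 0/255 with 255 as the ignore index:
# every value is accepted, but only label 0 is left to learn from.
print(check_mask_labels(np.array([0, 255], dtype=np.uint8), num_classes=2))  # []

# The same mask after `- 1` on uint8 holds 254 and 255; 254 is out of range.
print(check_mask_labels(np.array([254, 255], dtype=np.uint8), num_classes=2))  # [254]

# Background 0, foreground 1, two classes: all labels are valid.
print(check_mask_labels(np.array([0, 1], dtype=np.uint8), num_classes=2))  # []
```

One encoding that passes this check for a single foreground class is background 0 and foreground 1 with two output classes and no subtraction, though whether that fits the rest of the pipeline has not been verified here.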

eypros · 2024-12-02T15:06:48Z

I am not sure what you mean by setup, to be honest. I used the setup you provided for ADE20k, so this should be fine, I guess.

I also suspect the masks might be the problem but I don't know how to solve the issue.

RobvanGastel · 2024-12-08T14:01:52Z

It is hard to diagnose a problem like this. Could you provide a small code snippet to reproduce your problem, maybe with dummy input?
