
Fine-tuning reproducibility with GPU #752

alanwilter opened this issue May 23, 2024 · 0 comments · May be fixed by #781

I'm using this repo in my app, which is written in Python and, for now, caters to our own private data.

The app facilitates creating fine-tuned models from vanilla SAM and MedSAM. We also run inference, of course.

We use just these methods:

from segment_anything import SamPredictor, sam_model_registry
from segment_anything.utils.transforms import ResizeLongestSide

Our plan is to release our package for the public via GitHub, open source.

However, we have been having trouble making the fine-tuned models reproducible, and the only culprit I can see here is ResizeLongestSide.

When running inference, the results are deterministic.

I've updated our code to use the latest PyTorch 2.3, and I have done the following hoping to make training reproducible:

import random

import numpy as np
import torch

seed = 42

np.random.seed(seed)
random.seed(seed)
torch.cuda.empty_cache()
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(False)  # True only for CPU

However, if I set torch.use_deterministic_algorithms(True) and run on GPU, it does not even work, even after trying the suggestions the warning message provides. Basically, some routines in PyTorch/CUDA apparently have no deterministic implementation yet.
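For reference, the fuller setup I would try looks like this. This is only a sketch under my assumptions: the function name is my own, warn_only requires PyTorch >= 1.11, and the CUBLAS_WORKSPACE_CONFIG value is the one PyTorch's docs suggest for deterministic cuBLAS:

```python
import os
import random

import numpy as np
import torch


def set_full_determinism(seed: int = 42) -> None:
    # Must be set before CUDA kernels run; needed for deterministic
    # cuBLAS GEMMs when deterministic algorithms are enabled.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU (and GPUs in recent PyTorch)
    torch.cuda.manual_seed_all(seed)  # explicit, in case of multiple GPUs

    # warn_only=True reports ops that lack a deterministic implementation
    # instead of raising, so training can still proceed on GPU.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False  # autotuning is non-deterministic
```

With warn_only=True you at least get a log of exactly which ops are the non-deterministic ones, instead of the run aborting.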

By reproducibility I mean: I run the fine-tuning training and get a model. If I repeat the same procedure, I get a different model, which gives different results at inference. Running inference with a given model is deterministic.
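To make "a different model" concrete, this is how I check whether two training runs produced identical weights (a small helper of my own, not part of this repo):

```python
import torch


def models_identical(m1: torch.nn.Module, m2: torch.nn.Module) -> bool:
    """Return True iff the two modules have exactly equal parameters/buffers."""
    sd1, sd2 = m1.state_dict(), m2.state_dict()
    if sd1.keys() != sd2.keys():
        return False
    # torch.equal is exact (no tolerance); any nondeterminism shows up here.
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)
```

Two GPU fine-tuning runs with identical seeds fail this check for us; two CPU runs pass it.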

Sometimes the resulting model gives really poor results; sometimes it is good or even great.

If I use torch.use_deterministic_algorithms(True) and run on CPU only, I get reproducible training results, but it is roughly 100x slower, hence impractical.

I'm wondering if anyone has faced this issue.

HendricksJudy added a commit to HendricksJudy/segment-anything that referenced this issue Oct 21, 2024
Fixes facebookresearch#752

Add deterministic resizing method to `ResizeLongestSide` class and update relevant scripts and notebooks.

* Add `apply_image_deterministic` method to `ResizeLongestSide` class in `segment_anything/utils/transforms.py` to ensure deterministic resizing using `torch.nn.functional.interpolate` with `mode='nearest'`.
* Update `notebooks/onnx_model_example.ipynb` to use `apply_image_deterministic` method for resizing images and add a note about the non-reproducibility issue with GPU and potential workaround using CPU.
* Update `scripts/amg.py` to use `apply_image_deterministic` method for resizing images.
* Add a note in `README.md` about the non-reproducibility issue with GPU and potential workaround using CPU.
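I haven't seen the diff for #781, but based on the description above, a deterministic resize via `torch.nn.functional.interpolate` with `mode='nearest'` might look roughly like this (a hypothetical sketch mirroring `ResizeLongestSide.apply_image`; the actual method in the PR may differ):

```python
import numpy as np
import torch
import torch.nn.functional as F


def apply_image_deterministic(image: np.ndarray, target_length: int) -> np.ndarray:
    """Resize an HWC uint8 image so its longest side equals target_length.

    Uses nearest-neighbor interpolation, which has a deterministic
    implementation on both CPU and GPU.
    """
    h, w = image.shape[:2]
    scale = target_length / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # HWC uint8 -> NCHW float for interpolate, then back.
    t = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()
    out = F.interpolate(t, size=(new_h, new_w), mode="nearest")
    return out.squeeze(0).permute(1, 2, 0).to(torch.uint8).numpy()
```

The trade-off is image quality: nearest-neighbor is coarser than the bilinear/area resampling SAM's preprocessing normally uses, so mask quality should be checked after switching.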
@HendricksJudy HendricksJudy linked a pull request Oct 21, 2024 that will close this issue