
Fine-tuning reproducibility with GPU #752

alanwilter opened this issue May 23, 2024 · 0 comments · May be fixed by #781

I'm using this repo in my app, which is written in Python and, for now, caters to our own private data.

The app facilitates creating fine-tuned models from vanilla SAM and MedSAM. We also run inference, of course.

We use just these methods:

from segment_anything import SamPredictor, sam_model_registry
from segment_anything.utils.transforms import ResizeLongestSide

Our plan is to release our package for the public via GitHub, open source.

However, we have been having trouble making the fine-tuned models reproducible, and the only culprit I can see here is ResizeLongestSide.

When running inference, the results are deterministic.

I've updated our code to use the latest PyTorch 2.3, and I have done the following hoping to make training reproducible:

import random

import numpy as np
import torch

seed = 42

np.random.seed(seed)
random.seed(seed)
torch.cuda.empty_cache()
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(False)  # True only for CPU

However, if I set torch.use_deterministic_algorithms(True) and run on GPU, it does not even work, even after trying the suggestions the warning message provides. Basically, some routines in PyTorch/CUDA apparently have no deterministic implementation yet.
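For reference, the fuller setup I would try looks like this. This is only a sketch under my assumptions: the function name is my own, warn_only requires PyTorch >= 1.11, and the CUBLAS_WORKSPACE_CONFIG value is the one PyTorch's docs suggest for deterministic cuBLAS:

```python
import os
import random

import numpy as np
import torch


def set_full_determinism(seed: int = 42) -> None:
    # Must be set before CUDA kernels run; needed for deterministic
    # cuBLAS GEMMs when deterministic algorithms are enabled.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU (and GPUs in recent PyTorch)
    torch.cuda.manual_seed_all(seed)  # explicit, in case of multiple GPUs

    # warn_only=True reports ops that lack a deterministic implementation
    # instead of raising, so training can still proceed on GPU.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False  # autotuning is non-deterministic
```

With warn_only=True you at least get a log of exactly which ops are the non-deterministic ones, instead of the run aborting.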

By reproducibility I mean: I run the fine-tuning training and get a model. If I repeat the same procedure, I get a different model, which gives different results at inference. Running inference with a given model is deterministic.
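To make "a different model" concrete, this is how I check whether two training runs produced identical weights (a small helper of my own, not part of this repo):

```python
import torch


def models_identical(m1: torch.nn.Module, m2: torch.nn.Module) -> bool:
    """Return True iff the two modules have exactly equal parameters/buffers."""
    sd1, sd2 = m1.state_dict(), m2.state_dict()
    if sd1.keys() != sd2.keys():
        return False
    # torch.equal is exact (no tolerance); any nondeterminism shows up here.
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)
```

Two GPU fine-tuning runs with identical seeds fail this check for us; two CPU runs pass it.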

Sometimes the resulting model gives really poor results; sometimes it is good or even great.

If I use torch.use_deterministic_algorithms(True) and run on CPU only, I get reproducible training results, but it is roughly 100x slower, hence impractical.

I'm wondering if anyone has faced this issue.

HendricksJudy added a commit to HendricksJudy/segment-anything that referenced this issue Oct 21, 2024
Fixes facebookresearch#752

Add deterministic resizing method to `ResizeLongestSide` class and update relevant scripts and notebooks.

* Add `apply_image_deterministic` method to `ResizeLongestSide` class in `segment_anything/utils/transforms.py` to ensure deterministic resizing using `torch.nn.functional.interpolate` with `mode='nearest'`.
* Update `notebooks/onnx_model_example.ipynb` to use `apply_image_deterministic` method for resizing images and add a note about the non-reproducibility issue with GPU and potential workaround using CPU.
* Update `scripts/amg.py` to use `apply_image_deterministic` method for resizing images.
* Add a note in `README.md` about the non-reproducibility issue with GPU and potential workaround using CPU.
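I haven't seen the diff for #781, but based on the description above, a deterministic resize via `torch.nn.functional.interpolate` with `mode='nearest'` might look roughly like this (a hypothetical sketch mirroring `ResizeLongestSide.apply_image`; the actual method in the PR may differ):

```python
import numpy as np
import torch
import torch.nn.functional as F


def apply_image_deterministic(image: np.ndarray, target_length: int) -> np.ndarray:
    """Resize an HWC uint8 image so its longest side equals target_length.

    Uses nearest-neighbor interpolation, which has a deterministic
    implementation on both CPU and GPU.
    """
    h, w = image.shape[:2]
    scale = target_length / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # HWC uint8 -> NCHW float for interpolate, then back.
    t = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()
    out = F.interpolate(t, size=(new_h, new_w), mode="nearest")
    return out.squeeze(0).permute(1, 2, 0).to(torch.uint8).numpy()
```

The trade-off is image quality: nearest-neighbor is coarser than the bilinear/area resampling SAM's preprocessing normally uses, so mask quality should be checked after switching.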
@HendricksJudy HendricksJudy linked a pull request Oct 21, 2024 that will close this issue