RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/tmp/pytorch/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch. #32

Open
AnilSarode opened this issue Apr 3, 2024 · 1 comment

Comments

@AnilSarode

t-tech@ubuntu:~/nanoowl/examples$ python3 tree_predict.py \
    --prompt="[an owl [a wing, an eye]]" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
/home/t-tech/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/t-tech/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/t-tech/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /tmp/pytorch/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/home/t-tech/nanoowl/examples/tree_predict.py", line 51, in <module>
    output = predictor.predict(
  File "/home/t-tech/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/t-tech/nanoowl/nanoowl/tree_predictor.py", line 121, in predict
    owl_image_encodings[label_index] = self.owl_predictor.encode_rois(image_tensor, boxes[label_index])
  File "/home/t-tech/nanoowl/nanoowl/owl_predictor.py", line 267, in encode_rois
    roi_images, rois = self.extract_rois(image, rois, pad_square, padding_scale)
  File "/home/t-tech/nanoowl/nanoowl/owl_predictor.py", line 257, in extract_rois
    roi_images = roi_align(image, [rois], output_size=self.get_image_size())
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 236, in roi_align
    return _roi_align(input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned)
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 168, in _roi_align
    val = _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask)  # [K, C, PH, PW, IY, IX]
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 62, in _bilinear_interpolate
    v1 = masked_index(y_low, x_low)
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 55, in masked_index
    return input[
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/tmp/pytorch/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch. 
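The last frames of the traceback run through torchvision's pure-Python `_roi_align`. Its source comment labels an intermediate tensor of shape `[K, C, PH, PW, IY, IX]`. The sketch below gives a rough size for that tensor. It assumes the following:

- The sampling grid per output bin is `sampling_ratio` when positive, otherwise `ceil(roi_size / output_size)`, taken for the largest ROI.
- The input is float32.
- The output is 768x768. This is a guess at what `get_image_size()` returns for the patch32 encoder.

The function name and the example ROI sizes are hypothetical.

```python
import math


def roi_align_fallback_bytes(num_rois, channels, out_h, out_w,
                             roi_h, roi_w, sampling_ratio=-1, dtype_bytes=4):
    """Rough size in bytes of the [K, C, PH, PW, IY, IX] tensor that the
    pure-Python roi_align path builds (roi_align.py line 168 in the traceback).

    Assumption: IY/IX equal sampling_ratio if it is positive, otherwise
    ceil(roi size / output size) for the largest ROI (spatial_scale = 1).
    """
    iy = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / out_h)
    ix = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / out_w)
    return num_rois * channels * out_h * out_w * iy * ix * dtype_bytes


# Hypothetical case: one 2000x2000-pixel ROI of an RGB image resized to 768x768.
size = roi_align_fallback_bytes(1, 3, 768, 768, 2000, 2000)
print(f"{size / 1e6:.1f} MB")
```

Under these assumptions, a single 2000x2000 ROI needs about 64 MB for this one tensor alone, before any other temporaries. The log alone does not show whether that is enough to exhaust memory on this device.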
@YixinZhu042

Hi!
I ran into the same issue and worked around it by switching the optimizer from AdamW to 8bitAdamW to reduce memory usage. However, I suspect 8bitAdamW may be less stable. Also, my GPU is an A100 80GB, which should be sufficient to fine-tune SDXL. Have you solved it?
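The optimizer switch saves memory in a way that simple arithmetic can estimate. Adam-style optimizers keep two moment buffers per parameter. Standard AdamW stores them as 32-bit floats, assuming fp32 parameters. 8-bit variants, such as those in bitsandbytes, store them as 8-bit values. The sketch below makes two simplifications:

- It ignores the small per-block quantization overhead of the 8-bit version.
- The parameter count is an assumed, approximate figure for the SDXL UNet, not a measured one.

```python
def adam_state_bytes(num_params, bytes_per_state):
    """Memory held by Adam-style optimizer state: two moment buffers
    (exp_avg and exp_avg_sq) per parameter."""
    return 2 * num_params * bytes_per_state


params = 2_600_000_000                # assumed, approximate SDXL UNet parameter count
fp32 = adam_state_bytes(params, 4)    # standard AdamW with fp32 moments
int8 = adam_state_bytes(params, 1)    # 8-bit variant, quantization overhead ignored
print(f"AdamW: {fp32 / 1e9:.1f} GB, 8-bit: {int8 / 1e9:.1f} GB")
```

With these assumptions, the optimizer state shrinks from about 20.8 GB to about 5.2 GB. That is consistent with the memory reduction described above.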
