Crash on Execution: _C is not defined in ms_deform_attn.py #78

Open

huyyxy opened this issue Dec 30, 2024 · 2 comments

huyyxy commented Dec 30, 2024

Description:

I encountered an error when running the grounded_sam2_local_demo.py script in a Docker environment built for the Grounded-SAM-2 project. The error is raised in ms_deform_attn.py, which reports that the name _C is not defined.

Environment:

Hardware: NVIDIA GeForce RTX 4060 Ti
Host Operating System: Ubuntu 20.04

Docker Version:
Client:
Version: 24.0.7
Context: default
Debug Mode: false

Server:
Containers: 11
Running: 4
Paused: 0
Stopped: 7
Images: 62
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.15.0-124-generic
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 20
Total Memory: 31.18GiB
Name: rtx4060ti
ID: 1ca7cc46-dbd0-45a2-bcf4-fda93d13f064
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Docker NVIDIA-SMI Version: 560.35.03

Host CUDA Version: 12.6
Docker CUDA Version: 12.1
Docker Python Version: 3.10.14
Docker PyTorch Version: 2.3.1

>>> import torch
>>> torch.version.cuda
'12.1'

Steps to Reproduce:

1. Clone the Grounded-SAM-2 repository.
2. Navigate to the Grounded-SAM-2 directory and build the Docker image with: make build-image.
3. Start the Docker container with: make run.
4. Run python3 grounded_sam2_local_demo.py.

Expected Behavior:

The script should run without errors and produce the demo output.

Actual Behavior:

Execution halts with a NameError, indicating that _C is not defined. Here is the relevant portion of the traceback:

File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined
Additionally, there were warnings about:

Running in CPU mode due to failing to load custom C++ ops.
The need to specify indexing arguments in future torch.meshgrid releases.
Converting non-writable NumPy array to tensor.
torch.utils.checkpoint behavior change in the future release.
None of the inputs have requires_grad=True, leading to None gradients.
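For context, the failing import in GroundingDINO is wrapped in a guard roughly like the following (a sketch, not the exact upstream code):

```python
# Approximate shape of the import guard in
# grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py.
import warnings

try:
    from groundingdino import _C  # compiled CUDA/C++ extension
except Exception:
    warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
    # _C stays undefined here, so any later call to
    # _C.ms_deform_attn_forward(...) raises NameError: name '_C' is not defined.
```

This would explain why the "CPU mode" warning and the NameError appear together: the warning fires when the compiled extension fails to import, and the NameError follows as soon as a code path still references _C.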
Possible Cause:

It looks like the issue is related to the custom C++ extensions not being properly compiled or loaded; _C is presumably the Python bridge to those compiled extensions.
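A quick way to confirm this (a minimal sketch, assuming the package is importable as groundingdino) is to check whether the compiled module exists at all:

```python
# Check whether the compiled extension module was ever built/installed.
import importlib.util

spec = importlib.util.find_spec("groundingdino._C")
if spec is None:
    print("groundingdino._C not found -- the C++/CUDA extension was never built")
else:
    print("compiled extension found at:", spec.origin)
```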

Requested Assistance:

Any guidance on resolving this error would be greatly appreciated. Is there a specific setup or dependency that I might be missing? Additionally, any tips on ensuring the custom C++ ops are correctly compiled and loaded would be helpful.

huyyxy (Author) commented Dec 30, 2024

Regarding the warning message, “Failed to load custom C++ ops. Running on CPU mode Only!”, I am curious why the system falls back to CPU-only mode even though the environment has an NVIDIA GeForce RTX 4060 Ti GPU.

Could you please offer some guidance on potential reasons for this issue? Specifically, I am wondering about the following:

1. Could this issue be related to the mismatch between the CUDA version inside the Docker container (12.1) and the CUDA version on the host machine (12.6)?
2. Is there a way to ensure the custom C++ ops use GPU acceleration even in the presence of a CUDA version discrepancy?
3. Are there recommended steps to debug why PyTorch or the custom C++ ops are not properly recognizing or utilizing my GPU? (See the sketch after this list.)

I am confident that the correct NVIDIA drivers are installed in my environment and that nvidia-smi successfully identifies my GPU, showing the expected CUDA version information.
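As a starting point for the third question, a minimal visibility check inside the container might look like this (a sketch; it only confirms whether PyTorch itself can see the GPU, not whether the extension was built against it):

```python
# Minimal GPU-visibility check, run inside the container.
import torch

print("torch version:     ", torch.__version__)
print("built against CUDA:", torch.version.cuda)        # toolkit PyTorch was compiled with
print("cuda available:    ", torch.cuda.is_available()) # False => driver/runtime not visible
if torch.cuda.is_available():
    print("device:            ", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
```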

Thank you!

hidara2000 commented

+1
