You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am running a script with genv and allocating 1 GPU. So, I started running the script and ran the enforcement command with 0 devices as the enforcement rule. Genv detects that I am using more than I am allowed to:
User ekinkarabulut is using 1 devices which is 1 more than the maximum allowed
Detaching environment 43155 of user ekinkarabulut from device 0
It detaches the genv environment from the device. I can't see any device attached when I run genv devices:
ID ENV ID ENV NAME ATTACHED
0
1
However, it doesn’t terminate the process so my job is still running (I can see it running when I check nvidia-smi):
Wed Aug 2 09:47:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 73C P0 75W / 149W | 505MiB / 11441MiB | 43% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:05.0 Off | 0 |
| N/A 38C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 43155 C ray::_wrapper 502MiB |
+-----------------------------------------------------------------------------+
Enforcement with sudo using sudo -E env PATH="$PATH" genv enforce --interval 3 --max-devices-per-user 0 is giving the same result.
P.s.: To make sure, I also ran another script within a genv environment to make sure that it is not a general issue and enforced the same thing - it terminates the process smoothly with normal scripts without ray. It seems to be an issue for Ray integration
The text was updated successfully, but these errors were encountered:
I am running a script with genv and allocating 1 GPU. So, I started running the script and ran the enforcement command with 0 devices as the enforcement rule. Genv detects that I am using more than I am allowed to:
It detaches the genv environment from the device. I can't see any device attached when I run
genv devices
:However, it doesn’t terminate the process so my job is still running (I can see it running when I check
nvidia-smi
):Enforcement with sudo using
sudo -E env PATH="$PATH" genv enforce --interval 3 --max-devices-per-user 0
is giving the same result.P.s.: To make sure, I also ran another script within a genv environment to make sure that it is not a general issue and enforced the same thing - it terminates the process smoothly with normal scripts without ray. It seems to be an issue for Ray integration
The text was updated successfully, but these errors were encountered: