Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed GIL release issue with Python System and Python TestFixture. #2618

Open
wants to merge 8 commits into
base: gz-sim9
Choose a base branch
from
11 changes: 11 additions & 0 deletions python/src/gz/sim/TestFixture.cc
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,14 @@ defineSimTestFixture(pybind11::object module)
[](TestFixture* self, std::function<void(
const UpdateInfo &, EntityComponentManager &)> _cb)
{
// Add explicit scoped acquire and release of GIL, so that Python Systems
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we apply the changes in PythonSystemLoader, we shouldn't need any change in this file. Can you revert?

// can be executed
// This acquire and release is only required from the PythonSystem code
// However, adding this here may prevent undefined or unintended behaviors
// in future
pybind11::gil_scoped_acquire gil;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not run into this issue with our python systems in the examples. Are you doing something specific that's causing a problem? It would be great if we can pin down the actual problem

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi azeey,

Yes. I will explain the issue in detail. The following files from my branch are required to reproduce the issue:

  1. src/gz-sim/python/test/testFixture_TEST.Py
  2. src/gz-sim/python/test/python_system_TEST.py
  3. src/gz-sim/python/test/gravity.sdf

The goal is to run the Server in Python, with an SDF model that contains a Python system plugin. I executed the following command before my proposed modifications:
python testFixture_TEST.py and the output was as follows:

(2024-09-13 12:52:31.329) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 12:52:31.400) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 12:52:31.400) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 12:52:31.426) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 12:52:31.426) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 12:52:31.427) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 12:52:31.436) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 12:52:31.438) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 12:52:31.441) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 12:52:33.066) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 12:52:33.068) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.
E(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)

======================================================================
ERROR: test_test_fixture (__main__.TestTestFixture)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sdcnlab/gz_contrib/src/gz-sim/python/test/testFixture_TEST.py", line 58, in test_test_fixture
    server.run(True, 1000, False)
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

----------------------------------------------------------------------
Ran 1 test in 2.028s

FAILED (errors=1)

Specifically, the error is this:

pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.

I wasn't sure how to debug this issue, however, I realized that a probable issue could be the Python system and the Python TestFixture are most likely trying to simultaneously execute.
Since GIL is required to run the Python interpreter, this could potentially cause this problem. Based on this assumption, I made the modification by explicitly acquiring and releasing GIL so that interpreter is not simultaneously used.

This is the result of running python testFixture_TEST.py after my proposed modifications:

(2024-09-13 13:03:42.006) [info] [Server.cc:145] Loading SDF world file[/home/sdcnlab/gz_contrib/src/gz-sim/python/test/gravity.sdf].
(2024-09-13 13:03:42.256) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 13:03:42.313) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 13:03:42.314) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 13:03:42.338) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 13:03:42.338) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 13:03:42.339) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 13:03:42.344) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 13:03:42.347) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 13:03:42.349) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 13:03:42.349) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 13:03:42.351) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 13:03:42.352) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 13:03:42.355) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 13:03:44.003) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 13:03:44.005) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:634] Serving scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:643] Serving graph information on [/world/gravity/scene/graph]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:654] Serving full state on [/world/gravity/state]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:663] Serving full state (async) on [/world/gravity/state_async]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:671] Publishing scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:680] Publishing entity deletions on [/world/gravity/scene/deletion]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:689] Publishing state changes on [/world/gravity/state]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:700] Publishing pose messages on [/world/gravity/pose/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:711] Publishing dynamic pose messages on [/world/gravity/dynamic_pose/info]
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
.
----------------------------------------------------------------------
Ran 1 test in 3.165s

OK

self->OnPreUpdate(_cb);
pybind11::gil_scoped_release gilr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? I thought the scoped lock above would automatically release when going out of scope

}
),
pybind11::return_value_policy::reference,
Expand All @@ -67,7 +74,9 @@ defineSimTestFixture(pybind11::object module)
[](TestFixture* self, std::function<void(
const UpdateInfo &, EntityComponentManager &)> _cb)
{
pybind11::gil_scoped_acquire gil;
self->OnUpdate(_cb);
pybind11::gil_scoped_release gilr;
}
),
pybind11::return_value_policy::reference,
Expand All @@ -78,7 +87,9 @@ defineSimTestFixture(pybind11::object module)
[](TestFixture* self, std::function<void(
const UpdateInfo &, const EntityComponentManager &)> _cb)
{
pybind11::gil_scoped_acquire gil;
self->OnPostUpdate(_cb);
pybind11::gil_scoped_release gilr;
}
),
pybind11::return_value_policy::reference,
Expand Down
5 changes: 5 additions & 0 deletions python/test/gravity.sdf
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,11 @@
<mass>1.0</mass>
</inertial>
</link>
<plugin filename="gz-sim-python-system-loader-system"
name="gz::sim::systems::PythonSystemLoader">
<module_name>python_system_TEST</module_name>
</plugin>
</model>

AmalDevHaridevan marked this conversation as resolved.
Show resolved Hide resolved
</world>
</sdf>
45 changes: 45 additions & 0 deletions python/test/python_system_TEST.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the TestModelSystem plugin that's already in python/test/plugin?

Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#!/usr/bin/env python3
# Copyright (C) 2024 Open Source Robotics Foundation

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from gz_test_deps.math import Vector3d
from gz_test_deps.sim import Model, Link
import random


class TestSystem(object):
def __init__(self):
self.id = random.randint(1, 100)

def configure(self, entity, sdf, ecm, event_mgr):
self.model = Model(entity)
self.link = Link(self.model.canonical_link(ecm))
print("Configured on", entity)
print("sdf name:", sdf.get_name())
self.force = sdf.get_double("force")
print(f"Applying {self.force} N on link {self.link.name(ecm)}")

def pre_update(self, info, ecm):
if info.paused:
return

if info.iterations % 3000 == 0:
self.link.add_world_force(
ecm, Vector3d(0, 0, self.force),
Vector3d(random.random(), random.random(), 0))


def get_system():
return TestSystem()

8 changes: 8 additions & 0 deletions src/systems/python_system_loader/PythonSystemLoader.cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need more py::gil_scoped_acquire gil; calls in PythonSystemLoader::Configure, PythonSystemLoader::Reset(), and PythonSystemLoader::~PythonSystemLoader(). Basically, all System interfaces since they will be accessing python state.

Original file line number Diff line number Diff line change
Expand Up @@ -194,14 +194,21 @@ void PythonSystemLoader::CallPythonMethod(py::object _method, Args &&..._args)
void PythonSystemLoader::PreUpdate(const UpdateInfo &_info,
EntityComponentManager &_ecm)
{
// Add explicit scoped acquire and release of GIL, so that Python
// Systems can be executed.This acquire and release is only required
// from the PythonSystem code
py::gil_scoped_acquire gil;
CallPythonMethod(this->preUpdateMethod, _info, &_ecm);
py::gil_scoped_release gilr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary since py::gil_scoped_acquire will release the gil when going out of scope.

}

//////////////////////////////////////////////////
void PythonSystemLoader::Update(const UpdateInfo &_info,
EntityComponentManager &_ecm)
{
py::gil_scoped_acquire gil;
CallPythonMethod(this->updateMethod, _info, &_ecm);
py::gil_scoped_release gilr;
}

//////////////////////////////////////////////////
Expand All @@ -210,6 +217,7 @@ void PythonSystemLoader::PostUpdate(const UpdateInfo &_info,
{
py::gil_scoped_acquire gil;
CallPythonMethod(this->postUpdateMethod, _info, &_ecm);
py::gil_scoped_release gilr;
}
//////////////////////////////////////////////////
void PythonSystemLoader::Reset(const UpdateInfo &_info,
Expand Down
Loading