-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed GIL release issue with Python System and Python TestFixture. #2618
base: gz-sim9
Are you sure you want to change the base?
Fixed GIL release issue with Python System and Python TestFixture. #2618
Conversation
d11629b
to
c436b61
Compare
…gned-off-by: Amal Dev Haridevan [email protected] Signed-off-by: sdcnlab <[email protected]> Signed-off-by: Amal Dev Haridevan <[email protected]>
c436b61
to
29598fd
Compare
Co-authored-by: Alejandro Hernández Cordero <[email protected]> Signed-off-by: AmaldevHaridevan <[email protected]>
Signed-off-by: AmaldevHaridevan <[email protected]>
Signed-off-by: AmaldevHaridevan <[email protected]>
self->OnPreUpdate(_cb); | ||
pybind11::gil_scoped_release gilr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? I thought the scoped lock above would automatically release when going out of scope
// This acquire and release is only required from the PythonSystem code | ||
// However, adding this here may prevent undefined or unintended behaviors | ||
// in future | ||
pybind11::gil_scoped_acquire gil; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not run into this issue with our python systems in the examples. Are you doing something specific that's causing a problem? It would be great if we can pin down the actual problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi azeey,
Yes. I will explain the issue in detail. The following files from my branch are required to reproduce the issue:
src/gz-sim/python/test/testFixture_TEST.Py
src/gz-sim/python/test/python_system_TEST.py
src/gz-sim/python/test/gravity.sdf
The goal is to run the Server in Python, with an SDF model that contains a Python system plugin. I executed the following command before my proposed modifications:
python testFixture_TEST.py
and the output was as follows:
(2024-09-13 12:52:31.329) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 12:52:31.400) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 12:52:31.400) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 12:52:31.426) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 12:52:31.426) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 12:52:31.427) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 12:52:31.436) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 12:52:31.438) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 12:52:31.441) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 12:52:33.066) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 12:52:33.068) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.
E(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)
======================================================================
ERROR: test_test_fixture (__main__.TestTestFixture)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/sdcnlab/gz_contrib/src/gz-sim/python/test/testFixture_TEST.py", line 58, in test_test_fixture
server.run(True, 1000, False)
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.
----------------------------------------------------------------------
Ran 1 test in 2.028s
FAILED (errors=1)
Specifically, the error is this:
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.
I wasn't sure how to debug this issue, however, I realized that a probable issue could be the Python system and the Python TestFixture are most likely trying to simultaneously execute.
Since GIL
is required to run the Python interpreter, this could potentially cause this problem. Based on this assumption, I made the modification by explicitly acquiring and releasing GIL
so that interpreter is not simultaneously used.
This is the result of running python testFixture_TEST.py
after my proposed modifications:
(2024-09-13 13:03:42.006) [info] [Server.cc:145] Loading SDF world file[/home/sdcnlab/gz_contrib/src/gz-sim/python/test/gravity.sdf].
(2024-09-13 13:03:42.256) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 13:03:42.313) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 13:03:42.314) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 13:03:42.338) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 13:03:42.338) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 13:03:42.339) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 13:03:42.344) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 13:03:42.347) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 13:03:42.349) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 13:03:42.349) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 13:03:42.351) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 13:03:42.352) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 13:03:42.355) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 13:03:44.003) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 13:03:44.005) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:634] Serving scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:643] Serving graph information on [/world/gravity/scene/graph]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:654] Serving full state on [/world/gravity/state]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:663] Serving full state (async) on [/world/gravity/state_async]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:671] Publishing scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:680] Publishing entity deletions on [/world/gravity/scene/deletion]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:689] Publishing state changes on [/world/gravity/state]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:700] Publishing pose messages on [/world/gravity/pose/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:711] Publishing dynamic pose messages on [/world/gravity/dynamic_pose/info]
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
.
----------------------------------------------------------------------
Ran 1 test in 3.165s
OK
@AmalDevHaridevan just wanted to give you a heads-up that I won't be able to review this for the next couple of weeks due to ROSCon. |
Hi Azeey, any updates on the pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience. I can see the issue now and I can reproduce it with
from gz_test_deps.sim import Server, ServerConfig
from gz_test_deps.common import set_verbosity
set_verbosity(4)
config = ServerConfig()
config.set_sdf_file('gravity.sdf')
server = Server(config)
server.run(True, 1000, False)
So it has nothing to do with TestFixture
, but with how Server
is being loaded from python. As mentioned in
gz-sim/src/SimulationRunner.cc
Lines 69 to 77 in c72be04
// Helper struct to maybe do a scoped release of the Python GIL. There are a | |
// number of ways where releasing and acquiring the GIL is not necessary: | |
// 1. Gazebo is built without Pybind11 | |
// 2. The python interpreter is not initialized. This could happen in tests that | |
// create a SimulationRunner without sim::Server where the interpreter is | |
// intialized. | |
// 3. sim::Server was instantiated by a Python module. In this case, there's a | |
// chance that this would be called with the GIL already released. | |
struct MaybeGilScopedRelease |
When Server
is instantiated from a python interpreter, pybind11 automatically releases the GIL before calling the C++ code (Server()
). When PythonSystemLoader
then tries to load the python system, it fails since any access to Python state requires locking the GIL. The solution you proposed works pretty well I think. My only comment is that we limit the changes to PythonSystemLoader
and acquire the GIL
in every System
interface call.
CallPythonMethod(this->preUpdateMethod, _info, &_ecm); | ||
py::gil_scoped_release gilr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary since py::gil_scoped_acquire
will release the gil when going out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use the TestModelSystem
plugin that's already in python/test/plugin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think need more py::gil_scoped_acquire gil;
calls in PythonSystemLoader::Configure
, PythonSystemLoader::Reset()
, and PythonSystemLoader::~PythonSystemLoader()
. Basically, all System
interfaces since they will be accessing python state.
@@ -56,7 +56,14 @@ defineSimTestFixture(pybind11::object module) | |||
[](TestFixture* self, std::function<void( | |||
const UpdateInfo &, EntityComponentManager &)> _cb) | |||
{ | |||
// Add explicit scoped acquire and release of GIL, so that Python Systems |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we apply the changes in PythonSystemLoader
, we shouldn't need any change in this file. Can you revert?
…gned-off-by: Amal Dev Haridevan [email protected]
🦟 Bug fix
Fixes #
Add explicit scoped acquire and release of GIL, so that Python Systems can be executed from a Python Server (run using TestFixture for example).
Summary
Python system plugin, which is attached to my model cannot be run using TestFixture in Python. This is due to GIL not being explicitly released in the PythonSystemLoader.
To overcome this issue, we need to do a scoped_acquire of GIL explicitly and then perform scoped_release after each of the system methods, namely, PreUpdate, Update, PostUpdate of the PythonSystem, so the GIL can be accessed by the TestFixture after.
As a safety mechanism, for future, I also added the scoped_acquire and release of GIL within the pybind code for TestFixture.
Checklist
codecheck
passed (See contributing)Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining
Signed-off-by
messages.🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸
Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining
Signed-off-by
messages.