Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unloading stonesense with the stonesense window open causes a hang #111

Open
ab9rf opened this issue Dec 24, 2024 · 1 comment
Open

unloading stonesense with the stonesense window open causes a hang #111

ab9rf opened this issue Dec 24, 2024 · 1 comment

Comments

@ab9rf
Copy link
Member

ab9rf commented Dec 24, 2024

to replicate:
load fort
open stonesense
unload stonesense from console

game hangs

@ab9rf ab9rf transferred this issue from DFHack/dfhack Dec 25, 2024
@ab9rf
Copy link
Member Author

ab9rf commented Jan 16, 2025

this is caused by stonesense's use of a CoreSuspender in its fetch and draw loops, which results in a deadlock. the specific scenario is that some other thread (typically either the DF simulation thread or the DFHack console thread, depending on where the unload command came from) running the plugin unload command, while holding a CoreSuspender, waits for stonesense's main loop to exit, which is itself waiting to acquire a CoreSuspender, leading to a deadlock

from what i can tell, there is no reason for stonesense to acquire a CoreSuspender for its draw loop because that loop does not access DF memory or use Lua, and thus does not need a CoreSuspender, and so we can remove the CoreSuspender in paintboard (in Gui.cpp) without affecting thread safety. (note that there are mutexes internal to stonesense to protect the map segments that are shared between the draw and read loops; we don't need to rely on a CoreSuspender here)

the read loop does need to synchronize with the DF core, so we need a yieldable wait here. i tested using a ConditionalCoreSuspender here (in read_segment in MapLoading.cpp), and found that this worked well. if the ConditionalCoreSuspender fails, stonesense will simply fail to read data from DF on that pass, which will pause stonesense updating but not cause stonesense to freeze during periods when a core lock is unavailable for some reason.

this could cause the read loop to spin under some circumstances, but ConditionalCoreSuspender has a 100 ms wait for the lock, so the spin rate would be at most ten times a second which is not likely to be a problem

it's unclear to me how this will impact megashots; that will need to be tested more thoroughly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant