fix missing increment of the instance count on every call to startup() #3100

mbechard · 2025-01-03T21:46:38Z

Change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created
otherwise.

Force a clean() when the singleton is getting destructed.
Previously it just did a single decrement of the instance count,
which could result in attempting to exit without actually cleaning up.

fixes #3098

ethouris · 2025-01-06T10:37:29Z

As for me it looks ok, but Max will be back in the second half of January.

mbechard · 2025-01-06T20:43:32Z

Still has bugs with existing code, as per the issue discussion

mbechard · 2025-01-06T21:18:32Z

I think this modified solution is reasonable to handle the remaining issues.

srtcore/api.cpp

ethouris · 2025-01-08T14:51:41Z

Wait, there's one thing I don't understand.

Why do you need to prevent the call to cleanup()? The call to cleanup() should be able to be done multiple times, even if at least one of the series has done the real cleanup.

The only rule was that there should be done as many (meaning, the minimum) cleanup calls as there was startup calls, but extra cleanup() calls, which actually do nothing, should be allowed. Have you found a problem with extra cleanups?

mbechard · 2025-01-08T16:23:59Z

This doesn't disallow extra cleanup in general, that allowance remains in place. The change here though is that it avoids an unpaired cleanup from internal code. Doing a cleanup internally when a startup wasn't done internally can hide bugs in the API users code. And since the cleanup done in srt::CUDTUnited::~CUDTUnited() is only just doing a single count decrement, whether or not it actually avoids the app closing without doing the cleanup isn't dependable.
If anything, I would contend that srt::CUDTUnited::~CUDTUnited() should force a cleanup no matter what counter is, as long as it's > 0. That would ensure the cleanup on shutdown.

ethouris · 2025-01-09T13:58:41Z

The cleanup call is prevented if this wasn't started internally.

I don't understand the reason of this blocking. Starting it internally should simply call exactly the same startup(), with exception that it should check if it isn't initialized already, to ensure that an internal startup is really done only once. So, this is called when creating a socket and it should ensure that this can be called only once, and only once increase the number of instances. Just the call from srt_startup() should increase the number of instances every time it is called.

mbechard · 2025-01-09T17:32:54Z

It depends what version of the code you are looking at. What you are stating was correct for the old code, before the original bug was introduced. However multiple changes occurred with that bug commit. The current master fails to increment on any but the first startup call, which results in incorrect cleanups due to the counting being incorrect. If we fix it so every startup() increments the count, then the counting increments too much due to internal startups, which occur every time a socket is created now. This is new behavior because the code used to avoid the internal startup() call with a if (m_bGCStatus) check, which has now been removed.

This PR handles both of those new issues. There are certainly multiple ways to solve this. My commit is one way. Feel free to solve it another if you wish.

ethouris · 2025-01-10T09:44:38Z

All I was asking was the reasoning of introducing the new field for marking the implicit initialization and performing the cleanup only under this condition. And how well it handles a situation, when after creating a socket without having called srt_startup() first this function is called again afterwards.

Things that I think should be taken into account are:

The user may call srt_startup() in the beginning (before creating the first socket) or not
The user may forget to call srt_cleanup() at the end
The srt_startup() may be called again without having called srt_cleanup()

This version you provided, for example, prevents the cleanup action from being called in a situation when a user called srt_startup() in the beginning, but didn't call srt_cleanup() (for example, when the program has ended with an exception). At least this is the behavior as I understand it.

mbechard · 2025-01-10T16:47:19Z

I added the new field because making a choice on implicit startup/cleanups based on the state of the GC thread bool seems error prone. If the logic for the GC thread changes in the future, it may introduce bugs. The GC management is a sub-portion of the startup/cleanup procedure.

Agreed the new behavior does avoid a single srt_startup() getting implicitly cleaned up on program exit. However the old behavior only did that for a single call. If the user called srt_startup() twice, the old behavior wouldn't do a cleanup either since it's only doing a single decrement. So I think that old behavior was of dubious usefulness.

It would seem the improvement over both the old and new behavior would to ignore the counter and force a cleanup on program exit if it has still been left in the started state. I'll add those changes.

mbechard · 2025-01-10T18:10:15Z

The old code has race condition in it too. The check in srt::CUDT::socket() for the state of m_bGCStatus isn't protected by a lock so it could result in multiple implicit startups calls if multiple threads hit it at the same time. That would result in the count growing more than 1 via implicit startups, so the cleanup() wouldn't end up actually triggering a cleanup anyways.

mbechard · 2025-01-10T18:38:45Z

Ok see how you feel about this one instead. I'm not a huge fan of the boolean traps added to startup()/cleanup(). For cleanup() I could make it an internal flag in the class that the destructor sets, if you prefer.
For startup() it's hard to do this cleanly since the internal implementation of UniqueLock doesn't have move constructor/operator.

ethouris · 2025-01-13T12:34:44Z

srtcore/api.cpp

+        if (m_iInstanceCount == 0)
+            return 0;
+
+        if (--m_iInstanceCount > 0)
+            return 0;


This can be bound.

Suggested change

if (m_iInstanceCount == 0)

return 0;

if (--m_iInstanceCount > 0)

return 0;

if (m_iInstanceCount == 0 || --m_iInstanceCount > 0)

return 0;

ethouris · 2025-01-13T12:36:09Z

srtcore/api.cpp

@@ -233,15 +228,24 @@ string srt::CUDTUnited::CONID(SRTSOCKET sock)
    return os.str();
 }

-int srt::CUDTUnited::startup()
+void srt::CUDTUnited::implicitStartup()


Once you have a parameter in startup(), this function is no longer necessary. You can call it as startup(true) in socket and group creation functions.

srtcore/api.cpp

ethouris · 2025-01-13T12:56:34Z

Ok, now this looks definitely better. I would keenly have some unit tests for this, though; I'll try to add something myself. For this these functions would have to return some status value, which simply states whether something was executed or not. I'll try to figure out something myself; we'd need also a function that can report the current number of instances, but should be able to use the CUDTClass directly, not the API.

…ision#3100

ethouris · 2025-01-13T15:03:19Z

Ok, I made a new #3104 as draft so that you can take the changes there. Things that I reported in review comments are applied, and the test is added. I marked places that should be changed for 1.6.0, but for the 1.5.5 they must stay the same to maintain the rules of the API, so this is only marked, and changes would be then provided in a separate PR for 1.6.0. And the test is also there.

Please take a look and you can also get these changes to apply here - just add ".patch" to the URI.

mbechard · 2025-01-13T21:09:31Z

Ok thanks, I've re-pushed your patch. Appreciate the feedback and the patch

ethouris · 2025-01-14T11:28:36Z

Ok, that comment I reported I added it - forgot about that myself ;).

ethouris · 2025-01-14T11:56:12Z

Ok, something's wrong, the tests on Travis fail. On my draft PR they fail either. Need to see into this.

ethouris · 2025-01-14T12:42:09Z

Alright, I found the problem. there are two additional changes in the tests. One of them just performs one testing step in the text that HAPPENS NEXT to check if the PREVIOUS TEST did the job right ;). Unfortunately there's no way to check if the global destructor has run correctly because it happens at the end of the testing application itself. Can you please take the updates from #3104?

ethouris · 2025-01-14T14:44:36Z

Ok small update: only RIGHT NOW I was able to make all tests pass also in Travis. Some tests rely on doing some things in timely manner and irregularly working machines can fail there, including in cases when an error is expected.

Change how the implicit startup is tracked. The extra change is needed since the startup() calls in srt::CUDT::socket() and srt::CUDT::createGroup would endlessly increment the startup counter as sockets were created otherwise. Force a clean() when the singleton is getting destructed. Previously it just did a single decrement of the instance count, which could result in attempting to exit without actually cleaning up. fixes Haivision#3098

mbechard · 2025-01-15T18:07:38Z

Ok re-pushed, thanks for your edits!

srtcore/api.cpp

-    if (m_bGCStatus)
+
+    if (implicit && m_iInstanceCount > 0)
+        // [reinstate in 1.6.0] return m_iInstanceCount;


srtcore/api.cpp

        return 1;

    if (m_iInstanceCount++ > 0)
+        // [reinstate in 1.6.0] return m_iInstanceCount;


srtcore/api.cpp

+        // [reinstate in 1.6.0] return m_iInstanceCount;
+        return 1;
+
+    if (m_bGCStatus)


srtcore/api.cpp

-        return 0;
+    if (!force)
+    {
+        if (m_iInstanceCount == 0 || --m_iInstanceCount > 0)


mbechard marked this pull request as draft January 3, 2025 22:08

ethouris added this to the v1.5.5 milestone Jan 6, 2025

mbechard force-pushed the master branch from dad4f88 to 20def32 Compare January 6, 2025 21:17

mbechard marked this pull request as ready for review January 6, 2025 21:18

ethouris reviewed Jan 8, 2025

View reviewed changes

srtcore/api.cpp Outdated Show resolved Hide resolved

mbechard force-pushed the master branch from 20def32 to 6e4537c Compare January 10, 2025 18:35

mbechard changed the title ~~fix missing increment of the reference count on every call to startup()~~ fix missing increment of the instance count on every call to startup() Jan 10, 2025

ethouris reviewed Jan 13, 2025

View reviewed changes

srtcore/api.cpp Show resolved Hide resolved

ethouris pushed a commit to ethouris/srt that referenced this pull request Jan 13, 2025

[DRAFT] Add test and some future API changes for startup/cleanup Haiv…

2116963

…ision#3100

ethouris mentioned this pull request Jan 13, 2025

[DRAFT] Add test and some future API changes for startup/cleanup #3100 #3104

Draft

mbechard force-pushed the master branch from 6e4537c to 4275d7b Compare January 13, 2025 20:41

ethouris added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels Jan 14, 2025

mbechard force-pushed the master branch from 8b0a9fb to 9658714 Compare January 15, 2025 18:07

github-advanced-security bot found potential problems Jan 17, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix missing increment of the instance count on every call to startup() #3100

fix missing increment of the instance count on every call to startup() #3100

mbechard commented Jan 3, 2025 •

edited

Loading

ethouris commented Jan 6, 2025

mbechard commented Jan 6, 2025

mbechard commented Jan 6, 2025 •

edited

Loading

ethouris commented Jan 8, 2025

mbechard commented Jan 8, 2025

ethouris commented Jan 9, 2025

mbechard commented Jan 9, 2025 •

edited

Loading

ethouris commented Jan 10, 2025

mbechard commented Jan 10, 2025 •

edited

Loading

mbechard commented Jan 10, 2025

mbechard commented Jan 10, 2025

ethouris Jan 13, 2025

ethouris Jan 13, 2025

ethouris commented Jan 13, 2025

ethouris commented Jan 13, 2025

mbechard commented Jan 13, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

mbechard commented Jan 15, 2025

fix missing increment of the instance count on every call to startup() #3100

Are you sure you want to change the base?

fix missing increment of the instance count on every call to startup() #3100

Conversation

mbechard commented Jan 3, 2025 • edited Loading

ethouris commented Jan 6, 2025

mbechard commented Jan 6, 2025

mbechard commented Jan 6, 2025 • edited Loading

ethouris commented Jan 8, 2025

mbechard commented Jan 8, 2025

ethouris commented Jan 9, 2025

mbechard commented Jan 9, 2025 • edited Loading

ethouris commented Jan 10, 2025

mbechard commented Jan 10, 2025 • edited Loading

mbechard commented Jan 10, 2025

mbechard commented Jan 10, 2025

ethouris Jan 13, 2025

Choose a reason for hiding this comment

ethouris Jan 13, 2025

Choose a reason for hiding this comment

ethouris commented Jan 13, 2025

ethouris commented Jan 13, 2025

mbechard commented Jan 13, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

ethouris commented Jan 14, 2025

mbechard commented Jan 15, 2025

mbechard commented Jan 3, 2025 •

edited

Loading

mbechard commented Jan 6, 2025 •

edited

Loading

mbechard commented Jan 9, 2025 •

edited

Loading

mbechard commented Jan 10, 2025 •

edited

Loading