Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle limited open Connections due to keepalive connections #9916

Merged
merged 17 commits into from
Oct 16, 2024

Conversation

sgothel
Copy link

@sgothel sgothel commented Aug 28, 2024

Summary

  • ProtocolHandlerInterface::checkTimeout(..)
    is now called via StreamSocket::checkRemoval()
    to ensure tests even w/o input data (was: handlePoll())

  • ProtocolHandlerInterface::checkTimeout(..)

    • Add bool return value: true -> shutdown connection, caller shall stop processing
    • Implemented for http::Session
      • Timeout (30s) net::Defaults::HTTPTimeout with missing response
    • Implemented for WebSocketHandler
      • Timeout (2s = net::Defaults::WSPingTimeout)
        after missing WS native frame ping/pong (server only)
  • StreamSocket -> Socket (properties moved)

    • bytes sent/received
    • closed state
  • Socket (added properties)

    • creation- and last-seen -time
    • socket type and port
    • checkRemoval(..)
      • called directly from SocketPoll::poll()
      • only for IPv4/v6 network connections
      • similar to ProtocolHandlerInterface::checkTimeout(..)
      • added further criteria (age, throughput, ..)
        • Timeout (64s = net::Defaults::SocketPollTimeout)
          if (now - lastSeen) > timeout
        • Timeout (net::Defaults::MinBytesPerSec)
          if Throughput < MinBytesPerSec (disabled by default)
        • TODO: Add maximimal IPv4/IPv6 socket-count criteria, drop oldest.
        • age via Socket::creationTime is currently not used as requested
  • SocketPoll::poll()

    • Additionally erases if !socket->isOpen() || socket->checkRemoval()

Changes to original PR

  • net::Config -> net::Defaults, without any xml config file side-effects
    • only allow net::Defaults to be changed in unit tests via dedicated callback
  • removed Socket age criteria via Socket::creationTime and net::defaults::MaxDuration
    • Socket::creationTime still relevant for minimum throughput criteria
    • dropped the net::defaults::MaxDuration
  • max-connections
    • removed under- overflow checks
    • using SC atomic ops (not just relaxed)
  • enhanced API doc of Util::TimeAverage (clarification)
  • unit tests in one place / commit
  • more unification/squashing to single semantic commits
  • more sensible order of commits
  • removed dead code
  • dropped noexcept patch

Further:

  • cleaned up the commit messages
  • removed the COOLWSD.cpp default config reordering
  • reordered the 'Unify socket stats', as it must be upfront the actual connection count limitation
    and before moved _lastSeenHTTPHeader
  • fixed another intermediate breakage in the commit chain
  • bisectable proof (but no unit test runs ofc) .. via
    • passed git rebase -i --exec "make -j 20" for all commits
  • Use scoped enums with efficient types (STATE_ENUM)
    • now using desired STATE_ENUM

Unit tests: +7 files, 879 insertions(+), remaining changed lines +800, -222

Checklist

  • I have run make prettier-write and formatted the code.
  • All commits have Change-Id
  • I have run tests with make check
  • Passed certain unit tests and cypress desktop annotations.js
  • I have issued make run and manually verified that everything looks okay
  • Documentation (manuals or wiki) has been updated or is not required

@sgothel sgothel requested review from mmeeks, Ashod and caolanm August 28, 2024 00:30
@sgothel sgothel force-pushed the private/sgothel/cool_9833_connection_limit branch 3 times, most recently from a2c3068 to 3ef99c5 Compare August 28, 2024 00:58
@sgothel sgothel self-assigned this Aug 28, 2024
@sgothel
Copy link
Author

sgothel commented Aug 28, 2024

This change set created a linkage error (android), the 'usual' type_info, vtable thing.
Can't figure a missed 'virtual' or even pure virtual '=0' virtual key in the code, also reviewed the MOBILEAPP macro.
Hence I need to reproduce and build the android locally here .. WIP.

net/Socket.cpp Outdated
@@ -1298,6 +1385,52 @@ LocalServerSocket::~LocalServerSocket()
# define LOG_CHUNK(X)
#endif

bool StreamSocket::checkForcedRemoval(std::chrono::steady_clock::time_point now) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need to move this impl to after

"#endif // !MOBILEAPP" (if mobile needs it) and/or put ifdef MOBILEAPP around it in the header if it doesn't or a !MOBILE stub, something like that anyway

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. Great eyes. Thx!

Copy link
Contributor

@Ashod Ashod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @sgothel. There are some really good stuff here!

I understand this is a draft, but it reads like a mix of random fixes and improvements (some of it is good and useful, like adding const and = default, make_shared, etc.), re-inventing methods for doing things already supported, and debugging/tracing support for your understanding (to be removed).

It's very hard to read, as there is no clear theme or purpose.

I left various comments, but I didn't dive too deep in a number of areas because it was unclear why we are either replacing an existing approach, or inventing a new one when we can improve on the existing one (where there are shortcomings).

At a minimum, I'd separate the backtrace improvements from the socket improvements, and likely the misc. improvements from new code/features/tests.

Finally, I'm troubled by vformatString and the re-working a lot of the socket states and internals with a huge set of new members that change the existing semantics with unclear benefits.

Perhaps there is some context I'm missing. Perhaps this was agreed on and I'm unaware.

net/HttpRequest.hpp Outdated Show resolved Hide resolved
net/HttpRequest.hpp Show resolved Hide resolved
net/HttpRequest.hpp Show resolved Hide resolved
net/ServerSocket.hpp Outdated Show resolved Hide resolved
net/Socket.cpp Outdated Show resolved Hide resolved
test/UnitSocketLifecycle.cpp Outdated Show resolved Hide resolved
test/UnitSocketLifecycle.cpp Outdated Show resolved Hide resolved
test/WebSocketSession.hpp Outdated Show resolved Hide resolved
common/Util.cpp Outdated Show resolved Hide resolved
net/Socket.cpp Outdated Show resolved Hide resolved
@sgothel
Copy link
Author

sgothel commented Aug 28, 2024

Thank you @Ashod for the review.

I understand this is a draft, but it reads like a mix of random fixes and improvements (some of it is good and useful, like adding const and = default, make_shared, etc.), re-inventing methods for doing things already supported, and debugging/tracing support for your understanding (to be removed).

The draft status was motivated by my intention to discuss the actual semantics and goals,
i.e. mitigate limited connections - or better, ensure connections are closed as well as
to reduce the risk of running out of sockets/file-handles.

Which brings us to the next point I guess.

It's very hard to read, as there is no clear theme or purpose.

When reading commit by commit, I believe everything is well split apart semantically.
I believed the purpose was well described in #9833 and the commit 3ef99c5
actually starting to address (matching this drafts' description) was meaningful.

Indeed, I added a few unrelated single commits into this review. Understanding that we allow
reasonable small changes w/o issue report, esp of intrinsic nature, I believe it should be OK.
(admin vs workflow compromise as discussed earlier)

I left various comments, but I didn't dive too deep in a number of areas because it was unclear why we are either replacing an existing approach, or inventing a new one when we can improve on the existing one (where there are shortcomings).

I am not aware of really replacing anything. I merely moved and added functionality.

At a minimum, I'd separate the backtrace improvements from the socket improvements, and likely the misc. improvements from new code/features/tests.

As described above, a zero side-effect change in its own commit.

The Backtrace object in particular is useful for all, as we can produce an instance
capturing the current stack, move it around and output where desired.
As discussed, I did not add addr2line or libunwind dependencies, which I understand ofc.

Finally, I'm troubled by vformatString and the re-working a lot of the socket states and internals with a huge set of new members that change the existing semantics with unclear benefits.

I just like printf formatting as it is more performant and flexible (at least to me).
IMHO I use it in a most reasonable save fashion, while being supported by the compiler to
remove the risk of wrong format specifiers and stack values. The latter is still a weakness, which I am aware of.

Perhaps there is some context I'm missing. Perhaps this was agreed on and I'm unaware.

Allow me to continue answering your other remarks below.

@sgothel sgothel force-pushed the private/sgothel/cool_9833_connection_limit branch from 2d51475 to 508ccb0 Compare August 28, 2024 19:01
@caolanm
Copy link
Contributor

caolanm commented Aug 29, 2024

I think it'll be easier to manage this if its split up so that the mostly uncontroversial misc cleanups of existing stuff where no change in logic is intended, the make_unique, make_shared, privator ctor -> = delete, constifying, expansion of auto& to real type, emplace_back, etc goes into a separate pr.

The backtrace debugging aid looks probably useful to have to me, but can also be its own pr I think (and probably not worth fighting the "CodeQL scanning - action / Analyze" step which apparently can't handle the constexpr use which is blocking ci from passing this pr).

And generally trim this pr down to the core change to make it easier to think about it

@Ashod
Copy link
Contributor

Ashod commented Aug 30, 2024

Thanks for the responses, @sgothel! We've discussed some offline, so didn't reply here (e.g. printf-style formatting being fast--in fact it's very slow compared to only serialization).

Indeed, I added a few unrelated single commits into this review. Understanding that we allow reasonable small changes w/o issue report, esp of intrinsic nature, I believe it should be OK. (admin vs workflow compromise as discussed earlier)

In general, agreed. But I'll echo @caolanm's comment here that breaking up into PRs, that can be decided to merge independent of other parts, is always helpful.

I should also point out that this PR is well above a 1000 lines. This is a rare size. Considering that some parts change the interface of a large hierarchy (the sockets), which is a fundamental core part of the whole project and very delicate (can easily break things catastrophically), it requires reading and studying far more code (i.e. the context and consequences) to review properly. An isolated and low-risk change of a few 100 lines is fine. A complex few 100 line change (such as in the sockets) is challenging and time consuming. Sometimes we have no choice because we can't break it up. But a 1000 lines that can be easily broken up is hardly necessary.

Finally, even though we can review individual commits, it's preferable to have a single theme to a PR. Helps the reviewer. In some cases a few misc. changes can go together (comment fixes, cosmetics, formatting, logging, etc.). But mixing functional changes with non-functional changes dilutes the process and can reduce the quality of the review. More so with high-risk changes (i.e. socket code) mixed with debugging code (i.e. backtracing).

Regardless, you have a lot of great improvements here, and I don't want to undermine the importance of your contribution!
Thank you.

@sgothel sgothel force-pushed the private/sgothel/cool_9833_connection_limit branch from 508ccb0 to cb801db Compare September 2, 2024 03:41
@sgothel
Copy link
Author

sgothel commented Sep 2, 2024

Thank you both for your review and kind words.

I have split up single commits into 4 branches, i.e. this pull-request branch and

For this feature work, I need to complete the unit test incl. the timeout configuration.
Looks OK .. shall provide it in 1-2 days.

net/HttpRequest.hpp Fixed Show fixed Hide fixed
net/Socket.cpp Fixed Show fixed Hide fixed
net/Socket.hpp Fixed Show fixed Hide fixed
net/WebSocketHandler.hpp Fixed Show fixed Hide fixed
net/WebSocketHandler.hpp Fixed Show fixed Hide fixed
net/WebSocketHandler.hpp Fixed Show fixed Hide fixed
@sgothel sgothel force-pushed the private/sgothel/cool_9833_connection_limit branch from cb801db to 6fc374e Compare September 6, 2024 11:28
net/Socket.cpp Fixed Show fixed Hide fixed
net/Socket.hpp Fixed Show fixed Hide fixed
net/Socket.hpp Fixed Show fixed Hide fixed
net/Socket.hpp Fixed Show fixed Hide fixed
@sgothel sgothel force-pushed the private/sgothel/cool_9833_connection_limit branch from 6fc374e to 31c1b94 Compare September 6, 2024 11:51
net/NetUtil.hpp Show resolved Hide resolved
@Ashod
Copy link
Contributor

Ashod commented Sep 6, 2024

@sgothel, there were multiple suggestions to split this large PR up to reduce its size. That was when it had 18 commits and was "only" a 1000 lines. Now it's past 1500 lines and is 1 commit! I'm sensing this isn't going in the direction of the feedback.

What's the plan, please?

test/UnitTimeoutNone.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said needs to be split up; lets try to get the less controversial pieces in and merged as separate pull requests - to slim this down =) Can you work on that Sven ?

wsd/COOLWSD.cpp Outdated Show resolved Hide resolved
@sgothel
Copy link
Author

sgothel commented Sep 6, 2024

As said needs to be split up; lets try to get the less controversial pieces in and merged as separate pull requests - to slim this down =) Can you work on that Sven ?

will do the split later then (tomorrow) + finishing the tests

@sgothel
Copy link
Author

sgothel commented Sep 6, 2024

@sgothel, there were multiple suggestions to split this large PR up to reduce its size. That was when it had 18 commits and was "only" a 1000 lines. Now it's past 1500 lines and is 1 commit! I'm sensing this isn't going in the direction of the feedback.

What's the plan, please?

As I wrote, I will split out the config and TimeAverage (~400 lines or so) and put the remainder at the bottom -> 3 commits within this pull request. Perhaps more, w/ a fresh look.

But splitting this up in multiple pull request (PR) makes would probably makes no sense, as these pieces are required and each PR must be tested. IMHO.

Sven Göthel added 8 commits October 16, 2024 13:11
Preparation for cool#9833, to be used to provide a time-based
ping response time average.

TimeAverage allows us to calculate the time based average of a value
while adding new time-point/value tuples at arbitrary time-points.

The result is stable against small fluctuations
and shall provide a reasonable actionable criteria.

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I7e1a9329e0848c40a210f6250e29e26950da6fbc
…imeout)

Preparation for cool#9833.

Using runtime mutable defaults for network properties allows us to
test the socket limitation code as well as to fine-tune certain properties for clients.

Costs are removal of constexpr properties, using a static local (singleton) instance,
cached where used - hence rendering the code less optimized and more pessimistic.

However, the singleton instance is fetched at object creation w/o cache hit using
a static-local, networking timing outweights memory reads and no hot-spot
performance heavy loop is affected.
Hence rendering the feature testable at runtime shall outweight the lesser costs.

- All network limits and timeouts are configurable within our unit tests only
    - see net::Defaults in NetUtil.hpp

Signed-off-by: Sven Göthel <[email protected]>
Signed-off-by: Caolán McNamara <[email protected]>
Change-Id: I7e1a9329e0848c40a210f6250e29e26950da6fbc
…incl. creation- and lastSeenTime

Socket Restructuring:
- Unify stats(bytes + throughput)
- Persistent creationTime (immutable) and lastSeenTime (mutable)

Signed-off-by: Caolán McNamara <[email protected]>
Signed-off-by: Sven Göthel <[email protected]>
Change-Id: If9e3b28cfd4c6fdb73b978b80ef2063c63def6d1
moved `StreamSocket::_lastSeenHTTPHeader` to
`std::chrono::steady_clock::time_point &lastHTTPHeader`
where it is being required - reduces general footprint.

Signed-off-by: Sven Göthel <[email protected]>
Signed-off-by: Caolán McNamara <[email protected]>
Change-Id: I5148fa7ee2aa5ce88ef8bf31c2e036921d01277c
- ProtocolHandlerInterface::checkTimeout(..)
    is now called via StreamSocket::checkRemoval()
    to ensure tests even w/o input data (was: handlePoll())

- ProtocolHandlerInterface::checkTimeout(..)
    - Add bool return value: true -> shutdown connection, caller shall stop processing
    - Implemented for http::Session
    - Timeout (30s) net::Defaults::HTTPTimeout with missing response
    - Implemented for WebSocketHandler
    - Timeout (2s = net::Defaults::WSPingTimeout)
        after missing WS native frame ping/pong (server only)

- StreamSocket::checkRemoval(..)
    - called directly from SocketPoll::poll()
    - only for IPv4/v6 network connections
    - similar to ProtocolHandlerInterface::checkTimeout(..)
    - calls ProtocolHandlerInterface::checkTimeout(..)
    - added further criteria (throughput, ..)
    - Timeout (64s = net::Defaults::SocketPollTimeout)
        if (now - lastSeen) > timeout
    - Timeout (net::Defaults::MinBytesPerSec)
        if Throughput < MinBytesPerSec (disabled by default)
    - TODO: Add maximal IPv4/IPv6 socket-count criteria, drop oldest.
    - age via Socket::creationTime is currently not used as requested

- SocketPoll::poll()
    - Additionally erases if !socket->isOpen() || socket->checkRemoval()

Signed-off-by: Sven Göthel <[email protected]>
Signed-off-by: Caolán McNamara <[email protected]>
Change-Id: I7e1a9329e0848c40a210f6250e29e26950da6fbc
Adding connection limit where registered via SocketPoll::setLimiter(),
i.e. DocumentBrokerPoll and WebServerPoll only.

SocketPoll::poll() will drop _new_ overhead connections exceeding MaxConnections.
This has been discussed, in favor of dropping the oldest connections.

Aligned net::Defaults::MaxConnections w/ pre-existing MAX_CONNECTIONS.

SocketPoll::setLimiter():
- Increments given non-zero connectionLimit by one
  for WS upgrade socket.

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I7e1a9329e0848c40a210f6250e29e26950da6fbc
test/UnitTimeoutBase.hpp
  - Base for UnitTimeout* tests
test/UnitTimeoutNone.cpp
  - Test without limitation, no disconnection
test/UnitTimeoutConnections.cpp
  - Test disconnection with limited max-connections (5) via net::Defaults
test/UnitTimeoutWSPing.cpp
  - Test disconnection with limited WSPingTimout (20us) using WSPingPeriod (10ms) via net::Defaults
test/UnitTimeoutSocket.cpp
  - Test disconnection with limited MinBytesPerSec (100kBps) via net::Defaults

test/WebSocketSession's WebSocketSession::poll()
added a isConnected() check allowing to abort.

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I7e1a9329e0848c40a210f6250e29e26950da6fbc
Signed-off-by: Sven Göthel <[email protected]>
Signed-off-by: Caolán McNamara <[email protected]>
Change-Id: I9f7a6fbbf93381de72471ea878582f8e5db88962
@caolanm caolanm force-pushed the private/sgothel/cool_9833_connection_limit branch from 1f60a19 to 607a062 Compare October 16, 2024 12:12
@caolanm
Copy link
Contributor

caolanm commented Oct 16, 2024

StreamSocket::checkRemoval does something on (cNow || cIDLE || cMinThroughput || cTermination ) where:

cTermination is SigUtil::getTerminationFlag() which looks straightforward

cNow is cNow = now < getCreationTime() which is presumably never in practice

cMinThroughput depends on _minBytesPerSec getting set to non-zero which is not enabled by default (test/UnitTimeoutSocket.cpp uses it) so doesn't (currently) affect normal usage

cIDLE is durLast > _pollTimeout; where _pollTimeout is 64secs (preexisting value as DefaultPollTimeoutMicroS)

@caolanm caolanm merged commit 621b5c2 into master Oct 16, 2024
13 checks passed
@caolanm caolanm deleted the private/sgothel/cool_9833_connection_limit branch October 16, 2024 13:36
sgothel pushed a commit that referenced this pull request Oct 23, 2024
Regression injected with commit 0579ee8
of PR #9916

Original behavior was to do nothing in Socket::shutdown(),
while the commit in question did setClosed() leading
to an eventual socket close within SocketPoll::poll().

The latter causes an error within system ::ppoll().

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I57d46e0428905137e05d34dad450fea1d6a0f127
caolanm pushed a commit that referenced this pull request Oct 23, 2024
Regression injected with commit 0579ee8
of PR #9916

Original behavior was to do nothing in Socket::shutdown(),
while the commit in question did setClosed() leading
to an eventual socket close within SocketPoll::poll().

The latter causes an error within system ::ppoll().

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I57d46e0428905137e05d34dad450fea1d6a0f127
caolanm pushed a commit that referenced this pull request Oct 23, 2024
Regression injected with commit 0579ee8
of PR #9916

Original behavior was to do nothing in Socket::shutdown(),
while the commit in question did setClosed() leading
to an eventual socket close within SocketPoll::poll().

The latter causes an error within system ::ppoll().

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I57d46e0428905137e05d34dad450fea1d6a0f127
sgothel pushed a commit that referenced this pull request Oct 25, 2024
…` minimum throughput

PR #9916 added `MinBytesPerSec` minimum throughput criteria for checkRemoval
to limit connections, i.e. cleaning up unresponsiveness or lagging sockets.

Due to its unreliable nature per request, the criteria has been disabled per default
but remained in the code test w/ UT UnitTimeoutSocket.

Turns out that even the UT may fail on certain test setups.

We decided to remove this criteria altogether, as it will nor be used.

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I9212cdf8b87c1c7b0df5761db956cbaa4fd44f58
sgothel pushed a commit that referenced this pull request Oct 25, 2024
…` minimum throughput

PR #9916 added `MinBytesPerSec` minimum throughput criteria for checkRemoval
to limit connections, i.e. cleaning up unresponsiveness or lagging sockets.

Due to its unreliable nature per request, the criteria has been disabled per default
but remained in the code test w/ UT UnitTimeoutSocket.

Turns out that even the UT may fail on certain test setups.

We decided to remove this criteria altogether, as it will nor be used.

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I9212cdf8b87c1c7b0df5761db956cbaa4fd44f58
andreasma pushed a commit to freeonlineoffice/online that referenced this pull request Oct 26, 2024
Regression injected with commit 0579ee81c709865ea420c2d70dcc443bdf69de1a
of PR CollaboraOnline/online#9916

Original behavior was to do nothing in Socket::shutdown(),
while the commit in question did setClosed() leading
to an eventual socket close within SocketPoll::poll().

The latter causes an error within system ::ppoll().

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: I57d46e0428905137e05d34dad450fea1d6a0f127
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Handle limited open Connections due to keepalive connections (cool#9621)
4 participants