Balance received message handling between connections #99

gavin-norman-sociomantic opened this issue Aug 1, 2017 · 8 comments

@gavin-norman-sociomantic

The current connection (read) handling works as follows:

  1. All connections waiting to read are registered with epoll.
  2. epoll_wait reports that one or more connections have received something.
  3. For each connection that has received something:
    a. Check if at least one full message has been received. If not, re-register with epoll.
    b. If one or more full messages have been received, iterate over them, passing them to the appropriate request handlers.

The problem is at 3b. Given the various input buffers that are in use, it's possible for hundreds (even thousands -- in experiments we have seen this) of messages on a connection to have been received and now be ready for handling. So, step 3b could actually take quite some time, leaving other connections and requests starved in the meantime.

This is not ideal asynchronous behaviour -- the assumption is that handling an event is quick, allowing other epoll clients to get their share of attention.
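A minimal sketch of the situation, as a standalone C toy rather than the actual D code: each connection reported by epoll_wait has its entire input buffer drained before the next connection is looked at, so a connection with a large backlog monopolises the cycle. The connection and message counts are invented for illustration; no real sockets or epoll calls are involved.

```c
#include <stdio.h>

enum { NUM_CONNS = 3 };

/* number of complete messages waiting in each connection's input buffer */
static int pending[NUM_CONNS] = { 1000, 2, 2 };

static void handle_message(int conn)
{
    (void)conn;  /* stand-in for passing one message to its request handler */
}

int main(void)
{
    /* one epoll cycle: connection 0 monopolises the handling loop */
    for (int conn = 0; conn < NUM_CONNS; conn++) {
        int handled = 0;
        while (pending[conn] > 0) {  /* step 3b: drain the whole buffer */
            handle_message(conn);
            pending[conn]--;
            handled++;
        }
        printf("connection %d: handled %d messages in one go\n", conn, handled);
    }
    return 0;
}
```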

One idea to fix this problem would be to do the following:

  1. All connections waiting to read are registered with epoll.
  2. epoll_wait reports that one or more connections have received something.
  3. For each connection that has received something:
    a. Check if at least one full message has been received. If not, re-register with epoll.
    b. If one or more full messages have been received, set a flag indicating that they are waiting to be handled.
  4. Iterate over all connections which are flagged as having messages ready to be handled, handling one message per connection until all have been handled. In this way, the handling of messages is balanced between connections.
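A minimal C sketch of step 4, again as a standalone toy with invented connection and message counts: flagged connections are drained round-robin, one message per connection per pass, so the backlog on one connection no longer delays the others.

```c
#include <stdio.h>
#include <stdbool.h>

enum { NUM_CONNS = 3 };

/* number of complete messages waiting in each connection's input buffer */
static int pending[NUM_CONNS] = { 1000, 2, 2 };

/* step 3b of the proposal: connections flagged as having messages to handle */
static bool flagged[NUM_CONNS];

static void handle_message(int conn)
{
    (void)conn;  /* stand-in for passing one message to its request handler */
}

int main(void)
{
    for (int c = 0; c < NUM_CONNS; c++)
        flagged[c] = pending[c] > 0;

    /* step 4: handle one message per flagged connection per pass */
    int passes = 0;
    bool handled_any;
    do {
        handled_any = false;
        for (int c = 0; c < NUM_CONNS; c++) {
            if (flagged[c]) {
                handle_message(c);
                if (--pending[c] == 0)
                    flagged[c] = false;
                handled_any = true;
            }
        }
        if (handled_any)
            passes++;
    } while (handled_any);

    printf("all messages handled after %d passes\n", passes);
    return 0;
}
```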
@gavin-norman-sociomantic
Author

Iterate over all connections which are flagged as having messages ready to be handled, handling one message per connection until all have been handled.

The exact finishing condition here is open to debate. It may, for example, make sense to also limit the number of messages handled (across all connections) per epoll cycle, so as not to overwhelm other epoll clients.
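For illustration, a variation of the round-robin sketch above with a hypothetical per-cycle cap (MAX_PER_CYCLE is an invented constant, as are the message counts): once the cap is reached, the remaining messages are left for the next epoll cycle.

```c
#include <stdio.h>
#include <stdbool.h>

enum { NUM_CONNS = 3, MAX_PER_CYCLE = 256 };  /* hypothetical cap per epoll cycle */

static int pending[NUM_CONNS] = { 1000, 2, 2 };

static void handle_message(int conn)
{
    (void)conn;  /* stand-in for passing one message to its request handler */
}

/* Handle at most MAX_PER_CYCLE messages, round-robin across connections.
 * Returns true if messages are left over for the next epoll cycle. */
static bool handle_one_cycle(void)
{
    int budget = MAX_PER_CYCLE;
    bool handled_any;
    do {
        handled_any = false;
        for (int c = 0; c < NUM_CONNS && budget > 0; c++) {
            if (pending[c] > 0) {
                handle_message(c);
                pending[c]--;
                budget--;
                handled_any = true;
            }
        }
    } while (handled_any && budget > 0);

    for (int c = 0; c < NUM_CONNS; c++)
        if (pending[c] > 0)
            return true;
    return false;
}

int main(void)
{
    int cycles = 1;
    while (handle_one_cycle())
        cycles++;
    printf("backlog drained over %d epoll cycles\n", cycles);
    return 0;
}
```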

@gavin-norman-sociomantic
Author

Iterate over all connections which are flagged as having messages ready to be handled, handling one message per connection until all have been handled. In this way, the handling of messages is balanced between connections.

This will require some means of performing an action after all epoll clients have been handled, before the next call to epoll_wait. To do this directly, we'll need to enhance EpollSelectDispatcher to add a more flexible system for specifying hooks, something like this.

Another way would be to rely on an event fd which is triggered every epoll cycle. It wouldn't be guaranteed to be handled at the end of the cycle, but simply processing pending connection messages once per cycle should be sufficient.
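A minimal standalone sketch of the eventfd idea in C, independent of EpollSelectDispatcher: a single eventfd is registered with epoll and re-triggered before every cycle, and its handler is the place where the pending connection messages would be processed.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

int main(void)
{
    int epfd = epoll_create1(0);
    int efd  = eventfd(0, 0);
    if (epfd < 0 || efd < 0) {
        perror("setup");
        return 1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);

    for (int cycle = 0; cycle < 3; cycle++) {
        /* trigger the eventfd so that it fires in the upcoming epoll_wait */
        uint64_t one = 1;
        if (write(efd, &one, sizeof one) != (ssize_t)sizeof one)
            break;

        struct epoll_event events[8];
        int n = epoll_wait(epfd, events, 8, -1);

        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == efd) {
                uint64_t count;
                if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
                    break;
                /* here: process the pending messages of all flagged connections */
                printf("cycle %d: handling pending connection messages\n", cycle);
            }
        }
    }

    close(efd);
    close(epfd);
    return 0;
}
```

Because the eventfd is written before each epoll_wait, it is always among the reported events, which matches the "triggered every epoll cycle" behaviour described above; as noted, nothing guarantees it is handled last in the cycle.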

@nemanja-boric-sociomantic
Contributor

A small disadvantage of using an always-ready eventfd is doubling the number of open file descriptors. However, it also solves the problem of starving other select clients, since we would go back to epoll_wait frequently.

@gavin-norman-sociomantic
Author

Why doubling? One event fd for the whole system, not one per connection.

@nemanja-boric-sociomantic
Contributor

Ah, it wouldn't actually double it, yes, I was just going to write that.

@david-eckardt-sociomantic
Contributor

It's possible for hundreds (even thousands -- in experiments we have seen this) of messages on a connection to have been received and now be ready for handling. So, step 3b could actually take quite some time, leaving other connections and requests starved in the meantime.

The processing time is not the only problem. Without a way of suspending the read loop fiber while iterating over hundreds of received messages, a client application can be flooded with received messages without a chance of processing them or throttling the input stream.
The situation is that the read loop fiber is suspended only after all messages in the MessageReceiver's buffer have been delivered to the requests. Suspending the read loop fiber may be needed:

  • to send a message, such as when the client application wants to suspend a streaming request,
  • for the client application to process the received messages.

Many client applications need an event loop cycle to process received records, for example because they schedule a task to send another record. But because the next event loop cycle happens only after all of the hundreds of received messages have been delivered to the request handler, a large number of scheduled tasks or outgoing messages can stack up in the application. There can be so many that they exhaust resources such as the capacity of the task or message queues.
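A toy illustration of this point (capacities and counts are invented): if every received message schedules one task but the task queue is only drained between event loop cycles, a burst delivered within a single cycle can overflow a bounded queue.

```c
#include <stdio.h>
#include <stdbool.h>

enum { TASK_QUEUE_CAPACITY = 256, MESSAGES_IN_BUFFER = 1000 };  /* invented numbers */

static int queued_tasks;

/* stand-in for the application scheduling a task per received record */
static bool schedule_task(void)
{
    if (queued_tasks == TASK_QUEUE_CAPACITY)
        return false;  /* queue full -- this is the failure mode in question */
    queued_tasks++;
    return true;
}

int main(void)
{
    int dropped = 0;

    /* all buffered messages are delivered before the next event loop cycle */
    for (int msg = 0; msg < MESSAGES_IN_BUFFER; msg++)
        if (!schedule_task())
            dropped++;

    /* only now would an event loop cycle run and drain the task queue */
    printf("%d of %d scheduled tasks did not fit into the queue\n",
           dropped, MESSAGES_IN_BUFFER);
    return 0;
}
```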

@gavin-norman-sociomantic
Author

But because the next event loop cycle happens only after all of the hundreds of received messages have been delivered to the request handler, a large number of scheduled tasks or outgoing messages can stack up in the application. There can be so many that they exhaust resources such as the capacity of the task or message queues.

Right, this ties into what I wrote above:

The exact finishing condition here is open to debate. It may, for example, make sense to also limit the number of messages handled (across all connections) per epoll cycle, so as not to overwhelm other epoll clients.

If we implement this at some point, we'll need to experiment with different approaches to see what works best.

@gavin-norman-sociomantic
Author

I wonder if this is necessary any more. The idea originated in our troubles with stream requests, which have now been (or are being) re-implemented as batch requests.

@david-eckardt-sociomantic @nemanja-boric-sociomantic what do you think?
