Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling Stop() on Initiator blocks on Dial method #653

Closed
abronan opened this issue Jul 15, 2024 · 0 comments · Fixed by #654
Closed

Calling Stop() on Initiator blocks on Dial method #653

abronan opened this issue Jul 15, 2024 · 0 comments · Fixed by #654

Comments

@abronan
Copy link
Contributor

abronan commented Jul 15, 2024

When calling Start() and then Stop() on the initiator with a connection that is failing or timing out, sending a SIGTERM prevents the program from exiting because of the call to Dial in handleConnection.

Reducing the SocketTimeout to a lower value mitigates this issue, since the call to Dial will exit early and then exit when checking on the stopChan in waitForReconnectInterval. However this is not a clean solution and prevents flexibility from consumers of the library to be able to stop the connect process immediately.

The reason why is that the handleConnection method is taking in a proxy.Dialer which doesn't contain a dialer.DialContext method required to cancel the connection mid-way with a cancellation signal.

As a solution, we can cast the proxy.Dialer to a proxy.ContextDialer and let the call to Stop() properly exit the connection process with the use of a context on handleConnection.

abronan added a commit to abronan/quickfix that referenced this issue Jul 15, 2024
This commit fixes an issue when calling Start() and then
Stop() on the initiator while the connection is likely
to fail and timeout. Sending a SIGTERM will fail since
Dial will attempt to connect until it times out and returns
on the 'waitForReconnectInterval' call.

We mitigate this problem by using a proxy.ContextDialer and
allowing to pass a context with cancellation method to the
dialer.DialContext method on 'handleConnection'.

We need to start a routine listening for the stopChan in
order to call cancel() explicitly and thus exit the DialContext
method.

Note: there are scenarios where cancel() will be called twice,
this choice was made in order to avoid a larger refactor of the
reconnect logic, but since the call to cancel() is idempotent,
this doesn't lead to any adverse effect.

fixes quickfixgo#653

Signed-off-by: Alexandre Beslic <[email protected]>
abronan added a commit to abronan/quickfix that referenced this issue Jul 18, 2024
This commit fixes an issue when calling Start() and then
Stop() on the initiator while the connection is likely
to fail and timeout. Calling initiator.Stop() will block since
Dial will attempt to connect until it times out and returns
on the 'waitForReconnectInterval' call.

We mitigate this problem by using a proxy.ContextDialer and
allowing to pass a context with cancellation method to the
dialer.DialContext method on 'handleConnection'.

We need to start a routine listening for the stopChan in
order to call cancel() explicitly and thus exit the DialContext
method.

Note: there are scenarios where cancel() will be called twice,
this choice was made in order to avoid a larger refactor of the
reconnect logic, but since the call to cancel() is idempotent,
this doesn't lead to any adverse effect.

fixes quickfixgo#653

Signed-off-by: Alexandre Beslic <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant