Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] srt-live-transmit stops listening after SRT disconnect during no active input #2997

Open
Frenzie opened this issue Aug 5, 2024 · 7 comments
Labels
[apps] Area: Test applications related improvements help wanted Indicates that a maintainer wants help on an issue or pull request Type: Bug Indicates an unexpected problem or unintended behavior

Comments

@Frenzie
Copy link

Frenzie commented Aug 5, 2024

Describe the bug
srt-live-transmit stops listening on the configured port if the SRT connection was closed at a moment when there was no input

To Reproduce
Steps to reproduce the behavior:

  1. Start srt-live-transmit and stop sending input. Or to simulate, just pipe nothing into it: cat | srt-live-transmit -v file://con srt://127.0.0.1:5000?mode=listener
  2. Start a receiver: srt-live-transmit -v srt://127.0.0.1:5000?mode=caller udp://127.0.0.1:5001
  3. Stop the receiver

If you now try to reconnect or check with for example netstat -anu | grep 5000 you'll see the bind has disappeared.

Expected behavior
The listener never stops listening on the configured port.

Desktop (please provide the following information):

  • OS: Linux (Debian 12, Ubuntu 24.04)
  • SRT Version / commit ID: 1.5.3
@Frenzie Frenzie added the Type: Bug Indicates an unexpected problem or unintended behavior label Aug 5, 2024
@maxsharabayko maxsharabayko added [apps] Area: Test applications related improvements help wanted Indicates that a maintainer wants help on an issue or pull request labels Aug 5, 2024
@ethouris
Copy link
Collaborator

ethouris commented Jan 9, 2025

I can't reproduce it according to the given steps. When I kill the caller, and then run the same caller again without stopping the listener, it still gets connected, every single time.

There could be something specific, so please enable debug logs when testing. See docs/dev/making-srt-better.md for a rough manual.

@Frenzie
Copy link
Author

Frenzie commented Jan 9, 2025

On which system did you test? Something other than Linux? Additional debug logs don't show anything because it doesn't do anything after step 3.

[removed irrelevant logs]

Edit: pardon, that was accidentally the system copy. This is what it's doing. But it does so on every system I've tried, which is more (but not a lot more) than just those in the OP.

14:14:26.954100/SRT:GC D:SRT.in: GC: sleep 1 s
14:14:27.609872/SRT:RcvQ:w1 D:SRT.qr: INCOMING PACKET: FROM=127.0.0.1:33091 BOUND=127.0.0.1:5000 TARGET=@0 CONTROL: size=48 type=handshake HS: version=4 type=0x2 ISN=447435573 MSS=1500 FLW=8192 reqtype=induction srcID=214896679 cookie=0 srcIP=127.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.
14:14:27.609921/SRT:RcvQ:w1 D:SRT.cn: Got sockID=0 from 127.0.0.1:33091 - trying to resolve it as a connection request...
14:14:27.609930/SRT:RcvQ:w1 D:SRT.cn: RID: NO CONNECTOR FOR ADR:127.0.0.1:33091 while looking for A NEW CONNECTION REQUEST (0 connectors total)
14:14:27.609938/SRT:RcvQ:w1 D:SRT.cn: AsyncOrRND: no sockets expect connection from 127.0.0.1:33091 - POSSIBLE ATTACK, ignore packet
14:14:27.609944/SRT:RcvQ:w1 D:SRT.qr: worker: result for the unit: AGAIN
14:14:27.609951/SRT:RcvQ:w1 D:SRT.qr: worker: packet not dispatched, continuing reading.
14:14:27.860079/SRT:RcvQ:w1 D:SRT.qr: INCOMING PACKET: FROM=127.0.0.1:33091 BOUND=127.0.0.1:5000 TARGET=@0 CONTROL: size=48 type=handshake HS: version=4 type=0x2 ISN=447435573 MSS=1500 FLW=8192 reqtype=induction srcID=214896679 cookie=0 srcIP=127.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.
14:14:27.860141/SRT:RcvQ:w1 D:SRT.cn: Got sockID=0 from 127.0.0.1:33091 - trying to resolve it as a connection request...
14:14:27.860151/SRT:RcvQ:w1 D:SRT.cn: RID: NO CONNECTOR FOR ADR:127.0.0.1:33091 while looking for A NEW CONNECTION REQUEST (0 connectors total)
14:14:27.860159/SRT:RcvQ:w1 D:SRT.cn: AsyncOrRND: no sockets expect connection from 127.0.0.1:33091 - POSSIBLE ATTACK, ignore packet
14:14:27.860167/SRT:RcvQ:w1 D:SRT.qr: worker: result for the unit: AGAIN
14:14:27.860172/SRT:RcvQ:w1 D:SRT.qr: worker: packet not dispatched, continuing reading.
14:14:27.954211/SRT:GC D:SRT.sm: checkBrokenSockets: @508057009 closed 1000128.000000us ago and removed from RcvQ - will remove
14:14:27.954289/SRT:GC.N:SRT.sm: @508057009 busy=0
14:14:27.954310/SRT:GC D:SRT.ei: epoll/update: @508057009 -[R][W][E]: E1 TRACKING: @508057009:[R][E] 
14:14:27.954317/SRT:GC D:SRT.sm: GC/removeSocket: closing associated UDT @508057009
14:14:27.954322/SRT:GC D:SRT.sm: @508057009: closing socket
14:14:27.954330/SRT:GC D:SRT.sm: @508057009: close: SETTING ERR readiness on E[ 1 ]
14:14:27.954340/SRT:GC D:SRT.ei: epoll/update: @508057009 +[E]: E1 TRACKING: @508057009:[R][E] 
14:14:27.954346/SRT:GC D:SRT.sm: @508057009: close: CLEARING subscription on E1
14:14:27.954352/SRT:GC D:SRT.ea: srt_epoll_update_usock: REMOVED E1 socket @508057009
14:14:27.954359/SRT:GC D:SRT.sm: @508057009: close: removing E1 from back-subscribers
14:14:27.954366/SRT:GC D:SRT.sm: @508057009: CLOSING STATE. Acquiring connection lock
14:14:27.954371/SRT:GC D:SRT.sm: @508057009: CLOSING, removing from listener/connector
14:14:27.954379/SRT:GC D:SRT.sm: @508057009: CLOSING, joining send/receive threads
14:14:27.954388/SRT:GC D:SRT.sm: GC/removeSocket: DELETING SOCKET @508057009
14:14:27.954606/SRT:GC D:SRT.sm: GC/removeSocket: socket @508057009 DELETED. Checking muxer.
14:14:27.954614/SRT:GC D:SRT.sm: unrefing underlying muxer 508057010 for @508057009, ref=0
14:14:27.954618/SRT:GC D:SRT.sm: MUXER id=508057010 lost last socket @508057009 - deleting muxer bound to port 5000
14:14:27.954623/SRT:GC D:SRT.rs: RcvQueue: EXIT
14:14:27.956160/SRT:RcvQ:w1 D:SRT.qr: worker: EXIT
14:14:27.956332/SRT:GC D:SRT.rs: SndQueue: EXIT
14:14:27.956440/SRT:GC D:SRT.sm: checkBrokenSockets: after removal: m_ClosedSockets.size()=0
14:14:27.956456/SRT:GC D:SRT.in: GC: sleep 1 s
14:14:28.956577/SRT:GC D:SRT.sm: checkBrokenSockets: after removal: m_ClosedSockets.size()=0
14:14:28.956647/SRT:GC D:SRT.in: GC: sleep 1 s
14:14:29.956733/SRT:GC D:SRT.sm: checkBrokenSockets: after removal: m_ClosedSockets.size()=0
14:14:29.956773/SRT:GC D:SRT.in: GC: sleep 1 s
14:14:30.956834/SRT:GC D:SRT.sm: checkBrokenSockets: after removal: m_ClosedSockets.size()=0
14:14:30.956897/SRT:GC D:SRT.in: GC: sleep 1 s

@ethouris
Copy link
Collaborator

That test I did on WSL, I guess it should the same as a usual Linux (SuSE distro). Hard to know from upside what happens because various nuisances may matter here, including network settings, firewalls, or even some distinct system settings.

Not sure what logs you stated as "irrelevant", but I can see something with RID.

This RID (the name confusion dates back to UDT times, so don't try to pick it up logically ;) is the container that should manage two kind of things: rendezvous connections (for which it needs first the connecting rendezvous socket on the connection receiving side as well) and also normal connections to a listener in non-blocking mode - but this should be used on the caller side only. The appearance of these log lines then:

14:14:27.609921/SRT:RcvQ:w1 D:SRT.cn: Got sockID=0 from 127.0.0.1:33091 - trying to resolve it as a connection request...
14:14:27.609930/SRT:RcvQ:w1 D:SRT.cn: RID: NO CONNECTOR FOR ADR:127.0.0.1:33091 while looking for A NEW CONNECTION REQUEST (0 connectors total)

in particular, the second one after the first one, means that it has jumped over the branch where it tries to dispatch it to the listener because there's no listener here (queue.cpp, worker_ProcessConnectionRequest).

So yes, this confirms that the listener socket indeed has been closed, but I need to understand, what exactly has happened to make the listener socket closed, while the application is still working. I'd need the complete log for it, you can attach it as a large file if need be.

Maybe this: try first to run the listener side app with -v option and check if the application isn't explicitly closing the socket (when it does, it should display some error or additional information). Maybe there's some logic that leads to it somehow. To my best knowledge, listener sockets do not get closed by itself anyhow and can be only closed explicitly. If it isn't clear why it does, then do the same with logs on, and then I can check things around this.

@Frenzie
Copy link
Author

Frenzie commented Jan 10, 2025

That test I did on WSL, I guess it should the same as a usual Linux (SuSE distro).

The behavior as described in the OP is identical on OpenSUSE Tumbleweed 20250108 (srt 1.5.4 from zypper).

Not sure what logs you stated as "irrelevant", but I can see something with RID.

I accidentally posted the wrong logs, which I removed. Apologies for the confusion.

Maybe this: try first to run the listener side app with -v option

These are the logs that I had removed from the other comment:

$ cat | srt-live-transmit -v file://con srt://127.0.0.1:5000?mode=listener
Media path: 'file://con' --> 'srt://127.0.0.1:5000?mode=listener'
SRT parameters specified:

        mode = 'listener'
Opening SRT target listener on 127.0.0.1:5000
14:07:04.385907/srt-live-transm D:SRT.sm: generateSocketID: : @189688441
Binding a server on 127.0.0.1:5000 ...
14:07:04.385966/srt-live-transm D:SRT.km: CHANNEL: Bound to local address: 127.0.0.1:5000
 listen...
14:07:04.465002/SRT:RcvQ:w1.N:SRT.cn: PASSING request from: 127.0.0.1:45388 to agent:189688441
14:07:04.465160/SRT:RcvQ:w1.N:SRT.cn: Listener managed the connection request from: 127.0.0.1:45388 result:waveahand
14:07:04.465260/SRT:RcvQ:w1.N:SRT.cn: PASSING request from: 127.0.0.1:45388 to agent:189688441
14:07:04.465309/SRT:RcvQ:w1 D:SRT.sm: generateSocketID: : @189688440
14:07:04.465327/SRT:RcvQ:w1.N:SRT.cn: HSREQ/rcv: cmd=1(HSREQ) len=12 vers=0x10503 opts=0xbf delay=120
14:07:04.465912/SRT:RcvQ:w1.N:SRT.cn: @189688441: listen ret: -1 - conclusion
14:07:04.465917/SRT:RcvQ:w1.N:SRT.cn: Listener managed the connection request from: 127.0.0.1:45388 result:waveahand
 accept... 
 connected.
Accepted SRT target connection

That's the full extent of the regular logs. Nothing ever happens thereafter.

Here's the full debug log: log.txt

@ethouris
Copy link
Collaborator

Wait. I have a feeling that your procedure when producing this above log doesn't exactly match the aforementioned repro procedure.

In comparison to the log that I have, yours doesn't even show any case of the second connection attempt. There's one connection, which is later broken, and then it looks like the listener was interrupted in the console.

Could you please repeat this?

@Frenzie
Copy link
Author

Frenzie commented Jan 13, 2025

In comparison to the log that I have, yours doesn't even show any case of the second connection attempt. There's one connection, which is later broken, and then it looks like the listener was interrupted in the console.

Well of course there's no second connection attempt. :-) It stops listening after the connection is broken. That's the very problem!

@Frenzie
Copy link
Author

Frenzie commented Jan 13, 2025

In short, if you have input going at the moment of disconnect, the bound port is never released (or immediately rebound?) and all is well. This can be verified with netstat -anu | grep 5000, or simply by the fact that an SRT caller can make a connection.

If there doesn't happen to be any input at the moment of disconnect, the port is released while the program continues running as if all is well. This can be verified with netstat -anu | grep 5000. As a result, there can't be a connection attempt.

This problem only exists with file://con as input, not with for example UDP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[apps] Area: Test applications related improvements help wanted Indicates that a maintainer wants help on an issue or pull request Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants