[BUG] Connection lost with broadcast mode when one of the links is disconnected #2828
Comments
srt-xtransmit-B0-physical-disconnect-connection-loss
@clime's comment:
PCAP for B0: no response from A0 starting at 12.440011942 (07:18:26.058145258).
PCAP for B1: no data packets are sent from 12.1265 to 12.7568 (for ~600 ms). Wallclock 07:18:28.669548076.
I don't see any connection loss on B1. DATA packets keep coming.
Each full ACK carries a sequence number close to that of the most recently received DATA packet, except for ACK 2007, which acknowledges a data packet received at 11.388787. And that's around 1 s after link A0--B0 was broken.
So SRT sends ACK 2007 1.4 seconds later than it should.
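For anyone who wants to check their own captures for a pause like the ~600 ms one on B1, here is a minimal sketch. It assumes the DATA packet timestamps have already been exported from the pcap into a list of seconds; the function name `find_gaps` and the 0.5 s threshold are made up for illustration, and only the two timestamps 12.1265 and 12.7568 come from the capture discussed above.

```python
def find_gaps(timestamps, threshold_s=0.5):
    """Return (start, end, duration) for every pause longer than threshold_s.

    timestamps: capture-relative packet times in seconds, in ascending order.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > threshold_s:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Illustrative data: only 12.1265 and 12.7568 are taken from the B1 capture,
# the surrounding values are invented to make the example complete.
ts = [12.1063, 12.1164, 12.1265, 12.7568, 12.7669]

for start, end, duration in find_gaps(ts):
    print(f"no DATA packets from {start:.4f} to {end:.4f} ({duration * 1000:.0f} ms)")
```

Running the same check on the captures from both ends of a link should show whether the pause starts at the sender or only appears at the receiver.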
@clime I still think we do need pcaps from both sides. Meaning, for example: you have machines A and B with links 0 and 1. You make the broadcast group with A0->B0 and A1->B1. In the test you physically break the A0->B0 link. In this case we need pcaps recorded on devices A1 and B1. The other two could be useful, but are not that important (we expect the link to break there), for example to correlate events with the link break. This pcap set (A1 and B1) is essential to see which packets departed from a given device but didn't arrive at the destination, in a case where no disruption was expected on this link.
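The comparison described above boils down to a set difference on sequence numbers. A rough sketch, assuming the DATA packet sequence numbers have already been extracted from the A1 and B1 captures into two lists (the function name `lost_in_transit` is hypothetical, and sequence number wraparound and retransmissions are deliberately ignored here):

```python
def lost_in_transit(sent_seqnos, received_seqnos):
    """Sequence numbers seen leaving A1 that never show up in the B1 capture."""
    return sorted(set(sent_seqnos) - set(received_seqnos))


# Invented sample data, not taken from the attached pcaps.
sent_on_a1 = [1001, 1002, 1003, 1004, 1005]
seen_on_b1 = [1001, 1002, 1004]

print(lost_in_transit(sent_on_a1, seen_on_b1))
```

An empty result for the period around the disconnect would suggest the surviving link itself delivered everything that was sent on it.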
Here are 2 pcaps where we can see the connection being lost:
listener_2828.zip
pcaps_logs_2828_3.zip
The pcaps only show SRT packets, but I still have the full captures if needed.
This issue can be reproduced pretty fast by giving the caller 2 IP addresses, one of them unreachable:
SRT will connect through actual_ip. The connection through fake_ip will obviously fail. After a while, the connection through actual_ip will break. This only seems to reproduce over Ethernet.
@ethouris I think yomnes0 has done the job already (thanks! :)) but I've uploaded some additional captures (from all the interfaces) to the Slack thread. I have also tested a 50 Mb/s transfer, where the issue didn't seem to be reproducible.
OK, there's nothing in the logs - it looks like the code wasn't properly configured for logging. In the pcaps I can see one connection break with stopped transmission - but the transmission stopped because the ACK packet reported no space left in the receiver buffer. The reason for the broken connection and the attempt to restore it is unclear, but it could just as well be that the application closed it.
@ethouris You are talking about https://srtalliance.slack.com/files/U01C757PSG7/F06BVTHEQ6A/srt-xtransmit-redundancy-all-pcaps-lost-connection-b0-physical-disconnect.tar, right? So I guess the problem is in the size of receive buffers? Is this configurable somewhere? Btw. if somebody could check:
I'm talking about the results that Yannick provided. Note that there are two distinct possible behaviors of the reader. One is when the receiver has completely stopped reading, in which case there's nothing the sender can do but break the connection. The other is reading the packets too slowly, or having some temporary spike in the data reading, which could probably be mitigated by increasing the receiver buffer size. I'll check what is in the pcaps you provided.
@yomnes0, @clime P.S. Please also note the Configuration Guidelines.
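To give a feel for the sizing, here is a rough Python sketch of the kind of receiver buffer estimate the Configuration Guidelines describe, written from memory, so please check it against the guidelines before relying on it. The 50 Mb/s rate is the one tested earlier in this thread; the 20 ms RTT, 120 ms latency and 1316-byte payload are assumptions chosen only for the example, and `target_rcvbuf` is not an SRT API function.

```python
UDP_HDR_SIZE = 28  # IPv4 header (20 bytes) + UDP header (8 bytes)


def target_rcvbuf(rtt_ms, bitrate_bps, payload_bytes, latency_ms, mss=1500):
    """Estimate how many packets and bytes the receiver buffer should hold.

    The buffer has to cover everything that can be in flight during the
    configured latency plus half a round trip.
    """
    payload_total = (latency_ms + rtt_ms // 2) * bitrate_bps // 1000 // 8
    packets = payload_total // payload_bytes
    size_bytes = packets * (mss - UDP_HDR_SIZE)
    return packets, size_bytes


# Assumed link parameters, for illustration only.
packets, size_bytes = target_rcvbuf(
    rtt_ms=20, bitrate_bps=50_000_000, payload_bytes=1316, latency_ms=120
)
print(f"packets: {packets}, buffer size: {size_bytes} bytes")
```

The byte value is what one would pass as SRTO_RCVBUF. As far as I understand, the flow control window (SRTO_FC) should be no smaller than the packet count, otherwise it becomes the limiting factor.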
Closing due to inactivity (possibly fixed).
Setup:
In my testing setup, I have two machines (A, B) with 2 NICs each - so let's call these NICs: A0, A1 (machine A) and B0, B1 (machine B). A0, B0 NICs are connected directly by an ethernet cable and A1, B1 are connected through a switch.
Action:
Disconnected B0 NIC.
Listener's log on machine B:
Caller's log on machine A:
NOTES:
This doesn't always happen. I've seen it happen twice in about 5 tries with exactly the same testing setup. In the other attempts, the connection survived thanks to the remaining connected link.
Discussed at https://srtalliance.slack.com/archives/C79B8M2SZ/p1699464592483509
OS: linux
SRT Version / commit ID: tested with srt-xtransmit 27186a69. It links srt-1.5.3.rc0.