
[core] Fix FlowWindowSize management when sender drops packets #2815

Merged: 1 commit, Nov 3, 2023

Conversation

yomnes0
Collaborator

@yomnes0 yomnes0 commented Oct 26, 2023

No description provided.

@maxsharabayko maxsharabayko added this to the v1.6.0 milestone Oct 26, 2023
@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Oct 26, 2023
@yomnes0
Collaborator Author

yomnes0 commented Oct 27, 2023

When the sender drops packets, it generates a fake ACK. In some cases this could end up in a situation where its ACK sequence number was always higher than the one coming from the receiver. Because of that, incoming ACKs were ignored and the FlowWindowSize was never updated.

Without this fix, a 50 Mbps transmission at 1200 ms latency over a 21% loss network with a 200 ms RTT would end up with 50% unrecovered packets. With this fix, this goes down to below 1%, which is in line with other cases where the problem didn't appear.
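The mechanism can be illustrated with a simplified sketch (this is not the actual SRT code; `seqcmp` only mimics the spirit of `CSeqNo::seqcmp`, and the struct and field names are invented for illustration). An ACK whose sequence number does not exceed the locally recorded last-ACK position is discarded, so once a sender-side drop advances that position past the receiver's real ACK sequence, every genuine ACK, together with the flow window update it carries, is ignored:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Simplified circular sequence comparison in the spirit of SRT's
// CSeqNo::seqcmp (illustrative only; SRT's threshold handling differs).
static const int32_t SEQ_TH = 0x3FFFFFFF;
int seqcmp(int32_t a, int32_t b)
{
    return (std::abs(a - b) < SEQ_TH) ? (a - b) : (b - a);
}

struct Sender
{
    int32_t lastAckSeq;     // highest ACK sequence processed so far
    int     flowWindowSize; // peer's advertised free buffer, in packets

    // Process an incoming ACK carrying the peer's available buffer size.
    // Returns true if the ACK was accepted.
    bool onAck(int32_t ackSeq, int peerFreeBuf)
    {
        if (seqcmp(ackSeq, lastAckSeq) <= 0)
            return false;              // stale ACK: ignored entirely
        lastAckSeq = ackSeq;
        flowWindowSize = peerFreeBuf;  // the update that was being lost
        return true;
    }

    // Sender-side drop: before the fix, advancing lastAckSeq past the
    // receiver's actual ACK position made all later real ACKs look stale.
    void dropUpTo(int32_t seq)
    {
        if (seqcmp(seq, lastAckSeq) > 0)
            lastAckSeq = seq;
    }
};
```

With this model, after `dropUpTo(1000)` a genuine ACK for sequence 900 is rejected and `flowWindowSize` stays frozen, which matches the behavior described above.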

@yomnes0
Collaborator Author

yomnes0 commented Oct 30, 2023

Here are some test results:

With the fix, default buffer size, 100 ms RTT network, 25% packet loss:

Latency   Lost packets   Dropped packets   Late packets   Unrecovered packets (%)
1000 ms          70702                 5             86                      0.13
1200 ms          71290                 2             66                      0.09
1300 ms          71661                64             58                      0.17
1400 ms          81443               688             85                      0.94

With the fix, a receive buffer size of 1000000000 bytes, 100 ms RTT network, 25% packet loss:

Latency   Lost packets   Dropped packets   Late packets   Unrecovered packets (%)
1000 ms          71199                 2             77                      0.11
1200 ms          70753                 0             69                      0.10
1300 ms          71165                 0             55                      0.05
1400 ms          71235                 0             13                      0.02

With the fix, 200 ms RTT network, 25% packet loss. Increasing the receive buffer improves the results the same way it did with the 100 ms RTT network.

Latency   Lost packets   Dropped packets   Late packets   Unrecovered packets (%)
1000 ms          71065              1075            233                      1.84
1200 ms          70968               366             46                      0.58
1300 ms          72389               341             55                      0.55
1400 ms          75285               890            182                      1.42

For reference, here are the results I had without the fix, in a network with a 200 ms RTT and 21% loss (I didn't go up to 25%, but the results still speak for themselves). With latency greater than 1200 ms, results were the same as at 1200 ms (>= 50% unrecovered packets).

Latency   Lost packets   Dropped packets   Late packets   Unrecovered packets (%)
1000 ms          59238               349            134                      0.82
1200 ms          60574             30980             41                     51.21

@maxsharabayko
Collaborator

Results with the fix are collected in a 100 ms RTT network with 25% packet loss and a 200 ms RTT network with 25% packet loss.
Results without the fix are collected in a 200 ms RTT network with 21% packet loss.

Comparing results with the fix (200 ms RTT, 25% packet loss) and without the fix (200 ms RTT, 21% packet loss), there are a lot more dropped packets with the fix at 1 s latency (1075 vs 349), but far fewer at 1.2 s latency (366 vs 30980).

Although the results do not provide a solid base for comparison, it looks like the fix uses the available size of the receiver buffer more efficiently.

@ethouris
Collaborator

ethouris commented Nov 1, 2023

From what I can see of what the m_iFlowWindowSize field designates, it looks like it's the size of the peer's receiver buffer in packets.

How, then, can dropping from the sender buffer virtually increase the free space in the peer's receiver buffer?

I understand that it's just a "statement" that the same packets that are dropped on the sender side will also be dropped (independently) on the receiver side when the time comes to play a packet following the loss. The problem is, we don't know exactly when this is going to happen. The perfect solution would be to define the exact time, namely the "peer's play time minus RTT" of the packet that closes the earliest gap, as the time up to which packets on the sender side should be dismissed. This could at least replicate the phenomenon happening on the receiver side at that moment, or very close to it.
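The proposed timing rule could be sketched like this (purely illustrative; the function name and the clock model are assumptions, not SRT API):

```cpp
#include <cassert>
#include <cstdint>

// All times in microseconds on the sender's clock.
// A packet scheduled at origin time T is played by the peer at roughly
// T + latency; the suggestion above is to dismiss sender-side packets no
// earlier than that play time minus RTT, i.e. the latest moment at which
// a retransmission could still arrive in time to be played.
int64_t sndDropDeadline(int64_t pktOriginTimeUs, int64_t latencyUs, int64_t rttUs)
{
    return pktOriginTimeUs + latencyUs - rttUs;
}
```

For example, with 1200 ms latency and 200 ms RTT, a packet originated at t = 0 would not be dismissed from the sender buffer before the 1000 ms mark.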

Don't forget that people have been complaining here that they get many more drops when sender dropping is enabled. I added the option to configure the sender dropping time mainly for development needs, to see whether, by adjusting this setting, you can get fewer drops on the same link.

@yomnes0
Collaborator Author

yomnes0 commented Nov 1, 2023

> Results with the fix are collected in a 100 ms RTT network with 25% packet loss and a 200 ms RTT network with 25% packet loss. Results without the fix are collected in a 200 ms RTT network with 21% packet loss.
>
> Comparing results with the fix (200 ms RTT, 25% packet loss) and without the fix (200 ms RTT, 21% packet loss), there are a lot more dropped packets with the fix at 1 s latency (1075 vs 349), but far fewer at 1.2 s latency (366 vs 30980).
>
> Although the results do not provide a solid base for comparison, it looks like the fix uses the available size of the receiver buffer more efficiently.

I'll add some test results with the fix at 21% loss.

@maxsharabayko
Collaborator

@ethouris

> I understand that it's just a "statement" that these same packets that are dropped on the sender side will also be dropped (independently) on the receiver side when the time to play a packet following the loss comes. The problem is, we don't exactly know when this is going to happen.

The assumption is that the TL packet drop on the sender side happens later than the corresponding TL packet drop on the receiver, given some later packet has been received. Even assuming no later packets have been received, sending the following packets would eventually trigger TL packet drop on the receiver anyway, freeing up space for further packets.
If the receiver fails to receive any packet for a decent time, resulting in the SND seqno being much further ahead of the RCV seqno, the sequence discrepancy would eventually be detected. The connection would then be broken either by the sequence discrepancy or by the idle (no response) timeout.

@maxsharabayko
Collaborator

maxsharabayko commented Nov 1, 2023

Ran tests to determine the RCV buffer size leading to a 0% packet drop.

50 Mbps, 100 ms RTT, 25% packet loss, SRT latency 1.3 seconds.

The target RCV buffer size recommended by the Configuration Guidelines is 9436992 bytes.
For 130 ms RTT it would be 9541504 bytes.

The value to recommend based on the results below is 11936992 (+26%).
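The guideline figures quoted here can be reproduced with the following sketch. The constants are my reading of the Configuration Guidelines (1316 bytes of live-mode payload per packet, 1472 bytes accounted per buffered packet), so treat them as assumptions; the arithmetic does match all three target values above:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Approximate target receiver buffer size: enough full packets to cover
// (latency + RTT/2) seconds of payload at the given bitrate.
// Assumed constants: 1316 B live payload per packet, 1472 B counted per
// buffered packet (payload plus per-packet overhead).
int64_t rcvBufferSizeBytes(double bitrate_bps, double rtt_s, double latency_s)
{
    const double payloadPerPkt = 1316.0;
    const double bytesPerPkt   = 1472.0;
    double pktsPerSec = bitrate_bps / 8.0 / payloadPerPkt;
    int64_t pkts = static_cast<int64_t>(
        std::floor(pktsPerSec * (latency_s + rtt_s / 2.0)));
    return pkts * static_cast<int64_t>(bytesPerPkt);
}
```

This yields 9436992 bytes for (50 Mbps, 100 ms RTT, 1.3 s latency), 9541504 for 130 ms RTT, and 7339392 for 1.0 s latency, matching the figures quoted in this thread.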

SRT master.

RCVBUF:  9436992 pktRcvUnique 72750, pktRcvDrop: 20131 (27.67%)
RCVBUF:  9936992 pktRcvUnique 70255, pktRcvDrop: 20075 (28.57%)
RCVBUF: 10436992 pktRcvUnique 74912, pktRcvDrop: 15810 (21.1%)
RCVBUF: 10936992 pktRcvUnique 82331, pktRcvDrop: 10879 (13.21%)
RCVBUF: 11436992 pktRcvUnique 72003, pktRcvDrop: 19024 (26.42%)
RCVBUF: 11936992 pktRcvUnique 94710, pktRcvDrop:     4 (0%)
RCVBUF: 12436992 pktRcvUnique 94602, pktRcvDrop:     1 (0%)
RCVBUF: 12936992 pktRcvUnique 94651, pktRcvDrop:     0 (0%)
RCVBUF: 13436992 pktRcvUnique 94560, pktRcvDrop:     0 (0%)

SRT with the FIX (PR 2815)

RCVBUF:  9436992 pktRcvUnique 89856, pktRcvDrop: 3393 (3.78%)
RCVBUF:  9936992 pktRcvUnique 93241, pktRcvDrop: 1268 (1.36%)
RCVBUF: 10436992 pktRcvUnique 91598, pktRcvDrop: 1556 (1.7%)
RCVBUF: 10936992 pktRcvUnique 91747, pktRcvDrop:  727 (0.79%)
RCVBUF: 11436992 pktRcvUnique 94330, pktRcvDrop:   17 (0.02%)
RCVBUF: 11936992 pktRcvUnique 94687, pktRcvDrop:    1 (0%)
RCVBUF: 12436992 pktRcvUnique 94616, pktRcvDrop:    4 (0%)
RCVBUF: 12936992 pktRcvUnique 94605, pktRcvDrop:    2 (0%)
RCVBUF: 13436992 pktRcvUnique 94391, pktRcvDrop:    0 (0%)

50 Mbps, 100 ms RTT, 25% packet loss, SRT latency 1.0 seconds.

The target RCV buffer size recommended by the Configuration Guidelines is 7339392 bytes.

The value to recommend based on the results below is 10339392 (+40%).

SRT master

RCVBUF:  7339392 pktRcvUnique 70568, pktRcvDrop: 20988 (29.74%)
RCVBUF:  7839392 pktRcvUnique 72596, pktRcvDrop: 20726 (28.55%)
RCVBUF:  8339392 pktRcvUnique 71079, pktRcvDrop: 20844 (29.33%)
RCVBUF:  8839392 pktRcvUnique 94360, pktRcvDrop:    92 (0.1%)
RCVBUF:  9339392 pktRcvUnique 94674, pktRcvDrop:    33 (0.03%)
RCVBUF:  9839392 pktRcvUnique 94463, pktRcvDrop:     3 (0%)
RCVBUF: 10339392 pktRcvUnique 94710, pktRcvDrop:     2 (0%)
RCVBUF: 10839392 pktRcvUnique 94752, pktRcvDrop:     2 (0%)
RCVBUF: 11339392 pktRcvUnique 94522, pktRcvDrop:     1 (0%)
RCVBUF: 11839392 pktRcvUnique 94652, pktRcvDrop:     3 (0%)

SRT with the fix

RCVBUF:  7339392 pktRcvUnique 91678, pktRcvDrop: 2682 (2.93%)
RCVBUF:  7839392 pktRcvUnique 92193, pktRcvDrop: 1925 (2.09%)
RCVBUF:  8339392 pktRcvUnique 94336, pktRcvDrop:  312 (0.33%)
RCVBUF:  8839392 pktRcvUnique 94530, pktRcvDrop:   69 (0.07%)
RCVBUF:  9339392 pktRcvUnique 94437, pktRcvDrop:    8 (0.01%)
RCVBUF:  9839392 pktRcvUnique 94499, pktRcvDrop:   13 (0.01%)
RCVBUF: 10339392 pktRcvUnique 94560, pktRcvDrop:    1 (0%)
RCVBUF: 10839392 pktRcvUnique 94552, pktRcvDrop:    1 (0%)
RCVBUF: 11339392 pktRcvUnique 94737, pktRcvDrop:    0 (0%)
RCVBUF: 11839392 pktRcvUnique 94500, pktRcvDrop:    3 (0%)

SRT with the fix and SRTO_SNDDROPDELAY=-1

RCVBUF:  7339392 pktRcvUnique 72603, pktRcvDrop: 20784 (28.63%)
RCVBUF:  7839392 pktRcvUnique 86328, pktRcvDrop:  7383 (8.55%)
RCVBUF:  8339392 pktRcvUnique 93663, pktRcvDrop:   128 (0.14%)
RCVBUF:  8839392 pktRcvUnique 94510, pktRcvDrop:   174 (0.18%)
RCVBUF:  9339392 pktRcvUnique 94515, pktRcvDrop:    29 (0.03%)
RCVBUF:  9839392 pktRcvUnique 94607, pktRcvDrop:     6 (0.01%)
RCVBUF: 10339392 pktRcvUnique 94403, pktRcvDrop:     0 (0%)
RCVBUF: 10839392 pktRcvUnique 94771, pktRcvDrop:     2 (0%)
RCVBUF: 11339392 pktRcvUnique 94710, pktRcvDrop:     2 (0%)
RCVBUF: 11839392 pktRcvUnique 94790, pktRcvDrop:     0 (0%)

@maxsharabayko maxsharabayko added the Type: Bug Indicates an unexpected problem or unintended behavior label Nov 1, 2023
@ethouris
Collaborator

ethouris commented Nov 1, 2023

> The assumption is that the TL packet drop on the sender side happens later than the corresponding TL packet drop on the receiver, given some later packet has been received. Even assuming no later packets have been received, sending the following packets would eventually trigger TL packet drop on the receiver anyway, freeing up space for further packets.

My point is that on the receiver side the dropping is triggered at a very specific packet: the packet that follows the loss. Dropped are only the lost packets still unrecovered at the moment of the drop check. So on the sender side, at the moment of receiving the loss report, you know only one thing for sure: that these cells in the receiver buffer will have been freed exactly at the moment when the "loss coverer" packet is ready to play (actually when retrieved by the application, but we believe that it will be extracted immediately). Whether any packets were recovered up to this moment is unimportant, because we believe that every packet preceding the play-ready packet will be removed from the buffer at play time anyway.

The problem is: packets removed as "too late" from the sender buffer still have their "corresponding cells" active in the peer's receiver buffer after dropping. When these same cells will be removed from the peer's receiver buffer is another story.

This simply means that the sender's prediction, that by removing some range of packets from the sender buffer it gains more free space in the peer's receiver buffer at the moment the dropping happens, only increases the risk that the newest packets will be dropped on reception when the buffer is almost full and the sender has "tricked itself" into believing the peer will have space to accommodate the packet it is about to send.

The problem isn't whether the cells in the peer's receiver buffer will be freed "eventually", but whether they will be really freed at the very moment when the new packet is being sent and the free tokens in the flight window are checked for green light.
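The "green light" check described here can be sketched as follows (simplified: circular sequence arithmetic is omitted, and the names are illustrative rather than SRT's actual ones). Sending is gated on the number of packets in flight staying below both the flow window and the congestion window:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Packets in flight = distance from the last ACKed sequence to the next
// sequence to send (circular wraparound omitted for brevity). A new
// packet may be sent only while that distance stays below the smaller
// of the flow window (peer's free receiver buffer, in packets) and the
// congestion window.
bool canSend(int32_t lastAckSeq, int32_t nextSeq, int flowWindow, int congWindow)
{
    int inFlight = nextSeq - lastAckSeq;
    return inFlight < std::min(flowWindow, congWindow);
}
```

If a local drop artificially advances `lastAckSeq`, `inFlight` shrinks and `canSend` turns green even though the corresponding cells in the peer's receiver buffer may still be occupied, which is exactly the risk described above.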

If the receiver fails to receive any packet for a descent time resulting in the SND seqno being much further from the RCV seqno, the sequence discrepancy would be eventually detected. The connection is to be broken by the sequence discrepancy, or by the idle (no response) timeout.

This is happening from a completely different reason, although actually tricking the flight window size is only increasing the probability for this to happen. If there's a drop-on-reception once, the only way how it can be avoided is to quickly make space in the receiver buffer.

BTW. if there was at least one problem that has been solved by increasing the flight window tricky way and temporarily (only up to the next incoming ACK), wouldn't increasing the receiver buffer size do exactly the same thing, just without dangerous tricking?

@maxsharabayko
Collaborator

@ethouris

> The problem is: packets removed as "too late" from the sender buffer still have their "corresponding cells" active in the peer's receiver buffer after dropping. When these same cells will be removed from the peer's receiver buffer is another story.

The reality is that the sender does not know whether those cells are free or not, that's true.
At the same time, the sender should drop later than the receiver (this is still to be improved within #1910). Thus it can be assumed that if it decides to drop those packets, they will have been dropped or read by the receiver by the time of this event. The only caveat I see is that if the receiving application is not reading further, those packets would still be in the buffer. But we should optimize for the conventional case, where the receiving app reads every time there is something available to be read.

@maxsharabayko maxsharabayko merged commit cb362ee into Haivision:master Nov 3, 2023
9 checks passed