Possible HCI.Poll() bug with _recvIndex and _recvBuffer #73

mattleesmi · 2020-05-03T18:01:19Z

Hello,

I came across an occasional crash while using BLE/HCI.poll(). The crash occurs when _recvIndex goes above the size of _recvBuffer. The crash in my tests occurs when _recvIndex = 533.

I explained how I came across it in this Arduino forum post: https://forum.arduino.cc/index.php?topic=680797.0. The long and the short of it is that it seems to happen after receiving an HCI_ACLDATA_PKT (possibly without being connected to anything). _recvIndex is then incremented over and over without being caught and reset by the if statements (not sure why). The basic fix I currently have running in HCI.cpp, within the while (HCITransport.available()) loop, is:

    if (_recvIndex > 257) {
        _recvIndex = 0;
        if (_debug) {
            _debug->println();
            _debug->println("***Overflow Catch***");
            _debug->println();
        }
    }

Not groundbreaking but seems to be stopping the crash.
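For readers following along, the catch above can be written against the buffer's own size instead of a literal. This is only a minimal sketch: RecvState and appendByte are hypothetical names, not part of the library, and the 258-byte size is an assumption taken from the values reported in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the receive state; names are illustrative only.
struct RecvState {
    uint8_t buffer[258] = {};   // size assumed from the values reported in this thread
    size_t index = 0;           // next write position
    unsigned overflowCount = 0; // how many times the guard fired
};

// Stores one byte. If the write would land past the end of the buffer,
// the index is reset to 0 first and the event is counted.
// Returns false when an overflow was caught.
inline bool appendByte(RecvState& s, uint8_t b) {
    bool ok = true;
    if (s.index >= sizeof(s.buffer)) {
        s.index = 0;
        ++s.overflowCount;
        ok = false;
    }
    s.buffer[s.index++] = b;
    return ok;
}
```

Comparing against sizeof keeps the guard correct if the buffer size ever changes, and the counter gives a cheap way to see how often the catch fires without relying on debug prints.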

The "overflow catch" is still occurring in my tests; however, it may be something else. Please let me know if more information would help or if I am missing something obvious.

Thanks,
Matt


polldo · 2020-07-29T11:23:07Z

Hi @mattleesmi ,
Are you using a nano 33 ble?
This issue is similar to #102.
If too much time passes between calls to the BLE.poll() function, some bytes can be dropped. This can lead to unpredictable results: for example, as in your case, the length read from a malformed packet (one with a dropped byte) could be larger than the _recvBuffer size, and this would lead to the overflow you have experienced.
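To illustrate the mechanism described here, the sketch below computes the expected ACL packet length the same way the check in HCI.cpp does (5 header bytes plus the little-endian length at offsets 3 and 4). The function name and both byte sequences are made up for illustration; they are not captured traffic.

```cpp
#include <cstddef>
#include <cstdint>

// Total packet length the parser expects: 5 header bytes plus the
// little-endian payload length stored at offsets 3 and 4.
inline size_t aclExpectedLength(const uint8_t* buf) {
    return 5u + (static_cast<size_t>(buf[3]) | (static_cast<size_t>(buf[4]) << 8));
}

// Made-up ACL packet: type 0x02, handle/flags 0x40 0x20,
// length 0x0007, followed by 7 payload bytes.
static const uint8_t kIntact[] = {0x02, 0x40, 0x20, 0x07, 0x00,
                                  0x03, 0x00, 0x04, 0x00, 0x0B, 0x5B, 0x55};

// The same stream with the second byte (0x40) lost in transit:
// every later byte shifts down by one position.
static const uint8_t kDropped[] = {0x02, 0x20, 0x07, 0x00,
                                   0x03, 0x00, 0x04, 0x00, 0x0B, 0x5B, 0x55};
```

For the intact packet the expected length is 12. With one byte missing, offsets 3 and 4 now hold 0x00 and 0x03, so the expected length becomes 773, well past a buffer of a few hundred bytes, while the first byte still looks like a valid ACL packet type.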

mattleesmi · 2020-07-29T12:40:54Z

Hello @polldo

Yes, I am using both a BLE and an IoT, and it does seem similar to #102. I have stepped away from the work using the boards for a few months now, but I hope to return to it soon so I can look into this further, as well as the #70 issue I have been encountering.

I will report back when I can!

Thank you for the input.
Matt

mattleesmi · 2021-06-23T12:14:33Z

Hello,

Sorry for the very late update. I have been running more tests on this and I am still getting crashes with the latest version, with the same symptoms: _recvIndex goes above 258 and the crash happens at 533.

I am testing a catch if statement inside the while loop, like this:

    while (HCITransport.available()) {
        byte b = HCITransport.read();
        // Debug trace of the index on every byte received
        _debug->print("_recvIndex: ");
        _debug->println(_recvIndex);
        if (_recvIndex == 258) _recvIndex = 0;
        // ... rest of the loop body unchanged ...

It seems that the else statement of void HCIClass::poll is somehow being skipped.

Thanks,
Matt

polldo · 2021-06-23T12:30:04Z

Hi @mattleesmi
do you have this problem on the nano 33 ble only?
If so, please check if this patch solves your issue #96

mattleesmi · 2021-06-23T12:43:43Z

Hey @polldo

So far I have only had this problem on the IoTs. As far as I can see, my little tweak is working at the moment. Is that patch for IoTs too? If so, how do I integrate it?

Thanks
Matt

polldo · 2021-06-23T12:58:50Z

No, it's a nano33ble specific patch.
Anyway, I took a look at your sketch. My suspicion is that too much time is spent between BLE updates, so the controller's queue becomes full and some packets are dropped.

The BLE should be continuously updated, by executing BLE.poll(). Try to execute this poll statement at each loop, so for example change your code into:

void loop() {
      BLE.poll();
      ....
}

mattleesmi · 2021-06-23T13:00:31Z

With my current code setup it is running about 140 polls per second (I have been tracking it). Is that too few?

edit: I took another look, as I had switched it off for a bit. It still mostly gets around 140 or more, although sometimes it dips to quite low numbers.
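Since an average of 140 polls per second can hide occasional long pauses, it may be more telling to record the longest gap between two consecutive polls. The sketch below is one possible way to do that; PollGapTracker is a hypothetical helper, not something from the library or from the sketch discussed here.

```cpp
#include <cstdint>

// Hypothetical helper: remembers the longest time between two polls.
struct PollGapTracker {
    bool started = false;  // false until the first tick has been seen
    uint32_t lastMs = 0;   // timestamp of the previous tick
    uint32_t maxGapMs = 0; // longest gap observed so far

    // Call once per poll with the current time in milliseconds.
    void tick(uint32_t nowMs) {
        if (started) {
            // Unsigned subtraction stays correct across timer wraparound.
            uint32_t gap = nowMs - lastMs;
            if (gap > maxGapMs) maxGapMs = gap;
        }
        started = true;
        lastMs = nowMs;
    }
};
```

On a board this could be called as tracker.tick(millis()) right next to BLE.poll(). Polls at 0, 7, 14, 120 and 127 ms would report a maximum gap of 106 ms even though the average rate still looks healthy.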

mattleesmi · 2021-11-04T01:10:00Z

Small update:

I was trying things out again and I found myself in a very noisy Bluetooth room that was causing very frequent crashes even with quite a lot of polls per second.

I did a bit more digging in HCI.cpp and I found that one of the main causes was that the poll was picking up ACL packets that were too big (this may be linked to the other things that have been mentioned about the buffer not being read often enough).

From what I have managed to understand, this piece of code:
if ((_recvIndex > 5) && _recvIndex >= (5 + (_recvBuffer[3] + (_recvBuffer[4] << 8))))
is being used to determine whether enough of an ACL packet has arrived to read, and what the size of that packet is. However, when I started printing the size of the packets, some of them were far too big for the array. What was happening was that (5 + (_recvBuffer[3] + (_recvBuffer[4] << 8))) was really big while _recvBuffer[0] was still equal to HCI_ACLDATA_PKT, so the index just goes up and up until it crashes.

So I put a separate if statement that looks like this:

if (_recvIndex > 5 && ((5 + (_recvBuffer[3] + (_recvBuffer[4] << 8))) > (sizeof(_recvBuffer) - 1))) {
    _recvIndex = 0;
    if (_debug) {
        _debug->print("ACL too big: ");
        _debug->println(5 + (_recvBuffer[3] + (_recvBuffer[4] << 8)));
    }
}

With this, I would get results like: ACL too big: 21920. As you can see I tried to catch this and reset the index. This seems to work but there are still crashes. Anecdotally these crashes seem to happen when the code tries to handle the packet but I am not sure.
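As a rough illustration of the check described above, the declared ACL length can be validated as soon as the 5 header bytes are in, before any more payload is accepted. This is a sketch under assumptions: checkAcl, AclCheck and the constant names are invented here, and the 258-byte capacity is taken from the values mentioned in this thread rather than from the library source.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kRecvBufferSize = 258; // assumed capacity of the receive buffer

enum class AclCheck { NeedMoreBytes, Complete, TooBig };

// buf holds the bytes received so far; index is how many of them are valid.
inline AclCheck checkAcl(const uint8_t* buf, size_t index) {
    if (index < 5) return AclCheck::NeedMoreBytes; // header not complete yet
    // Declared total: 5 header bytes plus the little-endian length field.
    size_t total = 5u + (static_cast<size_t>(buf[3]) |
                         (static_cast<size_t>(buf[4]) << 8));
    if (total > kRecvBufferSize) return AclCheck::TooBig; // can never fit
    return index >= total ? AclCheck::Complete : AclCheck::NeedMoreBytes;
}
```

A header with length bytes 0x9B 0x55 declares 21915 payload bytes, which gives the total of 21920 reported above and is rejected here. Note that resetting the index on TooBig only discards the bytes read so far; the stream can still be out of step with packet boundaries afterwards, which may be one reason crashes continue.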

A similar thing did occur with RX events but it was less frequent, although sometimes I did get large RX events like this:

HCI EVENT RX <- 043EFA827339E132A4043EB1964DD52DA2043EB1964DD59E043E2EB1964DD52DA3043E4DD52D2728A3043E2A02010001643E983370D71E0201061AFF4C000215F0910DA64FA24E988024BC5B71E0893E0002017EB8B3043E2202010401643E983370D71607094C4453313334020AF80A160DD03275466D343151B3043E28020102019DBE2EF627551C03036FFD17166FFD75046B93FD537784D430D0F1B2FA827339E13246A2043E0C020104019DBE2EF6275500A0043E2D2728DDA1043E2A02010001643E983370D71E0201061AFF4C000215F0910DA64FA24E988024BC5B71E0893E0002017EB8B3043E2202010401643E983370D71607094C445331 Size:253

Now I might be wrong, but that feels like several events mashed up into one: most of the time they are between 45 and 150 characters, and there is a hard limit of 246 (or thereabouts) in the reading capacity of the BLE (see #70).
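One way to test the "several events mashed into one" idea is to scan a dump for repeated event headers. In HCI, 0x04 is the event packet indicator and 0x3E is the LE Meta event code, so the byte pair 04 3E marks a plausible start of an LE event. The helpers below are hypothetical names written for this illustration, and the scan can give false positives because payload data may contain the same two bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts an even-length hex string to bytes; assumes valid hex digits.
inline std::vector<uint8_t> hexToBytes(const std::string& hex) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i + 1 < hex.size(); i += 2) {
        out.push_back(static_cast<uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    }
    return out;
}

// Returns every byte offset where the pair 0x04 0x3E occurs.
inline std::vector<size_t> findEventHeaders(const std::vector<uint8_t>& bytes) {
    std::vector<size_t> offsets;
    for (size_t i = 0; i + 1 < bytes.size(); ++i) {
        if (bytes[i] == 0x04 && bytes[i + 1] == 0x3E) offsets.push_back(i);
    }
    return offsets;
}
```

Run on just the first 27 bytes of the dump above (043EFA827339E132A4043EB1964DD52DA2043EB1964DD59E043E2E), this finds header candidates at offsets 0, 9, 17 and 24. The first header's length byte is 0xFA (250), which together with the 3 header bytes matches the reported Size:253, so the parser appears to have taken one oversized length at face value and swallowed the events that followed.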

Is it possible that the if statements at the moment are not flexible enough to deal with data that is coming in "wrong"?

I have uploaded the full HCI.poll that I have been playing with but I am sure that the issue is deeper than this.

If this is of interest I can try to use the extra catches and see what it looks like when it still crashes.

HCI_Poll_Test.txt

facchinm mentioned this issue Jul 2, 2020: Transferring 512 byte packets of data #70 (Closed)

polldo added the "status: waiting for information" label Jul 29, 2020

polldo mentioned this issue Oct 26, 2020: [mbed boards] Packet drops cause failures #130 (Open)

per1234 added the "type: imperfection" label Mar 9, 2021

dlktdr mentioned this issue Mar 22, 2021: Bluetooth Para will intermittently disconnect while GUI is connected dlktdr/HeadTracker#6 (Closed)

mattleesmi mentioned this issue Dec 16, 2021: Possible signal corruption from other code/modules + filter code for RX #213 (Open)
