Feasibility to use it in long-lived connections #16

glerchundi · 2020-09-03T17:34:23Z

We're thinking of using Golden Gate as the communication framework for all of our IoT devices.
After a successful (and enjoyable!) first pass over the framework, we're following up with a second, deeper round of research, and we have some doubts that we would like to address before proceeding.

We saw several signals that make us think the framework is geared toward one-shot communications, like those that usually happen between BLE devices and mobile apps. Our use case has those kinds of scenarios, but also others that run over long-lived connections. These persistent connections need special attention at different layers, such as re-establishing a DTLS session without a new handshake (pion/dtls#264) and/or supporting streaming requests (the CoAP observe pattern, for example).

The framework could hardly be more pluggable than it is right now with GG_Data{Sink,Source}, components we could use to build our own layers and overcome these limitations, but I would like to know what the short-term vision for Golden Gate is on this matter. Do you have use cases like the ones I mentioned?

barbibulle · 2020-09-04T02:05:13Z

Hi.
The framework is most definitely designed for long-lived connections. In fact, that is what we use at Fitbit for all our connected watches and trackers, which stay connected to the user's phone 24/7 (when in range of course). Millions of devices are connected that way right now.
The ability to seamlessly get disconnected and automatically reconnected is something we ensure works as well as possible, and there's some support included to allow the detection of bad conditions where, for instance, the BLE channel may become unresponsive (many bugs on mobile phones in different phone models and OS versions...), and recover from it (for example, a callback when the low level link is stalled, which allows the application to take action, like maybe ask for a disconnection and wait for a reconnection, which has the chance to bring things back to a clean state).
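To make the stall-detection idea concrete, here is a minimal sketch of a "no progress" monitor. It is not Golden Gate's API: `StallMonitor`, its method names and the 30-second timeout are all hypothetical, chosen only to illustrate a callback that fires when queued data stops draining.

```python
from typing import Callable, List, Optional


class StallMonitor:
    """Reports a stall when queued data has made no progress for too long.

    Hypothetical helper for illustration; not part of Golden Gate.
    """

    def __init__(self, stall_timeout_s: float,
                 on_stalled: Callable[[float], None]) -> None:
        self.stall_timeout_s = stall_timeout_s
        self.on_stalled = on_stalled
        self.pending_bytes = 0
        self.last_progress: Optional[float] = None
        self.reported = False

    def on_data_queued(self, size: int, now: float) -> None:
        if self.pending_bytes == 0:
            # The stall clock starts when the link goes from idle to busy.
            self.last_progress = now
        self.pending_bytes += size

    def on_data_acked(self, size: int, now: float) -> None:
        # Any acknowledged data counts as progress and re-arms the monitor.
        self.pending_bytes = max(0, self.pending_bytes - size)
        self.last_progress = now
        self.reported = False

    def tick(self, now: float) -> None:
        """Call periodically from a timer."""
        if self.pending_bytes == 0 or self.reported:
            return
        stalled_for = now - self.last_progress
        if stalled_for >= self.stall_timeout_s:
            self.reported = True  # report each stall only once
            self.on_stalled(stalled_for)


stalls: List[float] = []
monitor = StallMonitor(stall_timeout_s=30.0, on_stalled=stalls.append)
monitor.on_data_queued(100, now=0.0)
monitor.tick(now=10.0)  # still within the timeout, nothing reported
monitor.tick(now=31.0)  # no progress for 31 s, callback fires once
```

The application decides what to do in the callback, for example requesting a disconnection and waiting for the reconnection, as described above.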
Regarding CoAP observe, this is something we have experimented with, but so far decided not to use, because there are some downsides to it, mostly because it is onerous for a small CoAP server to keep track of subscribers that may disappear at any time. We have found more efficient ways of achieving equivalent outcomes, using patterns like the CoAP event emitter class that's included in the framework. This may also be something that would work for your use case.
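As a rough illustration of the emitter pattern, the sketch below has the device push its set of active events to the peer and keep resending until the peer confirms. This is an assumption about the general shape of such a pattern, not a description of the framework's CoAP event emitter class; `EventEmitter`, `send` and the event names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set


class EventEmitter:
    """Device-side emitter: reports active events until the peer confirms.

    Hypothetical sketch; `send` stands in for something like a CoAP POST.
    """

    def __init__(self, send: Callable[[FrozenSet[str]], None]) -> None:
        self.send = send
        self.active: Set[str] = set()
        self.in_flight: FrozenSet[str] = frozenset()

    def set_event(self, name: str) -> None:
        self.active.add(name)

    def flush(self) -> None:
        """Call after set_event and from a retry timer."""
        if self.active:
            self.in_flight = frozenset(self.active)
            self.send(self.in_flight)

    def on_ack(self) -> None:
        """The peer confirmed the last report; clear only what it contained."""
        self.active -= self.in_flight
        self.in_flight = frozenset()


sent: List[FrozenSet[str]] = []
emitter = EventEmitter(send=sent.append)
emitter.set_event("battery-low")
emitter.flush()                 # first report
emitter.set_event("sync-needed")  # raised while the report is in flight
emitter.on_ack()                # only "battery-low" is cleared
emitter.flush()                 # "sync-needed" goes out in the next report
```

The point of the contrast with observe is where the state lives: here the device holds only a small set of event names, and it does not need to track which subscribers are still alive.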

glerchundi · 2020-09-07T10:30:13Z

> The framework is most definitely designed for long-lived connections. In fact, that is what we use at Fitbit for all our connected watches and trackers, which stay connected to the user's phone 24/7 (when in range of course). Millions of devices are connected that way right now.

That's really interesting, thanks for sharing.

> The ability to seamlessly get disconnected and automatically reconnected is something we ensure works as well as possible, and there's some support included to allow the detection of bad conditions where, for instance, the BLE channel may become unresponsive (many bugs on mobile phones in different phone models and OS versions...), and recover from it (for example, a callback when the low level link is stalled, which allows the application to take action, like maybe ask for a disconnection and wait for a reconnection, which has the chance to bring things back to a clean state).

OK, so this means there is some coupling between layers, right? I mean, a signal from the lower layer has to be forwarded somehow to the upper layers so that they can make a decision. Doesn't this break the idea behind using the GG_Data{Sink,Source} abstraction to keep layers decoupled?
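One way to picture the question: a link-level signal does not have to travel through the data path at all. The sketch below keeps the sink interface byte-only and routes link events through a separate listener that the application registers. All names here (`DataSink`, `LinkLayer`, `UpperLayer`, `report`) are hypothetical and only loosely inspired by the sink/source idea; this is not how Golden Gate is necessarily wired.

```python
from typing import Callable, List, Optional, Protocol


class DataSink(Protocol):
    def put_data(self, data: bytes) -> None: ...


class UpperLayer:
    """Upper layer: sees only bytes and knows nothing about link events."""

    def __init__(self) -> None:
        self.received: List[bytes] = []

    def put_data(self, data: bytes) -> None:
        self.received.append(data)


class LinkLayer:
    """Lower layer: forwards data to its sink, reports events to listeners."""

    def __init__(self) -> None:
        self.sink: Optional[DataSink] = None
        self.listeners: List[Callable[[str], None]] = []

    def set_sink(self, sink: DataSink) -> None:
        self.sink = sink

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self.listeners.append(listener)

    def receive(self, data: bytes) -> None:
        if self.sink is not None:
            self.sink.put_data(data)

    def report(self, event: str) -> None:
        for listener in self.listeners:
            listener(event)


# The application, not the upper layer, subscribes to link events.
app_events: List[str] = []
link = LinkLayer()
upper = UpperLayer()
link.set_sink(upper)
link.add_listener(app_events.append)

link.receive(b"\x01\x02")
link.report("link-stalled")
```

In this arrangement the layers stay decoupled from each other, and the coupling lives in the application code that assembles the stack.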

> Regarding CoAP observe, this is something we have experimented with, but so far decided not to use, because there are some downsides to it, mostly because it is onerous for a small CoAP server to keep track of subscribers that may disappear at any time.

Yep, we have had the same concerns, and we decided to limit the observable paths/sessions/... to a finite number that is achievable in terms of code size and RAM.
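A minimal sketch of such a cap, assuming observers are keyed by peer address and request token: registration simply fails once the table is full, at which point a server could fall back to answering the request as a plain GET. `ObserverTable` and its methods are hypothetical names for illustration.

```python
from typing import Dict, List, Tuple

Observer = Tuple[str, bytes]  # (peer address, request token)


class ObserverTable:
    """Fixed-capacity registry of observers, sized for a small device."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.paths: Dict[Observer, str] = {}

    def register(self, peer: str, token: bytes, path: str) -> bool:
        key = (peer, token)
        if key not in self.paths and len(self.paths) >= self.capacity:
            return False  # table full: the caller decides how to degrade
        self.paths[key] = path
        return True

    def remove_peer(self, peer: str) -> None:
        """Drop every registration of a peer, e.g. when its session ends."""
        self.paths = {k: p for k, p in self.paths.items() if k[0] != peer}

    def observers_of(self, path: str) -> List[Observer]:
        return [k for k, p in self.paths.items() if p == path]


table = ObserverTable(capacity=2)
first = table.register("hub-a", b"\x01", "/sensors/temp")
second = table.register("hub-b", b"\x02", "/sensors/temp")
third = table.register("hub-c", b"\x03", "/sensors/temp")  # rejected
```

The memory bound is then known at build time, at the cost of some clients not getting notifications when the table is full.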

> We have found more efficient ways of achieving equivalent outcomes, using patterns like the CoAP event emitter class that's included in the framework. This may also be something that would work for your use case.

Hmm, interesting, thanks for the pointer. As far as I can see in the code, the IoT device behaves as server and client at the same time, right? It is a CoAP server for requests initiated by the cloud/mobile and a CoAP client for requests (events, in this case) initiated by the IoT device, with both converging into the same DTLS pipe?
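On the wire, sharing one pipe for both roles is workable because CoAP requests and responses are distinguishable by the class of the code byte in the 4-byte header (RFC 7252): class 0 is a request, classes 2, 4 and 5 are responses, and code 0.00 is an empty message. The sketch below shows that dispatch decision only; the function name `classify` is hypothetical, and how Golden Gate implements this internally is not shown here.

```python
def classify(datagram: bytes) -> str:
    """Classify a CoAP message by the class of its code byte."""
    if len(datagram) < 4:
        raise ValueError("shorter than the 4-byte CoAP header")
    code = datagram[1]
    code_class = code >> 5  # upper 3 bits: class; lower 5 bits: detail
    if code == 0:
        return "empty"      # 0.00, e.g. a bare ACK or RST
    if code_class == 0:
        return "request"    # goes to the local CoAP server role
    if code_class in (2, 4, 5):
        return "response"   # goes to the local CoAP client role
    return "reserved"


get_request = bytes([0x40, 0x01, 0x12, 0x34])    # CON, code 0.01 (GET)
content_reply = bytes([0x60, 0x45, 0x12, 0x34])  # ACK, code 2.05 (Content)
empty_ack = bytes([0x60, 0x00, 0x12, 0x34])      # ACK, code 0.00 (Empty)

kinds = [classify(m) for m in (get_request, content_reply, empty_ack)]
```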

I wanted to share a diagram of the different communication lines we currently have, to better illustrate our use cases:

(2) This is a fairly typical scenario where BLE-based connectivity (as it is ubiquitous) is used for initialization (and sometimes for maintenance tasks). These kinds of connections are usually one-shot tasks. This means that the DTLS handshake happens every time the mobile device wants to connect to the device, and the session is not kept anywhere, as we consider it ephemeral.

(1) Once the connection with the cloud is established, all other application-level communication happens through Hubs/Gateways. Here we keep the DTLS session alive, and it is distributed across all our servers, as explained in the pion issue I mentioned before.

The main difference between the two scenarios is in the layers we plan to use. In scenario (1) we should use all the layers (CoAP, DTLS, IP, Gattlink), including Gattlink, as it's based on BLE and we want the reliability that layer offers. OTOH, in scenario (2) we're probably going to use the CoAP, DTLS and IP layers. Take into account that, from the cloud's perspective, the Hub or whichever devices sit in the middle don't exist. Those devices in the middle act like routers that work at layer 3 and route everything to our devices.

Our idea was to use this framework because everything you explain in the docs aligns perfectly with what we thought our framework should have. In fact, a lot of the decisions you took in the framework are exactly the ones we reached ourselves without even knowing about Golden Gate. That being said, I would love to see this framework used in all of our devices, as I think it could be a big win for everyone ;)

Sorry that my issue got longer than I expected, but I thought it was appropriate to lay the groundwork.

I really appreciate the effort you put into sharing this information with me, as it helps us rethink our design decisions.
