Best way to mix online and offline updates #74
Sounds like a good idea! I'll share my first reactions:
I think we should prevent mixing and matching. That sounds like it could open up risky doors and create problems. I'm not sure how best to prevent that, though. I can also see a case for more or less interrupting an online update that's blocking or broken somehow with an offline update that might restore things to a better place.
That I really don't know, but the offline updates don't need a correlation ID, right? What happens if we just leave that out of the equation? Will the Director get particularly upset that the image changed without informing it?
Probably. How else do we know how to prioritize or trigger the offline updates? That's the deeper question, right?
That seems like a good goal and might even be a given.
The offline updates definitely don't need (or have) a correlation ID. As for director behaviour, I think we should get @simao in this thread (I think GitHub won't let me actually ping him via @ until he joins of his own accord). I think it's technically device registry that keeps track of device state, but state changes are triggered by events emitted by director. Here's the state diagram as I understand it:
stateDiagram-v2
NotSeen --> UpToDate: device sends manifest
UpToDate --> Outdated: user assigns an update
Outdated --> UpdatePending: device downloads director metadata w/ correlationId
Outdated --> UpToDate: user cancels assigned update
UpdatePending --> UpToDate: device reports success w/ correlationId
UpdatePending --> Error: device reports failure w/ correlationId
Error --> Outdated: user assigns an update
(By the way: isn't it cool that GitHub added support for inline mermaid diagrams in markdown??)
Note that both transitions out of UpdatePending can only happen when we get a manifest from aktualizr containing an InstallationReport with correlationId in its custom metadata. (Director throws an error if you attempt to assign an update to a device that is currently Outdated or UpdatePending.)
On principle, I think that the decision about whether to interrupt an in-progress update with an update of another type should be left to the user of libaktualizr: the calling program should make decisions about the circumstances under which an update should be preempted. But that leaves us to decide how to construct the manifest (or specifically, the installation report) in the case where an online update is interrupted by an offline update.
Today, an installation report always pertains to a correlationId. If we were to just include an installation report for the failed online update, there are two unsatisfactory options: call it a success, even though it was interrupted, or call it a failure and have the device be in an "Error" state on the server even though it has successfully installed an offline update.
I think in an ideal world, we would also be able to send installation reports for the successful install of an offline update, the next time the device sends a manifest to director (I'd imagine that the offline update installation report would use the name and version of the offline update metadata in lieu of correlationId). That does imply, though, that it would have to be possible to include more than one installation report per manifest. And if we can have multiple installation reports, then we also solve the problem of how to report an interrupted online update: it's just a failure, but there's an offline update installation report that comes after it. Of course, director would also have to learn to accept multiple installation reports in aktualizr manifests.
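For anyone who wants to poke at the transitions, here is a minimal Python sketch of the state diagram from this comment as a lookup table. The names `DeviceState`, `Event`, `TRANSITIONS` and `next_state` are made up for illustration; this is not director or device registry code, only a model of the diagram as described above. Any (state, event) pair that is not in the diagram is rejected, which is how the sketch models director refusing to assign an update to a device that is Outdated or UpdatePending.

```python
from enum import Enum


class DeviceState(Enum):
    NOT_SEEN = "NotSeen"
    UP_TO_DATE = "UpToDate"
    OUTDATED = "Outdated"
    UPDATE_PENDING = "UpdatePending"
    ERROR = "Error"


class Event(Enum):
    # Hypothetical event names, one per edge label in the diagram.
    MANIFEST_RECEIVED = "device sends manifest"
    UPDATE_ASSIGNED = "user assigns an update"
    METADATA_DOWNLOADED = "device downloads director metadata w/ correlationId"
    UPDATE_CANCELLED = "user cancels assigned update"
    INSTALL_SUCCEEDED = "device reports success w/ correlationId"
    INSTALL_FAILED = "device reports failure w/ correlationId"


# One entry per arrow in the state diagram.
TRANSITIONS = {
    (DeviceState.NOT_SEEN, Event.MANIFEST_RECEIVED): DeviceState.UP_TO_DATE,
    (DeviceState.UP_TO_DATE, Event.UPDATE_ASSIGNED): DeviceState.OUTDATED,
    (DeviceState.OUTDATED, Event.METADATA_DOWNLOADED): DeviceState.UPDATE_PENDING,
    (DeviceState.OUTDATED, Event.UPDATE_CANCELLED): DeviceState.UP_TO_DATE,
    (DeviceState.UPDATE_PENDING, Event.INSTALL_SUCCEEDED): DeviceState.UP_TO_DATE,
    (DeviceState.UPDATE_PENDING, Event.INSTALL_FAILED): DeviceState.ERROR,
    (DeviceState.ERROR, Event.UPDATE_ASSIGNED): DeviceState.OUTDATED,
}


def next_state(state: DeviceState, event: Event) -> DeviceState:
    """Return the state after `event`, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on '{event.value}'") from None
```

Note that the table has no edge for an offline install: in this model a device in UpdatePending can only leave that state through a report that carries a correlationId, which is exactly the gap discussed above.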
Yes, we are not sending any correlation IDs in the lockboxes. I don't think that is needed, and we also don't have one at the time the user creates the lockbox.
Regarding the device reporting to director something that was installed offline: director will still process the manifest. If the device already has an assignment in director (an online update), director will still process the manifest and just update the currently installed ECU target, but keep the device queue intact. If the offline installed update somehow matches the update in the queue (ecuId, filename, checksum), and that assignment in director has a correlation ID, then the normal execution for online updates will be followed. If the offline update does not match the online update, then the processing will be the same as when the queue is empty, and the device queue will remain as before.
If the queue for the device is empty, director will just update the current known installed ECU target to the new target reported by the device. If the manifest reports an error, this will be published on the bus, and device registry will update the status to Error; otherwise the status will be moved to Up To Date.
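The manifest handling described above can be sketched roughly as follows. This is a simplified Python model, not director's actual code: `Target`, `Assignment`, `DeviceRecord` and `process_manifest` are hypothetical names, and the step that removes a matching assignment from the queue is an assumption about what "normal execution for online updates" means.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Target:
    ecu_id: str
    filename: str
    checksum: str


@dataclass
class Assignment:
    target: Target
    correlation_id: Optional[str]


@dataclass
class DeviceRecord:
    installed: Dict[str, Target] = field(default_factory=dict)  # ecu_id -> installed target
    queue: List[Assignment] = field(default_factory=list)       # pending online assignments
    status: str = "UpToDate"


def process_manifest(device: DeviceRecord, reported: Target, success: bool) -> Optional[str]:
    """Apply a reported install and return the matched correlation ID, if any."""
    # Director always records what the device says is installed.
    device.installed[reported.ecu_id] = reported

    # An assignment matches only on (ecuId, filename, checksum) and only
    # if it carries a correlation ID.
    match = next(
        (a for a in device.queue
         if a.target == reported and a.correlation_id is not None),
        None,
    )
    if match is not None:
        # ASSUMPTION: the normal online flow completes the assignment.
        device.queue.remove(match)
    # With no match, the queue is left exactly as it was.

    device.status = "UpToDate" if success else "Error"
    return match.correlation_id if match is not None else None
```

With one queued assignment for `a.img`, reporting an offline install of `b.img` updates the installed target and leaves the queue intact, so the online assignment is still there to be delivered later.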
Ok, I think that covers my concern about getting into a non-updateable state: director will just attempt to re-send whatever update was assigned before. Whether that's desirable or not is something that we can discuss separately; it's only a concern for director. I'm also convinced, after talking to @simao about it, that we probably don't need aktualizr to send us an installation report for each of the updates it attempted to install since the last time it sent a manifest. The most recent one should be sufficient. I do think we should probably define some way for aktualizr to note that it was an offline update, so that we can display that info to the user (and allow director to make a decision about whether to cancel the pending update or not). Something like setting the correlationId to
In the case of offline updates, the device will not have a correlation id, but it could build one like |
Offline updates (#8) will allow a single device to get update instructions from two sources:
If only one or the other of these two happens, the behaviour of the system is pretty clear. However we need to consider what will happen in scenarios where the two are interleaved. I assume this to be rare in normal operation, so I think our goals are to introduce this without breaking other things, rather than to support fancy use cases.
I wanted to open this ticket so we can have this "what happens" discussion in the open rather than via email. Here are the considerations I'm aware of so far:
install online? I think the answer to this question is 'no', but there are more marginal cases to consider.
I hope we can work out a proposal for what these rules should be in this ticket!