Currently, the behaviour of objects with a time to live (TTL) is not tested.
The primary challenge is knowing when to recognise the expiry of an object in the cached tree. Expiry has no external event trigger, so the aae_controller sees nothing to prompt the cache to reflect the changes that result from expiry.
This will be OK at first, as the other caches being co-ordinated with will similarly continue to represent the state of expired objects. However, at some stage one of the controllers will go through a rebuild of its tree - and suddenly the trees will be divergent to the extent of the volume of objects expired between rebuilds.
This will gradually repair: every time a delta is recognised, fetch_clocks will be run, and fetch_clocks will reset the cache entry based on the expiry time of the object. So after many exchanges (assuming a lot of objects were expired), the cached trees will be back in sync.
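To illustrate the convergence path, here is a minimal sketch using hypothetical names rather than the actual aae_controller API: when a dirty segment is re-read, keys already past their expiry contribute nothing to the refreshed entry, so the cached segment drifts back towards the caches that have already rebuilt.

```erlang
%% Hypothetical sketch - not the real tree_cache/aae_controller code.
-module(expiry_sketch).
-export([refresh_segment/3]).

%% KeyClocks :: [{Key, Clock, ExpiryTime}] read back for one dirty segment.
%% Keys whose expiry time has already passed are dropped before the segment
%% hash is recomputed, so repeated delta-prompted refreshes converge the
%% cached tree with peers whose rebuilt trees no longer include those keys.
refresh_segment(SegmentId, KeyClocks, Now) ->
    Live = [{K, C} || {K, C, ExpiryTime} <- KeyClocks, ExpiryTime > Now],
    {SegmentId, erlang:phash2(lists:sort(Live))}.
```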
However, if there are in the order of 1 million expired objects that impact the tree, then AAE will be rendered useless by the weight of false positives for a considerable period (maybe many days). This is unsatisfactory.
There are three options:
1. Co-ordinate rebuilds across controllers;
2. Dynamically schedule rebuilds in response to upticks in exchange-prompted segment repairs;
3. Co-ordinate object expiry across the cluster (i.e. have expiry managed by a sweeper event).
Currently, the second approach appears to be the best option in terms of simplicity. Track how many replace_dirtysegments messages the tree_cache receives, and prompt a rebuild on accumulating a threshold of such events. That threshold should start high and decrease as the time since the last rebuild increases, to stop a second rebuild being driven by false positives (from exchanging with unbuilt trees) straight after a first rebuild - or this problem could be addressed by counting only segment replacements that were necessary (i.e. read before replace).
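A rough sketch of that counting approach (the module name and tuning values below are assumptions, not part of the existing tree_cache):

```erlang
%% Hypothetical sketch of option 2 - module name and tuning values are assumed.
-module(rebuild_trigger).
-export([new/0, note_dirty_segments/2, rebuild_due/2]).

-define(BASE_THRESHOLD, 10000).  %% assumed starting threshold just after a rebuild
-define(MIN_THRESHOLD, 500).     %% assumed floor on the threshold
-define(DECAY_PER_HOUR, 100).    %% assumed rate at which the threshold relaxes

-record(state, {dirty_count = 0 :: non_neg_integer(),
                last_rebuild :: integer()}).

new() ->
    #state{last_rebuild = erlang:monotonic_time(second)}.

%% Called for each replace_dirtysegments message seen by the tree_cache,
%% with N the number of segments replaced in that message.
note_dirty_segments(N, State = #state{dirty_count = C}) ->
    State#state{dirty_count = C + N}.

%% The threshold is highest immediately after a rebuild (so false positives
%% from exchanging with unbuilt trees do not drive a second rebuild) and
%% decays towards a floor as the last rebuild ages.
rebuild_due(NowSeconds, #state{dirty_count = C, last_rebuild = LR}) ->
    HoursSince = (NowSeconds - LR) div 3600,
    Threshold = max(?MIN_THRESHOLD, ?BASE_THRESHOLD - HoursSince * ?DECAY_PER_HOUR),
    C >= Threshold.
```

The state would sit alongside the tree_cache state, with rebuild_due/2 checked whenever a replace_dirtysegments message is processed; counting only necessary replacements (read before replace) would simply change what note_dirty_segments/2 is fed.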