-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User identity for telemetry events is not unique #333
Comments
I really like Pants anonymous telemetry collection system https://www.pantsbuild.org/docs/anonymous-telemetry |
One possible way to tackle this: Try to read a user-wide id from a predefined, user-writable location (for example under The config could have the form
(TOML choice and Thoughts? |
From #510 (comment), we see that GE, DVC, Evidently, Reflex do track unique users, so we can draw some inspiration from it. Final file naming still TBC. Since this would be a kedro-telemetry specific config (and not kedro), we could go for something like |
Let's make this
equivalent home path using https://pypi.org/project/appdirs/. Format using Mechanics as follows:
|
Thank you for the thorough investigation, @astrojuanlu! I have a couple of questions regarding the approach:
|
Good questions!
|
It appears virtually impossible to reconcile data if we simply replace old usernames with new ones. Introducing a new variable for the new username and preserving both could be effective, making data merging much simpler, what do you think?
I agree with that, additionally, we might consider modifications to the Dockerfile in kedro-docker that could enable us to consistently save the same identifier for every run. |
|
I mean, do we foresee a need in the future to identify that username
Ok! |
Clarified asynchronously: we're not retroactively rewriting existing data, not aiming for continuity or trying to link UUIDs with current user "ids" 👍 |
@astrojuanlu , are you certain that when the file exists but is empty, or when it contains an invalid UUID4, we should proceed by sending an empty value instead of attempting to rectify the situation with a new ID? Perhaps it would be more effective to attempt generating and saving a new ID once, and only resort to sending an empty value and maintaining the status quo if that attempt fails? |
For full disclosure (since @idanov asked about this), I landed on |
Looks like that kedro already has a dependency on But as I understand it, |
IMHO, yes :) kedro-org/kedro#2569 (comment) we never opened an issue about it |
This might be a stupid question, but if we completely transition away from |
Sorry, to clarify - I'm not advocating for moving away from the TOML format, but from using the |
TOML discussion kedro-org/kedro#3690 |
In yesterday's stand-up, we agreed on the following actions:
@astrojuanlu, do you agree with this plan or you prefer to finish that discussion first? |
Totally agreed! Thanks for the update @DimedS 👍🏽 |
Description
kedro-telemetry uses the hashed username for the identity of the Heap event:
kedro-plugins/kedro-telemetry/kedro_telemetry/plugin.py
Lines 84 to 94 in 7d092db
kedro-plugins/kedro-telemetry/kedro_telemetry/plugin.py
Lines 41 to 44 in 7d092db
However, this is not enough to disambiguate users. An extreme example is that most people launching Kedro in a container will have the username
root
, hence all their events might be collected under the same identity!Context
If we are interested in number of users, we might be incurring in a dramatic under-reporting. The drawbacks of this are described in https://developers.heap.io/docs/using-identify#using-a-non-unique-identifier
This was done in #58 after discussion in #50. Before that, kedro-telemetry relied on hostname, which again is not necessarily unique.
kedro-plugins/kedro-telemetry/kedro_telemetry/plugin.py
Line 82 in 0f39018
Possible solution
For example, Orchest used to generate a
TELEMETRY_UUID
and store it in a centralized location:https://github.com/orchest/orchest/blob/560103a5c7dbdf615a2e0df435076f0ee78b6a1d/services/orchest-api/app/app/utils.py#L431-L432
The text was updated successfully, but these errors were encountered: