Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEV-6356] installation progress #3564

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Ildyakov
Copy link

@Ildyakov Ildyakov commented Apr 11, 2023

This change is Reviewable

@@ -640,3 +640,19 @@ class DashboardChartGroupBy(create_time_mixin(created_at=True, updated_at=True),
DashboardChartGroupBy.jira_issue_types,
DashboardChartGroupBy.jira_labels,
)


class InstallationProgress(create_time_mixin(created_at=False, updated_at=False), Base):
Copy link
Contributor

@vmarkovtsev vmarkovtsev Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class InstallationProgress(create_time_mixin(created_at=False, updated_at=False), Base):
class GitHubInstallationProgress(Base):

I guess. If we write "github" in the description... So that we don't confuse with GitLab.
Also, create_time_mixin(created_at=False, updated_at=False) does nothing, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Comment on lines 651 to 652
account_created = Column(TIMESTAMP(timezone=True))
fetch_started = Column(TIMESTAMP(timezone=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is "account_created" different from "fetch_started"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more so we don`t have a "account_created" event. yet.
and for a case of re-fetch they will be different.
so for now we can cut this field I think. yes

Comment on lines 656 to 657
precompute_started = Column(TIMESTAMP(timezone=True))
precompute_completed = Column(TIMESTAMP(timezone=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reposet and Athenian account specific. We have to move those timestamps to the repository_sets table.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this force us to have some key in the state DB for github_account_ids.
create a table for comparing github_account_id and athenian_account_id needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably mean account_github_accounts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is contains both this IDs - yes it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then yes it is

consistency_completed = Column(TIMESTAMP(timezone=True))
precompute_started = Column(TIMESTAMP(timezone=True))
precompute_completed = Column(TIMESTAMP(timezone=True))
current_status = Column(Text())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete this for now. I need us to create something absolutely minimal and ASAP. If we make the status, we will have to think what statuses can we have, when to update and what, consider different edge cases, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noone is forcing us to use all the columns of the table from the begining.
we can create a table "as is" and add the features to the pipeline and API when we will be ready

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's migrate to additional fields in the future. To avoid seduction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but how one text field is making it harder? We already have a PR to MD that uses it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addition - PR to cloud-common is ready too. and if we cut the columns it will slow us to implement changes instead of getting some features faster.
and if we delete precompute timestamps from this table we will use all the rest immediately after releasing all the features that in process now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, we can update the status field in the metadata whatever way we want, but we will not use it downstream. If it warms your heart much to update an unused field - I am not standing against 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd still like to have that field for the bot.

Agree about precomputer, we will need a second table then?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precompute- fields dropped. account_created field dropped. others (including status) is used by actual written logic in metadata. the point is that status can have a voluntary (apart of some statuses that would be fixed) value and this can be used for some specific messages like "paused", "delayed" etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dennwc we have a field "precomputed bool" in the table "repository_sets"
so it would be right to add a couple new fields to it. and provide a logic to fill this fields from the precomputer directly (not with event handler).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to write to metadata 👍

@@ -100,6 +100,8 @@ class RepositorySet(
)
tracking_re = Column(Text(), nullable=False, default=".*", server_default=".*")
precomputed = Column(Boolean(), nullable=False, default=False, server_default="false")
precompute_started = Column(TIMESTAMP(timezone=True))
precompute_completed = Column(TIMESTAMP(timezone=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
precompute_completed = Column(TIMESTAMP(timezone=True))
precompute_finished = Column(TIMESTAMP(timezone=True))

Minor syntax nitpick: start-finish; begin-end; initiate-complete. I know that people don't usually care, but this is just good education and conformance to the surrounding table conventions :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I see we have the same syntax to fix in installation_progress 🙏

consistency_completed = Column(TIMESTAMP(timezone=True))
precompute_started = Column(TIMESTAMP(timezone=True))
precompute_completed = Column(TIMESTAMP(timezone=True))
current_status = Column(Text())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, we can update the status field in the metadata whatever way we want, but we will not use it downstream. If it warms your heart much to update an unused field - I am not standing against 😄

Copy link
Contributor

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @Ildyakov and @vmarkovtsev)


server/athenian/api/models/state/versions/5b3dc49a9d7b_installation_progress.py line 20 at r2 (raw file):

def upgrade():
    op.create_table(
        "installation_progress",

Since we are removing API-related things from the table anyway, does it make sense to move it to MD repo completely? It can be a set of added columns for github.account, for example. This way we can add a data migration as well (set all accounts to "done" status).

@Ildyakov
Copy link
Author

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @Ildyakov and @vmarkovtsev)

server/athenian/api/models/state/versions/5b3dc49a9d7b_installation_progress.py line 20 at r2 (raw file):

def upgrade():
    op.create_table(
        "installation_progress",

Since we are removing API-related things from the table anyway, does it make sense to move it to MD repo completely? It can be a set of added columns for github.account, for example. This way we can add a data migration as well (set all accounts to "done" status).

this is interesting. yes the reason to use state db was precomputing.
I think this is a proper solution to move the draft to metadata. WDYT @vmarkovtsev?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants