Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-267] Query BQ once per BP for spend report, handle errors #3166

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

calypsomatic
Copy link
Contributor

@calypsomatic calypsomatic commented Jan 17, 2025

Ticket: https://broadworkbench.atlassian.net/browse/CORE-267
For the consolidated spend report, instead of unioning all billing projects together into one BQ query, this PR does a separate query for each billing project. If BQ returns an error for any one of them, it is left out of the results.

As currently implemented, BPs that generate errors are essentially ignored, resulting in all these workspaces simply not showing up in the UI. Since the form of the result returned from the Rawls API has a list of SpendReportingForDateRange, it might not be very straightforward to return a result for all the workspaces in a problematic BP that indicates that there was a problem. So that would suggest that the UI would have to figure out which, if any, workspaces are missing and display an 'N/A' for them, but that leaves out some information that we do have. Reviewers, is it worth it to try to shove some indication of the problem into a SpendReportingResult or is it ok just to leave it to the UI?

Testing:
I compared my results from the consolidated spend report API both before and after I split the query up and I believe the substance did not change. I then created a BP and gave it a false BQ dataset in its spend report configuration. After checking that this resulted in errors on the main branch, I verified that I was able to successfully get the report on this branch.


PR checklist

  • Include the JIRA issue number in the PR description and title
  • Make sure Swagger is updated if API changes
    • ...and Orchestration's Swagger too!
  • If you changed anything in model/, then you should publish a new official rawls-model and perform the corresponding dependency updates as specified in the README:
    • in the automation subdirectory
    • in workbench-libs
    • in firecloud-orchestration
  • Get two thumbsworth of PR review
  • Verify all tests go green, including CI tests
  • Squash commits and merge to develop (branches are automatically deleted after merging)
  • Inform other teams of any substantial changes via Slack and/or email

@calypsomatic calypsomatic changed the title split allworkspace query by bp [CORE-267] Query BQ once per BP for spend report, handle errors Jan 17, 2025
@calypsomatic calypsomatic marked this pull request as ready for review January 17, 2025 20:22
@calypsomatic calypsomatic requested a review from a team as a code owner January 17, 2025 20:22
@calypsomatic calypsomatic requested review from davidangb and marctalbott and removed request for a team January 17, 2025 20:22
Copy link
Contributor

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some syntax suggestions inline, not a full review:

@@ -358,7 +358,8 @@ class SpendReportingService(
}

def getAllUserWorkspaceQuery(
billingProjects: Map[BillingProjectSpendExport, Seq[(GoogleProjectId, WorkspaceName)]],
billingProject: BillingProjectSpendExport,
workspaces: Seq[(GoogleProjectId, WorkspaceName)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the WorkspaceNames aren't used anywhere inside this method. If you changed this method to accept a Seq[GoogleProjectId] instead then some interior code gets a little simpler

baseQuery
.replace("_PARTITIONTIME", timePartitionColumn)
.replace("_BILLING_ACCOUNT_TABLE", tableName)
.replace("_PROJECT_ID_LIST", "(" + workspaces.map(tuple => s""""${tuple._1.value}"""").mkString(", ") + ")")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/optional: you can use the 3-argument form of mkString here instead; your call on which is more readable

Suggested change
.replace("_PROJECT_ID_LIST", "(" + workspaces.map(tuple => s""""${tuple._1.value}"""").mkString(", ") + ")")
.replace("_PROJECT_ID_LIST", workspaces.map(tuple => s""""${tuple._1.value}"""").mkString("(", ", ", ")"))

}
}
.recoverWith { case ex =>
Future.successful(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log something here to show that we ran into a problem?

Comment on lines 649 to 658
SpendReportingResults(
acc.spendDetails ++ res.spendDetails,
SpendReportingForDateRange(
(BigDecimal(acc.spendSummary.cost) + BigDecimal(res.spendSummary.cost)).toString,
(BigDecimal(acc.spendSummary.credits) + BigDecimal(res.spendSummary.credits)).toString,
acc.spendSummary.currency,
acc.spendSummary.startTime,
acc.spendSummary.endTime
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider writing a + method in the SpendReportingResults case class, then calling it here.

if you had:

case class SpendReportingResults(spendDetails: Seq[SpendReportingAggregation],
                                 spendSummary: SpendReportingForDateRange
) {
  def +(other: SpendReportingResults): SpendReportingResults =
    new SpendReportingResults(
      this.spendDetails ++ other.spendDetails,
      SpendReportingForDateRange(
        (BigDecimal(this.spendSummary.cost) + BigDecimal(other.spendSummary.cost)).toString,
        (BigDecimal(this.spendSummary.credits) + BigDecimal(other.spendSummary.credits)).toString,
        this.spendSummary.currency,
        this.spendSummary.startTime,
        this.spendSummary.endTime
      )
    )
}

then this code could simply be:

combinedResults = results.flatten.reduceOption((acc, res) => acc + res)

baseQuery
.replace("_PARTITIONTIME", timePartitionColumn)
.replace("_BILLING_ACCOUNT_TABLE", tableName)
.replace("_PROJECT_ID_LIST", "(" + workspaces.map(tuple => s""""${tuple._1.value}"""").mkString(", ") + ")")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.replace("_PROJECT_ID_LIST", "(" + workspaces.map(tuple => s""""${tuple._1.value}"""").mkString(", ") + ")")
.replace("_PROJECT_ID_LIST", "(" + workspaces.map { case (project, _) => s""""${project.value}"""" }.mkString(", ") + ")")

the _1 syntax is tough, using the curlies and case make it look nicer

}
}
.recoverWith { case ex =>
Future.successful(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should at least log this exception

@dvoet
Copy link
Contributor

dvoet commented Jan 17, 2025

@davidangb jinx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants