Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYNPY-1551] Tables refactor #1149

Closed
wants to merge 4 commits into from
Closed

Conversation

BryanFauble
Copy link
Contributor

Problem:

  1. Working with tables in the current implementation is a difficult task. Many different functions must be strung together to accomplish some tasks, features like renaming columns is missing, and all data must go through a CSV written to disk when interacting with Synapse.

Solution:

  1. Start the process of updating the initial Tables OOP model to bring forward a number of changes:

The main functions that exist on the Table to interact with required functionality include:

  • store
  • store_rows
  • get
  • delete
  • delete_rows
  • query

Convenience functions (Functions that smooth out setting attributes, but require a call to .store() to persist to Synapse)

  • delete_column
  • add_column
  • reorder_column
  • set_columns
  1. Once the logic for these functions is nailed down the same ideas may be applied to both EntityView and Dataset concepts in Synapse. They're similar in concept to a Table, but serve different purposes for users. When we start to implement those 2 models we will be able to extract out common logic and re-use it across all 3 models.

TODO:

  • Get feedback on proposed changes and smooth out rough edges
  • Add additional examples and doc strings to all functions
  • Implement logic to support loading results from Synapse directly into a pandas dataframe (Without requiring the usage of an intermediate CSV file written to disk)
  • Implement querying and returning results via a DataFrame
  • Add verbose integration tests for all logic
  • Add verbose unit tests for all logic
  • Swap the submission of data to Synapse to use a single transaction via this api: https://rest-docs.synapse.org/rest/POST/entity/id/table/transaction/async/start.html
  • Implement a dry-run feature to both store and store_rows that prints out the changes to the schema that will be applied (Determine if we could do a diff of the data, if it's feasible to implement - If not, do not implement)
  • Factor the delete_rows logic, or document the process well. Current behavior requires both a row_id and version_number to delete sourced from query results. As querying for data changed, so will the delete logic.
  • Implement storing of rows for each of the formats we are going to let users submit data: Csv, list of lists, dictionary, pandas dataframe
  • Implement a _columns_to_delete concept within the models to prevent unintentional deletion of columns and require that deleting a column go through the delete_column function.

@pep8speaks
Copy link

Hello @BryanFauble! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 43:89: E501 line too long (212 > 88 characters)

Line 68:89: E501 line too long (218 > 88 characters)
Line 69:89: E501 line too long (102 > 88 characters)

Line 286:89: E501 line too long (172 > 88 characters)
Line 287:89: E501 line too long (176 > 88 characters)
Line 318:89: E501 line too long (94 > 88 characters)
Line 319:89: E501 line too long (95 > 88 characters)
Line 320:89: E501 line too long (96 > 88 characters)
Line 323:89: E501 line too long (101 > 88 characters)
Line 324:89: E501 line too long (103 > 88 characters)
Line 326:89: E501 line too long (94 > 88 characters)
Line 327:89: E501 line too long (91 > 88 characters)
Line 338:89: E501 line too long (109 > 88 characters)
Line 341:89: E501 line too long (100 > 88 characters)
Line 346:89: E501 line too long (94 > 88 characters)
Line 347:89: E501 line too long (93 > 88 characters)
Line 348:89: E501 line too long (92 > 88 characters)
Line 352:89: E501 line too long (94 > 88 characters)
Line 353:89: E501 line too long (91 > 88 characters)
Line 354:89: E501 line too long (94 > 88 characters)
Line 355:89: E501 line too long (91 > 88 characters)
Line 356:89: E501 line too long (93 > 88 characters)
Line 357:89: E501 line too long (96 > 88 characters)
Line 478:89: E501 line too long (117 > 88 characters)
Line 483:89: E501 line too long (110 > 88 characters)
Line 586:89: E501 line too long (93 > 88 characters)
Line 587:89: E501 line too long (95 > 88 characters)
Line 601:89: E501 line too long (101 > 88 characters)
Line 604:89: E501 line too long (92 > 88 characters)
Line 624:89: E501 line too long (173 > 88 characters)
Line 736:89: E501 line too long (91 > 88 characters)
Line 808:89: E501 line too long (218 > 88 characters)
Line 809:89: E501 line too long (102 > 88 characters)
Line 844:89: E501 line too long (89 > 88 characters)
Line 854:89: E501 line too long (99 > 88 characters)
Line 927:89: E501 line too long (110 > 88 characters)
Line 935:89: E501 line too long (174 > 88 characters)
Line 945:89: E501 line too long (145 > 88 characters)
Line 1078:89: E501 line too long (203 > 88 characters)
Line 1161:89: E501 line too long (98 > 88 characters)
Line 1162:89: E501 line too long (101 > 88 characters)
Line 1240:89: E501 line too long (117 > 88 characters)
Line 1263:89: E501 line too long (120 > 88 characters)
Line 1264:89: E501 line too long (116 > 88 characters)
Line 1265:89: E501 line too long (146 > 88 characters)
Line 1266:89: E501 line too long (241 > 88 characters)

@BryanFauble
Copy link
Contributor Author

Going to reopen and see if readthedocs builds the docs automatically

@BryanFauble BryanFauble closed this Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants