Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table infrastructure, condition refactor #163

Merged
merged 5 commits into from
Jan 10, 2024
Merged

Conversation

dogversioning
Copy link
Contributor

This PR makes the following changes:

  • Adds a schema parser class to databases
    • Per discussion, it's a simple first pass - it is not really database aware.
    • lots of *args **kwargs updates to support optionally passing a parser to a builder
  • Moved condition table (& some prerequisite queries) to a table builder
    • Condition table should now inspect the found schema and null columns it cant find
  • moved count_condition_month table to core counts builder
  • some light updates to medication counts, which wasn't actually being properly tested before

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Run pylint if you're making changes beyond adding studies
  • Update template repo if there are changes to study configuration

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - looks much more pleasant to work with, once you have the schema in hand.

cumulus_library/__init__.py Show resolved Hide resolved
cumulus_library/databases.py Outdated Show resolved Hide resolved
Comment on lines +53 to +54
TODO: on a per database instance, consider a more nuanced approach
if needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We both suspect it will be at some point yeah? I'm not hopeful we can band-aid this too much, but I think this sets us on a good path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - like, already, if you have a variable that you need that could occur more than one place (i'm looking at you, id), this falls over pretty quickly - but we might get away with it since we're talking about a small set of tables.

else:
for field in expected[column]:
output[column][field] = False
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was tested on Athena at some point yeah?

nit: It's not the prettiest little function 😄 Maybe some example input in the docstring or comments would help it be more readable - as you look at it, you can scan above to see, "ah yeah, this would parse that bit" or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i can add some more documentation around this whole thing. there was :some: athena testing, though just of this one - the prereq thing fell over - but this seemed to be reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok take a look at the function docstring and tell me what you think.

cumulus_library/studies/core/count_core.py Outdated Show resolved Hide resolved
@@ -48,8 +72,14 @@ CREATE TABLE {{ table_name }} AS (
{%- if secondary %}
{{ secondary }}_ref,
{%- endif -%}
{%- if fhir_resource=='condition' %}
coalesce(cast(cond_code_display AS varchar), 'missing-or-null') AS cond_code_display,
Copy link
Contributor

@mikix mikix Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this little helper method, because I didn't like reproducing a static string all over, which might get typo'd. It doesn't save much, though, so I get if you like the cleanliness of just doing it directly. But here is my macro:

{% macro coalesce_missing() -%}
    {% if varargs %}
        {% set arg = varargs[0] %}
    {% else %}
        {% set arg = caller() %}
    {% endif %}

    COALESCE(
        {{ arg }},
        'missing-or-null'
    )
{%- endmacro %}

Can be used like so:

 {{ coalesce_missing('field.subfield') }} AS cond_code_display,
 {% call coalesce_missing() %}
    cast(cond_code_display AS varchar)
 {% endcall %} AS cond_code_display

Or maybe in your version, you could have it do the AS varchar and you might actually save some typing that way (unlike mine, which doesn't really 😄)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the macro idea is smart, especially as we're about to change this based on some of the system discussion I was mentioning earlier - would be nice to have one point of entry for this.

That said - I might elect to do that in the next PR so I have a better idea of the expected scopes - lemme sleep on this one and make a call one way or another.

Comment on lines +112 to +114
{%- if fhir_resource=='condition' -%},
cond_code_display
{% endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yo there's a lot of condition specific code now 😄 - This is stuff that was missing before and Condition is just a pain in the butt, I'm gathering, rather than the result of all your schema work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I :think: that it still makes sense to keep all these running through the same template since 80% is boilerplate, but i think there's going to be more of this kind of thing as we get the other count types out of static sql and into builders, though at this point i think it should mostly be just elif statements against these blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been doing some this resource-specific logic in the quality study too. Sometimes I have a little "get date fields for resource X" in Python, and sometimes I have "get status value for resource X" in a macro.

For some reason, it feels more natural in Python (logic trees feel like code-code). But I think it makes more pragmatic sense in the templates.

But in both cases, I do like trying to isolate the if-else trees behind a utility function. Not necessary here yet, but that would be my vote as these builders get more complex.

cumulus_library/template_sql/utils.py Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
tests/test_counts_templates.py Outdated Show resolved Hide resolved
]
for sql_file in prereq_sql:
with open(dir_path / sql_file) as file:
queries = sqlparse.split(file.read())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK so this is the sqlparse use. This logic of "read multiple queries from a file" is something I wouldn't mind being added as syntactic sugar for jinja templates too. I had to split some files up in the quality study to accommodate the one-query-one-file requirement

@dogversioning dogversioning force-pushed the mg/core_study_refactor branch from fbd7f47 to d2189f6 Compare January 10, 2024 16:44
@dogversioning dogversioning merged commit 660800f into main Jan 10, 2024
3 checks passed
@dogversioning dogversioning deleted the mg/core_study_refactor branch January 10, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants