Table infrastructure, condition refactor #163

dogversioning · 2024-01-09T18:53:57Z

This PR makes the following changes:

Adds a schema parser class to databases
- Per discussion, it's a simple first pass; it is not really database-aware.
- Lots of *args/**kwargs updates to support optionally passing a parser to a builder
Moves the condition table (& some prerequisite queries) to a table builder
- The condition table should now inspect the found schema and null columns it can't find
Moves the count_condition_month table to the core counts builder
Makes some light updates to medication counts, which weren't actually being properly tested before
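As a rough illustration of "null columns it can't find", the sketch below builds a SELECT list from the expected columns and substitutes a typed NULL for any column missing from the found schema. The function name, the column names, and the choice of varchar are assumptions made for the example, not code from this PR.

```python
def select_columns(expected, found):
    """Build a SELECT list, nulling any expected column the schema lacks.

    `expected` is an ordered list of column names (hypothetical here);
    `found` is the set of column names actually present in the source table.
    """
    parts = []
    for column in expected:
        if column in found:
            parts.append(column)
        else:
            # Keep the output shape stable so downstream queries still run.
            parts.append(f"cast(NULL AS varchar) AS {column}")
    return ",\n    ".join(parts)


sql = select_columns(["id", "code", "encounter_ref"], {"id", "code"})
print(f"SELECT\n    {sql}\nFROM condition")
```

Keeping the missing column in the output, rather than dropping it, means tables built on top of this one do not need to know which source fields were absent.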

Checklist

Consider if documentation (like in docs/) needs to be updated
Consider if tests should be added
Run pylint if you're making changes beyond adding studies
Update template repo if there are changes to study configuration

cumulus_library/statistics/counts.py

mikix

Nice - looks much more pleasant to work with, once you have the schema in hand.

cumulus_library/__init__.py

cumulus_library/databases.py

mikix · 2024-01-09T19:04:22Z

cumulus_library/databases.py

+        TODO: on a per database instance, consider a more nuanced approach
+        if needed


We both suspect it will be at some point yeah? I'm not hopeful we can band-aid this too much, but I think this sets us on a good path.

yeah - like, already, if you need a variable that could occur in more than one place (i'm looking at you, id), this falls over pretty quickly - but we might get away with it since we're talking about a small set of tables.
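To make the `id` concern concrete, here is a small sketch of how a lookup that only knows field names, not where they sit, reports a false positive. The nested dict is a made-up stand-in for a parsed schema, and `flatten_names` is a hypothetical helper, not the parser added in this PR.

```python
def flatten_names(schema):
    """Collect every field name in a nested dict, ignoring its position."""
    names = set()
    for key, value in schema.items():
        names.add(key)
        if isinstance(value, dict):
            names |= flatten_names(value)
    return names


# Hypothetical schema: only the nested coding carries an `id`.
schema = {
    "code": {"coding": {"id": "varchar", "code": "varchar"}},
    "subject": {"reference": "varchar"},
}

names = flatten_names(schema)
print("id" in names)   # True, although there is no top-level id
print("id" in schema)  # False
```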

mikix · 2024-01-09T19:08:30Z

cumulus_library/databases.py

+            else:
+                for field in expected[column]:
+                    output[column][field] = False
+        return output


I assume this was tested on Athena at some point yeah?

nit: It's not the prettiest little function 😄 Maybe some example input in the docstring or comments would help it be more readable - as you look at it, you can scan above to see, "ah yeah, this would parse that bit" or something.

yeah i can add some more documentation around this whole thing. there was :some: athena testing, though just of this one - the prereq thing fell over - but this seemed to be reasonable.

ok take a look at the function docstring and tell me what you think.
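For readers without the diff open, this is a minimal sketch of the kind of function under discussion, with example input in the docstring as suggested. Only the `else` branch mirrors the snippet quoted above; the function name, the shape of `found`, and the rest of the body are assumptions.

```python
def validate_schema(expected, found):
    """Report which expected fields exist in a found schema.

    Example input:
        expected = {"category": ["coding", "text"], "code": ["coding"]}
        found = {"category": {"coding"}}
    Example output:
        {"category": {"coding": True, "text": False},
         "code": {"coding": False}}
    """
    output = {}
    for column in expected:
        output[column] = {}
        if column in found:
            for field in expected[column]:
                output[column][field] = field in found[column]
        else:
            # Column is absent entirely, so every field under it is missing.
            for field in expected[column]:
                output[column][field] = False
    return output


result = validate_schema(
    {"category": ["coding", "text"], "code": ["coding"]},
    {"category": {"coding"}},
)
print(result)
```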

cumulus_library/studies/core/count_core.py

mikix · 2024-01-09T19:44:08Z

cumulus_library/template_sql/statistics/count.sql.jinja

@@ -48,8 +72,14 @@ CREATE TABLE {{ table_name }} AS (
            {%- if secondary %}
            {{ secondary }}_ref,
            {%- endif -%}
+            {%- if fhir_resource=='condition' %}
+            coalesce(cast(cond_code_display AS varchar), 'missing-or-null') AS cond_code_display,


I made this little helper method, because I didn't like reproducing a static string all over, which might get typo'd. It doesn't save much, though, so I get if you like the cleanliness of just doing it directly. But here is my macro:

{% macro coalesce_missing() -%}
  {% if varargs %}
    {% set arg = varargs[0] %}
  {% else %}
    {% set arg = caller() %}
  {% endif %}
  COALESCE( {{ arg }}, 'missing-or-null' )
{%- endmacro %}

Can be used like so:

{{ coalesce_missing('field.subfield') }} AS cond_code_display,
{% call coalesce_missing() %} cast(cond_code_display AS varchar) {% endcall %} AS cond_code_display

Or maybe in your version, you could have it do the AS varchar and you might actually save some typing that way (unlike mine, which doesn't really 😄)

I think the macro idea is smart, especially as we're about to change this based on some of the system discussion I was mentioning earlier - would be nice to have one point of entry for this.

That said - I might elect to do that in the next PR so I have a better idea of the expected scopes - lemme sleep on this one and make a call one way or another.
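If the single point of entry ends up on the Python side rather than in a macro, it could look something like the sketch below, which also folds in the optional cast mentioned above. The helper name is borrowed from the macro; the `cast_to` parameter and the constant name are assumptions.

```python
MISSING = "missing-or-null"  # the static string the thread wants in one place


def coalesce_missing(expression, cast_to=None):
    """Wrap a SQL expression so NULLs become the shared placeholder string."""
    if cast_to:
        expression = f"cast({expression} AS {cast_to})"
    return f"coalesce({expression}, '{MISSING}')"


line = coalesce_missing("cond_code_display", cast_to="varchar")
print(f"{line} AS cond_code_display,")
```

With `cast_to="varchar"` this produces the same expression as the line added to count.sql.jinja in the diff above.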

mikix · 2024-01-09T19:45:43Z

cumulus_library/template_sql/statistics/count.sql.jinja

+                {%- if fhir_resource=='condition' -%},
+                cond_code_display
+                {% endif %}


Yo there's a lot of condition specific code now 😄 - This is stuff that was missing before and Condition is just a pain in the butt, I'm gathering, rather than the result of all your schema work.

I :think: that it still makes sense to keep all these running through the same template, since 80% is boilerplate. There's going to be more of this kind of thing as we get the other count types out of static sql and into builders, though at this point i think it should mostly be just elif statements against these blocks.

I've been doing some of this resource-specific logic in the quality study too. Sometimes I have a little "get date fields for resource X" in Python, and sometimes I have "get status value for resource X" in a macro.

For some reason, it feels more natural in Python (logic trees feel like code-code). But I think it makes more pragmatic sense in the templates.

But in both cases, I do like trying to isolate the if-else trees behind a utility function. Not necessary here yet, but that would be my vote as these builders get more complex.
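One possible shape for such a utility, sketched in Python: a lookup table keyed by resource, so the template receives a plain list instead of growing another `if fhir_resource == ...` branch. Only the condition entry comes from this PR; the names `EXTRA_COUNT_COLUMNS` and `extra_columns` are hypothetical.

```python
# Resource-specific extras for count tables. Only the condition entry is
# taken from the diff above; other resources would be added as needed.
EXTRA_COUNT_COLUMNS = {
    "condition": ["cond_code_display"],
}


def extra_columns(fhir_resource):
    """Return extra columns for a resource, or an empty list if none."""
    return EXTRA_COUNT_COLUMNS.get(fhir_resource.lower(), [])


print(extra_columns("condition"))
print(extra_columns("encounter"))
```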

cumulus_library/template_sql/utils.py

pyproject.toml

tests/test_counts_templates.py

mikix · 2024-01-10T14:13:31Z

cumulus_library/studies/core/builder_prereq_tables.py

+        ]
+        for sql_file in prereq_sql:
+            with open(dir_path / sql_file) as file:
+                queries = sqlparse.split(file.read())


Ah OK so this is the sqlparse use. This logic of "read multiple queries from a file" is something I wouldn't mind being added as syntactic sugar for jinja templates too. I had to split some files up in the quality study to accommodate the one-query-one-file requirement.
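A standalone sketch of the "read multiple queries from a file" idea follows. It uses `sqlparse.split` as the diff does; the `read_queries` name, the file name, and the naive semicolon fallback (there only so the example runs without sqlparse installed) are assumptions.

```python
import tempfile
from pathlib import Path

try:
    import sqlparse  # third-party; the PR's builder calls sqlparse.split
except ImportError:  # keep the sketch runnable without the dependency
    sqlparse = None


def read_queries(path):
    """Return the individual SQL statements found in one file."""
    text = Path(path).read_text()
    if sqlparse is not None:
        statements = sqlparse.split(text)
    else:
        # Naive fallback: wrong if a string literal contains a semicolon.
        statements = [part + ";" for part in text.split(";")]
    # Drop blank or semicolon-only leftovers.
    return [s.strip() for s in statements if s.strip().strip(";")]


with tempfile.TemporaryDirectory() as tmp:
    sql_file = Path(tmp) / "prereq.sql"  # hypothetical file name
    sql_file.write_text(
        "CREATE TABLE a AS SELECT 1;\n\nCREATE TABLE b AS SELECT 2;\n"
    )
    queries = read_queries(sql_file)

print(len(queries))
```

The same helper could sit in front of template rendering, so a single rendered file yields several queries.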

Table infrastructure, condition refactor

7689c81

mikix approved these changes Jan 9, 2024

mikix reviewed Jan 10, 2024

Redo PR feedback & regression updates

d2189f6

dogversioning force-pushed the mg/core_study_refactor branch from fbd7f47 to d2189f6 Compare January 10, 2024 16:44

dogversioning added 3 commits January 10, 2024 14:14

template utils cleanup

25d6dce

updated regression data

1b25f03

import style cleanup, some doc impovements

258dbfe

dogversioning merged commit 660800f into main Jan 10, 2024
3 checks passed

dogversioning deleted the mg/core_study_refactor branch January 10, 2024 21:53
