-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Function to join data tibbles #199
Comments
Never got around to giving you a good response, but looking over this now with some thoughts. API:So far as the API goes, originally I was going to say return a supertibble since that's what the new IMO the easiest thing to start with here is returning just a data tibble. I think that's the easiest for a user to visually confirm and do what they wish with, and if we do name the function NR/RT
This is pseudo-non standard if I understand right. My hope would be if we approach this as tackling each relationship one at a time then we can just build this up piece by piece. |
@ezraporter Started working on this a little over here. When you have a moment care to take a look and see if it's structured in a way that makes sense so far, borrowing from your initial suggestions? My hope and first milestone I want to reach is that this works for all non-mixed join relationships (I believe it does) and that the next piece will be to insert logic where |
This looks like a good start! Just a few random thoughts:
join_data_tibbles <- function(...) {
join_fn <- get_join_fn(type) # returns a function, ex. `left_join`
tbl_x <- extract_tibble(supertbl, x)
tbl_x_type <- get_structure(tbl_x) # returns a string with the structure, ex. "mixed"
tbl_y <- extract_tibble(supertbl, y)
tbl_y_type <- get_structure(tbl_y)
by <- build_by(tbl_x, tbl_x_type, tbl_y, tbl_y_type) # returns a vector of keys, ex. c(`record_id_field`, "redcap_event")
# do the join!
join_fn(tbl_x, tbl_y, by = by)
} |
For sure, and I'm sure we can get there but for now, at least the way my brain works, I wanted to approach things stepwise.
Yea I need to think about this more since between this and
I like this set up, will restructure to accommodate 👍 |
Coming back to this, I'm wondering if this can actually be made simpler? During the joins if we artificially create a Going to test this a bit more and hope to update the branch. |
@ezraporter When you have a moment, can you trial out the code on the join-tibbles-branch? If you want I can write up a test suite & PR if that makes it easier to review, but thought it might still be too soon to do that. The main changes I made for mixed structure handling were to check the x and y tables for the existence of As I mentioned above, when performing the mixed structure joins I think it's more common that users will need to select Try out below and playing with the sprtbl <- read_redcap(Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MIXED_STRUCTURE_API"),
allow_mixed_structure = TRUE)
join_data_tibbles(suprtbl = sprtbl, x = "nonrepeat_form", y = "mixed_structure_form")
join_data_tibbles(suprtbl = sprtbl, x = "nonrepeat_form", y = "repeat_form")
join_data_tibbles(suprtbl = sprtbl, x = "repeat_form", y = "mixed_structure_form") At the moment if you join the RT/RS tables (the third join) there will be an empty |
Feature Request Description
Function to join together data tibbles that picks smart default keys to join on based on the events each tibble is mapped to.
Proposed Solution
Some possible APIs:
Additional Context
Data tibble types
API issues aside, I think the key here is that we'd need a way to determine for any two data tibbles which keys to use in a join by default.
Each instrument-event pair can have one of 3 types: Non Repeating, Repeating Separately, or Repeating Together. When I talk about the "type" of an instrument, I mean the aggregation of all the types of instrument-event pairs the instrument is part of. There are 7 options which I mocked up in an example REDCap.
"Type" here is slightly more specific than "structure". I think for joining any two non-mixed instruments the default should be to join on all common keys. The one complication I see is the
NA
redcap_event_instance
in the "NR, RT" case.Presumably if you join "NR, RT" to "NR, RT" you want the
NA
s to join in the non repeating event which is a little non-standard.Things get a little more complicated when considering the mixed structure instruments. For example, suppose you're joining "RS, RT" to "RS, RT". Presumably you'd want to join on all the keys for records in the RT event but on only
record_id
andredcap_event
for the RS records.If we wanted to cover all our bases we could try to enumerate the keys we'd expect to join on for all 28 possible combinations.
Missing keys
The above assumes that all instrument-event pairs have data. I noticed that when this isn't true the keys that come back for each data tibble aren't consistent. For example, in the "NR, RS" instrument
redcap_form_instance
is only included if data is filled out for that instrument in the RS event. Not sure how much this matters for the joins but it might be an additional consideration.Checklist
The text was updated successfully, but these errors were encountered: