Possible regression from joins refactor in 2.3.0 #1346
I tried this with SQLite and the current dev version, and it works (with some minor changes: there are a couple of difficulties regarding the different ways to specify a table identifier). Can you try this out with the current dev version?
Thanks for the tips. Working on using the updated version now. Initial observations:

```
escape(as.POSIXct("2020-01-01"), con = con) (`actual`) not equal to sql("'2020-01-01 00:00:00'") (`expected`).
`actual`: "<SQL> '2020-01-01 08:00:00'"
`expected`: "<SQL> '2020-01-01 00:00:00'"
```
Yes, the 2nd observation comes from the test not setting a timezone explicitly. This was done in other tests but missed there.

The 2nd issue should be fixed now.
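The `08:00:00` in the failing expectation is what you would see if the escape method renders timestamps in UTC and the test ran in a UTC-8 session. Both are assumptions for this sketch, with `America/Los_Angeles` standing in for the local zone. Formatting the same midnight twice in base R shows why pinning `tz` makes the test stable:

```r
# Midnight read in a (assumed) US/Pacific session is 08:00 once shown in UTC.
local_midnight <- as.POSIXct("2020-01-01", tz = "America/Los_Angeles")
format(local_midnight, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2020-01-01 08:00:00"

# Pinning the timezone when the value is created removes the dependence
# on whichever machine happens to run the test.
utc_midnight <- as.POSIXct("2020-01-01", tz = "UTC")
format(utc_midnight, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2020-01-01 00:00:00"
```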
Next up:
NVM, I see that was just a placeholder method. I can simply remove the call to
Mild breakage worth noting in the NEWS: we were using `sql_not_supported("median()")`, which now produces the ugly message:

Thanks, I added the breaking change for

I am stuck trying to update to dev. It breaks dozens of existing tests and I don't have any more bandwidth to investigate. There's also an interaction with our local patch to fix #1016 that's being further broken by the update. Looks like we will have to live with dbplyr 2.2.1 for the foreseeable future.

Is this a public package? It would be interesting to see which tests are broken now and whether dbplyr can help avoid this.

Unfortunately it's an internal package. If you're OK to work slowly through shareable tidbits, I can try to share failures / relevant bits of our source. But that's likely to be somewhat painful. WDYT, is there any more productive way to debug here?

I would expect that many of the failing tests boil down to only a couple of changes. Unfortunately, I couldn't see that coming, as your package is internal and therefore didn't show up in the revdep checks.
That's almost certainly right. Let's see what kind of progress we can make. Our package is {f1} (see paper for the SQL backend). One crucial thing here is that we're maintaining a patch in light of #1016 (line 91 in 244ea87).
The I updated our
But now the next call fails:
I see some Lines 71 to 81 in 244ea87
Could you recommend what such a method might look like?
Maybe we should work around this on our end, because it's our patch mucking things up. Instead I tried:

```diff
 if (rlang::inherits_any(table_name, c("SQL", "sql", "Id", "ident_q"))) {
   return(table_name)
 }
+if (inherits(table_name, "dbplyr_table_ident")) {
+  table_name <- unclass(table_name)
+  return(DBI::Id(schema = table_name$schema, table = table_name$table))
+}
```

This made progress, but now there's a new error I'm not sure how to deal with:
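One thing the patch above does not guard against: it forwards `schema` even when that field is `NA`, and the record printed further down in this thread has `schema = NA_character_`. A minimal base-R sketch, assuming (as that printout suggests) that unclassing the record yields a list with `table`, `schema`, `catalog`, `quoted` and `alias` fields; `table_ident_parts()` and the hand-built `rec` are hypothetical:

```r
# Hypothetical helper: keep only the non-missing name components so that
# NA fields are not forwarded to DBI::Id().
table_ident_parts <- function(x) {
  x <- unclass(x)
  if (isTRUE(x$quoted)) {
    # a pre-quoted `table` string would be quoted again by DBI::Id()
    stop("`table` is already quoted SQL; not handled in this sketch")
  }
  parts <- c(catalog = x$catalog, schema = x$schema, table = x$table)
  parts[!is.na(parts)]
}

# Hand-built stand-in with the same fields, but an unquoted table name.
rec <- structure(
  list(table = "mtcars_x", schema = "datascape", catalog = NA_character_,
       quoted = FALSE, alias = NA_character_),
  class = c("dbplyr_table_ident", "vctrs_rcrd", "vctrs_vctr")
)
table_ident_parts(rec)
# c(schema = "datascape", table = "mtcars_x")

# With DBI installed, the result could then be spliced into DBI::Id():
# do.call(DBI::Id, as.list(table_ident_parts(rec)))
```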
Another thing I tried (lines 210 to 218 in 244ea87):
But that (or maybe some other change) leads to the wrong SQL being generated:
So the table name has been wrapped in
To me the first question is whether you really need

Regarding wrapping: this probably comes from
I think there's some history there; certainly a lot of tests fail without them. Here are their current implementations:

```r
# Names for DBI::Id() components, given the number of dot-separated parts.
.IdentifierNames <- function(len) {
  if (len == 0) {
    return(NULL)
  }
  if (len == 1) {
    return("table")
  }
  if (len == 2) {
    return(c("schema", "table"))
  }
  c("schema", "table", paste0("suffix_", seq_len(len - 2)))
}

.TableNameToIdentifier <- function(table_name) {
  # already an SQL / identifier object: pass through unchanged
  if (rlang::inherits_any(table_name, c("SQL", "sql", "Id", "ident_q"))) {
    return(table_name)
  }
  if (inherits(table_name, "dbplyr_table_ident")) {
    table_name <- unclass(table_name)
    return(DBI::Id(schema = table_name$schema, table = table_name$table))
  }
  # plain string such as "schema.table": split on "." into a DBI::Id
  stopifnot(rlang::is_string(table_name))
  parts <- strsplit(table_name, ".", fixed = TRUE)[[1]]
  names(parts) <- .IdentifierNames(length(parts))
  do.call(DBI::Id, as.list(parts))
}

tbl.F1DBIConnection <- function(src, from, ..., vars = NULL) {
  from <- dbplyr::as.sql(.TableNameToIdentifier(from), con = src)
  vars <- vars %||% dplyr::db_query_fields(src, from, ...)
  dbi_src <- dbplyr::src_dbi(src, auto_disconnect = FALSE)
  tbl <- dplyr::make_tbl(
    c("F1DBIConnection", "googlesql", "dbi", "sql", "lazy"),
    src = dbi_src,
    lazy_query = dbplyr::lazy_base_query(from, vars, class = "remote")
  )
  tbl
}

tbl.src_F1DBIConnection <- function(src, from, ...) {
  tbl(src$con, from, ...)
}
```
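To make the splitting rule concrete, here is a condensed, self-contained copy of the `.IdentifierNames()` / `strsplit()` logic above, renamed `identifier_names()` and `split_table_name()` (hypothetical names) and stopping short of the `DBI::Id()` call so it runs with base R only:

```r
# Condensed copy of .IdentifierNames().
identifier_names <- function(len) {
  if (len == 0) return(NULL)
  if (len == 1) return("table")
  c("schema", "table", if (len > 2) paste0("suffix_", seq_len(len - 2)))
}

# Same split as .TableNameToIdentifier(), minus the DBI::Id() call.
split_table_name <- function(table_name) {
  stopifnot(is.character(table_name), length(table_name) == 1)
  parts <- strsplit(table_name, ".", fixed = TRUE)[[1]]
  names(parts) <- identifier_names(length(parts))
  parts
}

split_table_name("mtcars_x")
# c(table = "mtcars_x")
split_table_name("datascape.mtcars_x")
# c(schema = "datascape", table = "mtcars_x")
split_table_name("proj.datascape.mtcars_x")
# c(schema = "proj", table = "datascape", suffix_1 = "mtcars_x")
```

With three parts, the leading component is labelled `schema` and the actual table name lands in `suffix_1`, which may or may not be what the backend expects.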
Oh, nice lead. We have:

```r
sql_query_wrap.GoogleSQLDBIConnection <- function(con, from,
                                                  name = .UniqueName("subquery"),
                                                  ...,
                                                  lvl = 0) {
  if (is.ident(from)) {
    return(setNames(from, name))
  }
  if (is.null(name)) {
    return(build_sql("(", from, ")", con = con))
  }
  if (!is.ident(name)) {
    name <- ident(name)
  }
  build_sql("(", from, ") AS ", name, con = con)
}
```

I added a case:

```diff
 sql_query_wrap.GoogleSQLDBIConnection <- function(con, from,
                                                   name = .UniqueName("subquery"),
                                                   ...,
                                                   lvl = 0) {
+  if (inherits(from, "dbplyr_table_ident")) {
+    return(from)
+  }
   if (is.ident(from)) {
     return(setNames(from, name))
   }
   if (is.null(name)) {
     return(build_sql("(", from, ")", con = con))
   }
   if (!is.ident(name)) {
     name <- ident(name)
   }
   build_sql("(", from, ") AS ", name, con = con)
 }
```

That gets us to a different SQL generation error:

```sql
SELECT `datascape`.`mtcars_x`.*
FROM `datascape`.`mtcars_x`
LIMIT 6
```

The preamble to that looks like:

```r
f1.mtcars <- copy_to(dbi_con,
  mtcars2,
  name = "datascape.mtcars_x",
  temporary = FALSE
)
tbl_mtcars <- tbl(dbi_con, "datascape.mtcars_x")
head(tbl_mtcars)
```

Looks like

```sql
SELECT ALIAS.*
FROM `datascape`.`mtcars_x` AS ALIAS
LIMIT 6
```
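For reference, the aliased form in the last query can be produced by plain string building. This only illustrates the target SQL, not how dbplyr renders queries; `quote_component()` and `render_head_query()` are hypothetical names, and the sketch assumes identifiers never contain backticks:

```r
# Hypothetical helper: wrap each identifier component in backticks.
quote_component <- function(x) {
  if (any(grepl("`", x, fixed = TRUE))) {
    stop("identifiers containing backticks are not handled in this sketch")
  }
  paste0("`", x, "`")
}

# Reference the table through an alias in both SELECT and FROM.
render_head_query <- function(parts, alias, n) {
  from <- paste(quote_component(parts), collapse = ".")
  paste0(
    "SELECT ", alias, ".*\n",
    "FROM ", from, " AS ", alias, "\n",
    "LIMIT ", n
  )
}

cat(render_head_query(c("datascape", "mtcars_x"), alias = "ALIAS", n = 6))
# SELECT ALIAS.*
# FROM `datascape`.`mtcars_x` AS ALIAS
# LIMIT 6
```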
I'm also noticing that inside

```r
name <- structure("`datascape`.`mtcars_x`", class = c("ident_q", "ident", "character"))
from <- structure(
  list(table = "`datascape`.`mtcars_x`", schema = NA_character_,
       catalog = NA_character_, quoted = TRUE, alias = NA_character_),
  class = c("dbplyr_table_ident", "vctrs_rcrd", "vctrs_vctr")
)
```

That looks a bit off, as I'd expect
I think you can simplify the whole setup a lot:

```r
# remove `tbl.F1DBIConnection()`
tbl.src_F1DBIConnection <- function(src, from, ...) {
  # maybe you don't really want to try to extract the schema automatically.
  # the dev version of dbplyr checks if there is a `.` in the name and informs the user that they
  # probably wanted to use `in_schema()` or `in_catalog()`
  from <- .TableNameToIdentifier(from)
  tbl_sql(
    c("F1DBIConnection", "googlesql", "dbi"),
    src = src,
    from = from,
    ...
  )
}
```

Either remove your custom `sql_query_wrap()` method or simplify it to:

```r
sql_query_wrap.GoogleSQLDBIConnection <- function(con, from,
                                                  name = NULL,
                                                  ...,
                                                  lvl = 0) {
  if (is.null(name)) {
    # supply a generated alias and re-dispatch; the recursive call has a
    # non-NULL `name`, so it falls through to NextMethod() below
    return(sql_query_wrap(
      con = con,
      from = from,
      name = .UniqueName("subquery"),
      ...,
      lvl = lvl
    ))
  }
  NextMethod("sql_query_wrap")
}
```
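The `.UniqueName()` helper used in these methods isn't shown in the thread. A minimal counter-based stand-in, assuming any name that is unique within the session will do (`make_unique_namer()` and `unique_name()` are hypothetical):

```r
# Hypothetical stand-in for .UniqueName(): each call returns the prefix
# followed by the next value of a counter kept in the closure.
make_unique_namer <- function() {
  counter <- 0L
  function(prefix) {
    counter <<- counter + 1L
    paste0(prefix, "_", counter)
  }
}

unique_name <- make_unique_namer()
unique_name("subquery")  # "subquery_1"
unique_name("subquery")  # "subquery_2"
```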
@MichaelChirico I'm closing the issue for now. Let me know if you need more help.

Our custom backend is still deeply broken and nobody has any bandwidth to invest in writing the package from scratch. We are stuck with an old version of {dbplyr} and will be for the foreseeable future. This is already causing issues as the cutting-edge {dplyr} implementation gets further from what's available in our old {dbplyr}. Alas.
I'm having a bear of a time trying to debug this issue that's arising when updating 2.2.1 -> 2.3.3. It's genuinely hard to disentangle where the issue's coming from, since there are so many layers where things may have gone wrong.

The query in the test that breaks is pretty simple, an `inner_join()` on some 3-row input tables:

This test (which compares this join's output to the local dplyr equivalent) works as expected on 2.2.1 but breaks on 2.3.3:
The issue is that the already-escaped name `datascape`.`r_tmp_1` is re-escaped unsuccessfully.

Poking around in the debugger, I'm not able to tell what went wrong. It's possible our own connection methods are doing something unexpected, for example.
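The failure mode described here can be reproduced in miniature: wrapping an already-escaped name in backticks a second time yields an invalid identifier. dbplyr's `ident_q()` exists to mark input that is already quoted; the sketch below imitates that idea with a made-up `prequoted` class and a hypothetical S3 generic `quote_name()`, not with dbplyr's own code:

```r
# Hypothetical quoting generic: values tagged "prequoted" are treated as
# already-escaped SQL and passed through; anything else gets backticks.
quote_name <- function(x) UseMethod("quote_name")
quote_name.default <- function(x) paste0("`", x, "`")
quote_name.prequoted <- function(x) unclass(x)

raw <- "`datascape`.`r_tmp_1`"

# Quoting the already-escaped string again corrupts it:
quote_name(raw)
# "``datascape`.`r_tmp_1``"

# Tagging it as pre-quoted keeps it intact:
quote_name(structure(raw, class = "prequoted"))
# "`datascape`.`r_tmp_1`"
```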
Just one observation: here, IIUC, we should respect the pre-escaped nature of the input when constructing `by$x_as` (dbplyr/R/lazy-join-query.R, lines 171 to 178 in 5fa4410).
Debugging, I see this around that step:
Perhaps `table_names_out` should be `ident_q` at this step, but even if so, it would have the same result:

Should `ident()` have an escape for `"ident"` input? And then we should make sure `table_names_out` reflects the same `ident_q` class as the inputs `x$x` and `joins$table`?

Maybe I'm barking up the wrong tree.
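On whether `table_names_out` should carry the `ident_q` class of its inputs: in base R a marker class is easy to lose, because many operations return plain character vectors. The sketch below uses a made-up `prequoted` class and a hypothetical `keep_class()` helper; dbplyr may define methods for its own classes, so this is not a claim about what its internals do:

```r
# A pre-quoted identifier carrying a (made-up) marker class.
x <- structure("`datascape`.`r_tmp_1`", class = c("prequoted", "character"))

# Many base R operations silently drop the class attribute:
class(paste0(x, ""))  # "character"
class(c(x))           # "character" (no c.prequoted method is defined)

# One way to carry the marker through a transformation is to re-apply
# the input's class to the output explicitly.
keep_class <- function(out, template) {
  class(out) <- class(template)
  out
}
y <- keep_class(paste0(x, ""), x)
class(y)  # c("prequoted", "character")
```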