You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Note: This isn't simply a result of different rows being returned (which might be expected due to the distributed nature of the query). The query was returning the wrong columns and the wrong number of rows
The text was updated successfully, but these errors were encountered:
I was running into this issue again today. This time I was getting the right columns but in the wrong order.
Again, running the query generated by pydbtools in Athena gave the right result:
select c.postcode, d.pc_nospace, d.count_distinct_person, c.codes from mojap_de_temp_1613136638555168000.pc_with_distinct_persons as d left join mojap_de_temp_1613136638555168000.pc_with_codes as c on d.pc_nospace = c.pc_nospace
order by d.count_distinct_person desc;
Further inspection suggests I was getting all the right data but in the wrong order. (using ctas_approach =True)
This makes sense if using the arrow reader, because presumably it reads the parquet files into pandas in any order.
I also ran the query directly in wranger.athena.read_sql_query and got the same result i.e. this doesn't seem to be an issue with the manipulations pydbtools does.
A good start to further troubleshoot this would be to put some logging in to wrangler at the DEBUG level which outputs
1.. the SQL statement executed and
2. the args which are submitted to AWS wrangler
I've run into an issue a few times where I run a query usin
read_sql_query
and get completely different results frmo the Athena GUI.Restarting kernel and re-running fixes it.
I think this is likely to be a result of caching behaviour - possibly from
awswrangler
.I haven't yet been able to create a fully reproducible example. The query I was running this morning was:
Note: This isn't simply a result of different rows being returned (which might be expected due to the distributed nature of the query). The query was returning the wrong columns and the wrong number of rows
The text was updated successfully, but these errors were encountered: