-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
st_read
continually fails if first attempt was unable to connect to target host
#221
Comments
Hi! Thanks for reporting this issue! You can get the latest spatial version (built every time a PR is merged to main in this repo) for DuckDB 0.9.1 by executing:
Note that this is a newer version than the one you get if you use DuckDB 0.9.3. Also, do you have the |
Based on the docs I didn't think there was a need to install it explicitly. Is that incorrect? Besides this issue, I am in general able to connect to S3 using I went ahead and explicitly installed import sqlalchemy
connect_args = {'config': {'s3_region': 'myregion', 's3_access_key_id': 'mykey', 's3_secret_access_key': 'mysecret', 's3_endpoint': 'hostname:port', 's3_url_style': 'path', 's3_use_ssl': False}}
engine = sqlalchemy.create_engine('duckdb:///:memory:', connect_args=connect_args)
try:
with engine.connect() as conn:
conn.execute("INSTALL httpfs; LOAD httpfs;")
conn.execute("FORCE INSTALL spatial FROM 'http://nightly-extensions.duckdb.org'; LOAD spatial;")
conn.execute("SELECT * FROM st_read('http://hostname:port/bucket/file.xlsx', layer='sheet1')") # IO Error: Unknown file type
conn.execute("SELECT * FROM st_read('http://hostname:port/bucket/file.xlsx', layer='sheet1')") # IO Error: GDAL Error (1): GDALOpen() called on http://hostname:port/bucket/file.xlsx recursively
finally:
engine.dispose() |
Alright, thanks for the thorough investigation! In general you do/will need the httpfs extension afaik it doesnt autoload properly when spatial tries to issue a remote call, but that might also have changed. Ill look into this properly when im back from vacation in a couple of weeks! |
So im investigating this more now. My working theory is that the "called recursively" errors is due to us just being a little too aggressive when erroring and throwing exceptions when we can't determine the file type, which doesn't allow GDAL to do the proper cleaning up, which causes the "called recursively" errors. I think I have a fix for this as its pretty simple. What Im still trying to figure out is why you get the |
No, I am running Arch Linux with kernel version "Linux 6.6.10-arch1-1". |
Also, looking into this again just now, I still see the "unknown file type" error on This may be expected as you work on a resolution, but I just thought I'd point it out. E sqlalchemy.exc.OperationalError: (duckdb.duckdb.HTTPException) HTTP Error: Failed to download extension "spatial" at URL "http://nightly-extensions.duckdb.org/8793ff3a2e/linux_amd64_gcc4/spatial.duckdb_extension.gz"
E Extension "spatial" is an existing extension.
E
E Are you using a development build? In this case, extensions might not (yet) be uploaded.
E [SQL: FORCE INSTALL spatial FROM 'http://nightly-extensions.duckdb.org'; LOAD spatial;]
E (Background on this error at: https://sqlalche.me/e/14/e3q8) |
One more happier observation. The same set of steps actually worked without issue on Edit: it even works with the default version of |
Hmm, I think maybe you needed to reload to have the new extension apply. Nevertheless I've fixed the issue with getting stuck on the recursive errors in #227, so if this doesn't occur again I think we can close this as fixed. |
Not sure if I totally understand what you mean by needing to reload, but I still see the same error about |
Below is a minimal repro of the issue I am seeing with the Python SQLAlchemy DBAPI when attempting to use
st_read
to load Excel files. In this case I'm using in-memory DuckDB, with a local SeaweedFS S3 store which runs as a webservice in a Docker container.What I am trying to illustrate here is that, if
st_read
fails to connect to the target host on the first attempt, all subsequent attempts will also fail even if the host becomes responsive and the file exists in the location specified. In other words, without simulating a network outage by manually bringing down my S3 service and bringing it back up after the first failed attempt, all of thesest_read
calls work fine.To me it's particularly problematic that this still occurs if I call
engine.dispose()
because in that case I expect a "fresh" connection without knowledge of the past failure. But even withoutengine.dispose()
I would preferst_read
to try again and not throw the "called recursively" error in order to handle (common) real-world cases where connectivity was temporarily lost and we want to try again.I'm seeing this on
duckdb
version 0.9.1, but have observed the same behavior on 0.9.3-dev1996.The text was updated successfully, but these errors were encountered: