[BUG] st_read fails to read Excel file from an http URL #181

Closed
tboddyspargo opened this issue Nov 16, 2023 · 5 comments · Fixed by #187

Comments

tboddyspargo commented Nov 16, 2023

In duckdb v0.9.2, the following code fails (where it succeeded in 0.9.1):

import duckdb

conn = duckdb.connect(database=':memory:')
conn.execute('SET s3_endpoint="localhost:8333";')
conn.execute('SET s3_url_style="path";')
conn.execute('SET s3_use_ssl=false;')
conn.execute('INSTALL spatial; LOAD spatial;')
r = conn.execute(
    "SELECT * FROM st_read('http://localhost:8333/appdata/global/sba_sample.xlsx', layer='sba_sample')"
)

print(r.fetchdf())

Here, http://localhost:8333 is a local SeaweedFS container that exposes an S3-compatible API (and sba_sample.xlsx exists at that path).

It fails with this exception:

Traceback (most recent call last):
  File "tmp/excel_example.py", line 7, in <module>
    r = conn.execute(
duckdb.duckdb.IOException: IO Error: Unknown file type

tboddyspargo changed the title from "[BUG] st_read fails to read Excel file from URL" to "[BUG] st_read fails to read Excel file from an http URL" on Nov 16, 2023

@tboddyspargo
Author

I'll note that this was working in duckdb 0.9.1, and it has been our workaround for the other failure mentioned in #182.

Thanks in advance for your help!

@tboddyspargo
Author

I suspect this may have something to do with #168, given that the error message matches.

@Maxxen
Member

Maxxen commented Nov 22, 2023

Hi! I think this has been fixed in #187.

Note: for XLSX in particular, using DuckDB's filesystem with httpfs has a slower initial load time than /vsis3/ or /vsicurl/. I'm aware of the issue and will look into improving it in the future.

In a couple of hours (when the CI finishes) you should be able to install the extension with this fix for 0.9.2 by running:

FORCE INSTALL spatial FROM 'http://nightly-extensions.duckdb.org';
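
From Python, the reinstall and retry could be sketched as follows. The repository URL is the one quoted above; the helper names are hypothetical, and the thread does not say whether any extra settings are needed to load a nightly build. If spatial is already loaded in the session, a fresh connection is probably needed before the new build takes effect.

```python
NIGHTLY_REPO = "http://nightly-extensions.duckdb.org"  # URL quoted in the comment above


def nightly_spatial_statements(repo: str = NIGHTLY_REPO) -> list[str]:
    """Statements that replace the installed spatial extension and load it."""
    return [f"FORCE INSTALL spatial FROM '{repo}';", "LOAD spatial;"]


def reinstall_and_retry(conn, query: str):
    """Reinstall spatial from the nightly repo, then re-run `query`.

    `conn` is assumed to be a fresh duckdb connection, for example
    duckdb.connect(database=':memory:').
    """
    for statement in nightly_spatial_statements():
        conn.execute(statement)
    return conn.execute(query)
```

Passing the failing st_read query from the report as `query` re-runs the original reproduction against the reinstalled extension.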

@tboddyspargo
Author

Thanks @Maxxen. Can the fix be made available in the default 0.9.2 spatial extension or in a 0.9.3 fix release?

@Maxxen
Member

Maxxen commented Nov 22, 2023

We may eventually replace the default 0.9.2 install version with this fixed version, but I can't give a timeline. If more issues occur it may be delayed further.
