You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SQLAlchemy reflection on an Iceberg table should not emit warnings.
partition should not be returned as an index on an Iceberg table.
Actual behavior
When performing reflection on an Iceberg table, SQLAlchemy's Inspector._reflect_indexes() method displays a series of warnings, one for each of the columns in the Iceberg table's $partitions metadata table:
/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'partition' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'record_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'file_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'total_size' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'data' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
After reflection, the table object for the Iceberg table contains an index named partition with no columns.
The above problems are caused by _get_partitions() (called by TrinoDialect.get_indexes()), which assumes that, if SELECT * FROM {schema}."{table_name}$partitions" succeeds, then the table is a Hive table.
This assumption is not correct - the query also succeeds for Iceberg tables, but the returned columns are not indexes on the Iceberg table. The client should probably query system.metadata.catalogs to get the connector_name for the catalog and check that it is "hive".
Steps To Reproduce
The following test app causes the warnings to be shown. pip install trino[sqlalchemy] before running it.
fromsqlalchemyimportcreate_engine, MetaData# "iceberg" is an Iceberg catalog containing a single namespace, "drivestats",# which contains a single table, "drivestats".engine=create_engine('trino://admin@localhost:8080/iceberg/drivestats')
metadata=MetaData()
print('Calling reflect()')
metadata.reflect(engine)
print('\nListing tables:')
# What tables can we see?fortinmetadata.sorted_tables:
print(f'Table name: {t.name}')
# What indexes are on the table?foridxinmetadata.tables[t.name].indexes:
print(f'\tIndex name: {idx.name}')
# What are the index columns?forcinidx.columns:
print(f'\t\tColumn name: {c}')
[Note that the warnings are sent to stderr, so they typically appear out of sequence with the app output, which is sent to stdout.]
Calling reflect()
Listing tables:
Table name: drivestats
Index name: partition
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'partition' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'record_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'file_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'total_size' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'data' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
Log output
n/a
Operating System
RHEL 9.5 (Plow) in Docker
Trino Python client version
0.332.0
Trino Server version
467
Python version
Python 3.12.2
Are you willing to submit PR?
Yes I am willing to submit a PR!
The text was updated successfully, but these errors were encountered:
Expected behavior
partition
should not be returned as an index on an Iceberg table.Actual behavior
When performing reflection on an Iceberg table, SQLAlchemy's
Inspector._reflect_indexes()
method displays a series of warnings, one for each of the columns in the Iceberg table's$partitions
metadata table:After reflection, the table object for the Iceberg table contains an index named
partition
with no columns.The above problems are caused by
_get_partitions()
(called byTrinoDialect.get_indexes()
), which assumes that, ifSELECT * FROM {schema}."{table_name}$partitions"
succeeds, then the table is a Hive table.This assumption is not correct - the query also succeeds for Iceberg tables, but the returned columns are not indexes on the Iceberg table. The client should probably query
system.metadata.catalogs
to get theconnector_name
for the catalog and check that it is"hive"
.Steps To Reproduce
The following test app causes the warnings to be shown.
pip install trino[sqlalchemy]
before running it.Expected output:
Current output:
[Note that the warnings are sent to
stderr
, so they typically appear out of sequence with the app output, which is sent tostdout
.]Log output
n/a
Operating System
RHEL 9.5 (Plow) in Docker
Trino Python client version
0.332.0
Trino Server version
467
Python version
Python 3.12.2
Are you willing to submit PR?
The text was updated successfully, but these errors were encountered: