Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQueryCreateExternalTableOperator doesn't trigger schema discovery #45512

Open
2 tasks done
EugeneYushin opened this issue Jan 9, 2025 · 3 comments
Open
2 tasks done
Labels
area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet provider:google Google (including GCP) related issues

Comments

@EugeneYushin
Copy link

EugeneYushin commented Jan 9, 2025

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

No response

Apache Airflow version

2.7.3

Operating System

Darwin Kernel Version 23.6.0

Deployment

Google Cloud Composer

Deployment details

No response

What happened

Using BigQueryCreateExternalTableOperator with autodetect=True doesn't trigger schema to be created in BQ:
image
Screenshot 2025-01-09 at 17 10 16
Screenshot 2025-01-09 at 17 14 58

What you think should happen instead

Running the similar command from CLI leads to correct schema discovery processing:

bq mk --table \
  --external_table_definition=@CSV=gs://.../test.csv \
 <dataset_name>.airflow_test_2
image

How to reproduce

Upload CSV file to GCS:

name,age
Bill,42
Sally,15

Run BigQueryCreateExternalTableOperator:

BigQueryCreateExternalTableOperator(
        task_id="bq_op",
        bucket="***",
        source_objects=["test.csv"],
        destination_project_dataset_table="***",
        source_format="CSV",
        autodetect=True,
        skip_leading_rows=1,
        location="us-east4",
    )

Anything else

Most probably switching from deprecated API bq_hook.create_external_table to create_empty_table led to that behavior
#24363

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@EugeneYushin EugeneYushin added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jan 9, 2025
Copy link

boring-cyborg bot commented Jan 9, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the provider:google Google (including GCP) related issues label Jan 9, 2025
@EugeneYushin
Copy link
Author

I was able to trace it to that line:

Providing empty schema_fields explicitly intervenes with BQ logic for schema auto-detection. Nominal check to pass it to BQ client if only this field is set should fix the issue.

@EugeneYushin
Copy link
Author

Although, it looks like the related part of the API is deprecated. Hence, it might not make much sense to fix it while it's up to be deleted soon. It will be a sole responsibility of the caller to provide valid table definition and the API will just pass it through to GCP:

warnings.warn(
"Passing table parameters via keywords arguments will be deprecated. "
"Please provide table definition using `table_resource` parameter.",
AirflowProviderDeprecationWarning,
stacklevel=2,
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet provider:google Google (including GCP) related issues
Projects
None yet
Development

No branches or pull requests

1 participant