[WIP][need discussion] Pydantic Transformer guess python type #3060

Open: Future-Outlier wants to merge 1 commit into master

Future-Outlier (Member) commented Jan 15, 2025

Need Discussion

I ran into a problem while adding guess_python_type support for Pydantic BaseModel, which has to turn a JSON schema back into a Python class (JSON schema -> Python class).
A JSON schema can be converted into either a dataclass or a Pydantic BaseModel, so the literal type's metadata needs an additional entry that records which of the two produced the schema.
With that entry we know whether to generate a Pydantic BaseModel or a dataclass from the JSON schema.
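To make the ambiguity concrete, here is a minimal sketch of how such a marker could be read back. The key name `python_model_kind` and the helper `pick_model_kind` are hypothetical, chosen only for illustration; they are not flytekit's actual metadata layout. The sketch assumes that a missing marker means "dataclass", so that schemas written before the marker existed keep their current behavior.

```python
from typing import Any, Dict

# Hypothetical metadata key; NOT the name flytekit actually uses.
MODEL_KIND_KEY = "python_model_kind"
KNOWN_KINDS = ("dataclass", "pydantic")


def pick_model_kind(metadata: Dict[str, Any]) -> str:
    """Decide which Python class family to rebuild from a JSON schema.

    Assumption: metadata without the marker is treated as a dataclass schema.
    """
    kind = metadata.get(MODEL_KIND_KEY, "dataclass")
    if kind not in KNOWN_KINDS:
        raise ValueError(f"Unknown model kind in metadata: {kind!r}")
    return kind


# A schema serialized from a BaseModel would carry the marker next to it.
pydantic_metadata = {"title": "EnrichmentWorkflowInput", MODEL_KIND_KEY: "pydantic"}
legacy_metadata = {"title": "SomeDataclass"}

print(pick_model_kind(pydantic_metadata))
print(pick_model_kind(legacy_metadata))
```

Whether the default should be "dataclass" or an explicit error is one of the points that would need discussion.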

Tracking issue

flyteorg/flyte#5318

Related issue

pydantic/pydantic#643
flyteorg/flyte#6081

Why are the changes needed?

People want to use the remote API (FlyteRemote) to execute workflows that take a Pydantic BaseModel as input.
This works on both the single binary and a Union cluster.

from enum import Enum
from typing import Optional
from pydantic import BaseModel
from flytekit import task, workflow, ImageSpec

image = ImageSpec(
    name="glossai-flyte-example",
    packages=["pydantic > 2"],
    registry="localhost:30000",
)

# Define the Enum
class ContributionScoreType(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    PARTIAL = "partial"

# Define the nested Pydantic models
class EnrichmentData(BaseModel):
    title: str
    score: float
    is_valid: bool
    count: int
    metadata: dict = {}

class TransporterInput(BaseModel):
    source: str
    destination: str
    batch_size: int = 100
    retry_count: int = 3
    timeout: float = 30.0
    enabled: bool = True

# Define the main input model
class EnrichmentWorkflowInput(BaseModel):
    post_id: str = "123"
    job_id: str = "456"
    enrichment: EnrichmentData = EnrichmentData(
        title="Test Title",
        score=0.9,
        is_valid=True,
        count=100,
        metadata={"key": "value"}
    )
    transporter: Optional[TransporterInput] = None
    contribution_score_type: ContributionScoreType = ContributionScoreType.DISABLED

# ==================

from flytekit.remote.remote import FlyteRemote
from flytekit.configuration import Config



remote = FlyteRemote(
    Config.for_endpoint("localhost:30080", True),
)
# i = remote.get("flyte://v1/flytesnacks/development/ab5nmkg6fhxjvmprs9gv/n0/i")

previous_execution = remote.fetch_execution(
    project="flytesnacks",
    domain="development",
    name="ab5nmkg6fhxjvmprs9gv")
input_data = previous_execution.inputs
print("previous_execution:", previous_execution)

# Named remote_workflow so it does not shadow the `workflow` decorator imported above
remote_workflow = remote.fetch_workflow(
    project="flytesnacks",
    domain="development",
    name="flyte_example.enrichment_workflow",
    version="oSqsuSJKfiZuSRduQ7ODGw",
)

# Execute the fetched workflow with a default-constructed input model
new_execution = remote.execute(
    remote_workflow,
    inputs={"input_data": EnrichmentWorkflowInput()},
    project="flytesnacks",  # Use the same project as the original execution
    domain="development",   # Use the same domain as the original execution
    name="new_execution",
    version="oSqsuSJKfiZuSRduQ7ODGw",
    wait=True  # Set to True if you want to wait for execution to complete
)

print(f"New execution launched: {new_execution.id.name}")

What changes were proposed in this pull request?

Implement an imperfect, best-effort guess_python_type in the Pydantic BaseModel transformer.

How was this patch tested?

Integration tests and remote execution.
It is hard to cover with unit tests because many of the guessed types end up as Any.
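The general idea of guessing a Python class from a JSON schema can be sketched as follows. This is not the code in this PR: the helper names (`guess_fields`, `build_dataclass`) and the type table are assumptions made for illustration, and the sketch builds a dataclass with the standard library so that it runs without extra dependencies. For the BaseModel case, the same (name, type, default) triples could presumably be passed to `pydantic.create_model` instead. Anything the table does not recognize (nested `$ref`, `anyOf`, enums) falls back to `Any`, which is also why such a guess is hard to unit test precisely.

```python
import copy
from dataclasses import field, make_dataclass
from typing import Any, Dict, List, Tuple

# Assumed mapping from JSON schema primitive names to Python types.
JSON_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "object": dict,
    "array": list,
}

_MISSING = object()  # sentinel: the schema declares no default


def guess_fields(schema: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Turn schema["properties"] into (name, type, default) triples."""
    guessed = []
    for name, prop in schema.get("properties", {}).items():
        json_type = prop.get("type")
        # Only plain string type names are looked up; everything else is Any.
        py_type = JSON_TO_PYTHON.get(json_type, Any) if isinstance(json_type, str) else Any
        guessed.append((name, py_type, prop.get("default", _MISSING)))
    return guessed


def build_dataclass(schema: Dict[str, Any]) -> type:
    """Create a dataclass whose fields mirror the guessed schema fields."""
    specs = []
    for name, py_type, default in guess_fields(schema):
        if default is _MISSING:
            specs.append((name, py_type))
        else:
            # default_factory avoids sharing one mutable default between instances.
            specs.append((name, py_type, field(default_factory=lambda d=default: copy.deepcopy(d))))
    # Dataclasses require fields without defaults to come first.
    specs.sort(key=len)
    return make_dataclass(schema.get("title", "GuessedModel"), specs)


schema = {
    "title": "TransporterInput",
    "type": "object",
    "properties": {
        "source": {"type": "string"},
        "batch_size": {"type": "integer", "default": 100},
        "destination": {"type": "string"},
        "extra": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None},
    },
}

Guessed = build_dataclass(schema)
instance = Guessed(source="s3://in", destination="s3://out")
print(instance)
```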

Setup process

Screenshots

single binary

[2 screenshots]

Union cluster

[2 screenshots]

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

flyte-bot (Contributor) commented Jan 15, 2025

Code Review Agent Run #b2a53c

Actionable Suggestions - 2
  • flytekit/extras/pydantic_transformer/transformer.py - 2
Review Details
  • Files reviewed - 1 · Commit Range: 39a4f3e..39a4f3e
    • flytekit/extras/pydantic_transformer/transformer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot (Contributor)

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - Enhanced Pydantic Type Transformation
Files Impacted: transformer.py - Added functionality to reconstruct Pydantic models from JSON schema metadata

Comment on lines +128 to +129
    except Exception as e:
        raise TypeTransformerFailedError(f"Failed to create Pydantic model from schema: {e}")

Improve exception handling patterns

Exception handling can be improved by using 'raise ... from err' pattern and avoiding f-strings in exceptions.

Code suggestion
Check the AI-generated fix before applying
Suggested change
-    except Exception as e:
-        raise TypeTransformerFailedError(f"Failed to create Pydantic model from schema: {e}")
+    except Exception as err:
+        error_msg = f"Failed to create Pydantic model from schema: {err}"
+        raise TypeTransformerFailedError(error_msg) from err

Code Review Run #b2a53c
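For readers unfamiliar with the pattern, the effect of `raise ... from err` can be shown in isolation. In this sketch `TypeTransformerFailedError` is a local stand-in for flytekit's class of the same name, and `build_model` is a hypothetical placeholder for the real model construction; only the chaining behavior is the point.

```python
class TypeTransformerFailedError(Exception):
    """Local stand-in for flytekit's error class of the same name."""


def build_model(schema: dict) -> type:
    """Hypothetical placeholder for creating a model class from a schema."""
    try:
        # A schema without "title" raises KeyError and takes the failure path.
        return type(schema["title"], (), {})
    except Exception as err:
        error_msg = f"Failed to create Pydantic model from schema: {err}"
        # "from err" stores the original exception on __cause__, so the
        # traceback shows both the KeyError and the wrapping error.
        raise TypeTransformerFailedError(error_msg) from err


try:
    build_model({})
except TypeTransformerFailedError as exc:
    caught = exc
    print(repr(caught.__cause__))
```

Without `from err`, Python would still attach the original exception as `__context__`, but the traceback would describe it as raised "during handling" rather than as the direct cause.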



        json_type = json_type[0]
    else:
        # More complex unions can be handled here if needed
        json_type = "string"  # default fallback

Consider robust union type handling

Consider implementing more robust handling of complex union types instead of defaulting to string. The current implementation may lead to data loss or incorrect type conversions for union types with multiple variants.

Code suggestion
Check the AI-generated fix before applying
Suggested change
-        json_type = "string"  # default fallback
+        variants = [type_mapping.get(t, Any) for t in json_type]
+        if len(variants) > 1:
+            return Union[tuple(variants)]  # type: ignore
+        return variants[0]

Code Review Run #b2a53c
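As a standalone illustration of the suggestion, the sketch below resolves a JSON schema `type` value that may be either a single name or a list of names. The table `TYPE_MAPPING` and the function `resolve_json_type` are assumptions for this example and do not mirror the names or control flow in transformer.py. Mapping `"null"` to `type(None)` means a list such as `["string", "null"]` comes out as `Optional[str]` instead of falling back to `str`.

```python
from typing import Any, List, Optional, Union

# Assumed table of JSON schema type names; "null" maps to NoneType.
TYPE_MAPPING = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def resolve_json_type(json_type: Union[str, List[str]]) -> Any:
    """Map a JSON schema "type" value to a Python type annotation."""
    if isinstance(json_type, str):
        return TYPE_MAPPING.get(json_type, Any)
    variants = [TYPE_MAPPING.get(t, Any) for t in json_type]
    if not variants:
        return Any  # empty list: nothing to go on
    if len(variants) == 1:
        return variants[0]
    # Subscripting Union with a tuple builds Union[v1, v2, ...] at runtime.
    return Union[tuple(variants)]


print(resolve_json_type("boolean"))
print(resolve_json_type(["string", "null"]))
print(resolve_json_type(["integer", "string"]))
```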



codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 10.20408% with 44 lines in your changes missing coverage. Please review.

Project coverage is 36.01%. Comparing base (a2bcf95) to head (39a4f3e).
Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines
...lytekit/extras/pydantic_transformer/transformer.py | 10.20% | 44 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (a2bcf95) and HEAD (39a4f3e).

HEAD has 2 uploads less than BASE
Flag | BASE (a2bcf95) | HEAD (39a4f3e)
     | 3 | 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3060       +/-   ##
===========================================
- Coverage   72.87%   36.01%   -36.86%     
===========================================
  Files         205      202        -3     
  Lines       21553    21414      -139     
  Branches     2746     2752        +6     
===========================================
- Hits        15707     7713     -7994     
- Misses       5062    13593     +8531     
+ Partials      784      108      -676     


@Future-Outlier Future-Outlier changed the title [WIP] Pydantic Transformer guess python type [WIP][need discussion] Pydantic Transformer guess python type Jan 15, 2025