GenerateContentConfig cannot create response schemas from nested Pydantic BaseModel classes #60

Open
seb-andr345 opened this issue Dec 30, 2024 · 2 comments
Assignees
Labels
priority: p0 Highest priority. Critical issue. P0 implies highest priority. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@seb-andr345

Environment details

  • Programming language: Python
  • OS: Ubuntu 22.04.5 LTS
  • Language runtime version: 3.12.7
  • Package version: 0.3.0

Steps to reproduce

Set your project id in the example below, and run.

Actual vs expected behavior:

  • Actual: GenerateContentConfig(response_schema=...) fails when the response schema contains nested BaseModel classes.
  • Expected: Passing nested BaseModel classes should work as genai already calls BaseModel.model_json_schema() under the hood.

Description

Passing a Pydantic BaseModel class that has other classes nested inside it as the response_schema argument of GenerateContentConfig() fails. However, it can easily be made to work.

I tested this with genai v0.3.0

The root cause is that the function t_schema() in google/genai/_transformers.py does not properly handle the output of BaseModel.model_json_schema() when there are nested classes. See here:

schema = process_schema(origin.model_json_schema())

In this case, BaseModel.model_json_schema() places definitions of inner classes in a "$defs" section of the schema by default. They are then pointed to with "$ref" as needed elsewhere. (This behavior can be customized. See pydantic.json_schema)
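To make the "$defs" / "$ref" layout concrete, here is a minimal sketch that walks a schema and lists the references it contains. The `schema` dict is hand-written in the shape Pydantic v2 emits for nested models (abbreviated, not captured output), and `find_refs` is a hypothetical helper, not part of pydantic or genai.

```python
# Hand-written schema in the shape Pydantic v2 emits for nested models
# (abbreviated; an assumption for illustration, not captured output).
schema = {
    "$defs": {
        "Grade": {"enum": ["a+", "a", "b"], "title": "Grade", "type": "string"},
        "Recipe": {
            "properties": {
                "recipe_name": {"title": "Recipe Name", "type": "string"},
                "grade": {"$ref": "#/$defs/Grade"},
            },
            "required": ["recipe_name", "grade"],
            "title": "Recipe",
            "type": "object",
        },
    },
    "properties": {
        "recipes": {
            "items": {"$ref": "#/$defs/Recipe"},
            "maxItems": 10,
            "title": "Recipes",
            "type": "array",
        }
    },
    "required": ["recipes"],
    "title": "RecipeList",
    "type": "object",
}


def find_refs(node):
    """Collect every "$ref" pointer found anywhere in a schema fragment."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                found.add(value)
            else:
                found |= find_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_refs(item)
    return found


print(sorted(find_refs(schema)))  # ['#/$defs/Grade', '#/$defs/Recipe']
```

An empty result from such a check would indicate that a schema is already self-contained.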

A working solution is to replace these references with the actual definitions:

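from pydantic import BaseModel


def get_schema(cls: type[BaseModel]) -> dict:
    """Return the JSON schema of cls with "$ref" pointers replaced by their definitions."""

    schema = cls.model_json_schema()
    if "$defs" not in schema:
        return schema  # no nested classes, nothing to inline

    defs = schema.pop("$defs")

    def _resolve(schema):
        # Swap a "$ref" pointer for the definition it names.
        if "$ref" in schema:
            ref = schema.pop("$ref")
            schema.update(defs[ref.split("/")[-1]])
        # Recurse into object properties and array items.
        if "properties" in schema:
            for prop in schema["properties"].values():
                _resolve(prop)
        if "items" in schema:
            _resolve(schema["items"])

    _resolve(schema)
    return schema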
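The workaround above only follows "properties" and "items". A more general variant might walk every nested dict and list, so that references inside "anyOf" (which is where Pydantic v2 puts Optional fields) are inlined as well. The sketch below is one possible approach, not the library's behavior: `inline_refs` is a hypothetical name, it works on a plain schema dict, and it assumes all references are local "#/$defs/..." pointers and that no property is itself named "$ref".

```python
import copy


def inline_refs(schema):
    """Return a copy of `schema` with local "#/$defs/..." references inlined."""
    schema = copy.deepcopy(schema)  # leave the caller's dict untouched
    defs = schema.pop("$defs", {})

    def resolve(node, active):
        if isinstance(node, list):
            return [resolve(item, active) for item in node]
        if not isinstance(node, dict):
            return node
        if "$ref" in node:
            name = node["$ref"].split("/")[-1]
            if name in active:
                # A self-referencing model has no finite inlined form.
                raise ValueError(f"recursive reference to {name!r} cannot be inlined")
            # Keys written next to "$ref" override the referenced definition.
            siblings = {k: v for k, v in node.items() if k != "$ref"}
            merged = {**copy.deepcopy(defs[name]), **siblings}
            return resolve(merged, active | {name})
        return {key: resolve(value, active) for key, value in node.items()}

    return resolve(schema, frozenset())
```

Usage would mirror the workaround, e.g. `inline_refs(RecipeList.model_json_schema())`. Whether the API accepts every keyword that survives inlining (such as "anyOf" or "title") is not established here.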

Example

This example demonstrates the problem and the workaround. Set the project id before trying it.

import json
from enum import Enum

import yaml
from google import genai
from google.genai.types import GenerateContentConfig
from pydantic import BaseModel, Field, TypeAdapter

# Class definitions ###

class Grade(Enum):
    A_PLUS = "a+"
    A = "a"
    B = "b"
    C = "c"
    D = "d"
    F = "f"

class Recipe(BaseModel):
    recipe_name: str
    grade: Grade

class RecipeList(BaseModel):
    recipes: list[Recipe] = Field(..., max_length=10)

# Equivalent yaml schema ###

recipe_schema = \
"""type: OBJECT
properties:
  recipes:
    type: ARRAY
    items:
      type: OBJECT
      properties:
        recipe_name:
          type: STRING
        grade:
          title: Grade
          type: STRING
          enum:
            - a+
            - a
            - b
            - c
            - d
            - f
      required:
        - recipe_name
        - grade
    maxItems: 10
required:
  - recipes
"""

# Code ###

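def get_schema(cls: type[BaseModel]) -> dict:
    """Return the JSON schema of cls with "$ref" pointers replaced by their definitions."""

    schema = cls.model_json_schema()
    if "$defs" not in schema:
        return schema  # no nested classes, nothing to inline

    defs = schema.pop("$defs")

    def _resolve(schema):
        # Swap a "$ref" pointer for the definition it names.
        if "$ref" in schema:
            ref = schema.pop("$ref")
            schema.update(defs[ref.split("/")[-1]])
        # Recurse into object properties and array items.
        if "properties" in schema:
            for prop in schema["properties"].values():
                _resolve(prop)
        if "items" in schema:
            _resolve(schema["items"])

    _resolve(schema)
    return schema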


def query_llm(project, contents, response_schema):

    client = genai.Client(vertexai=True, project=project, location="us-central1")
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents=contents,
        config=GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=response_schema,
            )
        )
    obj = TypeAdapter(RecipeList).validate_python(json.loads(response.text))

    return obj


if __name__ == "__main__":

    contents = "List about 10 cookie recipes, grade them based on popularity"
    project = "YOUR_PROJECT_ID"  # <- replace this

    # Variant 1: This works as expected
    # recipes = query_llm(project, contents, yaml.safe_load(recipe_schema))

    # Variant 2: This does not work, but probably should
    recipes = query_llm(project, contents, RecipeList)

    # Variant 3: The workaround
    # recipes = query_llm(project, contents, get_schema(RecipeList))

    print(recipes)
@seb-andr345 seb-andr345 added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Dec 30, 2024
@Ben-Epstein

Same issue for me, get_schema is working thanks!

@Giom-V

Giom-V commented Jan 8, 2025

There's an issue with the list support with JSON mode. We're working on it.

@sasha-gitg sasha-gitg added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p0 Highest priority. Critical issue. P0 implies highest priority. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jan 8, 2025