
Connie Leung for Google Developer Experts

Originally published at blueskyconnie.com

[GDE] Fetching Live Sports Data with Gemini 3: A Guide to Grounded, Structured JSON

Retrieving accurate, up-to-date sports statistics with LLMs is notoriously difficult due to hallucinations and outdated training data. This post explores extracting Premier League 2025/2026 player statistics using the Gemini 3 Flash Preview model, URL Context, Grounding with Google Search, and structured output. I will describe the lessons learned, how I extracted data from the official Premier League player profile pages when available, and how I used Google Search to find data that those pages do not provide.


Introduction

The Gemini 3 Flash Preview model supports structured output, URL Context, and Grounding with Google Search. When the response MIME type is application/json and the response JSON schema is provided, Gemini 3 is remarkably consistent at returning valid JSON objects. Moreover, the Pydantic library has functions to convert a generic JSON object to a Pydantic model.

The post walks through the code in the Colab notebook and demonstrates how to engineer a prompt that uses the provided URLs as the primary source. When the URL is invalid or the data is missing (e.g., net worth, preferred foot, and height), the model uses the Google Search tool. When an answer is generated for each field, I apply grounding to verify that the model does not hallucinate and correctly cites reputable sources.


Prerequisites

To run the provided Colab notebook, you will need:

  • Vertex AI in Express Mode: I utilize Gemini via Vertex AI due to regional availability (Hong Kong), but these features function identically in the public Gemini API. If you want to use Gemini in Vertex AI for free, you can sign up for Vertex AI in Express Mode using your Gmail account.
  • Google Cloud API Key: Ensure the API key is properly configured within your environment.
  • Google Colab VS Code Extension: Visit the Visual Studio Marketplace and install the extension to run the demo in the VS Code environment.
  • Python: Have Python 3.12+ and the google-genai SDK installed.

Demo Overview

We will see how to force the model to prioritize retrieval over generation. The demo attempts to retrieve player statistics for the Premier League 2025/2026 season.

The Gemini 3 Flash Preview model is given some pages:

url_list = [
    "https://www.premierleague.com/en/players/141746/bruno-fernandes/stats",
    "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "https://www.premierleague.com/en/players/97032/virgil-van-dijk/stats",
    "https://www.premierleague.com/en/players/244851/cole-palmer/stats"
]

urls = "\n".join(url_list)

When the requested player is Bruno Fernandes, Erling Haaland, Virgil van Dijk, or Cole Palmer, the model executes the URL Context tool to extract data from the URLs above. Otherwise, the model executes the Google Search tool to find the facts from reputable sources.

The system performs name verification before data retrieval. If the subject is not a player or not a Premier League player in the 2025/2026 season, the structured output assigns false to is_professional_player and an explanation to verification_status. When it is a Premier League player, the prompt instructs the model to return a JSON object containing the name, net_worth, is_professional_player, verification_status, height, shirt_number, preferred_foot, goals, goal_assists, appearances, and minutes_played. Then, the notebook converts the generic JSON object to a Pydantic PlayerStats model and displays the value, source quotation, uri, and source type of each field so that accuracy can be checked.


Architecture

High level architecture of Retrieving Premier League Player Stats with URL Context and Google Search Tools

The URL Context and Google Search tools enable the Gemini 3 Flash Preview model to extract the performance metrics for the Premier League 2025/2026 season from the Premier League player profile pages, while retrieving personal information, such as net worth, preferred foot, and height from the web. Even though the model's internal data is outdated, it can access an external system to obtain up-to-date information.


Environment Variable

Copy .env.example to .env and replace the GOOGLE_CLOUD_API_KEY with your API key.

GOOGLE_CLOUD_API_KEY=<GOOGLE CLOUD API KEY>

The create_vertexai_client function constructs and returns a Gemini client. I use the client to call the Gemini 3 Flash Preview model to retrieve data from the provided URLs, and fall back to the Google Search tool when no URL is provided or the page does not contain the requested details.

The create_vertexai_client result is assigned to a global client variable.

genai is a unified SDK for the Gemini API and Gemini in Vertex AI. The vertexai=True parameter tells the client to use Gemini in Vertex AI in the Colab notebook.

from google import genai
from dotenv import load_dotenv
from google.genai import types
from pydantic import BaseModel, Field
from typing import Literal
import os

def create_vertexai_client():

    cloud_api_key = os.getenv("GOOGLE_CLOUD_API_KEY")
    if not cloud_api_key:
        raise ValueError("GOOGLE_CLOUD_API_KEY not found in .env file")

    # Configure the client with your API key
    client = genai.Client(
        vertexai=True, 
        api_key=cloud_api_key, 
    )

    return client

# Load the variables from .env before creating the global client
load_dotenv()

client = create_vertexai_client()

Installation

The demo uses the newer Google Gen AI SDK, so I install the google-genai library.

%pip install google-genai
%pip install python-dotenv
%pip install pydantic

Lessons Learned

I will describe the challenges faced in this demo and how I tackled each one.

1. Lazy Behavior of Gemini 3 Flash Preview Model

Problem:

Gemini prioritized its internal training data (or generic search) over the specific url_context tool unless explicitly constrained. It relied on Google Search or internal knowledge to find facts for each field instead of using the given Premier League player profile pages.

Solution:

I added a DYNAMIC SOURCE IDENTIFICATION section to the prompt so that the model always executes the URL Context tool when a player's profile page is available. When a player's profile page is not available, the model uses the Grounding with Google Search tool to retrieve information for each field in the JSON object. Finally, I stated the priority order explicitly: the official URL first, then web citations, then internal training data.

### **1. DYNAMIC SOURCE IDENTIFICATION**
1.  **IF a Premier League URL is provided:**
    *   You **MUST** execute the `url_context` tool first. This is your **Primary Source**.
2.  **IF NO URL is provided (or if the player is non-Premier League):**
    *   The **Web Citations** (Google Search results) become your **Primary Source**. 
3.  **PRIORITY:** Official URL > Web Citations > Internal Training Data.

2. Chunk ID Hallucination and Low Confidence in Grounding Metadata

Problem:

I wanted to verify that the AI did not hallucinate and assign incorrect values to the fields in the JSON object. Therefore, I declared a Pydantic model, PlayerField, to store the id of the chunk that provided the answer for the field.

class PlayerField(BaseModel):
    value: str | int | float
    chunk_id: int = Field(..., description="Index of the chunk in the grounding metadata that provides the value")

class PlayerStats(BaseModel):
    name: str
    net_worth: PlayerField | None = Field(..., description="Net worth of the Premier League player.")
    is_professional_player: bool = Field(..., description="Must be True if found in Premier League records, False otherwise")
    verification_status: str = Field(..., description="Explanation of where the data was found or why it failed")
    height: PlayerField
    shirt_number: PlayerField
    preferred_foot: PlayerField
    goals: PlayerField
    goal_assists: PlayerField
    appearances: PlayerField
    minutes_played: PlayerField

Moreover, I wrote functions to obtain the web and retrieved context URIs from the grounding metadata to verify the Chunk ID.

def get_citations(response: types.GenerateContentResponse) -> tuple[int, list[dict]]:

    citations: list[dict] = []
    num_chunks = 0  # stays 0 when the response has no grounding metadata
    if response.candidates is not None and len(response.candidates) > 0:
        candidate = response.candidates[0]
        if candidate.grounding_metadata:
            grounding_chunks = candidate.grounding_metadata.grounding_chunks or []
            num_chunks = len(grounding_chunks)
            for i, chunk in enumerate(grounding_chunks):
                if chunk and chunk.web and chunk.web.uri:
                    citations.append({
                        "chunk_id": i,
                        "uri": chunk.web.uri,
                        "title": chunk.web.title
                    })
                elif chunk and chunk.retrieved_context and chunk.retrieved_context.uri:
                    citations.append({
                        "chunk_id": i,
                        "uri": chunk.retrieved_context.uri,
                        "title": chunk.retrieved_context.text
                    })

    return num_chunks, citations

def print_citations_by_response(response: types.GenerateContentResponse):
    num_chunks, citations = get_citations(response)

    print("num_chunks ->", num_chunks)
    for i, citation in enumerate(citations):
        print(f"Citation {i}: {citation}")

The chunk_id in the PlayerField was sometimes outside the valid range of 0 to num_chunks - 1, which indicated that the model hallucinated a chunk_id that did not exist in the grounding metadata.

Bad Chunk ID

Number of chunks: 5. Chunk IDs range from 0 to 4.

{
  "value":  "Right",
  "chunk_id": 10    (Incorrect chunk id)
}

Good Chunk ID

Number of chunks: 5. Chunk IDs range from 0 to 4.

{
  "value":  "Right",
  "chunk_id": 2    (Correct chunk id)
}
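
The range check behind these two examples can be sketched in a few lines. The helper names `is_valid_chunk_id` and `find_bad_fields`, and the sample field values, are hypothetical and are not part of the notebook; the sketch assumes each field is a plain dict with a `chunk_id` key.

```python
def is_valid_chunk_id(chunk_id: int, num_chunks: int) -> bool:
    """A chunk id is valid only if it indexes an existing grounding chunk."""
    return 0 <= chunk_id < num_chunks


def find_bad_fields(fields: dict[str, dict], num_chunks: int) -> list[str]:
    """Return the names of the fields whose chunk_id is out of range."""
    return [
        name
        for name, field in fields.items()
        if not is_valid_chunk_id(field["chunk_id"], num_chunks)
    ]


# Illustrative values only: five chunks, so valid ids are 0 to 4.
fields = {
    "preferred_foot": {"value": "Right", "chunk_id": 10},
    "shirt_number": {"value": 8, "chunk_id": 2},
}
print(find_bad_fields(fields, num_chunks=5))  # ['preferred_foot']
```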

Solution:

I considered the grounding metadata to be an unreliable source of truth, and did not use it to verify the chunk id. I modified the PlayerField model, removed the chunk_id field, and added the source_quote, uri, and source_type fields.

This architectural shift involved moving from relying on grounding metadata to asking the model to generate the citation inline within the uri field.

class PlayerField(BaseModel):
    value: str | int | float
    source_quote: str
    uri: str | None = Field(None, description="The EXACT, UNEDITED URL provided by the tool. Do not guess or shorten. None if the source is internal training data")
    # None is allowed so that a response with a null source_type can still be validated
    source_type: Literal["GOOGLE_SEARCH", "URL_CONTEXT", "INTERNAL_KNOWLEDGE"] | None = Field(None, description="Categorize the source of this specific field. Must not be None when uri is not null.")

This forces Gemini to read the pages and cite specific information rather than hallucinating an answer for each field. The uri field stores either one of the provided URLs or the URL of a Google Search result. If Gemini answers from its internal knowledge, the uri field is None. When uri is None, the source_type must be INTERNAL_KNOWLEDGE. When uri contains vertexaisearch, the source_type is GOOGLE_SEARCH. Otherwise, the source_type is URL_CONTEXT.

3. Broken or Invalid URIs in the JSON object

Problem:

The uri in the JSON object was often a vertexaisearch proxy link or a hallucinated string, rather than the direct source URL. When uri is neither None nor one of the provided Premier League player profile pages, it is a Google Search result. Since I am using Gemini in Vertex AI, the base URL for search results is https://vertexaisearch.cloud.google.com. If the URI has a different base URL, it is very likely broken or invalid.

Solution:

I added a URI EXTRACTION RULES (STRICT) section to specify the rules to determine the uri field in the PlayerField model.

### **4. URI EXTRACTION RULES (STRICT):**
1.  **NO GUESSING:** You are strictly forbidden from constructing, autocompleting, or guessing a URL based on the website name. 
2.  **LITERAL COPY:** You must copy the `uri` exactly as it appears in the search result that provided the `source_quote`. 
3.  **THE JOIN RULE:** Before finalizing the JSON, verify that the `source_quote` actually appears in the content/snippet associated with the `uri` you provided.
4.  **IF IN DOUBT:** If you found a fact in your training data but cannot find a specific, working URI for it in the search results, you MUST set `source_type` to `INTERNAL_KNOWLEDGE` and `uri` to `null`.

Upon re-running the Colab notebook, the uri contained vertexaisearch. When I pasted a vertexaisearch URI into the browser, it redirected me to the real page.

Pro Tip: The vertexaisearch.cloud.google.com links are specific to Grounding with Google Search in Vertex AI and act as a bridge to the original source.

This was the result for Erling Haaland (incorrect output, before the next fix).

{
  "name": "Erling Haaland",
  "net_worth": {
    "value": "$80 million",
    "source_quote": "Haaland's net worth, which according to Forbes is $80m (£59.5m)",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwe5-b036A0QaSzU9G0uAgOVZs1OJ0NMe0MmnMy-wRh4Qh9PtR_PhO6UV4S4-6ZJXJL4cicMEt0CFm_hwEER-d5WI5uY3H6DySLRcQZOmbeXF0PQU68ny6xyWF-64jJ_2Jnht_hkr7Kk3FasVjBki_2Q-n8jvr6PIcYUkTrFyJa2wZjPt4jkkdrFDo4Y6A2_OahUrg7unsUbzWJBxPp6e52tzg9w==",
    "source_type": null
  },
  "is_professional_player": true,
  "verification_status": "Confirmed active for Manchester City in the Premier League for the 2025/26 Season via official Premier League statistics.",
  "height": {
    "value": 1.95,
    "source_quote": "Height, 1.95 m (6 ft 5 in).",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHFUkt6uBYUgbC5fgMofMPjKps_9QszoJClzsHz1zkpovNWK1-OueQEj_2oFq--1dOaL7L4X9Aef7iLamfT1f4iN2aH3BxaYfJUznngJ2vXuawp-SAWl4x3R1nxHpiSCKF_pjnPg-rL",
    "source_type": null
  },
  "shirt_number": {
    "value": 9,
    "source_quote": "Man City• 9",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "preferred_foot": {
    "value": "left",
    "source_quote": "Foot: left",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHf-kM-at4W05tsLLdo3dzthpWxntkWhnfwd4fWKBHxvl8qWHioz9jaDrI5ZkmhVlo71D7RQhctvDTXFqywdvl40_Q6CC2S24FujOcmdg1rhVGJiJudqtzVqLtpQ-kIWlJIkevddD68Fe40se8gRmljvgmqx4O5ATDve4F23gRmljvgmqx4O5ATDve4F23g==",
    "source_type": null
  },
  "goals": {
    "value": 20,
    "source_quote": "Goals 20",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "goal_assists": {
    "value": 4,
    "source_quote": "Assists 4",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "appearances": {
    "value": 22,
    "source_quote": "Appearances 22",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "minutes_played": {
    "value": 1906,
    "source_quote": "Minutes Played 1,906",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  }
}

One of the redirected URLs led me to https://www.sofascore.com/football/player/erling-haaland/839956 to view the preferred foot of the football player.

4. Valid URI has null Source Type

Problem:

When the player is Erling Haaland, the source_type is null even though the uri is https://www.premierleague.com/en/players/223094/erling-haaland/stats or a valid Google Search result. This occurred because the model did not classify the source_type after successfully using the URL Context tool.

When the player was Kaoru Mitoma, whose profile page is not in the provided URL list, the model exhaustively queried Google Search and correctly assigned GOOGLE_SEARCH to the source_type of every field.

Solution:

The solution was to give the model explicit instructions for categorizing the source_type field in the PlayerField model. I added a MANDATORY SOURCE_TYPE CLASSIFICATION RULES section to the prompt that derives the source_type field from the uri field.

### **2. MANDATORY SOURCE_TYPE CLASSIFICATION RULES**
You are strictly forbidden from returning `null` for `source_type` if a `uri` is present.
*   **MATCHING RULE:** If the `uri` matches one of the URLs provided below, you MUST use "URL_CONTEXT".
*   **SEARCH RULE:** If the `uri` is a search result (e.g., Transfermarkt, Wikipedia, vertexaisearch links), you MUST use "GOOGLE_SEARCH".
*   **FALLBACK RULE:** If no tool found the data and you use internal memory, `uri` must be `null` and `source_type` must be "INTERNAL_KNOWLEDGE".

I re-ran the Colab notebook, and the source_type field was no longer null whenever the uri field was present.

{
  "name": "Erling Haaland",
  "net_worth": {
    "value": "$80 million",
    "source_quote": "Haaland's net worth, which according to Forbes is $80m (£59.5m)",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwe5-b036A0QaSzU9G0uAgOVZs1OJ0NMe0MmnMy-wRh4Qh9PtR_PhO6UV4S4-6ZJXJL4cicMEt0CFm_hwEER-d5WI5uY3H6DySLRcQZOmbeXF0PQU68ny6xyWF-64jJ_2Jnht_hkr7Kk3FasVjBki_2Q-n8jvr6PIcYUkTrFyJa2wZjPt4jkkdrFDo4Y6A2_OahUrg7unsUbzWJBxPp6e52tzg9w==",
    "source_type": "GOOGLE_SEARCH"
  },
  "is_professional_player": true,
  "verification_status": "Confirmed active for Manchester City in the Premier League for the 2025/26 season via official Premier League statistics.",
  "height": {
    "value": 1.95,
    "source_quote": "Height, 1.95 m (6 ft 5 in).",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHFUkt6uBYUgbC5fgMofMPjKps_9QszoJClzsHz1zkpovNWK1-OueQEj_2oFq--1dOaL7L4X9Aef7iLamfT1f4iN2aH3BxaYfJUznngJ2vXuawp-SAWl4x3R1nxHpiSCKF_pjnPg-rL",
    "source_type": "GOOGLE_SEARCH"
  },
  "shirt_number": {
    "value": 9,
    "source_quote": "Man City• 9",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "preferred_foot": {
    "value": "left",
    "source_quote": "Foot: left",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHf-kM-at4W05tsLLdo3dzthpWxntkWhnfwd4fWKBHxvl8qWHioz9jaDrI5ZkmhVlo71D7RQhctvDTXFqywdvl40_Q6CC2S24FujOcmdg1rhVGJiJudqtzVqLtpQ-kIWlJIkevddD68Fe40se8gRmljvgmqx4O5ATDve4F23gRmljvgmqx4O5ATDve4F23g==",
    "source_type": "GOOGLE_SEARCH"
  },
  "goals": {
    "value": 20,
    "source_quote": "Goals 20",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "goal_assists": {
    "value": 4,
    "source_quote": "Assists 4",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "appearances": {
    "value": 22,
    "source_quote": "Appearances 22",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "minutes_played": {
    "value": 1906,
    "source_quote": "Minutes Played 1,906",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  }
}
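
The same condition can be checked in code rather than by eye. The sketch below works on the plain dict form of the output; the helper name `find_unclassified_fields` is hypothetical, and the two sample dicts are trimmed to a single field taken from the outputs shown in this section.

```python
def find_unclassified_fields(player_stats: dict) -> list[str]:
    """Return the names of fields that have a uri but no source_type."""
    return [
        name
        for name, field in player_stats.items()
        if isinstance(field, dict)  # skip scalar fields such as "name"
        and field.get("uri") is not None
        and field.get("source_type") is None
    ]


PROFILE_URL = "https://www.premierleague.com/en/players/223094/erling-haaland/stats"

before_fix = {
    "name": "Erling Haaland",
    "goals": {"value": 20, "source_quote": "Goals 20", "uri": PROFILE_URL, "source_type": None},
}
after_fix = {
    "name": "Erling Haaland",
    "goals": {"value": 20, "source_quote": "Goals 20", "uri": PROFILE_URL, "source_type": "URL_CONTEXT"},
}

print(find_unclassified_fields(before_fix))  # ['goals']
print(find_unclassified_fields(after_fix))   # []
```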

The Final Prompt Template

This is the complete prompt after incorporating these lessons. It has the objective (retrieve player statistics), instructions (execute the URL Context tool first and Google Search second), constraints (the player must be an active Premier League player in the 2025/2026 season), context (the provided URLs and player name), and the output format (a JSON object).

**OBJECTIVE:**
Search and identify the Premier League 2025/2026 Player Statistics of {player}.

---

### **1. DYNAMIC SOURCE IDENTIFICATION**
1.  **IF a Premier League URL is provided:**
    *   You **MUST** execute the `url_context` tool first. This is your **Primary Source**.
2.  **IF NO URL is provided (or if the player is non-PL):**
    *   The **Web Citations** (Google Search results) become your **Primary Source**. 
3.  **PRIORITY:** Official URL > Web Citations > Internal Training Data.

### **2. MANDATORY SOURCE_TYPE CLASSIFICATION RULES**
You are strictly forbidden from returning `null` for `source_type` if a `uri` is present.
*   **MATCHING RULE:** If the `uri` matches one of the URLs provided below, you MUST use "URL_CONTEXT".
*   **SEARCH RULE:** If the `uri` is a search result (e.g., Transfermarkt, Wikipedia, vertexaisearch links), you MUST use "GOOGLE_SEARCH".
*   **FALLBACK RULE:** If no tool found the data and you use internal memory, `uri` must be `null` and `source_type` must be "INTERNAL_KNOWLEDGE".

### **3. INACTIVE / NON-PROFESSIONAL PLAYER LOGIC**
If the player cannot be found in active professional records for the 2025/26 Season:
*   `is_professional_player`: `false`.
*   **All Numeric Fields:** `{{ "value": 0, "source_quote": null, "uri": null, "source_type": null }}`.
*   **All String Fields:** `{{ "value": "n/a", "source_quote": null, "uri": null, "source_type": null }}`.
*   **Verification Status:** "Player not found in active professional databases."

### **4. URI EXTRACTION RULES (STRICT):**
1.  **NO GUESSING:** You are strictly forbidden from constructing, autocompleting, or guessing a URL based on the website name. 
2.  **LITERAL COPY:** You must copy the `uri` exactly as it appears in the search result that provided the `source_quote`. 
3.  **THE JOIN RULE:** Before finalizing the JSON, verify that the `source_quote` actually appears in the content/snippet associated with the `uri` you provided.
4.  **IF IN DOUBT:** If you found a fact in your training data but cannot find a specific, working URI for it in the search results, you MUST set `source_type` to `INTERNAL_KNOWLEDGE` and `uri` to `null`.

### **5. DATA VALIDATION & AUDIT**
*   **`net_worth`**: Must be a string (e.g., `100 million dollars`).
*   **`height`**: Must be a float (e.g., `1.85`).

### PROVIDED URLS:
{ urls }

### OUTPUT FORMAT:
Return a JSON object exactly as follows:
{{
    "name": "string",
    "net_worth": {{ "value": "string", "source_quote": "...", "uri": "...", "source_type": "GOOGLE_SEARCH" }},
    "is_professional_player": boolean,
    "verification_status": "Detailed confirmation of Premier League status for 2025/26",
    "height": {{ "value": float, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "shirt_number": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "preferred_foot": {{ "value": "string", "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "goals": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "goal_assists": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "appearances": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }},
    "minutes_played": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL_CONTEXT" }}
}}

The above is a Python f-string. The single braces around player and urls are replaced with the variable values, while the double braces {{ }} escape the JSON structures so that Python does not treat them as variables.
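A shortened example shows the escaping in action. The prompt text here is a trimmed stand-in for illustration, not the full template.

```python
player = "Erling Haaland"
urls = "https://www.premierleague.com/en/players/223094/erling-haaland/stats"

# Single braces are substituted; doubled braces are emitted as literal braces.
prompt = f"""Search and identify the statistics of {player}.

### PROVIDED URLS:
{ urls }

### OUTPUT FORMAT:
{{ "name": "string", "goals": {{ "value": int }} }}
"""

print(prompt)
```

The printed prompt contains the player name, the URL, and a JSON skeleton with ordinary single braces.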

After formulating a prompt that generates structured output, I move on to defining Pydantic models that align with it.


Final Pydantic Models

The PlayerField model has a value field that is either a string, an integer, or a floating-point number.

The PlayerStats model represents the structured output containing performance metrics and personal information such as net worth, height, and name. The is_professional_player and verification_status fields confirm whether the player is playing in the Premier League in the 2025/2026 season.

These field descriptions are passed to the model as part of the JSON schema and they act as "micro-prompts".

class PlayerField(BaseModel):
    value: str | int | float
    source_quote: str
    uri: str | None = Field(None, description="The EXACT, UNEDITED URL provided by the tool. Do not guess or shorten. None if the source is internal training data")
    # None is allowed so that a response with a null source_type can still be validated
    source_type: Literal["GOOGLE_SEARCH", "URL_CONTEXT", "INTERNAL_KNOWLEDGE"] | None = Field(None, description="Categorize the source of this specific field. Must not be None when uri is not null.")

class PlayerStats(BaseModel):
    name: str
    net_worth: PlayerField = Field(..., description="Net worth of the PL player.")
    is_professional_player: bool = Field(..., description="Must be True if found in PL records, False otherwise")
    verification_status: str = Field(..., description="Explanation of where the data was found or why it failed")
    height: PlayerField
    shirt_number: PlayerField
    preferred_foot: PlayerField
    goals: PlayerField
    goal_assists: PlayerField
    appearances: PlayerField
    minutes_played: PlayerField

Generate Structured Output with Tools

The get_player_stats function substitutes the urls and player variables in the prompt with the concatenated URLs and player name. Then, I pass the prompt to the Gemini 3 Flash Preview model to obtain a response.

The configuration specifies that the response_mime_type is application/json and the response_json_schema is the JSON schema of the PlayerStats model. The thinking level is set to HIGH. The configuration also registers the URL Context tool to read Premier League player pages and the Google Search tool to find details that are missing from those pages.

ThinkingLevel.HIGH adds latency and cost: the model takes more reasoning steps and uses more thinking tokens before generating the JSON object.

from google.genai import types

def get_player_stats(player: str) -> types.GenerateContentResponse:

    prompt = f"""<original prompt from the above section>"""

    response = client.models.generate_content(
        model='gemini-3-flash-preview',
        contents=types.Content(
            role="user",
            parts=[types.Part(text=prompt)]
        ),
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_json_schema=PlayerStats.model_json_schema(),
            thinking_config=types.ThinkingConfig(
                thinking_level=types.ThinkingLevel.HIGH,
            ),
            tools=[
                types.Tool(url_context=types.UrlContext()),
                types.Tool(google_search=types.GoogleSearch()),
            ]
        )
    )

    return response

Get the Results

When response_mime_type="application/json" is specified, the JSON object is expected in response.parsed. I use PlayerStats.model_validate(...) to convert response.parsed into a PlayerStats instance named player_stats.

If response.parsed is None, the fallback for this edge case is to parse response.text with PlayerStats.model_validate_json(...). Gemini in Vertex AI wraps the text response in a Markdown JSON code block, so the clean_json_string function strips the enclosing fence to ensure proper parsing.

def clean_json_string(raw_string):
    # Remove the Markdown code fence that wraps the JSON text
    clean_str = raw_string.strip()
    if clean_str.startswith("```json"):
        clean_str = clean_str[7:]   # len("```json") == 7
    if clean_str.endswith("```"):
        clean_str = clean_str[:-3]  # len("```") == 3
    return clean_str.strip()

def print_player_stats(response: types.GenerateContentResponse):
    if response.parsed:
        player_stats = PlayerStats.model_validate(response.parsed)
    else:
        player_stats = PlayerStats.model_validate_json(clean_json_string(response.text))

    print(player_stats.model_dump_json(indent=2))
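
As an alternative to slicing by fixed lengths, the fence can be removed with a regular expression that also tolerates a fence without the json tag. This is a sketch using only the standard library; `extract_json` is a hypothetical helper, not part of the notebook.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep the literal out of the source

# An opening fence with an optional "json" tag, the payload, and a closing fence.
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def extract_json(raw: str) -> dict:
    """Parse JSON text that may or may not be wrapped in a Markdown fence."""
    text = raw.strip()
    match = FENCED_JSON.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)


fenced = f'{FENCE}json\n{{"name": "Erling Haaland"}}\n{FENCE}'
print(extract_json(fenced))
print(extract_json('{"name": "Kaoru Mitoma"}'))
```

Unfenced input passes through unchanged, so the same function covers both cases.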

When the get_player_stats function is called with Erling Haaland, the model triggers the URL Context tool first, and then the Google Search tool. In contrast, the model triggers the Google Search tool when the player is Kaoru Mitoma and his profile page is not included in the provided URL list.

response = get_player_stats(player="Erling Haaland")
print_player_stats(response=response)

response = get_player_stats(player="Kaoru Mitoma")
print_player_stats(response)

Conclusion

I provided the URL Context and Google Search tools to allow Gemini 3 to interact with external knowledge. The internal knowledge of the Gemini 3 Flash Preview model lacks the latest player statistics of the Premier League 2025/2026 season, so it must call tools to find the statistics from the official Premier League player profile pages, and the missing net worth, height, and preferred foot from other sports pages.

I hope you find this content useful and appreciate the capabilities of the Gemini API.

Thank you.


Resources

  1. GitHub Example: Retrieve Premier League Player Stats Colab.
  2. Use Vertex AI for free: Vertex AI in express mode.
  3. Run the demo in VS Code: Google Colab VS Code Extension
