Core Utilities

The core_utils package provides the foundational utilities that power tif1’s high-performance data processing pipeline. While these are primarily internal APIs, they expose powerful capabilities for advanced users who need fine-grained control over data transformations, backend conversions, and performance optimization.

This module is the performance engine of tif1, implementing zero-copy operations, optimized JSON parsing, and intelligent DataFrame transformations that enable the library to process millions of rows of telemetry data with minimal overhead.

Overview

The core utilities package (tif1.core_utils) is organized into five specialized modules, each addressing a critical aspect of the library’s performance and functionality:

Backend Conversion (backend_conversion.py) - Zero-copy DataFrame conversion between pandas and polars using Apache Arrow
JSON Utilities (json_utils.py) - High-performance JSON parsing with orjson, providing 2-3x faster deserialization
Helper Functions (helpers.py) - DataFrame manipulation, validation, and data transformation utilities
Constants (constants.py) - Column name mappings, rename dictionaries, and configuration constants
Resource Manager (resource_manager.py) - Context manager for guaranteed resource cleanup with LIFO ordering

These utilities enable tif1 to achieve its performance goals while maintaining compatibility with both pandas and polars backends. The design philosophy emphasizes:

Zero-copy operations wherever possible to minimize memory overhead
Lazy evaluation and deferred imports to reduce startup time
Type safety with comprehensive validation and error handling
Performance-first design with optimized hot paths
Backend-agnostic APIs that work seamlessly with pandas or polars

Module Import Structure

# Direct imports from core_utils
from tif1.core_utils.backend_conversion import (
    pandas_to_polars,
    polars_to_pandas,
    convert_backend,
    POLARS_AVAILABLE  # Boolean flag for polars availability
)

from tif1.core_utils.json_utils import (
    json_loads,
    json_dumps,
    parse_response_json  # HTTP response JSON parser
)

from tif1.core_utils.helpers import (
    DataFrame,  # Type alias for pd.DataFrame | pl.DataFrame
    _validate_year,
    _validate_drivers_list,
    _validate_lap_number,
    _validate_string_param,
    _encode_url_component,
    _is_empty_df,
    _create_empty_df,
    _filter_valid_laptimes,
    _rename_columns,
    _apply_categorical,
    _create_telemetry_df,
    _reorder_laps_columns,
    _normalize_row_iteration,
    _ensure_polars_available
)

from tif1.core_utils.constants import (
    MIN_YEAR, MAX_YEAR,
    LAP_RENAME_MAP,
    TELEMETRY_RENAME_MAP,
    RACE_CONTROL_RENAME_MAP,
    WEATHER_RENAME_MAP,
    CATEGORICAL_COLS,
    FASTF1_LAPS_COLUMN_ORDER,
    COL_DRIVER, COL_TEAM, COL_LAP_NUMBER,
    COL_LAP_TIME, COL_LAP_TIME_SECONDS
)

from tif1.core_utils.resource_manager import ResourceManager
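The overview describes ResourceManager as a context manager that guarantees cleanup in LIFO order. Its API is not shown in this section, so the following is only a sketch of the LIFO idea using the standard library's contextlib.ExitStack, not tif1's implementation. The resource names and the make_cleanup helper are hypothetical.

```python
from contextlib import ExitStack

cleanup_order = []  # records the order in which cleanup callbacks run


def make_cleanup(name):
    """Return a callback that records its name when invoked (hypothetical helper)."""
    def _cleanup():
        cleanup_order.append(name)
    return _cleanup


with ExitStack() as stack:
    # Callbacks are registered in acquisition order...
    stack.callback(make_cleanup("http_session"))
    stack.callback(make_cleanup("cache"))
    stack.callback(make_cleanup("temp_file"))

# ...and run in reverse (last in, first out) when the block exits,
# even if the body of the with-block raised an exception.
print(cleanup_order)  # ['temp_file', 'cache', 'http_session']
```

LIFO ordering matters when later resources depend on earlier ones: the dependent resource is released first, while the thing it depends on is still alive.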

Performance Philosophy

The core_utils module is designed with performance as the primary concern. Every function is optimized for:

Minimal memory allocations: Reuse buffers, avoid unnecessary copies
Cache-friendly access patterns: Sequential reads, predictable branches
SIMD-friendly operations: Leverage NumPy/Arrow vectorization
Lazy computation: Defer work until absolutely necessary
Zero-copy semantics: Share memory between backends via Arrow

Real-world impact: These optimizations enable tif1 to load and process a full race weekend (3 practice sessions, qualifying, race) with telemetry for all 20 drivers in under 5 seconds on modern hardware.

Backend Conversion

The backend conversion module provides zero-copy DataFrame transformations between pandas and polars using Apache Arrow as the interchange format. This enables seamless switching between backends without the performance penalty of traditional serialization/deserialization.

Architecture

The conversion system leverages:

Apache Arrow as the zero-copy interchange format
PyArrow extension arrays for pandas to maintain Arrow memory layout
Lazy polars imports to avoid dependency requirements when not needed
Automatic fallback to standard conversion if zero-copy fails
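The lazy-import item above can be illustrated with a small standard-library sketch. It is an assumption about the general pattern, not tif1's code: OPTIONAL_BACKEND_AVAILABLE and ensure_available are hypothetical stand-ins for a flag like POLARS_AVAILABLE and a helper like _ensure_polars_available.

```python
import importlib
import importlib.util

# Cheap availability check: find_spec locates a top-level package without importing it.
OPTIONAL_BACKEND_AVAILABLE = importlib.util.find_spec("polars") is not None


def ensure_available(module_name: str):
    """Import a top-level module on first use, with a clear error if it is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"{module_name} is not installed")
    return importlib.import_module(module_name)


# The import cost is paid only when the module is actually needed.
json_module = ensure_available("json")  # stdlib module, always present
print(json_module.dumps({"ok": True}))  # {"ok": true}

try:
    ensure_available("surely_not_a_real_module_xyz")
except ImportError as exc:
    print(exc)  # surely_not_a_real_module_xyz is not installed
```

Deferring the import this way keeps startup fast for pandas-only users and turns a missing optional dependency into an explicit, readable error at the point of use.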

`pandas_to_polars`

def pandas_to_polars(
    df: pd.DataFrame,
    *,
    rechunk: bool = False
) -> pl.DataFrame

Convert a pandas DataFrame to polars using zero-copy Arrow conversion. This function uses pl.from_pandas() with Arrow as the interchange format, avoiding memory copies when possible.

Parameters:

df (pd.DataFrame): pandas DataFrame to convert. Can contain any pandas-supported dtype including nullable types, categoricals, and datetime types.
rechunk (bool, optional): Whether to rechunk the resulting polars DataFrame for optimal memory layout. Default is False to preserve zero-copy semantics. Set to True if you plan to perform many operations on the polars DataFrame and want contiguous memory.

Returns:

pl.DataFrame: polars DataFrame with equivalent data and schema

Raises:

ImportError: If polars is not installed in the environment
ValueError: If the conversion fails due to incompatible data types or corrupted data

Performance Characteristics:

Time complexity: O(1) for zero-copy conversion (when rechunk=False), O(n) when rechunking
Memory overhead: Minimal (shares memory with source DataFrame via Arrow)
Typical performance: ~50ms for 1M rows with zero-copy

Example - Basic Conversion:

from tif1.core_utils.backend_conversion import pandas_to_polars
import pandas as pd

# Create a pandas DataFrame with various dtypes
df_pandas = pd.DataFrame({
    "driver": ["VER", "HAM", "LEC"],
    "lap_time": [90.123, 90.456, 90.789],
    "position": [1, 2, 3],
    "compound": pd.Categorical(["SOFT", "MEDIUM", "SOFT"])
})

# Zero-copy conversion to polars
df_polars = pandas_to_polars(df_pandas)
print(type(df_polars))  # <class 'polars.dataframe.frame.DataFrame'>

Example - With Rechunking:

import polars as pl

# Convert with rechunking for optimal polars performance
# (df_pandas is the DataFrame from the previous example)
df_polars = pandas_to_polars(df_pandas, rechunk=True)

# Now polars operations will be faster due to contiguous memory
fast_laps = df_polars.filter(pl.col("lap_time") < 90.5)

Example - Handling Large DataFrames:

import time

import numpy as np
import pandas as pd
from tif1.core_utils.backend_conversion import pandas_to_polars

# Large DataFrame (10M rows)
df_large = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=10_000_000, freq="ms"),
    "speed": np.random.uniform(200, 350, 10_000_000),
    "throttle": np.random.uniform(0, 100, 10_000_000)
})

# Zero-copy conversion is extremely fast even for large data
start = time.perf_counter()
df_polars = pandas_to_polars(df_large)
print(f"Converted 10M rows in {time.perf_counter() - start:.3f}s")  # ~0.1s

Use rechunk=False (default) when you need to convert data quickly and will only perform a few operations. Use rechunk=True when you plan to perform many polars operations and want optimal performance.

`polars_to_pandas`

def polars_to_pandas(
    df: pl.DataFrame,
    *,
    use_pyarrow: bool = True
) -> pd.DataFrame

Convert a polars DataFrame to pandas using zero-copy Arrow conversion. This function uses df.to_pandas() with PyArrow extension arrays to maintain Arrow memory layout in pandas, enabling zero-copy semantics.

Parameters:

df (pl.DataFrame): polars DataFrame to convert. Supports all polars data types including nested types (List, Struct), categoricals, and temporal types.
use_pyarrow (bool, optional): Whether to use PyArrow extension arrays in the resulting pandas DataFrame. Default is True for zero-copy conversion. Set to False to convert to native pandas dtypes (slower but more compatible with legacy pandas code).

Returns:

pd.DataFrame: pandas DataFrame with equivalent data and schema

Raises:

ImportError: If polars is not installed in the environment
ValueError: If the conversion fails due to incompatible data types

Performance Characteristics:

Time complexity: O(1) for zero-copy (when use_pyarrow=True), O(n) for native pandas dtypes
Memory overhead: Minimal with PyArrow arrays, 2x memory usage with native dtypes
Typical performance: ~100ms for 1M rows with PyArrow

Example - Basic Conversion:

from tif1.core_utils.backend_conversion import polars_to_pandas
import polars as pl

# Create a polars DataFrame
df_polars = pl.DataFrame({
    "driver": ["VER", "HAM", "LEC"],
    "lap_time": [90.123, 90.456, 90.789],
    "position": [1, 2, 3]
})

# Zero-copy conversion to pandas with PyArrow arrays
df_pandas = polars_to_pandas(df_polars)
print(df_pandas.dtypes)
# driver       string[pyarrow]
# lap_time     double[pyarrow]
# position     int64[pyarrow]

Example - Native Pandas Dtypes:

# Convert to native pandas dtypes (slower, more memory)
df_pandas_native = polars_to_pandas(df_polars, use_pyarrow=False)
print(df_pandas_native.dtypes)
# driver       object
# lap_time     float64
# position     int64

Example - Handling Nested Types:

import polars as pl
from tif1.core_utils.backend_conversion import polars_to_pandas

# Polars DataFrame with nested types
df_nested = pl.DataFrame({
    "driver": ["VER", "HAM"],
    "sector_times": [[28.1, 35.2, 26.8], [28.3, 35.5, 27.0]],
    "metadata": [{"team": "Red Bull", "car": 1}, {"team": "Mercedes", "car": 44}]
})

# PyArrow handles nested types seamlessly
df_pandas = polars_to_pandas(df_nested)
print(df_pandas["sector_times"].iloc[0])  # [28.1, 35.2, 26.8]

When use_pyarrow=True, the resulting pandas DataFrame uses PyArrow extension arrays. Some legacy pandas operations may not support these arrays. If you encounter compatibility issues, set use_pyarrow=False to use native pandas dtypes.

`convert_backend`

def convert_backend(
    df: DataFrame,
    target_backend: str
) -> DataFrame

Intelligently convert a DataFrame to the target backend (pandas or polars) using zero-copy conversion when possible. This is the high-level API that automatically detects the source backend and performs the appropriate conversion.

Parameters:

df (DataFrame): DataFrame to convert. Can be either pd.DataFrame or pl.DataFrame.
target_backend (str): Target backend name. Must be either "pandas" or "polars" (case-sensitive).

Returns:

DataFrame: DataFrame in the target backend format. If already in the target backend, returns the input DataFrame unchanged (no-op).

Raises:

ValueError: If target_backend is not "pandas" or "polars", or if the conversion fails
ImportError: If polars is not installed and target is "polars"

Performance Characteristics:

No-op detection: O(1) type check to avoid unnecessary conversions
Conversion time: Same as pandas_to_polars or polars_to_pandas
Memory efficient: Uses zero-copy conversion internally
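The validate, no-op, then dispatch flow described above might look roughly like the sketch below. This is an illustration under assumptions, not the library's implementation: detect_backend, convert_backend_sketch, and the converters mapping are hypothetical, and a stand-in class replaces real DataFrames so the sketch runs without pandas or polars.

```python
VALID_BACKENDS = ("pandas", "polars")


def detect_backend(df) -> str:
    """Infer the backend from the top-level package that defines the object's type."""
    return type(df).__module__.split(".")[0]


def convert_backend_sketch(df, target_backend: str, converters: dict):
    """Validate the target, return early if already there, otherwise dispatch."""
    if target_backend not in VALID_BACKENDS:
        raise ValueError(
            f"Invalid target_backend: {target_backend}. Must be 'pandas' or 'polars'"
        )
    source = detect_backend(df)
    if source == target_backend:
        return df  # no-op: the same object comes back, nothing is copied
    return converters[(source, target_backend)](df)


# Stand-in object so the sketch runs without pandas or polars installed.
class FakeFrame:
    __module__ = "pandas.core.frame"  # pretend this type lives in pandas


frame = FakeFrame()
same = convert_backend_sketch(frame, "pandas", converters={})
print(same is frame)  # True

converted = convert_backend_sketch(
    frame, "polars", converters={("pandas", "polars"): lambda df: "polars-frame"}
)
print(converted)  # polars-frame
```

Checking the target name before touching the data keeps the invalid-backend error cheap, and returning the input object unchanged is what makes the no-op case O(1).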

Example - Basic Usage:

from tif1.core_utils.backend_conversion import convert_backend
import pandas as pd

# Start with pandas
df = pd.DataFrame({"driver": ["VER", "HAM"], "position": [1, 2]})

# Convert to polars
df_polars = convert_backend(df, "polars")
print(type(df_polars))  # <class 'polars.dataframe.frame.DataFrame'>

# Convert back to pandas
df_pandas = convert_backend(df_polars, "pandas")
print(type(df_pandas))  # <class 'pandas.core.frame.DataFrame'>

# No-op: already pandas
df_same = convert_backend(df_pandas, "pandas")
assert df_same is df_pandas  # Same object, no conversion

Example - Integration with tif1:

import tif1
from tif1.core_utils.backend_conversion import convert_backend

# Load session with pandas backend
session = tif1.get_session(2024, "Monaco", "Race", backend="pandas")
laps_pandas = session.laps

# Convert to polars for faster filtering
laps_polars = convert_backend(laps_pandas, "polars")

# Perform polars operations
import polars as pl
verstappen_laps = laps_polars.filter(pl.col("Driver") == "VER")
fast_laps = verstappen_laps.filter(pl.col("LapTimeSeconds") < 75.0)

# Convert back to pandas for plotting
fast_laps_pandas = convert_backend(fast_laps, "pandas")

Example - Error Handling:

import pandas as pd
from tif1.core_utils.backend_conversion import convert_backend

df = pd.DataFrame({"driver": ["VER", "HAM"], "position": [1, 2]})

try:
    df_converted = convert_backend(df, "spark")  # Invalid backend
except ValueError as e:
    print(e)  # "Invalid target_backend: spark. Must be 'pandas' or 'polars'"

try:
    df_polars = convert_backend(df, "polars")  # Raises only if polars is not installed
except ImportError as e:
    print(e)  # "polars is not installed"

Use convert_backend as the primary API for backend conversion. It handles edge cases, performs no-op detection, and provides consistent error messages.

JSON Utilities

The JSON utilities module provides high-performance JSON parsing and serialization using orjson, a fast, correct JSON library for Python written in Rust. This module is critical for tif1’s performance as it handles parsing of large JSON payloads from the CDN containing lap data, telemetry, and race control messages.

Why orjson?

orjson provides significant advantages over Python’s standard library json module:

2-3x faster parsing: Rust-based implementation with SIMD optimizations
Lower memory usage: Efficient memory allocation and reuse
Native bytes support: Parse directly from HTTP response bodies without decoding
NumPy integration: Automatic handling of NumPy types during serialization
Strict correctness: Validates JSON spec compliance

Benchmark comparison (parsing 10MB JSON payload):

stdlib json.loads():  450ms
orjson.loads():       180ms  (2.5x faster)

Architecture

The JSON utilities module implements a fallback strategy:

Primary: Use orjson for maximum performance
Fallback: Use stdlib json if orjson fails (rare edge cases)
Automatic: No user configuration required

This ensures reliability while maintaining performance for the common case.
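A minimal sketch of this primary/fallback strategy, assuming orjson is an optional import. The function name loads_with_fallback is hypothetical and tif1's actual json_loads may be structured differently.

```python
import json
from typing import Any

try:
    import orjson  # optional accelerator
except ImportError:  # keep working without it
    orjson = None


def loads_with_fallback(payload: "str | bytes | bytearray | memoryview") -> Any:
    """Parse JSON with orjson when available, falling back to the stdlib parser."""
    if isinstance(payload, memoryview):
        payload = payload.tobytes()  # stdlib json.loads does not accept memoryview
    if orjson is not None:
        try:
            return orjson.loads(payload)
        except ValueError:
            pass  # orjson rejected the payload; let the stdlib parser try
    return json.loads(payload)


print(loads_with_fallback(b'{"lap": 1, "time": 90.123}'))  # {'lap': 1, 'time': 90.123}

# One of the rare edge cases: orjson rejects the non-standard NaN literal,
# while the stdlib parser accepts it.
print(loads_with_fallback("[NaN]"))  # [nan]
```

In this sketch, genuinely invalid JSON is reported by the stdlib parser, because the fallback runs last and its json.JSONDecodeError is the one that propagates.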

`json_loads`

def json_loads(
    payload: str | bytes | bytearray | memoryview
) -> Any

Deserialize a JSON payload to a Python object using the accelerated orjson codec, with automatic fallback to stdlib json.

Parameters:

payload (str | bytes | bytearray | memoryview): JSON data to parse. Accepts multiple input types for flexibility:
- str: JSON text as a Python string (already decoded)
- bytes: Raw bytes from HTTP responses (most efficient)
- bytearray: Mutable byte array
- memoryview: Zero-copy view of bytes (converted to bytes internally)

Returns:

Any: Parsed Python object. Common return types:
- dict: JSON objects {}
- list: JSON arrays []
- str, int, float, bool, None: JSON primitives

Raises:

json.JSONDecodeError: If the payload is not valid JSON (from fallback parser)
ValueError: If the payload is malformed (from orjson)
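Because json.JSONDecodeError is a subclass of ValueError in the standard library, a single except ValueError clause covers both documented error cases. The sketch below uses json.loads as a stand-in for json_loads so it runs without tif1 installed. The parse_or_none helper is hypothetical.

```python
import json

# One handler is enough: JSONDecodeError inherits from ValueError.
print(issubclass(json.JSONDecodeError, ValueError))  # True


def parse_or_none(payload):
    """Return the parsed payload, or None if it is not valid JSON."""
    try:
        return json.loads(payload)  # stand-in for tif1's json_loads
    except ValueError as exc:
        print(f"Invalid JSON: {exc}")
        return None


print(parse_or_none('{"driver": "VER"}'))  # {'driver': 'VER'}
print(parse_or_none("{oops"))  # None
```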

Performance Characteristics:

Time complexity: O(n) where n is the payload size
Memory overhead: Minimal (orjson uses efficient allocation)
Typical performance:
- Small payloads (<1KB): ~10μs
- Medium payloads (100KB): ~2ms
- Large payloads (10MB): ~180ms

Example - Basic Usage:

from tif1.core_utils.json_utils import json_loads

# Parse JSON string
json_str = '{"driver": "VER", "team": "Red Bull Racing", "position": 1}'
data = json_loads(json_str)
print(data["driver"])  # "VER"
print(data["position"])  # 1

# Parse JSON bytes (common from HTTP responses)
json_bytes = b'{"lap": 1, "time": 90.123}'
lap_data = json_loads(json_bytes)
print(lap_data["time"])  # 90.123

Example - Parsing CDN Response:

import niquests
from tif1.core_utils.json_utils import json_loads

# Fetch lap data from CDN
response = niquests.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")
# Parse directly from response bytes (no decode step)
lap_data = json_loads(response.content)

# Process lap data
for lap in lap_data["laps"]:
    print(f"Lap {lap['lap']}: {lap['time']}s")

Example - Handling Complex Nested Data:

from tif1.core_utils.json_utils import json_loads

# Complex nested JSON structure
complex_json = '''
{
  "session": {
    "year": 2024,
    "gp": "Monaco",
    "type": "Race"
  },
  "laps": [
    {
      "driver": "VER",
      "lap": 1,
      "sectors": [28.1, 35.2, 26.8],
      "telemetry": {
        "speed": [250, 280, 310],
        "throttle": [100, 100, 100]
      }
    }
  ]
}
'''

data = json_loads(complex_json)
print(data["session"]["gp"])  # "Monaco"
print(data["laps"][0]["sectors"])  # [28.1, 35.2, 26.8]
print(data["laps"][0]["telemetry"]["speed"])  # [250, 280, 310]

Example - Performance Comparison:

import json
import time
from tif1.core_utils.json_utils import json_loads

# Large JSON payload (roughly 8 MB of text)
large_json = json.dumps({"data": list(range(1_000_000))})

# Benchmark stdlib json
start = time.perf_counter()
data1 = json.loads(large_json)
stdlib_time = time.perf_counter() - start

# Benchmark orjson via json_loads
start = time.perf_counter()
data2 = json_loads(large_json)
orjson_time = time.perf_counter() - start

print(f"stdlib json: {stdlib_time:.3f}s")
print(f"orjson:      {orjson_time:.3f}s")
print(f"Speedup:     {stdlib_time / orjson_time:.2f}x")
# Example output (illustrative; timings vary by machine):
# stdlib json: 0.450s
# orjson:      0.180s
# Speedup:     2.50x

When fetching JSON from HTTP endpoints, pass response.content (bytes) directly to json_loads instead of response.text (str). This avoids an unnecessary UTF-8 decode step and is faster.

`json_dumps`

def json_dumps(data: Any) -> str

Serialize a Python object to a JSON string using the accelerated orjson codec, with automatic fallback to stdlib json.

Parameters:

data (Any): Python object to serialize. Supported types:
- dict, list: Collections
- str, int, float, bool, None: Primitives
- datetime, date, time: Temporal types (ISO 8601 format)
- UUID: Converted to string
- numpy types: Automatically converted to Python equivalents
- dataclasses, pydantic models: Serialized to dict

Returns:

str: JSON string representation of the data

Raises:

TypeError: If the data contains non-serializable types (e.g., custom classes without __dict__)

Performance Characteristics:

Time complexity: O(n) where n is the data size
Memory overhead: Minimal (efficient string building)
Typical performance:
- Small objects (<1KB): ~5μs
- Medium objects (100KB): ~1ms
- Large objects (10MB): ~100ms

Example - Basic Usage:

from tif1.core_utils.json_utils import json_dumps

# Serialize dict
data = {
    "driver": "VER",
    "team": "Red Bull Racing",
    "position": 1,
    "lap_time": 90.123
}
json_str = json_dumps(data)
print(json_str)
# {"driver":"VER","team":"Red Bull Racing","position":1,"lap_time":90.123}

Example - Serializing NumPy Types:

import numpy as np
from tif1.core_utils.json_utils import json_dumps

# NumPy arrays and types are automatically handled
data = {
    "speeds": np.array([250, 280, 310]),
    "avg_speed": np.float64(280.5),
    "lap_count": np.int32(58)
}

json_str = json_dumps(data)
print(json_str)
# {"speeds":[250,280,310],"avg_speed":280.5,"lap_count":58}

Example - Serializing Datetime Objects:

from datetime import datetime, date
from tif1.core_utils.json_utils import json_dumps

data = {
    "session_date": date(2024, 5, 26),
    "session_start": datetime(2024, 5, 26, 15, 0, 0),
    "driver": "VER"
}

json_str = json_dumps(data)
print(json_str)
# {"session_date":"2024-05-26","session_start":"2024-05-26T15:00:00","driver":"VER"}

Example - Round-trip Serialization:

from tif1.core_utils.json_utils import json_loads, json_dumps

# Original data
original = {
    "drivers": ["VER", "HAM", "LEC"],
    "lap_times": [90.1, 90.3, 90.5],
    "metadata": {"session": "Race", "year": 2024}
}

# Serialize to JSON
json_str = json_dumps(original)

# Deserialize back to Python
restored = json_loads(json_str)

# Verify round-trip
assert original == restored
print("Round-trip successful!")

json_dumps uses orjson for 2-3x faster serialization than stdlib json. It automatically handles NumPy types, datetime objects, and other common Python types without requiring custom encoders.

`parse_response_json`

def parse_response_json(response: Any) -> Any

Decode an HTTP response body to a Python object, preferring raw-byte parsing when available for maximum performance. This function is optimized for parsing JSON from HTTP responses (niquests, requests, httpx, etc.).

Parameters:

response (Any): HTTP response object. Should have either:
- .content attribute (bytes): Preferred for performance
- .json() method: Fallback for compatibility

Returns:

Any: Parsed Python object from the response JSON body

Raises:

json.JSONDecodeError: If the response body is not valid JSON
AttributeError: If the response object has neither .content nor .json()

Performance Characteristics:

Optimized path: Parse from .content bytes using orjson (~180ms for 10MB)
Fallback path: Use .json() method (~450ms for 10MB)
Speedup: 2-3x faster than calling .json() directly
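The two paths described above (parse .content bytes first, otherwise call .json()) can be sketched as follows. This is an assumption about the general approach rather than tif1's code: parse_response_sketch and the two response classes are hypothetical, and json.loads stands in for the orjson-backed parser.

```python
import json
from typing import Any


def parse_response_sketch(response: Any) -> Any:
    """Prefer raw bytes from .content; otherwise defer to the response's .json()."""
    content = getattr(response, "content", None)
    if isinstance(content, (bytes, bytearray)) and content:
        return json.loads(content)  # stand-in for the orjson-backed json_loads
    # Raises AttributeError if the object has no .json() method either.
    return response.json()


class BytesResponse:
    """Hypothetical response exposing raw bytes."""
    content = b'{"laps": [1, 2, 3]}'


class JsonOnlyResponse:
    """Hypothetical response exposing only a .json() method."""
    def json(self):
        return {"laps": []}


print(parse_response_sketch(BytesResponse()))     # {'laps': [1, 2, 3]}
print(parse_response_sketch(JsonOnlyResponse()))  # {'laps': []}
```

Duck typing on .content and .json() is what lets one function accept responses from different HTTP clients without importing any of them.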

Example - Parsing niquests Response:

import niquests
from tif1.core_utils.json_utils import parse_response_json

# Fetch data from CDN
response = niquests.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")

# Parse JSON efficiently (uses response.content bytes)
data = parse_response_json(response)

# Process data
print(f"Loaded {len(data['laps'])} laps")

Example - Comparing Performance:

import niquests
import time
from tif1.core_utils.json_utils import parse_response_json

url = "https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/..."
response = niquests.get(url)

# Method 1: Standard .json() method
start = time.perf_counter()
data1 = response.json()
method1_time = time.perf_counter() - start

# Method 2: Optimized parse_response_json
start = time.perf_counter()
data2 = parse_response_json(response)
method2_time = time.perf_counter() - start

print(f"response.json():        {method1_time:.3f}s")
print(f"parse_response_json():  {method2_time:.3f}s")
print(f"Speedup:                {method1_time / method2_time:.2f}x")
# Example output (illustrative; timings vary by machine and payload):
# response.json():        0.045s
# parse_response_json():  0.018s
# Speedup:                2.50x

Example - Integration with tif1 HTTP Session:

from tif1.http_session import HTTPSession
from tif1.core_utils.json_utils import parse_response_json

# Create HTTP session
session = HTTPSession()

# Fetch lap data
response = session.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")

# Parse efficiently
lap_data = parse_response_json(response)

# Process laps
for lap in lap_data["laps"]:
    print(f"Driver {lap['driver']}: Lap {lap['lap']} - {lap['time']}s")

Always use parse_response_json() instead of response.json() when working with HTTP responses in performance-critical code. The 2-3x speedup adds up quickly when fetching data for multiple sessions or drivers.

JSON Utilities Best Practices

Use bytes when possible: Pass response.content to json_loads instead of response.text
Prefer parse_response_json: Use parse_response_json() for HTTP responses
Trust the fallback: The automatic fallback to stdlib json ensures reliability
Benchmark your use case: Profile your specific JSON payloads to measure impact
Handle errors gracefully: Catch json.JSONDecodeError for malformed JSON

Common Pitfalls:

❌ json.loads(response.text) - Slow (decode + parse)
❌ response.json() - Slower (uses stdlib json)
✅ json_loads(response.content) - Fast (direct bytes parse)
✅ parse_response_json(response) - Fastest (optimized path)

Helper Functions

The helpers module (helpers.py) provides a comprehensive suite of utility functions for DataFrame manipulation, validation, and data transformation. These functions are the workhorses of tif1’s data processing pipeline, handling everything from input validation to complex DataFrame operations across both pandas and polars backends.

Design Philosophy

The helper functions follow these core principles:

Backend Agnostic: All functions work seamlessly with both pandas and polars DataFrames
Zero-Copy Optimization: Minimize memory allocations and avoid unnecessary data copies
Type Safety: Comprehensive validation with clear error messages
Performance First: Optimized hot paths for common operations
Defensive Programming: Handle edge cases gracefully with fallback strategies

Validation Functions

Input validation is critical for data integrity and user experience. The validation helpers provide comprehensive checks with informative error messages.

`_validate_year`

def _validate_year(year: int, min_year: int, max_year: int) -> None

Validate that a year is within the supported range for F1 data. This function is called before any data fetching operations to ensure the requested year has available data.

Parameters:

year (int): Year to validate. Must be an integer representing a calendar year.
min_year (int): Minimum supported year (inclusive). Typically 2018 for tif1.
max_year (int): Maximum supported year (inclusive). Typically the current year + 1 for future scheduled races.

Raises:

ValueError: If year is outside the range [min_year, max_year]. Error message includes the valid range and the invalid value provided.

Performance:

Time complexity: O(1) - simple integer comparison
Typical execution: <1μs
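A range check of this kind can be as small as the sketch below. The name validate_year_sketch is hypothetical, and the message format is copied from the documented examples. The bounds 2018 and 2100 are taken from those same examples rather than from the constants module.

```python
def validate_year_sketch(year: int, min_year: int, max_year: int) -> None:
    """Raise ValueError unless min_year <= year <= max_year (both inclusive)."""
    if not min_year <= year <= max_year:
        raise ValueError(
            f"Year must be between {min_year} and {max_year}, got {year}"
        )


validate_year_sketch(2024, 2018, 2100)  # in range: returns None silently

try:
    validate_year_sketch(2010, 2018, 2100)
except ValueError as exc:
    print(exc)  # Year must be between 2018 and 2100, got 2010
```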

Example - Basic Validation:

from tif1.core_utils.helpers import _validate_year
from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR

# Valid year
_validate_year(2024, MIN_YEAR, MAX_YEAR)  # No error

# Invalid year (too old)
try:
    _validate_year(2010, MIN_YEAR, MAX_YEAR)
except ValueError as e:
    print(e)  # "Year must be between 2018 and 2100, got 2010"

# Invalid year (future)
try:
    _validate_year(2150, MIN_YEAR, MAX_YEAR)
except ValueError as e:
    print(e)  # "Year must be between 2018 and 2100, got 2150"

Example - Integration with Session Loading:

from tif1.core_utils.helpers import _validate_year
from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR

def load_session(year: int, gp: str, session_type: str):
    """Load session with year validation."""
    # Validate year before expensive network operations
    _validate_year(year, MIN_YEAR, MAX_YEAR)

    # Proceed with data loading...
    # (validation ensures we don't waste time on invalid requests)

This validation is performed early in the data loading pipeline to fail fast and provide clear feedback before any network requests are made.

`_validate_drivers_list`

def _validate_drivers_list(drivers: list[str] | None) -> None

Validate that a drivers list parameter is well-formed and contains valid driver codes. This function ensures that driver filtering operations receive properly formatted input.

Parameters:

drivers (list[str] | None): List of driver codes to validate. Each code should be a non-empty string (typically 3-letter abbreviations like “VER”, “HAM”, “LEC”). Can be None to indicate no filtering.

Raises:

TypeError: If drivers is not a list or None. Error message includes the actual type received.
ValueError: If the list is empty, or if any element is not a non-empty string. Error messages are specific to the validation failure.

Validation Rules:

Must be a list type (not tuple, set, or other iterable)
Cannot be an empty list (use None instead to indicate “all drivers”)
All elements must be strings
All strings must be non-empty (no empty strings or whitespace-only strings)
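The four rules above map naturally onto a short sequence of checks. The sketch below is one possible arrangement, not the library's source: validate_drivers_sketch is a hypothetical name, and the error messages mirror the ones shown in the documented examples.

```python
def validate_drivers_sketch(drivers) -> None:
    """Apply the validation rules in order, failing on the first violation."""
    if drivers is None:
        return  # None means "no filtering" and is always accepted
    if not isinstance(drivers, list):
        # Tuples, sets, and other iterables are rejected by type.
        raise TypeError(f"drivers must be a list, got {type(drivers).__name__}")
    if not drivers:
        raise ValueError("drivers list cannot be empty")
    # isinstance guards the .strip() call, so non-strings never reach it.
    if not all(isinstance(d, str) and d.strip() for d in drivers):
        raise ValueError("drivers must be a list of non-empty strings")


validate_drivers_sketch(None)
validate_drivers_sketch(["VER", "HAM"])

for bad in ([], ("VER", "HAM"), ["VER", "  "], ["VER", 33]):
    try:
        validate_drivers_sketch(bad)
    except (TypeError, ValueError) as exc:
        print(f"{type(exc).__name__}: {exc}")
```

The type check runs before the emptiness check so that an empty tuple is reported as a type problem rather than as an empty list.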

Performance:

Time complexity: O(n) where n is the number of drivers
Typical execution: <10μs for 20 drivers

Example - Valid Driver Lists:

from tif1.core_utils.helpers import _validate_drivers_list

# Valid: None (no filtering)
_validate_drivers_list(None)  # OK

# Valid: Single driver
_validate_drivers_list(["VER"])  # OK

# Valid: Multiple drivers
_validate_drivers_list(["VER", "HAM", "LEC"])  # OK

# Valid: All drivers in a race
all_drivers = ["VER", "PER", "HAM", "RUS", "LEC", "SAI", "NOR", "PIA",
               "ALO", "STR", "OCO", "GAS", "ALB", "SAR", "TSU", "RIC",
               "BOT", "ZHO", "MAG", "HUL"]
_validate_drivers_list(all_drivers)  # OK

Example - Invalid Driver Lists:

from tif1.core_utils.helpers import _validate_drivers_list

# Invalid: Empty list
try:
    _validate_drivers_list([])
except ValueError as e:
    print(e)  # "drivers list cannot be empty"

# Invalid: Wrong type (tuple instead of list)
try:
    _validate_drivers_list(("VER", "HAM"))
except TypeError as e:
    print(e)  # "drivers must be a list, got tuple"

# Invalid: Contains empty string
try:
    _validate_drivers_list(["VER", "", "HAM"])
except ValueError as e:
    print(e)  # "drivers must be a list of non-empty strings"

# Invalid: Contains non-string
try:
    _validate_drivers_list(["VER", 33, "HAM"])
except ValueError as e:
    print(e)  # "drivers must be a list of non-empty strings"

Example - Integration with Data Filtering:

from tif1.core_utils.helpers import _validate_drivers_list

def get_driver_laps(laps_df, drivers: list[str] | None):
    """Filter laps DataFrame by drivers with validation."""
    # Validate input before filtering
    _validate_drivers_list(drivers)

    if drivers is None:
        return laps_df  # No filtering

    # Safe to filter now
    return laps_df[laps_df["Driver"].isin(drivers)]

Use None instead of an empty list when you want to indicate “all drivers”. An empty list is considered an error because it would result in an empty DataFrame, which is likely unintentional.

`_validate_lap_number`

def _validate_lap_number(lap_number: int) -> None

Validate that a lap number is a positive integer. Lap numbers in F1 start from 1 (not 0), so this function ensures the value is valid for lap-based operations.

Parameters:

lap_number (int): Lap number to validate. Must be a positive integer (>= 1).

Raises:

TypeError: If lap_number is not an integer. Error message includes the actual type received.
ValueError: If lap_number is zero or negative. Error message includes the invalid value.

Validation Rules:

Must be an integer type (not float, string, or other numeric type)
Must be positive (>= 1)
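The two rules can be sketched as a type check followed by a range check. One subtlety in Python is that bool is a subclass of int, so True would pass a plain isinstance(..., int) test. Rejecting booleans is a choice made in this sketch, and the source does not say how tif1 treats them. The name validate_lap_number_sketch is hypothetical.

```python
def validate_lap_number_sketch(lap_number) -> None:
    """Require a real integer that is at least 1."""
    # bool is a subclass of int; excluding it is an assumption of this sketch.
    if isinstance(lap_number, bool) or not isinstance(lap_number, int):
        raise TypeError(
            f"lap_number must be an integer, got {type(lap_number).__name__}"
        )
    if lap_number < 1:
        raise ValueError(f"lap_number must be positive, got {lap_number}")


validate_lap_number_sketch(1)   # first lap: accepted
validate_lap_number_sketch(58)  # accepted

for bad in (0, -5, 10.5, "10", True):
    try:
        validate_lap_number_sketch(bad)
    except (TypeError, ValueError) as exc:
        print(f"{type(exc).__name__}: {exc}")
```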

Performance:

Time complexity: O(1) - simple type check and comparison
Typical execution: <1μs

Example - Valid Lap Numbers:

from tif1.core_utils.helpers import _validate_lap_number

# Valid: First lap
_validate_lap_number(1)  # OK

# Valid: Mid-race lap
_validate_lap_number(35)  # OK

# Valid: Final lap of a typical race
_validate_lap_number(58)  # OK

Example - Invalid Lap Numbers:

from tif1.core_utils.helpers import _validate_lap_number

# Invalid: Zero
try:
    _validate_lap_number(0)
except ValueError as e:
    print(e)  # "lap_number must be positive, got 0"

# Invalid: Negative
try:
    _validate_lap_number(-5)
except ValueError as e:
    print(e)  # "lap_number must be positive, got -5"

# Invalid: Float (even if positive)
try:
    _validate_lap_number(10.5)
except TypeError as e:
    print(e)  # "lap_number must be an integer, got float"

# Invalid: String
try:
    _validate_lap_number("10")
except TypeError as e:
    print(e)  # "lap_number must be an integer, got str"

Example - Integration with Telemetry Loading:

from tif1.core_utils.helpers import _validate_lap_number

def get_lap_telemetry(driver: str, lap_number: int):
    """Load telemetry for a specific lap with validation."""
    # Validate lap number before expensive operations
    _validate_lap_number(lap_number)

    # Proceed with telemetry loading...
    # (validation ensures we don't request invalid lap data)

Lap numbers in F1 start from 1, not 0. If you’re iterating over laps, use range(1, num_laps + 1) instead of range(num_laps).

`_validate_string_param`

def _validate_string_param(param: str, param_name: str) -> None

Validate that a string parameter is non-empty and properly formatted. This generic validation function is used throughout the library for string inputs like GP names, session types, driver codes, etc.

Parameters:

param (str): String parameter to validate. Must be a non-empty string with at least one non-whitespace character.
param_name (str): Human-readable parameter name for error messages. Used to provide context in error messages (e.g., “gp”, “session_type”, “driver”).

Raises:

TypeError: If param is not a string. Error message includes the parameter name and actual type received.
ValueError: If param is empty or contains only whitespace. Error message includes the parameter name.

Validation Rules:

Must be a string type (not int, None, or other type)
Cannot be empty string ""
Cannot be whitespace-only (e.g., " ", "\t", "\n")

Performance:

Time complexity: O(n) where n is the string length (for whitespace check)
Typical execution: <5μs for typical parameter lengths

Example - Valid String Parameters:

from tif1.core_utils.helpers import _validate_string_param

# Valid: GP name
_validate_string_param("Monaco", "gp")  # OK

# Valid: Session type
_validate_string_param("Race", "session_type")  # OK

# Valid: Driver code
_validate_string_param("VER", "driver")  # OK

# Valid: String with spaces
_validate_string_param("Belgian Grand Prix", "gp")  # OK

Example - Invalid String Parameters:

from tif1.core_utils.helpers import _validate_string_param

# Invalid: Empty string
try:
    _validate_string_param("", "gp")
except ValueError as e:
    print(e)  # "gp cannot be empty"

# Invalid: Whitespace only
try:
    _validate_string_param("   ", "session_type")
except ValueError as e:
    print(e)  # "session_type cannot be empty"

# Invalid: None
try:
    _validate_string_param(None, "driver")
except TypeError as e:
    print(e)  # "driver must be a string, got NoneType"

# Invalid: Integer
try:
    _validate_string_param(2024, "year")
except TypeError as e:
    print(e)  # "year must be a string, got int"

Example - Integration with API Functions:

from tif1.core_utils.helpers import _validate_string_param

def get_session(year: int, gp: str, session_type: str):
    """Load session with comprehensive validation."""
    # Validate string parameters
    _validate_string_param(gp, "gp")
    _validate_string_param(session_type, "session_type")

    # Proceed with session loading...
    # (validation ensures we have valid input before processing)

This function uses .strip() to check for whitespace-only strings, ensuring that parameters like " " are rejected even though they have non-zero length.
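
A minimal sketch of the validation rules described above, assuming the type check runs before the emptiness check. The names `check_string_param` and `collect_error` are hypothetical and only mirror the documented error messages; the library's real code may differ.

```python
from typing import Optional


def check_string_param(param: object, param_name: str) -> None:
    """Hypothetical stand-in for _validate_string_param (illustration only)."""
    if not isinstance(param, str):
        raise TypeError(
            f"{param_name} must be a string, got {type(param).__name__}"
        )
    # strip() removes leading and trailing whitespace, so "  \t\n" becomes ""
    if not param.strip():
        raise ValueError(f"{param_name} cannot be empty")


def collect_error(param: object, param_name: str) -> Optional[str]:
    """Return the validation error message, or None if the value is valid."""
    try:
        check_string_param(param, param_name)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


print(collect_error("Monaco", "gp"))   # None
print(collect_error(" \t\n", "gp"))    # gp cannot be empty
print(collect_error(None, "driver"))   # driver must be a string, got NoneType
```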

URL Encoding

`_encode_url_component`

@lru_cache(maxsize=1024)
def _encode_url_component(component: str) -> str

URL-encode a string component for safe use in CDN URLs. This function properly encodes special characters, spaces, and international characters according to RFC 3986, ensuring that URLs are valid and don’t break when passed to HTTP clients.

Parameters:

component (str): String to encode. Can contain any Unicode characters, spaces, or special characters.

Returns:

str: URL-encoded string with all special characters percent-encoded (e.g., space becomes %20, & becomes %26).

Caching:

Uses @lru_cache(maxsize=1024) for memoization
Repeated calls with the same input return cached results instantly
Cache size of 1024 is sufficient for typical usage (GP names, session types, etc.)

Performance:

Time complexity: O(n) for first call, O(1) for cached calls
Typical execution:
- First call: ~10μs
- Cached call: <1μs (cache lookup)

Example - Basic URL Encoding:

from tif1.core_utils.helpers import _encode_url_component

# Encode GP name with spaces
encoded = _encode_url_component("Belgian Grand Prix")
print(encoded)  # "Belgian%20Grand%20Prix"

# Encode special characters
encoded = _encode_url_component("São Paulo")
print(encoded)  # "S%C3%A3o%20Paulo"

# Encode session type
encoded = _encode_url_component("Practice 1")
print(encoded)  # "Practice%201"

Example - Building CDN URLs:

from tif1.core_utils.helpers import _encode_url_component

def build_cdn_url(year: int, gp: str, session_type: str, data_type: str) -> str:
    """Build CDN URL with proper encoding."""
    base_url = f"https://cdn.jsdelivr.net/gh/TracingInsights/{year}@main"

    # Encode components for URL safety
    gp_encoded = _encode_url_component(gp)
    session_encoded = _encode_url_component(session_type)

    return f"{base_url}/{gp_encoded}/{session_encoded}/{data_type}.json"

# Example usage
url = build_cdn_url(2024, "Belgian Grand Prix", "Race", "laps")
print(url)
# https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Belgian%20Grand%20Prix/Race/laps.json

Example - Cache Performance:

import time
from tif1.core_utils.helpers import _encode_url_component

# First call: performs encoding
start = time.perf_counter()
result1 = _encode_url_component("Belgian Grand Prix")
first_call_time = time.perf_counter() - start

# Second call: returns cached result
start = time.perf_counter()
result2 = _encode_url_component("Belgian Grand Prix")
cached_call_time = time.perf_counter() - start

print(f"First call:  {first_call_time * 1e6:.1f}μs")  # ~10μs
print(f"Cached call: {cached_call_time * 1e6:.1f}μs")  # <1μs
print(f"Speedup:     {first_call_time / cached_call_time:.0f}x")  # ~10-20x

Example - Handling International Characters:

from tif1.core_utils.helpers import _encode_url_component

# International GP names
gp_names = [
    "São Paulo",      # Portuguese
    "México",         # Spanish
    "Montréal",       # French
    "Österreich",     # German
]

for gp in gp_names:
    encoded = _encode_url_component(gp)
    print(f"{gp:15} → {encoded}")

# Output:
# São Paulo       → S%C3%A3o%20Paulo
# México          → M%C3%A9xico
# Montréal        → Montr%C3%A9al
# Österreich      → %C3%96sterreich

The LRU cache makes repeated URL encoding operations essentially free. Since GP names and session types are reused frequently, the cache hit rate is typically >95% in real-world usage.
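
The caching behavior can be observed directly with the standard library. The sketch below assumes the encoder is built on `urllib.parse.quote` with `safe=""`; that is a guess about the implementation, and `encode_component` is a hypothetical name used only for illustration.

```python
from functools import lru_cache
from urllib.parse import quote


@lru_cache(maxsize=1024)
def encode_component(component: str) -> str:
    """Percent-encode a single URL path component (sketch, not tif1 code)."""
    # safe="" also encodes "/", so the result is always one path segment
    return quote(component, safe="")


first = encode_component("Belgian Grand Prix")   # miss: computed and stored
second = encode_component("Belgian Grand Prix")  # hit: served from the cache
info = encode_component.cache_info()

print(first)                    # Belgian%20Grand%20Prix
print(info.hits, info.misses)   # 1 1
```

`cache_info()` is a convenient way to check the real hit rate of a cached function in your own workload instead of relying on a rule of thumb.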

DataFrame Utility Functions

These functions provide backend-agnostic operations for DataFrame manipulation, enabling seamless work with both pandas and polars.

`_is_empty_df`

def _is_empty_df(df, lib: str) -> bool

Check if a DataFrame-like object is empty, working across both pandas and polars backends. This function handles the different APIs for checking emptiness and provides a unified interface.

Parameters:

df: DataFrame-like object to check. Can be pd.DataFrame, pl.DataFrame, or any object with .empty or .is_empty() attributes.
lib (str): Backend library name ("pandas" or "polars"). Used as a hint for optimization, but the function also performs runtime type checking.

Returns:

bool: True if the DataFrame is empty (zero rows), False otherwise.

Detection Strategy:

The function uses a multi-layered approach to handle various DataFrame types, tried in this order:

Type-based detection: Check isinstance(df, pd.DataFrame) or isinstance(df, pl.DataFrame)
Attribute-based detection: Check for .empty (pandas) or .is_empty() (polars)
Fallback: Use len(df) == 0 as last resort
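
The attribute-based and fallback layers can be sketched without installing either backend. In the illustration below, `PandasLike` and `PolarsLike` are hypothetical stub classes, `is_empty_frame` is a hypothetical name, and the type-based first layer is deliberately omitted so the code runs with the standard library only.

```python
class PandasLike:
    """Stub exposing a pandas-style ``.empty`` attribute."""

    def __init__(self, rows):
        self._rows = list(rows)

    @property
    def empty(self) -> bool:
        return len(self._rows) == 0


class PolarsLike:
    """Stub exposing a polars-style ``.is_empty()`` method."""

    def __init__(self, rows):
        self._rows = list(rows)

    def is_empty(self) -> bool:
        return len(self._rows) == 0


def is_empty_frame(obj) -> bool:
    """Duck-typed emptiness check (sketch, not the library's code)."""
    # Layer 1: pandas-style boolean attribute
    empty_attr = getattr(obj, "empty", None)
    if isinstance(empty_attr, bool):
        return empty_attr
    # Layer 2: polars-style method
    is_empty = getattr(obj, "is_empty", None)
    if callable(is_empty):
        return bool(is_empty())
    # Layer 3: last resort for any sized container
    return len(obj) == 0


print(is_empty_frame(PandasLike([])))       # True
print(is_empty_frame(PolarsLike(["VER"])))  # False
print(is_empty_frame([]))                   # True
```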

Performance:

Time complexity: O(1) - all checks are constant time
Typical execution: <1μs

Example - Basic Usage:

from tif1.core_utils.helpers import _is_empty_df
import pandas as pd
import polars as pl

# Empty pandas DataFrame
df_empty_pd = pd.DataFrame()
print(_is_empty_df(df_empty_pd, "pandas"))  # True

# Non-empty pandas DataFrame
df_pd = pd.DataFrame({"driver": ["VER", "HAM"]})
print(_is_empty_df(df_pd, "pandas"))  # False

# Empty polars DataFrame
df_empty_pl = pl.DataFrame()
print(_is_empty_df(df_empty_pl, "polars"))  # True

# Non-empty polars DataFrame
df_pl = pl.DataFrame({"driver": ["VER", "HAM"]})
print(_is_empty_df(df_pl, "polars"))  # False

Example - Integration with Data Loading:

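import logging

from tif1.core_utils.helpers import _is_empty_df, _create_empty_df

logger = logging.getLogger(__name__)

# Note: fetch_laps_from_cdn is a placeholder for your own loading function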

def load_laps_with_fallback(year: int, gp: str, session: str, lib: str):
    """Load laps with empty DataFrame fallback."""
    try:
        laps = fetch_laps_from_cdn(year, gp, session, lib)

        # Check if we got data
        if _is_empty_df(laps, lib):
            logger.warning(f"No lap data found for {year} {gp} {session}")
            return _create_empty_df(lib)

        return laps
    except Exception as e:
        logger.error(f"Failed to load laps: {e}")
        return _create_empty_df(lib)

Example - Conditional Processing:

from tif1.core_utils.helpers import _is_empty_df

def process_telemetry(telemetry_df, lib: str):
    """Process telemetry data if available."""
    # Skip processing if no data
    if _is_empty_df(telemetry_df, lib):
        print("No telemetry data to process")
        return None

    # Proceed with processing
    print(f"Processing {len(telemetry_df)} telemetry samples")
    # ... processing logic ...

This function prefers concrete type checking over the lib parameter because some code paths can surface pandas DataFrames even when the configured backend is polars (e.g., during backend conversion).

`_create_empty_df`

def _create_empty_df(lib: str)

Create an empty DataFrame for the specified backend. This function provides a consistent way to create empty DataFrames across backends, useful for fallback scenarios and initialization.

Parameters:

lib (str): Backend library name. Must be either "pandas" or "polars".

Returns:

pd.DataFrame if lib == "pandas"
pl.DataFrame if lib == "polars" and polars is available
pd.DataFrame as fallback if polars is requested but not installed

Performance:

Time complexity: O(1) - creates empty structure
Typical execution: <10μs

Example - Basic Usage:

from tif1.core_utils.helpers import _create_empty_df

# Create empty pandas DataFrame
df_pd = _create_empty_df("pandas")
print(type(df_pd))  # <class 'pandas.core.frame.DataFrame'>
print(len(df_pd))   # 0

# Create empty polars DataFrame
df_pl = _create_empty_df("polars")
print(type(df_pl))  # <class 'polars.dataframe.frame.DataFrame'>
print(len(df_pl))   # 0

Example - Fallback Pattern:

import logging

from tif1.core_utils.helpers import _create_empty_df, _is_empty_df

logger = logging.getLogger(__name__)

def get_driver_laps(session, driver: str, lib: str):
    """Get laps for a driver with empty DataFrame fallback."""
    try:
        laps = session.laps

        # Filter by driver
        if lib == "polars":
            import polars as pl
            driver_laps = laps.filter(pl.col("Driver") == driver)
        else:
            driver_laps = laps[laps["Driver"] == driver]

        # Return empty DataFrame if no laps found
        if _is_empty_df(driver_laps, lib):
            return _create_empty_df(lib)

        return driver_laps
    except Exception as e:
        logger.error(f"Error getting driver laps: {e}")
        return _create_empty_df(lib)

Example - Initialization:

from tif1.core_utils.helpers import _create_empty_df

class SessionData:
    """Container for session data."""

    def __init__(self, lib: str = "pandas"):
        self.lib = lib
        # Initialize with empty DataFrames
        self.laps = _create_empty_df(lib)
        self.telemetry = _create_empty_df(lib)
        self.weather = _create_empty_df(lib)
        self.messages = _create_empty_df(lib)

    def is_loaded(self) -> bool:
        """Check if any data has been loaded."""
        from tif1.core_utils.helpers import _is_empty_df
        return not all([
            _is_empty_df(self.laps, self.lib),
            _is_empty_df(self.telemetry, self.lib),
            _is_empty_df(self.weather, self.lib),
            _is_empty_df(self.messages, self.lib),
        ])

Use _create_empty_df() instead of pd.DataFrame() or pl.DataFrame() directly when you need backend-agnostic code. This ensures consistency and handles the polars availability check automatically.
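
The availability fallback can be pictured as a small backend-resolution step. The sketch below is an assumption about the general pattern, not the library's code: `polars_installed` and `resolve_backend` are hypothetical names, and raising `ValueError` for an unknown backend is a choice made for this illustration only.

```python
from importlib.util import find_spec


def polars_installed() -> bool:
    """Return True if the polars package can be imported in this environment."""
    return find_spec("polars") is not None


def resolve_backend(lib: str) -> str:
    """Pick the backend that will actually be used (sketch)."""
    if lib not in ("pandas", "polars"):
        raise ValueError(f"lib must be 'pandas' or 'polars', got {lib!r}")
    # Fall back to pandas when polars is requested but not installed
    if lib == "polars" and not polars_installed():
        return "pandas"
    return lib


print(resolve_backend("pandas"))  # pandas
print(resolve_backend("polars"))  # result depends on the environment
```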

`_filter_valid_laptimes`

def _filter_valid_laptimes(laps, lib: str)

Filter a laps DataFrame to include only rows with valid lap times, and add a numeric LapTimeSeconds column for analysis. This function is critical for data quality: it removes laps whose LapTime is missing (typically pit laps, in laps, or deleted laps with no recorded time) and provides a consistent numeric representation of lap times. Laps that are slow but timed, such as out laps, are kept.

Parameters:

laps: Laps DataFrame (pandas or polars). Must contain a LapTime column.
lib (str): Backend library name ("pandas" or "polars").

Returns:

Filtered DataFrame with:
- Only rows where LapTime is valid (not null/NaN)
- New LapTimeSeconds column containing lap time as float (seconds)
- For pandas: LapTime converted to timedelta64[ns] dtype
- For polars: LapTime kept as original type, LapTimeSeconds added as Float64

Behavior:

Pandas: Converts LapTime to timedelta64[ns] and creates LapTimeSeconds as float
Polars: Casts LapTime to Float64 (non-strict) and aliases as LapTimeSeconds
Optimization: Minimizes copies by filtering before copying (pandas) or using lazy operations (polars)
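
The filtering rule can be shown row by row without a DataFrame library. The sketch below is a simplified, pure-Python illustration of a non-strict conversion (values that cannot be parsed become missing and are dropped); `to_seconds` and `filter_valid_laptimes` are hypothetical names, and the real function operates on whole columns instead of individual rows.

```python
import math
from typing import Any, Dict, List, Optional


def to_seconds(value: Any) -> Optional[float]:
    """Non-strict conversion: return None instead of raising on bad input."""
    if value is None:
        return None
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return None
    # NaN counts as a missing lap time
    return None if math.isnan(seconds) else seconds


def filter_valid_laptimes(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep rows with a usable LapTime and add LapTimeSeconds (sketch)."""
    valid = []
    for row in rows:
        seconds = to_seconds(row.get("LapTime"))
        if seconds is not None:
            # Copy the row so the input data is left untouched
            valid.append({**row, "LapTimeSeconds": seconds})
    return valid


laps = [
    {"LapNumber": 1, "LapTime": 95.5},
    {"LapNumber": 2, "LapTime": "90.2"},        # numeric string is accepted
    {"LapNumber": 3, "LapTime": None},          # no time recorded
    {"LapNumber": 4, "LapTime": float("nan")},  # missing value
    {"LapNumber": 5, "LapTime": "n/a"},         # unparseable text
]

valid = filter_valid_laptimes(laps)
print([row["LapTimeSeconds"] for row in valid])  # [95.5, 90.2]
```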

Performance:

Time complexity: O(n) where n is the number of laps
Memory overhead: Minimal (single column addition)
Typical execution: ~5ms for 1000 laps

Example - Basic Usage:

from tif1.core_utils.helpers import _filter_valid_laptimes
import pandas as pd

# Raw laps data with some invalid times
laps = pd.DataFrame({
    "Driver": ["VER", "VER", "VER", "HAM"],
    "LapNumber": [1, 2, 3, 1],
    "LapTime": [95.5, 90.2, None, 91.0],  # Lap 3 is invalid
})

# Filter valid laps
valid_laps = _filter_valid_laptimes(laps, "pandas")

print(len(valid_laps))  # 3 (lap 3 removed)
print(valid_laps["LapTimeSeconds"].tolist())  # [95.5, 90.2, 91.0]
print(valid_laps["LapTime"].dtype)  # timedelta64[ns]

Example - Polars Backend:

from tif1.core_utils.helpers import _filter_valid_laptimes
import polars as pl

# Raw laps data
laps = pl.DataFrame({
    "Driver": ["VER", "VER", "VER"],
    "LapNumber": [1, 2, 3],
    "LapTime": ["95.5", "90.2", "89.8"],
})

# Filter and add numeric column
valid_laps = _filter_valid_laptimes(laps, "polars")

print(valid_laps["LapTimeSeconds"])
# ┌─────────────────┐
# │ LapTimeSeconds  │
# │ ---             │
# │ f64             │
# ╞═════════════════╡
# │ 95.5            │
# │ 90.2            │
# │ 89.8            │
# └─────────────────┘

Example - Handling Mixed Valid/Invalid Data:

from tif1.core_utils.helpers import _filter_valid_laptimes
import pandas as pd
import numpy as np

# Realistic race data with pit laps and deleted laps
laps = pd.DataFrame({
    "Driver": ["VER"] * 10,
    "LapNumber": range(1, 11),
    "LapTime": [
        95.5,    # Out lap (slow)
        90.2,    # Valid
        89.8,    # Valid
        None,    # Pit lap (no time)
        None,    # In lap (no time)
        92.1,    # Out lap after pit
        89.5,    # Valid
        89.3,    # Valid
        np.nan,  # Deleted lap
        90.0,    # Valid
    ],
})

# Filter to valid laps only
valid_laps = _filter_valid_laptimes(laps, "pandas")

print(f"Original laps: {len(laps)}")  # 10
print(f"Valid laps: {len(valid_laps)}")  # 7
print(f"Removed: {len(laps) - len(valid_laps)}")  # 3

Example - Performance Optimization:

import time
import pandas as pd
import numpy as np
from tif1.core_utils.helpers import _filter_valid_laptimes

# Large dataset (full race weekend, all drivers)
num_laps = 20_000  # ~20 drivers × ~1000 laps
laps = pd.DataFrame({
    "Driver": np.random.choice(["VER", "HAM", "LEC"], num_laps),
    "LapNumber": np.tile(range(1, 101), num_laps // 100),
    "LapTime": np.random.uniform(85, 95, num_laps),
})

# Add some invalid laps
invalid_indices = np.random.choice(num_laps, size=2000, replace=False)
laps.loc[invalid_indices, "LapTime"] = None

# Benchmark filtering
start = time.perf_counter()
valid_laps = _filter_valid_laptimes(laps, "pandas")
elapsed = time.perf_counter() - start

print(f"Filtered {len(laps)} laps in {elapsed*1000:.1f}ms")
print(f"Valid laps: {len(valid_laps)} ({len(valid_laps)/len(laps)*100:.1f}%)")
# Example output (timings vary by machine):
# Filtered 20000 laps in 45.2ms
# Valid laps: 18000 (90.0%)

This function is called automatically during session loading. The LapTimeSeconds column is essential for numerical analysis, sorting, and filtering operations that require numeric comparison.

`_rename_columns`

def _rename_columns(df, rename_map: dict, lib: str)

Rename DataFrame columns based on a mapping dictionary, with intelligent handling of duplicates and backend-specific APIs. This function is used to transform CDN column names to user-facing FastF1-compatible names.

Parameters:

df: DataFrame to rename (pandas or polars)
rename_map (dict): Mapping of old column names to new names. Use None as the value to drop a column.
lib (str): Backend library name ("pandas" or "polars")

Returns:

DataFrame with renamed columns. Columns mapped to None are dropped.

Special Handling:

Duplicate prevention: Skips renames that would create duplicate column names
Drop columns: Columns mapped to None are removed from the DataFrame
No-op renames: Skips renames where source == target (e.g., {"Driver": "Driver"})
Existing columns: Avoids conflicts when target name already exists independently
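
These rules amount to computing a rename plan before touching the DataFrame. The sketch below works on plain column-name lists; `plan_renames` is a hypothetical helper, and the exact tie-breaking (the first source column wins when several map to the same target) is an assumption of this illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def plan_renames(
    columns: Sequence[str],
    rename_map: Dict[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (renames, drops) that are safe to apply (sketch)."""
    taken = set(columns)            # names already present or already claimed
    renames: Dict[str, str] = {}
    drops: List[str] = []
    for old in columns:
        if old not in rename_map:
            continue                # column is not mentioned: leave as-is
        new = rename_map[old]
        if new is None:
            drops.append(old)       # mapped to None: drop the column
        elif new == old:
            continue                # no-op rename
        elif new in taken:
            continue                # would create a duplicate name: skip
        else:
            renames[old] = new
            taken.add(new)
    return renames, drops


# Target name already exists, so "drv" keeps its name
print(plan_renames(["drv", "Driver", "lap"],
                   {"drv": "Driver", "lap": "LapNumber"}))
# ({'lap': 'LapNumber'}, [])
```

With such a plan, the rename itself is a single call on either backend, for example `df.rename(columns=renames).drop(columns=drops)` in pandas or `df.rename(renames).drop(drops)` in polars.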

Performance:

Time complexity: O(n) where n is the number of columns
Memory overhead: Minimal (column metadata only, no data copy)
Typical execution: <1ms for typical DataFrames

Example - Basic Column Renaming:

from tif1.core_utils.helpers import _rename_columns
import pandas as pd

# Raw CDN data with abbreviated column names
df = pd.DataFrame({
    "drv": ["VER", "HAM"],
    "lap": [1, 1],
    "time": [90.5, 91.2],
    "s1": [28.1, 28.5],
})

# Rename to user-friendly names
rename_map = {
    "drv": "Driver",
    "lap": "LapNumber",
    "time": "LapTime",
    "s1": "Sector1Time",
}

df_renamed = _rename_columns(df, rename_map, "pandas")
print(df_renamed.columns.tolist())
# ['Driver', 'LapNumber', 'LapTime', 'Sector1Time']

Example - Dropping Columns:

from tif1.core_utils.helpers import _rename_columns
import pandas as pd

# Data with columns to drop
df = pd.DataFrame({
    "driver": ["VER", "HAM"],
    "lap": [1, 1],
    "internal_id": [12345, 12346],  # Internal field, not needed
    "debug_flag": [True, False],     # Debug field, not needed
})

# Rename and drop unwanted columns
rename_map = {
    "driver": "Driver",
    "lap": "LapNumber",
    "internal_id": None,  # Drop this column
    "debug_flag": None,   # Drop this column
}

df_clean = _rename_columns(df, rename_map, "pandas")
print(df_clean.columns.tolist())
# ['Driver', 'LapNumber']

Example - Handling Duplicate Conflicts:

from tif1.core_utils.helpers import _rename_columns
import pandas as pd

# DataFrame where target name already exists
df = pd.DataFrame({
    "drv": ["VER"],
    "Driver": ["HAM"],  # Target name already exists!
    "lap": [1],
})

# Attempt to rename drv → Driver
rename_map = {"drv": "Driver", "lap": "LapNumber"}

# Function skips the conflicting rename
df_renamed = _rename_columns(df, rename_map, "pandas")
print(df_renamed.columns.tolist())
# ['drv', 'Driver', 'LapNumber']  # 'drv' not renamed to avoid duplicate

Example - Polars Backend:

from tif1.core_utils.helpers import _rename_columns
import polars as pl

# Polars DataFrame
df = pl.DataFrame({
    "drv": ["VER", "HAM"],
    "lap": [1, 1],
    "time": [90.5, 91.2],
})

# Rename columns
rename_map = {
    "drv": "Driver",
    "lap": "LapNumber",
    "time": "LapTime",
}

df_renamed = _rename_columns(df, rename_map, "polars")
print(df_renamed.columns)
# ['Driver', 'LapNumber', 'LapTime']

Example - Integration with CDN Data Processing:

from tif1.core_utils.helpers import _rename_columns
from tif1.core_utils.constants import LAP_RENAME_MAP

def process_cdn_laps(raw_laps_df, lib: str):
    """Process raw CDN laps data to user-facing format."""
    # Rename columns from CDN format to FastF1 format
    laps = _rename_columns(raw_laps_df, LAP_RENAME_MAP, lib)

    # Now columns have user-friendly names
    # "drv" → "Driver"
    # "lap" → "LapNumber"
    # "time" → "LapTime"
    # etc.

    return laps

The function automatically handles edge cases like duplicate column names and missing columns, making it safe to use with varying CDN data formats across different years and sessions.

`_apply_categorical`

def _apply_categorical(df, cols: list, lib: str)

Apply categorical dtype to specified columns for memory optimization and faster operations. Categorical dtypes reduce memory usage by storing repeated string values as integer codes with a lookup table.

Parameters:

df: DataFrame to modify (pandas or polars)
cols (list): List of column names to convert to categorical
lib (str): Backend library name ("pandas" or "polars")

Returns:

DataFrame with specified columns converted to categorical dtype

Benefits of Categorical Dtype:

Memory savings: 50-90% reduction for columns with low cardinality
Faster operations: Comparisons and groupby operations are faster
Preserved semantics: String operations still work as expected
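
The "integer codes plus lookup table" idea can be demonstrated in a few lines of plain Python. This is a conceptual sketch only: `encode_categorical` and `decode_categorical` are hypothetical names, and categories are stored in order of first appearance, whereas pandas sorts inferred categories by default.

```python
from typing import Dict, List, Tuple


def encode_categorical(values: List[str]) -> Tuple[List[str], List[int]]:
    """Split values into a lookup table and one integer code per row."""
    categories: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        code = index.get(value)
        if code is None:
            # First time this value is seen: assign the next free code
            code = len(categories)
            index[value] = code
            categories.append(value)
        codes.append(code)
    return categories, codes


def decode_categorical(categories: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original values from the lookup table and codes."""
    return [categories[code] for code in codes]


compounds = ["SOFT"] * 30 + ["MEDIUM"] * 40 + ["HARD"] * 30
categories, codes = encode_categorical(compounds)

print(categories)               # ['SOFT', 'MEDIUM', 'HARD']
print(codes[:3], codes[-3:])    # [0, 0, 0] [2, 2, 2]
```

Each distinct string is stored once, and every row holds only a small integer, which is where the memory savings for low-cardinality columns come from.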

Performance:

Time complexity: O(n × m) where n is rows and m is number of categorical columns
Memory savings: Typically 70-80% for Driver, Team, Compound columns
Typical execution: ~10ms for 1000 laps with 4 categorical columns

Example - Basic Usage:

from tif1.core_utils.helpers import _apply_categorical
import pandas as pd

# Laps data with repeated string values
laps = pd.DataFrame({
    "Driver": ["VER"] * 50 + ["HAM"] * 50,  # Only 2 unique values
    "Team": ["Red Bull Racing"] * 50 + ["Mercedes"] * 50,
    "Compound": ["SOFT"] * 30 + ["MEDIUM"] * 40 + ["HARD"] * 30,
    "LapTime": range(100),
})

print(f"Memory before: {laps.memory_usage(deep=True).sum() / 1024:.1f} KB")
# Memory before: 25.3 KB

# Apply categorical dtype
categorical_cols = ["Driver", "Team", "Compound"]
laps_cat = _apply_categorical(laps, categorical_cols, "pandas")

print(f"Memory after: {laps_cat.memory_usage(deep=True).sum() / 1024:.1f} KB")
# Memory after: 8.7 KB (65% reduction)

print(laps_cat["Driver"].dtype)  # category
print(laps_cat["Team"].dtype)    # category

Example - Memory Savings Analysis:

from tif1.core_utils.helpers import _apply_categorical
import pandas as pd

# Realistic race data (20 drivers, 58 laps each)
drivers = ["VER", "PER", "HAM", "RUS", "LEC", "SAI", "NOR", "PIA",
           "ALO", "STR", "OCO", "GAS", "ALB", "SAR", "TSU", "RIC",
           "BOT", "ZHO", "MAG", "HUL"]

# Ten teams, listed in the same order as their driver pairs above
teams = ["Red Bull Racing", "Mercedes", "Ferrari", "McLaren",
         "Aston Martin", "Alpine", "Williams", "RB", "Sauber", "Haas"]

laps = pd.DataFrame({
    # One block of 58 consecutive laps per driver
    "Driver": [d for d in drivers for _ in range(58)],
    # Two drivers per team -> 116 rows per team (10 × 116 = 1160 rows)
    "Team": [t for t in teams for _ in range(116)],
    "Compound": (["SOFT"] * 20 + ["MEDIUM"] * 30 + ["HARD"] * 8) * 20,
    "LapNumber": list(range(1, 59)) * 20,
})

# Measure memory usage
mem_before = laps.memory_usage(deep=True).sum()

# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team", "Compound"], "pandas")

mem_after = laps_cat.memory_usage(deep=True).sum()
savings = (1 - mem_after / mem_before) * 100

print(f"Memory before: {mem_before / 1024:.1f} KB")
print(f"Memory after:  {mem_after / 1024:.1f} KB")
print(f"Savings:       {savings:.1f}%")
# Example output (exact values depend on the pandas version and data):
# Memory before: 156.3 KB
# Memory after:  42.1 KB
# Savings:       73.1%

Example - Polars Backend:

from tif1.core_utils.helpers import _apply_categorical
import polars as pl

# Polars DataFrame
laps = pl.DataFrame({
    "Driver": ["VER"] * 50 + ["HAM"] * 50,
    "Team": ["Red Bull Racing"] * 50 + ["Mercedes"] * 50,
    "LapNumber": range(1, 101),
})

# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team"], "polars")

print(laps_cat.schema)
# {'Driver': Categorical, 'Team': Categorical, 'LapNumber': Int64}

Example - Performance Impact:

import time
import pandas as pd
from tif1.core_utils.helpers import _apply_categorical

# Large dataset
laps = pd.DataFrame({
    "Driver": ["VER", "HAM", "LEC"] * 10000,
    "Team": ["Red Bull Racing", "Mercedes", "Ferrari"] * 10000,
    "LapTime": range(30000),
})

# Benchmark groupby without categorical
start = time.perf_counter()
result1 = laps.groupby("Driver")["LapTime"].mean()
time_without = time.perf_counter() - start

# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team"], "pandas")

# Benchmark groupby with categorical
start = time.perf_counter()
result2 = laps_cat.groupby("Driver")["LapTime"].mean()
time_with = time.perf_counter() - start

print(f"Without categorical: {time_without*1000:.1f}ms")
print(f"With categorical:    {time_with*1000:.1f}ms")
print(f"Speedup:             {time_without/time_with:.2f}x")
# Example output (timings vary by machine):
# Without categorical: 12.3ms
# With categorical:    4.1ms
# Speedup:             3.00x

The library automatically applies categorical dtype to Driver, Team, Compound, and TrackStatus columns during session loading. This optimization is transparent to users but provides significant memory and performance benefits.

Constants

The constants module (constants.py) defines all column name mappings, configuration values, and standard column orders used throughout the library. These constants ensure consistency across the codebase and provide FastF1 compatibility.

Year Range Constants

MIN_YEAR = 2018  # First year with TracingInsights data
MAX_YEAR = 2100  # Maximum supported year (future-proofing)

These constants define the valid range for F1 season data. The library supports data from 2018 onwards (when the TracingInsights data collection began) through 2100 (allowing for future scheduled races).

Example Usage:

from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR
from tif1.core_utils.helpers import _validate_year

# Validate user input
year = 2024
_validate_year(year, MIN_YEAR, MAX_YEAR)  # OK

# Check if year is in valid range
if MIN_YEAR <= year <= MAX_YEAR:
    print(f"Year {year} is supported")

Cache Configuration

MAX_CACHE_SIZE = 100  # Maximum number of cached items

Defines the maximum number of items to store in the LRU cache for various operations. This prevents unbounded memory growth while maintaining good cache hit rates for typical usage patterns.

Column Rename Maps

The library defines comprehensive column rename mappings to transform CDN data format to user-facing FastF1-compatible format. These mappings handle both verbose and abbreviated column names from different CDN data versions.

Lap Data Rename Map

LAP_RENAME_MAP = {
    # Core lap data
    "time": "LapTime",
    "lap": "LapNumber",
    "compound": "Compound",
    "stint": "Stint",

    # Sector times
    "s1": "Sector1Time",
    "s2": "Sector2Time",
    "s3": "Sector3Time",

    # Tyre data
    "life": "TyreLife",
    "fresh": "FreshTyre",
    "fresh_tyre": "FreshTyre",

    # Position and status
    "pos": "Position",
    "status": "TrackStatus",
    "pb": "IsPersonalBest",

    # Driver and team
    "drv": "Driver",
    "source_driver": "Driver",
    "driver_number": "DriverNumber",
    "dNum": "DriverNumber",
    "source_team": "Team",
    "team": "Team",

    # Session times
    "session_time": "Time",
    "sesT": "Time",
    "pit_out_time": "PitOutTime",
    "pout": "PitOutTime",
    "pit_in_time": "PitInTime",
    "pin": "PitInTime",

    # Sector session times
    "sector1_session_time": "Sector1SessionTime",
    "s1T": "Sector1SessionTime",
    "sector2_session_time": "Sector2SessionTime",
    "s2T": "Sector2SessionTime",
    "sector3_session_time": "Sector3SessionTime",
    "s3T": "Sector3SessionTime",

    # Speed traps
    "speed_i1": "SpeedI1",
    "vi1": "SpeedI1",
    "speed_i2": "SpeedI2",
    "vi2": "SpeedI2",
    "speed_fl": "SpeedFL",
    "vfl": "SpeedFL",
    "speed_st": "SpeedST",
    "vst": "SpeedST",

    # Lap metadata
    "lap_start_time": "LapStartTime",
    "lST": "LapStartTime",
    "lap_start_date": "LapStartDate",
    "lSD": "LapStartDate",
    "deleted": "Deleted",
    "del": "Deleted",
    "deleted_reason": "DeletedReason",
    "delR": "DeletedReason",
    "fastf1_generated": "FastF1Generated",
    "ff1G": "FastF1Generated",
    "is_accurate": "IsAccurate",
    "iacc": "IsAccurate",

    # Qualifying
    "qs": "QualifyingSession",
    "qualifying_session": "QualifyingSession",

    # Weather (per-lap)
    "weather_time": "WeatherTime",
    "wT": "WeatherTime",
    "air_temp": "AirTemp",
    "wAT": "AirTemp",
    "humidity": "Humidity",
    "wH": "Humidity",
    "pressure": "Pressure",
    "wP": "Pressure",
    "rainfall": "Rainfall",
    "wR": "Rainfall",
    "track_temp": "TrackTemp",
    "wTT": "TrackTemp",
    "wind_direction": "WindDirection",
    "wWD": "WindDirection",
    "wind_speed": "WindSpeed",
    "wWS": "WindSpeed",
}

Example Usage:

from tif1.core_utils.constants import LAP_RENAME_MAP
from tif1.core_utils.helpers import _rename_columns
import pandas as pd

# Raw CDN data with abbreviated names
raw_laps = pd.DataFrame({
    "drv": ["VER", "HAM"],
    "lap": [1, 1],
    "time": [90.5, 91.2],
    "s1": [28.1, 28.5],
    "s2": [35.2, 35.8],
    "s3": [27.2, 27.9],
})

# Rename to FastF1 format
laps = _rename_columns(raw_laps, LAP_RENAME_MAP, "pandas")
print(laps.columns.tolist())
# ['Driver', 'LapNumber', 'LapTime', 'Sector1Time', 'Sector2Time', 'Sector3Time']

Telemetry Rename Map

TELEMETRY_RENAME_MAP = {
    # Core telemetry
    "time": "Time",
    "rpm": "RPM",
    "speed": "Speed",
    "gear": "nGear",
    "throttle": "Throttle",
    "brake": "Brake",
    "drs": "DRS",

    # Distance
    "distance": "Distance",
    "rel_distance": "RelativeDistance",

    # Relative positioning
    "driver_ahead": "DriverAhead",
    "distance_to_driver_ahead": "DistanceToDriverAhead",

    # Acceleration (G-forces)
    "acc_x": "AccelerationX",
    "acc_y": "AccelerationY",
    "acc_z": "AccelerationZ",

    # Position coordinates
    "x": "X",
    "y": "Y",
    "z": "Z",

    # Metadata
    "data_key": "DataKey",
    "dataKey": "DataKey",
}

Example Usage:

from tif1.core_utils.constants import TELEMETRY_RENAME_MAP

# Telemetry columns are automatically renamed during loading
# "speed" → "Speed"
# "gear" → "nGear"
# "throttle" → "Throttle"
# etc.

Race Control Messages Rename Map

RACE_CONTROL_RENAME_MAP = {
    "time": "Time",
    "category": "Category",
    "cat": "Category",
    "message": "Message",
    "msg": "Message",
    "status": "Status",
    "flag": "Flag",
    "scope": "Scope",
    "sector": "Sector",
    "racing_number": "RacingNumber",
    "dNum": "RacingNumber",
    "lap": "Lap",
}

Weather Data Rename Map

WEATHER_RENAME_MAP = {
    "time": "Time",
    "wT": "Time",
    "air_temp": "AirTemp",
    "wAT": "AirTemp",
    "humidity": "Humidity",
    "wH": "Humidity",
    "pressure": "Pressure",
    "wP": "Pressure",
    "rainfall": "Rainfall",
    "wR": "Rainfall",
    "track_temp": "TrackTemp",
    "wTT": "TrackTemp",
    "wind_direction": "WindDirection",
    "wWD": "WindDirection",
    "wind_speed": "WindSpeed",
    "wWS": "WindSpeed",
}

Categorical Columns

CATEGORICAL_COLS = ["Driver", "Team", "Compound", "TrackStatus"]

List of columns that should be converted to categorical dtype for memory optimization. These columns have low cardinality (few unique values) and benefit significantly from categorical encoding.

Memory Savings:

Driver: ~70% reduction (20 unique values in ~1000 laps)
Team: ~75% reduction (10 unique values)
Compound: ~80% reduction (3-5 unique values)
TrackStatus: ~85% reduction (2-4 unique values)

Standard Column Names

COL_DRIVER = "Driver"
COL_TEAM = "Team"
COL_LAP_NUMBER = "LapNumber"
COL_LAP_TIME = "LapTime"
COL_LAP_TIME_SECONDS = "LapTimeSeconds"
COL_LAP_NUMBER_ALT = "lap"  # Alternative name in some data sources

These constants provide a single source of truth for column names, preventing typos and making refactoring easier.

Example Usage:

from tif1.core_utils.constants import COL_DRIVER, COL_LAP_TIME_SECONDS

# Use constants instead of string literals
fastest_lap = laps.loc[laps[COL_LAP_TIME_SECONDS].idxmin()]
driver = fastest_lap[COL_DRIVER]
print(f"Fastest lap by {driver}")

FastF1 Column Order

FASTF1_LAPS_COLUMN_ORDER = [
    # Core identification
    "index",
    "Time",
    "Driver",
    "DriverNumber",

    # Lap timing
    "LapTime",
    "LapNumber",
    "Stint",
    "PitOutTime",
    "PitInTime",

    # Sector times
    "Sector1Time",
    "Sector2Time",
    "Sector3Time",
    "Sector1SessionTime",
    "Sector2SessionTime",
    "Sector3SessionTime",

    # Speed traps
    "SpeedI1",
    "SpeedI2",
    "SpeedFL",
    "SpeedST",

    # Lap metadata
    "IsPersonalBest",
    "Compound",
    "TyreLife",
    "FreshTyre",
    "Team",
    "LapStartTime",
    "LapStartDate",
    "TrackStatus",
    "Position",
    "Deleted",
    "DeletedReason",
    "FastF1Generated",
    "IsAccurate",

    # Weather (per-lap)
    "WeatherTime",
    "AirTemp",
    "Humidity",
    "Pressure",
    "Rainfall",
    "TrackTemp",
    "WindDirection",
    "WindSpeed",

    # tif1-specific columns
    "LapTimeSeconds",
    "QualifyingSession",
]

This list defines the standard column order for laps DataFrames, ensuring FastF1 compatibility. Columns are ordered logically:

Identification columns (index, driver, time)
Core lap data (lap time, number, stint)
Detailed timing (sectors, speed traps)
Metadata (tyres, position, flags)
Weather data
tif1-specific additions

Example Usage:

from tif1.core_utils.helpers import _reorder_laps_columns

# Reorder DataFrame to match FastF1 column order
laps_ordered = _reorder_laps_columns(laps, "pandas")

# Now columns are in the standard order
print(laps_ordered.columns[:10].tolist())
# ['index', 'Time', 'Driver', 'DriverNumber', 'LapTime', ...]
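The reordering logic can be sketched on plain column-name lists. This is an illustration only: the real `_reorder_laps_columns` may treat missing or extra columns differently. The sketch assumes that columns outside the standard order are kept and appended at the end in their original order, and `reorder_columns` is a hypothetical name.

```python
# First few entries of FASTF1_LAPS_COLUMN_ORDER, copied from the listing above
STANDARD_ORDER = ["index", "Time", "Driver", "DriverNumber", "LapTime", "LapNumber"]


def reorder_columns(columns, standard_order):
    """Hypothetical helper: standard columns first, unknown columns appended.

    Assumption: extra columns are kept in their original relative order.
    """
    present = set(columns)
    known = set(standard_order)
    ordered = [name for name in standard_order if name in present]
    extras = [name for name in columns if name not in known]
    return ordered + extras


columns = ["LapTime", "MyCustomColumn", "Driver", "Time"]
print(reorder_columns(columns, STANDARD_ORDER))
# ['Time', 'Driver', 'LapTime', 'MyCustomColumn']
```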

Resource Manager

The resource manager module (resource_manager.py) provides a robust pattern for managing resources with guaranteed cleanup, even when initialization fails partway through. This is critical for preventing resource leaks in error scenarios.

Architecture

The ResourceManager class implements the context manager protocol and tracks resources in a LIFO (Last-In-First-Out) stack. When cleanup occurs, resources are cleaned up in reverse order of creation, ensuring that dependencies are respected.

Key Features:

LIFO cleanup order: Resources cleaned up in reverse order of registration
Error resilience: Cleanup continues even if individual resources fail to close
Comprehensive logging: All cleanup operations are logged for debugging
Multiple cleanup methods: Supports both .close() and .shutdown() methods
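For comparison, the standard library's `contextlib.ExitStack` gives the same LIFO guarantee. The sketch below does not use ResourceManager at all. It only demonstrates the ordering principle with a hypothetical `Tracked` stub class.

```python
from contextlib import ExitStack

closed = []  # Records the order in which stub resources are closed


class Tracked:
    """Hypothetical stub resource that records its own cleanup."""

    def __init__(self, name):
        self.name = name

    def close(self):
        closed.append(self.name)


with ExitStack() as stack:
    # Register in creation order; ExitStack runs callbacks in reverse
    for name in ("database_connection", "connection_pool", "cache"):
        stack.callback(Tracked(name).close)

print(closed)  # ['cache', 'connection_pool', 'database_connection']
```

Closing in reverse order means a resource that depends on an earlier one (for example, a pool built on a connection) is released before the thing it depends on.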

`ResourceManager`

from __future__ import annotations

from typing import Any


class ResourceManager:
    """Base class for resource management with guaranteed cleanup."""

    def __init__(self) -> None:
        """Initialize the resource manager."""
        self._resources: list[tuple[str, Any]] = []
        self._initialized: bool = False

    def _register_resource(self, name: str, resource: Any) -> None:
        """Register a resource for cleanup tracking."""
        ...

    def _cleanup_resources(self) -> None:
        """Cleanup all registered resources in reverse order."""
        ...

    def __enter__(self) -> ResourceManager:
        """Enter context manager."""
        ...

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        """Exit context manager and cleanup resources."""
        ...

Methods

`_register_resource`

def _register_resource(self, name: str, resource: Any) -> None

Register a resource for cleanup tracking. Resources are cleaned up in reverse order of registration (LIFO), ensuring dependencies are respected during cleanup.

Parameters:

name (str): Human-readable name for the resource (used in logging). Should be descriptive for debugging purposes.
resource (Any): The resource object to track. Should have a .close() or .shutdown() method for cleanup.

Example:

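from tif1.core_utils.resource_manager import ResourceManager

# create_connection, create_pool, and create_cache are placeholders
# for your own factory functions.
class DatabaseManager(ResourceManager):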
    def __init__(self, db_url: str):
        super().__init__()
        try:
            # Create connection
            conn = create_connection(db_url)
            self._register_resource("database_connection", conn)

            # Create connection pool
            pool = create_pool(conn)
            self._register_resource("connection_pool", pool)

            # Create cache
            cache = create_cache()
            self._register_resource("cache", cache)

            self._initialized = True
        except Exception:
            # Cleanup any resources created before the error
            self._cleanup_resources()
            raise

`_cleanup_resources`

def _cleanup_resources(self) -> None

Cleanup all registered resources in reverse order (LIFO). This method attempts to clean up all resources even if individual cleanup operations fail. Cleanup errors are logged but do not prevent other resources from being cleaned up.

Cleanup Strategy:

Iterate through resources in reverse order
For each resource, try .close() method first
If no .close(), try .shutdown(wait=True) method
If neither exists, skip the resource
Log any errors but continue cleanup
Clear the resource list after all cleanup attempts

Example:

# Resources registered in this order:
# 1. database_connection
# 2. connection_pool
# 3. cache

# Cleanup happens in reverse order:
# 1. cache.close()
# 2. connection_pool.close()
# 3. database_connection.close()
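The cleanup strategy described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the library's implementation: `cleanup_resources`, `Recorder`, and the logger name are hypothetical, and the real method operates on the manager's internal resource list.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any

logger = logging.getLogger("cleanup_sketch")  # hypothetical logger name


def cleanup_resources(resources: list[tuple[str, Any]]) -> list[str]:
    """Clean up resources in LIFO order; return names whose cleanup raised."""
    failed: list[str] = []
    for name, resource in reversed(resources):
        try:
            if hasattr(resource, "close"):
                resource.close()
            elif hasattr(resource, "shutdown"):
                resource.shutdown(wait=True)
            # Resources with neither method are skipped.
        except Exception as exc:
            # Log and continue so one failure does not block the rest
            logger.warning("Error cleaning up resource %s: %s", name, exc)
            failed.append(name)
    resources.clear()
    return failed


class Recorder:
    """Stub resource that records successful close() calls."""

    closed: list[str] = []

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    def close(self) -> None:
        if self.fail:
            raise RuntimeError("Cleanup failed!")
        Recorder.closed.append(self.name)


resources = [
    ("database", Recorder("database")),
    ("thread_pool", ThreadPoolExecutor(max_workers=1)),  # has shutdown(), no close()
    ("flaky", Recorder("flaky", fail=True)),
    ("cache", Recorder("cache")),
]
failed = cleanup_resources(resources)

print(Recorder.closed)  # ['cache', 'database']
print(failed)           # ['flaky']
print(resources)        # []
```

The thread pool takes the `.shutdown(wait=True)` branch because `ThreadPoolExecutor` has no `.close()` method, and the failing resource is logged without interrupting the remaining cleanup.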

Usage Patterns

Basic Context Manager Usage

from tif1.core_utils.resource_manager import ResourceManager

class SessionManager(ResourceManager):
    def __init__(self, session_id: str):
        super().__init__()
        try:
            # Create HTTP session
            self.http_session = create_http_session()
            self._register_resource("http_session", self.http_session)

            # Create cache connection
            self.cache = create_cache()
            self._register_resource("cache", self.cache)

            self._initialized = True
        except Exception:
            self._cleanup_resources()
            raise

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._cleanup_resources()
        return False

# Use with context manager
with SessionManager("session_123") as manager:
    # Resources are automatically cleaned up on exit
    data = manager.http_session.get("https://example.com")
    manager.cache.set("key", data)
# Cleanup happens here automatically

Error Handling During Initialization

from tif1.core_utils.resource_manager import ResourceManager

class ComplexManager(ResourceManager):
    def __init__(self):
        super().__init__()
        try:
            # Step 1: Create database connection
            self.db = create_database()
            self._register_resource("database", self.db)

            # Step 2: Create thread pool (might fail)
            self.pool = create_thread_pool()
            self._register_resource("thread_pool", self.pool)

            # Step 3: Create cache (might fail)
            self.cache = create_cache()
            self._register_resource("cache", self.cache)

            self._initialized = True
        except Exception:
            # If any step fails, cleanup resources created so far
            # e.g., if cache creation fails, pool and db are still cleaned up
            self._cleanup_resources()
            raise

# If initialization fails at step 2:
# - database is cleaned up
# - thread_pool is NOT registered (creation failed)
# - cache is NOT registered (never reached)
try:
    manager = ComplexManager()
except Exception as e:
    print(f"Initialization failed: {e}")
    # Database was cleaned up automatically

Manual Cleanup

from tif1.core_utils.resource_manager import ResourceManager

class ManualManager(ResourceManager):
    def __init__(self):
        super().__init__()
        self.session = create_session()
        self._register_resource("session", self.session)

    def close(self):
        """Explicit cleanup method."""
        self._cleanup_resources()

# Use without context manager
manager = ManualManager()
try:
    # Do work
    data = manager.session.get("https://example.com")
finally:
    # Explicit cleanup
    manager.close()

Logging

The ResourceManager provides comprehensive logging for debugging:

import logging
logging.basicConfig(level=logging.DEBUG)

with SessionManager("test") as manager:
    pass

# Log output:
# DEBUG:tif1.core_utils.resource_manager:Registered resource: http_session
# DEBUG:tif1.core_utils.resource_manager:Registered resource: cache
# DEBUG:tif1.core_utils.resource_manager:Closing resource: cache
# DEBUG:tif1.core_utils.resource_manager:Closing resource: http_session
# DEBUG:tif1.core_utils.resource_manager:All resources cleaned up successfully

Error Resilience

If cleanup fails for one resource, other resources are still cleaned up:

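from tif1.core_utils.resource_manager import ResourceManager

class GoodResource:
    def close(self):
        pass  # Closes without error

class FailingResource:
    def close(self):
        raise RuntimeError("Cleanup failed!")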

class ResilientManager(ResourceManager):
    def __init__(self):
        super().__init__()
        self._register_resource("good_resource", GoodResource())
        self._register_resource("failing_resource", FailingResource())
        self._register_resource("another_good_resource", GoodResource())

with ResilientManager() as manager:
    pass

# Log output:
# DEBUG: Closing resource: another_good_resource  ✓
# WARNING: Error cleaning up resource failing_resource: Cleanup failed!
# DEBUG: Closing resource: good_resource  ✓
# WARNING: Cleanup completed with errors: failing_resource: Cleanup failed!

ResourceManager is used internally by the library for managing HTTP sessions, cache connections, and other resources. Most users don’t need to interact with it directly, but it’s available for advanced use cases requiring custom resource management.

Performance Considerations

JSON Parsing

The library uses orjson for JSON parsing, which provides:

2-3x faster parsing than stdlib json
Lower memory usage
Native support for bytes input
Native serialization of numpy types (in orjson this requires the OPT_SERIALIZE_NUMPY option)
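The speedup depends on payload shape and hardware, so it is worth measuring on your own data. The sketch below times `json.loads` against `orjson.loads` directly rather than through tif1's wrappers, and skips the orjson measurement if the package is not installed. The synthetic payload is an assumption and does not reflect the real CDN format.

```python
import json
import timeit

try:
    import orjson
except ImportError:  # orjson is optional for this sketch
    orjson = None

# Synthetic payload: 10,000 lap records encoded as bytes
payload = json.dumps(
    {"laps": [{"lap": i, "time": 90.0 + i * 0.01} for i in range(10_000)]}
).encode()

# Time 20 full parses with the standard library
stdlib_seconds = timeit.timeit(lambda: json.loads(payload), number=20)
print(f"json.loads:   {stdlib_seconds:.3f}s")

if orjson is not None:
    orjson_seconds = timeit.timeit(lambda: orjson.loads(payload), number=20)
    print(f"orjson.loads: {orjson_seconds:.3f}s")
    print(f"speedup:      {stdlib_seconds / orjson_seconds:.1f}x")
```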

Lib Conversion

When converting between backends:

pandas → polars: Uses PyArrow for zero-copy when possible
polars → pandas: Uses PyArrow by default for efficiency
Rechunking: Optional for polars to optimize memory layout

Benchmark results:

# pandas → polars conversion
# 1M rows: ~50ms (zero-copy via PyArrow)

# polars → pandas conversion
# 1M rows: ~100ms (with PyArrow)
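Conversion times vary with column dtypes and hardware, so treat the figures above as rough guidance. A minimal timing sketch follows. It calls `pl.from_pandas` and `DataFrame.to_pandas` from polars directly instead of tif1's conversion helpers, and returns `None` when pandas, polars, or pyarrow is not installed. The `time_roundtrip` name and the two-column test frame are assumptions.

```python
import time

try:
    import pandas as pd
    import polars as pl
    import pyarrow  # noqa: F401  (used by polars for pandas interop)
    HAVE_BACKENDS = True
except ImportError:
    HAVE_BACKENDS = False


def time_roundtrip(n_rows=1_000_000):
    """Return (pandas->polars, polars->pandas) seconds, or None if libs are missing."""
    if not HAVE_BACKENDS:
        return None
    df_pd = pd.DataFrame(
        {"LapNumber": range(n_rows), "LapTimeSeconds": [90.0] * n_rows}
    )

    start = time.perf_counter()
    df_pl = pl.from_pandas(df_pd)
    to_polars = time.perf_counter() - start

    start = time.perf_counter()
    df_back = df_pl.to_pandas()
    to_pandas = time.perf_counter() - start

    assert len(df_back) == n_rows  # Sanity check: no rows lost in the round trip
    return to_polars, to_pandas


result = time_roundtrip(100_000)
print(result)
```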

Advanced Usage

Custom lib conversion

import polars as pl

import tif1
from tif1.core_utils.backend_conversion import convert_backend

# Load 2021 Belgian Grand Prix Race with pandas
session = tif1.get_session(2021, "Belgian Grand Prix", "Race", lib="pandas")
laps_pandas = session.laps

# Convert to polars for analysis
laps_polars = convert_backend(laps_pandas, "polars")

# Perform polars operations: keep laps faster than 120 seconds
fast_laps = laps_polars.filter(pl.col("LapTimeSeconds") < 120.0)

# Convert back to pandas if needed
fast_laps_pandas = convert_backend(fast_laps, "pandas")

Custom JSON Processing

from tif1.core_utils.json_utils import json_loads, json_dumps

# Parse JSON from CDN
json_data = '{"laps": [{"lap": 1, "time": 90.5}]}'
data = json_loads(json_data)

# Modify data
data["laps"][0]["time"] = 89.5

# Serialize back
modified_json = json_dumps(data)

Best Practices

Use orjson for JSON: Always use json_loads/json_dumps for performance
Prefer PyArrow conversion: Keep use_pyarrow=True for lib conversion
Validate early: Use validation helpers to catch errors early
Let the library handle resources: tif1 runs ResourceManager cleanup internally, so you rarely need to call it yourself
Use constants for column names: Reference standard column names from constants

Summary

The core_utils package provides:

High-performance JSON parsing with orjson
Efficient lib conversion (pandas ↔ polars)
Data validation utilities
Column name standardization
Resource management
Internal helpers for DataFrame operations

These utilities underpin the library’s performance and reliability.
