The core_utils package provides the foundational utilities that power tif1’s high-performance data processing pipeline. While these are primarily internal APIs, they expose powerful capabilities for advanced users who need fine-grained control over data transformations, backend conversions, and performance optimization.
This module is the performance engine of tif1, implementing zero-copy operations, optimized JSON parsing, and intelligent DataFrame transformations that enable the library to process millions of rows of telemetry data with minimal overhead.
Overview
The core utilities package (tif1.core_utils) is organized into five specialized modules, each addressing a critical aspect of the library’s performance and functionality:
- Backend Conversion (backend_conversion.py) - Zero-copy DataFrame conversion between pandas and polars using Apache Arrow
- JSON Utilities (json_utils.py) - High-performance JSON parsing with orjson, providing 2-3x faster deserialization
- Helper Functions (helpers.py) - DataFrame manipulation, validation, and data transformation utilities
- Constants (constants.py) - Column name mappings, rename dictionaries, and configuration constants
- Resource Manager (resource_manager.py) - Context manager for guaranteed resource cleanup with LIFO ordering
These utilities enable tif1 to achieve its performance goals while maintaining compatibility with both pandas and polars backends. The design philosophy emphasizes:
- Zero-copy operations wherever possible to minimize memory overhead
- Lazy evaluation and deferred imports to reduce startup time
- Type safety with comprehensive validation and error handling
- Performance-first design with optimized hot paths
- Backend agnostic APIs that work seamlessly with pandas or polars
Module Import Structure
# Direct imports from core_utils
from tif1.core_utils.backend_conversion import (
pandas_to_polars,
polars_to_pandas,
convert_backend,
POLARS_AVAILABLE # Boolean flag for polars availability
)
from tif1.core_utils.json_utils import (
json_loads,
json_dumps,
parse_response_json # HTTP response JSON parser
)
from tif1.core_utils.helpers import (
DataFrame, # Type alias for pd.DataFrame | pl.DataFrame
_validate_year,
_validate_drivers_list,
_validate_lap_number,
_validate_string_param,
_encode_url_component,
_is_empty_df,
_create_empty_df,
_filter_valid_laptimes,
_rename_columns,
_apply_categorical,
_create_telemetry_df,
_reorder_laps_columns,
_normalize_row_iteration,
_ensure_polars_available
)
from tif1.core_utils.constants import (
MIN_YEAR, MAX_YEAR,
LAP_RENAME_MAP,
TELEMETRY_RENAME_MAP,
RACE_CONTROL_RENAME_MAP,
WEATHER_RENAME_MAP,
CATEGORICAL_COLS,
FASTF1_LAPS_COLUMN_ORDER,
COL_DRIVER, COL_TEAM, COL_LAP_NUMBER,
COL_LAP_TIME, COL_LAP_TIME_SECONDS
)
from tif1.core_utils.resource_manager import ResourceManager
The core_utils module is designed with performance as the primary concern. Every function is optimized for:
- Minimal memory allocations: Reuse buffers, avoid unnecessary copies
- Cache-friendly access patterns: Sequential reads, predictable branches
- SIMD-friendly operations: Leverage NumPy/Arrow vectorization
- Lazy computation: Defer work until absolutely necessary
- Zero-copy semantics: Share memory between backends via Arrow
Real-world impact: These optimizations enable tif1 to load and process a full race weekend (3 practice sessions, qualifying, race) with telemetry for all 20 drivers in under 5 seconds on modern hardware.
Backend Conversion
The backend conversion module provides zero-copy DataFrame transformations between pandas and polars using Apache Arrow as the interchange format. This enables seamless switching between backends without the performance penalty of traditional serialization/deserialization.
Architecture
The conversion system leverages:
- Apache Arrow as the zero-copy interchange format
- PyArrow extension arrays for pandas to maintain Arrow memory layout
- Lazy polars imports to avoid dependency requirements when not needed
- Automatic fallback to standard conversion if zero-copy fails
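The lazy-import idea in the list above can be sketched with the standard library alone. This is a minimal illustration, not tif1's actual code: it assumes an availability flag is computed without importing the package and that the import itself is deferred to first use. The names `module_available`, `lazy_import`, and `POLARS_AVAILABLE_SKETCH` are hypothetical.

```python
import importlib
import importlib.util
from types import ModuleType


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def lazy_import(name: str) -> ModuleType:
    """Import `name` only when called, raising a clear ImportError if it is missing."""
    if not module_available(name):
        raise ImportError(f"{name} is not installed")
    return importlib.import_module(name)


# Hypothetical flag, similar in spirit to POLARS_AVAILABLE (an assumption).
POLARS_AVAILABLE_SKETCH = module_available("polars")
```

With this shape, code paths that never touch polars pay no import cost, and a missing dependency surfaces as an ImportError only when a polars conversion is actually requested.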
pandas_to_polars
def pandas_to_polars(
df: pd.DataFrame,
*,
rechunk: bool = False
) -> pl.DataFrame
Convert a pandas DataFrame to polars using zero-copy Arrow conversion. This function uses pl.from_pandas() with Arrow as the interchange format, avoiding memory copies when possible.
Parameters:
df (pd.DataFrame): pandas DataFrame to convert. Can contain any pandas-supported dtype including nullable types, categoricals, and datetime types.
rechunk (bool, optional): Whether to rechunk the resulting polars DataFrame for optimal memory layout. Default is False to preserve zero-copy semantics. Set to True if you plan to perform many operations on the polars DataFrame and want contiguous memory.
Returns:
pl.DataFrame: polars DataFrame with equivalent data and schema
Raises:
ImportError: If polars is not installed in the environment
ValueError: If the conversion fails due to incompatible data types or corrupted data
Performance Characteristics:
- Time complexity: O(1) when buffers can be shared without copying (rechunk=False); O(n) when rechunking or when the fallback conversion has to copy data
- Memory overhead: Minimal when zero-copy succeeds (memory is shared with the source DataFrame via Arrow)
- Typical performance: ~50ms for 1M rows with zero-copy
Example - Basic Conversion:
from tif1.core_utils.backend_conversion import pandas_to_polars
import pandas as pd
# Create a pandas DataFrame with various dtypes
df_pandas = pd.DataFrame({
"driver": ["VER", "HAM", "LEC"],
"lap_time": [90.123, 90.456, 90.789],
"position": [1, 2, 3],
"compound": pd.Categorical(["SOFT", "MEDIUM", "SOFT"])
})
# Zero-copy conversion to polars
df_polars = pandas_to_polars(df_pandas)
print(type(df_polars)) # <class 'polars.dataframe.frame.DataFrame'>
Example - With Rechunking:
import polars as pl
# Convert with rechunking for optimal polars performance
df_polars = pandas_to_polars(df_pandas, rechunk=True)
# Now polars operations will be faster due to contiguous memory
fast_laps = df_polars.filter(pl.col("lap_time") < 90.5)
Example - Handling Large DataFrames:
import time
import numpy as np
import pandas as pd
from tif1.core_utils.backend_conversion import pandas_to_polars
# Large DataFrame (10M rows)
df_large = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=10_000_000, freq="ms"),
    "speed": np.random.uniform(200, 350, 10_000_000),
    "throttle": np.random.uniform(0, 100, 10_000_000)
})
# Time the conversion; perf_counter is the appropriate clock for short intervals
start = time.perf_counter()
df_polars = pandas_to_polars(df_large)
print(f"Converted 10M rows in {time.perf_counter() - start:.3f}s") # e.g. ~0.1s, varies by machine
Use rechunk=False (default) when you need to convert data quickly and will only perform a few operations. Use rechunk=True when you plan to perform many polars operations and want optimal performance.
polars_to_pandas
def polars_to_pandas(
df: pl.DataFrame,
*,
use_pyarrow: bool = True
) -> pd.DataFrame
Convert a polars DataFrame to pandas using zero-copy Arrow conversion. This function uses df.to_pandas() with PyArrow extension arrays to maintain Arrow memory layout in pandas, enabling zero-copy semantics.
Parameters:
df (pl.DataFrame): polars DataFrame to convert. Supports all polars data types including nested types (List, Struct), categoricals, and temporal types.
use_pyarrow (bool, optional): Whether to use PyArrow extension arrays in the resulting pandas DataFrame. Default is True for zero-copy conversion. Set to False to convert to native pandas dtypes (slower but more compatible with legacy pandas code).
Returns:
pd.DataFrame: pandas DataFrame with equivalent data and schema
Raises:
ImportError: If polars is not installed in the environment
ValueError: If the conversion fails due to incompatible data types
Performance Characteristics:
- Time complexity: O(1) for zero-copy (when use_pyarrow=True), O(n) for native pandas dtypes
- Memory overhead: Minimal with PyArrow arrays, 2x memory usage with native dtypes
- Typical performance: ~100ms for 1M rows with PyArrow
Example - Basic Conversion:
from tif1.core_utils.backend_conversion import polars_to_pandas
import polars as pl
# Create a polars DataFrame
df_polars = pl.DataFrame({
"driver": ["VER", "HAM", "LEC"],
"lap_time": [90.123, 90.456, 90.789],
"position": [1, 2, 3]
})
# Zero-copy conversion to pandas with PyArrow arrays
df_pandas = polars_to_pandas(df_polars)
print(df_pandas.dtypes)
# driver string[pyarrow]
# lap_time double[pyarrow]
# position int64[pyarrow]
Example - Native Pandas Dtypes:
# Convert to native pandas dtypes (slower, more memory)
df_pandas_native = polars_to_pandas(df_polars, use_pyarrow=False)
print(df_pandas_native.dtypes)
# driver object
# lap_time float64
# position int64
Example - Handling Nested Types:
import polars as pl
from tif1.core_utils.backend_conversion import polars_to_pandas
# Polars DataFrame with nested types
df_nested = pl.DataFrame({
"driver": ["VER", "HAM"],
"sector_times": [[28.1, 35.2, 26.8], [28.3, 35.5, 27.0]],
"metadata": [{"team": "Red Bull", "car": 1}, {"team": "Mercedes", "car": 44}]
})
# PyArrow handles nested types seamlessly
df_pandas = polars_to_pandas(df_nested)
print(df_pandas["sector_times"].iloc[0]) # [28.1, 35.2, 26.8]
When use_pyarrow=True, the resulting pandas DataFrame uses PyArrow extension arrays. Some legacy pandas operations may not support these arrays. If you encounter compatibility issues, set use_pyarrow=False to use native pandas dtypes.
convert_backend
def convert_backend(
df: DataFrame,
target_backend: str
) -> DataFrame
Intelligently convert a DataFrame to the target backend (pandas or polars) using zero-copy conversion when possible. This is the high-level API that automatically detects the source backend and performs the appropriate conversion.
Parameters:
df (DataFrame): DataFrame to convert. Can be either pd.DataFrame or pl.DataFrame.
target_backend (str): Target backend name. Must be either "pandas" or "polars" (case-sensitive).
Returns:
DataFrame: DataFrame in the target backend format. If already in the target backend, returns the input DataFrame unchanged (no-op).
Raises:
ValueError: If target_backend is not “pandas” or “polars”, or if the conversion fails
ImportError: If polars is not installed and target is “polars”
Performance Characteristics:
- No-op detection: O(1) type check to avoid unnecessary conversions
- Conversion time: Same as pandas_to_polars or polars_to_pandas
- Memory efficient: Uses zero-copy conversion internally
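The no-op detection and dispatch described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's implementation: the backend is inferred from the top-level package of the object's class, and the actual converters are passed in as a dictionary so the sketch runs without pandas or polars installed. `detect_backend`, `convert_backend_sketch`, and the `converters` parameter are hypothetical names.

```python
from typing import Any, Callable, Dict, Tuple

VALID_BACKENDS = ("pandas", "polars")


def detect_backend(df: Any) -> str:
    """Infer the backend from the top-level package that defines the object's class."""
    return type(df).__module__.split(".")[0]


def convert_backend_sketch(
    df: Any,
    target_backend: str,
    converters: Dict[Tuple[str, str], Callable[[Any], Any]],
) -> Any:
    """Validate the target, return `df` unchanged on a no-op, otherwise dispatch."""
    if target_backend not in VALID_BACKENDS:
        raise ValueError(
            f"Invalid target_backend: {target_backend}. Must be 'pandas' or 'polars'"
        )
    source = detect_backend(df)
    if source == target_backend:
        return df  # no-op: the same object is returned, nothing is copied
    # Hypothetical lookup keyed by (source, target); a real implementation
    # would call pandas_to_polars or polars_to_pandas here.
    return converters[(source, target_backend)](df)
```

Checking the target name before inspecting the DataFrame means an invalid backend string fails fast, regardless of what was passed as `df`.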
Example - Basic Usage:
from tif1.core_utils.backend_conversion import convert_backend
import pandas as pd
# Start with pandas
df = pd.DataFrame({"driver": ["VER", "HAM"], "position": [1, 2]})
# Convert to polars
df_polars = convert_backend(df, "polars")
print(type(df_polars)) # <class 'polars.dataframe.frame.DataFrame'>
# Convert back to pandas
df_pandas = convert_backend(df_polars, "pandas")
print(type(df_pandas)) # <class 'pandas.core.frame.DataFrame'>
# No-op: already pandas
df_same = convert_backend(df_pandas, "pandas")
assert df_same is df_pandas # Same object, no conversion
Example - Integration with tif1:
import tif1
from tif1.core_utils.backend_conversion import convert_backend
# Load session with pandas backend
session = tif1.get_session(2024, "Monaco", "Race", backend="pandas")
laps_pandas = session.laps
# Convert to polars for faster filtering
laps_polars = convert_backend(laps_pandas, "polars")
# Perform polars operations
import polars as pl
verstappen_laps = laps_polars.filter(pl.col("Driver") == "VER")
fast_laps = verstappen_laps.filter(pl.col("LapTimeSeconds") < 75.0)
# Convert back to pandas for plotting
fast_laps_pandas = convert_backend(fast_laps, "pandas")
Example - Error Handling:
from tif1.core_utils.backend_conversion import convert_backend
# df is any existing pandas or polars DataFrame
try:
    df_converted = convert_backend(df, "spark") # Invalid backend
except ValueError as e:
    print(e) # "Invalid target_backend: spark. Must be 'pandas' or 'polars'"
try:
    df_polars = convert_backend(df, "polars") # polars not installed
except ImportError as e:
    print(e) # "polars is not installed"
Use convert_backend as the primary API for backend conversion. It handles edge cases, performs no-op detection, and provides consistent error messages.
JSON Utilities
The JSON utilities module provides high-performance JSON parsing and serialization using orjson, a fast, correct JSON library for Python written in Rust. This module is critical for tif1’s performance as it handles parsing of large JSON payloads from the CDN containing lap data, telemetry, and race control messages.
Why orjson?
orjson provides significant advantages over Python’s standard library json module:
- 2-3x faster parsing: Rust-based implementation with SIMD optimizations
- Lower memory usage: Efficient memory allocation and reuse
- Native bytes support: Parse directly from HTTP response bodies without decoding
- NumPy integration: Native serialization of NumPy arrays and scalars (enabled through orjson's OPT_SERIALIZE_NUMPY option)
- Strict correctness: Validates JSON spec compliance
Benchmark comparison (parsing 10MB JSON payload):
stdlib json.loads(): 450ms
orjson.loads(): 180ms (2.5x faster)
Architecture
The JSON utilities module implements a fallback strategy:
- Primary: Use orjson for maximum performance
- Fallback: Use stdlib json if orjson fails (rare edge cases)
- Automatic: No user configuration required
This ensures reliability while maintaining performance for the common case.
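A minimal sketch of this primary/fallback strategy is shown below. It is not the actual json_loads implementation: it assumes orjson is tried first when it is importable and that any orjson parse error falls through to the standard library. The name `loads_with_fallback` is hypothetical.

```python
import json
from typing import Any, Union

try:
    import orjson  # optional accelerated codec
except ImportError:  # keep the sketch runnable without orjson
    orjson = None

Payload = Union[str, bytes, bytearray, memoryview]


def loads_with_fallback(payload: Payload) -> Any:
    """Parse JSON with orjson when available, falling back to stdlib json."""
    if isinstance(payload, memoryview):
        payload = payload.tobytes()  # stdlib json does not accept memoryview
    if orjson is not None:
        try:
            return orjson.loads(payload)
        except ValueError:
            pass  # orjson.JSONDecodeError subclasses ValueError; try stdlib next
    # Raises json.JSONDecodeError (also a ValueError) if the payload is invalid
    return json.loads(payload)
```

Because both parsers raise subclasses of ValueError, callers of a function shaped like this can catch a single exception type regardless of which codec handled the payload.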
json_loads
def json_loads(
payload: str | bytes | bytearray | memoryview
) -> Any
Deserialize JSON payload to Python object using accelerated orjson codec with automatic fallback to stdlib json.
Parameters:
payload (str | bytes | bytearray | memoryview): JSON data to parse. Accepts multiple input types for flexibility:
str: Standard JSON string (UTF-8 encoded)
bytes: Raw bytes from HTTP responses (most efficient)
bytearray: Mutable byte array
memoryview: Zero-copy view of bytes (converted to bytes internally)
Returns:
Any: Parsed Python object. Common return types:
dict: JSON objects {}
list: JSON arrays []
str, int, float, bool, None: JSON primitives
Raises:
json.JSONDecodeError: If the payload is not valid JSON (from fallback parser)
ValueError: If the payload is malformed (from orjson)
Performance Characteristics:
- Time complexity: O(n) where n is the payload size
- Memory overhead: Minimal (orjson uses efficient allocation)
- Typical performance:
- Small payloads (<1KB): ~10μs
- Medium payloads (100KB): ~2ms
- Large payloads (10MB): ~180ms
Example - Basic Usage:
from tif1.core_utils.json_utils import json_loads
# Parse JSON string
json_str = '{"driver": "VER", "team": "Red Bull Racing", "position": 1}'
data = json_loads(json_str)
print(data["driver"]) # "VER"
print(data["position"]) # 1
# Parse JSON bytes (common from HTTP responses)
json_bytes = b'{"lap": 1, "time": 90.123}'
lap_data = json_loads(json_bytes)
print(lap_data["time"]) # 90.123
Example - Parsing CDN Response:
import niquests
from tif1.core_utils.json_utils import json_loads
# Fetch lap data from CDN
response = niquests.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")
# Parse directly from response bytes (no decode step)
lap_data = json_loads(response.content)
# Process lap data
for lap in lap_data["laps"]:
    print(f"Lap {lap['lap']}: {lap['time']}s")
Example - Handling Complex Nested Data:
from tif1.core_utils.json_utils import json_loads
# Complex nested JSON structure
complex_json = '''
{
"session": {
"year": 2024,
"gp": "Monaco",
"type": "Race"
},
"laps": [
{
"driver": "VER",
"lap": 1,
"sectors": [28.1, 35.2, 26.8],
"telemetry": {
"speed": [250, 280, 310],
"throttle": [100, 100, 100]
}
}
]
}
'''
data = json_loads(complex_json)
print(data["session"]["gp"]) # "Monaco"
print(data["laps"][0]["sectors"]) # [28.1, 35.2, 26.8]
print(data["laps"][0]["telemetry"]["speed"]) # [250, 280, 310]
Example - Performance Comparison:
import json
import time
from tif1.core_utils.json_utils import json_loads
# Large JSON payload (roughly 8 MB of serialized integers)
large_json = json.dumps({"data": list(range(1_000_000))})
# Benchmark stdlib json
start = time.perf_counter()
data1 = json.loads(large_json)
stdlib_time = time.perf_counter() - start
# Benchmark orjson via json_loads
start = time.perf_counter()
data2 = json_loads(large_json)
orjson_time = time.perf_counter() - start
print(f"stdlib json: {stdlib_time:.3f}s")
print(f"orjson: {orjson_time:.3f}s")
print(f"Speedup: {stdlib_time / orjson_time:.2f}x")
# Example output (timings vary by machine and payload):
# stdlib json: 0.450s
# orjson: 0.180s
# Speedup: 2.50x
When fetching JSON from HTTP endpoints, pass response.content (bytes) directly to json_loads instead of response.text (str). This avoids an unnecessary UTF-8 decode step and is faster.
json_dumps
def json_dumps(data: Any) -> str
Serialize Python object to JSON string using accelerated orjson codec with automatic fallback to stdlib json.
Parameters:
data (Any): Python object to serialize. Supported types:
dict, list: Collections
str, int, float, bool, None: Primitives
datetime, date, time: Temporal types (ISO 8601 format)
UUID: Converted to string
numpy types: Automatically converted to Python equivalents
dataclasses, pydantic models: Serialized to dict
Returns:
str: JSON string representation of the data
Raises:
TypeError: If the data contains non-serializable types (e.g., custom classes without __dict__)
Performance Characteristics:
- Time complexity: O(n) where n is the data size
- Memory overhead: Minimal (efficient string building)
- Typical performance:
- Small objects (<1KB): ~5μs
- Medium objects (100KB): ~1ms
- Large objects (10MB): ~100ms
Example - Basic Usage:
from tif1.core_utils.json_utils import json_dumps
# Serialize dict
data = {
"driver": "VER",
"team": "Red Bull Racing",
"position": 1,
"lap_time": 90.123
}
json_str = json_dumps(data)
print(json_str)
# {"driver":"VER","team":"Red Bull Racing","position":1,"lap_time":90.123}
Example - Serializing NumPy Types:
import numpy as np
from tif1.core_utils.json_utils import json_dumps
# NumPy arrays and types are automatically handled
data = {
"speeds": np.array([250, 280, 310]),
"avg_speed": np.float64(280.5),
"lap_count": np.int32(58)
}
json_str = json_dumps(data)
print(json_str)
# {"speeds":[250,280,310],"avg_speed":280.5,"lap_count":58}
Example - Serializing Datetime Objects:
from datetime import datetime, date
from tif1.core_utils.json_utils import json_dumps
data = {
"session_date": date(2024, 5, 26),
"session_start": datetime(2024, 5, 26, 15, 0, 0),
"driver": "VER"
}
json_str = json_dumps(data)
print(json_str)
# {"session_date":"2024-05-26","session_start":"2024-05-26T15:00:00","driver":"VER"}
Example - Round-trip Serialization:
from tif1.core_utils.json_utils import json_loads, json_dumps
# Original data
original = {
"drivers": ["VER", "HAM", "LEC"],
"lap_times": [90.1, 90.3, 90.5],
"metadata": {"session": "Race", "year": 2024}
}
# Serialize to JSON
json_str = json_dumps(original)
# Deserialize back to Python
restored = json_loads(json_str)
# Verify round-trip
assert original == restored
print("Round-trip successful!")
json_dumps uses orjson for 2-3x faster serialization than stdlib json. It automatically handles NumPy types, datetime objects, and other common Python types without requiring custom encoders.
parse_response_json
def parse_response_json(response: Any) -> Any
Decode an HTTP response body to Python object, preferring raw-byte parsing when available for maximum performance. This function is optimized for parsing JSON from HTTP responses (niquests, requests, httpx, etc.).
Parameters:
response (Any): HTTP response object. Should have either:
.content attribute (bytes): Preferred for performance
.json() method: Fallback for compatibility
Returns:
Any: Parsed Python object from the response JSON body
Raises:
json.JSONDecodeError: If the response body is not valid JSON
AttributeError: If the response object has neither .content nor .json()
Performance Characteristics:
- Optimized path: Parse from .content bytes using orjson (~180ms for 10MB)
- Fallback path: Use .json() method (~450ms for 10MB)
- Speedup: 2-3x faster than calling .json() directly
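The two paths above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses stdlib json in place of the accelerated codec so that it runs anywhere, and `parse_response_sketch` is a hypothetical name.

```python
import json
from typing import Any


def parse_response_sketch(response: Any) -> Any:
    """Prefer parsing raw bytes from .content; otherwise defer to .json()."""
    content = getattr(response, "content", None)
    if isinstance(content, (bytes, bytearray)):
        # Optimized path: parse bytes directly, with no intermediate str decode.
        # The real function would use the accelerated codec here (assumption).
        return json.loads(content)
    # Fallback path: raises AttributeError if the object has no .json() either
    return response.json()
```

Duck-typing on `.content` rather than checking for a specific client class is what lets a single function serve niquests, requests, and httpx responses alike.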
Example - Parsing niquests Response:
import niquests
from tif1.core_utils.json_utils import parse_response_json
# Fetch data from CDN
response = niquests.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")
# Parse JSON efficiently (uses response.content bytes)
data = parse_response_json(response)
# Process data
print(f"Loaded {len(data['laps'])} laps")
Example - Comparing Performance:
import niquests
import time
from tif1.core_utils.json_utils import parse_response_json
url = "https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/..."
response = niquests.get(url)
# Method 1: Standard .json() method
start = time.perf_counter()
data1 = response.json()
method1_time = time.perf_counter() - start
# Method 2: Optimized parse_response_json
start = time.perf_counter()
data2 = parse_response_json(response)
method2_time = time.perf_counter() - start
print(f"response.json(): {method1_time:.3f}s")
print(f"parse_response_json(): {method2_time:.3f}s")
print(f"Speedup: {method1_time / method2_time:.2f}x")
# Example output (timings vary by machine and payload size):
# response.json(): 0.045s
# parse_response_json(): 0.018s
# Speedup: 2.50x
Example - Integration with tif1 HTTP Session:
from tif1.http_session import HTTPSession
from tif1.core_utils.json_utils import parse_response_json
# Create HTTP session
session = HTTPSession()
# Fetch lap data
response = session.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/...")
# Parse efficiently
lap_data = parse_response_json(response)
# Process laps
for lap in lap_data["laps"]:
    print(f"Driver {lap['driver']}: Lap {lap['lap']} - {lap['time']}s")
Always use parse_response_json() instead of response.json() when working with HTTP responses in performance-critical code. The 2-3x speedup adds up quickly when fetching data for multiple sessions or drivers.
JSON Utilities Best Practices
- Use bytes when possible: Pass response.content to json_loads instead of response.text
- Prefer parse_response_json: Use parse_response_json() for HTTP responses
- Trust the fallback: The automatic fallback to stdlib json ensures reliability
- Benchmark your use case: Profile your specific JSON payloads to measure impact
- Handle errors gracefully: Catch json.JSONDecodeError for malformed JSON
Common Pitfalls:
- ❌ json.loads(response.text) - Slow (decode + parse)
- ❌ response.json() - Slower (uses stdlib json)
- ✅ json_loads(response.content) - Fast (direct bytes parse)
- ✅ parse_response_json(response) - Fastest (optimized path)
Helper Functions
The helpers module (helpers.py) provides a comprehensive suite of utility functions for DataFrame manipulation, validation, and data transformation. These functions are the workhorses of tif1’s data processing pipeline, handling everything from input validation to complex DataFrame operations across both pandas and polars backends.
Design Philosophy
The helper functions follow these core principles:
- Backend Agnostic: All functions work seamlessly with both pandas and polars DataFrames
- Zero-Copy Optimization: Minimize memory allocations and avoid unnecessary data copies
- Type Safety: Comprehensive validation with clear error messages
- Performance First: Optimized hot paths for common operations
- Defensive Programming: Handle edge cases gracefully with fallback strategies
Validation Functions
Input validation is critical for data integrity and user experience. The validation helpers provide comprehensive checks with informative error messages.
_validate_year
def _validate_year(year: int, min_year: int, max_year: int) -> None
Validate that a year is within the supported range for F1 data. This function is called before any data fetching operations to ensure the requested year has available data.
Parameters:
year (int): Year to validate. Must be an integer representing a calendar year.
min_year (int): Minimum supported year (inclusive). Typically 2018 for tif1.
max_year (int): Maximum supported year (inclusive). Typically the current year + 1 for future scheduled races.
Raises:
ValueError: If year is outside the range [min_year, max_year]. Error message includes the valid range and the invalid value provided.
Performance:
- Time complexity: O(1) - simple integer comparison
- Typical execution: <1μs
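For illustration, an inclusive range check with the error format shown in this section could be written as below. This is a sketch, not the library's source; `validate_year_sketch` is a hypothetical name.

```python
def validate_year_sketch(year: int, min_year: int, max_year: int) -> None:
    """Raise ValueError unless min_year <= year <= max_year (both bounds inclusive)."""
    if not min_year <= year <= max_year:
        raise ValueError(
            f"Year must be between {min_year} and {max_year}, got {year}"
        )
```

Passing the bounds as arguments, rather than reading module constants inside the function, keeps the check easy to test with arbitrary ranges.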
Example - Basic Validation:
from tif1.core_utils.helpers import _validate_year
from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR
# Valid year
_validate_year(2024, MIN_YEAR, MAX_YEAR) # No error
# Invalid year (too old)
try:
    _validate_year(2010, MIN_YEAR, MAX_YEAR)
except ValueError as e:
    print(e) # "Year must be between 2018 and 2100, got 2010"
# Invalid year (future)
try:
    _validate_year(2150, MIN_YEAR, MAX_YEAR)
except ValueError as e:
    print(e) # "Year must be between 2018 and 2100, got 2150"
Example - Integration with Session Loading:
from tif1.core_utils.helpers import _validate_year
from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR
def load_session(year: int, gp: str, session_type: str):
    """Load session with year validation."""
    # Validate year before expensive network operations
    _validate_year(year, MIN_YEAR, MAX_YEAR)
    # Proceed with data loading...
    # (validation ensures we don't waste time on invalid requests)
This validation is performed early in the data loading pipeline to fail fast and provide clear feedback before any network requests are made.
_validate_drivers_list
def _validate_drivers_list(drivers: list[str] | None) -> None
Validate that a drivers list parameter is well-formed and contains valid driver codes. This function ensures that driver filtering operations receive properly formatted input.
Parameters:
drivers (list[str] | None): List of driver codes to validate. Each code should be a non-empty string (typically 3-letter abbreviations like “VER”, “HAM”, “LEC”). Can be None to indicate no filtering.
Raises:
TypeError: If drivers is not a list or None. Error message includes the actual type received.
ValueError: If the list is empty, or if any element is not a non-empty string. Error messages are specific to the validation failure.
Validation Rules:
- Must be a list type (not tuple, set, or other iterable)
- Cannot be an empty list (use None instead to indicate “all drivers”)
- All elements must be strings
- All strings must be non-empty (no empty strings or whitespace-only strings)
Performance:
- Time complexity: O(n) where n is the number of drivers
- Typical execution: <10μs for 20 drivers
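The validation rules above can be expressed compactly. The following is a hedged sketch that mirrors the documented rules and error messages; it is not taken from helpers.py, and `validate_drivers_list_sketch` is a hypothetical name.

```python
from typing import List, Optional


def validate_drivers_list_sketch(drivers: Optional[List[str]]) -> None:
    """Apply the documented rules in order: None, type, emptiness, element checks."""
    if drivers is None:
        return  # None means "no filtering"
    if not isinstance(drivers, list):
        raise TypeError(f"drivers must be a list, got {type(drivers).__name__}")
    if not drivers:
        raise ValueError("drivers list cannot be empty")
    # Each element must be a str with at least one non-whitespace character
    if not all(isinstance(d, str) and d.strip() for d in drivers):
        raise ValueError("drivers must be a list of non-empty strings")
```

The type check runs before the emptiness check, so an empty tuple is reported as a type problem rather than as an empty list.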
Example - Valid Driver Lists:
from tif1.core_utils.helpers import _validate_drivers_list
# Valid: None (no filtering)
_validate_drivers_list(None) # OK
# Valid: Single driver
_validate_drivers_list(["VER"]) # OK
# Valid: Multiple drivers
_validate_drivers_list(["VER", "HAM", "LEC"]) # OK
# Valid: All drivers in a race
all_drivers = ["VER", "PER", "HAM", "RUS", "LEC", "SAI", "NOR", "PIA",
"ALO", "STR", "OCO", "GAS", "ALB", "SAR", "TSU", "RIC",
"BOT", "ZHO", "MAG", "HUL"]
_validate_drivers_list(all_drivers) # OK
Example - Invalid Driver Lists:
from tif1.core_utils.helpers import _validate_drivers_list
# Invalid: Empty list
try:
    _validate_drivers_list([])
except ValueError as e:
    print(e) # "drivers list cannot be empty"
# Invalid: Wrong type (tuple instead of list)
try:
    _validate_drivers_list(("VER", "HAM"))
except TypeError as e:
    print(e) # "drivers must be a list, got tuple"
# Invalid: Contains empty string
try:
    _validate_drivers_list(["VER", "", "HAM"])
except ValueError as e:
    print(e) # "drivers must be a list of non-empty strings"
# Invalid: Contains non-string
try:
    _validate_drivers_list(["VER", 33, "HAM"])
except ValueError as e:
    print(e) # "drivers must be a list of non-empty strings"
Example - Integration with Data Filtering:
from tif1.core_utils.helpers import _validate_drivers_list
def get_driver_laps(laps_df, drivers: list[str] | None):
    """Filter laps DataFrame by drivers with validation."""
    # Validate input before filtering
    _validate_drivers_list(drivers)
    if drivers is None:
        return laps_df # No filtering
    # Safe to filter now
    return laps_df[laps_df["Driver"].isin(drivers)]
Use None instead of an empty list when you want to indicate “all drivers”. An empty list is considered an error because it would result in an empty DataFrame, which is likely unintentional.
_validate_lap_number
def _validate_lap_number(lap_number: int) -> None
Validate that a lap number is a positive integer. Lap numbers in F1 start from 1 (not 0), so this function ensures the value is valid for lap-based operations.
Parameters:
lap_number (int): Lap number to validate. Must be a positive integer (>= 1).
Raises:
TypeError: If lap_number is not an integer. Error message includes the actual type received.
ValueError: If lap_number is zero or negative. Error message includes the invalid value.
Validation Rules:
- Must be an integer type (not float, string, or other numeric type)
- Must be positive (>= 1)
Performance:
- Time complexity: O(1) - simple type check and comparison
- Typical execution: <1μs
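A sketch of these two checks follows. It reproduces the documented error messages but is not the library's source. One design choice here is an assumption of the sketch: because bool is a subclass of int in Python, `True` would otherwise pass as lap 1, so the sketch rejects booleans explicitly. `validate_lap_number_sketch` is a hypothetical name.

```python
def validate_lap_number_sketch(lap_number: int) -> None:
    """Require a real int (not bool, float, or str) that is at least 1."""
    # Assumption of this sketch: bool is rejected even though it subclasses int
    if isinstance(lap_number, bool) or not isinstance(lap_number, int):
        raise TypeError(
            f"lap_number must be an integer, got {type(lap_number).__name__}"
        )
    if lap_number < 1:
        raise ValueError(f"lap_number must be positive, got {lap_number}")
```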
Example - Valid Lap Numbers:
from tif1.core_utils.helpers import _validate_lap_number
# Valid: First lap
_validate_lap_number(1) # OK
# Valid: Mid-race lap
_validate_lap_number(35) # OK
# Valid: Final lap of a typical race
_validate_lap_number(58) # OK
Example - Invalid Lap Numbers:
from tif1.core_utils.helpers import _validate_lap_number
# Invalid: Zero
try:
    _validate_lap_number(0)
except ValueError as e:
    print(e) # "lap_number must be positive, got 0"
# Invalid: Negative
try:
    _validate_lap_number(-5)
except ValueError as e:
    print(e) # "lap_number must be positive, got -5"
# Invalid: Float (even if positive)
try:
    _validate_lap_number(10.5)
except TypeError as e:
    print(e) # "lap_number must be an integer, got float"
# Invalid: String
try:
    _validate_lap_number("10")
except TypeError as e:
    print(e) # "lap_number must be an integer, got str"
Example - Integration with Telemetry Loading:
from tif1.core_utils.helpers import _validate_lap_number
def get_lap_telemetry(driver: str, lap_number: int):
    """Load telemetry for a specific lap with validation."""
    # Validate lap number before expensive operations
    _validate_lap_number(lap_number)
    # Proceed with telemetry loading...
    # (validation ensures we don't request invalid lap data)
Lap numbers in F1 start from 1, not 0. If you’re iterating over laps, use range(1, num_laps + 1) instead of range(num_laps).
_validate_string_param
def _validate_string_param(param: str, param_name: str) -> None
Validate that a string parameter is non-empty and properly formatted. This generic validation function is used throughout the library for string inputs like GP names, session types, driver codes, etc.
Parameters:
param (str): String parameter to validate. Must be a non-empty string with at least one non-whitespace character.
param_name (str): Human-readable parameter name for error messages. Used to provide context in error messages (e.g., “gp”, “session_type”, “driver”).
Raises:
TypeError: If param is not a string. Error message includes the parameter name and actual type received.
ValueError: If param is empty or contains only whitespace. Error message includes the parameter name.
Validation Rules:
- Must be a string type (not int, None, or other type)
- Cannot be empty string ""
- Cannot be whitespace-only (e.g., " ", "\t", "\n")
Performance:
- Time complexity: O(n) where n is the string length (for whitespace check)
- Typical execution: <5μs for typical parameter lengths
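As a rough illustration of the rules above, the type check and the whitespace check can be combined as follows. The messages match the examples in this section, but the function itself is a sketch with a hypothetical name, not the library's code.

```python
def validate_string_param_sketch(param: str, param_name: str) -> None:
    """Require a str that still has content after stripping whitespace."""
    if not isinstance(param, str):
        raise TypeError(
            f"{param_name} must be a string, got {type(param).__name__}"
        )
    # str.strip() removes spaces, tabs, and newlines, so " \t\n" becomes ""
    if not param.strip():
        raise ValueError(f"{param_name} cannot be empty")
```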
Example - Valid String Parameters:
from tif1.core_utils.helpers import _validate_string_param
# Valid: GP name
_validate_string_param("Monaco", "gp") # OK
# Valid: Session type
_validate_string_param("Race", "session_type") # OK
# Valid: Driver code
_validate_string_param("VER", "driver") # OK
# Valid: String with spaces
_validate_string_param("Belgian Grand Prix", "gp") # OK
Example - Invalid String Parameters:
from tif1.core_utils.helpers import _validate_string_param
# Invalid: Empty string
try:
    _validate_string_param("", "gp")
except ValueError as e:
    print(e) # "gp cannot be empty"
# Invalid: Whitespace only
try:
    _validate_string_param(" ", "session_type")
except ValueError as e:
    print(e) # "session_type cannot be empty"
# Invalid: None
try:
    _validate_string_param(None, "driver")
except TypeError as e:
    print(e) # "driver must be a string, got NoneType"
# Invalid: Integer
try:
    _validate_string_param(2024, "year")
except TypeError as e:
    print(e) # "year must be a string, got int"
Example - Integration with API Functions:
from tif1.core_utils.helpers import _validate_string_param
def get_session(year: int, gp: str, session_type: str):
    """Load session with comprehensive validation."""
    # Validate string parameters
    _validate_string_param(gp, "gp")
    _validate_string_param(session_type, "session_type")
    # Proceed with session loading...
    # (validation ensures we have valid input before processing)
This function uses .strip() to check for whitespace-only strings, ensuring that parameters like " " are rejected even though they have non-zero length.
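The validation rules above can be sketched in a few lines. This is a hypothetical stand-in rather than the library source: the name validate_string_param_sketch and the check helper are assumptions made for illustration, and the error messages simply mirror the ones shown in the examples above.

```python
def validate_string_param_sketch(param: object, param_name: str) -> None:
    """Hypothetical stand-in for _validate_string_param; not the library source."""
    # Rule 1: reject anything that is not a str (None, int, ...).
    if not isinstance(param, str):
        raise TypeError(
            f"{param_name} must be a string, got {type(param).__name__}"
        )
    # Rules 2 and 3: strip() reduces "" and whitespace-only input to "".
    if not param.strip():
        raise ValueError(f"{param_name} cannot be empty")


def check(value: object, name: str) -> str:
    """Return 'ok' or the error message (demonstration helper only)."""
    try:
        validate_string_param_sketch(value, name)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return "ok"


results = [check("Monaco", "gp"), check(" \t\n", "gp"), check(None, "driver")]
print(results)
# ['ok', 'gp cannot be empty', 'driver must be a string, got NoneType']
```

Checking the type before calling .strip() matters: calling .strip() on None would raise AttributeError instead of the documented TypeError.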
URL Encoding
_encode_url_component
@lru_cache(maxsize=1024)
def _encode_url_component(component: str) -> str
URL-encode a string component for safe use in CDN URLs. This function properly encodes special characters, spaces, and international characters according to RFC 3986, ensuring that URLs are valid and don’t break when passed to HTTP clients.
Parameters:
component (str): String to encode. Can contain any Unicode characters, spaces, or special characters.
Returns:
str: URL-encoded string with all special characters percent-encoded (e.g., space becomes %20, & becomes %26).
Caching:
- Uses @lru_cache(maxsize=1024) for memoization
- Repeated calls with the same input return cached results instantly
- Cache size of 1024 is sufficient for typical usage (GP names, session types, etc.)
Performance:
- Time complexity: O(n) for first call, O(1) for cached calls
- Typical execution: ~10μs for the first call, <1μs for a cached call (cache lookup)
Example - Basic URL Encoding:
from tif1.core_utils.helpers import _encode_url_component
# Encode GP name with spaces
encoded = _encode_url_component("Belgian Grand Prix")
print(encoded) # "Belgian%20Grand%20Prix"
# Encode special characters
encoded = _encode_url_component("São Paulo")
print(encoded) # "S%C3%A3o%20Paulo"
# Encode session type
encoded = _encode_url_component("Practice 1")
print(encoded) # "Practice%201"
Example - Building CDN URLs:
from tif1.core_utils.helpers import _encode_url_component
def build_cdn_url(year: int, gp: str, session_type: str, data_type: str) -> str:
    """Build CDN URL with proper encoding."""
    base_url = f"https://cdn.jsdelivr.net/gh/TracingInsights/{year}@main"
    # Encode components for URL safety
    gp_encoded = _encode_url_component(gp)
    session_encoded = _encode_url_component(session_type)
    return f"{base_url}/{gp_encoded}/{session_encoded}/{data_type}.json"
# Example usage
url = build_cdn_url(2024, "Belgian Grand Prix", "Race", "laps")
print(url)
# https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Belgian%20Grand%20Prix/Race/laps.json
Example - Cache Performance:
import time
from tif1.core_utils.helpers import _encode_url_component
# First call: performs encoding
start = time.perf_counter()
result1 = _encode_url_component("Belgian Grand Prix")
first_call_time = time.perf_counter() - start
# Second call: returns cached result
start = time.perf_counter()
result2 = _encode_url_component("Belgian Grand Prix")
cached_call_time = time.perf_counter() - start
print(f"First call: {first_call_time * 1e6:.1f}μs") # ~10μs
print(f"Cached call: {cached_call_time * 1e6:.1f}μs") # <1μs
print(f"Speedup: {first_call_time / cached_call_time:.0f}x") # ~10-20x
Example - Handling International Characters:
from tif1.core_utils.helpers import _encode_url_component
# International GP names
gp_names = [
    "São Paulo", # Portuguese
    "México", # Spanish
    "Montréal", # French
    "Österreich", # German
]
for gp in gp_names:
    encoded = _encode_url_component(gp)
    print(f"{gp:15} → {encoded}")
# Output:
# São Paulo       → S%C3%A3o%20Paulo
# México          → M%C3%A9xico
# Montréal        → Montr%C3%A9al
# Österreich      → %C3%96sterreich
The LRU cache makes repeated URL encoding operations essentially free. Since GP names and session types are reused frequently, the cache hit rate is typically >95% in real-world usage.
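For readers who want to see how the encoding and the cache fit together, the following is a minimal sketch built only on the standard library. It assumes the helper behaves like urllib.parse.quote with no "safe" characters; the function name encode_url_component_sketch is hypothetical and the real implementation may differ.

```python
from functools import lru_cache
from urllib.parse import quote


@lru_cache(maxsize=1024)
def encode_url_component_sketch(component: str) -> str:
    """Hypothetical stand-in; percent-encodes everything outside the unreserved set."""
    # safe="" also encodes "/", which quote() would otherwise leave untouched.
    return quote(component, safe="")


first = encode_url_component_sketch("Belgian Grand Prix")
second = encode_url_component_sketch("Belgian Grand Prix")  # served from the cache
info = encode_url_component_sketch.cache_info()

print(first)                    # Belgian%20Grand%20Prix
print(info.hits, info.misses)   # 1 1
```

cache_info() is a convenient way to measure the hit rate in your own workload instead of relying on a typical figure.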
DataFrame Utility Functions
These functions provide backend-agnostic operations for DataFrame manipulation, enabling seamless work with both pandas and polars.
_is_empty_df
def _is_empty_df(df, lib: str) -> bool
Check if a DataFrame-like object is empty, working across both pandas and polars backends. This function handles the different APIs for checking emptiness and provides a unified interface.
Parameters:
df: DataFrame-like object to check. Can be pd.DataFrame, pl.DataFrame, or any object with .empty or .is_empty() attributes.
lib (str): Backend library name ("pandas" or "polars"). Used as a hint for optimization, but the function also performs runtime type checking.
Returns:
bool: True if the DataFrame is empty (zero rows), False otherwise.
Detection Strategy:
The function uses a multi-layered approach to handle various DataFrame types:
- Type-based detection: Check isinstance(df, pd.DataFrame) or isinstance(df, pl.DataFrame)
- Attribute-based detection: Check for .empty (pandas) or .is_empty() (polars)
- Fallback: Use len(df) == 0 as last resort
Performance:
- Time complexity: O(1) - all checks are constant time
- Typical execution: <1μs
Example - Basic Usage:
from tif1.core_utils.helpers import _is_empty_df
import pandas as pd
import polars as pl
# Empty pandas DataFrame
df_empty_pd = pd.DataFrame()
print(_is_empty_df(df_empty_pd, "pandas")) # True
# Non-empty pandas DataFrame
df_pd = pd.DataFrame({"driver": ["VER", "HAM"]})
print(_is_empty_df(df_pd, "pandas")) # False
# Empty polars DataFrame
df_empty_pl = pl.DataFrame()
print(_is_empty_df(df_empty_pl, "polars")) # True
# Non-empty polars DataFrame
df_pl = pl.DataFrame({"driver": ["VER", "HAM"]})
print(_is_empty_df(df_pl, "polars")) # False
Example - Integration with Data Loading:
from tif1.core_utils.helpers import _is_empty_df, _create_empty_df
import logging
logger = logging.getLogger(__name__)
def load_laps_with_fallback(year: int, gp: str, session: str, lib: str):
    """Load laps with empty DataFrame fallback."""
    try:
        # fetch_laps_from_cdn is a placeholder for your own loading routine
        laps = fetch_laps_from_cdn(year, gp, session, lib)
        # Check if we got data
        if _is_empty_df(laps, lib):
            logger.warning(f"No lap data found for {year} {gp} {session}")
            return _create_empty_df(lib)
        return laps
    except Exception as e:
        logger.error(f"Failed to load laps: {e}")
        return _create_empty_df(lib)
Example - Conditional Processing:
from tif1.core_utils.helpers import _is_empty_df
def process_telemetry(telemetry_df, lib: str):
    """Process telemetry data if available."""
    # Skip processing if no data
    if _is_empty_df(telemetry_df, lib):
        print("No telemetry data to process")
        return None
    # Proceed with processing
    print(f"Processing {len(telemetry_df)} telemetry samples")
    # ... processing logic ...
This function prefers concrete type checking over the lib parameter because some code paths can surface pandas DataFrames even when the configured backend is polars (e.g., during backend conversion).
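The attribute-based and fallback layers of the detection strategy can be illustrated without installing either backend. The sketch below is an assumption about how such a fallback chain might look, not the library source: it omits the isinstance checks so that it runs with the standard library alone, and the two Fake classes are test doubles that imitate the polars and pandas emptiness APIs.

```python
def is_empty_df_sketch(df: object) -> bool:
    """Hypothetical duck-typed fallback chain; not the library source."""
    # polars-style: is_empty is a method.
    is_empty = getattr(df, "is_empty", None)
    if callable(is_empty):
        return bool(is_empty())
    # pandas-style: empty is a boolean property.
    empty = getattr(df, "empty", None)
    if isinstance(empty, bool):
        return empty
    # Last resort: anything that supports len().
    return len(df) == 0


class FakePolarsFrame:
    """Test double exposing the polars-style is_empty() method."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def is_empty(self) -> bool:
        return not self._rows


class FakePandasFrame:
    """Test double exposing the pandas-style .empty property."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    @property
    def empty(self) -> bool:
        return not self._rows


print(is_empty_df_sketch(FakePolarsFrame([])))       # True
print(is_empty_df_sketch(FakePandasFrame(["VER"])))  # False
print(is_empty_df_sketch([]))                        # True
```

Because the chain inspects the object rather than trusting a backend name, it gives the right answer even when a pandas object turns up in a polars code path.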
_create_empty_df
def _create_empty_df(lib: str)
Create an empty DataFrame for the specified backend. This function provides a consistent way to create empty DataFrames across backends, useful for fallback scenarios and initialization.
Parameters:
lib (str): Backend library name. Must be either "pandas" or "polars".
Returns:
pd.DataFrame if lib == "pandas"
pl.DataFrame if lib == "polars" and polars is available
pd.DataFrame as fallback if polars is requested but not installed
Performance:
- Time complexity: O(1) - creates empty structure
- Typical execution: <10μs
Example - Basic Usage:
from tif1.core_utils.helpers import _create_empty_df
# Create empty pandas DataFrame
df_pd = _create_empty_df("pandas")
print(type(df_pd)) # <class 'pandas.core.frame.DataFrame'>
print(len(df_pd)) # 0
# Create empty polars DataFrame
df_pl = _create_empty_df("polars")
print(type(df_pl)) # <class 'polars.dataframe.frame.DataFrame'>
print(len(df_pl)) # 0
Example - Fallback Pattern:
from tif1.core_utils.helpers import _create_empty_df, _is_empty_df
import logging
logger = logging.getLogger(__name__)
def get_driver_laps(session, driver: str, lib: str):
    """Get laps for a driver with empty DataFrame fallback."""
    try:
        laps = session.laps
        # Filter by driver
        if lib == "polars":
            import polars as pl
            driver_laps = laps.filter(pl.col("Driver") == driver)
        else:
            driver_laps = laps[laps["Driver"] == driver]
        # Return empty DataFrame if no laps found
        if _is_empty_df(driver_laps, lib):
            return _create_empty_df(lib)
        return driver_laps
    except Exception as e:
        logger.error(f"Error getting driver laps: {e}")
        return _create_empty_df(lib)
Example - Initialization:
from tif1.core_utils.helpers import _create_empty_df, _is_empty_df
class SessionData:
    """Container for session data."""
    def __init__(self, lib: str = "pandas"):
        self.lib = lib
        # Initialize with empty DataFrames
        self.laps = _create_empty_df(lib)
        self.telemetry = _create_empty_df(lib)
        self.weather = _create_empty_df(lib)
        self.messages = _create_empty_df(lib)
    def is_loaded(self) -> bool:
        """Check if any data has been loaded."""
        return not all([
            _is_empty_df(self.laps, self.lib),
            _is_empty_df(self.telemetry, self.lib),
            _is_empty_df(self.weather, self.lib),
            _is_empty_df(self.messages, self.lib),
        ])
Use _create_empty_df() instead of pd.DataFrame() or pl.DataFrame() directly when you need backend-agnostic code. This ensures consistency and handles the polars availability check automatically.
_filter_valid_laptimes
def _filter_valid_laptimes(laps, lib: str)
Filter a laps DataFrame to the rows that have a valid lap time, and add a numeric LapTimeSeconds column for analysis. This function is critical for data quality: it removes laps with no recorded time (a null or NaN LapTime, as is common for pit laps and deleted laps) and provides a consistent numeric representation of lap times. Laps that do have a time are kept, including slow out laps.
Parameters:
laps: Laps DataFrame (pandas or polars). Must contain a LapTime column.
lib (str): Backend library name ("pandas" or "polars").
Returns:
- Filtered DataFrame with:
  - Only rows where LapTime is valid (not null/NaN)
  - New LapTimeSeconds column containing lap time as float (seconds)
  - For pandas: LapTime converted to timedelta64[ns] dtype
  - For polars: LapTime kept as original type, LapTimeSeconds added as Float64
Behavior:
- Pandas: Converts LapTime to timedelta64[ns] and creates LapTimeSeconds as float
- Polars: Casts LapTime to Float64 (non-strict) and aliases as LapTimeSeconds
- Optimization: Minimizes copies by filtering before copying (pandas) or using lazy operations (polars)
Performance:
- Time complexity: O(n) where n is the number of laps
- Memory overhead: Minimal (single column addition)
- Typical execution: ~5ms for 1000 laps
Example - Basic Usage:
from tif1.core_utils.helpers import _filter_valid_laptimes
import pandas as pd
# Raw laps data with some invalid times
laps = pd.DataFrame({
"Driver": ["VER", "VER", "VER", "HAM"],
"LapNumber": [1, 2, 3, 1],
"LapTime": [95.5, 90.2, None, 91.0], # Lap 3 is invalid
})
# Filter valid laps
valid_laps = _filter_valid_laptimes(laps, "pandas")
print(len(valid_laps)) # 3 (lap 3 removed)
print(valid_laps["LapTimeSeconds"].tolist()) # [95.5, 90.2, 91.0]
print(valid_laps["LapTime"].dtype) # timedelta64[ns]
Example - Polars Backend:
from tif1.core_utils.helpers import _filter_valid_laptimes
import polars as pl
# Raw laps data
laps = pl.DataFrame({
"Driver": ["VER", "VER", "VER"],
"LapNumber": [1, 2, 3],
"LapTime": ["95.5", "90.2", "89.8"],
})
# Filter and add numeric column
valid_laps = _filter_valid_laptimes(laps, "polars")
# Indexing by column name returns a Series, so convert it to a list for display
print(valid_laps["LapTimeSeconds"].to_list())
# [95.5, 90.2, 89.8]
Example - Handling Mixed Valid/Invalid Data:
from tif1.core_utils.helpers import _filter_valid_laptimes
import pandas as pd
import numpy as np
# Realistic race data with pit laps and deleted laps
laps = pd.DataFrame({
"Driver": ["VER"] * 10,
"LapNumber": range(1, 11),
"LapTime": [
95.5, # Out lap (slow)
90.2, # Valid
89.8, # Valid
None, # Pit lap (no time)
None, # In lap (no time)
92.1, # Out lap after pit
89.5, # Valid
89.3, # Valid
np.nan, # Deleted lap
90.0, # Valid
],
})
# Filter to valid laps only
valid_laps = _filter_valid_laptimes(laps, "pandas")
print(f"Original laps: {len(laps)}") # 10
print(f"Valid laps: {len(valid_laps)}") # 7
print(f"Removed: {len(laps) - len(valid_laps)}") # 3
Example - Performance Optimization:
import time
import pandas as pd
import numpy as np
from tif1.core_utils.helpers import _filter_valid_laptimes
# Large dataset (full race weekend, all drivers)
num_laps = 20_000 # ~20 drivers × ~1000 laps
laps = pd.DataFrame({
"Driver": np.random.choice(["VER", "HAM", "LEC"], num_laps),
"LapNumber": np.tile(range(1, 101), num_laps // 100),
"LapTime": np.random.uniform(85, 95, num_laps),
})
# Add some invalid laps
invalid_indices = np.random.choice(num_laps, size=2000, replace=False)
laps.loc[invalid_indices, "LapTime"] = None
# Benchmark filtering
start = time.perf_counter()
valid_laps = _filter_valid_laptimes(laps, "pandas")
elapsed = time.perf_counter() - start
print(f"Filtered {len(laps)} laps in {elapsed*1000:.1f}ms")
print(f"Valid laps: {len(valid_laps)} ({len(valid_laps)/len(laps)*100:.1f}%)")
# Output:
# Filtered 20000 laps in 45.2ms
# Valid laps: 18000 (90.0%)
This function is called automatically during session loading. The LapTimeSeconds column is essential for numerical analysis, sorting, and filtering operations that require numeric comparison.
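The pandas behavior described above (filter first, then copy, then derive the numeric and timedelta columns) can be approximated as follows. This is a sketch under stated assumptions, not the library implementation: the name filter_valid_laptimes_sketch is hypothetical, and it assumes LapTime arrives as a number of seconds.

```python
import pandas as pd


def filter_valid_laptimes_sketch(laps: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pandas-only stand-in; assumes LapTime holds seconds as numbers."""
    # Filter first so that only the surviving rows are copied.
    valid = laps[laps["LapTime"].notna()].copy()
    # Keep a plain float column for numeric work.
    valid["LapTimeSeconds"] = valid["LapTime"].astype("float64")
    # Convert LapTime itself to a timedelta. unit="s" is required here,
    # because bare numbers would otherwise be read as nanoseconds.
    valid["LapTime"] = pd.to_timedelta(valid["LapTimeSeconds"], unit="s")
    return valid


laps = pd.DataFrame({"Driver": ["VER", "VER", "HAM"], "LapTime": [95.5, None, 91.0]})
valid = filter_valid_laptimes_sketch(laps)
print(valid["LapTimeSeconds"].tolist())  # [95.5, 91.0]
print(len(laps), len(valid))             # 3 2
```

The input frame is left untouched, which is why the original still reports three rows after filtering.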
_rename_columns
def _rename_columns(df, rename_map: dict, lib: str)
Rename DataFrame columns based on a mapping dictionary, with intelligent handling of duplicates and backend-specific APIs. This function is used to transform CDN column names to user-facing FastF1-compatible names.
Parameters:
df: DataFrame to rename (pandas or polars)
rename_map (dict): Mapping of old column names to new names. Use None as the value to drop a column.
lib (str): Backend library name ("pandas" or "polars")
Returns:
- DataFrame with renamed columns. Columns mapped to None are dropped.
Special Handling:
- Duplicate prevention: Skips renames that would create duplicate column names
- Drop columns: Columns mapped to None are removed from the DataFrame
- No-op renames: Skips renames where source == target (e.g., {"Driver": "Driver"})
- Existing columns: Avoids conflicts when target name already exists independently
Performance:
- Time complexity: O(n) where n is the number of columns
- Memory overhead: Minimal (column metadata only, no data copy)
- Typical execution: <1ms for typical DataFrames
Example - Basic Column Renaming:
from tif1.core_utils.helpers import _rename_columns
import pandas as pd
# Raw CDN data with abbreviated column names
df = pd.DataFrame({
"drv": ["VER", "HAM"],
"lap": [1, 1],
"time": [90.5, 91.2],
"s1": [28.1, 28.5],
})
# Rename to user-friendly names
rename_map = {
"drv": "Driver",
"lap": "LapNumber",
"time": "LapTime",
"s1": "Sector1Time",
}
df_renamed = _rename_columns(df, rename_map, "pandas")
print(df_renamed.columns.tolist())
# ['Driver', 'LapNumber', 'LapTime', 'Sector1Time']
Example - Dropping Columns:
from tif1.core_utils.helpers import _rename_columns
import pandas as pd
# Data with columns to drop
df = pd.DataFrame({
"driver": ["VER", "HAM"],
"lap": [1, 1],
"internal_id": [12345, 12346], # Internal field, not needed
"debug_flag": [True, False], # Debug field, not needed
})
# Rename and drop unwanted columns
rename_map = {
"driver": "Driver",
"lap": "LapNumber",
"internal_id": None, # Drop this column
"debug_flag": None, # Drop this column
}
df_clean = _rename_columns(df, rename_map, "pandas")
print(df_clean.columns.tolist())
# ['Driver', 'LapNumber']
Example - Handling Duplicate Conflicts:
from tif1.core_utils.helpers import _rename_columns
import pandas as pd
# DataFrame where target name already exists
df = pd.DataFrame({
"drv": ["VER"],
"Driver": ["HAM"], # Target name already exists!
"lap": [1],
})
# Attempt to rename drv → Driver
rename_map = {"drv": "Driver", "lap": "LapNumber"}
# Function skips the conflicting rename
df_renamed = _rename_columns(df, rename_map, "pandas")
print(df_renamed.columns.tolist())
# ['drv', 'Driver', 'LapNumber'] # 'drv' not renamed to avoid duplicate
Example - Polars Backend:
from tif1.core_utils.helpers import _rename_columns
import polars as pl
# Polars DataFrame
df = pl.DataFrame({
"drv": ["VER", "HAM"],
"lap": [1, 1],
"time": [90.5, 91.2],
})
# Rename columns
rename_map = {
"drv": "Driver",
"lap": "LapNumber",
"time": "LapTime",
}
df_renamed = _rename_columns(df, rename_map, "polars")
print(df_renamed.columns)
# ['Driver', 'LapNumber', 'LapTime']
Example - Integration with CDN Data Processing:
from tif1.core_utils.helpers import _rename_columns
from tif1.core_utils.constants import LAP_RENAME_MAP
def process_cdn_laps(raw_laps_df, lib: str):
    """Process raw CDN laps data to user-facing format."""
    # Rename columns from CDN format to FastF1 format
    laps = _rename_columns(raw_laps_df, LAP_RENAME_MAP, lib)
    # Now columns have user-friendly names
    # "drv" → "Driver"
    # "lap" → "LapNumber"
    # "time" → "LapTime"
    # etc.
    return laps
The function automatically handles edge cases like duplicate column names and missing columns, making it safe to use with varying CDN data formats across different years and sessions.
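The special-handling rules can be expressed as a small planning step that works on column names only, independent of the backend. The sketch below is illustrative and makes assumptions: plan_renames is a hypothetical helper, and the real function may resolve conflicts in a different order.

```python
def plan_renames(columns: list[str], rename_map: dict) -> tuple[dict, list]:
    """Hypothetical planner: decide which renames and drops are safe to apply."""
    renames: dict[str, str] = {}
    drops: list[str] = []
    taken = set(columns)  # names that exist now or have already been claimed
    for old in columns:
        if old not in rename_map:
            continue  # column not mentioned in the map: leave it alone
        new = rename_map[old]
        if new is None:
            drops.append(old)  # None means "drop this column"
        elif new == old:
            continue  # no-op rename
        elif new in taken:
            continue  # target already exists: skip to avoid a duplicate
        else:
            renames[old] = new
            taken.add(new)
    return renames, drops


columns = ["drv", "Driver", "lap", "debug_flag"]
rename_map = {"drv": "Driver", "lap": "LapNumber", "debug_flag": None, "s1": "Sector1Time"}
renames, drops = plan_renames(columns, rename_map)
print(renames)  # {'lap': 'LapNumber'}
print(drops)    # ['debug_flag']
```

Map entries for columns that are not present (here "s1") are ignored, and the resulting plan could then be applied with df.rename(columns=renames).drop(columns=drops) in pandas or df.rename(renames).drop(drops) in polars.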
_apply_categorical
def _apply_categorical(df, cols: list, lib: str)
Apply categorical dtype to specified columns for memory optimization and faster operations. Categorical dtypes reduce memory usage by storing repeated string values as integer codes with a lookup table.
Parameters:
df: DataFrame to modify (pandas or polars)
cols (list): List of column names to convert to categorical
lib (str): Backend library name ("pandas" or "polars")
Returns:
- DataFrame with specified columns converted to categorical dtype
Benefits of Categorical Dtype:
- Memory savings: 50-90% reduction for columns with low cardinality
- Faster operations: Comparisons and groupby operations are faster
- Preserved semantics: String operations still work as expected
Performance:
- Time complexity: O(n × m) where n is rows and m is number of categorical columns
- Memory savings: Typically 70-80% for Driver, Team, Compound columns
- Typical execution: ~10ms for 1000 laps with 4 categorical columns
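To make the "integer codes with a lookup table" idea concrete, here is a conceptual sketch of dictionary encoding in plain Python. It is not how pandas or polars implement categoricals internally (pandas, for example, sorts categories by default, while this sketch keeps first-appearance order); encode_categorical is a hypothetical name used only for illustration.

```python
def encode_categorical(values: list[str]) -> tuple[list[int], list[str]]:
    """Dictionary-encode a column: one small integer per row plus a lookup table."""
    categories: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index:
            # First time this value is seen: give it the next free code.
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return codes, categories


drivers = ["VER"] * 3 + ["HAM"] * 2
codes, categories = encode_categorical(drivers)
decoded = [categories[code] for code in codes]

print(codes)       # [0, 0, 0, 1, 1]
print(categories)  # ['VER', 'HAM']
```

Each distinct string is stored once in the lookup table, so the savings grow with the number of rows and shrink as the number of unique values rises.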
Example - Basic Usage:
from tif1.core_utils.helpers import _apply_categorical
import pandas as pd
# Laps data with repeated string values
laps = pd.DataFrame({
"Driver": ["VER"] * 50 + ["HAM"] * 50, # Only 2 unique values
"Team": ["Red Bull Racing"] * 50 + ["Mercedes"] * 50,
"Compound": ["SOFT"] * 30 + ["MEDIUM"] * 40 + ["HARD"] * 30,
"LapTime": range(100),
})
print(f"Memory before: {laps.memory_usage(deep=True).sum() / 1024:.1f} KB")
# Memory before: 25.3 KB
# Apply categorical dtype
categorical_cols = ["Driver", "Team", "Compound"]
laps_cat = _apply_categorical(laps, categorical_cols, "pandas")
print(f"Memory after: {laps_cat.memory_usage(deep=True).sum() / 1024:.1f} KB")
# Memory after: 8.7 KB (65% reduction)
print(laps_cat["Driver"].dtype) # category
print(laps_cat["Team"].dtype) # category
Example - Memory Savings Analysis:
from tif1.core_utils.helpers import _apply_categorical
import pandas as pd
# Realistic race data (20 drivers, 58 laps each = 1,160 rows)
drivers = ["VER", "PER", "HAM", "RUS", "LEC", "SAI", "NOR", "PIA",
           "ALO", "STR", "OCO", "GAS", "ALB", "SAR", "TSU", "RIC",
           "BOT", "ZHO", "MAG", "HUL"]
# One team per pair of drivers, in the same order as the drivers list
teams = ["Red Bull Racing", "Mercedes", "Ferrari", "McLaren", "Aston Martin",
         "Alpine", "Williams", "RB", "Sauber", "Haas"]
team_by_driver = [team for team in teams for _ in range(2)]
# Compound used on each of the 58 laps
stint = ["SOFT"] * 20 + ["MEDIUM"] * 30 + ["HARD"] * 8
laps = pd.DataFrame({
    "Driver": drivers * 58,
    "Team": team_by_driver * 58,
    "Compound": [compound for compound in stint for _ in drivers],
    "LapNumber": [lap for lap in range(1, 59) for _ in drivers],
})
# Measure memory usage
mem_before = laps.memory_usage(deep=True).sum()
# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team", "Compound"], "pandas")
mem_after = laps_cat.memory_usage(deep=True).sum()
savings = (1 - mem_after / mem_before) * 100
print(f"Memory before: {mem_before / 1024:.1f} KB")
print(f"Memory after: {mem_after / 1024:.1f} KB")
print(f"Savings: {savings:.1f}%")
# Output:
# Memory before: 156.3 KB
# Memory after: 42.1 KB
# Savings: 73.1%
Example - Polars Backend:
from tif1.core_utils.helpers import _apply_categorical
import polars as pl
# Polars DataFrame
laps = pl.DataFrame({
"Driver": ["VER"] * 50 + ["HAM"] * 50,
"Team": ["Red Bull Racing"] * 50 + ["Mercedes"] * 50,
"LapNumber": range(1, 101),
})
# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team"], "polars")
print(laps_cat.schema)
# {'Driver': Categorical, 'Team': Categorical, 'LapNumber': Int64}
Example - Performance Impact:
import time
import pandas as pd
from tif1.core_utils.helpers import _apply_categorical
# Large dataset
laps = pd.DataFrame({
"Driver": ["VER", "HAM", "LEC"] * 10000,
"Team": ["Red Bull Racing", "Mercedes", "Ferrari"] * 10000,
"LapTime": range(30000),
})
# Benchmark groupby without categorical
start = time.perf_counter()
result1 = laps.groupby("Driver")["LapTime"].mean()
time_without = time.perf_counter() - start
# Apply categorical
laps_cat = _apply_categorical(laps, ["Driver", "Team"], "pandas")
# Benchmark groupby with categorical
start = time.perf_counter()
result2 = laps_cat.groupby("Driver")["LapTime"].mean()
time_with = time.perf_counter() - start
print(f"Without categorical: {time_without*1000:.1f}ms")
print(f"With categorical: {time_with*1000:.1f}ms")
print(f"Speedup: {time_without/time_with:.2f}x")
# Output:
# Without categorical: 12.3ms
# With categorical: 4.1ms
# Speedup: 3.00x
The library automatically applies categorical dtype to Driver, Team, Compound, and TrackStatus columns during session loading. This optimization is transparent to users but provides significant memory and performance benefits.
Constants
The constants module (constants.py) defines all column name mappings, configuration values, and standard column orders used throughout the library. These constants ensure consistency across the codebase and provide FastF1 compatibility.
Year Range Constants
MIN_YEAR = 2018 # First year with TracingInsights data
MAX_YEAR = 2100 # Maximum supported year (future-proofing)
These constants define the valid range for F1 season data. The library supports data from 2018 onwards (when the TracingInsights data collection began) through 2100 (allowing for future scheduled races).
Example Usage:
from tif1.core_utils.constants import MIN_YEAR, MAX_YEAR
from tif1.core_utils.helpers import _validate_year
# Validate user input
year = 2024
_validate_year(year, MIN_YEAR, MAX_YEAR) # OK
# Check if year is in valid range
if MIN_YEAR <= year <= MAX_YEAR:
print(f"Year {year} is supported")
Cache Configuration
MAX_CACHE_SIZE = 100 # Maximum number of cached items
Defines the maximum number of items to store in the LRU cache for various operations. This prevents unbounded memory growth while maintaining good cache hit rates for typical usage patterns.
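The effect of a bounded LRU cache is easiest to see with a deliberately tiny limit. The sketch below uses functools.lru_cache from the standard library as an assumption; the document does not state which cache implementation consumes MAX_CACHE_SIZE, and the load function is a hypothetical loader.

```python
from functools import lru_cache

MAX_CACHE_SIZE = 2  # deliberately tiny for the demonstration; the library constant is 100

calls: list[str] = []


@lru_cache(maxsize=MAX_CACHE_SIZE)
def load(key: str) -> str:
    """Hypothetical expensive loader; records each real (uncached) call."""
    calls.append(key)
    return key.upper()


load("a")
load("b")
load("a")  # hit: "a" becomes the most recently used entry
load("c")  # cache is full, so the least recently used entry ("b") is evicted
load("b")  # miss again: "b" has to be recomputed

print(calls)                       # ['a', 'b', 'c', 'b']
print(load.cache_info().currsize)  # 2
```

The cache never holds more than MAX_CACHE_SIZE entries, which is what keeps memory bounded at the cost of occasional recomputation.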
Column Rename Maps
The library defines comprehensive column rename mappings to transform CDN data format to user-facing FastF1-compatible format. These mappings handle both verbose and abbreviated column names from different CDN data versions.
Lap Data Rename Map
LAP_RENAME_MAP = {
# Core lap data
"time": "LapTime",
"lap": "LapNumber",
"compound": "Compound",
"stint": "Stint",
# Sector times
"s1": "Sector1Time",
"s2": "Sector2Time",
"s3": "Sector3Time",
# Tyre data
"life": "TyreLife",
"fresh": "FreshTyre",
"fresh_tyre": "FreshTyre",
# Position and status
"pos": "Position",
"status": "TrackStatus",
"pb": "IsPersonalBest",
# Driver and team
"drv": "Driver",
"source_driver": "Driver",
"driver_number": "DriverNumber",
"dNum": "DriverNumber",
"source_team": "Team",
"team": "Team",
# Session times
"session_time": "Time",
"sesT": "Time",
"pit_out_time": "PitOutTime",
"pout": "PitOutTime",
"pit_in_time": "PitInTime",
"pin": "PitInTime",
# Sector session times
"sector1_session_time": "Sector1SessionTime",
"s1T": "Sector1SessionTime",
"sector2_session_time": "Sector2SessionTime",
"s2T": "Sector2SessionTime",
"sector3_session_time": "Sector3SessionTime",
"s3T": "Sector3SessionTime",
# Speed traps
"speed_i1": "SpeedI1",
"vi1": "SpeedI1",
"speed_i2": "SpeedI2",
"vi2": "SpeedI2",
"speed_fl": "SpeedFL",
"vfl": "SpeedFL",
"speed_st": "SpeedST",
"vst": "SpeedST",
# Lap metadata
"lap_start_time": "LapStartTime",
"lST": "LapStartTime",
"lap_start_date": "LapStartDate",
"lSD": "LapStartDate",
"deleted": "Deleted",
"del": "Deleted",
"deleted_reason": "DeletedReason",
"delR": "DeletedReason",
"fastf1_generated": "FastF1Generated",
"ff1G": "FastF1Generated",
"is_accurate": "IsAccurate",
"iacc": "IsAccurate",
# Qualifying
"qs": "QualifyingSession",
"qualifying_session": "QualifyingSession",
# Weather (per-lap)
"weather_time": "WeatherTime",
"wT": "WeatherTime",
"air_temp": "AirTemp",
"wAT": "AirTemp",
"humidity": "Humidity",
"wH": "Humidity",
"pressure": "Pressure",
"wP": "Pressure",
"rainfall": "Rainfall",
"wR": "Rainfall",
"track_temp": "TrackTemp",
"wTT": "TrackTemp",
"wind_direction": "WindDirection",
"wWD": "WindDirection",
"wind_speed": "WindSpeed",
"wWS": "WindSpeed",
}
Example Usage:
from tif1.core_utils.constants import LAP_RENAME_MAP
from tif1.core_utils.helpers import _rename_columns
import pandas as pd
# Raw CDN data with abbreviated names
raw_laps = pd.DataFrame({
"drv": ["VER", "HAM"],
"lap": [1, 1],
"time": [90.5, 91.2],
"s1": [28.1, 28.5],
"s2": [35.2, 35.8],
"s3": [27.2, 27.9],
})
# Rename to FastF1 format
laps = _rename_columns(raw_laps, LAP_RENAME_MAP, "pandas")
print(laps.columns.tolist())
# ['Driver', 'LapNumber', 'LapTime', 'Sector1Time', 'Sector2Time', 'Sector3Time']
Telemetry Rename Map
TELEMETRY_RENAME_MAP = {
# Core telemetry
"time": "Time",
"rpm": "RPM",
"speed": "Speed",
"gear": "nGear",
"throttle": "Throttle",
"brake": "Brake",
"drs": "DRS",
# Distance
"distance": "Distance",
"rel_distance": "RelativeDistance",
# Relative positioning
"driver_ahead": "DriverAhead",
"distance_to_driver_ahead": "DistanceToDriverAhead",
# Acceleration (G-forces)
"acc_x": "AccelerationX",
"acc_y": "AccelerationY",
"acc_z": "AccelerationZ",
# Position coordinates
"x": "X",
"y": "Y",
"z": "Z",
# Metadata
"data_key": "DataKey",
"dataKey": "DataKey",
}
Example Usage:
from tif1.core_utils.constants import TELEMETRY_RENAME_MAP
# Telemetry columns are automatically renamed during loading
# "speed" → "Speed"
# "gear" → "nGear"
# "throttle" → "Throttle"
# etc.
Race Control Messages Rename Map
RACE_CONTROL_RENAME_MAP = {
"time": "Time",
"category": "Category",
"cat": "Category",
"message": "Message",
"msg": "Message",
"status": "Status",
"flag": "Flag",
"scope": "Scope",
"sector": "Sector",
"racing_number": "RacingNumber",
"dNum": "RacingNumber",
"lap": "Lap",
}
Weather Data Rename Map
WEATHER_RENAME_MAP = {
"time": "Time",
"wT": "Time",
"air_temp": "AirTemp",
"wAT": "AirTemp",
"humidity": "Humidity",
"wH": "Humidity",
"pressure": "Pressure",
"wP": "Pressure",
"rainfall": "Rainfall",
"wR": "Rainfall",
"track_temp": "TrackTemp",
"wTT": "TrackTemp",
"wind_direction": "WindDirection",
"wWD": "WindDirection",
"wind_speed": "WindSpeed",
"wWS": "WindSpeed",
}
Categorical Columns
CATEGORICAL_COLS = ["Driver", "Team", "Compound", "TrackStatus"]
List of columns that should be converted to categorical dtype for memory optimization. These columns have low cardinality (few unique values) and benefit significantly from categorical encoding.
Memory Savings:
- Driver: ~70% reduction (20 unique values in ~1000 laps)
- Team: ~75% reduction (10 unique values)
- Compound: ~80% reduction (3-5 unique values)
- TrackStatus: ~85% reduction (2-4 unique values)
Standard Column Names
COL_DRIVER = "Driver"
COL_TEAM = "Team"
COL_LAP_NUMBER = "LapNumber"
COL_LAP_TIME = "LapTime"
COL_LAP_TIME_SECONDS = "LapTimeSeconds"
COL_LAP_NUMBER_ALT = "lap" # Alternative name in some data sources
These constants provide a single source of truth for column names, preventing typos and making refactoring easier.
Example Usage:
from tif1.core_utils.constants import COL_DRIVER, COL_LAP_TIME_SECONDS
# Use constants instead of string literals
fastest_lap = laps.loc[laps[COL_LAP_TIME_SECONDS].idxmin()]
driver = fastest_lap[COL_DRIVER]
print(f"Fastest lap by {driver}")
FastF1 Column Order
FASTF1_LAPS_COLUMN_ORDER = [
# Core identification
"index",
"Time",
"Driver",
"DriverNumber",
# Lap timing
"LapTime",
"LapNumber",
"Stint",
"PitOutTime",
"PitInTime",
# Sector times
"Sector1Time",
"Sector2Time",
"Sector3Time",
"Sector1SessionTime",
"Sector2SessionTime",
"Sector3SessionTime",
# Speed traps
"SpeedI1",
"SpeedI2",
"SpeedFL",
"SpeedST",
# Lap metadata
"IsPersonalBest",
"Compound",
"TyreLife",
"FreshTyre",
"Team",
"LapStartTime",
"LapStartDate",
"TrackStatus",
"Position",
"Deleted",
"DeletedReason",
"FastF1Generated",
"IsAccurate",
# Weather (per-lap)
"WeatherTime",
"AirTemp",
"Humidity",
"Pressure",
"Rainfall",
"TrackTemp",
"WindDirection",
"WindSpeed",
# tif1-specific columns
"LapTimeSeconds",
"QualifyingSession",
]
This list defines the standard column order for laps DataFrames, ensuring FastF1 compatibility. Columns are ordered logically:
- Identification columns (index, driver, time)
- Core lap data (lap time, number, stint)
- Detailed timing (sectors, speed traps)
- Metadata (tyres, position, flags)
- Weather data
- tif1-specific additions
Example Usage:
from tif1.core_utils.helpers import _reorder_laps_columns
# Reorder DataFrame to match FastF1 column order
laps_ordered = _reorder_laps_columns(laps, "pandas")
# Now columns are in the standard order
print(laps_ordered.columns[:10].tolist())
# ['index', 'Time', 'Driver', 'DriverNumber', 'LapTime', ...]
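One plausible reordering rule, shown on column names only, is to place the standard columns first and append anything else afterwards. This is an assumption for illustration: the document does not specify how _reorder_laps_columns treats columns outside the standard list, and reorder_columns_sketch is a hypothetical name.

```python
def reorder_columns_sketch(columns: list[str], standard_order: list[str]) -> list[str]:
    """Hypothetical ordering rule: standard columns first, then any extras."""
    present = set(columns)
    # Standard columns that actually exist, in the standard order.
    ordered = [name for name in standard_order if name in present]
    # Columns the standard order does not mention keep their original relative order.
    known = set(standard_order)
    extras = [name for name in columns if name not in known]
    return ordered + extras


STANDARD = ["Time", "Driver", "LapTime", "LapNumber"]  # shortened for the example
cols = ["LapNumber", "MyMetric", "Driver", "LapTime"]
print(reorder_columns_sketch(cols, STANDARD))
# ['Driver', 'LapTime', 'LapNumber', 'MyMetric']
```

With pandas, the resulting list can be used directly for column selection, for example laps[reorder_columns_sketch(list(laps.columns), FASTF1_LAPS_COLUMN_ORDER)].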
Resource Manager
The resource manager module (resource_manager.py) provides a robust pattern for managing resources with guaranteed cleanup, even when initialization fails partway through. This is critical for preventing resource leaks in error scenarios.
Architecture
The ResourceManager class implements the context manager protocol and tracks resources in a LIFO (Last-In-First-Out) stack. When cleanup occurs, resources are cleaned up in reverse order of creation, ensuring that dependencies are respected.
Key Features:
- LIFO cleanup order: Resources cleaned up in reverse order of registration
- Error resilience: Cleanup continues even if individual resources fail to close
- Comprehensive logging: All cleanup operations are logged for debugging
- Multiple cleanup methods: Supports both .close() and .shutdown() methods
ResourceManager
from typing import Any
class ResourceManager:
    """Base class for resource management with guaranteed cleanup."""
    def __init__(self) -> None:
        """Initialize the resource manager."""
        self._resources: list[tuple[str, Any]] = []
        self._initialized: bool = False
    def _register_resource(self, name: str, resource: Any) -> None:
        """Register a resource for cleanup tracking."""
        ...
    def _cleanup_resources(self) -> None:
        """Cleanup all registered resources in reverse order."""
        ...
    def __enter__(self) -> "ResourceManager":
        """Enter context manager."""
        ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        """Exit context manager and cleanup resources."""
        ...
Methods
_register_resource
def _register_resource(self, name: str, resource: Any) -> None
Register a resource for cleanup tracking. Resources are cleaned up in reverse order of registration (LIFO), ensuring dependencies are respected during cleanup.
Parameters:
name (str): Human-readable name for the resource (used in logging). Should be descriptive for debugging purposes.
resource (Any): The resource object to track. Should have a .close() or .shutdown() method for cleanup.
Example:
class DatabaseManager(ResourceManager):
    def __init__(self, db_url: str):
        super().__init__()
        try:
            # Create connection
            conn = create_connection(db_url)
            self._register_resource("database_connection", conn)
            # Create connection pool
            pool = create_pool(conn)
            self._register_resource("connection_pool", pool)
            # Create cache
            cache = create_cache()
            self._register_resource("cache", cache)
            self._initialized = True
        except Exception:
            # Cleanup any resources created before the error
            self._cleanup_resources()
            raise
_cleanup_resources
def _cleanup_resources(self) -> None
Cleanup all registered resources in reverse order (LIFO). This method attempts to clean up all resources even if individual cleanup operations fail. Cleanup errors are logged but do not prevent other resources from being cleaned up.
Cleanup Strategy:
- Iterate through resources in reverse order
- For each resource, try the .close() method first
- If there is no .close(), try the .shutdown(wait=True) method
- If neither exists, skip the resource
- Log any errors but continue cleanup
- Clear the resource list after all cleanup attempts
Example:
# Resources registered in this order:
# 1. database_connection
# 2. connection_pool
# 3. cache
# Cleanup happens in reverse order:
# 1. cache.close()
# 2. connection_pool.close()
# 3. database_connection.close()
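The cleanup strategy described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's code: `cleanup_resources`, the logger name, and the `Broken` class are hypothetical, and returning the list of failed names is a choice made for this sketch.

```python
import io
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any

logger = logging.getLogger("resource_cleanup_sketch")  # hypothetical logger name


def cleanup_resources(resources: list[tuple[str, Any]]) -> list[str]:
    """Close resources in LIFO order and return the names that failed."""
    failures: list[str] = []
    for name, resource in reversed(resources):
        try:
            if hasattr(resource, "close"):
                resource.close()
            elif hasattr(resource, "shutdown"):
                resource.shutdown(wait=True)
            # Resources with neither method are skipped.
        except Exception as exc:  # keep going so remaining resources still close
            logger.warning("Error cleaning up resource %s: %s", name, exc)
            failures.append(name)
    resources.clear()  # clear the list after all cleanup attempts
    return failures


class Broken:
    """Resource whose close() always fails."""

    def close(self) -> None:
        raise RuntimeError("Cleanup failed!")


buffer = io.StringIO()                    # has close()
pool = ThreadPoolExecutor(max_workers=1)  # has shutdown(), no close()
registered = [("buffer", buffer), ("broken", Broken()),
              ("pool", pool), ("plain", object())]
failed = cleanup_resources(registered)
print(failed)         # ['broken']
print(buffer.closed)  # True
```

The failure in `Broken.close()` is logged as a warning, yet `buffer` (registered earlier, so cleaned up later) is still closed.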
Usage Patterns
Basic Context Manager Usage
from tif1.core_utils.resource_manager import ResourceManager
class SessionManager(ResourceManager):
    def __init__(self, session_id: str):
        super().__init__()
        try:
            # Create HTTP session
            self.http_session = create_http_session()
            self._register_resource("http_session", self.http_session)
            # Create cache connection
            self.cache = create_cache()
            self._register_resource("cache", self.cache)
            self._initialized = True
        except Exception:
            self._cleanup_resources()
            raise
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self._cleanup_resources()
        return False
# Use with context manager
with SessionManager("session_123") as manager:
    # Resources are automatically cleaned up on exit
    data = manager.http_session.get("https://example.com")
    manager.cache.set("key", data)
# Cleanup happens here automatically
Error Handling During Initialization
from tif1.core_utils.resource_manager import ResourceManager
class ComplexManager(ResourceManager):
    def __init__(self):
        super().__init__()
        try:
            # Step 1: Create database connection
            self.db = create_database()
            self._register_resource("database", self.db)
            # Step 2: Create thread pool (might fail)
            self.pool = create_thread_pool()
            self._register_resource("thread_pool", self.pool)
            # Step 3: Create cache (might fail)
            self.cache = create_cache()
            self._register_resource("cache", self.cache)
            self._initialized = True
        except Exception:
            # If any step fails, cleanup resources created so far
            # e.g., if cache creation fails, pool and db are still cleaned up
            self._cleanup_resources()
            raise
# If initialization fails at step 2:
# - database is cleaned up
# - thread_pool is NOT registered (creation failed)
# - cache is NOT registered (never reached)
try:
    manager = ComplexManager()
except Exception as e:
    print(f"Initialization failed: {e}")
    # Database was cleaned up automatically
Manual Cleanup
from tif1.core_utils.resource_manager import ResourceManager
class ManualManager(ResourceManager):
    def __init__(self):
        super().__init__()
        self.session = create_session()
        self._register_resource("session", self.session)
    def close(self):
        """Explicit cleanup method."""
        self._cleanup_resources()
# Use without context manager
manager = ManualManager()
try:
    # Do work
    data = manager.session.get("https://example.com")
finally:
    # Explicit cleanup
    manager.close()
Logging
The ResourceManager provides comprehensive logging for debugging:
import logging
logging.basicConfig(level=logging.DEBUG)
with SessionManager("test") as manager:
    pass
# Log output:
# DEBUG:tif1.core_utils.resource_manager:Registered resource: http_session
# DEBUG:tif1.core_utils.resource_manager:Registered resource: cache
# DEBUG:tif1.core_utils.resource_manager:Closing resource: cache
# DEBUG:tif1.core_utils.resource_manager:Closing resource: http_session
# DEBUG:tif1.core_utils.resource_manager:All resources cleaned up successfully
Error Resilience
If cleanup fails for one resource, other resources are still cleaned up:
class GoodResource:
    def close(self):
        pass  # Closes cleanly
class FailingResource:
    def close(self):
        raise RuntimeError("Cleanup failed!")
class ResilientManager(ResourceManager):
    def __init__(self):
        super().__init__()
        self._register_resource("good_resource", GoodResource())
        self._register_resource("failing_resource", FailingResource())
        self._register_resource("another_good_resource", GoodResource())
with ResilientManager() as manager:
    pass
# Log output:
# DEBUG: Closing resource: another_good_resource ✓
# WARNING: Error cleaning up resource failing_resource: Cleanup failed!
# DEBUG: Closing resource: good_resource ✓
# WARNING: Cleanup completed with errors: failing_resource: Cleanup failed!
ResourceManager is used internally by the library for managing HTTP sessions, cache connections, and other resources. Most users don’t need to interact with it directly, but it’s available for advanced use cases requiring custom resource management.
JSON Parsing
The library uses orjson for JSON parsing, which provides:
- 2-3x faster parsing than stdlib json
- Lower memory usage
- Native support for bytes input
- Serialization of numpy types (in orjson itself this is opt-in via the OPT_SERIALIZE_NUMPY option)
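The bytes-input point is easy to demonstrate with a small wrapper. The sketch below is not tif1's `json_utils`; the `loads`/`dumps` names and the fallback to the standard library when orjson is not installed are assumptions made so the example runs anywhere.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:  # fall back to the standard library if orjson is missing
    orjson = None


def loads(payload):
    """Parse JSON from str or bytes, preferring orjson when available."""
    if orjson is not None:
        return orjson.loads(payload)
    return json.loads(payload)


def dumps(obj) -> bytes:
    """Serialize to UTF-8 bytes; orjson.dumps returns bytes natively."""
    if orjson is not None:
        return orjson.dumps(obj)
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


raw = b'{"laps": [{"lap": 1, "time": 90.5}]}'  # bytes, as read from a response body
data = loads(raw)                    # no .decode() step needed
print(data["laps"][0]["time"])       # 90.5
print(loads(dumps(data)) == data)    # True
```

Passing the raw response bytes straight to the parser avoids an intermediate `str` copy of the payload.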
Lib Conversion
When converting between backends:
- pandas → polars: Uses PyArrow for zero-copy when possible
- polars → pandas: Uses PyArrow by default for efficiency
- Rechunking: Optional for polars to optimize memory layout
Benchmark results:
# pandas → polars conversion
# 1M rows: ~50ms (zero-copy via PyArrow)
# polars → pandas conversion
# 1M rows: ~100ms (with PyArrow)
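These figures depend on hardware, column dtypes, and library versions, so it is worth measuring on your own data. A minimal timing helper, using only the standard library, might look like the following; `median_runtime_ms` is a hypothetical name and the workload shown is a stand-in.

```python
import statistics
import time
from typing import Callable


def median_runtime_ms(func: Callable[[], object], repeats: int = 5) -> float:
    """Run `func` `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; to time a real conversion, replace the lambda with e.g.
# lambda: convert_backend(laps_pandas, "polars")
elapsed = median_runtime_ms(lambda: sum(range(100_000)))
print(f"median: {elapsed:.2f} ms")
```

Using the median rather than the mean keeps a single slow run (for example, a first call that triggers a deferred import) from skewing the result.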
Advanced Usage
Custom Lib Conversion
from tif1.core_utils.backend_conversion import convert_backend
import tif1
# Load 2021 Belgian Grand Prix Race with pandas
session = tif1.get_session(2021, "Belgian Grand Prix", "Race", lib="pandas")
laps_pandas = session.laps
# Convert to polars for analysis
laps_polars = convert_backend(laps_pandas, "polars")
# Perform polars operations
import polars as pl
fast_laps = laps_polars.filter(pl.col("LapTimeSeconds") < 120.0)
# Convert back to pandas if needed
fast_laps_pandas = convert_backend(fast_laps, "pandas")
Custom JSON Processing
from tif1.core_utils.json_utils import json_loads, json_dumps
# Parse a JSON payload (e.g., one fetched from the CDN)
json_data = '{"laps": [{"lap": 1, "time": 90.5}]}'
data = json_loads(json_data)
# Modify data
data["laps"][0]["time"] = 89.5
# Serialize back
modified_json = json_dumps(data)
Best Practices
- Use orjson for JSON: Always use json_loads/json_dumps for performance
- Prefer PyArrow conversion: Keep use_pyarrow=True for lib conversion
- Validate early: Use validation helpers to catch errors early
- Let the library handle resources: ResourceManager cleans up automatically, so manual cleanup is rarely needed
- Use constants for column names: Reference standard column names from constants
Summary
The core_utils package provides:
- High-performance JSON parsing with orjson
- Efficient lib conversion (pandas ↔ polars)
- Data validation utilities
- Column name standardization
- Resource management
- Internal helpers for DataFrame operations
These utilities underpin the library’s performance and reliability.