Overview
The HTTP and networking layer is the foundation of tif1’s data fetching infrastructure. It is built for performance, reliability, and resilience when loading Formula 1 telemetry and session data from remote CDN sources, where a cold start can require fetching 100+ telemetry files in parallel.
The layer is tuned for the access patterns of F1 data: high-concurrency parallel fetching, aggressive connection reuse, retries with backoff, and graceful degradation under failure. As a result, an entire race session (20 drivers × 50+ laps = 1000+ telemetry files) can load in 2-5 seconds on a typical broadband connection.
Architecture Philosophy
The networking layer is built on four core principles that guide every design decision:
- Performance First: Every design decision prioritizes speed and throughput. Connection pooling eliminates TCP handshake overhead (saving 30-50ms per request), HTTP/2 multiplexing allows multiple requests over a single connection, keep-alive optimization maintains persistent connections for up to 1000 requests, and zero-copy JSON parsing with orjson provides 2-5x faster parsing than the standard library. These optimizations compound to deliver sub-second response times even when fetching 100+ files in parallel.
- Resilience by Design: Network failures are expected, not exceptional. The circuit breaker pattern prevents cascading failures by blocking requests during sustained outages, exponential backoff with jitter reduces load on failing services while allowing recovery, and multi-CDN fallback ensures graceful degradation when primary sources fail. The system can tolerate complete CDN failures and automatically recover when services come back online.
- Resource Efficiency: Thread pools with dynamic sizing, aggressive connection reuse (90%+ reuse rates), and intelligent caching minimize resource consumption while supporting high concurrency. The system can handle 128+ concurrent requests during telemetry prefetch without overwhelming the connection pool or exhausting file descriptors.
- Observability: Real-time metrics for connection reuse, circuit breaker state, CDN health, and request throughput enable monitoring, debugging, and optimization. Automatic periodic logging provides visibility into system behavior without manual instrumentation.
Key Components
The networking stack provides a comprehensive set of features:
1. HTTP Session Management
The session layer uses niquests (a modern fork of requests with HTTP/2 and HTTP/3 support) with aggressive connection pooling and reuse strategies:
- Dynamic Connection Pooling: Pool size automatically scales based on configured concurrency (minimum 256 connections, up to 512+ for burst handling)
- Connection Keep-Alive: Persistent connections with configurable timeout (default: 120s) and max requests per connection (default: 1000)
- HTTP/2 and HTTP/3 Support: Modern protocol support with multiplexing for reduced latency and improved throughput
- DNS-over-HTTPS (DoH) Fallback: Automatic fallback to Cloudflare and Google DoH resolvers for improved reliability and privacy
- Thread-Safe Design: All session operations are protected by locks for safe concurrent access
- Connection Reuse Metrics: Real-time tracking of connection pool efficiency with reuse rate monitoring
2. Async Parallel Fetching
Built on Python’s asyncio with thread pool executors for maximum concurrency:
- Concurrent Request Handling: Fetch 20-128+ files simultaneously (configurable based on workload)
- Thread Pool Optimization: Shared executor with worker count matching concurrency requirements
- Process Pool JSON Parsing: Optional multi-process JSON parsing for large payloads (configurable, disabled by default for telemetry)
- Semaphore-Based Rate Limiting: Automatic concurrency control to prevent CDN throttling
- Graceful Error Handling: Failed requests return None instead of crashing, enabling partial data loading
- Memory-Efficient Batching: Support for processing large request lists in manageable chunks
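The combination of asyncio, a thread pool, and a semaphore described above can be sketched with the standard library alone. This is a minimal illustration, not tif1's implementation: `fetch_all`, `blocking_fetch`, and `fake_fetch` are hypothetical names, and the fake fetch function stands in for a real blocking HTTP call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


async def fetch_all(
    paths: list[str],
    blocking_fetch: Callable[[str], dict],
    max_concurrent: int = 20,
) -> list[Optional[dict]]:
    """Fetch every path concurrently; a failed fetch yields None."""
    semaphore = asyncio.Semaphore(max_concurrent)
    loop = asyncio.get_running_loop()

    async def fetch_one(path: str) -> Optional[dict]:
        async with semaphore:  # at most max_concurrent fetches in flight
            try:
                # Run the blocking call on a worker thread
                return await loop.run_in_executor(executor, blocking_fetch, path)
            except Exception:
                return None  # partial loading: one failure does not abort the batch

    with ThreadPoolExecutor(max_workers=max_concurrent) as executor:
        return await asyncio.gather(*(fetch_one(p) for p in paths))


def fake_fetch(path: str) -> dict:
    """Hypothetical stand-in for a blocking HTTP request."""
    if path == "laps_XXX.json":
        raise OSError("simulated network failure")
    return {"path": path}


results = asyncio.run(
    fetch_all(["drivers.json", "laps_XXX.json", "weather.json"], fake_fetch, max_concurrent=2)
)
print(results)
```

Results come back in input order, so a `None` entry can be matched to the path that failed.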
3. Intelligent Retry Logic
Sophisticated retry mechanism with exponential backoff, jitter, and multi-CDN fallback:
- Configurable Retry Count: Default 3 retries, supports zero-retry mode for ultra-fast cold starts
- Exponential Backoff: Delay increases exponentially (2^attempt seconds) with configurable factor
- Jitter Support: Random jitter (0-1s) prevents thundering herd problems
- Max Delay Cap: Configurable maximum delay (default: 60s) prevents excessive wait times
- CDN Fallback: Automatically tries all configured CDN sources before failing
- Smart Error Classification: Distinguishes between retryable errors (5xx, timeouts) and fatal errors (404, validation failures)
- Pool Exhaustion Handling: Automatic backoff when connection pool is exhausted (0.5-5s with jitter)
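The retry and fallback behavior listed above can be sketched as two nested loops: CDNs on the outside, attempts on the inside. The sketch below makes several assumptions that the list does not confirm: the exception classes and `fetch_with_fallback` are hypothetical, a fatal error skips the remaining retries on the current CDN but still lets later CDNs be tried, and the last error is re-raised when every source fails.

```python
import time
from typing import Callable, Optional, Sequence


class RetryableError(Exception):
    """Hypothetical: transient failure such as a 5xx response or a timeout."""


class FatalError(Exception):
    """Hypothetical: permanent failure such as a 404 or an invalid payload."""


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes],
    max_retries: int = 3,
    delay_for: Callable[[int], float] = lambda attempt: 2.0 ** attempt,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Try each CDN URL in order, retrying transient failures with backoff."""
    last_error: Optional[Exception] = None
    for url in urls:
        for attempt in range(max_retries + 1):  # first try plus retries
            try:
                return fetch(url)
            except FatalError as exc:
                last_error = exc
                break  # assumption: no retry here, move on to the next CDN
            except RetryableError as exc:
                last_error = exc
                if attempt < max_retries:  # no sleep after the final attempt
                    sleep(delay_for(attempt))
    if last_error is None:
        raise ValueError("no CDN URLs configured")
    raise last_error


calls: list[str] = []
sleeps: list[float] = []


def fake_fetch(url: str) -> bytes:
    """Hypothetical transport: the primary host always returns a 503."""
    calls.append(url)
    if "primary" in url:
        raise RetryableError("503 from primary")
    return b"{}"


payload = fetch_with_fallback(
    ["https://primary.example/f.json", "https://backup.example/f.json"],
    fake_fetch,
    max_retries=1,
    sleep=sleeps.append,  # record delays instead of actually sleeping
)
```

With `max_retries=0` the inner loop runs once per CDN and never sleeps, which corresponds to the zero-retry mode.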
4. Circuit Breaker Pattern
Prevents cascading failures and protects against sustained network issues:
- Failure Threshold: Opens after 5 consecutive failures (configurable)
- Timeout Period: Blocks requests for 60 seconds when open (configurable)
- Half-Open State: Automatically tests recovery with single request
- Thread-Safe State Transitions: Atomic state changes using reentrant locks
- Monotonic Time Tracking: Uses time.monotonic() for accurate timeout tracking
- Per-Request Protection: All HTTP requests automatically protected by circuit breaker
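The states and transitions listed above can be illustrated with a small class. This is a sketch under stated assumptions, not tif1's circuit breaker: the class and method names are hypothetical, and the injectable `clock` parameter exists only so the example can run without waiting.

```python
import threading
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal closed / open / half-open breaker guarded by a reentrant lock."""

    def __init__(self, threshold: int = 5, timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.timeout = timeout
        self._clock = clock  # monotonic, so wall-clock changes cannot skew it
        self._lock = threading.RLock()
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        with self._lock:
            if self._opened_at is None:
                return "closed"
            if self._clock() - self._opened_at >= self.timeout:
                return "half-open"
            return "open"

    def allow_request(self) -> bool:
        with self._lock:
            state = self.state  # the RLock permits this nested acquisition
            if state == "closed":
                return True
            if state == "half-open" and not self._probe_in_flight:
                self._probe_in_flight = True  # let exactly one probe through
                return True
            return False

    def record_success(self) -> None:
        with self._lock:
            self._failures = 0
            self._opened_at = None
            self._probe_in_flight = False

    def record_failure(self) -> None:
        with self._lock:
            self._failures += 1
            self._probe_in_flight = False
            if self._failures >= self.threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed probe


now = [0.0]  # fake clock for the demonstration
breaker = CircuitBreaker(threshold=2, timeout=60.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
states = [breaker.state]
now[0] = 61.0  # the timeout has elapsed
states.append(breaker.state)
first_probe = breaker.allow_request()
second_probe = breaker.allow_request()
breaker.record_success()
states.append(breaker.state)
```

A failed probe in the half-open state restarts the timeout, so a service that is still down is tested at most once per timeout period.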
5. Multi-CDN Fallback System
Robust CDN management with automatic failover, health tracking, and optional minification:
- Multiple CDN Support: Configure multiple CDN sources with priority ordering via the cdns config array
- Automatic Failover: Seamlessly switches to backup CDNs on failure without user intervention
- Health Tracking: Tracks failure counts per CDN, automatically disables after 3 consecutive failures
- jsDelivr Optimization: Primary CDN with optional minification support (20-40% size reduction for JSON files)
- URL Formatting: Automatic URL construction for TracingInsights data repositories with format {cdn_base}/{year}@main/{gp}/{session}/{path}
- Dynamic Enable/Disable: CDNs can be dynamically disabled based on error patterns (e.g., HTTP/2 multiplexing incompatibility after redirects)
- Minification Support: Optional automatic minification for jsDelivr CDN (transforms file.json → file.min.json)
- Failure Recovery: Automatic reset of failure counts on successful requests, allowing previously failed CDNs to recover
- Priority Ordering: CDNs are tried in priority order (lower priority number = higher precedence)
CDN Configuration:
The default CDN configuration uses jsDelivr as the primary source:
{
"cdns": ["https://cdn.jsdelivr.net/gh/TracingInsights"],
"cdn_use_minification": false
}
You can configure multiple CDNs for redundancy:
{
"cdns": [
"https://cdn.jsdelivr.net/gh/TracingInsights",
"https://your-custom-cdn.com/TracingInsights"
],
"cdn_use_minification": true
}
Minification Benefits:
When cdn_use_minification is enabled with jsDelivr, JSON files are automatically requested in minified form:
- Original: https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Bahrain/Race/laps_VER.json
- Minified: https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Bahrain/Race/laps_VER.min.json
This can reduce file sizes by 20-40%, significantly improving load times on slow connections. However, minification adds a small processing overhead on the CDN side, so it’s disabled by default for optimal performance on fast connections.
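The URL format and the minification rewrite can be expressed as one small function. This is an illustrative sketch: `build_url` is a hypothetical name, and the choices to percent-encode path segments and to detect jsDelivr by host name are assumptions rather than documented behavior.

```python
from urllib.parse import quote


def build_url(cdn_base: str, year: int, gp: str, session: str, path: str,
              use_minification: bool = False) -> str:
    """Build {cdn_base}/{year}@main/{gp}/{session}/{path}."""
    is_jsdelivr = "cdn.jsdelivr.net" in cdn_base  # assumption: detect by host name
    if (use_minification and is_jsdelivr
            and path.endswith(".json") and not path.endswith(".min.json")):
        path = path[: -len(".json")] + ".min.json"  # file.json -> file.min.json
    # Assumption: percent-encode each segment so names with spaces stay valid
    segments = [quote(gp), quote(session), quote(path)]
    return f"{cdn_base.rstrip('/')}/{year}@main/" + "/".join(segments)


base = "https://cdn.jsdelivr.net/gh/TracingInsights"
original = build_url(base, 2024, "Bahrain", "Race", "laps_VER.json")
minified = build_url(base, 2024, "Bahrain", "Race", "laps_VER.json", use_minification=True)
print(original)
print(minified)
```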
CDN Health Management:
The CDN manager tracks failures per source and automatically disables CDNs that fail repeatedly:
from tif1.cdn import get_cdn_manager

cdn_manager = get_cdn_manager()

# Check available CDNs
sources = cdn_manager.get_sources()
for source in sources:
    print(f"{source.name}: enabled={source.enabled}, priority={source.priority}")

# Reset failure counts (useful after network recovery)
cdn_manager.reset()
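The failure-tracking rules (priority ordering, disabling after 3 consecutive failures, resetting the count on success) can be sketched independently of the library. The classes below are hypothetical stand-ins, not tif1's CDN manager; only `get_sources()`, `reset()`, and the `name`, `enabled`, and `priority` attributes mirror names used in the example above, and the sketch is not thread-safe.

```python
from dataclasses import dataclass


@dataclass
class CdnSource:
    name: str
    base_url: str
    priority: int  # lower number = tried first
    failures: int = 0
    enabled: bool = True


class CdnHealthTracker:
    """Hypothetical tracker: disables a source after repeated failures."""

    def __init__(self, sources: list[CdnSource], max_failures: int = 3) -> None:
        self._sources = sorted(sources, key=lambda s: s.priority)
        self._max_failures = max_failures

    def get_sources(self) -> list[CdnSource]:
        """Return enabled sources in priority order."""
        return [s for s in self._sources if s.enabled]

    def mark_failure(self, source: CdnSource) -> None:
        source.failures += 1
        if source.failures >= self._max_failures:
            source.enabled = False

    def mark_success(self, source: CdnSource) -> None:
        source.failures = 0  # consecutive-failure count starts over

    def reset(self) -> None:
        for source in self._sources:
            source.failures = 0
            source.enabled = True


backup = CdnSource("backup", "https://backup-cdn.example.com/TracingInsights", priority=2)
primary = CdnSource("jsdelivr", "https://cdn.jsdelivr.net/gh/TracingInsights", priority=1)
tracker = CdnHealthTracker([backup, primary])

for _ in range(3):
    tracker.mark_failure(primary)
after_failures = [s.name for s in tracker.get_sources()]

tracker.reset()
after_reset = [s.name for s in tracker.get_sources()]
```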
Unsupported CDNs:
The following CDN sources are explicitly blocked due to rate limiting or compatibility issues:
raw.githubusercontent.com - GitHub’s raw content CDN has aggressive rate limiting (60 requests/hour for unauthenticated users)
Always use a proper CDN like jsDelivr that’s designed for high-volume content delivery.
6. Rate Limiting and Concurrency Control
Prevents CDN throttling and manages resource consumption:
- Configurable Concurrency: Default 20 concurrent requests, up to 128+ for telemetry prefetch
- Semaphore-Based Control: Async semaphores limit concurrent operations
- Per-Operation Limits: Different limits for different workloads (lap data vs telemetry)
- Backoff on Exhaustion: Automatic backoff when limits are reached
7. Monitoring and Observability
Real-time metrics for optimization and debugging:
- Connection Reuse Tracking: Monitor connection pool efficiency (reuse rate, total requests, pools created)
- Periodic Logging: Automatic statistics logging at configurable intervals (default: 60s)
- Circuit Breaker State: Inspect current state (closed, open, half-open) and failure counts
- CDN Health Metrics: Track per-CDN failure counts and enabled status
Use Cases and Optimizations
The networking layer is specifically optimized for F1 data loading scenarios:
Cold Start Optimization (100+ Telemetry Files)
When loading a session for the first time, tif1 may need to fetch 100+ telemetry files in parallel:
- High Concurrency: 128+ concurrent requests during telemetry prefetch
- Connection Reuse: Persistent connections eliminate TCP handshake overhead (saving 30-50ms per request)
- HTTP/2 Multiplexing: Multiple requests over single connection reduce latency
- Zero-Retry Mode: Ultra-fast mode tries each CDN once without delays (configurable via max_retries=0)
- Process Pool Parsing: Optional multi-process JSON parsing for large payloads (disabled by default for telemetry due to IPC overhead)
Large JSON Payloads (Lap Data, Telemetry Arrays)
Lap and telemetry data can contain thousands of data points:
- Streaming JSON Parsing: Uses orjson for fast, memory-efficient parsing
- Zero-Copy Operations: Direct byte buffer parsing where possible
- Validation Pipeline: Optional Pydantic validation (can be skipped with validate_payload=False for performance)
- Cache Integration: Parsed data automatically cached to avoid re-fetching
Network Failure Scenarios
Graceful degradation when CDN sources fail or network is unstable:
- Multi-CDN Fallback: Automatically tries backup CDNs (jsDelivr → custom CDNs)
- Circuit Breaker Protection: Prevents cascading failures by blocking requests during sustained outages
- Exponential Backoff: Reduces load on failing services while allowing recovery
- Partial Data Loading: Failed requests return None, allowing successful data to be used
Resource-Constrained Environments
Optimizations for limited bandwidth or CPU:
- Configurable Concurrency: Reduce concurrent requests (5-10) for slow connections
- Connection Pool Sizing: Adjust pool size based on available resources
- Timeout Configuration: Increase timeouts (60s+) for slow networks
- Minification Support: Enable CDN minification for 20-40% bandwidth savings
Thread Safety and Resource Management
All networking components are designed for safe concurrent use:
- Thread-Safe Sessions: HTTP session protected by locks for concurrent access
- Reentrant Locks: Circuit breaker uses reentrant locks for nested call support
- Atomic State Transitions: All state changes are made while holding a lock, so other threads never observe a partial update
- Automatic Cleanup: Resources automatically released via atexit handlers
- Graceful Shutdown: Executors and sessions properly closed on program exit
Configuration
The networking layer is configurable via environment variables or a .tif1rc file. All networking-related settings can be tuned for your specific use case, from ultra-fast cold starts to resource-constrained environments.
Configuration File Location:
The .tif1rc file is a JSON configuration file that can be placed in:
- Path specified by the TIF1_CONFIG_FILE environment variable (highest priority)
- Current working directory (if TIF1_TRUST_CWD_CONFIG=true)
- Home directory (~/.tif1rc) (default location)
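The lookup order above can be sketched as a short function. This is an assumption-laden illustration rather than tif1's loader: `find_config_file` is a hypothetical name, the working-directory file is assumed to be named .tif1rc, and an explicit TIF1_CONFIG_FILE path is returned without checking that it exists.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def find_config_file(env: Mapping[str, str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first config file found, following the documented priority."""
    explicit = env.get("TIF1_CONFIG_FILE")
    if explicit:  # 1. explicit path wins
        return Path(explicit)
    if env.get("TIF1_TRUST_CWD_CONFIG", "").lower() == "true":
        candidate = cwd / ".tif1rc"  # 2. working directory, only when trusted
        if candidate.is_file():
            return candidate
    candidate = home / ".tif1rc"  # 3. home directory default
    return candidate if candidate.is_file() else None


# Demonstration with throwaway directories instead of the real cwd and home
with tempfile.TemporaryDirectory() as tmp:
    cwd, home = Path(tmp, "project"), Path(tmp, "home")
    cwd.mkdir()
    home.mkdir()
    (cwd / ".tif1rc").write_text("{}")
    (home / ".tif1rc").write_text("{}")

    untrusted = find_config_file({}, cwd, home)
    trusted = find_config_file({"TIF1_TRUST_CWD_CONFIG": "true"}, cwd, home)
    explicit = find_config_file({"TIF1_CONFIG_FILE": "/etc/tif1.json"}, cwd, home)
    picked = (
        untrusted == home / ".tif1rc",
        trusted == cwd / ".tif1rc",
        explicit == Path("/etc/tif1.json"),
    )
```

In real use the arguments would be `os.environ`, `Path.cwd()`, and `Path.home()`.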
Complete Networking Configuration:
{
"timeout": 30,
"max_retries": 3,
"retry_backoff_factor": 2.0,
"retry_jitter": true,
"retry_jitter_max": 1.0,
"max_retry_delay": 60.0,
"circuit_breaker_threshold": 5,
"circuit_breaker_timeout": 60,
"max_workers": 20,
"max_concurrent_requests": 20,
"pool_connections": 256,
"pool_maxsize": 512,
"keepalive_timeout": 120,
"keepalive_max_requests": 1000,
"http_multiplexed": true,
"http_disable_http3": false,
"http_resolvers": ["standard", "doh://cloudflare", "doh://google"],
"cdns": ["https://cdn.jsdelivr.net/gh/TracingInsights"],
"cdn_use_minification": false,
"pool_exhaustion_backoff_base": 0.5,
"pool_exhaustion_backoff_max": 5.0,
"pool_exhaustion_backoff_jitter": 0.5,
"json_parse_workers": 0,
"telemetry_prefetch_max_concurrent_requests": 128,
"connection_stats_log_interval": 60.0
}
Configuration Options Explained:
Basic HTTP Settings:
timeout (int, default: 30): Request timeout in seconds. Increase for slow connections or large files.
max_retries (int, default: 3): Maximum retry attempts per CDN. Set to 0 for zero-retry mode (ultra-fast cold start).
max_workers (int, default: 20): Thread pool size for async operations. Should match or exceed max_concurrent_requests.
Retry Strategy:
retry_backoff_factor (float, default: 2.0): Exponential backoff multiplier. Delay = factor^attempt seconds.
retry_jitter (bool, default: true): Add random jitter to backoff delays to prevent thundering herd.
retry_jitter_max (float, default: 1.0): Maximum jitter in seconds (random value between 0 and this).
max_retry_delay (float, default: 60.0): Maximum backoff delay in seconds (caps exponential growth).
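The four retry settings combine into a single delay calculation. The sketch below assumes the cap is applied to the exponential term before jitter is added; the source does not state the order, and `backoff_delay` is a hypothetical helper name.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    factor: float = 2.0,        # retry_backoff_factor
    jitter: bool = True,        # retry_jitter
    jitter_max: float = 1.0,    # retry_jitter_max
    max_delay: float = 60.0,    # max_retry_delay
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(factor ** attempt, max_delay)  # exponential growth, capped
    if jitter:
        delay += rng() * jitter_max  # uniform in [0, jitter_max)
    return delay


# Deterministic schedule with jitter disabled
schedule = [backoff_delay(attempt, jitter=False) for attempt in range(8)]
print(schedule)

# With jitter, attempt 3 falls between 8 and 9 seconds
jittered = backoff_delay(3)
```

With the default factor of 2.0 the cap takes effect from the seventh attempt, where 2^6 = 64 exceeds the 60-second maximum.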
Circuit Breaker:
circuit_breaker_threshold (int, default: 5): Number of consecutive failures before opening circuit.
circuit_breaker_timeout (int, default: 60): Seconds to wait before testing recovery (half-open state).
Connection Pooling:
pool_connections (int, default: 256): Number of connection pools to maintain. Auto-calculated if not set.
pool_maxsize (int, default: 512): Maximum connections per pool. Should be 2-4x pool_connections.
keepalive_timeout (int, default: 120): Keep-alive timeout in seconds for persistent connections.
keepalive_max_requests (int, default: 1000): Maximum requests per connection before recycling.
Concurrency Control:
max_concurrent_requests (int, default: 20): Maximum concurrent HTTP requests for general operations.
telemetry_prefetch_max_concurrent_requests (int, default: 128): Maximum concurrent requests for telemetry prefetch.
Protocol Configuration:
http_multiplexed (bool, default: true): Enable HTTP/2 multiplexing for reduced latency.
http_disable_http3 (bool, default: false): Disable HTTP/3 if experiencing compatibility issues.
http_resolvers (array, default: ["standard", "doh://cloudflare", "doh://google"]): DNS resolver priority order.
CDN Configuration:
cdns (array, default: ["https://cdn.jsdelivr.net/gh/TracingInsights"]): CDN sources in priority order.
cdn_use_minification (bool, default: false): Enable automatic minification for jsDelivr (20-40% size reduction).
Pool Exhaustion Handling:
pool_exhaustion_backoff_base (float, default: 0.5): Base backoff delay when pool is exhausted.
pool_exhaustion_backoff_max (float, default: 5.0): Maximum backoff delay for pool exhaustion.
pool_exhaustion_backoff_jitter (float, default: 0.5): Random jitter for pool exhaustion backoff.
Advanced Performance:
json_parse_workers (int, default: 0): Number of worker processes for JSON parsing. 0 = disabled (recommended for telemetry).
connection_stats_log_interval (float, default: 60.0): Seconds between automatic connection stats logging.
Environment Variable Equivalents:
All configuration options can be set via environment variables with the TIF1_ prefix:
# Basic HTTP settings
export TIF1_TIMEOUT=30
export TIF1_MAX_RETRIES=3
export TIF1_MAX_WORKERS=20
# Retry strategy
export TIF1_RETRY_BACKOFF_FACTOR=2.0
export TIF1_RETRY_JITTER=true
export TIF1_RETRY_JITTER_MAX=1.0
export TIF1_MAX_RETRY_DELAY=60.0
# Circuit breaker
export TIF1_CIRCUIT_BREAKER_THRESHOLD=5
export TIF1_CIRCUIT_BREAKER_TIMEOUT=60
# Connection pooling
export TIF1_POOL_CONNECTIONS=256
export TIF1_POOL_MAXSIZE=512
export TIF1_KEEPALIVE_TIMEOUT=120
export TIF1_KEEPALIVE_MAX_REQUESTS=1000
# Concurrency
export TIF1_MAX_CONCURRENT_REQUESTS=20
export TIF1_TELEMETRY_PREFETCH_MAX_CONCURRENT_REQUESTS=128
# Protocol
export TIF1_HTTP_MULTIPLEXED=true
export TIF1_HTTP_DISABLE_HTTP3=false
export TIF1_HTTP_RESOLVERS="standard,doh://cloudflare,doh://google"
# CDN
export TIF1_CDNS="https://cdn.jsdelivr.net/gh/TracingInsights"
export TIF1_CDN_USE_MINIFICATION=false
# Pool exhaustion
export TIF1_POOL_EXHAUSTION_BACKOFF_BASE=0.5
export TIF1_POOL_EXHAUSTION_BACKOFF_MAX=5.0
export TIF1_POOL_EXHAUSTION_BACKOFF_JITTER=0.5
# Advanced
export TIF1_JSON_PARSE_WORKERS=0
export TIF1_CONNECTION_STATS_LOG_INTERVAL=60.0
Configuration Presets:
Ultra-Fast Cold Start (Zero-Retry Mode):
{
"max_retries": 0,
"max_concurrent_requests": 128,
"telemetry_prefetch_max_concurrent_requests": 256,
"pool_connections": 512,
"pool_maxsize": 2048
}
Slow Connection (Conservative):
{
"timeout": 60,
"max_retries": 5,
"max_concurrent_requests": 5,
"telemetry_prefetch_max_concurrent_requests": 10,
"retry_backoff_factor": 3.0
}
Resource-Constrained (Low Memory/CPU):
{
"max_workers": 5,
"max_concurrent_requests": 5,
"telemetry_prefetch_max_concurrent_requests": 10,
"pool_connections": 50,
"pool_maxsize": 100,
"json_parse_workers": 0
}
High-Reliability (Maximum Retries):
{
"max_retries": 10,
"retry_backoff_factor": 1.5,
"max_retry_delay": 120.0,
"circuit_breaker_threshold": 10,
"circuit_breaker_timeout": 120,
"cdns": [
"https://cdn.jsdelivr.net/gh/TracingInsights",
"https://backup-cdn.example.com/TracingInsights"
]
}
See Configuration for complete details on all configuration options.
HTTP Session Management
The HTTP session layer provides a shared, thread-safe niquests.Session instance with optimized configuration for F1 data fetching. The session is lazily initialized on first use and automatically cleaned up on program exit.
get_session
Get or create the global HTTP session instance with connection pooling and DoH fallback.
def get_session() -> Any # Returns niquests.Session
This function implements a thread-safe singleton pattern, ensuring only one session instance exists across the entire application. The session is configured with:
Connection Pooling:
- Dynamic pool sizing based on configured concurrency levels
- Minimum 256 connections for optimal performance
- Maximum pool size of 512+ for burst handling
- Non-blocking pool behavior (pool_block=False) to prevent deadlocks
Keep-Alive Configuration:
- Persistent connections with 120-second timeout (configurable via keepalive_timeout)
- Maximum 1000 requests per connection (configurable via keepalive_max_requests)
- Automatic connection recycling when limits are reached
DNS Resolution:
- Primary: Standard system DNS resolver (uses OS-configured DNS servers)
- Fallback 1: Cloudflare DNS-over-HTTPS (doh://cloudflare - 1.1.1.1)
- Fallback 2: Google DNS-over-HTTPS (doh://google - 8.8.8.8)
- Configurable via the http_resolvers config option (array of resolver strings)
- Automatic fallback if primary resolver fails during session initialization
- DoH provides improved privacy and reliability compared to traditional DNS
DNS-over-HTTPS (DoH) Benefits:
DoH encrypts DNS queries, providing several advantages:
- Privacy: DNS queries are encrypted in transit, so on-path observers such as ISPs can’t read or modify them (the DoH provider still sees the queries)
- Reliability: Cloudflare and Google DNS have 99.99%+ uptime
- Performance: Global anycast networks provide low-latency resolution
- Security: Both resolvers perform DNSSEC validation, which helps prevent DNS spoofing attacks
Configuring DNS Resolvers:
You can customize the DNS resolver order in your .tif1rc file:
{
"http_resolvers": ["standard", "doh://cloudflare", "doh://google"]
}
Or via environment variable:
export TIF1_HTTP_RESOLVERS="standard,doh://cloudflare,doh://google"
Valid resolver values:
"standard", "default", "none", or "" - Use system DNS
"doh://cloudflare" - Cloudflare DNS-over-HTTPS (1.1.1.1)
"doh://google" - Google DNS-over-HTTPS (8.8.8.8)
The session will try each resolver in order until one succeeds. If all resolvers fail, a NetworkError is raised.
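The try-each-resolver behavior can be sketched without touching the HTTP library. Everything here is a stand-in: `NetworkError` is a local placeholder for tif1's exception, `create_with_fallback` and `make_session` are hypothetical names, and the fake factory simulates a failing system resolver instead of building a real session.

```python
from typing import Callable, Optional, Sequence


class NetworkError(Exception):
    """Local stand-in for tif1's NetworkError, defined here for illustration."""


SYSTEM_DNS = {"standard", "default", "none", ""}


def normalize_resolver(value: str) -> Optional[str]:
    """Map the documented aliases for system DNS to None."""
    cleaned = value.strip().lower()
    return None if cleaned in SYSTEM_DNS else cleaned


def create_with_fallback(
    resolvers: Sequence[str],
    make_session: Callable[[Optional[str]], object],
) -> object:
    """Try each resolver in order and return the first session that builds."""
    errors = []
    for resolver in resolvers:
        try:
            return make_session(normalize_resolver(resolver))
        except Exception as exc:  # remember why this resolver was skipped
            errors.append(f"{resolver!r}: {exc}")
    raise NetworkError("all resolvers failed: " + "; ".join(errors))


def fake_factory(resolver: Optional[str]) -> str:
    """Hypothetical factory: pretend the system resolver is broken."""
    if resolver is None:
        raise OSError("system DNS unavailable")
    return f"session via {resolver}"


session = create_with_fallback(["standard", "doh://cloudflare", "doh://google"], fake_factory)

try:
    create_with_fallback(["standard", "default"], fake_factory)
    all_failed = False
except NetworkError:
    all_failed = True
```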
Protocol Support:
- HTTP/2 with multiplexing (configurable via http_multiplexed, default: true)
  - Multiple requests over a single TCP connection
  - Reduces latency by eliminating TCP handshake overhead
  - Header compression reduces bandwidth usage
  - Server push support (though not used by current CDNs)
  - Automatic protocol negotiation via ALPN
- HTTP/3 support (can be disabled via http_disable_http3, default: false)
  - QUIC-based protocol for improved performance over lossy networks
  - Faster connection establishment (0-RTT)
  - Better handling of packet loss
  - Automatic fallback to HTTP/2 if HTTP/3 is unavailable
- Automatic protocol negotiation
  - Client and server negotiate the best available protocol
  - Transparent fallback from HTTP/3 → HTTP/2 → HTTP/1.1
HTTP/2 Multiplexing Benefits:
HTTP/2 multiplexing is a game-changer for F1 data loading:
- Single Connection: All requests to a CDN use one TCP connection instead of 20+
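- No Application-Level Head-of-Line Blocking: A slow response doesn’t block other requests on the same connection (packet loss can still stall all streams at the TCP level)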
- Reduced Latency: Eliminates TCP handshake overhead (30-50ms per request)
- Lower Resource Usage: Fewer file descriptors and socket buffers
For telemetry prefetch (100+ files), HTTP/2 multiplexing can reduce total load time by 40-60% compared to HTTP/1.1.
Multiplexing Compatibility:
Some CDNs have issues with HTTP/2 multiplexing after redirects. If you encounter errors like "non-multiplexed response after a redirect", the CDN will be automatically disabled. You can also manually disable multiplexing:
{
"http_multiplexed": false
}
Or via environment variable:
export TIF1_HTTP_MULTIPLEXED=false
HTTP/3 Considerations:
HTTP/3 provides benefits for high-latency or lossy networks, but may have compatibility issues with some CDNs or corporate firewalls. If you experience connection issues, try disabling HTTP/3:
{
"http_disable_http3": true
}
Or via environment variable:
export TIF1_HTTP_DISABLE_HTTP3=true
Performance Optimizations:
- trust_env=False to skip costly environment variable resolution on every request
- Custom User-Agent header for CDN optimization
- Connection reuse tracking for monitoring
Returns:
- Shared niquests.Session instance (thread-safe, lazily initialized)
Raises:
NetworkError: If session creation fails with all configured DNS resolvers
Thread Safety:
- Uses double-checked locking pattern for safe concurrent initialization
- All session operations are thread-safe
- Multiple threads can safely call get_session() simultaneously
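The double-checked locking pattern mentioned above looks roughly like this. It is a generic sketch, not the body of get_session(): `get_shared_session` and `_create_session` are hypothetical names, and a plain object stands in for the real session.

```python
import threading

_session = None
_session_lock = threading.Lock()
created = 0  # counts how many times the factory actually ran


def _create_session() -> object:
    """Hypothetical factory standing in for real session construction."""
    global created
    created += 1
    return object()


def get_shared_session() -> object:
    """Return the singleton, creating it at most once across all threads."""
    global _session
    if _session is None:  # first check: fast path, no lock taken
        with _session_lock:
            if _session is None:  # second check: another thread may have won
                _session = _create_session()
    return _session


seen = []
threads = [
    threading.Thread(target=lambda: seen.append(get_shared_session()))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The unlocked first check keeps the common case cheap once the session exists, while the second check under the lock prevents two threads from both creating one.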
Example:
from tif1.http_session import get_session
# Get shared session (thread-safe)
session = get_session()
# Make requests (connection automatically reused)
response1 = session.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Bahrain/Race/drivers.json")
response2 = session.get("https://cdn.jsdelivr.net/gh/TracingInsights/2024@main/Bahrain/Race/laps_VER.json")
# Connections are automatically reused between requests
data1 = response1.json()
data2 = response2.json()
The HTTP session is automatically managed and thread-safe. You rarely need to interact with it directly - the async fetch layer handles all session management internally.
Do not close the session manually unless you’re certain no other code is using it. The session is shared across the entire application and automatically cleaned up on program exit.
close_session
Close the global HTTP session and release all connections.
def close_session() -> None
This function performs a graceful shutdown of the HTTP session, closing all active connections and releasing resources. It’s automatically registered with atexit for cleanup on program exit, so manual calls are rarely necessary.
Behavior:
- Closes all connections in the connection pool
- Releases socket file descriptors
- Clears internal session state
- Thread-safe with lock protection
- Idempotent (safe to call multiple times)
When to Use:
- Testing scenarios where you need to reset session state
- Long-running applications that need to free resources
- Before forking processes (to avoid sharing file descriptors)
Example:
from tif1.http_session import close_session

# tif1 already registers close_session with atexit, so an explicit
# atexit.register(close_session) call is not needed.

# Manual cleanup (rarely needed, e.g. in tests or before forking)
close_session()
Session cleanup is automatically registered with atexit, so manual cleanup is usually unnecessary. The session will be properly closed when your program exits.
get_connection_stats
Get current connection pool statistics for monitoring, debugging, and optimization.
def get_connection_stats() -> dict[str, Any]
This function provides real-time metrics about connection pool usage, helping you understand how effectively connections are being reused. High reuse rates (>80%) indicate optimal performance, while low rates may suggest configuration issues.
Returns:
Dictionary with the following metrics:
total_requests (int): Total number of HTTP requests made since session creation
connections_reused (int): Number of requests that reused existing connections from the pool
connections_created (int): Number of connection pools created (typically 1 for the shared session)
reuse_rate (float): Percentage of requests that reused connections (0-100)
Reuse Rate Interpretation:
- 90-100%: Excellent - optimal connection reuse
- 70-90%: Good - most connections are reused
- 50-70%: Fair - some optimization possible
- <50%: Poor - check pool configuration or request patterns
Example:
from tif1.http_session import get_connection_stats
# Fetch some data first
import tif1
session = tif1.get_session(2024, "Bahrain", "Race")
session.load()
# Check connection pool efficiency
stats = get_connection_stats()
print(f"Total requests: {stats['total_requests']}")
print(f"Connections reused: {stats['connections_reused']}")
print(f"Reuse rate: {stats['reuse_rate']:.1f}%")
# Example output:
# Total requests: 156
# Connections reused: 148
# Reuse rate: 94.9%
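The interpretation bands can be turned into a small helper for dashboards or alerts. This sketch assumes reuse_rate is simply reused requests divided by total requests, which matches the example output above; both function names are hypothetical.

```python
def reuse_rate(total_requests: int, connections_reused: int) -> float:
    """Percentage of requests that reused a pooled connection."""
    if total_requests == 0:
        return 0.0  # avoid dividing by zero before any request is made
    return 100.0 * connections_reused / total_requests


def classify_reuse_rate(rate: float) -> str:
    """Map a reuse rate (0-100) onto the documented bands."""
    if rate >= 90:
        return "excellent"
    if rate >= 70:
        return "good"
    if rate >= 50:
        return "fair"
    return "poor"


rate = reuse_rate(total_requests=156, connections_reused=148)
print(f"{rate:.1f}% -> {classify_reuse_rate(rate)}")
```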
Monitoring in Production:
import logging
from tif1.http_session import get_connection_stats
logger = logging.getLogger(__name__)
def log_connection_efficiency():
    """Log connection pool efficiency for monitoring."""
    stats = get_connection_stats()
    if stats['reuse_rate'] < 70:
        logger.warning(
            f"Low connection reuse rate: {stats['reuse_rate']:.1f}% "
            f"({stats['connections_reused']}/{stats['total_requests']} requests)"
        )
    else:
        logger.info(
            f"Connection pool healthy: {stats['reuse_rate']:.1f}% reuse rate"
        )
Connection statistics are automatically logged every 60 seconds (configurable via connection_stats_log_interval). You can use this function for on-demand monitoring or custom logging.
reset_connection_stats
Reset connection pool statistics to zero (primarily for testing).
def reset_connection_stats() -> None
This function clears all connection metrics, resetting counters to zero. It’s primarily useful in testing scenarios where you need to measure statistics for a specific operation without interference from previous requests.
Behavior:
- Resets total_requests to 0
- Resets connections_reused to 0
- Resets connections_created to 0
- Resets last log time to 0
- Thread-safe with lock protection
Example:
from tif1.http_session import reset_connection_stats, get_connection_stats
# Reset stats before measurement
reset_connection_stats()
# Perform operations to measure
# ... fetch data ...
# Check stats for just this operation
stats = get_connection_stats()
print(f"Operation used {stats['total_requests']} requests")
This function is primarily for testing. Resetting stats in production may interfere with monitoring and logging systems.
Async Parallel Fetching
The async fetch module provides high-performance parallel data loading using Python’s asyncio with thread pool executors and optional process pool JSON parsing. This layer is specifically optimized for F1 data loading patterns, where cold starts may require fetching 100+ telemetry files simultaneously.
Architecture
The async fetch system uses a multi-layered architecture:
- Async Coordination Layer: asyncio manages concurrent operations and task scheduling
- Thread Pool Executor: Handles blocking I/O operations (HTTP requests, file I/O, JSON parsing)
- Process Pool Executor: Optional multi-process JSON parsing for large payloads (configurable via json_parse_workers)
- HTTP Session Layer: Shared niquests.Session with connection pooling and reuse
- Cache Integration: Automatic cache reads/writes with memory and SQLite tiers
fetch_json_async
Asynchronously fetch and parse JSON data with caching, retry logic, CDN fallback, and validation.
async def fetch_json_async(
    year: int,
    gp: str,
    session: str,
    path: str,
    max_retries: int | None = None,
    timeout: int | None = None,
    *,
    use_cache: bool = True,
    write_cache: bool = True,
    validate_payload: bool = True
) -> dict[str, Any]
This is the core async fetching function that powers all data loading in tif1. It implements a sophisticated multi-stage pipeline optimized for performance and reliability.
Pipeline Stages:
- Cache Lookup: Checks in-memory cache first (<1ms), then SQLite cache (~5-10ms)
- Network Fetch: Tries primary CDN (jsDelivr), falls back to secondary CDNs on failure
- JSON Parsing: Uses orjson for fast parsing (2-5x faster than stdlib), optional process pool for large payloads
- Validation: Optional Pydantic validation (non-strict mode allows partial data)
- Cache Write: Persists to SQLite and updates in-memory cache for fast subsequent access
Zero-Retry Mode: When max_retries=0, the function tries each CDN once without delays, achieving 50-70% faster cold starts. This mode is ideal for ultra-fast startup when you can tolerate occasional transient failures.
Pool Exhaustion Handling: When the connection pool is exhausted, automatic backoff is applied (0.5-5s with jitter) before retrying, preventing cascading failures.
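The pipeline stages can be condensed into a synchronous sketch. Several simplifications are assumptions for illustration: plain dictionaries stand in for the in-memory and SQLite tiers, the standard json module replaces orjson, a disk hit is promoted into memory, and `load_json`, `fetch_bytes`, and `validate` are hypothetical names.

```python
import json
from typing import Any, Callable, MutableMapping, Optional


def load_json(
    key: str,
    memory_cache: MutableMapping[str, Any],
    disk_cache: MutableMapping[str, bytes],
    fetch_bytes: Callable[[str], bytes],
    validate: Optional[Callable[[Any], None]] = None,
    use_cache: bool = True,
    write_cache: bool = True,
) -> Any:
    """Cache lookup -> network fetch -> parse -> validate -> cache write."""
    if use_cache:
        if key in memory_cache:  # stage 1a: fastest tier
            return memory_cache[key]
        if key in disk_cache:  # stage 1b: persistent tier
            data = json.loads(disk_cache[key])
            memory_cache[key] = data  # assumption: promote to memory
            return data
    raw = fetch_bytes(key)  # stage 2: network (CDN fallback lives in here)
    data = json.loads(raw)  # stage 3: parse
    if validate is not None:
        validate(data)  # stage 4: raises on invalid payloads
    if write_cache:  # stage 5: populate both tiers
        disk_cache[key] = raw
        memory_cache[key] = data
    return data


memory: dict = {}
disk: dict = {}
calls: list = []


def fake_fetch(key: str) -> bytes:
    """Hypothetical network call that records each invocation."""
    calls.append(key)
    return b'{"DriverNumber": ["1", "44"]}'


first = load_json("drivers.json", memory, disk, fake_fetch)   # network
second = load_json("drivers.json", memory, disk, fake_fetch)  # memory hit
memory.clear()
third = load_json("drivers.json", memory, disk, fake_fetch)   # disk hit
fresh = load_json("drivers.json", memory, disk, fake_fetch, use_cache=False)
```

Only the first call and the `use_cache=False` call reach the network; the other two are served from the cache tiers.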
Parameters:
year (int): Season year (e.g., 2024)
gp (str): Grand Prix name (e.g., "Bahrain Grand Prix")
session (str): Session name (e.g., "Race", "Qualifying", "FP1", "FP2", "FP3", "Sprint")
path (str): Path to JSON file relative to session directory
  - Driver list: "drivers.json"
  - Lap data: "laps_VER.json", "laps_HAM.json", etc.
  - Telemetry: "telemetry_VER_1.json" (driver_lapnumber format)
  - Weather: "weather.json"
  - Race control: "rcm.json"
max_retries (int | None): Maximum retry attempts per CDN
  - Default: Uses max_retries config value (typically 3)
  - Set to 0 for zero-retry mode (ultra-fast cold start, tries each CDN once)
  - Set to 5+ for unreliable networks
  - Each retry includes exponential backoff with jitter
timeout (int | None): Request timeout in seconds
  - Default: Uses timeout config value (typically 30)
  - Increase to 60+ for slow connections or large files
  - Applies per-request, not total operation time
  - Timeout triggers retry with next CDN
use_cache (bool): If True, read from cache before network fetch
  - Default: True
  - Set to False to force fresh fetch (bypasses cache read entirely)
  - Automatically False in CI mode (ci_mode=True)
  - Cache lookup checks memory first, then SQLite
write_cache (bool): If True, persist successful network responses to cache
  - Default: True
  - Set to False for ephemeral data (testing, one-time queries)
  - Automatically False in CI mode (ci_mode=True)
  - Writes to both SQLite (persistent) and memory (fast subsequent access)
validate_payload (bool): If True, run Pydantic validation before returning data
  - Default: True
  - Set to False for 10-15% performance improvement
  - Validation is non-strict (allows missing optional fields)
  - Validates drivers, race control, weather, lap times, and telemetry based on file path
  - Sanitizes telemetry payloads (removes validator-only defaults that break DataFrame construction)
Returns:
- Parsed JSON data as dictionary (never None, raises on error)
- Dictionary structure depends on file type:
drivers.json: {"DriverNumber": [...], "BroadcastName": [...], "TeamName": [...], ...}
laps_*.json: {"LapNumber": [...], "LapTime": [...], "Sector1Time": [...], ...}
telemetry_*_*.json: {"Distance": [...], "Speed": [...], "RPM": [...], ...}
weather.json: {"Time": [...], "AirTemp": [...], "TrackTemp": [...], ...}
rcm.json: {"Time": [...], "Category": [...], "Message": [...], ...}
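The payloads listed above are column-oriented: every key maps to a list, and index i across all lists describes one record. A minimal sketch of turning such a payload into row records follows. The helper name `columns_to_rows` and the sample values are illustrative assumptions, not part of tif1.

```python
from typing import Any


def columns_to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Convert a column-oriented payload into a list of row dictionaries."""
    if not columns:
        return []
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have mismatched lengths: {sorted(lengths)}")
    keys = list(columns)
    # zip(*values) walks all columns in lockstep, yielding one tuple per record.
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


# Sample payload shaped like drivers.json (values chosen for illustration only).
sample = {
    "DriverNumber": ["1", "44"],
    "BroadcastName": ["M VERSTAPPEN", "L HAMILTON"],
    "TeamName": ["Red Bull Racing", "Mercedes"],
}
rows = columns_to_rows(sample)
print(rows[0]["BroadcastName"])
```

Row records are convenient for iteration, while the original columnar form is usually the better input for DataFrame construction.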
Raises:
NetworkError: If network request fails after all retries and CDN fallbacks
- Includes URL and status code (if available) for debugging
- Raised when all CDNs fail or circuit breaker is open
- Check circuit breaker state and CDN health if this occurs frequently
DataNotFoundError: If data doesn’t exist (404 from all CDNs)
- Includes year, event, and session for context
- Not retried (404 is a fatal error indicating missing data)
- Common for sessions that haven’t been run yet or invalid driver codes
InvalidDataError: If JSON parsing or validation fails
- Includes reason and file path for debugging
- Not retried (data corruption is a fatal error)
- May indicate CDN issues or incompatible data format
Performance Characteristics:
- Cache Hit (Memory): <1ms - fastest path, data already in memory
- Cache Hit (SQLite): ~5-10ms - fast path, data on disk
- Cache Miss (Single File): 100-300ms - network latency + parsing
- Cache Miss (100+ Files Parallel): 2-5 seconds - with connection reuse and HTTP/2 multiplexing
- Zero-Retry Mode: 50-70% faster cold starts - tries each CDN once, no delays between attempts
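The two cache-hit tiers above follow from the lookup order described for use_cache (memory first, then SQLite). The sketch below illustrates that order with the standard library only. The class name `TwoTierCache`, the table schema, and the key format are assumptions made for this example and may differ from what tif1 does internally.

```python
import json
import sqlite3
from typing import Any, Optional


class TwoTierCache:
    """Memory-first cache backed by SQLite (hypothetical, for illustration)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._memory = {}
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key: str) -> Optional[Any]:
        # Tier 1: in-process dictionary, the fastest path.
        if key in self._memory:
            return self._memory[key]
        # Tier 2: SQLite; promote hits into memory for the next lookup.
        row = self._db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        value = json.loads(row[0])
        self._memory[key] = value
        return value

    def set(self, key: str, value: Any) -> None:
        # Write to both tiers so later reads are served from memory.
        self._memory[key] = value
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )


cache = TwoTierCache()
cache.set("2024/Bahrain Grand Prix/Race/drivers.json", {"DriverNumber": ["1"]})
cache._memory.clear()  # simulate a fresh process: only SQLite still has the entry
print(cache.get("2024/Bahrain Grand Prix/Race/drivers.json"))
```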
Example - Basic Usage:
import asyncio
from tif1.async_fetch import fetch_json_async

async def load_drivers():
    """Load driver list for a session."""
    data = await fetch_json_async(
        2024, "Bahrain Grand Prix", "Race", "drivers.json"
    )
    print(f"Loaded {len(data['DriverNumber'])} drivers")
    for i, driver in enumerate(data['BroadcastName']):
        print(f"  {data['DriverNumber'][i]}: {driver} ({data['TeamName'][i]})")

asyncio.run(load_drivers())
Example - Custom Timeout for Slow Connections:
import asyncio
from tif1.async_fetch import fetch_json_async

async def load_with_long_timeout():
    """Load data with extended timeout for slow connections."""
    data = await fetch_json_async(
        2024, "Bahrain Grand Prix", "Race", "laps_VER.json",
        timeout=60  # 60 second timeout instead of default 30
    )
    print(f"Loaded {len(data['LapNumber'])} laps")
    return data

asyncio.run(load_with_long_timeout())
Example - Zero-Retry Mode (Ultra-Fast Cold Start):
import asyncio
from tif1.async_fetch import fetch_json_async

async def ultra_fast_load():
    """Load data with zero retries for fastest possible cold start."""
    data = await fetch_json_async(
        2024, "Bahrain Grand Prix", "Race", "drivers.json",
        max_retries=0  # Try each CDN once, no delays
    )
    return data

asyncio.run(ultra_fast_load())
Example - Bypass Cache for Fresh Data:
import asyncio
from tif1.async_fetch import fetch_json_async

async def force_fresh_fetch():
    """Force fresh fetch from network, bypassing cache entirely."""
    data = await fetch_json_async(
        2024, "Bahrain Grand Prix", "Race", "drivers.json",
        use_cache=False,   # Skip cache read
        write_cache=False  # Don't write to cache
    )
    return data

asyncio.run(force_fresh_fetch())
Example - Disable Validation for Maximum Performance:
import asyncio
from tif1.async_fetch import fetch_json_async

async def fast_load_no_validation():
    """Load data without validation for 10-15% performance improvement."""
    data = await fetch_json_async(
        2024, "Bahrain Grand Prix", "Race", "laps_VER.json",
        validate_payload=False  # Skip Pydantic validation
    )
    return data

asyncio.run(fast_load_no_validation())
Example - Comprehensive Error Handling:
import asyncio
from tif1.async_fetch import fetch_json_async
from tif1.exceptions import NetworkError, DataNotFoundError, InvalidDataError

async def robust_fetch():
    """Fetch with comprehensive error handling."""
    try:
        data = await fetch_json_async(
            2024, "Bahrain Grand Prix", "Race", "laps_VER.json"
        )
        print(f"Success: {len(data['LapNumber'])} laps")
        return data
    except DataNotFoundError as e:
        print(f"Data not found: {e.year} {e.event} {e.session}")
        print("This session may not have been run yet or driver code is invalid")
        return None
    except NetworkError as e:
        print(f"Network error: {e.url}")
        print(f"Status code: {e.status_code}")
        print("Check your internet connection, CDN health, or try again later")
        return None
    except InvalidDataError as e:
        print(f"Invalid data: {e.reason}")
        print("Data may be corrupted, try clearing cache or check CDN status")
        return None

asyncio.run(robust_fetch())
This function is the foundation of all data loading in tif1. The high-level Session.load() API uses this internally for all network operations. You rarely need to call it directly unless you’re building custom data loading logic.
Setting max_retries=0 enables zero-retry mode for ultra-fast cold starts, but increases the chance of transient failures. Use only when speed is critical and you can handle occasional failures gracefully.
fetch_multiple_async
Fetch multiple JSON files in parallel with optimized batch size, graceful error handling, and automatic rate limiting.
async def fetch_multiple_async(
    requests: list[tuple[int, str, str, str]],
    *,
    use_cache: bool = True,
    write_cache: bool = True,
    validate_payload: bool = True,
    max_retries: int | None = None,
    timeout: int | None = None,
    max_concurrent_requests: int | None = None
) -> list[dict[str, Any] | None]
This function is the workhorse of parallel data loading, enabling efficient fetching of 100+ files simultaneously. It’s specifically optimized for telemetry prefetch scenarios where you need to load lap data and telemetry for all drivers in a session.
Concurrency Management:
The function uses two strategies based on request count and concurrency limit:
-
Small Batches (requests ≤ max_concurrent_requests):
- Uses asyncio.gather() for maximum parallelism
- All requests execute simultaneously without queuing
- Optimal for <20 files (driver lap data)
-
Large Batches (requests > max_concurrent_requests):
- Uses semaphore-based rate limiting to prevent CDN throttling
- Limits concurrent requests, queues excess until slots become available
- Optimal for 100+ files (telemetry prefetch scenarios)
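The two strategies can be sketched with plain asyncio as follows. This is an illustration of the technique under stated assumptions, not tif1's actual code: `run_batch` and the fake jobs are hypothetical names, and `asyncio.sleep` stands in for a network request.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_batch(
    jobs: list[Callable[[], Awaitable[Any]]],
    max_concurrent: int,
) -> list[Any]:
    """Run jobs in parallel, bounding concurrency only when the batch is large."""
    if len(jobs) <= max_concurrent:
        # Small batch: start everything at once.
        return await asyncio.gather(*(job() for job in jobs))

    # Large batch: a semaphore caps how many jobs are in flight at a time.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(bounded(job) for job in jobs))


async def demo():
    in_flight = 0
    peak = 0

    def make_job(i: int):
        async def job() -> int:
            nonlocal in_flight, peak
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a network request
            in_flight -= 1
            return i
        return job

    results = await run_batch([make_job(i) for i in range(10)], max_concurrent=3)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

In both branches `asyncio.gather` returns results in submission order, which is what allows result[i] to line up with requests[i].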
Error Handling Philosophy:
Unlike fetch_json_async which raises exceptions, this function implements graceful degradation for partial data loading:
- Failed requests return None instead of raising exceptions
- Successful requests return parsed data dictionaries
- DataNotFoundError (404) is silently converted to None (common for missing telemetry files)
- Other exceptions are logged with warning level and converted to None
- Allows partial data loading (some files succeed, others fail)
- Check result list for None values to identify failed requests
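A minimal sketch of this convert-to-None behaviour is shown below. The `NotFound` exception, `fetch_or_none`, and `fake_fetch` are stand-ins invented for the example; tif1's own exception types and internal wrapper may look different.

```python
import asyncio
import logging
from typing import Any, Optional

logger = logging.getLogger("fetch_sketch")


class NotFound(Exception):
    """Stand-in for a 404-style error (hypothetical name)."""


async def fetch_or_none(coro) -> Optional[dict[str, Any]]:
    """Await a fetch and turn any failure into None."""
    try:
        return await coro
    except NotFound:
        return None  # expected for missing files, so stay quiet
    except Exception as exc:
        logger.warning("fetch failed: %s", exc)
        return None


async def fake_fetch(name: str) -> dict[str, Any]:
    if name == "missing":
        raise NotFound(name)
    if name == "broken":
        raise ConnectionError("simulated network failure")
    return {"name": name}


async def demo():
    names = ["a", "missing", "broken", "b"]
    return await asyncio.gather(*(fetch_or_none(fake_fetch(n)) for n in names))


results = asyncio.run(demo())
print(results)
```

Because every task resolves to either a dictionary or None, one failing file never cancels the rest of the batch.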
Parameters:
requests (list[tuple[int, str, str, str]]): List of (year, gp, session, path) tuples
- Each tuple represents one file to fetch
- Example: [(2024, "Bahrain Grand Prix", "Race", "laps_VER.json"), (2024, "Bahrain Grand Prix", "Race", "laps_HAM.json")]
- Can contain 100+ tuples for telemetry prefetch (all drivers × all laps)
- Order is preserved in results (result[i] corresponds to requests[i])
use_cache (bool): If True, read from cache before network fetch
- Default: True
- Applies to all requests in the batch
- Cache hits are nearly instant (<1ms memory, ~5-10ms SQLite)
write_cache (bool): If True, persist successful network responses to cache
- Default: True
- Applies to all requests in the batch
- Failed requests are not cached
validate_payload (bool): If True, run payload validation before returning data
- Default: True
- Applies to all requests in the batch
- Disable for 10-15% performance improvement on large batches
max_retries (int | None): Maximum retry attempts per request
- Default: Uses max_retries config value (typically 3)
- Applies independently to each request
- Failed requests after all retries return None
timeout (int | None): Request timeout in seconds
- Default: Uses timeout config value (typically 30)
- Applies per-request, not total batch time
- Increase for slow connections or large files
max_concurrent_requests (int | None): Maximum concurrent requests
- Default: Uses max_concurrent_requests config value (typically 20)
- Increase to 50-128 for fast connections and telemetry prefetch
- Decrease to 5-10 for slow connections or rate-limited CDNs
- Higher values = faster but more resource intensive
Returns:
- List of fetched data dictionaries (same length as requests)
- Each element is either:
dict[str, Any]: Successfully fetched and parsed data
None: Request failed (logged but not raised)
- Order matches input requests list (result[i] corresponds to requests[i])
- Filter out None values to get only successful results
Performance Characteristics:
- 20 files, 20 concurrency: ~1-2 seconds (network latency dominant)
- 100 files, 20 concurrency: ~5-8 seconds (queuing + network)
- 100 files, 128 concurrency: ~2-4 seconds (maximum parallelism, optimal for telemetry)
- Cache hits: <100ms for entire batch (memory/SQLite)
- Connection reuse: 90%+ reuse rate with proper pool configuration
Example - Load Multiple Drivers:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def load_multiple_drivers():
    """Load lap data for multiple drivers in parallel."""
    drivers = ["VER", "HAM", "LEC", "SAI", "NOR"]
    # Build requests for all drivers
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"laps_{driver}.json")
        for driver in drivers
    ]
    # Fetch all in parallel (default 20 concurrent)
    results = await fetch_multiple_async(requests)
    # Process results (None for failed requests)
    for driver, data in zip(drivers, results):
        if data is not None:
            lap_count = len(data['LapNumber'])
            print(f"{driver}: {lap_count} laps")
        else:
            print(f"{driver}: Failed to load")

asyncio.run(load_multiple_drivers())
Example - Telemetry Prefetch (100+ Files with High Concurrency):
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def prefetch_all_telemetry():
    """Prefetch telemetry for all drivers and laps with high concurrency."""
    drivers = ["VER", "HAM", "LEC", "SAI", "NOR"]
    laps = range(1, 58)  # 57 laps
    # Build requests for all driver/lap combinations (285 files)
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"telemetry_{driver}_{lap}.json")
        for driver in drivers
        for lap in laps
    ]
    print(f"Fetching {len(requests)} telemetry files...")
    # Fetch with high concurrency for speed (128 concurrent requests)
    results = await fetch_multiple_async(
        requests,
        max_concurrent_requests=128  # High concurrency for telemetry
    )
    # Count successes
    success_count = sum(1 for r in results if r is not None)
    print(f"Successfully loaded {success_count}/{len(requests)} files")
    print(f"Failed: {len(requests) - success_count} files")

asyncio.run(prefetch_all_telemetry())
Example - Custom Concurrency for Slow Connections:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def load_with_low_concurrency():
    """Load files with reduced concurrency for slow connections."""
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"laps_{driver}.json")
        for driver in ["VER", "HAM", "LEC", "SAI", "NOR"]
    ]
    # Limit to 5 concurrent requests for slow network
    results = await fetch_multiple_async(
        requests,
        max_concurrent_requests=5
    )
    return results

asyncio.run(load_with_low_concurrency())
Example - Batch Processing Large Datasets:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def batch_process_large_dataset():
    """Process large dataset in manageable batches to control memory usage."""
    # Generate 500 requests (illustrative count; a real race has far fewer laps per driver)
    all_requests = [
        (2024, "Bahrain Grand Prix", "Race", f"telemetry_VER_{lap}.json")
        for lap in range(1, 501)
    ]
    batch_size = 50
    all_results = []
    # Process in batches of 50
    for i in range(0, len(all_requests), batch_size):
        batch = all_requests[i:i + batch_size]
        batch_num = i // batch_size + 1
        total_batches = (len(all_requests) + batch_size - 1) // batch_size
        print(f"Processing batch {batch_num}/{total_batches}")
        results = await fetch_multiple_async(
            batch,
            max_concurrent_requests=20
        )
        all_results.extend(results)
        # Optional: Add delay between batches to avoid rate limiting
        if i + batch_size < len(all_requests):
            await asyncio.sleep(1)
    success_count = sum(1 for r in all_results if r is not None)
    print(f"Total: {success_count}/{len(all_requests)} successful")
    return all_results

asyncio.run(batch_process_large_dataset())
Example - Filter Successful Results:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def load_and_filter():
    """Load multiple files and filter out failures."""
    drivers = ["VER", "HAM", "LEC", "SAI", "NOR", "INVALID"]
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"laps_{driver}.json")
        for driver in drivers
    ]
    results = await fetch_multiple_async(requests)
    # Filter out None values (failed requests)
    successful = [
        (driver, data)
        for driver, data in zip(drivers, results)
        if data is not None
    ]
    print(f"Loaded {len(successful)}/{len(drivers)} drivers")
    for driver, data in successful:
        print(f"{driver}: {len(data['LapNumber'])} laps")
    return successful

asyncio.run(load_and_filter())
Example - Progress Tracking for Large Batches:
import asyncio
import time
from tif1.async_fetch import fetch_multiple_async

async def load_with_progress():
    """Load files and report elapsed time, success rate, and throughput."""
    drivers = ["VER", "HAM", "LEC", "SAI", "NOR"]
    laps = range(1, 58)
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"telemetry_{driver}_{lap}.json")
        for driver in drivers
        for lap in laps
    ]
    print(f"Starting fetch of {len(requests)} files...")
    # perf_counter() is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()
    results = await fetch_multiple_async(
        requests,
        max_concurrent_requests=128
    )
    elapsed = time.perf_counter() - start_time
    success_count = sum(1 for r in results if r is not None)
    print(f"Completed in {elapsed:.2f}s")
    print(f"Success rate: {success_count/len(requests)*100:.1f}%")
    print(f"Throughput: {len(requests)/elapsed:.1f} files/sec")
    return results

asyncio.run(load_with_progress())
This function is used internally by Session.load() for parallel driver lap data loading and telemetry prefetch. It’s the key to tif1’s fast cold start performance, enabling 2-5 second session loads even with 100+ telemetry files.
High concurrency (128+) can overwhelm slow networks or trigger CDN rate limiting. Start with default settings (20) and increase gradually while monitoring connection stats and success rates.
fetch_with_rate_limit
Execute async function with rate limiting using semaphore.
async def fetch_with_rate_limit(
    coro_func,
    *args,
    semaphore: asyncio.Semaphore | None = None,
    **kwargs
)
Parameters:
coro_func: Async function to execute
*args: Positional arguments for coro_func
semaphore: Optional semaphore for rate limiting. If None, creates one based on max_concurrent_requests config
**kwargs: Keyword arguments for coro_func
Returns:
- Result from coro_func execution
Example:
import asyncio
from tif1.async_fetch import fetch_with_rate_limit, fetch_json_async

async def fetch_with_custom_limit():
    semaphore = asyncio.Semaphore(5)
    result = await fetch_with_rate_limit(
        fetch_json_async, 2021, "Belgian Grand Prix", "Race", "drivers.json",
        semaphore=semaphore
    )
    # The payload is column-oriented, so count entries in one column
    print(f"Fetched {len(result['DriverNumber'])} drivers")

asyncio.run(fetch_with_custom_limit())
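For readers curious how a wrapper with this signature could work, here is a minimal sketch built on asyncio.Semaphore. It is an assumption-based illustration rather than the tif1 implementation: `with_rate_limit`, `DEFAULT_MAX_CONCURRENT`, and `double` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

DEFAULT_MAX_CONCURRENT = 20  # assumed default, mirroring the documented config value


async def with_rate_limit(
    coro_func: Callable[..., Awaitable[Any]],
    *args: Any,
    semaphore: Optional[asyncio.Semaphore] = None,
    **kwargs: Any,
) -> Any:
    """Run coro_func(*args, **kwargs) while holding a semaphore slot."""
    if semaphore is None:
        semaphore = asyncio.Semaphore(DEFAULT_MAX_CONCURRENT)
    async with semaphore:
        return await coro_func(*args, **kwargs)


async def double(x: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop, like real I/O would
    return x * 2


async def demo() -> list:
    # One shared semaphore, so the limit of 2 applies across all five calls.
    shared = asyncio.Semaphore(2)
    return await asyncio.gather(
        *(with_rate_limit(double, i, semaphore=shared) for i in range(5))
    )


print(asyncio.run(demo()))
```

A semaphore only limits the calls that share it, so in this sketch a limit across several calls requires passing the same semaphore object to each of them.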
close_session
Close the async HTTP session and release resources.
def close_session() -> None
Example:
from tif1.async_fetch import close_session
import atexit
atexit.register(close_session)
cleanup_resources
Clean up all async resources including session and executor.
def cleanup_resources() -> None
Example:
from tif1.async_fetch import cleanup_resources
# At program exit
cleanup_resources()
Rate Limiting
The async fetch module includes automatic rate limiting to prevent CDN throttling:
- Maximum 20 concurrent requests by default (configurable via max_concurrent_requests config)
- Per-call override via the max_concurrent_requests parameter in fetch_multiple_async
- Automatic backoff on rate limit errors and pool exhaustion
- Semaphore-based concurrency control
Example with custom concurrency:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def load_all_telemetry():
    requests = [...]  # 100+ (year, gp, session, path) tuples
    # Limit to 5 concurrent requests
    results = await fetch_multiple_async(requests, max_concurrent_requests=5)
    return results
Connection Pooling
The HTTP session uses aggressive connection pooling for maximum performance:
- Dynamic pool sizing based on concurrency (minimum 256 connections, 512 max pool size)
- Reuses TCP connections across requests
- Keep-alive connections with configurable timeout (default: 120s, 1000 max requests)
- Automatic connection cleanup
- Thread-safe for concurrent use
- Connection reuse metrics tracking
Benefits:
- 30-50% faster for multiple requests
- Lower CPU usage
- Reduced network overhead
- Optimized for high-concurrency telemetry fetching (128+ concurrent requests)
Configuration:
from tif1 import get_config
config = get_config()
config.set("pool_connections", 512) # Explicit pool size
config.set("pool_maxsize", 2048) # Max pool size for bursts
config.set("keepalive_timeout", 120) # Keep-alive timeout in seconds
Retry Logic
All HTTP requests include automatic retry with exponential backoff and CDN fallback:
- Default: 3 retries (configurable via max_retries config)
- Backoff: 2^attempt seconds with jitter (configurable)
- Max delay: 60 seconds (configurable via max_retry_delay)
- Retries on: Connection errors, timeouts, 5xx errors, pool exhaustion
- No retry on: 404 (data not found), 4xx client errors
- CDN fallback: Tries all configured CDN sources before failing
- Zero-retry mode: Ultra-fast cold start optimization (tries all CDNs once, no delays)
Example retry behavior:
Attempt 1: Immediate (CDN 1)
Attempt 1 failed: Try CDN 2, CDN 3...
Attempt 2: Wait ~2 seconds, retry all CDNs
Attempt 3: Wait ~4 seconds, retry all CDNs
Attempt 4: Fail with NetworkError
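The waits in this schedule follow the capped exponential formula listed above. A small sketch of that arithmetic is given below. The function name `backoff_delay`, the 0.5-second jitter range, and the choice to count retries from 1 are assumptions for illustration; the exact jitter strategy in tif1 may differ.

```python
import random


def backoff_delay(retry, max_delay=60.0, jitter=0.5, rng=None):
    """Delay before retry number `retry` (1-based): 2**retry seconds, capped, plus jitter."""
    rng = rng or random.Random()
    base = min(max_delay, 2.0 ** retry)
    # Jitter spreads out clients that failed at the same moment.
    return min(max_delay, base + rng.uniform(0.0, jitter))


rng = random.Random(42)  # seeded only to make the demo repeatable
for retry in (1, 2, 3):
    print(f"retry {retry}: wait {backoff_delay(retry, rng=rng):.2f}s")
```

With these assumptions the first retry waits roughly 2 seconds and the second roughly 4 seconds, matching the schedule above, and no delay ever exceeds the 60-second cap.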
Pool exhaustion handling:
When connection pool is exhausted, automatic backoff is applied:
- Base backoff: 0.5 seconds (configurable via pool_exhaustion_backoff_base)
- Max backoff: 5 seconds (configurable via pool_exhaustion_backoff_max)
- Jitter: 0.5 seconds (configurable via pool_exhaustion_backoff_jitter)
Circuit Breaker Integration
HTTP requests are protected by a circuit breaker to prevent cascading failures:
- Opens after 5 consecutive failures (configurable via circuit_breaker_threshold)
- Blocks requests for 60 seconds when open (configurable via circuit_breaker_timeout)
- Automatically tests recovery in half-open state
- Closes on successful request
- Thread-safe with atomic state transitions
- Uses monotonic time for accurate timeout tracking
States:
closed: Normal operation, requests allowed
open: Too many failures, requests blocked
half-open: Testing recovery, single request allowed
See Retry & Reliability for details.
Complete Examples
Parallel session loading
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def load_session_parallel():
    """Load all driver lap data in parallel."""
    drivers = ["VER", "HAM", "LEC", "SAI", "NOR"]
    # Build requests for all drivers
    requests = [
        (2021, "Belgian Grand Prix", "Race", f"laps_{driver}.json")
        for driver in drivers
    ]
    # Fetch all in parallel
    results = await fetch_multiple_async(requests, max_concurrent_requests=10)
    for driver, data in zip(drivers, results):
        if data is not None:
            print(f"{driver}: {len(data['LapNumber'])} laps")

asyncio.run(load_session_parallel())
Custom timeout handling
import asyncio
from tif1.async_fetch import fetch_json_async
from tif1.exceptions import NetworkError, DataNotFoundError

async def fetch_with_custom_timeout():
    """Fetch with custom timeout and error handling."""
    try:
        # Use 60 second timeout for slow connections
        data = await fetch_json_async(
            2021, "Belgian Grand Prix", "Race", "laps_VER.json",
            timeout=60
        )
        print(f"Success: {len(data['LapNumber'])} laps")
    except DataNotFoundError as e:
        print(f"Data not found: {e.event} {e.session}")
    except NetworkError as e:
        print(f"Network error: {e.url}")

asyncio.run(fetch_with_custom_timeout())
Batch processing with rate limiting
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def batch_fetch_telemetry():
    """Fetch telemetry for multiple driver/lap combinations."""
    drivers = ["VER", "HAM", "LEC"]
    laps = [1, 2, 3, 4, 5]
    # Build requests for all combinations
    requests = [
        (2021, "Belgian Grand Prix", "Race", f"telemetry_{driver}_{lap}.json")
        for driver in drivers
        for lap in laps
    ]
    # Fetch in batches of 20
    batch_size = 20
    all_results = []
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        results = await fetch_multiple_async(batch, max_concurrent_requests=10)
        all_results.extend(results)
        print(f"Processed batch {i // batch_size + 1}")
    return all_results

asyncio.run(batch_fetch_telemetry())
Resource cleanup
import asyncio
import atexit
from tif1.async_fetch import cleanup_resources, fetch_json_async

# Register cleanup on exit (automatic, but shown for clarity)
atexit.register(cleanup_resources)

async def main():
    """Main application with automatic cleanup."""
    data = await fetch_json_async(
        2021, "Belgian Grand Prix", "Race", "drivers.json"
    )
    # The payload is column-oriented, so count entries in one column
    print(f"Loaded {len(data['DriverNumber'])} drivers")

asyncio.run(main())
# cleanup_resources() called automatically on exit
Zero-retry mode for ultra-fast cold starts
import asyncio
from tif1 import get_config
from tif1.async_fetch import fetch_json_async

async def ultra_fast_fetch():
    """Disable retries for fastest possible cold start."""
    config = get_config()
    config.set("max_retries", 0)  # Try each CDN once, no delays
    data = await fetch_json_async(
        2021, "Belgian Grand Prix", "Race", "drivers.json"
    )
    # The payload is column-oriented, so count entries in one column
    print(f"Loaded {len(data['DriverNumber'])} drivers")

asyncio.run(ultra_fast_fetch())
-
Use async methods for multiple requests: 5-10x faster than sequential fetching.
-
Tune max_concurrent_requests for your network: Higher values (50-128) for fast connections, lower (5-10) for slow.
-
Disable validation in production: Saves 10-15% processing time.
data = await fetch_json_async(
    2021, "Belgian Grand Prix", "Race", "drivers.json",
    validate_payload=False
)
- Use zero-retry mode for cold starts: Fastest possible startup, tries all CDNs once.
from tif1 import get_config
config = get_config()
config.set("max_retries", 0)
- Monitor connection pool stats: Track reuse rate for optimization.
from tif1.http_session import get_connection_stats
stats = get_connection_stats()
print(f"Reuse rate: {stats['reuse_rate']:.1f}%")
- Monitor circuit breaker: Check state if experiencing network issues.
from tif1.retry import get_circuit_breaker
cb = get_circuit_breaker()
print(f"Circuit breaker state: {cb.state}")
- Clean up resources: Automatic via atexit, but can call manually if needed.
from tif1.async_fetch import cleanup_resources
cleanup_resources()
- Use process pool for JSON parsing: For large payloads, enable multi-process parsing.
from tif1 import get_config
config = get_config()
config.set("json_parse_workers", 4) # Use 4 worker processes
Advanced Topics
Connection Pool Optimization
The connection pool is the heart of tif1’s networking performance. Understanding how to tune it for your workload is critical for optimal performance.
How Connection Pooling Works:
When you make an HTTP request, the session checks if an existing connection to that host is available in the pool:
- Pool Hit: Reuse existing connection (saves 30-50ms TCP handshake + TLS negotiation)
- Pool Miss: Create new connection (full TCP + TLS overhead)
With HTTP/2 multiplexing enabled, multiple requests can share a single connection, dramatically improving pool efficiency.
Pool Sizing Guidelines:
The pool size should match your concurrency requirements:
from tif1 import get_config

config = get_config()
# Calculate optimal pool size
max_concurrent = max(
    config.get("max_concurrent_requests", 20),
    config.get("telemetry_prefetch_max_concurrent_requests", 128)
)
# Recommended sizing
pool_connections = max(256, max_concurrent)  # At least 256, or match concurrency
pool_maxsize = pool_connections * 4  # 4x for burst handling
Monitoring Pool Health:
from tif1.http_session import get_connection_stats
import time

def monitor_pool_health(duration_seconds=60):
    """Monitor connection pool health over time."""
    # Use the monotonic clock so wall-clock adjustments cannot skew the timing
    start_time = time.monotonic()
    while time.monotonic() - start_time < duration_seconds:
        stats = get_connection_stats()
        print(f"Time: {time.monotonic() - start_time:.1f}s")
        print(f"  Total requests: {stats['total_requests']}")
        print(f"  Reuse rate: {stats['reuse_rate']:.1f}%")
        print(f"  Connections created: {stats['connections_created']}")
        # Alert on low reuse rate
        if stats['total_requests'] > 50 and stats['reuse_rate'] < 70:
            print("  ⚠️ WARNING: Low connection reuse rate!")
            print("  Consider increasing pool_connections or reducing concurrency")
        time.sleep(10)

# Run monitoring
monitor_pool_health(60)
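The reuse rate printed above can be read as the share of requests that did not need a new connection. The sketch below shows that arithmetic under the assumption that the statistic is derived from total_requests and connections_created; how tif1 actually computes reuse_rate is not stated here, and `reuse_rate` as a standalone function is hypothetical.

```python
def reuse_rate(total_requests: int, connections_created: int) -> float:
    """Percentage of requests served over an already-open connection."""
    if total_requests <= 0:
        return 0.0
    reused = max(0, total_requests - connections_created)
    return 100.0 * reused / total_requests


# 100 requests over 2 connections: all but the first request on each were reused.
print(f"{reuse_rate(100, 2):.1f}%")
```

Read this way, a rate below the 70% alert threshold means that more than 30 of every 100 requests opened a fresh connection.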
Pool Exhaustion Detection:
Pool exhaustion occurs when all connections are in use and new requests must wait. This manifests as:
- Increased latency (requests queued waiting for connections)
- “Pool timeout” or “Max retries” errors
- Automatic backoff delays (0.5-5s with jitter)
Fixing Pool Exhaustion:
from tif1 import get_config
config = get_config()
# Option 1: Increase pool size
config.set("pool_connections", 512)
config.set("pool_maxsize", 2048)
# Option 2: Reduce concurrency
config.set("max_concurrent_requests", 10)
config.set("telemetry_prefetch_max_concurrent_requests", 50)
# Option 3: Increase backoff tolerance
config.set("pool_exhaustion_backoff_max", 10.0)
JSON Parsing Optimization
JSON parsing is a significant bottleneck for large payloads (lap data, telemetry arrays). tif1 uses several strategies to optimize parsing performance.
Default Strategy (orjson in thread pool):
By default, JSON parsing happens in the thread pool using orjson, which is 2-5x faster than the standard library:
import orjson
# Fast parsing with orjson; `payload` stands in for the raw response body (bytes)
payload = b'{"LapNumber": [1, 2, 3]}'
data = orjson.loads(payload)
Process Pool Parsing (Advanced):
For very large payloads (>1MB), you can enable multi-process JSON parsing:
from tif1 import get_config
config = get_config()
config.set("json_parse_workers", 4) # Use 4 worker processes
When to Use Process Pool:
- ✅ Large lap data files (>1MB)
- ✅ CPU-bound workloads with few large files
- ❌ Telemetry prefetch (100+ small files) - IPC overhead dominates
- ❌ Resource-constrained environments
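Both strategies come down to handing the parse call to an executor so the event loop stays free. The sketch below shows the thread-pool variant using only the standard library. Here `json.loads` stands in for orjson, and `parse_json_off_loop` is a hypothetical helper, not a tif1 function.

```python
import asyncio
import json
from concurrent.futures import Executor, ThreadPoolExecutor


async def parse_json_off_loop(payload: bytes, executor: Executor):
    """Parse JSON bytes in a worker so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, json.loads, payload)


async def demo():
    payloads = [json.dumps({"LapNumber": list(range(n))}).encode() for n in (3, 5)]
    with ThreadPoolExecutor(max_workers=2) as executor:
        return await asyncio.gather(
            *(parse_json_off_loop(p, executor) for p in payloads)
        )


parsed = asyncio.run(demo())
print([len(d["LapNumber"]) for d in parsed])
```

`run_in_executor` accepts any `concurrent.futures.Executor`, so a `ProcessPoolExecutor` could be passed instead. In that case each payload and result is pickled across the process boundary, which is the IPC overhead that makes process pools a poor fit for many small files.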
Benchmarking JSON Parsing:
import asyncio
import time
from tif1 import get_config
from tif1.async_fetch import fetch_json_async

async def benchmark_json_parsing():
    """Benchmark JSON parsing strategies (timings include network latency)."""
    config = get_config()
    args = (2024, "Bahrain Grand Prix", "Race", "laps_VER.json")
    # Test 1: Thread pool (default). Bypass the cache so both runs hit the network.
    config.set("json_parse_workers", 0)
    start = time.perf_counter()
    await fetch_json_async(*args, use_cache=False, write_cache=False)
    thread_time = time.perf_counter() - start
    # Test 2: Process pool
    config.set("json_parse_workers", 4)
    start = time.perf_counter()
    await fetch_json_async(*args, use_cache=False, write_cache=False)
    process_time = time.perf_counter() - start
    print(f"Thread pool: {thread_time:.3f}s")
    print(f"Process pool: {process_time:.3f}s")
    print(f"Speedup: {thread_time/process_time:.2f}x")

asyncio.run(benchmark_json_parsing())
Zero-Retry Mode (Ultra-Fast Cold Start)
Zero-retry mode is an advanced optimization that trades reliability for speed. Instead of retrying failed requests with exponential backoff, it tries each CDN once and moves on.
Performance Impact:
- Normal mode (3 retries): 100 files in ~5-8 seconds
- Zero-retry mode: 100 files in ~2-3 seconds (50-70% faster)
Enabling Zero-Retry Mode:
from tif1 import get_config
config = get_config()
config.set("max_retries", 0)
Or via environment variable:
export TIF1_MAX_RETRIES=0
How It Works:
# Normal mode (max_retries=3):
# Try CDN 1 → fail → wait 1s → retry CDN 1 → fail → wait 2s → retry CDN 1 → fail
# Try CDN 2 → fail → wait 1s → retry CDN 2 → fail → wait 2s → retry CDN 2 → fail
# Total: ~15 seconds of delays
# Zero-retry mode (max_retries=0):
# Try CDN 1 → fail → immediately try CDN 2 → fail → immediately try CDN 3
# Total: ~0 seconds of delays
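The zero-retry path can be pictured as a plain loop over CDN sources with no sleeps in between. The sketch below illustrates that idea. The function `fetch_first_success`, the example.com-style URLs, and `fake_fetch` are all hypothetical and only simulate the behaviour described above.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_first_success(
    cdn_urls: list[str],
    fetch: Callable[[str], Awaitable[Any]],
) -> Any:
    """Try each CDN once, in order, with no sleep between attempts."""
    last_error = None
    for url in cdn_urls:
        try:
            return await fetch(url)
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    raise RuntimeError(f"all {len(cdn_urls)} CDNs failed") from last_error


async def fake_fetch(url: str) -> dict:
    # Simulate an outage on every source except the third one.
    if "cdn3" not in url:
        raise ConnectionError(f"simulated outage at {url}")
    return {"source": url}


urls = [
    "https://cdn1.example/f.json",
    "https://cdn2.example/f.json",
    "https://cdn3.example/f.json",
]
result = asyncio.run(fetch_first_success(urls, fake_fetch))
print(result["source"])
```

The trade-off is visible in the loop: a transient error on every source fails the request immediately, which is why zero-retry callers should be prepared to handle failures themselves.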
When to Use:
- ✅ Fast, reliable networks (fiber, datacenter)
- ✅ Development/testing environments
- ✅ When speed is critical and occasional failures are acceptable
- ❌ Slow or unreliable networks
- ❌ Production systems requiring high reliability
Handling Failures in Zero-Retry Mode:
import asyncio
from tif1.async_fetch import fetch_multiple_async

async def robust_zero_retry_fetch():
    """Fetch with zero-retry mode and handle failures gracefully."""
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"laps_{driver}.json")
        for driver in ["VER", "HAM", "LEC", "SAI", "NOR"]
    ]
    # First attempt with zero retries (per-call override leaves global config untouched)
    results = await fetch_multiple_async(requests, max_retries=0)
    # Identify failures
    failed_indices = [i for i, r in enumerate(results) if r is None]
    if failed_indices:
        print(f"First attempt: {len(results) - len(failed_indices)}/{len(results)} succeeded")
        # Retry failures with normal retry logic
        failed_requests = [requests[i] for i in failed_indices]
        retry_results = await fetch_multiple_async(failed_requests, max_retries=3)
        # Merge results back into their original positions
        for i, result in zip(failed_indices, retry_results):
            results[i] = result
        print(f"After retry: {sum(1 for r in results if r is not None)}/{len(results)} succeeded")
    return results

asyncio.run(robust_zero_retry_fetch())
HTTP/2 Multiplexing Deep Dive
HTTP/2 multiplexing is one of the most impactful optimizations in tif1. Understanding how it works helps you maximize its benefits.
What is Multiplexing?
HTTP/2 allows multiple requests to share a single TCP connection:
HTTP/1.1 (6 connections):
Connection 1: [Request 1] → [Response 1]
Connection 2: [Request 2] → [Response 2]
Connection 3: [Request 3] → [Response 3]
Connection 4: [Request 4] → [Response 4]
Connection 5: [Request 5] → [Response 5]
Connection 6: [Request 6] → [Response 6]
HTTP/2 (1 connection):
Connection 1: [Req 1][Req 2][Req 3][Req 4][Req 5][Req 6]
[Res 1][Res 2][Res 3][Res 4][Res 5][Res 6]
Benefits:
- Reduced Latency: No TCP handshake overhead (saves 30-50ms per request)
- Lower Resource Usage: Fewer file descriptors, socket buffers, and kernel resources
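- Better Throughput: No application-level head-of-line blocking (a slow response does not hold up other responses on the same connection, although TCP packet loss can still stall every stream)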
- Header Compression: HPACK compression reduces bandwidth usage
Measuring Multiplexing Efficiency:
import asyncio
from tif1.async_fetch import fetch_multiple_async
from tif1.http_session import get_connection_stats

async def measure_multiplexing():
    """Measure HTTP/2 multiplexing efficiency."""
    # Fetch 100 files
    requests = [
        (2024, "Bahrain Grand Prix", "Race", f"telemetry_VER_{lap}.json")
        for lap in range(1, 101)
    ]
    await fetch_multiple_async(requests, max_concurrent_requests=50)
    # Check connection stats
    stats = get_connection_stats()
    print(f"Total requests: {stats['total_requests']}")
    print(f"Connections created: {stats['connections_created']}")
    # Guard against division by zero (e.g. every request was served from cache)
    if stats['connections_created']:
        per_connection = stats['total_requests'] / stats['connections_created']
        print(f"Requests per connection: {per_connection:.1f}")
    # With HTTP/2 multiplexing, you should see:
    # - 1-2 connections created (one per CDN host)
    # - 50-100 requests per connection
    # - 95%+ reuse rate

asyncio.run(measure_multiplexing())
Troubleshooting Multiplexing Issues:
Some CDNs mishandle HTTP/2 multiplexing after redirects. When that happens, you will see an error like:
MultiplexingError: non-multiplexed response after a redirect
tif1 disables the affected CDN automatically. You can also disable multiplexing manually:
from tif1 import get_config
config = get_config()
config.set("http_multiplexed", False)
Circuit Breaker Patterns
The circuit breaker prevents cascading failures by blocking requests during sustained outages. Understanding its state machine helps you use it effectively.
State Machine:
CLOSED (normal operation)
   ↓ (5 consecutive failures)
OPEN (blocking requests)
   ↓ (60 seconds timeout)
HALF-OPEN (testing recovery)
   ↓ (success)      ↓ (failure)
CLOSED              OPEN
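The transitions above can be sketched as a small class. This is an illustrative model only, not tif1's actual implementation: the class name, method names, and injectable `clock` parameter are assumptions, while the defaults (5 failures, 60 seconds) come from the diagram.

```python
import time


class SketchCircuitBreaker:
    """Illustrative three-state breaker; not tif1's real circuit breaker."""

    def __init__(self, threshold=5, timeout=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout = timeout
        self._clock = clock
        self.state = "closed"
        self.failures = 0
        self._opened_at = None

    def allow_request(self):
        """Return True if a request may be sent right now."""
        if self.state == "open":
            if self._clock() - self._opened_at >= self.timeout:
                self.state = "half-open"  # timeout elapsed: allow probe requests
                return True
            return False
        return True

    def record_success(self):
        # Any success closes the breaker and clears the consecutive-failure count
        self.state = "closed"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, opens the breaker
        if self.state == "half-open" or self.failures >= self.threshold:
            self.state = "open"
            self._opened_at = self._clock()


# Drive the state machine with a fake clock so no real waiting is needed
fake_now = [0.0]
cb = SketchCircuitBreaker(clock=lambda: fake_now[0])
for _ in range(5):
    cb.record_failure()
print(cb.state)            # open
fake_now[0] = 60.0
print(cb.allow_request())  # True
print(cb.state)            # half-open
cb.record_success()
print(cb.state)            # closed
```

Injecting the clock keeps the sketch testable; a monotonic clock is used by default so wall-clock adjustments cannot shorten or extend the open period.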
Monitoring Circuit Breaker:
import time

from tif1.retry import get_circuit_breaker


def monitor_circuit_breaker(duration_seconds=60):
    """Monitor circuit breaker state over time."""
    cb = get_circuit_breaker()
    start_time = time.time()
    while time.time() - start_time < duration_seconds:
        print(f"Time: {time.time() - start_time:.1f}s")
        print(f" State: {cb.state}")
        print(f" Failures: {cb.failures}")
        if cb.state == "open":
            print(" ⚠️ Circuit breaker is OPEN - requests are blocked!")
            if cb.last_failure_time:
                # Judging by its name, _last_failure_monotonic holds a
                # monotonic-clock value, so compare it with time.monotonic()
                elapsed = time.monotonic() - cb._last_failure_monotonic
                remaining = max(0.0, cb.timeout - elapsed)
                print(f" Time until half-open: {remaining:.1f}s")
        time.sleep(5)


monitor_circuit_breaker(60)
Manual Circuit Breaker Control:
import time

from tif1.retry import get_circuit_breaker, reset_circuit_breaker

cb = get_circuit_breaker()

# Check state
if cb.state == "open":
    print("Circuit breaker is open, waiting for recovery...")

    # Option 1: Wait for automatic recovery
    time.sleep(cb.timeout)

    # Option 2: Force reset (use with caution!)
    reset_circuit_breaker()
    print("Circuit breaker reset")
Tuning Circuit Breaker:
from tif1 import get_config
config = get_config()
# More tolerant (slower to open, longer recovery)
config.set("circuit_breaker_threshold", 10) # 10 failures before opening
config.set("circuit_breaker_timeout", 120) # 2 minutes recovery time
# Less tolerant (faster to open, shorter recovery)
config.set("circuit_breaker_threshold", 3) # 3 failures before opening
config.set("circuit_breaker_timeout", 30) # 30 seconds recovery time
Performance Monitoring
Comprehensive monitoring helps you identify bottlenecks and optimize performance.
Real-Time Performance Dashboard:
import asyncio
import time

from tif1.http_session import get_connection_stats
from tif1.retry import get_circuit_breaker
from tif1.cdn import get_cdn_manager


async def performance_dashboard():
    """Real-time performance monitoring dashboard."""
    start_time = time.time()
    while True:
        elapsed = time.time() - start_time
        # Connection stats
        conn_stats = get_connection_stats()
        # Circuit breaker
        cb = get_circuit_breaker()
        # CDN health
        cdn_manager = get_cdn_manager()
        cdn_sources = cdn_manager.get_sources()
        # Clear screen (optional)
        print("\033[2J\033[H")  # ANSI escape codes
        print(f"=== tif1 Performance Dashboard (Runtime: {elapsed:.1f}s) ===\n")
        print("Connection Pool:")
        print(f" Total requests: {conn_stats['total_requests']}")
        print(f" Reuse rate: {conn_stats['reuse_rate']:.1f}%")
        print(f" Pools created: {conn_stats['connections_created']}")
        # Skip throughput on the first tick, when elapsed may still be zero
        if conn_stats['total_requests'] > 0 and elapsed > 0:
            throughput = conn_stats['total_requests'] / elapsed
            print(f" Throughput: {throughput:.1f} req/s")
        print("\nCircuit Breaker:")
        print(f" State: {cb.state}")
        print(f" Failures: {cb.failures}")
        print("\nCDN Health:")
        for source in cdn_sources:
            status = "✓" if source.enabled else "✗"
            print(f" {status} {source.name} (priority: {source.priority})")
        await asyncio.sleep(1)


# Run dashboard (loops until interrupted with Ctrl+C)
asyncio.run(performance_dashboard())
Profiling Network Operations:
import asyncio
import time

from tif1.async_fetch import fetch_multiple_async
from tif1.http_session import get_connection_stats


async def profile_fetch_operation():
    """Profile a fetch operation to identify bottlenecks."""
    requests = [
        (2024, "Bahrain", "Race", f"laps_{driver}.json")
        for driver in ["VER", "HAM", "LEC", "SAI", "NOR"]
    ]
    # Measure total time
    start = time.time()
    results = await fetch_multiple_async(requests, max_concurrent_requests=20)
    total_time = time.time() - start
    # Calculate metrics
    success_count = sum(1 for r in results if r is not None)
    throughput = len(requests) / total_time
    avg_time_per_request = total_time / len(requests)
    print("Profile Results:")
    print(f" Total time: {total_time:.3f}s")
    print(f" Requests: {len(requests)}")
    print(f" Successes: {success_count}")
    print(f" Failures: {len(requests) - success_count}")
    print(f" Throughput: {throughput:.1f} req/s")
    print(f" Avg time per request: {avg_time_per_request:.3f}s")
    # Connection stats
    stats = get_connection_stats()
    print("\nConnection Pool:")
    print(f" Reuse rate: {stats['reuse_rate']:.1f}%")
    print(f" Pools created: {stats['connections_created']}")


asyncio.run(profile_fetch_operation())
Troubleshooting
This section covers how to diagnose and fix the networking issues you are most likely to encounter when using tif1.
Slow Requests
Symptoms:
- Requests taking >5 seconds to complete
- Timeouts on fast connections
- Inconsistent performance
Diagnosis:
import logging
from tif1 import get_config
from tif1.http_session import get_connection_stats
# Enable debug logging to see request timing
logging.basicConfig(level=logging.DEBUG)
# Check timeout setting
config = get_config()
print(f"Timeout: {config.get('timeout')}s")
print(f"Max retries: {config.get('max_retries')}")
print(f"Backoff factor: {config.get('retry_backoff_factor')}")
# Check connection pool health
stats = get_connection_stats()
print(f"Connection reuse rate: {stats['reuse_rate']:.1f}%")
Solutions:
from tif1 import get_config
config = get_config()
# Solution 1: Increase timeout for slow connections
config.set("timeout", 60)
# Solution 2: Reduce retries for faster failure
config.set("max_retries", 1)
# Solution 3: Enable zero-retry mode for ultra-fast cold start
config.set("max_retries", 0)
# Solution 4: Increase concurrency for parallel operations
config.set("max_concurrent_requests", 50)
config.set("telemetry_prefetch_max_concurrent_requests", 128)
# Solution 5: Check if HTTP/2 multiplexing is enabled
if not config.get("http_multiplexed"):
    print("HTTP/2 multiplexing is disabled - consider enabling for better performance")
    config.set("http_multiplexed", True)
Connection Errors
Symptoms:
- NetworkError exceptions
- “Connection refused” or “Connection reset” errors
- Intermittent failures
Diagnosis:
from tif1.retry import get_circuit_breaker
from tif1.cdn import get_cdn_manager
from tif1 import get_config
# Check circuit breaker state
cb = get_circuit_breaker()
print(f"Circuit breaker state: {cb.state}")
print(f"Failure count: {cb.failures}")
# Check CDN health
cdn_manager = get_cdn_manager()
sources = cdn_manager.get_sources()
print(f"Available CDNs: {len(sources)}")
for source in sources:
    print(f" {source.name}: enabled={source.enabled}")
# Check DNS resolvers
config = get_config()
print(f"DNS resolvers: {config.get('http_resolvers')}")
Solutions:
import time

from tif1.retry import get_circuit_breaker, reset_circuit_breaker
from tif1.cdn import get_cdn_manager
from tif1 import get_config

# Solution 1: Reset circuit breaker if it's open
cb = get_circuit_breaker()
if cb.state == "open":
    print("Circuit breaker is open, waiting for recovery...")
    time.sleep(cb.timeout)
    reset_circuit_breaker()
    print("Circuit breaker reset")

# Solution 2: Reset CDN failure counts
cdn_manager = get_cdn_manager()
cdn_manager.reset()
print("CDN failure counts reset")

# Solution 3: Add backup CDN sources
config = get_config()
config.set("cdns", [
    "https://cdn.jsdelivr.net/gh/TracingInsights",
    "https://backup-cdn.example.com/TracingInsights",
])

# Solution 4: Try different DNS resolvers
config.set("http_resolvers", ["doh://cloudflare", "doh://google", "standard"])

# Solution 5: Disable HTTP/3 if experiencing protocol issues
config.set("http_disable_http3", True)
Rate Limiting
Symptoms:
- 429 “Too Many Requests” errors
- Requests being throttled or blocked
- CDN returning rate limit errors
Diagnosis:
from tif1 import get_config
from tif1.http_session import get_connection_stats
config = get_config()
print(f"Max concurrent requests: {config.get('max_concurrent_requests')}")
print(f"Telemetry prefetch concurrency: {config.get('telemetry_prefetch_max_concurrent_requests')}")
stats = get_connection_stats()
print(f"Total requests: {stats['total_requests']}")
print(f"Requests per second: {stats['total_requests'] / 60:.1f}") # Assuming 60s runtime
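The 60-second runtime in the snippet above is a placeholder. A more reliable rate comes from sampling the counter twice over a measured interval, as in this sketch. The helper name and its parameters are hypothetical; it only assumes a stats callable that returns a mapping with a `total_requests` key, as `get_connection_stats()` does in the examples here.

```python
import time
from typing import Callable, Mapping


def measure_request_rate(
    get_stats: Callable[[], Mapping[str, float]],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Return requests per second observed over roughly `interval` seconds."""
    start_count = get_stats()["total_requests"]
    start = clock()
    sleep(interval)  # blocks the calling thread while requests proceed elsewhere
    elapsed = clock() - start
    delta = get_stats()["total_requests"] - start_count
    return delta / elapsed if elapsed > 0 else 0.0
```

Because the default `sleep` blocks, a call such as `measure_request_rate(get_connection_stats)` is best made from a separate thread while fetches are in progress.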
Solutions:
import asyncio
from tif1.async_fetch import fetch_multiple_async
from tif1 import get_config
config = get_config()
# Solution 1: Reduce concurrent requests
config.set("max_concurrent_requests", 5)
config.set("telemetry_prefetch_max_concurrent_requests", 10)
# Solution 2: Add delays between batches
async def fetch_with_delays():
    requests = [...]  # Your requests
    batch_size = 10
    all_results = []
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        results = await fetch_multiple_async(batch, max_concurrent_requests=5)
        all_results.extend(results)  # keep every batch, not just the last one
        # Add delay between batches
        if i + batch_size < len(requests):
            await asyncio.sleep(2)  # 2 second delay
    return all_results
# Solution 3: Increase retry backoff to give CDN time to recover
config.set("retry_backoff_factor", 3.0)
config.set("max_retry_delay", 120.0)
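To get a feel for what a backoff factor of 3.0 and a 120-second cap imply, the sketch below prints a delay schedule. The formula `min(max_delay, factor ** attempt)` is a common convention assumed here for illustration; tif1's retry module may compute delays differently, and it also applies jitter, which this sketch omits.

```python
def backoff_schedule(attempts: int, factor: float = 3.0, max_delay: float = 120.0) -> list[float]:
    """Delay in seconds before each retry, growing exponentially up to max_delay."""
    return [min(max_delay, factor ** attempt) for attempt in range(attempts)]


print(backoff_schedule(6))  # [1.0, 3.0, 9.0, 27.0, 81.0, 120.0]
```

Under this assumed formula the cap takes effect on the sixth attempt, so raising `max_retry_delay` only matters for long retry chains.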
Memory Issues
Symptoms:
- Out of memory errors
- Slow performance due to swapping
- High memory usage during telemetry prefetch
Diagnosis:
import psutil
import os
# Check current memory usage
process = psutil.Process(os.getpid())
memory_mb = process.memory_info().rss / 1024 / 1024
print(f"Current memory usage: {memory_mb:.1f} MB")
from tif1 import get_config
config = get_config()
print(f"Max concurrent requests: {config.get('max_concurrent_requests')}")
print(f"Telemetry prefetch concurrency: {config.get('telemetry_prefetch_max_concurrent_requests')}")
print(f"JSON parse workers: {config.get('json_parse_workers')}")
Solutions:
import asyncio
from tif1.async_fetch import fetch_multiple_async
from tif1 import get_config
config = get_config()
# Solution 1: Reduce concurrency to limit memory usage
config.set("max_concurrent_requests", 5)
config.set("telemetry_prefetch_max_concurrent_requests", 10)
# Solution 2: Disable process pool JSON parsing
config.set("json_parse_workers", 0)
# Solution 3: Process in smaller batches and free memory
async def memory_efficient_fetch():
    import gc  # imported once, not on every loop iteration

    all_requests = [...]  # Large list of requests
    batch_size = 10
    all_results = []
    for i in range(0, len(all_requests), batch_size):
        batch = all_requests[i:i + batch_size]
        results = await fetch_multiple_async(batch)
        # Process results immediately
        for result in results:
            if result:
                # Process data and append only the reduced output you need
                # to all_results; keeping raw payloads defeats the batching
                pass
        # Free memory
        del results
        # Optional: Force garbage collection
        gc.collect()
    return all_results
# Solution 4: Reduce cache size
config.set("memory_cache_max_items", 256)
config.set("memory_telemetry_cache_max_items", 512)
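To check whether a change actually lowers memory use, peak allocation can be measured with the standard library's `tracemalloc`. The helper below is a hypothetical sketch, not part of tif1. It only sees allocations made through Python's allocator, so memory held by native extensions may be undercounted.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


# Example: measure a function that allocates about 10,000 one-kilobyte buffers
data, peak_mb = peak_memory_mb(lambda: [bytes(1024) for _ in range(10_000)])
print(f"Peak traced memory: {peak_mb:.1f} MB")
```

Wrapping a synchronous entry point that runs your batch loop lets you compare peak usage across different batch sizes or cache limits.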
Pool Exhaustion
Symptoms:
- “Pool timeout” or “Max retries” errors
- Automatic backoff delays (0.5-5s)
- Low connection reuse rate (<70%)
Diagnosis:
from tif1 import get_config
from tif1.http_session import get_connection_stats
# Check pool configuration
config = get_config()
print(f"Pool connections: {config.get('pool_connections')}")
print(f"Pool maxsize: {config.get('pool_maxsize')}")
print(f"Max concurrent requests: {config.get('max_concurrent_requests')}")
# Check connection stats
stats = get_connection_stats()
print(f"Total requests: {stats['total_requests']}")
print(f"Reuse rate: {stats['reuse_rate']:.1f}%")
print(f"Pools created: {stats['connections_created']}")
# Calculate if pool is undersized
max_concurrent = max(
    config.get("max_concurrent_requests", 20),
    config.get("telemetry_prefetch_max_concurrent_requests", 128)
)
print(f"Max concurrent: {max_concurrent}")
print(f"Pool connections: {config.get('pool_connections', 256)}")
if config.get('pool_connections', 256) < max_concurrent:
    print("⚠️ WARNING: Pool is undersized for your concurrency!")
Solutions:
from tif1 import get_config
config = get_config()
# Solution 1: Increase pool size to match concurrency
max_concurrent = max(
    config.get("max_concurrent_requests", 20),
    config.get("telemetry_prefetch_max_concurrent_requests", 128)
)
config.set("pool_connections", max(512, max_concurrent))
config.set("pool_maxsize", max(2048, max_concurrent * 4))
# Solution 2: Reduce concurrency to match pool size
config.set("max_concurrent_requests", 10)
config.set("telemetry_prefetch_max_concurrent_requests", 50)
# Solution 3: Increase pool exhaustion backoff tolerance
config.set("pool_exhaustion_backoff_max", 10.0)
# Solution 4: Enable HTTP/2 multiplexing for better connection reuse
config.set("http_multiplexed", True)
DNS Resolution Failures
Symptoms:
- “Name or service not known” errors
- DNS lookup timeouts
- Intermittent connection failures
Diagnosis:
from tif1 import get_config
config = get_config()
print(f"DNS resolvers: {config.get('http_resolvers')}")
# Test DNS resolution manually
import socket
try:
    socket.gethostbyname("cdn.jsdelivr.net")
    print("✓ Standard DNS resolution works")
except socket.gaierror as e:
    print(f"✗ Standard DNS resolution failed: {e}")
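When failures are intermittent, it helps to time each lookup and check several hosts at once. The helper below is a hypothetical sketch built on the standard library's `socket.getaddrinfo`; it checks system DNS only and knows nothing about tif1's `http_resolvers` setting. The `resolve` parameter exists so the function can be exercised without network access.

```python
import socket
import time


def check_dns(hosts, resolve=socket.getaddrinfo):
    """Resolve each host; map it to (ok, elapsed_seconds, addresses_or_error)."""
    report = {}
    for host in hosts:
        start = time.perf_counter()
        try:
            infos = resolve(host, 443)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address
            addresses = sorted({info[4][0] for info in infos})
            report[host] = (True, time.perf_counter() - start, addresses)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[host] = (False, time.perf_counter() - start, str(exc))
    return report
```

Calling `check_dns(["cdn.jsdelivr.net"])` a few times in a row shows whether lookups fail outright or are merely slow, which helps decide between the resolver options below.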
Solutions:
from tif1 import get_config
config = get_config()
# Solution 1: Prioritize DNS-over-HTTPS resolvers
config.set("http_resolvers", ["doh://cloudflare", "doh://google", "standard"])
# Solution 2: Use only DoH resolvers (bypass system DNS)
config.set("http_resolvers", ["doh://cloudflare", "doh://google"])
# Solution 3: Use only standard DNS (if DoH is blocked by firewall)
config.set("http_resolvers", ["standard"])
HTTP/2 Multiplexing Issues
Symptoms:
- “MultiplexingError: non-multiplexed response after a redirect” errors
- CDNs being automatically disabled
- Lower than expected connection reuse rate
Diagnosis:
from tif1 import get_config
from tif1.cdn import get_cdn_manager
config = get_config()
print(f"HTTP/2 multiplexing enabled: {config.get('http_multiplexed')}")
cdn_manager = get_cdn_manager()
sources = cdn_manager.get_sources()
for source in sources:
    print(f"{source.name}: enabled={source.enabled}")
Solutions:
from tif1 import get_config
from tif1.cdn import get_cdn_manager
config = get_config()
# Solution 1: Disable HTTP/2 multiplexing
config.set("http_multiplexed", False)
# Solution 2: Reset CDN health and try again
cdn_manager = get_cdn_manager()
cdn_manager.reset()
# Solution 3: Use different CDN sources
config.set("cdns", ["https://alternative-cdn.example.com/TracingInsights"])
Validation Errors
Symptoms:
- InvalidDataError exceptions
- “Validation failed” errors
- Data corruption warnings
Diagnosis:
from tif1 import get_config
config = get_config()
print(f"Validate data: {config.get('validate_data')}")
print(f"Validate lap times: {config.get('validate_lap_times')}")
print(f"Validate telemetry: {config.get('validate_telemetry')}")
Solutions:
from tif1 import get_config
import asyncio
from tif1.async_fetch import fetch_json_async
config = get_config()
# Solution 1: Disable validation for performance (use with caution)
config.set("validate_data", False)
config.set("validate_lap_times", False)
config.set("validate_telemetry", False)
# Solution 2: Fetch without validation for specific requests
async def fetch_without_validation():
    data = await fetch_json_async(
        2024, "Bahrain", "Race", "laps_VER.json",
        validate_payload=False
    )
    return data
# Solution 3: Clear cache if data is corrupted
from tif1.cache import get_cache
cache = get_cache()
cache.clear()
print("Cache cleared")
Debugging Network Issues
Enable Comprehensive Logging:
import logging
# Enable debug logging for all tif1 modules
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Or enable for specific modules
logging.getLogger('tif1.http_session').setLevel(logging.DEBUG)
logging.getLogger('tif1.async_fetch').setLevel(logging.DEBUG)
logging.getLogger('tif1.retry').setLevel(logging.DEBUG)
logging.getLogger('tif1.cdn').setLevel(logging.DEBUG)
Capture Detailed Error Information:
import asyncio
import traceback

from tif1.async_fetch import fetch_json_async


async def debug_fetch():
    """Fetch with detailed error information."""
    try:
        data = await fetch_json_async(
            2024, "Bahrain", "Race", "laps_VER.json"
        )
        print(f"Success: {len(data)} keys")
    except Exception as e:
        print(f"Error type: {type(e).__name__}")
        print(f"Error message: {e}")
        # Print detailed error context
        if hasattr(e, '__dict__'):
            print(f"Error context: {e.__dict__}")
        # Print traceback
        traceback.print_exc()


asyncio.run(debug_fetch())
Test Network Connectivity:
import asyncio

from tif1.async_fetch import fetch_json_async
from tif1.exceptions import NetworkError, DataNotFoundError, InvalidDataError


async def test_connectivity():
    """Test network connectivity and CDN health."""
    test_cases = [
        (2024, "Bahrain", "Race", "drivers.json", "Driver list"),
        (2024, "Bahrain", "Race", "laps_VER.json", "Verstappen laps"),
        (2024, "Bahrain", "Race", "telemetry_VER_1.json", "Verstappen lap 1 telemetry"),
    ]
    for year, gp, session, path, description in test_cases:
        try:
            data = await fetch_json_async(year, gp, session, path)
            print(f"✓ {description}: OK ({len(data)} keys)")
        except DataNotFoundError:
            print(f"✗ {description}: Data not found (404)")
        except NetworkError as e:
            print(f"✗ {description}: Network error ({e})")
        except InvalidDataError as e:
            print(f"✗ {description}: Invalid data ({e})")
        except Exception as e:
            print(f"✗ {description}: Unexpected error ({type(e).__name__}: {e})")


asyncio.run(test_connectivity())