System Architecture Overview
tif1’s architecture is designed around three core principles:
- Performance First: Every component is optimized for speed, from HTTP/2 multiplexing to orjson parsing to categorical data types
- Resilience: Multi-tier caching, circuit breakers, and retry logic ensure reliability even under adverse network conditions
- Transparency: Comprehensive logging and monitoring allow you to understand exactly what’s happening at each stage
High-Level Architecture Diagram
Component Responsibilities
Session Object (core.py)
- Entry point for all data access
- Manages lazy loading of laps, telemetry, weather, and race control data
- Coordinates between cache layers and CDN fetching
- Handles backend selection (pandas vs polars)
Memory Cache (cache.py)
- In-memory cache using Python’s `functools.lru_cache`
- Stores fully constructed Python objects (DataFrames, model instances)
- Default capacity: 1024 items (configurable via the `TIF1_CACHE_SIZE` env var)
- Eviction policy: Least Recently Used (LRU)
- Lifetime: Process duration only
SQLite Cache (cache.py)
- Disk-based cache using SQLite database
- Location: `~/.tif1/cache/tif1_cache.db` (configurable via `TIF1_CACHE_DIR`)
- Stores compressed JSON representations
- Schema: `(key TEXT PRIMARY KEY, value BLOB, timestamp REAL)`
- Supports TTL-based expiration (default: 7 days)
- Thread-safe with connection pooling
CDN Fetching (cdn.py)
- Manages multiple CDN sources with automatic fallback
- Primary: StaticDelivr CDN (`cdn.staticdelivr.com/gh/TracingInsights/{year}@main`)
- Fallback: jsDelivr CDN (`cdn.jsdelivr.net/gh/TracingInsights/{year}@main`)
- Tracks failure counts per CDN source
- Automatically disables failing sources after 3 consecutive failures
- Handles URL encoding and path construction
- Never uses `raw.githubusercontent.com` (strict rate limits)
Async Parallel Fetching (async_fetch.py)
- Parallel HTTP requests using `niquests` (HTTP/2 support)
- Connection pooling and keep-alive
- Automatic retry with exponential backoff
- Timeout management (default: 30s per request)
- Progress tracking for batch operations
Circuit Breaker (retry.py)
- Prevents cascading failures during network issues
- States: CLOSED (normal), OPEN (failing), HALF_OPEN (testing recovery)
- Failure threshold: 5 consecutive failures
- Recovery timeout: 60 seconds
- Automatic state transitions
JSON Parsing (orjson)
- High-performance JSON parsing (2-3x faster than stdlib `json`)
- Direct bytes-to-Python object conversion
- Handles large payloads efficiently (100MB+ telemetry files)
- Strict validation mode enabled
Validation (validation.py)
- Pydantic-based validation of JSON structure
- Ensures data integrity before DataFrame construction
- Type coercion and default value handling
- Detailed error messages for debugging
DataFrame Construction (io_pipeline.py)
- Converts validated JSON to pandas/polars DataFrames
- Column renaming (snake_case → PascalCase)
- Type inference and optimization
- Index management
Data Enrichment (core.py)
- Adds computed columns (LapTimeSeconds, IsPersonalBest, etc.)
- Merges weather data with lap data
- Calculates stint information
- Adds driver metadata
Type Optimization (core_utils/helpers.py)
- Converts string columns to categoricals (50-90% memory reduction)
- Downcasts numeric types where safe (float64 → float32)
- Optimizes datetime representations
- Handles missing data efficiently
Complete Data Loading Pipeline
The data loading pipeline consists of eight distinct stages, each with specific responsibilities and performance characteristics. Understanding each stage helps you optimize your code and troubleshoot issues effectively.
Stage 1: Request Initiation
When you access data through a Session object, tif1 initiates the loading pipeline. This stage involves property access, lazy evaluation, and request routing.
- Property access triggers `__getattribute__` or an explicit getter method
- Session checks if data is already loaded (`self._laps is not None`)
- If not loaded, calls the internal `_load_laps()` method
- `_load_laps()` constructs the cache key: `f"laps_{year}_{gp}_{session_type}"`
- Passes control to the cache layer
Performance characteristics:
- Property access overhead: < 0.1ms
- Cache key construction: < 0.01ms
- No network I/O at this stage
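The lazy-loading flow above can be sketched roughly as follows. This is a minimal illustration rather than tif1's actual code: the `LazySession` class, the `_fetch_laps` helper, and its return value are hypothetical stand-ins. Only the cache-key format and the `self._laps is not None` check come from the description above.

```python
class LazySession:
    """Hypothetical stand-in for a session object with lazily loaded laps."""

    def __init__(self, year, gp, session_type):
        self.year = year
        self.gp = gp
        self.session_type = session_type
        self._laps = None      # nothing is loaded until first access
        self.load_count = 0    # counts how often the loader actually ran

    def _cache_key(self):
        # Key format taken from the pipeline description above.
        return f"laps_{self.year}_{self.gp}_{self.session_type}"

    def _fetch_laps(self, key):
        # Placeholder for the cache lookup and CDN fetch stages.
        self.load_count += 1
        return {"key": key, "rows": []}

    @property
    def laps(self):
        if self._laps is None:          # load on first access only
            self._laps = self._fetch_laps(self._cache_key())
        return self._laps


session = LazySession(2024, "Monaco", "R")
first = session.laps
second = session.laps    # served from the already loaded attribute
```

The second access returns the same object without calling the loader again, which is why the property-access overhead stays negligible once data is loaded.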
Stage 2: Multi-Tier Cache Lookup
tif1 implements a sophisticated two-tier caching system that dramatically reduces load times for frequently accessed data. Understanding cache behavior is crucial for performance optimization.
Tier 1: Memory LRU Cache (L1 Cache)
The memory cache is the fastest tier, storing fully constructed Python objects in RAM.
Technical Specifications:
- Implementation: Python `functools.lru_cache` with custom wrapper
- Storage: In-process memory (heap)
- Data format: Native Python objects (DataFrames, model instances)
- Capacity: 1024 items (default), configurable via `TIF1_CACHE_SIZE`
- Eviction: Least Recently Used (LRU) algorithm
- Access time: < 1ms (typically 0.1-0.5ms)
- Thread safety: GIL-protected (safe for multi-threaded access)
- Persistence: None (cleared on process exit)
Tier 2: SQLite Persistent Cache (L2 Cache)
The SQLite cache provides persistent storage that survives process restarts.
Technical Specifications:
- Implementation: SQLite3 with custom connection pooling
- Storage: Disk-based database file
- Location: `~/.tif1/cache/tif1_cache.db` (configurable via `TIF1_CACHE_DIR`)
- Data format: Compressed JSON (zlib compression, level 6)
- Capacity: Unlimited (constrained by disk space)
- Access time: 10-50ms (depends on disk I/O)
- Thread safety: Connection pooling with thread-local storage
- Persistence: Permanent (until manually cleared or TTL expires)
Cache Lookup Flow
| Metric | Memory Cache | SQLite Cache | CDN Fetch |
|---|---|---|---|
| Access Time | < 1ms | 10-50ms | 2-5s |
| Throughput | 1000+ req/s | 50-100 req/s | 0.2-0.5 req/s |
| Persistence | No | Yes | N/A |
| Capacity | Limited (RAM) | Unlimited (disk) | N/A |
| Thread Safety | Yes (GIL) | Yes (pooling) | N/A |
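To make the lookup order concrete, here is a minimal two-tier sketch using only the standard library. It is not tif1's implementation: the `TwoTierCache` class, the table name `cache`, and the use of an `OrderedDict` for the memory tier are assumptions for illustration. The schema, zlib level 6 compression, the 1024-item capacity, and the 7-day TTL follow the specifications above.

```python
import json
import sqlite3
import time
import zlib
from collections import OrderedDict


class TwoTierCache:
    """Illustrative L1 (memory, LRU) plus L2 (SQLite) cache."""

    def __init__(self, db_path=":memory:", capacity=1024, ttl_seconds=7 * 24 * 3600):
        self.memory = OrderedDict()
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value BLOB, timestamp REAL)"
        )

    def _remember(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict least recently used

    def set(self, key, value):
        blob = zlib.compress(json.dumps(value).encode("utf-8"), 6)
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, blob, time.time())
        )
        self.db.commit()
        self._remember(key, value)

    def get(self, key):
        if key in self.memory:                       # tier 1: memory
            self.memory.move_to_end(key)
            return self.memory[key], "memory"
        row = self.db.execute(
            "SELECT value, timestamp FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and time.time() - row[1] < self.ttl:  # tier 2: SQLite
            value = json.loads(zlib.decompress(row[0]))
            self._remember(key, value)               # promote to memory
            return value, "sqlite"
        return None, "miss"                          # caller falls through to the CDN


cache = TwoTierCache()
cache.set("laps_2024_Monaco_R", {"laps": [1, 2, 3]})
hot = cache.get("laps_2024_Monaco_R")
cache.memory.clear()                 # simulate a process restart
warm = cache.get("laps_2024_Monaco_R")
hot_again = cache.get("laps_2024_Monaco_R")
missing = cache.get("unknown")
```

A SQLite hit is promoted into the memory tier, so the next access for the same key is served from RAM.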
Stage 3: CDN Fetching with Fallback Strategy
When data isn’t found in either cache tier, tif1 fetches from the CDN using a multi-source strategy with automatic fallback.
CDN Architecture
Primary Source: StaticDelivr CDN
- URL Pattern: `https://cdn.staticdelivr.com/gh/TracingInsights/{year}@main/{path}`
- Global CDN with edge locations worldwide
- Automatic caching and compression
- No rate limits for reasonable usage
- HTTPS with HTTP/2 support
- Average latency: 50-200ms (depending on location)
- Uptime: 99.9%+
Fallback Source: jsDelivr CDN
- URL Pattern: `https://cdn.jsdelivr.net/gh/TracingInsights/{year}@main/{path}`
- Global CDN with edge locations worldwide
- Automatic caching and compression
- No rate limits for reasonable usage
- HTTPS with HTTP/2 support
- Average latency: 50-200ms (depending on location)
- Used when StaticDelivr fails or is unavailable
Not Used: GitHub Raw (raw.githubusercontent.com)
- Never used due to strict rate limits (10 requests/hour)
- If all other sources fail, a `NetworkError` is raised instead of falling back to it
URL Construction
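As a rough sketch of URL construction, the snippet below fills the two documented URL patterns and percent-encodes each path segment. The `build_urls` helper and the example path layout (event name, session name, file name) are hypothetical. Only the two URL templates come from this page.

```python
from urllib.parse import quote

CDN_TEMPLATES = [
    "https://cdn.staticdelivr.com/gh/TracingInsights/{year}@main/{path}",  # primary
    "https://cdn.jsdelivr.net/gh/TracingInsights/{year}@main/{path}",      # fallback
]


def build_urls(year, *segments):
    """Return candidate URLs in priority order, percent-encoding each segment."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return [template.format(year=year, path=path) for template in CDN_TEMPLATES]


# Hypothetical path layout, used only to show the encoding of spaces.
urls = build_urls(2024, "Monaco Grand Prix", "Race", "laps.json")
```

Encoding each segment separately keeps the `/` separators intact while still escaping spaces and other reserved characters inside a segment.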
Fallback Logic
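The fallback behavior described for the CDN layer (try sources in order, count consecutive failures, disable a source after 3) could look roughly like this. The `CDNSource` class, `fetch_with_fallback`, and the simulated `fake_fetch` transport are hypothetical, and `RuntimeError` stands in for the library's `NetworkError`.

```python
class CDNSource:
    def __init__(self, name, max_failures=3):
        self.name = name
        self.max_failures = max_failures
        self.failures = 0          # consecutive failures

    @property
    def enabled(self):
        return self.failures < self.max_failures


def fetch_with_fallback(sources, fetch, path):
    """Try each enabled source in order; `fetch(name, path)` raises OSError on failure."""
    errors = []
    for source in sources:
        if not source.enabled:
            continue
        try:
            result = fetch(source.name, path)
        except OSError as exc:
            source.failures += 1
            errors.append(f"{source.name}: {exc}")
        else:
            source.failures = 0    # a success resets the streak
            return result
    raise RuntimeError("all CDN sources failed: " + "; ".join(errors))


def fake_fetch(name, path):
    # Simulated transport: the primary is down, the fallback answers.
    if name == "staticdelivr":
        raise OSError("connection refused")
    return f"{name}:{path}"


sources = [CDNSource("staticdelivr"), CDNSource("jsdelivr")]
results = [fetch_with_fallback(sources, fake_fetch, "laps.json") for _ in range(4)]
```

After three failed attempts the primary is skipped entirely, so the fourth call goes straight to the fallback without paying for another failed request.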
Circuit Breaker Pattern
tif1 implements a circuit breaker to prevent cascading failures during network issues.
Circuit Breaker States:
- CLOSED (Normal Operation)
  - All requests pass through
  - Failures are counted
  - Threshold: 5 consecutive failures
- OPEN (Failing)
  - Requests fail immediately without attempting network call
  - Prevents overwhelming failing service
  - Duration: 60 seconds
- HALF_OPEN (Testing Recovery)
  - Limited requests allowed through
  - Success → transition to CLOSED
  - Failure → transition back to OPEN
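The three states and their transitions can be expressed as a small state machine. The sketch below uses the thresholds quoted above (5 failures, 60 seconds), but the `CircuitBreaker` class and its method names are assumptions, not tif1's API. A fake clock is injected so the example runs instantly.

```python
import time


class CircuitBreaker:
    """Minimal three-state breaker: CLOSED, OPEN, HALF_OPEN."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"   # let a trial request through
                return True
            return False                   # fail fast, no network call
        return True

    def record_success(self):
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


now = [0.0]                                 # fake clock value in seconds
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
state_after_failures = breaker.state
blocked = not breaker.allow_request()       # still inside the 60 s window
now[0] = 61.0
probe_allowed = breaker.allow_request()     # recovery timeout has elapsed
state_during_probe = breaker.state
breaker.record_success()
```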
Retry Strategy
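A retry loop with exponential backoff might be sketched as follows. The 1 s and 2 s delays match the backoff values quoted later in the network-failure pattern. The function names, the 30 s cap, and the choice of `OSError` as the retryable error are assumptions for illustration.

```python
import time


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0):
    """Delays between attempts: base, base*factor, ... capped at `cap` seconds."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def retry(operation, attempts=3, sleep=time.sleep, **backoff):
    """Call `operation` up to `attempts` times, sleeping between failures."""
    delays = backoff_delays(attempts - 1, **backoff)
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(delays[attempt])


calls = []
slept = []


def flaky():
    # Fails twice, then succeeds on the third call.
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary failure")
    return "ok"


# Pass a recording function instead of time.sleep so the demo does not wait.
outcome = retry(flaky, attempts=3, sleep=slept.append)
```

Production retry loops often add random jitter to each delay so that many clients do not retry in lockstep. That is omitted here for clarity.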
Request Timeout Management
Stage 4: Async Parallel Fetching
One of tif1’s most significant performance optimizations is parallel fetching of data for multiple drivers using asyncio and HTTP/2.
Sequential vs Parallel Fetching
Sequential Fetching (Traditional Approach):
HTTP/2 Multiplexing
tif1 uses the `niquests` library, which supports HTTP/2 and enables true request multiplexing over a single TCP connection.
HTTP/1.1 Limitations:
- One request at a time per TCP connection
- Multiple connections required for parallelism (typically 6-8 max)
- High overhead: TCP handshake + TLS handshake per connection
- Head-of-line blocking
HTTP/2 Advantages:
- Multiple requests over single TCP connection
- Binary framing for efficiency
- Header compression (HPACK)
- Server push capability (not used by tif1)
- Stream prioritization
| Metric | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Connections | 6-8 | 1 |
| Handshake Overhead | High | Low |
| Request Latency | 500-800ms | 300-500ms |
| Total Time (20 drivers) | 3-4s | 0.5-0.8s |
| Memory Usage | Higher | Lower |
Connection Pooling
Progress Tracking
Error Handling in Parallel Fetching
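One common way to keep a single failed driver from aborting a whole batch is `asyncio.gather(..., return_exceptions=True)` combined with a semaphore that caps concurrency. The sketch below simulates the network call with `asyncio.sleep`. The `fetch_driver` and `fetch_all` names, the concurrency limit of 10, and the `LookupError` are illustrative assumptions rather than tif1 internals.

```python
import asyncio


async def fetch_driver(driver, semaphore):
    """Simulated per-driver fetch; a real client would issue an HTTP request here."""
    async with semaphore:                  # cap the number of in-flight requests
        await asyncio.sleep(0.01)
        if driver == "XXX":
            raise LookupError(f"no data for {driver}")
        return driver, {"laps": 3}


async def fetch_all(drivers, max_concurrency=10):
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_driver(d, semaphore) for d in drivers]
    # return_exceptions=True keeps one failure from cancelling the whole batch.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    loaded = dict(r for r in results if not isinstance(r, Exception))
    failed = [d for d, r in zip(drivers, results) if isinstance(r, Exception)]
    return loaded, failed


loaded, failed = asyncio.run(fetch_all(["VER", "HAM", "XXX", "LEC"]))
```

The caller receives the successfully loaded drivers together with a list of the ones that failed, and can decide whether to retry or report them.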
Batch Size Optimization
Real-World Performance Example
Stage 5: High-Performance JSON Parsing
After fetching raw data from the CDN, tif1 parses JSON using `orjson`, a high-performance JSON library that’s 2-3x faster than Python’s standard `json` module.
Why orjson?
Performance Comparison:
| Library | Parse Time (10MB) | Serialize Time | Memory Usage |
|---|---|---|---|
| json (stdlib) | 450ms | 380ms | High |
| ujson | 280ms | 220ms | Medium |
| orjson | 150ms | 120ms | Low |
Key Features:
- Written in Rust for maximum performance
- Direct bytes-to-Python object conversion (no intermediate string)
- Efficient handling of large payloads (100MB+ telemetry files)
- Strict validation mode
- Native support for datetime, UUID, and other types
Parsing Pipeline
Data Structure Examples
Lap Data JSON Structure:
Parsing Performance Optimization
Lazy Parsing for Large Files:
Error Recovery
Validation After Parsing
Stage 6: DataFrame Construction and Transformation
After parsing and validating JSON, tif1 constructs DataFrames with optimized column names, types, and ordering.
DataFrame Construction Pipeline
Column Naming Convention
tif1 uses PascalCase for all column names to maintain consistency with F1 terminology and improve readability.
Rename Mapping:
Type Optimization
Pandas Type Optimization:
Column Ordering
Columns are ordered logically for better readability:
Index Management
Pandas Index Strategy:
Missing Data Handling
DataFrame Validation
Performance Benchmarks
DataFrame Construction Performance:
| Operation | Pandas | Polars | Speedup |
|---|---|---|---|
| Create from dict | 45ms | 12ms | 3.8x |
| Rename columns | 8ms | 2ms | 4.0x |
| Type conversion | 25ms | 5ms | 5.0x |
| Reorder columns | 3ms | 1ms | 3.0x |
| Set index | 5ms | N/A | N/A |
| Total | 86ms | 20ms | 4.3x |
Memory Usage After Type Optimization:
| Data Type | Before Optimization | After Optimization | Savings |
|---|---|---|---|
| Lap data (1500 laps) | 15 MB | 4 MB | 73% |
| Telemetry (15k samples) | 45 MB | 18 MB | 60% |
| Weather (150 samples) | 0.8 MB | 0.3 MB | 63% |
Stage 7: Data Enrichment and Augmentation
After constructing the base DataFrame, tif1 automatically enriches data with computed columns, merged weather information, and derived metrics.
Lap Data Enrichment
Computed Time Columns:
Weather Data Integration
Telemetry Enrichment
Acceleration Calculation:
Enrichment Performance
Enrichment Timing:
| Enrichment Type | Time (1500 laps) | Time (Single Lap Telemetry) |
|---|---|---|
| Time calculations | 15ms | N/A |
| Position analysis | 25ms | N/A |
| Tyre strategy | 20ms | N/A |
| Performance flags | 30ms | N/A |
| Weather merge | 40ms | N/A |
| Telemetry acceleration | N/A | 50ms |
| Distance normalization | N/A | 5ms |
| Corner detection | N/A | 30ms |
| Total | 130ms | 85ms |
Memory Impact of Enrichment:
| Data Type | Before Enrichment | After Enrichment | Increase |
|---|---|---|---|
| Lap data | 4 MB | 7 MB | +75% |
| Telemetry | 18 MB | 25 MB | +39% |
Stage 8: Cache Storage and Finalization
The final stage saves processed data to both cache tiers and returns the DataFrame to the user.
Cache Storage Strategy
Dual-Tier Write:
Cache Metadata Tracking
Cache Eviction Policies
LRU Eviction (Memory Cache):
Cache Statistics and Monitoring
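For cache statistics, a simple hit-rate counter is often enough to tell whether the cache is sized well. The `CacheStats` class below is a hypothetical sketch, not a tif1 API. It tracks hits per tier and derives the overall hit rate.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    memory_hits: int = 0
    sqlite_hits: int = 0
    misses: int = 0

    def record(self, tier):
        """Record one lookup outcome: 'memory', 'sqlite', or anything else for a miss."""
        if tier == "memory":
            self.memory_hits += 1
        elif tier == "sqlite":
            self.sqlite_hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.memory_hits + self.sqlite_hits + self.misses
        return (self.memory_hits + self.sqlite_hits) / total if total else 0.0


stats = CacheStats()
for tier in ["miss", "sqlite", "memory", "memory", "memory"]:
    stats.record(tier)
```

In this example four of five lookups are served from a cache tier, giving a hit rate of 0.8.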
Final Data Return
Complete Pipeline Timing
End-to-End Performance (Cold Start):
| Stage | Time | Cumulative |
|---|---|---|
| 1. Request Initiation | < 1ms | < 1ms |
| 2. Cache Lookup (miss) | 2ms | 2ms |
| 3. CDN Fetching | 2000ms | 2002ms |
| 4. Async Parallel Fetch | 500ms | 2502ms |
| 5. JSON Parsing | 100ms | 2602ms |
| 6. DataFrame Construction | 86ms | 2688ms |
| 7. Data Enrichment | 130ms | 2818ms |
| 8. Cache Storage | 50ms | 2868ms |
| Total | ~2.9s | |
End-to-End Performance (Warm Start):
| Stage | Time | Cumulative |
|---|---|---|
| 1. Request Initiation | < 1ms | < 1ms |
| 2. Cache Lookup (SQLite hit) | 30ms | 30ms |
| 3-7. (Skipped) | 0ms | 30ms |
| 8. Memory Cache Save | < 1ms | 31ms |
| Total | ~31ms | |
End-to-End Performance (Hot Start):
| Stage | Time | Cumulative |
|---|---|---|
| 1. Request Initiation | < 1ms | < 1ms |
| 2. Cache Lookup (memory hit) | < 1ms | < 1ms |
| 3-8. (Skipped) | 0ms | < 1ms |
| Total | < 1ms | |
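The totals and speedup ratios quoted for the cold, warm, and hot paths can be re-derived from the stage timings. The short calculation below copies the numbers from the tables above. Treating the "< 1ms" entries as 0 ms for the cold sum and 1 ms as an upper bound for the hot path is an assumption made to match the cumulative column.

```python
# Stage timings in milliseconds, copied from the cold-start table.
# Stage 1 ("< 1ms") is counted as 0, which matches the cumulative column.
cold_stages = [0, 2, 2000, 500, 100, 86, 130, 50]
cold_ms = sum(cold_stages)
warm_ms = 31      # SQLite hit
hot_ms = 1        # upper bound for a memory hit ("< 1ms")

warm_vs_cold = cold_ms / warm_ms
hot_vs_cold = cold_ms / hot_ms
hot_vs_warm = warm_ms / hot_ms

print(f"cold total: {cold_ms} ms")
print(f"warm vs cold: {warm_vs_cold:.1f}x")
print(f"hot vs cold: at least {hot_vs_cold:.0f}x")
print(f"hot vs warm: at least {hot_vs_warm:.0f}x")
```

Because the hot path is only bounded above by 1 ms, the hot-path ratios are lower bounds rather than exact figures.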
- Warm vs Cold: 93x faster (31ms vs 2868ms)
- Hot vs Cold: 2868x faster (< 1ms vs 2868ms)
- Hot vs Warm: 31x faster (< 1ms vs 31ms)
Data Transformation Through the Pipeline
Stage-by-Stage Data Evolution
Stage 1: Raw JSON (from CDN)
- Format: UTF-8 encoded JSON
- Size: ~2-5 KB per driver (compressed), ~10-20 KB (uncompressed)
- Naming: snake_case
- Types: Mixed (strings, numbers, booleans, nulls)
Stage 2: Python Dictionary (after orjson parsing)
- Format: Native Python dict
- Size: ~3x JSON size in memory (~30-60 KB)
- Types: Python native (int, float, str, bool, None)
- Access: O(1) dictionary lookups
Stage 3: Initial DataFrame (after construction)
- Format: pandas DataFrame
- Size: ~15 MB for 1500 laps (before optimization)
- Column names: snake_case
- Types: Default pandas types (int64, float64, object)
Stage 4: Renamed DataFrame (after column renaming)
- Format: pandas DataFrame
- Size: Same as Stage 3
- Column names: PascalCase (tif1 convention)
- Types: Still default types
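The renaming step between Stage 3 and Stage 4 amounts to mapping snake_case keys to PascalCase. A generic conversion helper is sketched below. The function names and the sample source keys are hypothetical, and an actual implementation may well use an explicit rename mapping instead of a rule-based conversion.

```python
def to_pascal_case(name):
    """Convert a snake_case column name to PascalCase."""
    return "".join(part.capitalize() for part in name.split("_") if part)


def rename_columns(record):
    """Return a copy of `record` with PascalCase keys."""
    return {to_pascal_case(key): value for key, value in record.items()}


# Sample keys are made up for the example.
row = rename_columns({"lap_time": 78.241, "driver": "VER", "is_personal_best": True})
```

A rule-based helper like this lowercases everything after the first letter of each part, so acronyms would need an explicit mapping to keep their casing.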
Stage 5: Optimized DataFrame (after type optimization)
- Before: 15 MB
- After: 4 MB
- Reduction: 73%
Stage 6: Enriched DataFrame (after enrichment)
- Format: pandas DataFrame
- Size: ~7 MB (75% increase from Stage 5)
- Columns: Original + ~20 computed columns
- Ready for analysis
Stage 7: Final DataFrame (cached and returned)
- Format: pandas/polars DataFrame
- Size: ~7 MB (in memory)
- Cached: Yes (both memory and SQLite)
- Ready: For immediate analysis
Data Type Comparison: Pandas vs Polars
Pandas Types:
Key Differences in Polars:
- Polars uses a more efficient internal representation
- Polars strings are always UTF-8 validated
- Polars categoricals use dictionary encoding by default
- Polars has better null handling (no NaN vs None confusion)
Memory Usage Comparison
Full Pipeline Memory Usage (1500 laps):
| Stage | Pandas | Polars | Difference |
|---|---|---|---|
| Raw JSON | 15 KB | 15 KB | 0% |
| Python dict | 45 KB | 45 KB | 0% |
| Initial DataFrame | 15 MB | 8 MB | -47% |
| Optimized DataFrame | 4 MB | 2 MB | -50% |
| Enriched DataFrame | 7 MB | 3.5 MB | -50% |
Advanced Performance Optimizations
tif1 implements numerous performance optimizations throughout the data pipeline. Understanding these optimizations helps you write faster code and make informed architectural decisions.
1. HTTP/2 Multiplexing and Connection Reuse
HTTP Protocol Evolution
HTTP/1.0 (Legacy):
- One request per TCP connection
- Connection closed after each request
- High overhead: TCP handshake (3-way) + TLS handshake (2-3 round trips)
- Total overhead: ~200-300ms per request
HTTP/1.1:
- Connection keep-alive (reuse connection)
- Pipelining (limited browser support)
- Head-of-line blocking (requests must complete in order)
- Typical browser limit: 6-8 concurrent connections per domain
HTTP/2:
- Binary framing protocol (vs text-based HTTP/1.1)
- Multiplexing: Multiple requests over single connection
- Header compression (HPACK algorithm)
- Server push (not used by tif1)
- Stream prioritization
- No head-of-line blocking at HTTP layer
Performance Impact
Implementation in tif1
2. Lazy Loading and On-Demand Data Fetching
Lazy loading ensures data is only fetched when actually needed, reducing unnecessary network I/O and memory usage.
Implementation
Performance Benefits
3. Categorical Data Type Optimization
Converting string columns to categoricals provides massive memory savings and faster operations.
Memory Comparison
Performance Comparison
Automatic Categorization in tif1
4. Backend Selection: Pandas vs Polars
Choose the right backend for your use case to maximize performance.
Performance Benchmarks
Operation Speed (1500 laps):
| Operation | Pandas | Polars | Speedup |
|---|---|---|---|
| Load from JSON | 86ms | 20ms | 4.3x |
| Filter (single condition) | 2.5ms | 0.8ms | 3.1x |
| Filter (multiple conditions) | 5.2ms | 1.2ms | 4.3x |
| Groupby + aggregation | 12ms | 3ms | 4.0x |
| Sort | 8ms | 2ms | 4.0x |
| Join (merge) | 15ms | 4ms | 3.8x |
| Column selection | 0.5ms | 0.1ms | 5.0x |
| Row iteration | 450ms | 120ms | 3.8x |
Memory Usage:
| Data Type | Pandas | Polars | Savings |
|---|---|---|---|
| Lap data | 7 MB | 3.5 MB | 50% |
| Telemetry (15k samples) | 25 MB | 12 MB | 52% |
| Weather | 0.3 MB | 0.15 MB | 50% |
When to Use Each Backend
Use Pandas When:
- You need compatibility with existing pandas code
- You’re using libraries that work best with pandas objects (matplotlib, seaborn, etc.)
- You need mutable DataFrames (in-place operations)
- Dataset is small (<10k rows)
- You need the full pandas ecosystem
Use Polars When:
- Performance is critical
- Working with large datasets (>100k rows)
- Memory is constrained
- You need lazy evaluation
- You want type safety and better error messages
- You’re starting a new project
Switching Backends
5. Async Parallel Fetching
Async fetching is one of tif1’s most significant performance optimizations.
Sequential vs Parallel Comparison
Concurrency Control
6. JSON Parsing Optimization
orjson provides 2-3x faster JSON parsing than stdlib json.
Benchmark Comparison
7. Cache Optimization Strategies
Pre-warming Cache
Cache Size Tuning
8. Batch Operations
Process multiple items together for better performance.
Performance Summary
Key Optimizations and Their Impact:
| Optimization | Speedup | Memory Savings |
|---|---|---|
| HTTP/2 Multiplexing | 4x | - |
| Async Parallel Fetching | 17x | - |
| Lazy Loading | 10x | 80% |
| Categorical Types | 1.2x | 90% |
| orjson Parsing | 3x | - |
| Polars Backend | 4x | 50% |
| Multi-tier Caching | 2868x | - |
| Batch Operations | 16x | - |
- Cold start: ~3s
- Warm start: ~30ms (100x faster)
- Hot start: <1ms (3000x faster)
- Memory usage: 50-70% reduction vs naive implementation
Comprehensive Error Handling
tif1 implements a robust error handling system with a hierarchical exception structure, detailed error context, and recovery strategies.
Exception Hierarchy
Network Errors
Network errors occur during CDN fetching and HTTP operations.
NetworkError (Base)
ConnectionError
TimeoutError
CDNError
Data Not Found Errors
These errors occur when requested data doesn’t exist.
DataNotFoundError (Base)
DriverNotFoundError
LapNotFoundError
Invalid Data Errors
These errors occur during data parsing and validation.
InvalidDataError (Base)
JSONParseError
ValidationError
Cache Errors
These errors occur during cache operations.
CacheError (Base)
CacheCorruptionError
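The exception names listed in this section suggest a hierarchy along the following lines. This is a sketch under assumptions: the common base class `TIF1Error` and the `context` attribute are hypothetical, and the parent of each class is inferred from the grouping above rather than taken from the library source.

```python
class TIF1Error(Exception):
    """Hypothetical common base class; the real base name is not given here."""

    def __init__(self, message, **context):
        super().__init__(message)
        self.context = context            # extra details for debugging


class NetworkError(TIF1Error): ...
class ConnectionError(NetworkError): ...   # shadows the builtin of the same name
class TimeoutError(NetworkError): ...      # shadows the builtin of the same name
class CDNError(NetworkError): ...

class DataNotFoundError(TIF1Error): ...
class DriverNotFoundError(DataNotFoundError): ...
class LapNotFoundError(DataNotFoundError): ...

class InvalidDataError(TIF1Error): ...
class JSONParseError(InvalidDataError): ...
class ValidationError(InvalidDataError): ...

class CacheError(TIF1Error): ...
class CacheCorruptionError(CacheError): ...


try:
    raise DriverNotFoundError("driver not found", driver="XXX", year=2024)
except DataNotFoundError as exc:          # catching the group base also works
    caught = exc
```

Grouping exceptions under category bases lets callers catch a whole family (for example every "not found" case) with a single `except` clause.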
Error Recovery Strategies
Automatic Retry with Exponential Backoff
Fallback to Alternative Data Source
Graceful Degradation
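Graceful degradation usually means that required data must load while optional data may be missing. The sketch below treats laps as required and weather as optional. The function names, the logger name, and the choice of `OSError` as the failure type are illustrative assumptions.

```python
import logging

logger = logging.getLogger("degradation_demo")   # logger name is arbitrary


def load_session_data(load_laps, load_weather):
    """Laps are required; weather is optional and may be missing."""
    data = {"laps": load_laps()}           # errors here propagate to the caller
    try:
        data["weather"] = load_weather()
    except OSError as exc:
        logger.warning("weather unavailable, continuing without it: %s", exc)
        data["weather"] = None
    return data


def broken_weather():
    raise OSError("CDN unreachable")


data = load_session_data(lambda: ["lap 1", "lap 2"], broken_weather)
```

Downstream code then needs to handle a `None` weather value, which is the price of not failing the whole load.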
Error Context and Debugging
All tif1 exceptions include rich context for debugging.
Monitoring and Observability
tif1 provides comprehensive monitoring capabilities to help you understand system behavior, diagnose issues, and optimize performance.
Logging System
Log Levels and Configuration
What Gets Logged
DEBUG Level:
Performance Monitoring
Timing Decorators
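A timing decorator can be built from `functools.wraps` and `time.perf_counter`. The `timed` decorator below is a generic sketch rather than a tif1 utility. Storing the last duration on the wrapper is a design choice made here so the result can be inspected without parsing logs.

```python
import functools
import logging
import time

logger = logging.getLogger("timing_demo")   # logger name is arbitrary


def timed(func):
    """Log how long each call to `func` takes and keep the last duration."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on exceptions alike.
            wrapper.last_duration = time.perf_counter() - start
            logger.debug("%s took %.3f ms", func.__name__, wrapper.last_duration * 1000)

    wrapper.last_duration = None
    return wrapper


@timed
def parse_payload(n):
    return sum(range(n))


total = parse_payload(100_000)
```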
Performance Metrics Collection
Cache Monitoring
Cache Statistics
Cache Performance Testing
Circuit Breaker Monitoring
Network Monitoring
Request Tracking
Memory Monitoring
Comprehensive Monitoring Dashboard
Data Flow Patterns and Scenarios
Understanding common data flow patterns helps you optimize your code for different use cases.
Pattern 1: Cold Start (First Load)
Scenario: First time loading data, no cache available.
Optimization Tips:
- Pre-warm cache during application startup
- Use async loading for non-blocking operation
- Consider loading only required data (laps vs telemetry)
Pattern 2: Warm Start (SQLite Cache Hit)
Scenario: Data exists in SQLite cache, but not in memory.
Optimization Tips:
- Keep SQLite cache on SSD for faster access
- Increase cache TTL to reduce re-fetching
- Monitor cache hit rate
Pattern 3: Hot Start (Memory Cache Hit)
Scenario: Data exists in memory cache.
Optimization Tips:
- Reuse session objects to maximize memory cache hits
- Increase memory cache size if you have available RAM
- Keep frequently accessed data in memory
Pattern 4: Partial Cache Hit
Scenario: Some drivers cached, others need fetching.
Timing Breakdown:
- 15 drivers from memory: < 1ms
- 3 drivers from SQLite: ~90ms
- 2 drivers from CDN: ~700ms
Optimization Tips:
- Batch fetch missing drivers
- Pre-warm cache for commonly accessed drivers
- Use selective loading (only load needed drivers)
Pattern 5: Network Failure with Retry
Scenario: Network request fails, automatic retry with backoff.
Timing Breakdown:
- Attempt 1: 30s (timeout)
- Backoff 1: 1s
- Attempt 2: 500ms (fast fail)
- Backoff 2: 2s
- Attempt 3: 500ms (success)
- Processing: 370ms
Optimization Tips:
- Reduce timeout for faster failure detection
- Implement circuit breaker to fail fast
- Use fallback data sources
Pattern 6: Batch Loading Multiple Sessions
Scenario: Load data for multiple sessions efficiently.
Pattern 7: Incremental Data Loading
Scenario: Load data incrementally as needed.
Pattern 8: Cache Warming Strategy
Scenario: Pre-warm cache for better user experience.
Advanced Optimization Strategies
These strategies help you achieve maximum performance in production environments.
Strategy 1: Intelligent Cache Pre-warming
Pre-warm cache strategically based on usage patterns.
Strategy 2: Adaptive Timeout Management
Adjust timeouts based on network conditions.
Strategy 3: Selective Data Loading
Load only the data you need.
Strategy 4: Batch Operations for Multiple Analyses
Batch operations to minimize overhead.
Strategy 5: Memory-Efficient Iteration
Process large datasets without loading everything into memory.
Strategy 6: Polars for Large-Scale Analysis
Use Polars for better performance on large datasets.
Strategy 7: Connection Pooling Optimization
Optimize HTTP connection pooling for your workload.
Strategy 8: Lazy Evaluation with Polars
Use Polars lazy evaluation for complex pipelines.
Strategy 9: Compression for Cache Storage
Optimize cache storage with compression.
Strategy 10: Monitoring-Driven Optimization
Use monitoring data to identify bottlenecks.
Comprehensive Summary
tif1’s data flow architecture is designed from the ground up for maximum performance, reliability, and developer experience. This section summarizes the key concepts and provides actionable recommendations.
Architecture Principles
1. Performance First
- Every component optimized for speed
- Multi-tier caching reduces latency by 2800x
- HTTP/2 multiplexing enables true parallel fetching
- orjson provides 3x faster JSON parsing
- Categorical types reduce memory by 90%
- Polars backend offers 4x faster operations
2. Resilience
- Circuit breaker prevents cascading failures
- Automatic retry with exponential backoff
- Multi-source CDN fallback (StaticDelivr → jsDelivr)
- Comprehensive error handling with detailed context
- Graceful degradation for non-critical data
3. Developer Experience
- Lazy loading minimizes unnecessary work
- Intuitive API with sensible defaults
- Rich logging and monitoring capabilities
- Detailed error messages with recovery suggestions
- Flexible backend selection (pandas/polars)
Performance Characteristics
Load Time Comparison:
| Scenario | Time | Speedup vs Cold |
|---|---|---|
| Cold start (no cache) | 2.8s | 1x (baseline) |
| Warm start (SQLite cache) | 30ms | 93x faster |
| Hot start (memory cache) | < 1ms | 2800x faster |
Memory Usage:
| Data Type | Unoptimized | Optimized | Savings |
|---|---|---|---|
| Lap data (1500 laps) | 15 MB | 4 MB | 73% |
| Telemetry (15k samples) | 45 MB | 18 MB | 60% |
| Weather (150 samples) | 0.8 MB | 0.3 MB | 63% |
Backend Comparison:
| Operation | Pandas | Polars | Speedup |
|---|---|---|---|
| Load from JSON | 86ms | 20ms | 4.3x |
| Filter | 2.5ms | 0.8ms | 3.1x |
| Groupby | 12ms | 3ms | 4.0x |
| Memory usage | 7 MB | 3.5 MB | 50% less |
Key Optimizations
1. Multi-Tier Caching
- L1 (Memory): < 1ms access, 1024 items
- L2 (SQLite): 30ms access, unlimited capacity
- Automatic promotion from L2 to L1
- TTL-based expiration (7 days default)
2. Async Parallel Fetching
- HTTP/2 multiplexing over single connection
- 20 drivers fetched in 500ms (vs 10s sequential)
- 17x speedup for multi-driver operations
- Automatic error handling and retry
3. Type Optimization
- Categorical encoding for string columns
- Downcast numeric types (float64 → float32)
- Boolean optimization (int64 → bool)
- 73% memory reduction on average
4. Data Enrichment
- Automatic computed columns (20+ fields)
- Weather data integration
- Performance flags and metrics
- Minimal overhead (130ms for 1500 laps)
Best Practices
For Maximum Performance:
Common Pitfalls and Solutions
Pitfall 1: Creating New Sessions Repeatedly
Performance Tuning Checklist
Before Deployment:
- Enable appropriate logging level (INFO for production)
- Configure cache size based on available memory
- Set cache TTL based on data freshness requirements
- Pre-warm cache for frequently accessed data
- Choose appropriate backend (pandas vs polars)
- Configure timeouts based on network conditions
- Set up monitoring and alerting
- Test error handling and recovery
- Monitor cache hit rate (target: > 80%)
- Monitor network success rate (target: > 95%)
- Monitor memory usage (should be stable)
- Monitor circuit breaker state (should be CLOSED)
- Check for slow queries (> 5s)
- Review error logs regularly
- Clear expired cache entries periodically
- Update cache warming strategy based on usage
- Profile code to identify bottlenecks
- Increase cache size if hit rate is low
- Use Polars for large datasets
- Batch operations where possible
- Use async loading for non-blocking operations
- Optimize network timeouts
- Consider CDN proximity
- Review and optimize data loading patterns
Future Enhancements
tif1’s data flow architecture is designed to evolve. Planned enhancements include:
Short Term:
- Streaming data support for live sessions
- GraphQL API for selective field loading
- Redis cache tier for distributed systems
- Compression algorithm selection (zstd, lz4)
- Automatic cache warming based on ML predictions
- Edge caching with CloudFlare Workers
- WebSocket support for real-time updates
- Distributed cache with automatic sharding
- Query result caching with automatic invalidation
- Advanced prefetching based on access patterns
Conclusion
tif1’s data flow architecture represents a comprehensive approach to high-performance data loading and processing. By understanding the eight-stage pipeline, leveraging multi-tier caching, utilizing async parallel fetching, and following best practices, you can achieve:
- 93-2800x faster data access through caching
- 17x faster multi-driver operations through parallelization
- 50-73% less memory usage through type optimization
- 4x faster operations through Polars backend
- 99%+ reliability through error handling and circuit breakers
Related Documentation
- Caching Strategy: Deep dive into multi-tier caching system
- Backends: Pandas vs Polars comparison and selection guide
- Sessions: Session management and data loading
- HTTP API: HTTP client and network operations
- Utilities: Helper functions and utilities
- CLI: Command-line interface for data management
Additional Resources
- GitHub Repository: TracingInsights/tif1
- Issue Tracker: Report bugs and request features
- Discussions: Community discussions and Q&A
- Examples: Code examples and tutorials