Overview
The tif1 configuration system provides fine-grained control over the library’s behavior, from network settings and caching strategies to data validation and performance optimization. The configuration architecture is designed for flexibility and performance, so you can tune the library for your specific use case, whether that’s low-latency data access, high-throughput batch processing, or development debugging.
This comprehensive guide covers all configuration options, their interactions, performance implications, and best practices for different deployment scenarios. Whether you’re running interactive Jupyter notebooks, building production data pipelines, or optimizing for CI/CD environments, this guide will help you configure tif1 for optimal performance.
Configuration Philosophy
tif1 is built for performance-first operation. The default configuration values are carefully chosen to provide excellent out-of-the-box performance for most use cases, but the library exposes dozens of tuning parameters for advanced users who need to squeeze every millisecond of latency or maximize throughput for their specific workload.
The configuration system follows a layered override model, where settings can be specified at multiple levels with clear precedence rules:
- Default values: Hardcoded defaults optimized for general use
- Configuration file (.tif1rc): Persistent settings stored in your home directory
- Environment variables (TIF1_*): Deployment-specific overrides
- Programmatic API (config.set()): Runtime adjustments based on workload
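The layered override model can be pictured with a small standalone sketch. It does not use tif1 itself: the four dictionaries are stand-ins for the four layers, and their values are illustrative assumptions rather than the library's internals.

```python
from collections import ChainMap

# Stand-in layers with illustrative values (assumptions, not tif1 internals).
defaults = {"lib": "pandas", "timeout": 30, "max_workers": 20}
file_settings = {"timeout": 60}        # as if read from .tif1rc
env_settings = {"max_workers": 50}     # as if parsed from TIF1_* variables
runtime_settings = {"lib": "polars"}   # as if applied with config.set()

# ChainMap searches its maps left to right, so the highest-precedence
# layer is listed first and the defaults last.
effective = ChainMap(runtime_settings, env_settings, file_settings, defaults)

resolved = dict(effective)
print(resolved)
```

Each key resolves to the value from the highest layer that defines it, which is the behavior the precedence list describes.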
Configuration Architecture
The configuration system is implemented as a singleton, ensuring that all parts of your application share the same configuration state. This design provides several benefits:
- Consistency: All modules and components use the same settings
- Efficiency: No redundant configuration loading or memory overhead
- Simplicity: Single source of truth for all configuration values
- Thread-safe reads: Multiple threads can safely read configuration values
- Dynamic updates: Changes propagate immediately to all components
When to Configure
You should consider customizing the configuration when:
- Performance is critical: You need to minimize latency or maximize throughput for time-sensitive applications
- Network conditions vary: You’re on a slow connection, behind a corporate proxy, experiencing DNS issues, or have specific network requirements
- Resource constraints exist: You need to limit memory usage, connection pools, or concurrent requests due to system limitations
- Data validation is needed: You’re debugging data issues, implementing quality checks, or need strict validation for compliance
- Caching behavior matters: You want to disable caching for fresh data, change cache location for shared storage, or tune cache parameters for your access patterns
- Backend preferences differ: You prefer polars over pandas for performance, or need specific DataFrame behavior for your workflow
- Deployment environment differs: Production, staging, development, or CI/CD environments have different performance, reliability, and debugging requirements
- Workload characteristics change: You’re switching between interactive analysis (low concurrency) and batch processing (high concurrency)
- Compliance requirements exist: You need to control data storage locations, disable certain features, or meet specific regulatory requirements
- Debugging is required: You need verbose logging, validation, or specific diagnostic information
Configuration Use Cases
Different use cases benefit from different configuration strategies:
Interactive Analysis (Jupyter/IPython)
- Lower concurrency (5-20 workers) to avoid overwhelming the system
- Enable ultra cold start for fast initial loads
- Moderate cache settings for iterative exploration
- Optional validation for data quality checks
Batch Processing
- High concurrency (50-200 workers) for maximum throughput
- Aggressive connection pooling to handle burst traffic
- Large cache sizes to minimize redundant fetches
- Disable validation for maximum performance
Production Services
- Balanced concurrency based on expected load
- Robust retry and circuit breaker settings
- Persistent cache with appropriate sizing
- Comprehensive logging for monitoring
CI/CD Pipelines
- CI mode enabled for optimized testing
- Shorter timeouts to fail fast
- Minimal caching to ensure fresh data
- Validation enabled to catch data issues early
Development and Debugging
- Cache disabled for fresh data on every run
- Validation enabled to catch issues early
- Verbose logging (DEBUG level)
- Lower concurrency for easier debugging
Configuration Loading Process
When tif1 initializes, it loads configuration in the following sequence:
- Initialize defaults: All configuration keys start with sensible default values hardcoded in the library
- Load config file: If a .tif1rc file exists, its values override defaults. The library searches for config files in this order:
  - Path specified in the TIF1_CONFIG_FILE environment variable (if set)
  - .tif1rc in the current working directory (only if TIF1_TRUST_CWD_CONFIG=true)
  - ~/.tif1rc in the user’s home directory (default location)
- Apply environment variables: Any TIF1_* environment variables override file settings
- Accept runtime changes: Programmatic config.set() calls override everything
Configuration file search order:
- Explicit path (TIF1_CONFIG_FILE env var): Highest priority, useful for testing or custom deployments
- Current directory (./.tif1rc): Only checked if TIF1_TRUST_CWD_CONFIG=true, for security
- Home directory (~/.tif1rc): Default location, most common for user-specific settings
For security, the library does not load .tif1rc from the current working directory by default, which prevents malicious config files in untrusted directories from being applied. Set TIF1_TRUST_CWD_CONFIG=true only if you trust the current directory.
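The search order can be sketched as a plain function. This is an illustration, not the library's code: find_config_file is a hypothetical name, the accepted truthy strings mirror the boolean values listed under Type Conversion, and falling through to the next location when the explicit path does not exist is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}


def find_config_file(env: Mapping[str, str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing config file in the documented search order."""
    # 1. Explicit path from TIF1_CONFIG_FILE has the highest priority.
    explicit = env.get("TIF1_CONFIG_FILE")
    if explicit:
        candidate = Path(explicit).expanduser()
        if candidate.is_file():
            return candidate
    # 2. Current directory, only when the user opted in explicitly.
    if env.get("TIF1_TRUST_CWD_CONFIG", "").strip().lower() in TRUE_VALUES:
        candidate = cwd / ".tif1rc"
        if candidate.is_file():
            return candidate
    # 3. Home directory is the default location.
    candidate = home / ".tif1rc"
    return candidate if candidate.is_file() else None
```

Passing the environment and directories as arguments keeps the sketch easy to test without touching the real home directory.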
Example Configuration Loading:
Configuration values are validated when they’re read (get()), not when they’re set. This design allows you to set any value programmatically, but invalid values are rejected when accessed, and get() falls back to the provided default. Validation includes:
- Type checking (int, float, bool, str, list)
- Range validation (positive numbers, valid enums)
- Format validation (HTTPS URLs for CDNs)
- Path expansion (~ to home directory)
Configuration File Management
File Location and Discovery
The .tif1rc configuration file is a JSON file that stores persistent configuration settings. The library searches for this file in multiple locations with a specific precedence order.
Default Location: ~/.tif1rc in your home directory.
Custom Location: Set the TIF1_CONFIG_FILE environment variable to the path of the file to load.
Project-Specific Configuration: You can place a .tif1rc file in your project directory. However, for security reasons, this file is only loaded if you explicitly enable it by setting TIF1_TRUST_CWD_CONFIG=true.
File Format
The configuration file must be valid JSON with a single object containing key-value pairs. Format requirements:
- Must be valid JSON (not JSON5, JSONC, or other variants)
- Root element must be an object ({})
- Keys must be strings matching configuration key names
- Values must match expected types (string, number, boolean, array)
- Comments are not supported (JSON doesn’t allow comments)
- Trailing commas are not allowed
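These format rules can be checked with Python's standard json module before a file is deployed. The sketch below is standalone and does not call tif1; the helper names are hypothetical, and the sample keys (lib, timeout, enable_cache) are taken from this guide.

```python
import json

VALID = '{"lib": "polars", "timeout": 60, "enable_cache": true}'
TRAILING_COMMA = '{"lib": "polars", "timeout": 60,}'
WITH_COMMENT = '{"lib": "polars" // preferred backend\n}'


def load_config_text(text: str) -> dict:
    """Parse config text and require a JSON object at the root."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("config root must be a JSON object")
    return data


def is_valid_config(text: str) -> bool:
    """Return True if the text satisfies the format requirements."""
    try:
        load_config_text(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

Strict JSON parsing rejects both the trailing comma and the comment, which is why neither is allowed in .tif1rc.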
Creating Configuration Files
Method 1: Programmatic Creation
The recommended way to create a configuration file is to set the values you need on the Config object and then call its save() method.
Managing Multiple Configurations
For different environments or use cases, you can maintain multiple configuration files and switch between them using environment variables, for example one file for development and another for production, selected with TIF1_CONFIG_FILE.
Configuration File Best Practices
- Version control: Commit project-specific .tif1rc files to version control
- Documentation: Add comments in a separate README explaining configuration choices
- Validation: Test configuration files before deploying to production
- Backup: Keep backups of working configurations before making changes
- Minimal: Only include settings that differ from defaults to keep files small
- Security: Never commit sensitive information (API keys, passwords) to config files
Environment Variables
All configuration keys can be overridden using environment variables with the TIF1_ prefix. Environment variables take precedence over configuration files but are overridden by programmatic config.set() calls.
Environment Variable Naming
Configuration keys are converted to environment variables by:
- Adding the TIF1_ prefix
- Converting to uppercase
- Keeping underscores unchanged
Examples:
- lib → TIF1_LIB
- enable_cache → TIF1_ENABLE_CACHE
- max_workers → TIF1_MAX_WORKERS
- cache_dir → TIF1_CACHE_DIR
Type Conversion
Environment variables are strings, so the library automatically converts them to the appropriate type:
Boolean Values:
- True: 1, true, yes, on (case-insensitive)
- False: 0, false, no, off (case-insensitive)
Numeric Values:
- Integers: 42, 100, 0
- Floats: 3.14, 2.0, 0.5
String Values:
- Used as-is
List Values:
- Comma-separated strings
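The naming and conversion rules can be combined in a short standalone sketch. It is not the library's parser: both function names are hypothetical, the schema mapping keys to types is an assumption made so the example knows which conversion to apply, and raising an error on an unrecognized boolean is also an assumption.

```python
from typing import Any, Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def convert_env_value(raw: str, expected_type: type) -> Any:
    """Convert an environment variable string to the expected Python type."""
    if expected_type is bool:
        lowered = raw.strip().lower()
        if lowered in TRUE_VALUES:
            return True
        if lowered in FALSE_VALUES:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if expected_type in (int, float):
        return expected_type(raw)
    if expected_type is list:
        # Comma-separated string; surrounding whitespace and empty items are dropped.
        return [item.strip() for item in raw.split(",") if item.strip()]
    return raw  # strings are used as-is


def read_env_overrides(environ: Mapping[str, str], schema: Mapping[str, type]) -> dict:
    """Collect TIF1_* overrides for the keys named in a hypothetical schema."""
    overrides = {}
    for key, expected_type in schema.items():
        name = "TIF1_" + key.upper()  # lib -> TIF1_LIB, max_workers -> TIF1_MAX_WORKERS
        if name in environ:
            overrides[key] = convert_env_value(environ[name], expected_type)
    return overrides
```

Calling read_env_overrides(os.environ, schema) would apply the same rules to the real process environment.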
Complete Environment Variable Reference
Environment Variable Use Cases
Docker Containers:
Configuration API Reference
get_config()
Returns the global configuration object. The Config object is a singleton, meaning there’s only one instance per Python process. All calls to get_config() return the same object, ensuring configuration consistency across your entire application.
Returns:
- Config: The global configuration singleton instance
Thread safety: Reading configuration values is safe from multiple threads, but modifying them (config.set()) during concurrent operations may lead to race conditions. It’s recommended to configure the library once at startup before spawning threads or processes.
Example:
Configuration Methods
The Config object provides three primary methods for interacting with configuration values: get() for reading, set() for modifying, and save() for persisting changes to disk.
get(key, default=None)
Retrieve the value for a specific configuration key. This method includes built-in validation for many configuration keys to ensure values are within acceptable ranges and of the correct type.
Parameters:
- key (str): The configuration key name (case-sensitive)
- default (Any, optional): Default value to return if the key doesn’t exist or validation fails. Defaults to None.
Returns:
- Any: The configuration value, or default if the key doesn’t exist or validation fails
Validation:
The get() method performs automatic validation for many configuration keys:
- Numeric values: Must be positive integers or floats for keys like timeout, max_workers, pool_connections, etc.
- Retry count: max_retries can be 0 or positive (0 means no retries)
- Backoff factor: retry_backoff_factor must be >= 1.0
- Library selection: lib must be either "pandas" or "polars"
- CDN URLs: cdns must be a list of HTTPS URLs
- Path expansion: cache_dir automatically expands ~ to the user’s home directory
If validation fails, get() returns the default value instead of the invalid value.
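The validate-on-read behavior can be modelled with a small standalone class. SketchConfig and is_valid are hypothetical names and this is not tif1's implementation; the rules encoded here are only the ones listed above, and every other key passes through unchecked.

```python
import os
from typing import Any

POSITIVE_KEYS = {"timeout", "max_workers", "pool_connections"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_valid(key: str, value: Any) -> bool:
    """Apply the documented per-key rules; keys without a rule are accepted."""
    if key in POSITIVE_KEYS:
        return _is_number(value) and value > 0
    if key == "max_retries":
        return isinstance(value, int) and not isinstance(value, bool) and value >= 0
    if key == "retry_backoff_factor":
        return _is_number(value) and value >= 1.0
    if key == "lib":
        return value in ("pandas", "polars")
    if key == "cdns":
        return isinstance(value, list) and all(
            isinstance(url, str) and url.startswith("https://") for url in value
        )
    return True


class SketchConfig:
    """Minimal model: set() stores anything, get() validates on read."""

    def __init__(self) -> None:
        self._values: dict = {}

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value  # no validation on write

    def get(self, key: str, default: Any = None) -> Any:
        if key not in self._values:
            return default
        value = self._values[key]
        if not is_valid(key, value):
            return default  # invalid values fall back to the caller's default
        if key == "cache_dir":
            return os.path.expanduser(value)  # expand a leading ~
        return value
```

Because set() never rejects a value, a typo such as lib="numpy" only shows up when the key is read, so passing a sensible default to get() matters.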
Example:
set(key, value)
Update a configuration value in memory for the current Python session. Changes made with set() are not persisted to disk unless you explicitly call save().
Parameters:
- key (str): The configuration key name (case-sensitive)
- value (Any): The new value to set. Type should match the expected type for the key.
Returns:
- None
Behavior:
- Changes take effect immediately for all subsequent operations
- Changes are session-only unless you call save()
- No validation is performed during set(); validation happens in get()
- You can set custom keys that aren’t part of the default configuration
Use Cases:
- Runtime optimization: Adjust settings based on workload characteristics
- A/B testing: Compare performance with different configurations
- Dynamic tuning: Increase concurrency for large batches, decrease for small queries
- Temporary overrides: Disable caching for a specific operation, then re-enable
Changes made with set() only affect the current Python session. To make changes permanent, call save() after setting your desired values.
save(path=None)
Persist the current in-memory configuration to a JSON file on disk. This allows you to make configuration changes permanent across Python sessions.
Parameters:
- path (Path | None, optional): Path where the configuration file should be saved. If None, saves to the default location (~/.tif1rc). Defaults to None.
Returns:
- None
Behavior:
- Writes the entire current configuration (including defaults) to the specified file
- Creates the file if it doesn’t exist
- Overwrites the file if it already exists
- Uses JSON format with 2-space indentation for readability
- Logs success or failure messages
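The file-writing behavior described here corresponds to a few lines of standard-library Python. The function below is a standalone sketch under that description, with save_config as a hypothetical name; it is not the library's save() and it omits the logging step.

```python
import json
from pathlib import Path
from typing import Optional


def save_config(values: dict, path: Optional[Path] = None) -> None:
    """Write the full configuration mapping as JSON with 2-space indentation."""
    # Default location mirrors the documented ~/.tif1rc.
    target = Path(path) if path is not None else Path.home() / ".tif1rc"
    # write_text creates the file if needed and overwrites existing content.
    target.write_text(json.dumps(values, indent=2) + "\n", encoding="utf-8")
```

Since the whole mapping is written every time, a saved file also pins the defaults that were in effect when it was created.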
The save() method writes the entire configuration, not just the keys you’ve modified. This ensures the saved file is a complete, self-contained configuration.
Configuration Keys
Core Settings
lib: Default DataFrame library ("pandas" or "polars").
enable_cache: Enable or disable the multi-layer caching system (memory + SQLite).
cache_dir: Path where the SQLite cache database is stored.
validate_data: Enable Pydantic validation of incoming JSON data. Validation adds overhead, so it is disabled by default for performance.
ultra_cold_start: Enable ultra-low latency mode for first-time loads. Skips loading full session data when only specific data is needed.
Network Settings
timeout: Network request timeout in seconds.
max_retries: Number of times to retry a failed CDN request.
max_workers: Maximum number of concurrent workers for parallel requests.
max_concurrent_requests: Maximum number of concurrent HTTP requests.
HTTP Session Settings
Enable HTTP/2 multiplexing for multiple requests over a single connection.
Disable HTTP/3 support.
pool_connections: Number of connection pools to maintain. If not set, automatically calculated as max(256, max_workers, max_concurrent_requests, telemetry_prefetch_max_concurrent_requests). Most users should rely on automatic sizing. Only set explicitly for specific performance tuning.
pool_maxsize: Maximum connections per pool. If not set, automatically calculated as max(512, pool_connections * 4) to handle burst traffic. Only override for specific use cases.
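The two automatic sizing formulas are simple enough to evaluate by hand. The helper functions below are hypothetical names that restate the formulas given above, so you can see what the library would pick for a given set of concurrency settings.

```python
def auto_pool_connections(
    max_workers: int,
    max_concurrent_requests: int,
    telemetry_prefetch_max_concurrent_requests: int,
) -> int:
    """max(256, max_workers, max_concurrent_requests, telemetry prefetch limit)."""
    return max(
        256,
        max_workers,
        max_concurrent_requests,
        telemetry_prefetch_max_concurrent_requests,
    )


def auto_pool_maxsize(pool_connections: int) -> int:
    """Four times pool_connections, with a floor of 512."""
    return max(512, pool_connections * 4)


# With modest concurrency the floors dominate.
small = auto_pool_connections(20, 20, 50)
# With high concurrency the largest setting wins.
large = auto_pool_connections(400, 100, 50)
print(small, auto_pool_maxsize(small))
print(large, auto_pool_maxsize(large))
```

The floors of 256 and 512 mean the automatic values only change once one of the concurrency settings exceeds 256.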
Keep-alive timeout in seconds.
Maximum requests per keep-alive connection.
User-Agent header sent with all HTTP requests. Useful for identifying your application in server logs or implementing custom rate limiting.
Default: tif1/{version} (e.g., tif1/0.2.0)
Use Cases:
- Identify your application in CDN logs
- Implement custom rate limiting per application
- Debug network issues by filtering logs
- Comply with API usage policies
DNS resolver configuration with DNS-over-HTTPS (DoH) fallback support. The library tries resolvers in order until one succeeds.
Default: ["standard", "doh://cloudflare", "doh://google"]
Resolver Types:
- standard: System DNS resolver (fastest, but may be blocked or censored)
- doh://cloudflare: Cloudflare DNS-over-HTTPS (1.1.1.1)
- doh://google: Google DNS-over-HTTPS (8.8.8.8)
Benefits of DoH:
- Bypass DNS blocking or censorship
- Improved privacy (encrypted DNS queries)
- Reliability when system DNS is misconfigured
- Consistent resolution across different networks
Performance Considerations:
- System DNS (standard) is fastest when working correctly
- DoH adds latency due to HTTPS overhead
- DoH is useful as fallback, not primary resolver
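The try-in-order behavior can be sketched independently of any real DNS client. In the example below the resolver callables are stubs, resolve_with_fallback is a hypothetical name, and treating OSError as the failure signal is an assumption. Only the ordering logic is the point.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

Resolver = Callable[[str], List[str]]


def resolve_with_fallback(
    hostname: str,
    order: Sequence[str],
    resolvers: Mapping[str, Resolver],
) -> Tuple[str, List[str]]:
    """Try each named resolver in order and return the first successful result."""
    errors = []
    for name in order:
        try:
            addresses = resolvers[name](hostname)
        except OSError as exc:
            errors.append(f"{name}: {exc}")
            continue
        if addresses:
            return name, addresses
        errors.append(f"{name}: no addresses")
    raise OSError("all resolvers failed: " + "; ".join(errors))


# Stub resolvers standing in for system DNS and a DoH client.
def blocked_system_dns(hostname: str) -> List[str]:
    raise OSError("lookup blocked")


def stub_doh(hostname: str) -> List[str]:
    return ["192.0.2.10"]  # documentation-only address (RFC 5737)


order = ["standard", "doh://cloudflare", "doh://google"]
resolvers = {
    "standard": blocked_system_dns,
    "doh://cloudflare": stub_doh,
    "doh://google": stub_doh,
}
used, addresses = resolve_with_fallback("cdn.example", order, resolvers)
print(used, addresses)
```

Keeping standard first means the DoH latency cost is only paid when system DNS actually fails.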
Telemetry Settings
Maximum concurrent requests for telemetry prefetching.
Logging Settings
Interval in seconds for logging connection pool statistics.
Advanced Configuration
These settings are for advanced users and performance tuning. Most users should use the defaults.
Cache Internals
Number of cache operations before committing to SQLite.
Maximum items in the in-memory cache layer.
Maximum telemetry items in the in-memory cache.
SQLite connection timeout in seconds.
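The commit-interval setting trades durability for fewer disk syncs. The class below is a standalone sketch of that batching idea using the standard sqlite3 module; the class name, table layout, and parameter names are assumptions and do not describe the library's actual cache schema.

```python
import sqlite3


class BatchedCacheWriter:
    """Buffer writes in an open transaction and commit every N operations."""

    def __init__(self, path: str = ":memory:", commit_interval: int = 10,
                 timeout: float = 5.0) -> None:
        # timeout is how long sqlite3 waits on a locked database, in seconds.
        self._conn = sqlite3.connect(path, timeout=timeout)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )
        self._conn.commit()
        self._commit_interval = commit_interval
        self._pending = 0
        self.commits = 0  # counter kept only to make the batching observable

    def put(self, key: str, value: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
        )
        self._pending += 1
        if self._pending >= self._commit_interval:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._conn.commit()
            self.commits += 1
            self._pending = 0

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]

    def close(self) -> None:
        self.flush()
        self._conn.close()
```

A larger interval means fewer commits but more uncommitted work held at any moment, which is the trade-off the Memory-Optimized recipe refers to.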
CDN Configuration
Retry & Circuit Breaker
Exponential backoff multiplier for retries. Must be >= 1.0.
Add random jitter to retry delays to prevent thundering herd.
Maximum jitter amount in seconds. Must be > 0 to have effect.
Maximum delay between retries in seconds.
Number of consecutive failures before circuit breaker opens.
Seconds to wait before attempting to close circuit breaker.
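How these retry and circuit breaker settings interact is easier to see in code. The sketch below is an illustration under stated assumptions, not the library's algorithm: the delay formula (backoff factor raised to the attempt number, plus uniform jitter, capped at the maximum delay) and all parameter names other than retry_backoff_factor's meaning are hypothetical.

```python
import random
import time
from typing import Callable, Optional


def retry_delay(attempt: int, backoff_factor: float = 2.0, max_delay: float = 30.0,
                jitter: bool = True, jitter_max: float = 1.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    delay = backoff_factor ** attempt
    if jitter and jitter_max > 0:
        # Random jitter spreads retries out so clients do not retry in lockstep.
        delay += random.uniform(0.0, jitter_max)
    return min(max_delay, delay)


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial request after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._reset_timeout

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

The injectable clock is there so the breaker can be exercised in tests without sleeping.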
HTTP/2 Configuration
Connection Pool Exhaustion
Base backoff delay when connection pool is exhausted (seconds).
Maximum backoff delay for pool exhaustion (seconds).
Jitter amount for pool exhaustion backoff (seconds).
Data Validation
Prefetch Strategies
Automatically prefetch lap data when accessing a driver.
Prefetch all telemetry data on first lap telemetry request.
Prefetch all telemetry data immediately after loading laps.
Fill cache in background during ultra cold start mode.
Skip retries in ultra cold start mode for faster initial load.
Other Advanced Settings
Default logging level. Use setup_logging() to change at runtime.
Enable offline mode (cache-only, no network requests).
Enable CI mode (optimized for continuous integration environments).
Use categorical types for string columns in polars DataFrames.
Number of worker processes for parallel JSON parsing. 0 disables multiprocessing.
Configuration file format
The .tif1rc file is a JSON file with key-value pairs. File locations:
- Default: ~/.tif1rc
- Custom: Specify the path with the TIF1_CONFIG_FILE environment variable
Environment Variables
All configuration keys can be set via environment variables with the TIF1_ prefix, for example TIF1_MAX_WORKERS for max_workers.
Configuration Precedence
Configuration is loaded in this order (later overrides earlier):
- Default values (hardcoded in library)
- .tif1rc file (~/.tif1rc or the path in TIF1_CONFIG_FILE)
- Environment variables (TIF1_*)
- Programmatic calls (config.set())
Common configuration patterns
High-Performance Setup
Low-Latency Setup
Development Setup
Production Setup
Setting Log Level
While not part of the Config object, you can set the log level using setup_logging, for example tif1.setup_logging(logging.DEBUG).
Configuration Patterns and Recipes
This section provides configuration recipes for common use cases, with explanations of why each setting is chosen.
Maximum Performance Configuration
For absolute maximum throughput in batch processing scenarios.
Why these settings:
- High worker counts maximize parallel fetching
- Large connection pools prevent pool exhaustion
- Long keep-alive reduces connection overhead
- Disabled validation eliminates CPU overhead
- Aggressive prefetching reduces sequential fetches
- Large caches reduce redundant network requests
Trade-offs:
- High memory usage (4-8GB+)
- May overwhelm slower systems
- Not suitable for resource-constrained environments
Minimum Latency Configuration
For interactive analysis where first-byte latency matters most.
Why these settings:
- Ultra cold start skips unnecessary data loading
- DoH can be faster than misconfigured system DNS
- Moderate concurrency balances speed and resource usage
- Short timeout fails fast on slow connections
- HTTP/2 multiplexing reduces connection overhead
- Selective prefetching reduces wait time for common operations
Trade-offs:
- May skip retries on transient failures
- Short timeout may fail on slow networks
- Moderate concurrency limits maximum throughput
Development and Debugging Configuration
For development environments where debugging and data quality matter more than performance.
Why these settings:
- Disabled cache ensures fresh data on every run
- Validation catches data quality issues early
- Low concurrency makes logs easier to follow
- Long timeout accommodates debugging pauses
- Full data loading helps understand data structure
- Verbose logging provides detailed diagnostic information
Trade-offs:
- Much slower than production configuration
- High CPU overhead from validation
- Verbose logs can be overwhelming
Production Service Configuration
For production services that need reliability, performance, and observability.
Why these settings:
- Balanced concurrency handles typical load
- Robust retry settings handle transient failures
- Circuit breaker prevents cascading failures
- Reasonable timeout balances reliability and speed
- Production logging reduces noise
- Optimized pooling handles burst traffic
Trade-offs:
- Not maximum performance (prioritizes reliability)
- Higher memory usage than minimal configuration
- May be overkill for low-traffic services
CI/CD Pipeline Configuration
For continuous integration and testing environments.
Why these settings:
- CI mode enables CI-specific optimizations
- Disabled cache ensures tests use fresh data
- Validation catches data quality regressions
- Low concurrency respects CI runner limits
- Short timeout and few retries fail fast
- System DNS is faster in CI environments
Trade-offs:
- Slower than production configuration
- May fail on transient network issues
- Not suitable for performance testing
Resource-Constrained Configuration
For systems with limited CPU, memory, or network bandwidth.
Why these settings:
- Pandas has lower memory overhead than polars
- Minimal concurrency reduces CPU and memory usage
- Small pools reduce memory footprint
- Small caches limit memory usage
- Disabled prefetching reduces unnecessary fetches
- Ultra cold start loads only needed data
Trade-offs:
- Much slower than high-performance configuration
- Sequential operations dominate execution time
- Not suitable for large-scale analysis
Offline/Cache-Only Configuration
For working with previously cached data without network access.
Why these settings:
- Offline mode prevents network requests
- Cache must be enabled to serve data
- No retries since network is unavailable
- Short timeout fails fast on cache misses
Trade-offs:
- Only works with previously cached data
- Cache misses result in immediate failures
- No way to fetch new data
High-Reliability Configuration
For scenarios where reliability matters more than performance.
Why these settings:
- Conservative concurrency reduces load on CDN
- Aggressive retries handle transient failures
- Long timeout accommodates slow networks
- Conservative circuit breaker tolerates more failures
- Multiple DNS resolvers provide fallback
- Cache provides redundancy
Trade-offs:
- Slower than performance-optimized configuration
- May retry excessively on persistent failures
- Higher latency due to conservative settings
Memory-Optimized Configuration
For minimizing memory usage while maintaining reasonable performance.
Why these settings:
- Pandas has lower memory overhead
- Moderate concurrency balances speed and memory
- Small caches reduce memory footprint
- Frequent commits reduce memory buffer size
- Smaller pools reduce connection overhead
- Disabled prefetching reduces memory usage
Trade-offs:
- Slower than high-performance configuration
- More frequent disk I/O from cache commits
- May not be suitable for large-scale analysis
Configuration Troubleshooting
Common Configuration Issues
Issue: Configuration changes not taking effect
Symptoms:
- Changes made with config.set() don’t seem to work
- Environment variables are ignored
- Config file changes don’t apply
Solutions:
- Check configuration precedence (programmatic > env > file > defaults)
- Verify environment variable names (must use the TIF1_ prefix and be uppercase)
- Ensure config file is valid JSON
- Check config file location (use TIF1_CONFIG_FILE to specify)
- Restart Python process after changing config file
- Check for validation failures in logs
Issue: Poor performance despite high concurrency settings
Symptoms:
- High max_workers but slow execution
- Connection pool exhaustion warnings
- Low CPU utilization
Solutions:
- Check if connection pool is too small
- Verify network bandwidth isn’t saturated
- Check if CDN is rate limiting
- Ensure cache is enabled
- Monitor connection reuse rate
Issue: High memory usage
Symptoms:
- Python process using excessive memory
- Out of memory errors
- System slowdown
Solutions:
- Reduce cache sizes
- Lower concurrency
- Use pandas instead of polars
- Disable prefetching
- Enable ultra cold start
Issue: Frequent timeout errors
Symptoms:
- Many timeout errors in logs
- Slow data loading
- Inconsistent performance
Solutions:
- Increase timeout value
- Check network connectivity
- Try different DNS resolvers
- Reduce concurrency
- Enable retries
Issue: Cache not working
Symptoms:
- Every request hits the network
- No performance improvement on repeated queries
- Cache directory empty
Solutions:
- Verify cache is enabled
- Check cache directory permissions
- Ensure cache directory exists
- Check disk space
- Verify SQLite timeout isn’t too short
Best Practices
Configuration Strategy
- Start with defaults: The default configuration is optimized for most use cases. Only change settings when you have a specific need.
- Use config files for persistent settings: Store common settings in ~/.tif1rc for user-specific configuration that persists across all projects.
- Use environment variables for deployment: Configure per-environment settings (dev, staging, production) using environment variables without modifying code.
- Use programmatic API for runtime changes: Adjust settings dynamically based on workload characteristics or user preferences.
- Document your configuration: Keep a README or comments explaining why specific settings were chosen.
- Test configuration changes: Verify performance impact before deploying to production. Use benchmarks to measure improvements.
- Monitor in production: Track connection stats, cache hit rates, and error rates to validate configuration choices.
- Version control project configs: Commit project-specific .tif1rc files to version control for reproducibility.
Performance Optimization
- Let pool sizing auto-calculate: Only override pool_connections and pool_maxsize for specific tuning needs. The automatic sizing works well for most cases.
- Monitor connection reuse: Enable connection stats logging to track connection reuse rate. Low reuse indicates pool exhaustion.
- Balance concurrency and resources: Higher concurrency isn’t always better. Find the sweet spot for your system and network.
- Use polars for large datasets: Polars provides better performance for large-scale analysis, but pandas has lower memory overhead.
- Enable caching: Cache dramatically improves performance for repeated queries. Only disable for debugging or when fresh data is critical.
- Tune prefetching: Enable prefetching for common access patterns, but disable for memory-constrained environments.
- Use ultra cold start: Enable for interactive analysis where first-byte latency matters. Disable for batch processing that needs all data.
Reliability and Robustness
- Keep validation disabled in production — Enable only for debugging or data quality checks. Validation adds significant overhead.
- Configure robust retry settings — Use exponential backoff with jitter to handle transient failures gracefully.
- Set appropriate timeouts — Balance between failing fast and tolerating slow networks. 30-60 seconds is reasonable for most cases.
- Use circuit breakers — Configure circuit breaker thresholds to prevent cascading failures.
- Multiple DNS resolvers — Use DoH as fallback for reliability, but prefer system DNS for performance.
- Monitor error rates — Track timeout, retry, and circuit breaker events to identify configuration issues.
Security and Privacy
- Trust config files carefully: Only set TIF1_TRUST_CWD_CONFIG=true in directories you trust.
- Use DoH for privacy: DNS-over-HTTPS encrypts DNS queries, improving privacy on untrusted networks.
- Validate config files: Ensure config files are valid JSON and don’t contain malicious values.
- Limit cache locations: Store cache in secure locations with appropriate permissions.
- Custom user agents: Use descriptive user agents to identify your application in logs.
Development Workflow
- Separate dev and prod configs — Maintain different configurations for development and production environments.
- Use CI mode in pipelines: Enable ci_mode for CI/CD-specific optimizations.
- Enable validation in dev: Catch data quality issues early by enabling validation in development.
- Disable cache in dev: Ensure fresh data during development by disabling cache.
- Verbose logging in dev: Use DEBUG logging to understand library behavior during development.
Memory Management
- Monitor memory usage — Track Python process memory to identify configuration issues.
- Reduce cache sizes — Lower cache sizes if memory usage is too high.
- Use pandas for low memory — Pandas has lower memory overhead than polars.
- Disable prefetching — Reduce memory usage by disabling aggressive prefetching.
- Frequent cache commits — Reduce memory buffer size by committing cache more frequently.
Troubleshooting
- Enable debug logging: Use tif1.setup_logging(logging.DEBUG) to see detailed diagnostic information.
- Check configuration values: Verify actual configuration values with config.get().
- Monitor connection stats: Enable connection stats logging to track pool usage.
- Test incrementally: Change one setting at a time to isolate issues.
- Compare with defaults: Reset to defaults to verify custom configuration is the issue.
Configuration Maintenance
- Review periodically — Revisit configuration as workload characteristics change.
- Update with library — Check release notes for new configuration options.
- Benchmark regularly — Measure performance to validate configuration choices.
- Document changes — Keep a changelog of configuration changes and their rationale.
- Backup working configs — Save backups before making experimental changes.
Summary
The tif1 configuration system provides comprehensive control over library behavior through multiple configuration sources with clear precedence rules. Key takeaways:
Configuration Sources (Precedence Order)
- Programmatic API (config.set()): Highest precedence, runtime changes
- Environment variables (TIF1_*): Deployment-specific overrides
- Configuration file (.tif1rc): Persistent user settings
- Default values: Hardcoded defaults optimized for general use
Key Features
- Singleton pattern: Single configuration instance per Python process
- Lazy initialization: Configuration loaded on first access
- Validation on read: Invalid values rejected when accessed, not when set
- Type conversion: Automatic conversion from environment variables
- Path expansion: Automatic ~ expansion for paths
- Thread-safe reads: Safe to read from multiple threads
Configuration Categories
- Core settings — DataFrame library, caching, validation
- Network settings — Timeouts, retries, concurrency
- HTTP session — Connection pooling, keep-alive, multiplexing
- Cache configuration — Memory limits, commit intervals, SQLite settings
- CDN configuration — CDN URLs, minification
- Retry & circuit breaker — Backoff, jitter, thresholds
- Prefetch strategies — Automatic data prefetching
- Telemetry settings — Telemetry-specific concurrency
- Logging — Log levels, connection stats
- Advanced settings — Polars options, JSON parsing, offline mode
Common Use Cases
- Maximum performance — High concurrency, large pools, aggressive prefetching
- Minimum latency — Ultra cold start, DoH, moderate concurrency
- Development — Disabled cache, enabled validation, verbose logging
- Production — Balanced settings, robust retries, monitoring
- CI/CD — CI mode, disabled cache, fast failure
- Resource-constrained — Low concurrency, small caches, minimal prefetching
- Offline — Cache-only mode, no network requests
- High-reliability — Aggressive retries, long timeouts, multiple resolvers
Best Practices Summary
- Start with defaults, change only when needed
- Use config files for persistent settings
- Use environment variables for deployment
- Use programmatic API for runtime changes
- Monitor performance and adjust accordingly
- Document configuration choices
- Test changes before production deployment
- Keep validation disabled in production
- Enable caching for performance
- Balance concurrency with resources
Getting Help
If you encounter configuration issues:
- Enable debug logging: tif1.setup_logging(logging.DEBUG)
- Check actual values: config.get(key)
- Verify precedence: Check file, env vars, and programmatic sets
- Review validation: Check logs for validation warnings
- Compare with defaults: Reset to defaults to isolate issues
- Consult documentation: Review this guide for detailed explanations
tif1 for your use case.
Configuration Reference Tables
Quick Reference: Common Settings
| Setting | Default | Production | Development | CI/CD |
|---|---|---|---|---|
| lib | pandas | polars | pandas | pandas |
| enable_cache | True | True | False | False |
| validate_data | False | False | True | True |
| max_workers | 20 | 100 | 5 | 10 |
| timeout | 30 | 60 | 120 | 30 |
| max_retries | 3 | 5 | 3 | 2 |
| ultra_cold_start | True | True | False | True |
| log_level | WARNING | WARNING | DEBUG | INFO |
Quick Reference: Performance Impact
| Setting | Performance Impact | Memory Impact | Reliability Impact |
|---|---|---|---|
| max_workers | High ↑ | Medium ↑ | Low |
| pool_connections | Medium ↑ | Medium ↑ | High ↑ |
| enable_cache | Very High ↑ | High ↑ | Medium ↑ |
| validate_data | High ↓ | Low | High ↑ |
| ultra_cold_start | High ↑ | Low ↓ | Low |
| prefetch_* | Medium ↑ | Medium ↑ | Low |
| lib=polars | Medium ↑ | Medium ↑ | Low |
| max_retries | Low ↓ | Low | High ↑ |
Quick Reference: Memory Usage
| Configuration | Estimated Memory | Use Case |
|---|---|---|
| Minimal | 100-200 MB | Resource-constrained |
| Default | 200-500 MB | General use |
| High-performance | 1-2 GB | Batch processing |
| Maximum | 4-8 GB+ | Large-scale analysis |
Quick Reference: Concurrency Settings
| Workload | max_workers | pool_connections | pool_maxsize |
|---|---|---|---|
| Interactive | 5-20 | 128-256 | 256-512 |
| Batch | 50-200 | 512-1024 | 2048-4096 |
| Production | 50-100 | 256-512 | 1024-2048 |
| CI/CD | 5-10 | 64-128 | 128-256 |
| Resource-constrained | 5-10 | 32-64 | 64-128 |
Related Pages
- Caching Strategy: Learn about the multi-layer caching system and how to optimize cache configuration
- Best Practices: Advanced performance optimization techniques and benchmarking
- HTTP Session: Deep dive into HTTP session configuration and connection pooling
- Installation: Setup guide and initial configuration
- CLI Configuration: Configure tif1 via command-line interface
- Backends: Pandas vs Polars backend comparison and configuration
Additional Resources
Example Configuration Files
Minimal Configuration:
Configuration Validation Script
Use this script to validate your configuration:
Configuration Migration Script
Use this script to migrate from old configuration format:
Environment Variable Generator
Use this script to generate environment variables from config file:
Frequently Asked Questions
Q: Do I need to configure tif1?
A: No, the default configuration works well for most use cases. Only configure if you have specific performance, reliability, or resource requirements.
Q: What’s the difference between config file and environment variables?
A: Config files are persistent and user-specific. Environment variables are deployment-specific and override config files. Use config files for personal settings, environment variables for deployment settings.
Q: Can I use multiple config files?
A: Yes, use the TIF1_CONFIG_FILE environment variable to specify which config file to load. Only one config file is loaded at a time (first found in search order).
Q: How do I reset to default configuration?
A: Delete or rename your .tif1rc file and unset all TIF1_* environment variables. The library will use hardcoded defaults.
Q: Why isn’t my configuration taking effect?
A: Check configuration precedence (programmatic > env > file > defaults). Verify environment variable names are correct (TIF1_ prefix, uppercase). Ensure config file is valid JSON. Check logs for validation warnings.
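The precedence order in this answer can be modeled with a few lines of standard-library Python. This is a minimal sketch of the layering idea, not tif1's actual loader: resolve, load_env_layer, and load_file_layer are hypothetical names, and parsing environment values as JSON is an assumption made for the example.

```python
import json
from collections import ChainMap

DEFAULTS = {"max_workers": 20, "timeout": 30, "enable_cache": True}


def load_file_layer(path):
    """Read a JSON config file; a missing file contributes nothing."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def load_env_layer(environ, prefix="TIF1_"):
    """Map e.g. TIF1_MAX_WORKERS=100 to {"max_workers": 100}."""
    layer = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables are ignored
        key = name[len(prefix):].lower()
        try:
            layer[key] = json.loads(raw)  # numbers, true/false, null
        except json.JSONDecodeError:
            layer[key] = raw  # anything else stays a plain string
    return layer


def resolve(programmatic, environ, file_path):
    """Combine the layers: programmatic > env > file > defaults."""
    # ChainMap looks keys up in its maps from left to right, so the
    # leftmost mapping that contains a key wins.
    return ChainMap(
        programmatic,
        load_env_layer(environ),
        load_file_layer(file_path),
        DEFAULTS,
    )


if __name__ == "__main__":
    cfg = resolve(
        programmatic={"timeout": 5},
        environ={"TIF1_MAX_WORKERS": "100", "tif1_timeout": "99"},
        file_path="does-not-exist.json",
    )
    for key in sorted(cfg):
        print(key, "=", cfg[key])
```

The lowercase tif1_timeout variable in the demo is skipped because it does not match the uppercase prefix, which is one of the failure modes the answer describes.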
Q: What’s the performance impact of validation?
A: Validation adds 10-30% overhead depending on workload. Disable in production for maximum performance.
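Because the overhead depends on workload, it is worth measuring it on your own data rather than relying on the quoted range. The sketch below shows the arithmetic with the standard timeit module. The two workload functions are stand-ins invented for this example; with tif1 they would be the same load executed once with validate_data disabled and once with it enabled.

```python
import timeit


def overhead_percent(baseline_s, candidate_s):
    """Extra time of the candidate relative to the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


def best_of(fn, repeat=5, number=20):
    """Smallest total time over several repeats, to damp timing noise."""
    return min(timeit.repeat(fn, repeat=repeat, number=number))


# Stand-in data and workloads (hypothetical, not tif1 code).
ROWS = [{"lap": i, "time": 90.0 + i % 7} for i in range(5000)]


def load_only():
    return [row["time"] for row in ROWS]


def load_and_validate():
    out = []
    for row in ROWS:
        if not (isinstance(row["lap"], int) and row["time"] > 0):
            raise ValueError(f"bad row: {row}")
        out.append(row["time"])
    return out


if __name__ == "__main__":
    base = best_of(load_only)
    checked = best_of(load_and_validate)
    print(f"validation overhead: {overhead_percent(base, checked):.0f}%")
```

The printed figure describes only this toy workload; the point is the method of comparing the best of several timed runs.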
Q: Should I use pandas or polars?
A: Polars is faster for large datasets but uses more memory. Pandas has lower memory overhead and better compatibility. Start with pandas, switch to polars if you need more performance.
Q: How much memory does tif1 use?
A: Depends on configuration and workload. Typical usage: 200-500 MB. High-performance: 1-2 GB. Maximum: 4-8 GB+. Reduce cache sizes and concurrency to lower memory usage.
Q: What’s ultra cold start mode?
A: Ultra cold start skips loading full session data when only specific data is needed, reducing initial load time by 50-80%. Enable for interactive analysis, disable for batch processing.
Q: How do I optimize for my use case?
A: See the “Configuration Patterns and Recipes” section for detailed configurations for different use cases (performance, latency, development, production, etc.).
Q: Can I change configuration at runtime?
A: Yes, use config.set() to change values at runtime. Changes take effect immediately but are not persisted unless you call config.save().
Q: What’s the recommended production configuration?
A: See the “Production Service Configuration” in the “Configuration Patterns and Recipes” section for a complete production-ready configuration.
Q: How do I debug configuration issues?
A: Enable debug logging (tif1.setup_logging(logging.DEBUG)), check actual values (config.get(key)), verify precedence, and review logs for validation warnings.
Q: What’s the difference between pool_connections and pool_maxsize?
A: pool_connections is the number of connection pools (one per host). pool_maxsize is the maximum number of connections per pool. Both are auto-calculated by default based on concurrency settings.
Q: Should I enable prefetching?
A: Enable for common access patterns (e.g., accessing driver laps after loading session). Disable for memory-constrained environments or when access patterns are unpredictable.
Q: What’s the recommended cache size?
A: Default (1024 items) works for most cases. Increase for large-scale analysis (4096+). Decrease for memory-constrained environments (256-512).
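To see why an undersized item limit hurts, consider a cache bounded by item count. The sketch below uses functools.lru_cache from the standard library with a deliberately tiny limit. It assumes least-recently-used eviction, which this guide does not confirm for tif1's cache, and the load function is a stand-in.

```python
from functools import lru_cache

real_loads = []  # records every call that missed the cache


@lru_cache(maxsize=2)  # tiny bound so eviction is easy to observe
def load(key):
    real_loads.append(key)
    return f"data for {key}"


load("a")  # miss
load("b")  # miss, cache is now full
load("a")  # hit, "a" becomes the most recently used entry
load("c")  # miss, evicts "b" (least recently used)
load("b")  # miss again, because "b" was evicted

info = load.cache_info()
print(info)        # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
print(real_loads)  # ['a', 'b', 'c', 'b']
```

When the working set is larger than the limit, entries are evicted before they are reused and the hit rate collapses, which is the case for raising the limit in large-scale analysis.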
Q: How do I configure for CI/CD?
A: Enable ci_mode, disable cache, enable validation, use lower concurrency, shorter timeout, and fewer retries. See “CI/CD Pipeline Configuration” for details.
Q: Can I use tif1 offline?
A: Yes, enable offline_mode to use only cached data. Requires previously cached data. See “Offline/Cache-Only Configuration” for details.
Q: What’s the impact of HTTP/2 multiplexing?
A: HTTP/2 multiplexing reduces connection overhead by reusing connections for multiple requests. Enabled by default. Disable only if you have issues with specific proxies or CDNs.
Q: How do I configure DNS resolvers?
A: Use http_resolvers to specify resolver order. Default tries system DNS first, then DoH fallbacks. Use DoH for privacy or when system DNS is blocked.
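The resolver order behaves like a fallback chain: each resolver is tried in turn until one succeeds. The following is a sketch of that pattern with stub resolvers, so it runs without network access. resolve_with_fallback and both stubs are hypothetical and do not reflect how tif1 implements http_resolvers.

```python
def resolve_with_fallback(hostname, resolvers):
    """Try each (name, resolve_fn) pair in order; return the first success."""
    errors = []
    for name, resolve_fn in resolvers:
        try:
            return name, resolve_fn(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            errors.append(f"{name}: {exc}")
    raise OSError(f"all resolvers failed for {hostname!r}: " + "; ".join(errors))


def blocked_system_dns(hostname):
    """Stub that simulates a blocked system resolver."""
    raise OSError("system DNS blocked")


def stub_doh(hostname):
    """Stub DoH resolver returning a documentation-range address."""
    return ["203.0.113.10"]


if __name__ == "__main__":
    used, addresses = resolve_with_fallback(
        "example.com",
        [("system", blocked_system_dns), ("doh", stub_doh)],
    )
    print(used, addresses)
```

Collecting the per-resolver errors makes the final failure message useful when every resolver in the chain is unreachable.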
Q: What’s the circuit breaker for?
A: Circuit breaker prevents cascading failures by stopping requests after consecutive failures. Automatically recovers after timeout. Configure threshold and timeout based on reliability requirements.
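The behavior described here (stop after consecutive failures, recover after a timeout) follows the general circuit breaker pattern. The sketch below is a minimal generic version, not tif1's implementation; the class and the threshold and recovery_timeout parameter names are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after a timeout."""

    def __init__(self, threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock      # injectable so tests need not sleep
        self.failures = 0       # consecutive failures seen so far
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: request skipped")
            # Timeout elapsed: allow one trial request (half-open state).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    def flaky():
        raise ConnectionError("upstream unavailable")

    breaker = CircuitBreaker(threshold=2, recovery_timeout=5.0)
    for attempt in range(3):
        try:
            breaker.call(flaky)
        except (ConnectionError, RuntimeError) as exc:
            print(f"attempt {attempt + 1}: {type(exc).__name__}: {exc}")
```

A lower threshold trips sooner and protects a struggling upstream, while a longer timeout delays recovery, which is the trade-off to weigh against your reliability requirements.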
Q: How do I monitor configuration effectiveness?
A: Enable connection stats logging, track cache hit rates, monitor error rates, and benchmark performance. Adjust configuration based on metrics.