feat(cache): Add intelligent disk cache cleanup with retention policies (#199)

* feat(cache): Add intelligent disk cache cleanup with retention policies

- Add cleanup_expired_files() method to DiskCache class
- Implement retention policies based on cache data types:
  * Odds data: 2 days (lines move frequently)
  * Live/recent/leaderboard: 7 days (weekly updates)
  * News/stocks: 14 days
  * Upcoming/schedules/team_info/logos: 60 days (stable data)
- Add cleanup_disk_cache() orchestration in CacheManager
- Start background cleanup thread running every 24 hours
- Run cleanup on application startup
- Add disk cleanup metrics tracking
- Add comprehensive logging with cleanup statistics

This prevents the disk cache from growing indefinitely while keeping
stable season data longer than volatile live-game data.
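
The retention idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `<data_type>_*.json` file naming scheme, the `RETENTION_DAYS` key names, the 30-day fallback, and the return shape are all assumptions made for the example.

```python
import time
from pathlib import Path

# Retention periods in days, taken from the list above. The key names and the
# "<data_type>_<anything>.json" naming scheme are assumptions for illustration.
RETENTION_DAYS = {"odds": 2, "live": 7, "news": 14, "schedules": 60}
DEFAULT_RETENTION_DAYS = 30  # hypothetical fallback for unrecognized files


def retention_days_for(filename):
    """Pick a retention period from the file name's leading data-type token."""
    data_type = filename.split("_", 1)[0]
    return RETENTION_DAYS.get(data_type, DEFAULT_RETENTION_DAYS)


def cleanup_expired_files(cache_dir, now=None):
    """Delete cache files older than their retention period.

    Returns a (files_removed, bytes_freed) tuple.
    """
    now = time.time() if now is None else now
    files_removed, bytes_freed = 0, 0
    for path in Path(cache_dir).glob("*.json"):
        try:
            stat = path.stat()
            age_days = (now - stat.st_mtime) / 86400
            if age_days > retention_days_for(path.name):
                path.unlink()
                files_removed += 1
                bytes_freed += stat.st_size
        except FileNotFoundError:
            # Another thread or process removed the file first; skip it.
            continue
    return files_removed, bytes_freed
```

Keying the policy on the file's modification time keeps the sketch independent of the cache's on-disk format; the real implementation may well read a timestamp stored inside each entry instead.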

* refactor(cache): improve disk cache cleanup implementation

- Add throttle mechanism with force parameter to cleanup_disk_cache
- Fix TOCTOU race condition in disk cache cleanup (getsize/remove)
- Reduce lock contention by processing files outside lock where possible
- Add CacheStrategyProtocol for better type safety (replaces Any)
- Move time import to module level in cache_metrics
- Defer initial cleanup to background thread for non-blocking startup
- Add graceful shutdown mechanism with threading.Event for cleanup thread
- Add stop_cleanup_thread() method for controlled thread termination
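
For illustration, the throttle, deferred startup cleanup, and Event-based shutdown listed above could fit together roughly like this. The `CleanupScheduler` class, its parameters, and the one-hour throttle window are hypothetical; only the `stop_cleanup_thread()` name, the `force` flag, and the use of `threading.Event` come from this commit message, and treating `force` as a throttle bypass is an assumption.

```python
import threading
import time


class CleanupScheduler:
    """Hypothetical helper that runs a cleanup callable on a background thread."""

    def __init__(self, cleanup_fn, interval_sec=24 * 3600, min_gap_sec=3600):
        self._cleanup_fn = cleanup_fn
        self._interval_sec = interval_sec
        self._min_gap_sec = min_gap_sec  # assumed throttle window
        self._last_run = None
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread = None

    def run_cleanup(self, force=False):
        """Run cleanup unless one ran recently; force=True bypasses the throttle."""
        with self._lock:
            now = time.monotonic()
            recently_ran = (
                self._last_run is not None
                and now - self._last_run < self._min_gap_sec
            )
            if recently_ran and not force:
                return False
            self._last_run = now
        # The slow file work happens outside the lock to limit contention.
        self._cleanup_fn()
        return True

    def start(self):
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # The initial cleanup runs here, so the caller's startup is not blocked.
        self.run_cleanup(force=True)
        # Event.wait returns True as soon as the event is set, ending the loop
        # without waiting out the full interval.
        while not self._stop_event.wait(self._interval_sec):
            self.run_cleanup()

    def stop_cleanup_thread(self, timeout=5.0):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout)
```

Waiting on the event rather than calling `time.sleep` is what makes shutdown prompt: a sleeping thread would hold the process open for up to the whole 24-hour interval.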

* fix(cache): improve disk cache cleanup initialization and error handling

- Only start cleanup thread when disk caching is enabled (cache_dir is set)
- Remove unused retention policy keys (leaderboard, live_scores, logos)
- Handle FileNotFoundError as benign race condition in cleanup
- Preserve existing OSError handling for actual file system errors
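
A rough sketch of the error handling described above, assuming a standalone helper (`remove_cache_file` is a hypothetical name, not a function from this commit). A file can vanish between the size check and the delete, so that case is treated as benign, while other `OSError`s are still reported.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_cache_file(path):
    """Delete one cache file; return the bytes freed, or 0 if nothing was removed."""
    try:
        size = os.path.getsize(path)
        os.remove(path)
        return size
    except FileNotFoundError:
        # Benign race: the file was already removed by another thread or process.
        logger.debug("Cache file already gone: %s", path)
        return 0
    except OSError as exc:
        # Real file system problem (permissions, I/O error); log it and move on.
        logger.warning("Could not remove cache file %s: %s", path, exc)
        return 0
```

`FileNotFoundError` is a subclass of `OSError`, so its handler has to come first; otherwise the broader clause would catch it and log a warning for a harmless race.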

---------

Co-authored-by: Chuck <chuck@example.com>
Chuck
2026-01-19 15:57:19 -05:00
committed by GitHub
parent bc23b7c75c
commit 2381ead03f
3 changed files with 331 additions and 42 deletions


@@ -5,6 +5,7 @@ Tracks cache performance metrics including hit rates, miss rates, and fetch time
 """
 import threading
+import time
 import logging
 from typing import Dict, Any, Optional
@@ -28,7 +29,12 @@ class CacheMetrics:
             'background_hits': 0,
             'background_misses': 0,
             'total_fetch_time': 0.0,
-            'fetch_count': 0
+            'fetch_count': 0,
+            # Disk cleanup metrics
+            'last_disk_cleanup': 0.0,
+            'total_files_cleaned': 0,
+            'total_space_freed_mb': 0.0,
+            'last_cleanup_duration_sec': 0.0
         }
 
     def record_hit(self, cache_type: str = 'regular') -> None:
@@ -69,6 +75,21 @@ class CacheMetrics:
             self._metrics['total_fetch_time'] += duration
             self._metrics['fetch_count'] += 1
 
+    def record_disk_cleanup(self, files_cleaned: int, space_freed_mb: float, duration_sec: float) -> None:
+        """
+        Record disk cleanup operation results.
+
+        Args:
+            files_cleaned: Number of files deleted
+            space_freed_mb: Space freed in megabytes
+            duration_sec: Duration of cleanup operation in seconds
+        """
+        with self._lock:
+            self._metrics['last_disk_cleanup'] = time.time()
+            self._metrics['total_files_cleaned'] += files_cleaned
+            self._metrics['total_space_freed_mb'] += space_freed_mb
+            self._metrics['last_cleanup_duration_sec'] = duration_sec
+
     def get_metrics(self) -> Dict[str, Any]:
         """
         Get current cache performance metrics.
@@ -93,7 +114,12 @@ class CacheMetrics:
                 'api_calls_saved': self._metrics['api_calls_saved'],
                 'average_fetch_time': avg_fetch_time,
                 'total_fetch_time': self._metrics['total_fetch_time'],
-                'fetch_count': self._metrics['fetch_count']
+                'fetch_count': self._metrics['fetch_count'],
+                # Disk cleanup metrics
+                'last_disk_cleanup': self._metrics['last_disk_cleanup'],
+                'total_files_cleaned': self._metrics['total_files_cleaned'],
+                'total_space_freed_mb': self._metrics['total_space_freed_mb'],
+                'last_cleanup_duration_sec': self._metrics['last_cleanup_duration_sec']
             }
 
     def log_metrics(self) -> None: