LEDMatrix/docs/PLUGIN_ERROR_HANDLING.md
Chuck 8fb2800495 feat: add error detection, monitoring, and code quality improvements (#223)
2026-01-30 10:05:09 -05:00


Plugin Error Handling Guide

This guide covers best practices for error handling in LEDMatrix plugins.

Custom Exception Hierarchy

LEDMatrix provides typed exceptions for different error categories. Use these instead of generic Exception:

from src.exceptions import PluginError, ConfigError, CacheError, DisplayError

# Plugin-related errors
raise PluginError("Failed to fetch data", plugin_id=self.plugin_id, context={"api": "ESPN"})

# Configuration errors
raise ConfigError("Invalid API key format", field="api_key")

# Cache errors
raise CacheError("Cache write failed", cache_key="game_data")

# Display errors
raise DisplayError("Failed to render", display_mode="live")

Exception Context

All LEDMatrix exceptions support a context dict for additional debugging info:

raise PluginError(
    "API request failed",
    plugin_id=self.plugin_id,
    context={
        "url": api_url,
        "status_code": response.status_code,
        "retry_count": 3
    }
)
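On the catching side, the context dict can be folded into a single log line. The following is a minimal sketch only: `SketchPluginError` and `describe` are hypothetical names defined here so the example runs on its own, and the real `PluginError` in `src.exceptions` may store its fields differently.

```python
import logging

logger = logging.getLogger("sketch")


class SketchPluginError(Exception):
    """Hypothetical stand-in for PluginError; not the real src.exceptions class."""

    def __init__(self, message, plugin_id=None, context=None):
        super().__init__(message)
        self.plugin_id = plugin_id
        self.context = context or {}


def describe(error):
    """Flatten an error and its context into one log-friendly string."""
    details = ", ".join(
        f"{key}={value!r}" for key, value in sorted(error.context.items())
    )
    return f"[{error.plugin_id}] {error} ({details})"


try:
    raise SketchPluginError(
        "API request failed",
        plugin_id="my-plugin",
        context={"status_code": 503, "retry_count": 3},
    )
except SketchPluginError as exc:
    summary = describe(exc)
    logger.error(summary)
```

Sorting the context keys keeps the log line stable between runs, which makes repeated errors easier to compare.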

Logging Best Practices

Use the Plugin Logger

Every plugin has access to self.logger:

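class MyPlugin(BasePlugin):
    def update(self):
        api_url = self.config.get("api_url")  # illustrative config key
        self.logger.info("Starting data fetch")
        self.logger.debug("API URL: %s", api_url)
        self.logger.warning("Rate limit approaching")
        # Add exc_info=True only when logging from inside an except block
        self.logger.error("API request failed")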

Log Levels

  • DEBUG: Detailed info for troubleshooting (API URLs, parsed data)
  • INFO: Normal operation milestones (plugin loaded, data fetched)
  • WARNING: Recoverable issues (rate limits, cache miss, fallback used)
  • ERROR: Failures that need attention (API down, display error)
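The effect of these levels can be seen with the standard library `logging` module alone. This is a small sketch, assuming a logger threshold of WARNING; the logger name `sketch.levels` and the example URL are made up for illustration and are not part of LEDMatrix.

```python
import io
import logging

# Capture output in memory so the filtering is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("sketch.levels")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)  # DEBUG and INFO are dropped at this threshold

logger.debug("API URL: %s", "https://example.invalid/scores")
logger.info("Data fetched")
logger.warning("Cache miss, using fallback")
logger.error("API down")

captured = stream.getvalue().splitlines()
```

Only the WARNING and ERROR messages reach `captured`; lowering the threshold to `logging.DEBUG` would let all four through.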

Include exc_info for Exceptions

import requests

try:
    response = requests.get(url, timeout=10)
except requests.RequestException as e:
    # exc_info=True attaches the full traceback to the log record
    self.logger.error("API request failed: %s", e, exc_info=True)
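To see what `exc_info=True` adds, the sketch below logs a caught exception into an in-memory stream using only the standard library. The logger name and the `parse_score` helper are hypothetical and exist only for this illustration.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("sketch.exc_info")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.ERROR)


def parse_score(raw):
    """Hypothetical helper that fails on non-numeric input."""
    return int(raw)


try:
    parse_score("not-a-number")
except ValueError as e:
    # The traceback is appended after the formatted message.
    logger.error("Could not parse score: %s", e, exc_info=True)

output = stream.getvalue()
```

The captured output contains the message followed by the full traceback, including the `parse_score` frame. Inside an `except` block, `logger.exception(...)` is a shorthand that logs at ERROR level with the traceback attached.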

Error Handling Patterns

Never Use Bare except

# BAD - swallows all errors including KeyboardInterrupt
try:
    self.fetch_data()
except:
    pass

# GOOD - catch specific exceptions
try:
    self.fetch_data()
except requests.RequestException as e:
    self.logger.warning("Network error, using cached data: %s", e)
    self.data = self.get_cached_data()
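The difference is easy to demonstrate with `SystemExit`, which, like `KeyboardInterrupt`, derives from `BaseException` rather than `Exception`. The function names below are hypothetical, and `except Exception` is used only as a contrast; catching the specific exception type remains the better choice.

```python
def swallow_everything(action):
    """Bare except: hides even SystemExit and KeyboardInterrupt."""
    try:
        action()
    except:  # noqa: E722 - shown only to illustrate the problem
        return "swallowed"
    return "ok"


def swallow_exceptions_only(action):
    """except Exception: lets SystemExit and KeyboardInterrupt propagate."""
    try:
        action()
    except Exception:
        return "swallowed"
    return "ok"


def request_exit():
    raise SystemExit(1)


bare_result = swallow_everything(request_exit)

try:
    narrow_result = swallow_exceptions_only(request_exit)
except SystemExit:
    narrow_result = "propagated"
```

Here `bare_result` ends up as `"swallowed"` while `narrow_result` is `"propagated"`: the bare clause silently blocks a shutdown request that the narrower clause lets through.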

Graceful Degradation

def update(self):
    try:
        self.data = self.fetch_live_data()
    except requests.RequestException as e:
        self.logger.warning("Live data unavailable: %s", e)
        # Fall back to cache
        cached = self.cache_manager.get(self.cache_key)
        if cached:
            self.logger.info("Using cached data")
            self.data = cached
        else:
            self.logger.error("No cached data available")
            self.data = None

Validate Configuration Early

def validate_config(self) -> bool:
    """Validate configuration at load time."""
    api_key = self.config.get("api_key")
    if not api_key:
        self.logger.error("api_key is required but not configured")
        return False

    if not isinstance(api_key, str) or len(api_key) < 10:
        self.logger.error("api_key appears to be invalid")
        return False

    return True

Handle Display Errors

def display(self, force_clear: bool = False) -> bool:
    if not self.data:
        if force_clear:
            self.display_manager.clear()
            self.display_manager.update_display()
        return False

    try:
        self._render_content()
        return True
    except Exception as e:
        self.logger.error("Display error: %s", e, exc_info=True)
        # Clear display on error to prevent stale content
        self.display_manager.clear()
        self.display_manager.update_display()
        return False

Error Aggregation

LEDMatrix automatically tracks plugin errors. Access error data via the API:

# Get error summary
curl http://localhost:5000/api/v3/errors/summary

# Get plugin-specific health
curl http://localhost:5000/api/v3/errors/plugin/my-plugin

# Clear old errors
curl -X POST http://localhost:5000/api/v3/errors/clear
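The same endpoints can be queried from Python with the standard library. This is a sketch under assumptions: the host and port are copied from the curl examples, the helper names are hypothetical, and the response shape is not documented here, so the result is best printed and inspected rather than relied upon.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host, port, and path prefix are taken from the curl examples above.
BASE_URL = "http://localhost:5000/api/v3/errors"


def plugin_health_url(plugin_id):
    """Build the plugin-specific URL, escaping unsafe characters in the id."""
    return f"{BASE_URL}/plugin/{quote(plugin_id, safe='')}"


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON body (requires the web interface to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the web interface running, `fetch_json(f"{BASE_URL}/summary")` or `fetch_json(plugin_health_url("my-plugin"))` would return the decoded response.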

Error Patterns

When the same error occurs repeatedly (5+ times in 60 minutes), it's detected as a pattern and logged as a warning. This helps identify systemic issues.
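One way such a rule could work is a sliding window of timestamps per error key. The sketch below is an assumption about the general technique, not the actual ErrorAggregator implementation; the class name, the key format, and the `now` parameter are all hypothetical.

```python
import time
from collections import defaultdict, deque

PATTERN_THRESHOLD = 5          # occurrences, from the text above
PATTERN_WINDOW_SECONDS = 3600  # 60 minutes


class SketchPatternDetector:
    """Hypothetical sliding-window counter; not the real ErrorAggregator."""

    def __init__(self):
        self._timestamps = defaultdict(deque)

    def record(self, error_key, now=None):
        """Record one occurrence; return True once the threshold is reached."""
        now = time.time() if now is None else now
        window = self._timestamps[error_key]
        window.append(now)
        # Drop occurrences that have fallen out of the 60-minute window.
        while window and now - window[0] > PATTERN_WINDOW_SECONDS:
            window.popleft()
        return len(window) >= PATTERN_THRESHOLD


detector = SketchPatternDetector()
key = ("my-plugin", "RequestException")
# One error per minute: the fifth occurrence crosses the threshold.
results = [detector.record(key, now=i * 60) for i in range(5)]
```

Passing `now` explicitly keeps the sketch deterministic in tests; in normal use it defaults to the current time.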

Common Error Scenarios

API Rate Limiting

import time

import requests

def fetch_data(self):
    try:
        response = requests.get(self.api_url, timeout=10)
        if response.status_code == 429:
            # Retry-After may be a number of seconds or an HTTP date;
            # fall back to 60 seconds when it is not an integer
            try:
                retry_after = int(response.headers.get("Retry-After", 60))
            except ValueError:
                retry_after = 60
            self.logger.warning("Rate limited, retry after %ds", retry_after)
            self._rate_limited_until = time.time() + retry_after
            return None
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        self.logger.error("API error: %s", e)
        return None
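The snippet above records `_rate_limited_until` but does not show where it is checked. A minimal sketch of that guard follows; `SketchRateLimitGate` and its method names are hypothetical, and in a real plugin the same check could simply sit at the top of `update()`.

```python
import time


class SketchRateLimitGate:
    """Hypothetical helper that tracks a rate-limit cooldown."""

    def __init__(self):
        self._rate_limited_until = 0.0

    def block_for(self, seconds, now=None):
        """Start a cooldown lasting the given number of seconds."""
        now = time.time() if now is None else now
        self._rate_limited_until = now + seconds

    def is_blocked(self, now=None):
        """Return True while the cooldown has not yet expired."""
        now = time.time() if now is None else now
        return now < self._rate_limited_until


gate = SketchRateLimitGate()
gate.block_for(60, now=1000.0)
```

A plugin would skip the request and keep its existing data while `is_blocked()` returns True, then resume fetching once the cooldown has passed.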

Timeout Handling

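def fetch_data(self):
    try:
        response = requests.get(self.api_url, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        self.logger.warning("Request timed out, will retry next update")
        return None
    except requests.RequestException as e:
        self.logger.error("Request failed: %s", e)
        return None
    except ValueError as e:
        # The response body was not valid JSON
        self.logger.error("Invalid JSON in response: %s", e)
        return None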

Handle Missing Data Gracefully

def get_team_logo(self, team_id):
    logo_path = self.logos_dir / f"{team_id}.png"
    if not logo_path.exists():
        self.logger.debug("Logo not found for team %s, using default", team_id)
        return self.default_logo
    return Image.open(logo_path)

Testing Error Handling

def test_handles_api_error(mock_requests):
    """Test plugin handles API errors gracefully."""
    mock_requests.get.side_effect = requests.RequestException("Network error")

    plugin = MyPlugin(...)
    plugin.update()

    # Should not raise, should log warning, should have no data
    assert plugin.data is None

def test_handles_invalid_json(mock_requests):
    """Test plugin handles invalid JSON response."""
    mock_requests.get.return_value.json.side_effect = ValueError("Invalid JSON")

    plugin = MyPlugin(...)
    plugin.update()

    assert plugin.data is None
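The tests above check the resulting state; the logging side can be asserted as well. The sketch below is self-contained and uses only the standard library: `SketchPlugin` is a hypothetical stand-in with an injected fetcher, not a real LEDMatrix plugin, and `ConnectionError` stands in for a network failure.

```python
import logging
import unittest
from unittest import mock


class SketchPlugin:
    """Hypothetical plugin used only for this test sketch."""

    def __init__(self, fetcher):
        self.logger = logging.getLogger("sketch.plugin")
        self.fetcher = fetcher
        self.data = None

    def update(self):
        try:
            self.data = self.fetcher()
        except (ConnectionError, ValueError) as e:
            self.logger.warning("Fetch failed: %s", e)
            self.data = None


class SketchPluginTests(unittest.TestCase):
    def test_logs_warning_and_clears_data(self):
        fetcher = mock.Mock(side_effect=ConnectionError("Network error"))
        plugin = SketchPlugin(fetcher)
        # assertLogs fails the test if no WARNING (or higher) is emitted.
        with self.assertLogs("sketch.plugin", level="WARNING") as captured:
            plugin.update()
        self.assertIsNone(plugin.data)
        self.assertIn("Network error", captured.output[0])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SketchPluginTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Injecting the fetcher avoids patching a module-level import, which keeps the test independent of how the plugin performs its request.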

Checklist

  • No bare except: clauses
  • All exceptions logged with appropriate level
  • exc_info=True on error-level logs emitted from except blocks
  • Graceful degradation with cache fallbacks
  • Configuration validated in validate_config()
  • Display clears on error to prevent stale content
  • Timeouts configured for all network requests