feat: add error detection, monitoring, and code quality improvements (#223)

* feat: add error detection, monitoring, and code quality improvements

This comprehensive update addresses automatic error detection, code
quality, and plugin development experience:

## Error Detection & Monitoring
- Add ErrorAggregator service for centralized error tracking
- Add pattern detection for recurring errors (5+ in 60 min)
- Add error dashboard API endpoints (/api/v3/errors/*)
- Integrate error recording into plugin executor
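
The "5+ in 60 min" rule above can be pictured as a sliding-window counter. The following is a minimal sketch, not the `ErrorAggregator` added in this commit: the class name `RecurringErrorDetector`, its `record` method, and the error-key format are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RecurringErrorDetector:
    """Hypothetical sketch of threshold-in-window pattern detection."""

    def __init__(self, threshold=5, window=timedelta(minutes=60)):
        self.threshold = threshold
        self.window = window
        # One queue of timestamps per error key (e.g. "plugin:ExceptionType").
        self._seen = defaultdict(deque)

    def record(self, key, when):
        """Store one occurrence; return True if `key` is now a recurring pattern."""
        times = self._seen[key]
        times.append(when)
        # Drop occurrences that have fallen out of the window.
        cutoff = when - self.window
        while times and times[0] < cutoff:
            times.popleft()
        return len(times) >= self.threshold


detector = RecurringErrorDetector()
start = datetime(2026, 1, 30, 10, 0)
flags = [detector.record("sports:KeyError", start + timedelta(minutes=i)) for i in range(5)]
```

Here only the fifth occurrence within the hour is flagged; an occurrence three hours later would start a fresh count because the earlier timestamps are evicted first.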

## Code Quality
- Remove 10 silent `except: pass` blocks in sports.py and football.py
- Remove hardcoded debug log paths
- Add pre-commit hooks to prevent future bare except clauses
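
The commit message does not say which hook is used. As an illustration of what such a check could look for, here is a sketch built on the standard `ast` module; the function name `find_silent_excepts` is hypothetical and is not taken from the repository.

```python
import ast


def find_silent_excepts(source):
    """Return line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # A bare `except:` has no exception type attached to the handler.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                hits.append(node.lineno)
    return sorted(hits)


sample = (
    "try:\n"
    "    x = 1\n"
    "except:\n"
    "    pass\n"
    "try:\n"
    "    y = 2\n"
    "except ValueError:\n"
    "    pass\n"
)
silent_lines = find_silent_excepts(sample)
```

The typed `except ValueError: pass` handler is deliberately not reported; this sketch only targets the bare form named in the bullet above.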

## Validation & Type Safety
- Add warnings when plugins lack config_schema.json
- Add config key collision detection for plugins
- Improve type coercion logging in BasePlugin
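
The message does not define what counts as a config key collision. One plausible reading, assumed here, is that two plugins declare the same key. The sketch below uses a hypothetical helper `find_key_collisions` and made-up key names; the real check in plugin_manager.py may differ.

```python
from collections import defaultdict


def find_key_collisions(plugin_keys):
    """Map each config key declared by more than one plugin to its owners.

    `plugin_keys` is an assumed shape: {plugin_id: iterable of key names}.
    """
    owners = defaultdict(list)
    for plugin_id, keys in plugin_keys.items():
        for key in set(keys):  # ignore duplicates within a single plugin
            owners[key].append(plugin_id)
    return {key: sorted(ids) for key, ids in owners.items() if len(ids) > 1}


collisions = find_key_collisions({
    "sports": ["enabled", "league"],
    "football": ["enabled", "team"],
})
```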

## Testing
- Add test_config_validation_edge_cases.py
- Add test_plugin_loading_failures.py
- Add test_error_aggregator.py

## Documentation
- Add PLUGIN_ERROR_HANDLING.md guide
- Add CONFIG_DEBUGGING.md guide

Note: GitHub Actions CI workflow is available in the plan but requires
workflow scope to push. Add .github/workflows/ci.yml manually.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review issues

- Fix GitHub issues URL in CONFIG_DEBUGGING.md
- Use RLock in error_aggregator.py to prevent deadlock in export_to_file
- Distinguish missing vs invalid schema files in plugin_manager.py
- Add assertions to test_null_value_for_required_field test
- Remove unused initial_count variable in test_plugin_load_error_recorded
- Add validation for max_age_hours in clear_old_errors API endpoint
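
The RLock change presumably addresses a method that holds the lock while calling another method that takes the same lock. That cause is an assumption, since error_aggregator.py is not shown in this diff. A minimal sketch with a hypothetical `TinyAggregator` class:

```python
import json
import threading


class TinyAggregator:
    """Hypothetical stand-in for an aggregator with nested locking."""

    def __init__(self):
        # RLock lets the thread that already holds the lock acquire it again.
        # With threading.Lock, export_to_json() below would block forever.
        self._lock = threading.RLock()
        self._records = []

    def record(self, message):
        with self._lock:
            self._records.append(message)

    def get_error_summary(self):
        with self._lock:
            return {"total": len(self._records), "recent": list(self._records[-5:])}

    def export_to_json(self):
        with self._lock:
            # Re-enters the lock inside get_error_summary().
            return json.dumps(self.get_error_summary())


agg = TinyAggregator()
agg.record("boom")
exported = json.loads(agg.export_to_json())
```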

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Chuck <chuck@example.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Chuck
2026-01-30 10:05:09 -05:00
committed by GitHub
parent 8912501604
commit 8fb2800495
14 changed files with 2330 additions and 202 deletions


@@ -21,6 +21,7 @@ from src.web_interface.validators import (
    validate_image_url, validate_file_upload, validate_mime_type,
    validate_numeric_range, validate_string_length, sanitize_plugin_config
)
from src.error_aggregator import get_error_aggregator
# Will be initialized when blueprint is registered
config_manager = None
@@ -6426,4 +6427,107 @@ def delete_cache_file():
        error_details = traceback.format_exc()
        print(f"Error in delete_cache_file: {str(e)}")
        print(error_details)
        return jsonify({'status': 'error', 'message': str(e)}), 500


# =============================================================================
# Error Aggregation Endpoints
# =============================================================================

@api_v3.route('/errors/summary', methods=['GET'])
def get_error_summary():
    """
    Get summary of all errors for monitoring and debugging.

    Returns error counts, detected patterns, and recent errors.
    """
    try:
        aggregator = get_error_aggregator()
        summary = aggregator.get_error_summary()
        return success_response(data=summary, message="Error summary retrieved")
    except Exception as e:
        logger.error(f"Error getting error summary: {e}", exc_info=True)
        return error_response(
            error_code=ErrorCode.SYSTEM_ERROR,
            message="Failed to retrieve error summary",
            details=str(e),
            status_code=500
        )


@api_v3.route('/errors/plugin/<plugin_id>', methods=['GET'])
def get_plugin_errors(plugin_id):
    """
    Get error health status for a specific plugin.

    Args:
        plugin_id: Plugin identifier

    Returns health status and error statistics for the plugin.
    """
    try:
        aggregator = get_error_aggregator()
        health = aggregator.get_plugin_health(plugin_id)
        return success_response(data=health, message=f"Plugin {plugin_id} health retrieved")
    except Exception as e:
        logger.error(f"Error getting plugin health for {plugin_id}: {e}", exc_info=True)
        return error_response(
            error_code=ErrorCode.SYSTEM_ERROR,
            message=f"Failed to retrieve health for plugin {plugin_id}",
            details=str(e),
            status_code=500
        )


@api_v3.route('/errors/clear', methods=['POST'])
def clear_old_errors():
    """
    Clear error records older than specified age.

    Request body (optional):
        max_age_hours: Maximum age in hours (default: 24, max: 8760 = 1 year)
    """
    try:
        data = request.get_json(silent=True) or {}
        raw_max_age = data.get('max_age_hours', 24)

        # Validate and coerce max_age_hours
        try:
            max_age_hours = int(raw_max_age)
            if max_age_hours < 1:
                return error_response(
                    error_code=ErrorCode.INVALID_INPUT,
                    message="max_age_hours must be at least 1",
                    context={'provided_value': raw_max_age},
                    status_code=400
                )
            if max_age_hours > 8760:  # 1 year max
                return error_response(
                    error_code=ErrorCode.INVALID_INPUT,
                    message="max_age_hours cannot exceed 8760 (1 year)",
                    context={'provided_value': raw_max_age},
                    status_code=400
                )
        except (ValueError, TypeError):
            return error_response(
                error_code=ErrorCode.INVALID_INPUT,
                message="max_age_hours must be a valid integer",
                context={'provided_value': str(raw_max_age)},
                status_code=400
            )

        aggregator = get_error_aggregator()
        cleared_count = aggregator.clear_old_records(max_age_hours=max_age_hours)
        return success_response(
            data={'cleared_count': cleared_count},
            message=f"Cleared {cleared_count} error records older than {max_age_hours} hours"
        )
    except Exception as e:
        logger.error(f"Error clearing old errors: {e}", exc_info=True)
        return error_response(
            error_code=ErrorCode.SYSTEM_ERROR,
            message="Failed to clear old errors",
            details=str(e),
            status_code=500
        )
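
The `max_age_hours` checks in `clear_old_errors` can be exercised without Flask by pulling them into a pure function. This is a sketch only: `parse_max_age_hours` and `MAX_AGE_LIMIT_HOURS` are hypothetical names that do not appear in the commit, and catching `OverflowError` is an addition that the endpoint above does not make.

```python
MAX_AGE_LIMIT_HOURS = 8760  # 1 year, the same bound the endpoint uses


def parse_max_age_hours(raw):
    """Return (value, None) on success or (None, error_message) on failure."""
    try:
        value = int(raw)
    except (ValueError, TypeError, OverflowError):
        # OverflowError covers int(float("inf")); handling it here is an
        # assumption of this sketch, not behaviour shown in the diff.
        return None, "max_age_hours must be a valid integer"
    if value < 1:
        return None, "max_age_hours must be at least 1"
    if value > MAX_AGE_LIMIT_HOURS:
        return None, "max_age_hours cannot exceed 8760 (1 year)"
    return value, None


ok = parse_max_age_hours("48")
too_small = parse_max_age_hours(0)
not_a_number = parse_max_age_hours("abc")
```

Because the coercion is a plain `int()` call, a float such as `3.9` is accepted and truncated to `3` rather than rejected, and a JSON `null` fails with `TypeError` and is reported as an invalid integer.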