Add cross-size/cross-screen plugin safety harness (#361)

* feat(testing): add cross-size/cross-screen plugin safety harness

Render every plugin across all supported matrix sizes (64x32, 128x32,
128x64, 256x32) and every declared screen, failing on crashes, content
drawn past the panel edge, or visual drift vs committed golden images.

- BoundsCheckingDisplayManager: oversized-canvas overflow detection
- harness.py: multi-size/multi-screen render engine + golden compare
- scripts/check_plugin.py: CLI (functional+bounds, --out-dir,
  --update-golden, --freeze-time); render_plugin.py refactored onto shared
  loading helpers
- test/plugins/test_harness.py + test_plugin_matrix.py (parametrized, honors
  per-plugin test/harness.json; skips when no plugins present)
- MockCacheManager.cache_dir so cache-dir-using plugins load headlessly
- .github/workflows/test.yml + docs/plugin-safety-harness.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): address PR review feedback on plugin safety harness

- check_plugin: friendly error for non-numeric --sizes; reject non-object
  --config / --mock-data JSON; sanitize plugin mode before using as a
  filename; stop --update-golden from masking crash/overflow failures
- bounds_display_manager: pad the canvas out to the largest supported panel
  (not a fixed 16px) so far-overshoot coordinates are caught, not clipped
- harness: merge config_schema defaults inside render_plugin_matrix; surface
  update() failures as a non-fatal warning + result field instead of a debug
  log; sanitize mode in golden_path
- loading: fail fast when harness.json references a missing mock_data fixture
- mocks: clean up the per-instance temp cache dir via weakref.finalize
- test_plugin_matrix: add a discovery guard that fails when
  LEDMATRIX_REQUIRE_PLUGINS=1 but none found (still skips locally); type hints
- bound test deps with upper version pins for deterministic CI

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(testing): render plugins across arbitrary panel sizes, not a fixed list

Addresses maintainer feedback that there is no canonical set of supported
panel sizes: a build can be any size/configuration (square, 2x2, 4x4, 8x2,
long strips, tall stacks).

- sizes.py: SUPPORTED_SIZES -> DEFAULT_TEST_SIZES (back-compat alias kept),
  reframed as a representative SAMPLE of real panel-grid arrangements rather
  than an authoritative list; add parse_size_token / coerce_sizes /
  resolve_test_sizes helpers
- sizes are now fully overridable: LEDMATRIX_TEST_SIZES env (global, e.g.
  test on your exact hardware) > per-plugin harness.json "sizes" > default
  sample; CLI --sizes unchanged
- bounds_display_manager: pad the canvas to the largest panel IN THE CURRENT
  RUN (via overflow_extent) instead of a hardcoded max, so cross-size
  overflow detection scales to whatever sizes a run uses
- harness: compute per-run extent and thread it into the bounds manager
- tests: arbitrary-shape + size-parsing/precedence coverage
- docs: rewrite "Supported sizes" -> "Sizes: a sample, not a fixed list"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): fail the harness on non-connectivity update() errors

Addresses the remaining review thread: recording every update() exception as
a non-fatal warning still let a real update() regression pass green as long
as display() survived. Now update() failures are classified: a tolerated set
of connectivity errors (ConnectionError/TimeoutError/socket/ssl/urllib/http/
requests) is recorded non-fatally (expected with no network in CI), while any
other exception is treated as a genuine bug and fails that render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(security): pin actions to SHAs and disable checkout credential persistence

Addresses the CodeRabbit/zizmor workflow-hardening finding: pin
actions/checkout and actions/setup-python to full commit SHAs and set
persist-credentials: false on checkout to reduce supply-chain and
token-exposure risk.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): validate positive sizes; narrow requests import except

Two review findings:

- sizes.py: parse_size_token / coerce_sizes now reject non-positive
  dimensions (0x32, -64x32) with a clear message instead of passing invalid
  sizes downstream (CodeRabbit).
- harness.py: the optional `requests` import now catches ImportError
  specifically and logs instead of `except Exception: pass`, clearing the
  Codacy medium "Try, Except, Pass" (harness.py L52) and Ruff S110/BLE001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
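
The size handling described in the message (reject non-positive dimensions; environment override > per-plugin harness.json "sizes" > default sample) can be illustrated with a minimal sketch. `parse_size`, `resolve_sizes`, and `DEFAULT_SAMPLE` are hypothetical stand-ins written for this note; the real `parse_size_token` / `resolve_test_sizes` helpers live in sizes.py, which this diff does not show, and may differ in signature and behavior. The sample list reuses the four sizes named in the first commit.

```python
from typing import List, Optional, Sequence, Tuple

Size = Tuple[int, int]

# Assumption: stand-in for DEFAULT_TEST_SIZES, using the sizes named in the commit message.
DEFAULT_SAMPLE: List[Size] = [(64, 32), (128, 32), (128, 64), (256, 32)]


def parse_size(token: str) -> Size:
    """Parse 'WxH' into (width, height); reject malformed or non-positive input."""
    parts = token.strip().lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"invalid size {token!r}: expected WxH, e.g. 64x32")
    try:
        width, height = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"invalid size {token!r}: width and height must be integers") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid size {token!r}: dimensions must be positive")
    return width, height


def resolve_sizes(env_value: Optional[str],
                  plugin_sizes: Optional[Sequence[str]]) -> List[Size]:
    """Environment override beats per-plugin sizes, which beat the default sample."""
    if env_value:
        return [parse_size(t) for t in env_value.split(",") if t.strip()]
    if plugin_sizes:
        return [parse_size(t) for t in plugin_sizes]
    return list(DEFAULT_SAMPLE)
```

A caller would typically pass `os.environ.get("LEDMATRIX_TEST_SIZES")` as `env_value` and the "sizes" list read from a plugin's test/harness.json as `plugin_sizes`.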
2026-08-01 08:48:05 +00:00 · 2026-06-05 14:32:52 -04:00
parent 122e6d6863
commit 313e35a98f
13 changed files with 1360 additions and 38 deletions
@@ -0,0 +1,217 @@
+#!/usr/bin/env python3
+"""
+Plugin safety checker.
+
+Renders a plugin across every declared screen (mode) and each matrix size under
+test (the DEFAULT_TEST_SIZES sample unless --sizes is given), and fails if any
+screen crashes, overflows the panel, or (for plugins with committed golden
+images) drifts visually.
+
+Usage:
+    # Functional + bounds check across all sizes/modes:
+    python scripts/check_plugin.py --plugin clock-simple
+
+    # Every discovered plugin:
+    python scripts/check_plugin.py --all
+
+    # Dump PNGs for each size/mode so you can eyeball them:
+    python scripts/check_plugin.py --plugin ledmatrix-weather --out-dir /tmp/preview
+
+    # Refresh committed golden images after an intentional visual change:
+    python scripts/check_plugin.py --plugin clock-simple --update-golden \
+        --mock-data plugins/clock-simple/test/fixtures/mock.json
+
+Exit code is non-zero if any (plugin, size, mode) fails.
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Dict, List, Optional
+
+PROJECT_ROOT = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(PROJECT_ROOT))
+
+os.environ['EMULATOR'] = 'true'
+
+from src.logging_config import get_logger  # noqa: E402
+from src.plugin_system.testing.loading import (  # noqa: E402
+    find_plugin_dir, load_config_defaults,
+)
+from src.plugin_system.testing.harness import (  # noqa: E402
+    RenderResult, render_plugin_matrix, compare_to_goldens, write_goldens,
+)
+from src.plugin_system.testing.sizes import (  # noqa: E402
+    DEFAULT_TEST_SIZES, parse_size_token, safe_mode_filename, size_label,
+)
+
+logger = get_logger("[Check Plugin]")
+
+DEFAULT_SEARCH_DIRS = [
+    str(PROJECT_ROOT / 'plugins'),
+    str(PROJECT_ROOT / 'plugin-repos'),
+]
+
+
+def discover_plugins(search_dirs: List[str]) -> List[str]:
+    """All plugin ids found across the search dirs (dirs containing manifest.json)."""
+    found = []
+    for d in search_dirs:
+        base = Path(d)
+        if not base.exists():
+            continue
+        for child in sorted(base.iterdir()):
+            if (child / 'manifest.json').exists() and child.name not in found:
+                found.append(child.name)
+    return found
+
+
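+def parse_sizes(spec: Optional[str]):
+    """Parse a comma-separated WxH list; fall back to DEFAULT_TEST_SIZES when unset."""
+    if not spec:
+        return DEFAULT_TEST_SIZES
+    sizes = []
+    for token in spec.split(','):
+        if not token.strip():
+            continue
+        try:
+            sizes.append(parse_size_token(token))
+        except ValueError as exc:
+            raise SystemExit(str(exc)) from exc
+    if not sizes:
+        # e.g. --sizes ",": rendering nothing would otherwise pass vacuously.
+        raise SystemExit(f"--sizes contained no sizes: {spec!r}")
+    return sizes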
+
+
+def check_one(plugin_id: str, search_dirs: List[str], sizes, mock_data: Dict,
+              config: Dict, run_update: bool, out_dir: Optional[Path],
+              update_golden: bool, golden_dir_override: Optional[Path],
+              freeze_time: Optional[str]) -> List[RenderResult]:
+    plugin_dir = find_plugin_dir(plugin_id, search_dirs)
+    if not plugin_dir:
+        logger.error("Plugin '%s' not found in: %s", plugin_id, search_dirs)
+        return [RenderResult(plugin_id, 0, 0, "<not-found>", error="plugin directory not found")]
+
+    # Start from config_schema defaults so plugins behave like a real install.
+    full_config = {"enabled": True}
+    full_config.update(load_config_defaults(plugin_dir))
+    full_config.update(config)
+
+    results = render_plugin_matrix(
+        plugin_id=plugin_id, plugin_dir=plugin_dir, config=full_config,
+        mock_data=mock_data, sizes=sizes, run_update=run_update,
+        freeze_time=freeze_time,
+    )
+
+    golden_dir = golden_dir_override or (plugin_dir / 'test' / 'golden')
+    if update_golden:
+        written = write_goldens(results, golden_dir)
+        logger.info("Wrote %d golden image(s) for %s to %s", written, plugin_id, golden_dir)
+    else:
+        compare_to_goldens(results, golden_dir)
+
+    if out_dir:
+        for r in results:
+            if r.image is None:
+                continue
+            dest = out_dir / plugin_id / size_label(r.width, r.height)
+            dest.mkdir(parents=True, exist_ok=True)
+            r.image.save(dest / f"{safe_mode_filename(r.mode)}.png", format="PNG")
+
+    return results
+
+
+def print_report(all_results: Dict[str, List[RenderResult]]) -> bool:
+    """Print a per-plugin grid. Returns True if everything passed."""
+    everything_ok = True
+    for plugin_id, results in all_results.items():
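+        print(f"\n=== {plugin_id} ===")
+        if not results:
+            # Nothing rendered means nothing was verified; do not report success.
+            everything_ok = False
+            print("  [FAIL] no renders produced")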
+        for r in results:
+            if r.ok:
+                status = "PASS"
+                detail = ""
+                if r.golden_checked:
+                    detail = " (golden ✓)"
+                if r.update_error is not None:
+                    detail += f" (update warn: {r.update_error})"
+            else:
+                everything_ok = False
+                if r.error is not None:
+                    status, detail = "FAIL", f" error={r.error}"
+                elif r.overflow is not None:
+                    status, detail = "FAIL", f" overflow bbox={r.overflow}"
+                elif r.golden_ok is False:
+                    status = "FAIL"
+                    detail = f" golden drift: {r.golden_diff_pixels}px (max Δ={r.golden_max_delta})"
+                else:
+                    status, detail = "FAIL", ""
+            print(f"  [{status}] {r.size_label:>7}  {r.mode}{detail}")
+    print()
+    return everything_ok
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Check a plugin renders safely across sizes & screens")
+    group = parser.add_mutually_exclusive_group(required=True)
+    group.add_argument('--plugin', '-p', help='Plugin id to check')
+    group.add_argument('--all', action='store_true', help='Check every discovered plugin')
+    parser.add_argument('--plugin-dir', '-d', default=None, help='Directory to search for plugins')
+    parser.add_argument('--sizes', default=None,
+                        help='Comma-separated WxH list (default: the DEFAULT_TEST_SIZES sample)')
+    parser.add_argument('--config', '-c', default='{}', help='Plugin config overrides as JSON')
+    parser.add_argument('--mock-data', '-m', default=None, help='Path to JSON file with mock cache data')
+    parser.add_argument('--out-dir', '-o', default=None, help='Also dump rendered PNGs here')
+    parser.add_argument('--skip-update', action='store_true', help='Skip calling update()')
+    parser.add_argument('--update-golden', action='store_true', help='Write/refresh golden images')
+    parser.add_argument('--golden-dir', default=None, help='Override golden dir (default: <plugin>/test/golden)')
+    parser.add_argument('--freeze-time', default=None,
+                        help='Freeze wall clock, e.g. "2025-08-01 15:25:00" (for time-dependent plugins)')
+    args = parser.parse_args()
+
+    search_dirs = [args.plugin_dir] if args.plugin_dir else DEFAULT_SEARCH_DIRS
+    sizes = parse_sizes(args.sizes)
+
+    try:
+        config = json.loads(args.config)
+    except json.JSONDecodeError as e:
+        logger.error("Invalid --config JSON: %s", e)
+        return 2
+    if not isinstance(config, dict):
+        logger.error("--config must be a JSON object, got %s", type(config).__name__)
+        return 2
+
+    mock_data = {}
+    if args.mock_data:
+        mock_path = Path(args.mock_data)
+        if not mock_path.exists():
+            logger.error("Mock data file not found: %s", args.mock_data)
+            return 2
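+        try:
+            with open(mock_path, encoding='utf-8') as f:
+                mock_data = json.load(f)
+        except json.JSONDecodeError as e:
+            # Match the --config handling: report bad JSON instead of a traceback.
+            logger.error("Invalid --mock-data JSON in %s: %s", args.mock_data, e)
+            return 2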
+        if not isinstance(mock_data, dict):
+            logger.error("--mock-data must be a JSON object (key -> cache value), got %s",
+                         type(mock_data).__name__)
+            return 2
+
+    plugin_ids = discover_plugins(search_dirs) if args.all else [args.plugin]
+    if not plugin_ids:
+        logger.error("No plugins found in: %s", search_dirs)
+        return 2
+
+    out_dir = Path(args.out_dir) if args.out_dir else None
+    golden_dir_override = Path(args.golden_dir) if args.golden_dir else None
+
+    all_results: Dict[str, List[RenderResult]] = {}
+    for plugin_id in plugin_ids:
+        all_results[plugin_id] = check_one(
+            plugin_id=plugin_id, search_dirs=search_dirs, sizes=sizes,
+            mock_data=mock_data, config=config, run_update=not args.skip_update,
+            out_dir=out_dir, update_golden=args.update_golden,
+            golden_dir_override=golden_dir_override, freeze_time=args.freeze_time,
+        )
+
+    # When refreshing goldens we skip drift comparison, but a crash or overflow
+    # still means the plugin is broken — never let --update-golden mask that.
+    ok = print_report(all_results)
+    return 0 if ok else 1
+
+
+if __name__ == '__main__':
+    sys.exit(main())
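
The report in this script prints an `update warn` note for passing results whose `update_error` is set, which corresponds to the classification described in the commit message: connectivity errors from update() are tolerated, anything else fails the render. The sketch below shows one way such a classifier could look. It is an assumption-laden illustration, not the code in harness.py: `CONNECTIVITY_ERRORS` and `run_update` are hypothetical names, the tolerated set is guessed from the families listed in the message, and `requests` is left out so the example stays stdlib-only.

```python
import http.client
import socket
import ssl
import urllib.error
from typing import Callable, Optional, Tuple

# Assumed tolerated set; the real list in harness.py may differ.
CONNECTIVITY_ERRORS: Tuple[type, ...] = (
    ConnectionError,
    TimeoutError,
    socket.gaierror,
    socket.timeout,
    ssl.SSLError,
    urllib.error.URLError,
    http.client.HTTPException,
)


def run_update(update: Callable[[], None]) -> Tuple[Optional[str], Optional[str]]:
    """Call update() and return (warning, error).

    Both are None on success. A connectivity failure sets only the warning
    (non-fatal); any other exception sets only the error (fatal).
    """
    try:
        update()
    except CONNECTIVITY_ERRORS as exc:
        return f"{type(exc).__name__}: {exc}", None
    except Exception as exc:
        return None, f"update() failed: {type(exc).__name__}: {exc}"
    return None, None
```

With this split, a plugin that cannot reach its API in CI still has its display() output checked, while a genuine bug in update() (for example a `KeyError` on a missing field) fails the run.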