Add cross-size/cross-screen plugin safety harness (#361)

* feat(testing): add cross-size/cross-screen plugin safety harness

Render every plugin across all supported matrix sizes (64x32, 128x32, 128x64, 256x32) and every declared screen, failing on crashes, content drawn past the panel edge, or visual drift vs committed golden images.

- BoundsCheckingDisplayManager: oversized-canvas overflow detection
- harness.py: multi-size/multi-screen render engine + golden compare
- scripts/check_plugin.py: CLI (functional+bounds, --out-dir, --update-golden, --freeze-time); render_plugin.py refactored onto shared loading helpers
- test/plugins/test_harness.py + test_plugin_matrix.py (parametrized, honors per-plugin test/harness.json; skips when no plugins present)
- MockCacheManager.cache_dir so cache-dir-using plugins load headlessly
- .github/workflows/test.yml + docs/plugin-safety-harness.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): address PR review feedback on plugin safety harness

- check_plugin: friendly error for non-numeric --sizes; reject non-object --config / --mock-data JSON; sanitize plugin mode before using as a filename; stop --update-golden from masking crash/overflow failures
- bounds_display_manager: pad the canvas out to the largest supported panel (not a fixed 16px) so far-overshoot coordinates are caught, not clipped
- harness: merge config_schema defaults inside render_plugin_matrix; surface update() failures as a non-fatal warning + result field instead of a debug log; sanitize mode in golden_path
- loading: fail fast when harness.json references a missing mock_data fixture
- mocks: clean up the per-instance temp cache dir via weakref.finalize
- test_plugin_matrix: add a discovery guard that fails when LEDMATRIX_REQUIRE_PLUGINS=1 but none found (still skips locally); type hints
- bound test deps with upper version pins for deterministic CI

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(testing): render plugins across arbitrary panel sizes, not a fixed list

Addresses maintainer feedback that there is no canonical set of supported panel sizes: a build can be any size/configuration (square, 2x2, 4x4, 8x2, long strips, tall stacks).

- sizes.py: SUPPORTED_SIZES -> DEFAULT_TEST_SIZES (back-compat alias kept), reframed as a representative SAMPLE of real panel-grid arrangements rather than an authoritative list; add parse_size_token / coerce_sizes / resolve_test_sizes helpers
- sizes are now fully overridable: LEDMATRIX_TEST_SIZES env (global, e.g. test on your exact hardware) > per-plugin harness.json "sizes" > default sample; CLI --sizes unchanged
- bounds_display_manager: pad the canvas to the largest panel IN THE CURRENT RUN (via overflow_extent) instead of a hardcoded max, so cross-size overflow detection scales to whatever sizes a run uses
- harness: compute per-run extent and thread it into the bounds manager
- tests: arbitrary-shape + size-parsing/precedence coverage
- docs: rewrite "Supported sizes" -> "Sizes: a sample, not a fixed list"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): fail the harness on non-connectivity update() errors

Addresses the remaining review thread: recording every update() exception as a non-fatal warning still let a real update() regression pass green as long as display() survived. Now update() failures are classified: a tolerated set of connectivity errors (ConnectionError/TimeoutError/socket/ssl/urllib/http/requests) is recorded non-fatally (expected with no network in CI), while any other exception is treated as a genuine bug and fails that render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(security): pin actions to SHAs and disable checkout credential persistence

Addresses the CodeRabbit/zizmor workflow-hardening finding: pin actions/checkout and actions/setup-python to full commit SHAs and set persist-credentials: false on checkout to reduce supply-chain and token-exposure risk.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): validate positive sizes; narrow requests import except

Two review findings:

- sizes.py: parse_size_token / coerce_sizes now reject non-positive dimensions (0x32, -64x32) with a clear message instead of passing invalid sizes downstream (CodeRabbit).
- harness.py: the optional `requests` import now catches ImportError specifically and logs instead of `except Exception: pass`, clearing the Codacy medium "Try, Except, Pass" (harness.py L52) and Ruff S110/BLE001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-08-02 17:28:05 +00:00 · 2026-06-05 14:32:52 -04:00
parent 122e6d6863
commit 313e35a98f
13 changed files with 1360 additions and 38 deletions
@@ -0,0 +1,33 @@
 name: Tests
 on:
  pull_request:
  push:
    branches: [main]
 jobs:
  plugin-safety:
    name: Plugin safety harness + unit tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          persist-credentials: false
      - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: "3.12"
          cache: pip
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt -r requirements-test.txt
          pip install RGBMatrixEmulator
      - name: Run harness + visual rendering tests
        run: |
          pytest --no-cov \
            test/plugins/test_harness.py \
            test/plugins/test_visual_rendering.py \
            test/plugins/test_plugin_matrix.py
@@ -0,0 +1,136 @@
 # Plugin Safety Harness
 Renders a plugin across **every declared screen (mode)** and **a spread of
 matrix sizes**, and fails if any combination crashes, draws past the panel edge,
 or — for plugins that ship golden images — drifts visually. The goal: change a
 plugin without breaking a size or screen you didn't think to test.
 ## Sizes: a sample, not a fixed list
 There is **no fixed set of supported panel sizes** — an RGB matrix build can be
 any width/height and configuration (square, rectangle, 2×2, 4×4, 8×2, long
 strips, tall stacks). Plugins are expected to read dimensions dynamically
 (`self.display_manager.matrix.width/height`) and lay themselves out
 accordingly, so a hardcoded coordinate or unscaled font shows up as a failure
 here.
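As a rough illustration of laying out from the panel's dimensions rather than from fixed coordinates, the sketch below centers a content block on whatever panel it is given. The helper name `centered_origin` is hypothetical and not part of the harness; in a real plugin the panel size would come from `self.display_manager.matrix.width/height`.

```python
from typing import Tuple


def centered_origin(panel_w: int, panel_h: int,
                    content_w: int, content_h: int) -> Tuple[int, int]:
    """Top-left corner that centers the content on the panel.

    Hypothetical helper for illustration only. The result is clamped to
    (0, 0) so oversized content starts at the panel origin instead of at a
    negative coordinate.
    """
    x = max(0, (panel_w - content_w) // 2)
    y = max(0, (panel_h - content_h) // 2)
    return x, y


# The same 40x8 text block lands in a different place on each panel,
# which is exactly what a hardcoded (x, y) cannot do.
for width, height in [(64, 32), (128, 64), (256, 32)]:
    print((width, height), centered_origin(width, height, 40, 8))
```

A plugin that computes positions this way still needs to shrink or truncate content that is wider than the panel; the clamp only avoids negative origins.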
 The harness therefore renders against a **representative sample** that spans the
 axes of variation (`DEFAULT_TEST_SIZES` in `src/plugin_system/testing/sizes.py`),
 not an authoritative list:
 Each module is 64×32; entries are real panel-grid arrangements (cols × rows):
 | Size    | Grid | Why it's in the sample                     |
 |---------|------|--------------------------------------------|
 | 64×32   | 1×1  | single panel — tightest common rectangle   |
 | 128×32  | 2×1  | the baseline most plugins are tuned for    |
 | 64×64   | 1×2  | stacked — tall-narrow centering            |
 | 128×64  | 2×2  | block — icon scaling / vertical centering  |
 | 256×32  | 4×1  | long strip — wide horizontal layout        |
 | 128×96  | 2×3  | tall — vertical overflow                   |
 | 256×128 | 4×4  | large block — both dimensions big at once  |
 **Override the sizes entirely** to test your actual hardware (or any shape):
 ```bash
 # CLI — one-off:
 python scripts/check_plugin.py --plugin clock-simple --sizes 8x16,64x64,256x32
 # pytest — force every plugin onto your panel(s):
 LEDMATRIX_TEST_SIZES="8x16,128x128" pytest test/plugins/test_plugin_matrix.py
 # Per-plugin — declare the shapes a plugin targets in its test/harness.json:
 #   { "sizes": [[8, 16], [64, 64]] }
 ```
 Precedence: `LEDMATRIX_TEST_SIZES` env (global) → per-plugin `harness.json`
 `sizes` → the default sample. Bounds checking adapts to whatever sizes a run
 uses — the backing canvas is padded out to the **largest** panel in the run, so
 a coordinate meant for a big build is still caught when rendering a small one.
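The precedence above can be sketched roughly as follows. This is not the code in `sizes.py` (whose helper signatures are not shown in this document); `parse_token` and `resolve_sizes` are hypothetical names, and the environment is passed in as a mapping so the logic is easy to exercise.

```python
from typing import Any, List, Mapping, Optional, Sequence, Tuple

Size = Tuple[int, int]


def parse_token(token: str) -> Size:
    """Parse 'WxH' into (W, H); reject malformed or non-positive sizes."""
    parts = token.strip().lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"invalid size {token!r}: expected WxH, e.g. 128x32")
    try:
        width, height = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"invalid size {token!r}: W and H must be integers") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid size {token!r}: W and H must be positive")
    return width, height


def resolve_sizes(env: Mapping[str, str],
                  harness_spec: Optional[Mapping[str, Any]],
                  default: Sequence[Size]) -> List[Size]:
    """Env override first, then the plugin's harness.json, then the default."""
    raw = env.get("LEDMATRIX_TEST_SIZES", "").strip()
    if raw:
        return [parse_token(t) for t in raw.split(",") if t.strip()]
    declared = (harness_spec or {}).get("sizes")
    if declared:
        return [(int(w), int(h)) for w, h in declared]
    return list(default)
```

Called with `os.environ` as `env`, a global override wins even when the plugin declares its own `sizes`, which matches the intent of testing every plugin on one specific build.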
 ## Quick start
 ```bash
 # Functional + bounds check across all sizes/screens:
 python scripts/check_plugin.py --plugin clock-simple
 # Every discovered plugin:
 python scripts/check_plugin.py --all
 # Dump PNGs to eyeball each size/screen:
 python scripts/check_plugin.py --plugin ledmatrix-weather --out-dir /tmp/preview
 ```
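 Exit code is 1 if any `(plugin, size, screen)` fails and 2 if the run cannot
 start (invalid `--config` JSON, a missing `--mock-data` file, or no plugins
 found). Plugins are discovered in `plugins/` and `plugin-repos/` (override with
 `--plugin-dir`).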
 ## What it checks (Phase 1 — always on)
 1. **Loads** and builds its mode list.
 2. **Renders every screen** at every size without raising. Connectivity errors
   from `update()` (no network in CI) are tolerated and reported as a warning;
   any other exception from `update()`, or a crash in `display()`, is a
   failure. `display()` must handle the no-data state.
 3. **Bounds**: nothing is drawn past the right/bottom edge. Implemented by
   `BoundsCheckingDisplayManager`, which backs the declared panel with an
   oversized canvas and flags any pixels that land in the margin. (Left/top
   overflow at negative coordinates and BDF text are not flagged — golden images
   cover those.)
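The bounds check in item 3 can be illustrated without Pillow. The sketch below is a pure-Python stand-in, not `BoundsCheckingDisplayManager` itself: a nested list plays the role of the oversized canvas, and `make_canvas`, `draw_rect`, and `overflow_bbox` are hypothetical names.

```python
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def make_canvas(width: int, height: int) -> List[List[int]]:
    """A blank canvas: 0 means unlit, anything else means drawn."""
    return [[0] * width for _ in range(height)]


def draw_rect(canvas: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> None:
    """Fill [x0, x1) x [y0, y1), silently clipping to the canvas the way an
    image library clips out-of-range drawing."""
    for y in range(max(0, y0), min(len(canvas), y1)):
        row = canvas[y]
        for x in range(max(0, x0), min(len(row), x1)):
            row[x] = 1


def overflow_bbox(canvas: List[List[int]], panel_w: int, panel_h: int) -> Optional[BBox]:
    """Bounding box of drawn pixels outside the declared panel, or None."""
    hits = [(x, y)
            for y, row in enumerate(canvas)
            for x, pixel in enumerate(row)
            if pixel and (x >= panel_w or y >= panel_h)]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


# A 64x32 panel backed by a canvas padded out to a 256-wide run plus a margin.
canvas = make_canvas(256 + 16, 32 + 16)
draw_rect(canvas, 200, 0, 210, 5)        # coordinate meant for a wider build
print(overflow_bbox(canvas, 64, 32))     # the stray rectangle is reported
```

With only a small fixed margin (say an 80-pixel-wide canvas), the same rectangle at x=200 would be clipped away entirely and the check would pass, which is why the canvas is padded to the largest panel in the run.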
 ## Golden images (Phase 2 — opt-in per plugin)
 A plugin opts in by committing reference PNGs and (usually) a small harness spec:
 ```
 plugins/<id>/test/harness.json          # how to render deterministically
 plugins/<id>/test/fixtures/mock.json     # optional cached data
 plugins/<id>/test/golden/<WxH>/<mode>.png
 ```
 `test/harness.json` keys (all optional):
 ```json
 {
  "config":      { "timezone": "UTC" },
  "mock_data":   "fixtures/mock.json",
  "freeze_time": "2025-08-01 15:25:00",
  "skip_update": false,
  "sizes":       [[128, 32], [128, 64]]
 }
 ```
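A minimal sketch of reading such a spec is shown below. It is not the loader in `loading.py`; `load_harness_spec` is a hypothetical name, and resolving `mock_data` relative to the plugin's `test/` directory is an assumption inferred from the layout above. It does mirror one documented behavior: a spec that references a missing fixture fails fast.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_harness_spec(plugin_dir: Path) -> Dict[str, Any]:
    """Read <plugin>/test/harness.json; return {} when the plugin has none.

    Hypothetical helper. If the spec names a mock_data fixture, the fixture
    is loaded in place of the path, and a missing file raises immediately
    rather than rendering with silently empty data.
    """
    test_dir = plugin_dir / "test"
    spec_path = test_dir / "harness.json"
    if not spec_path.exists():
        return {}

    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    if not isinstance(spec, dict):
        raise ValueError(f"{spec_path} must contain a JSON object")

    fixture = spec.get("mock_data")
    if fixture:
        # Assumption: the path is relative to the plugin's test/ directory.
        fixture_path = test_dir / fixture
        if not fixture_path.exists():
            raise FileNotFoundError(
                f"{spec_path} references missing fixture {fixture_path}")
        spec["mock_data"] = json.loads(fixture_path.read_text(encoding="utf-8"))
    return spec
```

Failing early keeps a typo in `mock_data` from turning into a confusing golden-image drift later in the run.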
 Generate / refresh goldens after an intentional visual change, then review the
 diff before committing:
 ```bash
 python scripts/check_plugin.py --plugin clock-simple --update-golden \
  --config '{"timezone":"UTC"}' --freeze-time "2025-08-01 15:25:00"
 ```
 Comparison is exact by default (`compare_images` in `harness.py` accepts a
 tolerance for known anti-aliasing noise). Determinism requires a pinned Pillow
 and the bundled fonts — keep both stable when regenerating goldens.
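To make "exact by default, with an optional tolerance" concrete, here is a small pure-Python sketch of a pixel comparison. It is not `compare_images` from `harness.py`, whose signature is not shown here; `compare_pixels` and its return shape are hypothetical, chosen to echo the `golden_diff_pixels` and `golden_max_delta` fields on `RenderResult`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Grid = List[List[Pixel]]  # rows of RGB pixels


def compare_pixels(actual: Grid, golden: Grid,
                   tolerance: int = 0) -> Tuple[bool, int, int]:
    """Return (ok, differing_pixel_count, max_channel_delta).

    A pixel counts as different when any channel differs from the golden by
    more than `tolerance`; tolerance=0 means an exact match is required.
    """
    same_shape = len(actual) == len(golden) and all(
        len(row_a) == len(row_g) for row_a, row_g in zip(actual, golden))
    if not same_shape:
        raise ValueError("image dimensions differ")

    diff_pixels = 0
    max_delta = 0
    for row_a, row_g in zip(actual, golden):
        for pixel_a, pixel_g in zip(row_a, row_g):
            delta = max(abs(a - g) for a, g in zip(pixel_a, pixel_g))
            max_delta = max(max_delta, delta)
            if delta > tolerance:
                diff_pixels += 1
    return diff_pixels == 0, diff_pixels, max_delta
```

Reporting the maximum delta alongside the count helps tell a one-level anti-aliasing wobble apart from a genuinely different render.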
 ## Tests & CI
 - `test/plugins/test_harness.py` — unit tests for bounds detection, image
  comparison, and mode enumeration (run anywhere).
 - `test/plugins/test_plugin_matrix.py` — parametrized over discovered plugins ×
  sizes × screens; honors each plugin's `test/harness.json` and goldens. Skips
  when no plugins are present (e.g. a fresh core checkout); set
  `LEDMATRIX_REQUIRE_PLUGINS=1` in a pipeline where plugins must be present to
  turn an empty discovery into a hard failure instead. Point it at the monorepo
  with `LEDMATRIX_PLUGINS_DIR=/path/to/ledmatrix-plugins/plugins`.
 - `.github/workflows/test.yml` — runs the harness + visual tests on every PR.
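The skip-versus-fail behavior of the discovery guard can be summarized as a small decision function. This is a sketch of the rule described above, not the code in `test_plugin_matrix.py`; `discovery_outcome` is a hypothetical name and the string results stand in for pytest's run, skip, and fail.

```python
from typing import Mapping, Sequence


def discovery_outcome(plugin_ids: Sequence[str], env: Mapping[str, str]) -> str:
    """Decide what an empty or non-empty plugin discovery should do.

    Returns "run" when plugins were found, "fail" when none were found but
    LEDMATRIX_REQUIRE_PLUGINS=1 demands them, and "skip" otherwise.
    """
    if plugin_ids:
        return "run"
    if env.get("LEDMATRIX_REQUIRE_PLUGINS") == "1":
        return "fail"
    return "skip"
```

The point of the guard is that a misconfigured plugin path in CI shows up as a red build instead of a run that quietly tested nothing.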
 The plugin monorepo has its own `Plugin Safety` workflow that runs this harness
 against changed plugins on every PR.
 ## Developer workflow
 1. Change the plugin on a branch.
 2. `python scripts/check_plugin.py --plugin <id> --out-dir /tmp/preview` and
   eyeball the PNGs.
 3. Intentional visual change? `--update-golden`, review diffs, commit goldens.
 4. (Monorepo) bump `manifest.json` version and let the pre-commit hook sync
   `plugins.json`.
 5. Push — CI re-runs the harness across all sizes and gates the PR.
@@ -0,0 +1,8 @@
 # Test-only dependencies for the plugin safety harness and pytest suite.
 # Install alongside requirements.txt:  pip install -r requirements.txt -r requirements-test.txt
 # Upper bounds pin the major version so a new release can't silently change
 # golden-image / time-sensitive test behavior between CI runs.
 pytest>=7.4,<9
 pytest-cov>=4.1,<7
 jsonschema>=4.0,<5       # manifest validation
 freezegun>=1.2,<2        # deterministic time for golden-image tests
@@ -0,0 +1,217 @@
 #!/usr/bin/env python3
 """
 Plugin safety checker.
 Renders a plugin across every declared screen (mode) and a sample of matrix
 sizes (DEFAULT_TEST_SIZES, or whatever --sizes requests), and fails if any
 screen crashes, overflows the panel, or (for plugins with committed golden
 images) drifts visually.
 Usage:
    # Functional + bounds check across all sizes/modes:
    python scripts/check_plugin.py --plugin clock-simple
    # Every discovered plugin:
    python scripts/check_plugin.py --all
    # Dump PNGs for each size/mode so you can eyeball them:
    python scripts/check_plugin.py --plugin ledmatrix-weather --out-dir /tmp/preview
    # Refresh committed golden images after an intentional visual change:
    python scripts/check_plugin.py --plugin clock-simple --update-golden \
        --mock-data plugins/clock-simple/test/fixtures/mock.json
 Exit code is non-zero if any (plugin, size, mode) fails.
 """
 import argparse
 import json
 import os
 import sys
 from pathlib import Path
 from typing import Dict, List, Optional
 PROJECT_ROOT = Path(__file__).resolve().parent.parent
 sys.path.insert(0, str(PROJECT_ROOT))
 os.environ['EMULATOR'] = 'true'
 from src.logging_config import get_logger  # noqa: E402
 from src.plugin_system.testing.loading import (  # noqa: E402
    find_plugin_dir, load_config_defaults,
 )
 from src.plugin_system.testing.harness import (  # noqa: E402
    RenderResult, render_plugin_matrix, compare_to_goldens, write_goldens,
 )
 from src.plugin_system.testing.sizes import (  # noqa: E402
    DEFAULT_TEST_SIZES, parse_size_token, safe_mode_filename, size_label,
 )
 logger = get_logger("[Check Plugin]")
 DEFAULT_SEARCH_DIRS = [
    str(PROJECT_ROOT / 'plugins'),
    str(PROJECT_ROOT / 'plugin-repos'),
 ]
 def discover_plugins(search_dirs: List[str]) -> List[str]:
    """All plugin ids found across the search dirs (dirs containing manifest.json)."""
    found = []
    for d in search_dirs:
        base = Path(d)
        if not base.exists():
            continue
        for child in sorted(base.iterdir()):
            if (child / 'manifest.json').exists() and child.name not in found:
                found.append(child.name)
    return found
 def parse_sizes(spec: Optional[str]):
    if not spec:
        return DEFAULT_TEST_SIZES
    sizes = []
    for token in spec.split(','):
        if not token.strip():
            continue
        try:
            sizes.append(parse_size_token(token))
        except ValueError as exc:
            raise SystemExit(str(exc)) from exc
    return sizes
 def check_one(plugin_id: str, search_dirs: List[str], sizes, mock_data: Dict,
              config: Dict, run_update: bool, out_dir: Optional[Path],
              update_golden: bool, golden_dir_override: Optional[Path],
              freeze_time: Optional[str]) -> List[RenderResult]:
    plugin_dir = find_plugin_dir(plugin_id, search_dirs)
    if not plugin_dir:
        logger.error("Plugin '%s' not found in: %s", plugin_id, search_dirs)
        return [RenderResult(plugin_id, 0, 0, "<not-found>", error="plugin directory not found")]
    # Start from config_schema defaults so plugins behave like a real install.
    full_config = {"enabled": True}
    full_config.update(load_config_defaults(plugin_dir))
    full_config.update(config)
    results = render_plugin_matrix(
        plugin_id=plugin_id, plugin_dir=plugin_dir, config=full_config,
        mock_data=mock_data, sizes=sizes, run_update=run_update,
        freeze_time=freeze_time,
    )
    golden_dir = golden_dir_override or (plugin_dir / 'test' / 'golden')
    if update_golden:
        written = write_goldens(results, golden_dir)
        logger.info("Wrote %d golden image(s) for %s to %s", written, plugin_id, golden_dir)
    else:
        compare_to_goldens(results, golden_dir)
    if out_dir:
        for r in results:
            if r.image is None:
                continue
            dest = out_dir / plugin_id / size_label(r.width, r.height)
            dest.mkdir(parents=True, exist_ok=True)
            r.image.save(dest / f"{safe_mode_filename(r.mode)}.png", format="PNG")
    return results
 def print_report(all_results: Dict[str, List[RenderResult]]) -> bool:
    """Print a per-plugin grid. Returns True if everything passed."""
    everything_ok = True
    for plugin_id, results in all_results.items():
        print(f"\n=== {plugin_id} ===")
        for r in results:
            if r.ok:
                status = "PASS"
                detail = ""
                if r.golden_checked:
                    detail = " (golden ✓)"
                if r.update_error is not None:
                    detail += f" (update warn: {r.update_error})"
            else:
                everything_ok = False
                if r.error is not None:
                    status, detail = "FAIL", f" error={r.error}"
                elif r.overflow is not None:
                    status, detail = "FAIL", f" overflow bbox={r.overflow}"
                elif r.golden_ok is False:
                    status = "FAIL"
                    detail = f" golden drift: {r.golden_diff_pixels}px (max Δ={r.golden_max_delta})"
                else:
                    status, detail = "FAIL", ""
            print(f"  [{status}] {r.size_label:>7}  {r.mode}{detail}")
    print()
    return everything_ok
 def main() -> int:
    parser = argparse.ArgumentParser(description="Check a plugin renders safely across sizes & screens")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--plugin', '-p', help='Plugin id to check')
    group.add_argument('--all', action='store_true', help='Check every discovered plugin')
    parser.add_argument('--plugin-dir', '-d', default=None, help='Directory to search for plugins')
    parser.add_argument('--sizes', default=None, help='Comma-separated WxH list (default: the DEFAULT_TEST_SIZES sample)')
    parser.add_argument('--config', '-c', default='{}', help='Plugin config overrides as JSON')
    parser.add_argument('--mock-data', '-m', default=None, help='Path to JSON file with mock cache data')
    parser.add_argument('--out-dir', '-o', default=None, help='Also dump rendered PNGs here')
    parser.add_argument('--skip-update', action='store_true', help='Skip calling update()')
    parser.add_argument('--update-golden', action='store_true', help='Write/refresh golden images')
    parser.add_argument('--golden-dir', default=None, help='Override golden dir (default: <plugin>/test/golden)')
    parser.add_argument('--freeze-time', default=None,
                        help='Freeze wall clock, e.g. "2025-08-01 15:25:00" (for time-dependent plugins)')
    args = parser.parse_args()
    search_dirs = [args.plugin_dir] if args.plugin_dir else DEFAULT_SEARCH_DIRS
    sizes = parse_sizes(args.sizes)
    try:
        config = json.loads(args.config)
    except json.JSONDecodeError as e:
        logger.error("Invalid --config JSON: %s", e)
        return 2
    if not isinstance(config, dict):
        logger.error("--config must be a JSON object, got %s", type(config).__name__)
        return 2
    mock_data = {}
    if args.mock_data:
        mock_path = Path(args.mock_data)
        if not mock_path.exists():
            logger.error("Mock data file not found: %s", args.mock_data)
            return 2
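        # Report malformed JSON the same way as a malformed --config value.
        try:
            with open(mock_path) as f:
                mock_data = json.load(f)
        except json.JSONDecodeError as e:
            logger.error("Invalid --mock-data JSON: %s", e)
            return 2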
        if not isinstance(mock_data, dict):
            logger.error("--mock-data must be a JSON object (key -> cache value), got %s",
                         type(mock_data).__name__)
            return 2
    plugin_ids = discover_plugins(search_dirs) if args.all else [args.plugin]
    if not plugin_ids:
        logger.error("No plugins found in: %s", search_dirs)
        return 2
    out_dir = Path(args.out_dir) if args.out_dir else None
    golden_dir_override = Path(args.golden_dir) if args.golden_dir else None
    all_results: Dict[str, List[RenderResult]] = {}
    for plugin_id in plugin_ids:
        all_results[plugin_id] = check_one(
            plugin_id=plugin_id, search_dirs=search_dirs, sizes=sizes,
            mock_data=mock_data, config=config, run_update=not args.skip_update,
            out_dir=out_dir, update_golden=args.update_golden,
            golden_dir_override=golden_dir_override, freeze_time=args.freeze_time,
        )
    # When refreshing goldens we skip drift comparison, but a crash or overflow
    # still means the plugin is broken — never let --update-golden mask that.
    ok = print_report(all_results)
    return 0 if ok else 1
 if __name__ == '__main__':
    sys.exit(main())
@@ -17,7 +17,6 @@ import os
 import json
 import argparse
 from pathlib import Path
 from typing import Any, Dict, Optional, Sequence, Union
 # Add project root to path
 PROJECT_ROOT = Path(__file__).resolve().parent.parent
@@ -28,49 +27,15 @@ os.environ['EMULATOR'] = 'true'
 # Import logger after path setup so src.logging_config is importable
 from src.logging_config import get_logger  # noqa: E402
 from src.plugin_system.testing.loading import (  # noqa: E402
    find_plugin_dir, load_manifest, load_config_defaults,
 )
 logger = get_logger("[Render Plugin]")
 MIN_DIMENSION = 1
 MAX_DIMENSION = 512
 def find_plugin_dir(plugin_id: str, search_dirs: Sequence[Union[str, Path]]) -> Optional[Path]:
    """Find a plugin directory by searching multiple paths."""
    from src.plugin_system.plugin_loader import PluginLoader
    loader = PluginLoader()
    for search_dir in search_dirs:
        search_path = Path(search_dir)
        if not search_path.exists():
            continue
        result = loader.find_plugin_directory(plugin_id, search_path)
        if result:
            return Path(result)
    return None
 def load_manifest(plugin_dir: Path) -> Dict[str, Any]:
    """Load and return manifest.json from plugin directory."""
    manifest_path = plugin_dir / 'manifest.json'
    if not manifest_path.exists():
        raise FileNotFoundError(f"No manifest.json in {plugin_dir}")
    with open(manifest_path, 'r') as f:
        return json.load(f)
 def load_config_defaults(plugin_dir: Path) -> Dict[str, Any]:
    """Extract default values from config_schema.json."""
    schema_path = plugin_dir / 'config_schema.json'
    if not schema_path.exists():
        return {}
    with open(schema_path, 'r') as f:
        schema = json.load(f)
    defaults: Dict[str, Any] = {}
    for key, prop in schema.get('properties', {}).items():
        if 'default' in prop:
            defaults[key] = prop['default']
    return defaults
 def main() -> int:
    """Load a plugin, call update() + display(), and save the result as a PNG image."""
    parser = argparse.ArgumentParser(description='Render a plugin display to a PNG image')
@@ -7,13 +7,22 @@ Provides base classes and utilities for testing LEDMatrix plugins.
 from .plugin_test_base import PluginTestCase
 from .mocks import MockDisplayManager, MockCacheManager, MockConfigManager, MockPluginManager
 from .visual_display_manager import VisualTestDisplayManager
 from .bounds_display_manager import BoundsCheckingDisplayManager
 from .sizes import (
    DEFAULT_TEST_SIZES, SUPPORTED_SIZES, resolve_test_sizes, size_label,
 )
 __all__ = [
    'PluginTestCase',
    'VisualTestDisplayManager',
    'BoundsCheckingDisplayManager',
    'MockDisplayManager',
    'MockCacheManager',
    'MockConfigManager',
    'MockPluginManager',
    'DEFAULT_TEST_SIZES',
    'SUPPORTED_SIZES',
    'resolve_test_sizes',
    'size_label',
 ]
@@ -0,0 +1,129 @@
 """
 Bounds-checking display manager.
 A VisualTestDisplayManager that draws onto an oversized canvas (the declared
 panel size plus a right/bottom margin) while still reporting the declared size
 to the plugin. Content that a plugin draws past the right or bottom edge lands
 in the margin instead of being silently clipped by PIL, so the harness can
 detect overflow — the classic symptom of hardcoded coordinates or fonts/icons
 that don't scale down to a smaller panel.
 Limitations (documented on purpose):
 - Overflow past the LEFT or TOP edge (negative coordinates) is still clipped by
  PIL and not detected here. The dominant real-world breakage is content that is
  too wide/tall for a smaller panel, which this catches.
 - BDF text is clipped to the declared bounds by the parent's bitmap drawer, so
  BDF overflow is not flagged. Golden-image regression covers those plugins.
 - If a plugin replaces the canvas with its own image (display_manager.image = ...),
  the margin can't be measured and overflow is reported as undetermined (None).
 """
 from typing import Optional, Tuple
 from .sizes import DEFAULT_TEST_SIZES
 from .visual_display_manager import VisualTestDisplayManager, _MatrixProxy
 # Smallest extra band kept on the right/bottom so a few pixels of overflow are
 # still visible even on the largest panel in a run.
 _BASE_MARGIN = 16
 # Fallback overflow reference when a caller doesn't pass one: the largest shape
 # in the default sample. We extend every (smaller) canvas out to at least this
 # size so content drawn at a coordinate meant for a bigger build — e.g. x=200 on
 # a 64-wide panel — lands in the padded region and is flagged, instead of being
 # clipped off-canvas and read as a false pass.
 _DEFAULT_EXTENT_WIDTH = max(w for w, _ in DEFAULT_TEST_SIZES)
 _DEFAULT_EXTENT_HEIGHT = max(h for _, h in DEFAULT_TEST_SIZES)
 class BoundsCheckingDisplayManager(VisualTestDisplayManager):
    """Detects drawing that overflows the declared panel size."""
    # Kept for backwards compatibility; real padding is computed per-axis below.
    MARGIN = _BASE_MARGIN
    def __init__(self, width: int = 128, height: int = 32,
                 overflow_extent: Optional[Tuple[int, int]] = None):
        self._declared_width = int(width)
        self._declared_height = int(height)
        # Pad the canvas out to at least `overflow_extent` (the largest panel
        # this run cares about) plus a base margin, so coordinates meant for a
        # bigger build are caught — not clipped — when rendering a smaller panel.
        # Defaults to the largest shape in the sample when no run is known.
        ext_w, ext_h = overflow_extent or (_DEFAULT_EXTENT_WIDTH, _DEFAULT_EXTENT_HEIGHT)
        self._canvas_width = max(self._declared_width, int(ext_w)) + _BASE_MARGIN
        self._canvas_height = max(self._declared_height, int(ext_h)) + _BASE_MARGIN
        # Parent builds the (oversized) backing canvas + fonts.
        super().__init__(self._canvas_width, self._canvas_height)
        # Plugins must see the DECLARED size, not the padded canvas size.
        self.matrix = _MatrixProxy(self._declared_width, self._declared_height)
    # -- declared dimensions (override parent's image-derived properties) --
    @property
    def width(self) -> int:
        return self._declared_width
    @property
    def height(self) -> int:
        return self._declared_height
    @property
    def display_width(self) -> int:
        return self._declared_width
    @property
    def display_height(self) -> int:
        return self._declared_height
    # -- overflow detection --
    def _canvas_is_padded(self) -> bool:
        return self.image.size == (self._canvas_width, self._canvas_height)
    def check_overflow(self) -> Optional[Tuple[int, int, int, int]]:
        """Bounding box (in full-canvas coords) of any drawing beyond the
        declared panel, or None if nothing overflowed / undetermined."""
        if not self._canvas_is_padded():
            return None
        exp_w = self._canvas_width
        exp_h = self._canvas_height
        boxes = []
        right = self.image.crop((self._declared_width, 0, exp_w, exp_h)).getbbox()
        if right:
            boxes.append((right[0] + self._declared_width, right[1],
                          right[2] + self._declared_width, right[3]))
        bottom = self.image.crop((0, self._declared_height, exp_w, exp_h)).getbbox()
        if bottom:
            boxes.append((bottom[0], bottom[1] + self._declared_height,
                          bottom[2], bottom[3] + self._declared_height))
        if not boxes:
            return None
        return (
            min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes),
        )
    # -- snapshot/image accessors return the cropped, true-panel image --
    def declared_image(self):
        """The visible panel: the canvas cropped to the declared size."""
        if self._canvas_is_padded():
            return self.image.crop((0, 0, self._declared_width, self._declared_height))
        return self.image
    def save_snapshot(self, path: str) -> None:
        self.declared_image().save(path, format='PNG')
    def get_image(self):
        return self.declared_image()
    def get_image_base64(self) -> str:
        import base64
        import io
        buffer = io.BytesIO()
        self.declared_image().save(buffer, format='PNG')
        return base64.b64encode(buffer.getvalue()).decode('utf-8')
@@ -0,0 +1,314 @@
 """
 Plugin safety harness.
 Renders a plugin across every declared screen (mode) and every matrix size in
 the run (DEFAULT_TEST_SIZES unless overridden), capturing crashes and overflow.
 Used by scripts/check_plugin.py and the pytest matrix test to guarantee a plugin
 change doesn't break a screen at a size the author didn't try.
 The render flow mirrors scripts/render_plugin.py (same PluginLoader call), but
 this module adds: multi-size iteration, per-mode rendering, overflow detection
 via BoundsCheckingDisplayManager, and golden-image comparison.
 """
 import contextlib
 import http.client
 import inspect
 import socket
 import ssl
 import urllib.error
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple
 from PIL import Image, ImageChops
 from src.logging_config import get_logger
 from .bounds_display_manager import BoundsCheckingDisplayManager
 from .loading import load_config_defaults, load_manifest
 from .sizes import DEFAULT_TEST_SIZES, safe_mode_filename, size_label
 logger = get_logger("[Plugin Harness]")
 def _tolerated_update_errors() -> Tuple[type, ...]:
    """Exception types from update() we treat as a tolerated no-connectivity
    failure (expected in CI / headless dev) rather than a real plugin bug.
    Anything NOT in this set is a genuine regression — a plugin that lets a
    non-network exception escape update() should fail the harness, not pass
    green because display() happened to survive.
    """
    types: List[type] = [
        ConnectionError, TimeoutError,        # builtins
        socket.gaierror, socket.timeout,      # DNS / socket timeouts
        ssl.SSLError,
        urllib.error.URLError,
        http.client.HTTPException,
    ]
    try:  # requests is optional; cover its whole error tree when present
        import requests
        types.append(requests.exceptions.RequestException)
    except ImportError:  # pragma: no cover - requests not installed
        logger.debug("requests not installed; its connectivity errors won't be specifically tolerated")
    return tuple(types)
 _TOLERATED_UPDATE_ERRORS = _tolerated_update_errors()
 @dataclass
 class RenderResult:
    """Outcome of rendering one (size, mode) of a plugin."""
    plugin_id: str
    width: int
    height: int
    mode: str
    image: Optional[Image.Image] = None
    error: Optional[str] = None          # fatal: load/display crash, or a non-network update() error
    update_error: Optional[str] = None   # tolerated: connectivity error from update() (no network in CI)
    overflow: Optional[Tuple[int, int, int, int]] = None  # bbox past the panel
    # golden comparison (populated only when a golden was provided)
    golden_checked: bool = False
    golden_ok: Optional[bool] = None
    golden_diff_pixels: int = 0
    golden_max_delta: int = 0
    @property
    def size_label(self) -> str:
        return size_label(self.width, self.height)
    @property
    def ok(self) -> bool:
        """Phase-1 pass: rendered without crashing and without overflow, and if a
        golden was checked it matched."""
        if self.error is not None or self.overflow is not None:
            return False
        if self.golden_checked and self.golden_ok is False:
            return False
        return True
 def list_modes(plugin_instance: Any, manifest: Dict[str, Any], plugin_id: str) -> List[str]:
    """Enumerate a plugin's screens: instance.modes wins, then manifest
    display_modes, then the plugin id as a single mode."""
    modes = getattr(plugin_instance, "modes", None)
    if modes:
        return [str(m) for m in modes]
    declared = manifest.get("display_modes")
    if declared:
        return [str(m) for m in declared]
    return [plugin_id]
 def _instantiate(plugin_id: str, manifest: Dict[str, Any], plugin_dir: Path,
                 config: Dict[str, Any], mock_data: Dict[str, Any],
                 display_manager: Any) -> Any:
    """Load and construct a plugin instance with mocked managers."""
    from src.plugin_system.plugin_loader import PluginLoader
    from src.plugin_system.testing import MockCacheManager, MockPluginManager
    cache_manager = MockCacheManager()
    for key, value in (mock_data or {}).items():
        cache_manager.set(key, value)
    loader = PluginLoader()
    plugin_instance, _module = loader.load_plugin(
        plugin_id=plugin_id,
        manifest=manifest,
        plugin_dir=plugin_dir,
        config=config,
        display_manager=display_manager,
        cache_manager=cache_manager,
        plugin_manager=MockPluginManager(),
        install_deps=False,
    )
    return plugin_instance
 def _render_mode(plugin_instance: Any, mode: str) -> None:
    """Render a specific screen. Prefer an explicit display_mode kwarg; otherwise
    drive the plugin's internal mode state machine (first display() call renders
    modes[current_mode_index] when current_display_mode is None)."""
    sig = inspect.signature(plugin_instance.display)
    if "display_mode" in sig.parameters:
        plugin_instance.display(force_clear=True, display_mode=mode)
        return
    modes = getattr(plugin_instance, "modes", None)
    if modes and mode in modes:
        plugin_instance.current_mode_index = list(modes).index(mode)
    if hasattr(plugin_instance, "current_display_mode"):
        plugin_instance.current_display_mode = None
    plugin_instance.display(force_clear=False)
 def _freeze(freeze_time: Optional[str]):
    """Context manager that freezes wall-clock time when freeze_time is given,
    so time-dependent plugins (clocks, countdowns) render deterministic goldens."""
    if not freeze_time:
        return contextlib.nullcontext()
    try:
        from freezegun import freeze_time as _ft
    except ImportError as e:  # pragma: no cover - only hit without the dep
        raise RuntimeError(
            "freeze_time requires the 'freezegun' package (pip install freezegun)"
        ) from e
    return _ft(freeze_time)
 def render_plugin_matrix(
    plugin_id: str,
    plugin_dir: Path,
    config: Optional[Dict[str, Any]] = None,
    mock_data: Optional[Dict[str, Any]] = None,
    sizes: Optional[List[Tuple[int, int]]] = None,
    run_update: bool = True,
    freeze_time: Optional[str] = None,
 ) -> List[RenderResult]:
    """Render every (size, mode) combination for a plugin.
    Returns a flat list of RenderResult. A fresh plugin instance is built per
    (size, mode) so state never leaks between screens. Pass freeze_time (e.g.
    "2025-08-01 15:25:00") to make time-dependent plugins reproducible.
    """
    plugin_dir = Path(plugin_dir)
    manifest = load_manifest(plugin_dir)
    # Start from config_schema.json defaults so the plugin behaves like a real
    # install; explicit caller config still wins over a schema default.
    config = {"enabled": True, **load_config_defaults(plugin_dir), **(config or {})}
    sizes = sizes or DEFAULT_TEST_SIZES
    results: List[RenderResult] = []
    # The largest panel in this run. Every (smaller) canvas is padded out to it
    # so a coordinate meant for the biggest configuration is still caught when
    # rendering a smaller one, instead of being clipped into a false pass.
    extent = (max(w for w, _ in sizes), max(h for _, h in sizes))
    with _freeze(freeze_time):
        for width, height in sizes:
            results.extend(_render_size(
                plugin_id, manifest, plugin_dir, config, mock_data or {},
                width, height, run_update, extent,
            ))
    return results
 def _render_size(plugin_id, manifest, plugin_dir, config, mock_data,
                 width, height, run_update, extent) -> List[RenderResult]:
    """Render every mode at one size. A fresh instance per mode avoids state leaks."""
    results: List[RenderResult] = []
    # Discover modes once per size (instance build can depend on config).
    try:
        probe_dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
        probe = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, probe_dm)
        modes = list_modes(probe, manifest, plugin_id)
    except Exception as e:  # noqa: BLE001 — surface any load failure as a result
        return [RenderResult(plugin_id, width, height, "<load>", error=repr(e))]
    for mode in modes:
        result = RenderResult(plugin_id, width, height, mode)
        dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
        try:
            inst = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, dm)
            if run_update:
                try:
                    inst.update()
                except _TOLERATED_UPDATE_ERRORS as e:
                    # Expected when CI / headless dev has no network: record it
                    # (surfaced in the report) but don't fail the run.
                    result.update_error = repr(e)
                    logger.debug("update() connectivity error for %s [%s]: %s", plugin_id, mode, e)
                except Exception as e:  # noqa: BLE001 — a non-network update() failure is a real bug
                    # A regression in update() must not pass green just because
                    # display() survives, so treat it as a failure of this render.
                    result.error = repr(e)
                    logger.warning("update() raised a non-connectivity error for %s [%s]: %s",
                                   plugin_id, mode, e)
            if result.error is None:
                _render_mode(inst, mode)
                result.image = dm.get_image()
                result.overflow = dm.check_overflow()
        except Exception as e:  # noqa: BLE001 — a display crash is a real failure
            result.error = repr(e)
        results.append(result)
    return results
 # ---------------------------------------------------------------------------
 # Golden-image comparison
 # ---------------------------------------------------------------------------
 def compare_images(rendered: Image.Image, golden: Image.Image,
                   max_delta: int = 0, max_diff_pixels: int = 0) -> Tuple[bool, int, int]:
    """Compare two images. Returns (ok, diff_pixel_count, max_per_channel_delta).
    Tolerances default to exact match; bump them only to absorb known platform
    anti-aliasing noise (requires a pinned Pillow + bundled fonts for stability).
    """
    if rendered.size != golden.size:
        return False, rendered.size[0] * rendered.size[1], 255
    a = rendered.convert("RGB")
    b = golden.convert("RGB")
    diff = ImageChops.difference(a, b)
    bbox = diff.getbbox()
    if bbox is None:
        return True, 0, 0
    # Count pixels whose largest per-channel delta exceeds the allowed tolerance,
    # and track the worst delta seen (for reporting).
    diff_pixels = 0
    observed_max = 0
    for px in diff.crop(bbox).getdata():
        m = max(px) if isinstance(px, tuple) else px
        if m > observed_max:
            observed_max = m
        if m > max_delta:
            diff_pixels += 1
    # Pass when the number of out-of-tolerance pixels is within budget.
    ok = diff_pixels <= max_diff_pixels
    return ok, diff_pixels, observed_max
 def golden_path(golden_dir: Path, width: int, height: int, mode: str) -> Path:
    """Location of a golden image: <golden_dir>/<WxH>/<mode>.png.
    The mode is sanitized to a safe basename so a mode name with '/' or '..'
    can't read or write outside the golden directory.
    """
    return Path(golden_dir) / size_label(width, height) / f"{safe_mode_filename(mode)}.png"
 def compare_to_goldens(results: List[RenderResult], golden_dir: Path,
                       max_delta: int = 0, max_diff_pixels: int = 0) -> List[RenderResult]:
    """Compare rendered results against committed goldens, mutating each result's
    golden_* fields. Results with no golden file on disk are left unchecked."""
    for r in results:
        if r.image is None:
            continue
        gp = golden_path(golden_dir, r.width, r.height, r.mode)
        if not gp.exists():
            continue
        r.golden_checked = True
        with Image.open(gp) as g:
            ok, diff_pixels, observed_max = compare_images(
                r.image, g, max_delta=max_delta, max_diff_pixels=max_diff_pixels)
        r.golden_ok = ok
        r.golden_diff_pixels = diff_pixels
        r.golden_max_delta = observed_max
    return results
 def write_goldens(results: List[RenderResult], golden_dir: Path) -> int:
    """Write each successfully-rendered result to its golden path. Returns count."""
    written = 0
    for r in results:
        if r.image is None or r.error is not None:
            continue
        gp = golden_path(golden_dir, r.width, r.height, r.mode)
        gp.parent.mkdir(parents=True, exist_ok=True)
        r.image.save(gp, format="PNG")
        written += 1
    return written
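The tolerance rule in compare_images can be easy to misread, so here is a minimal stdlib-only sketch of it. It assumes pixels are plain RGB tuples instead of Pillow images, and `count_out_of_tolerance` is a hypothetical name used only for illustration, not part of the harness.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def count_out_of_tolerance(rendered: List[Pixel], golden: List[Pixel],
                           max_delta: int = 0) -> Tuple[int, int]:
    """Return (diff_pixels, observed_max) for two equally sized pixel lists.

    A pixel counts as different only when its worst per-channel delta is
    strictly greater than max_delta; observed_max tracks the worst delta
    seen anywhere, whether or not it was tolerated.
    """
    diff_pixels = 0
    observed_max = 0
    for a, b in zip(rendered, golden):
        worst = max(abs(x - y) for x, y in zip(a, b))
        observed_max = max(observed_max, worst)
        if worst > max_delta:
            diff_pixels += 1
    return diff_pixels, observed_max


golden = [(100, 100, 100)] * 4
rendered = [(100, 100, 100), (103, 100, 100), (100, 90, 100), (100, 100, 100)]

strict = count_out_of_tolerance(rendered, golden)                # exact match
lenient = count_out_of_tolerance(rendered, golden, max_delta=5)  # absorbs delta 3

# The final verdict is a budget check on the count, as in compare_images.
max_diff_pixels = 0
strict_ok = strict[0] <= max_diff_pixels
lenient_ok = lenient[0] <= max_diff_pixels
```

Note that raising max_delta alone does not make this pair pass: the delta-10 pixel still exceeds it, so max_diff_pixels would also have to be at least 1.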
@@ -0,0 +1,82 @@
 """
 Shared helpers for loading a plugin headlessly.
 Used by scripts/render_plugin.py, scripts/check_plugin.py, and the harness so
 plugin discovery / manifest / config-default logic lives in exactly one place.
 """
 import json
 from pathlib import Path
 from typing import Any, Dict, Optional, Sequence, Union
 def find_plugin_dir(plugin_id: str, search_dirs: Sequence[Union[str, Path]]) -> Optional[Path]:
    """Find a plugin directory by searching multiple paths."""
    from src.plugin_system.plugin_loader import PluginLoader
    loader = PluginLoader()
    for search_dir in search_dirs:
        search_path = Path(search_dir)
        if not search_path.exists():
            continue
        result = loader.find_plugin_directory(plugin_id, search_path)
        if result:
            return Path(result)
    return None
 def load_manifest(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
    """Load and return manifest.json from a plugin directory."""
    manifest_path = Path(plugin_dir) / 'manifest.json'
    if not manifest_path.exists():
        raise FileNotFoundError(f"No manifest.json in {plugin_dir}")
    with open(manifest_path, 'r', encoding='utf-8') as f:
        return json.load(f)
 def load_config_defaults(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
    """Extract default values from a plugin's config_schema.json (empty if none)."""
    schema_path = Path(plugin_dir) / 'config_schema.json'
    if not schema_path.exists():
        return {}
    with open(schema_path, 'r', encoding='utf-8') as f:
        schema = json.load(f)
    defaults: Dict[str, Any] = {}
    for key, prop in schema.get('properties', {}).items():
        if isinstance(prop, dict) and 'default' in prop:
            defaults[key] = prop['default']
    return defaults
 def load_harness_spec(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
    """Optional per-plugin harness settings from <plugin>/test/harness.json.
    Lets a plugin opt into golden-image testing by declaring how to render it
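    deterministically. All keys optional:
        {
          "config":     {...},            # config overrides
          "mock_data":  "fixtures/mock.json",  # path (relative to plugin dir) to cache fixtures
          "freeze_time": "2025-08-01 15:25:00",
          "skip_update": false,
          "sizes":      [[8, 16], [64, 64]]    # panel sizes to render instead of the default sample
        }
    When "mock_data" is set, the fixture is loaded and added to the returned
    spec under "mock_data_contents".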
    Returns {} when no harness.json exists.
    """
    spec_path = Path(plugin_dir) / 'test' / 'harness.json'
    if not spec_path.exists():
        return {}
    with open(spec_path, 'r', encoding='utf-8') as f:
        spec = json.load(f)
    # Resolve mock_data path and inline its contents for convenience.
    mock_rel = spec.get('mock_data')
    if mock_rel:
        mock_path = Path(plugin_dir) / mock_rel
        if not mock_path.exists():
            # A declared-but-missing fixture is a harness config error: failing
            # loudly beats silently rendering the plugin with no mock data.
            raise FileNotFoundError(
                f"harness.json references mock_data '{mock_rel}' but "
                f"{mock_path} does not exist"
            )
        with open(mock_path, 'r', encoding='utf-8') as mf:
            spec['mock_data_contents'] = json.load(mf)
    return spec
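A minimal sketch of how the default extraction and the config merge order interact, assuming a made-up schema. `top_level_defaults` is a hypothetical stand-in that mirrors the loop in load_config_defaults without touching the filesystem; the schema keys are invented for illustration.

```python
from typing import Any, Dict


def top_level_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collect 'default' values from the top-level properties only."""
    return {
        key: prop["default"]
        for key, prop in schema.get("properties", {}).items()
        if isinstance(prop, dict) and "default" in prop
    }


# Hypothetical schema: one property with a default, one without, and one
# nested object whose inner default is NOT picked up by a top-level scan.
schema = {
    "properties": {
        "brightness": {"type": "integer", "default": 80},
        "title": {"type": "string"},
        "colors": {
            "type": "object",
            "properties": {"text": {"type": "string", "default": "#ffffff"}},
        },
    }
}

defaults = top_level_defaults(schema)

# Same precedence as render_plugin_matrix: enabled flag, then schema
# defaults, then explicit caller config, which wins on conflicts.
caller_config = {"brightness": 40}
merged = {"enabled": True, **defaults, **caller_config}
```

One consequence worth knowing: defaults declared inside nested object properties are not flattened in, so a plugin relying on them would need to supply those values through harness.json "config".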
@@ -63,11 +63,23 @@ class MockCacheManager:
    """Mock cache manager for testing."""
    def __init__(self):
        import shutil
        import tempfile
        import weakref
        self._cache: Dict[str, Any] = {}
        self._cache_timestamps: Dict[str, float] = {}
        self.get_calls = []
        self.set_calls = []
        self.delete_calls = []
        # Real temp dir for plugins that write/read files under cache_dir.
        # Registered for cleanup so each mock instance doesn't leak a tmp dir.
        self.cache_dir = tempfile.mkdtemp(prefix="ledmatrix-mock-cache-")
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self.cache_dir, ignore_errors=True)
    def cleanup(self) -> None:
        """Remove the temp cache directory created for this instance."""
        self._finalizer()
    def get(self, key: str, max_age: Optional[float] = None) -> Optional[Any]:
        """Get a value from cache."""
@@ -0,0 +1,120 @@
 """
 LED matrix sizes the plugin safety harness renders against.
 There is no fixed set of "supported" panel sizes — an RGB matrix build can be
 any width/height and configuration (square, rectangle, 2x2, 4x4, 8x2, long
 strips, tall stacks, ...). Plugins are expected to read width/height
 dynamically and lay themselves out accordingly, so the harness's job is to
 prove a plugin survives a *spread* of shapes, not a canonical list.
 `DEFAULT_TEST_SIZES` is therefore a representative SAMPLE chosen to span the
 axes of variation (narrow, wide, square, tall, small, long), not an
 exhaustive or authoritative list. Callers can override it entirely:
  - CLI:        scripts/check_plugin.py --sizes 8x16,64x64,256x32
  - pytest:     LEDMATRIX_TEST_SIZES="8x16,64x64" env var (all plugins), or
                per-plugin test/harness.json {"sizes": [[8, 16], [64, 64]]}
 so anyone can point the harness at the exact panel(s) their build uses.
 """
 import os
 from typing import Iterable, List, Optional, Sequence, Tuple, Union
 # A spread of real panel-grid arrangements (each module is 64x32), not a list of
 # "blessed" sizes. Each entry exercises a different layout assumption a plugin
 # might accidentally bake in. Annotations are the panel grid (cols x rows).
 DEFAULT_TEST_SIZES: List[Tuple[int, int]] = [
    (64, 32),    # 1x1 — single panel, the tightest common rectangle
    (128, 32),   # 2x1 — the baseline most plugins are tuned for
    (64, 64),    # 1x2 — stacked, exercises tall-narrow centering
    (128, 64),   # 2x2 — block, icon scaling / vertical centering
    (256, 32),   # 4x1 — long strip, wide horizontal layout
    (128, 96),   # 2x3 — tall, exercises vertical overflow
    (256, 128),  # 4x4 — large block, both dimensions big at once
 ]
 # Backwards-compatible alias. Prefer DEFAULT_TEST_SIZES in new code — the old
 # name implied these were the only valid panel sizes, which they are not.
 SUPPORTED_SIZES = DEFAULT_TEST_SIZES
 def size_label(width: int, height: int) -> str:
    """Human/path-friendly label for a size, e.g. '128x32'."""
    return f"{width}x{height}"
 def parse_size_token(token: str) -> Tuple[int, int]:
    """Parse a single 'WxH' token into an (int, int) pair.
    Raises ValueError (with a user-friendly message) on malformed input so
    callers can surface it however they like.
    """
    cleaned = token.strip().lower()
    if "x" not in cleaned:
        raise ValueError(f"Invalid size '{token}' (expected WxH, e.g. 128x32)")
    w, h = cleaned.split("x", 1)
    try:
        width, height = int(w), int(h)
    except ValueError as exc:
        raise ValueError(
            f"Invalid size '{token}' (expected numeric WxH, e.g. 128x32)"
        ) from exc
    if width <= 0 or height <= 0:
        raise ValueError(
            f"Invalid size '{token}' (width and height must be positive, e.g. 128x32)"
        )
    return (width, height)
 def coerce_sizes(
    value: Union[str, Iterable[Sequence[int]], None]
 ) -> Optional[List[Tuple[int, int]]]:
    """Normalize a size spec into a list of (w, h) tuples, or None if empty.
    Accepts a comma-separated 'WxH,WxH' string (CLI / env var) or an iterable
    of [w, h] / (w, h) pairs (harness.json). Returns None when value is falsy
    so callers can fall back to the default sample.
    """
    if not value:
        return None
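    if isinstance(value, str):
        # A string of only separators (e.g. ",") yields no tokens; return None,
        # as documented, rather than an empty list.
        parsed = [parse_size_token(tok) for tok in value.split(",") if tok.strip()]
        return parsed or None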
    sizes: List[Tuple[int, int]] = []
    for pair in value:
        w, h = pair  # raises if not a 2-element sequence
        width, height = int(w), int(h)
        if width <= 0 or height <= 0:
            raise ValueError(f"Invalid size pair {pair!r} (width and height must be positive)")
        sizes.append((width, height))
    return sizes or None
 def resolve_test_sizes(
    spec_sizes: Union[str, Iterable[Sequence[int]], None] = None,
 ) -> List[Tuple[int, int]]:
    """Decide which sizes to render, by precedence:
    1. LEDMATRIX_TEST_SIZES env var — a global "test on my hardware" override
       that wins for every plugin.
    2. spec_sizes — e.g. a per-plugin harness.json "sizes" list.
    3. DEFAULT_TEST_SIZES — the representative sample.
    """
    env = coerce_sizes(os.environ.get("LEDMATRIX_TEST_SIZES"))
    if env:
        return env
    spec = coerce_sizes(spec_sizes)
    if spec:
        return spec
    return list(DEFAULT_TEST_SIZES)
 def safe_mode_filename(mode: str) -> str:
    """A filesystem-safe basename for a plugin mode.
    Mode names come from plugin metadata/render state, so a value containing
    '/' or '..' could otherwise escape the intended output directory. Collapse
    anything that isn't alphanumeric / dash / underscore to '_'.
    """
    cleaned = "".join(ch if ch.isalnum() or ch in ("-", "_") else "_" for ch in mode)
    return cleaned or "mode"
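A short sketch of what the mode sanitizing rule does to a few inputs. `sanitize_mode` is a hypothetical copy of the same expression, repeated here only so the example runs on its own; the mode names are invented.

```python
def sanitize_mode(mode: str) -> str:
    """Replace every character that is not alphanumeric, '-' or '_' with '_'."""
    cleaned = "".join(ch if ch.isalnum() or ch in ("-", "_") else "_" for ch in mode)
    return cleaned or "mode"


examples = {m: sanitize_mode(m) for m in ("nfl_live", "live/scores", "../etc/passwd", "")}

# Path separators and dots are flattened, so the result is always a single
# path component and cannot climb out of the golden directory.
traversal = examples["../etc/passwd"]

# Distinct mode names can collapse to the same basename.
collides = sanitize_mode("a/b") == sanitize_mode("a_b")
```

Because the mapping is lossy, two modes that differ only in punctuation would share one golden file; a plugin with such names may want to rename one of them.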
@@ -0,0 +1,182 @@
 """
 Unit tests for the plugin safety harness primitives:
 bounds detection, image comparison, and mode enumeration.
 These don't load real plugins, so they run anywhere (including core CI where
 plugin-repos is empty).
 """
 import pytest
 from PIL import Image
 from src.plugin_system.testing.bounds_display_manager import BoundsCheckingDisplayManager
 from src.plugin_system.testing.harness import (
    _TOLERATED_UPDATE_ERRORS, compare_images, list_modes,
 )
 from src.plugin_system.testing.sizes import (
    DEFAULT_TEST_SIZES, coerce_sizes, parse_size_token, resolve_test_sizes,
 )
 class TestBoundsDetection:
    def test_reports_declared_size_not_canvas_size(self):
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        assert dm.width == 64 and dm.height == 32
        assert dm.matrix.width == 64 and dm.matrix.height == 32
        # Backing canvas is padded out past the declared panel so far-overshoot
        # coordinates land on-canvas and get flagged instead of clipped.
        canvas_w, canvas_h = dm.image.size
        assert canvas_w > 64 and canvas_h > 32
    def test_far_overshoot_on_small_panel_is_detected(self):
        # A coordinate meant for a wide build (x past 64) must still be caught
        # when the declared panel is only 64 wide.
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        dm.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
        bbox = dm.check_overflow()
        assert bbox is not None
        assert bbox[0] >= 64
    def test_in_bounds_drawing_has_no_overflow(self):
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        dm.draw.rectangle([0, 0, 63, 31], fill=(255, 255, 255))
        assert dm.check_overflow() is None
    def test_right_overflow_is_detected(self):
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        # Draw a few pixels past the right edge.
        dm.draw.rectangle([60, 5, 70, 10], fill=(255, 0, 0))
        bbox = dm.check_overflow()
        assert bbox is not None
        assert bbox[0] >= 64  # overflow starts at or past the declared width
    def test_bottom_overflow_is_detected(self):
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        dm.draw.rectangle([5, 30, 10, 40], fill=(0, 255, 0))
        bbox = dm.check_overflow()
        assert bbox is not None
        assert bbox[3] > 32  # overflow extends past the declared height
    def test_declared_image_is_cropped_to_panel(self):
        dm = BoundsCheckingDisplayManager(width=64, height=32)
        assert dm.get_image().size == (64, 32)
    def test_snapshot_saves_cropped_panel(self, tmp_path):
        dm = BoundsCheckingDisplayManager(width=128, height=32)
        out = tmp_path / "snap.png"
        dm.save_snapshot(str(out))
        with Image.open(out) as img:
            assert img.size == (128, 32)
 class TestArbitraryPanelSizes:
    """The harness must handle any panel shape, not a fixed supported list."""
    def test_overflow_extent_pads_to_largest_in_run(self):
        # A wide run (extent 256) means content at x=200 on a 64-wide panel is
        # caught; the same draw with a small extent would be clipped (false pass).
        wide = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(256, 32))
        wide.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
        assert wide.check_overflow() is not None
        tight = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(64, 32))
        tight.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
        assert tight.check_overflow() is None  # clipped beyond the small canvas
    def test_unusual_shapes_report_their_declared_size(self):
        for w, h in [(8, 2), (6, 6), (200, 8), (64, 96)]:
            dm = BoundsCheckingDisplayManager(width=w, height=h)
            assert dm.width == w and dm.height == h
            assert dm.matrix.width == w and dm.matrix.height == h
 class TestUpdateErrorClassification:
    """update() may fail for lack of network (tolerated) but a logic bug must
    not pass green just because display() survives."""
    def test_connectivity_errors_are_tolerated(self):
        import socket
        import urllib.error
        for exc in (ConnectionError("x"), TimeoutError("x"), socket.gaierror("x"),
                    urllib.error.URLError("x")):
            assert isinstance(exc, _TOLERATED_UPDATE_ERRORS)
    def test_logic_errors_are_not_tolerated(self):
        for exc in (ValueError("x"), KeyError("x"), AttributeError("x"), TypeError("x")):
            assert not isinstance(exc, _TOLERATED_UPDATE_ERRORS)
 class TestSizeParsing:
    def test_parse_size_token_ok(self):
        assert parse_size_token(" 128X32 ") == (128, 32)
    def test_parse_size_token_rejects_garbage(self):
        with pytest.raises(ValueError):
            parse_size_token("128xabc")
        with pytest.raises(ValueError):
            parse_size_token("128-32")
    def test_rejects_non_positive_dimensions(self):
        for bad in ("0x32", "-64x32", "64x0", "64x-1"):
            with pytest.raises(ValueError):
                parse_size_token(bad)
        with pytest.raises(ValueError):
            coerce_sizes([[0, 32]])
        with pytest.raises(ValueError):
            coerce_sizes("64x-1")
    def test_coerce_sizes_from_string_and_pairs(self):
        assert coerce_sizes("8x16,64x64") == [(8, 16), (64, 64)]
        assert coerce_sizes([[8, 16], (64, 64)]) == [(8, 16), (64, 64)]
        assert coerce_sizes(None) is None
        assert coerce_sizes("") is None
    def test_resolve_precedence_env_then_spec_then_default(self, monkeypatch):
        monkeypatch.delenv("LEDMATRIX_TEST_SIZES", raising=False)
        assert resolve_test_sizes(None) == list(DEFAULT_TEST_SIZES)
        assert resolve_test_sizes([[8, 16]]) == [(8, 16)]
        monkeypatch.setenv("LEDMATRIX_TEST_SIZES", "5x5")
        # env wins over a per-plugin spec
        assert resolve_test_sizes([[8, 16]]) == [(5, 5)]
 class TestCompareImages:
    def test_identical_images_match(self):
        a = Image.new("RGB", (16, 16), (10, 20, 30))
        b = a.copy()
        ok, diff_pixels, max_delta = compare_images(a, b)
        assert ok and diff_pixels == 0 and max_delta == 0
    def test_different_images_fail_at_zero_tolerance(self):
        a = Image.new("RGB", (16, 16), (0, 0, 0))
        b = a.copy()
        b.putpixel((1, 1), (255, 255, 255))
        ok, diff_pixels, max_delta = compare_images(a, b)
        assert not ok and diff_pixels == 1 and max_delta == 255
    def test_tolerance_absorbs_small_noise(self):
        a = Image.new("RGB", (16, 16), (100, 100, 100))
        b = a.copy()
        b.putpixel((2, 2), (103, 100, 100))  # delta 3
        ok, _, max_delta = compare_images(a, b, max_delta=5, max_diff_pixels=0)
        assert ok and max_delta == 3
    def test_size_mismatch_fails(self):
        a = Image.new("RGB", (16, 16))
        b = Image.new("RGB", (32, 16))
        ok, _, _ = compare_images(a, b)
        assert not ok
 class TestListModes:
    def test_instance_modes_take_precedence(self):
        inst = type("P", (), {"modes": ["a", "b"]})()
        assert list_modes(inst, {"display_modes": ["x"]}, "pid") == ["a", "b"]
    def test_falls_back_to_manifest_display_modes(self):
        inst = type("P", (), {})()
        assert list_modes(inst, {"display_modes": ["x", "y"]}, "pid") == ["x", "y"]
    def test_falls_back_to_plugin_id(self):
        inst = type("P", (), {})()
        assert list_modes(inst, {}, "pid") == ["pid"]
@@ -0,0 +1,115 @@
 """
 Cross-size / cross-screen plugin safety test.
 For every discovered plugin, render every declared screen at every configured
 test size and assert it: loads, renders without crashing, stays within the
 panel bounds, and — for plugins that ship golden images — matches them.
 Plugin discovery (first match wins):
  - $LEDMATRIX_PLUGINS_DIR  (os.pathsep-separated list of dirs), else
  - <project_root>/plugin-repos and <project_root>/plugins
 A plugin opts into golden-image checks by adding test/golden/<WxH>/<mode>.png
 (and usually test/harness.json for deterministic config / mock data / time).
 """
 import os
 from pathlib import Path
 from typing import Dict, List
 import pytest
 from src.plugin_system.testing.harness import (
    render_plugin_matrix, compare_to_goldens,
 )
 from src.plugin_system.testing.loading import load_config_defaults, load_harness_spec
 from src.plugin_system.testing.sizes import resolve_test_sizes
 PROJECT_ROOT = Path(__file__).resolve().parents[2]
 # Set LEDMATRIX_REQUIRE_PLUGINS=1 in any CI/hardware pipeline where plugins are
 # expected to be present, so a discovery drift (empty search path) fails loudly
 # instead of silently skipping and losing this safety signal.
 _REQUIRE_PLUGINS = os.environ.get("LEDMATRIX_REQUIRE_PLUGINS") == "1"
 def _plugin_search_dirs() -> List[Path]:
    env = os.environ.get("LEDMATRIX_PLUGINS_DIR")
    if env:
        return [Path(p) for p in env.split(os.pathsep) if p]
    return [PROJECT_ROOT / "plugin-repos", PROJECT_ROOT / "plugins"]
 def _discover() -> Dict[str, Path]:
    """Map plugin_id -> plugin_dir for all plugins on the search path."""
    found: Dict[str, Path] = {}
    for base in _plugin_search_dirs():
        if not base.exists():
            continue
        for child in sorted(base.iterdir()):
            if (child / "manifest.json").exists() and child.name not in found:
                found[child.name] = child
    return found
 _PLUGINS = _discover()
@pytest.mark.plugin
 def test_plugins_were_discovered() -> None:
    """Guard against silently skipping the whole matrix when discovery drifts.
    Local dev and the plugin-less core CI legitimately have no plugins, so we
    skip there; but when LEDMATRIX_REQUIRE_PLUGINS=1 an empty search path is a
    hard failure rather than a green no-op.
    """
    if _PLUGINS:
        return
    search = [str(p) for p in _plugin_search_dirs()]
    if _REQUIRE_PLUGINS:
        pytest.fail(
            "LEDMATRIX_REQUIRE_PLUGINS=1 but no plugins were discovered on the "
            f"search path: {search}"
        )
    pytest.skip(f"no plugins found on the search path: {search}")
@pytest.mark.plugin
@pytest.mark.skipif(not _PLUGINS, reason="no plugins found on the search path")
@pytest.mark.parametrize("plugin_id", sorted(_PLUGINS))
 def test_plugin_renders_across_sizes_and_screens(plugin_id: str) -> None:
    plugin_dir = _PLUGINS[plugin_id]
    spec = load_harness_spec(plugin_dir)
    config = {"enabled": True}
    config.update(load_config_defaults(plugin_dir))
    config.update(spec.get("config", {}))
    # Sizes: LEDMATRIX_TEST_SIZES env (test on real hardware) wins, then the
    # plugin's own harness.json "sizes", else the default representative sample.
    sizes = resolve_test_sizes(spec.get("sizes"))
    results = render_plugin_matrix(
        plugin_id=plugin_id,
        plugin_dir=plugin_dir,
        config=config,
        mock_data=spec.get("mock_data_contents", {}),
        sizes=sizes,
        run_update=not spec.get("skip_update", False),
        freeze_time=spec.get("freeze_time"),
    )
    compare_to_goldens(results, plugin_dir / "test" / "golden")
    failures = []
    for r in results:
        if r.error is not None:
            failures.append(f"{r.size_label} {r.mode}: crashed: {r.error}")
        elif r.overflow is not None:
            failures.append(f"{r.size_label} {r.mode}: overflow past panel bbox={r.overflow}")
        elif r.golden_checked and r.golden_ok is False:
            failures.append(
                f"{r.size_label} {r.mode}: golden drift {r.golden_diff_pixels}px "
                f"(max Δ={r.golden_max_delta})"
            )
    assert not failures, f"{plugin_id} failed:\n  " + "\n  ".join(failures)