Add cross-size/cross-screen plugin safety harness (#361)

* feat(testing): add cross-size/cross-screen plugin safety harness

Render every plugin across all supported matrix sizes (64x32, 128x32,
128x64, 256x32) and every declared screen, failing on crashes, content
drawn past the panel edge, or visual drift vs committed golden images.

- BoundsCheckingDisplayManager: oversized-canvas overflow detection
- harness.py: multi-size/multi-screen render engine + golden compare
- scripts/check_plugin.py: CLI (functional+bounds, --out-dir,
  --update-golden, --freeze-time); render_plugin.py refactored onto
  shared loading helpers
- test/plugins/test_harness.py + test_plugin_matrix.py (parametrized,
  honors per-plugin test/harness.json; skips when no plugins present)
- MockCacheManager.cache_dir so cache-dir-using plugins load headlessly
- .github/workflows/test.yml + docs/plugin-safety-harness.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): address PR review feedback on plugin safety harness

- check_plugin: friendly error for non-numeric --sizes; reject
  non-object --config / --mock-data JSON; sanitize plugin mode before
  using as a filename; stop --update-golden from masking crash/overflow
  failures
- bounds_display_manager: pad the canvas out to the largest supported
  panel (not a fixed 16px) so far-overshoot coordinates are caught, not
  clipped
- harness: merge config_schema defaults inside render_plugin_matrix;
  surface update() failures as a non-fatal warning + result field
  instead of a debug log; sanitize mode in golden_path
- loading: fail fast when harness.json references a missing mock_data
  fixture
- mocks: clean up the per-instance temp cache dir via weakref.finalize
- test_plugin_matrix: add a discovery guard that fails when
  LEDMATRIX_REQUIRE_PLUGINS=1 but none found (still skips locally);
  type hints
- bound test deps with upper version pins for deterministic CI

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(testing): render plugins across arbitrary panel sizes, not a fixed list

Addresses maintainer feedback that there is no canonical set of
supported panel sizes: a build can be any size/configuration (square,
2x2, 4x4, 8x2, long strips, tall stacks).

- sizes.py: SUPPORTED_SIZES -> DEFAULT_TEST_SIZES (back-compat alias
  kept), reframed as a representative SAMPLE of real panel-grid
  arrangements rather than an authoritative list; add parse_size_token /
  coerce_sizes / resolve_test_sizes helpers
- sizes are now fully overridable: LEDMATRIX_TEST_SIZES env (global,
  e.g. test on your exact hardware) > per-plugin harness.json "sizes" >
  default sample; CLI --sizes unchanged
- bounds_display_manager: pad the canvas to the largest panel IN THE
  CURRENT RUN (via overflow_extent) instead of a hardcoded max, so
  cross-size overflow detection scales to whatever sizes a run uses
- harness: compute per-run extent and thread it into the bounds manager
- tests: arbitrary-shape + size-parsing/precedence coverage
- docs: rewrite "Supported sizes" -> "Sizes: a sample, not a fixed list"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): fail the harness on non-connectivity update() errors

Addresses the remaining review thread: recording every update()
exception as a non-fatal warning still let a real update() regression
pass green as long as display() survived. Now update() failures are
classified: a tolerated set of connectivity errors
(ConnectionError/TimeoutError/socket/ssl/urllib/http/requests) is
recorded non-fatally (expected with no network in CI), while any other
exception is treated as a genuine bug and fails that render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(security): pin actions to SHAs and disable checkout credential persistence

Addresses the CodeRabbit/zizmor workflow-hardening finding: pin
actions/checkout and actions/setup-python to full commit SHAs and set
persist-credentials: false on checkout to reduce supply-chain and
token-exposure risk.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): validate positive sizes; narrow requests import except

Two review findings:

- sizes.py: parse_size_token / coerce_sizes now reject non-positive
  dimensions (0x32, -64x32) with a clear message instead of passing
  invalid sizes downstream (CodeRabbit).
- harness.py: the optional `requests` import now catches ImportError
  specifically and logs instead of `except Exception: pass`, clearing
  the Codacy medium "Try, Except, Pass" (harness.py L52) and Ruff
  S110/BLE001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-08-01 08:48:05 +00:00 · 2026-06-05 14:32:52 -04:00
parent 122e6d6863
commit 313e35a98f
13 changed files with 1360 additions and 38 deletions
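The update() classification described in the commit message can be sketched in isolation. This is a minimal illustration rather than the harness code: `classify_update`, `offline`, and `buggy` are hypothetical names, and the tolerated tuple is a reduced, stdlib-only stand-in for the set the harness builds.

```python
import socket
import ssl
import urllib.error
from typing import Callable, Optional, Tuple

# Assumption: a reduced stand-in for the harness's tolerated set (stdlib only).
TOLERATED: Tuple[type, ...] = (
    ConnectionError, TimeoutError, socket.gaierror, ssl.SSLError,
    urllib.error.URLError,
)


def classify_update(update: Callable[[], None]) -> Tuple[Optional[str], Optional[str]]:
    """Run update() and return (fatal_error, tolerated_error) as repr strings."""
    try:
        update()
    except TOLERATED as exc:
        return None, repr(exc)      # no network: recorded, but not fatal
    except Exception as exc:        # anything else is treated as a plugin bug
        return repr(exc), None
    return None, None


def offline() -> None:
    """Hypothetical update() that fails the way a CI box without network would."""
    raise ConnectionError("no route to host")


def buggy() -> None:
    """Hypothetical update() with a genuine logic error."""
    raise KeyError("missing_field")
```

Under this sketch `classify_update(offline)` fills only the tolerated slot, while `classify_update(buggy)` fills the fatal slot, which is the distinction that keeps a real update() regression from passing green.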
@@ -7,13 +7,22 @@ Provides base classes and utilities for testing LEDMatrix plugins.
 from .plugin_test_base import PluginTestCase
 from .mocks import MockDisplayManager, MockCacheManager, MockConfigManager, MockPluginManager
 from .visual_display_manager import VisualTestDisplayManager
+from .bounds_display_manager import BoundsCheckingDisplayManager
+from .sizes import (
+    DEFAULT_TEST_SIZES, SUPPORTED_SIZES, resolve_test_sizes, size_label,
+)

 __all__ = [
    'PluginTestCase',
    'VisualTestDisplayManager',
+    'BoundsCheckingDisplayManager',
    'MockDisplayManager',
    'MockCacheManager',
    'MockConfigManager',
    'MockPluginManager',
+    'DEFAULT_TEST_SIZES',
+    'SUPPORTED_SIZES',
+    'resolve_test_sizes',
+    'size_label',
 ]

@@ -0,0 +1,129 @@
+"""
+Bounds-checking display manager.
+
+A VisualTestDisplayManager that draws onto an oversized canvas (the declared
+panel size plus a right/bottom margin) while still reporting the declared size
+to the plugin. Content that a plugin draws past the right or bottom edge lands
+in the margin instead of being silently clipped by PIL, so the harness can
+detect overflow — the classic symptom of hardcoded coordinates or fonts/icons
+that don't scale down to a smaller panel.
+
+Limitations (documented on purpose):
+- Overflow past the LEFT or TOP edge (negative coordinates) is still clipped by
+  PIL and not detected here. The dominant real-world breakage is content that is
+  too wide/tall for a smaller panel, which this catches.
+- BDF text is clipped to the declared bounds by the parent's bitmap drawer, so
+  BDF overflow is not flagged. Golden-image regression covers those plugins.
+- If a plugin replaces the canvas with its own image (display_manager.image = ...),
+  the margin can't be measured and overflow is reported as undetermined (None).
+"""
+
+from typing import Optional, Tuple
+
+from .sizes import DEFAULT_TEST_SIZES
+from .visual_display_manager import VisualTestDisplayManager, _MatrixProxy
+
+# Smallest extra band kept on the right/bottom so a few pixels of overflow are
+# still visible even on the largest panel in a run.
+_BASE_MARGIN = 16
+# Fallback overflow reference when a caller doesn't pass one: the largest shape
+# in the default sample. We extend every (smaller) canvas out to at least this
+# size so content drawn at a coordinate meant for a bigger build — e.g. x=200 on
+# a 64-wide panel — lands in the padded region and is flagged, instead of being
+# clipped off-canvas and read as a false pass.
+_DEFAULT_EXTENT_WIDTH = max(w for w, _ in DEFAULT_TEST_SIZES)
+_DEFAULT_EXTENT_HEIGHT = max(h for _, h in DEFAULT_TEST_SIZES)
+
+
+class BoundsCheckingDisplayManager(VisualTestDisplayManager):
+    """Detects drawing that overflows the declared panel size."""
+
+    # Kept for backwards compatibility; real padding is computed per-axis below.
+    MARGIN = _BASE_MARGIN
+
+    def __init__(self, width: int = 128, height: int = 32,
+                 overflow_extent: Optional[Tuple[int, int]] = None):
+        self._declared_width = int(width)
+        self._declared_height = int(height)
+        # Pad the canvas out to at least `overflow_extent` (the largest panel
+        # this run cares about) plus a base margin, so coordinates meant for a
+        # bigger build are caught — not clipped — when rendering a smaller panel.
+        # Defaults to the largest shape in the sample when no run is known.
+        ext_w, ext_h = overflow_extent or (_DEFAULT_EXTENT_WIDTH, _DEFAULT_EXTENT_HEIGHT)
+        self._canvas_width = max(self._declared_width, int(ext_w)) + _BASE_MARGIN
+        self._canvas_height = max(self._declared_height, int(ext_h)) + _BASE_MARGIN
+        # Parent builds the (oversized) backing canvas + fonts.
+        super().__init__(self._canvas_width, self._canvas_height)
+        # Plugins must see the DECLARED size, not the padded canvas size.
+        self.matrix = _MatrixProxy(self._declared_width, self._declared_height)
+
+    # -- declared dimensions (override parent's image-derived properties) --
+
+    @property
+    def width(self) -> int:
+        return self._declared_width
+
+    @property
+    def height(self) -> int:
+        return self._declared_height
+
+    @property
+    def display_width(self) -> int:
+        return self._declared_width
+
+    @property
+    def display_height(self) -> int:
+        return self._declared_height
+
+    # -- overflow detection --
+
+    def _canvas_is_padded(self) -> bool:
+        return self.image.size == (self._canvas_width, self._canvas_height)
+
+    def check_overflow(self) -> Optional[Tuple[int, int, int, int]]:
+        """Bounding box (in full-canvas coords) of any drawing beyond the
+        declared panel, or None if nothing overflowed / undetermined."""
+        if not self._canvas_is_padded():
+            return None
+
+        exp_w = self._canvas_width
+        exp_h = self._canvas_height
+        boxes = []
+
+        right = self.image.crop((self._declared_width, 0, exp_w, exp_h)).getbbox()
+        if right:
+            boxes.append((right[0] + self._declared_width, right[1],
+                          right[2] + self._declared_width, right[3]))
+
+        bottom = self.image.crop((0, self._declared_height, exp_w, exp_h)).getbbox()
+        if bottom:
+            boxes.append((bottom[0], bottom[1] + self._declared_height,
+                          bottom[2], bottom[3] + self._declared_height))
+
+        if not boxes:
+            return None
+        return (
+            min(b[0] for b in boxes), min(b[1] for b in boxes),
+            max(b[2] for b in boxes), max(b[3] for b in boxes),
+        )
+
+    # -- snapshot/image accessors return the cropped, true-panel image --
+
+    def declared_image(self):
+        """The visible panel: the canvas cropped to the declared size."""
+        if self._canvas_is_padded():
+            return self.image.crop((0, 0, self._declared_width, self._declared_height))
+        return self.image
+
+    def save_snapshot(self, path: str) -> None:
+        self.declared_image().save(path, format='PNG')
+
+    def get_image(self):
+        return self.declared_image()
+
+    def get_image_base64(self) -> str:
+        import base64
+        import io
+        buffer = io.BytesIO()
+        self.declared_image().save(buffer, format='PNG')
+        return base64.b64encode(buffer.getvalue()).decode('utf-8')
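The padded-canvas idea in the file above can be tried without Pillow. The sketch below is an assumption-laden stand-in: it uses nested lists instead of a PIL image, and `padded_canvas` / `overflow_bbox` are hypothetical helpers, not part of the harness. It scans for lit pixels outside the declared panel in one pass, which should give the same box as taking the union of the right-strip and bottom-strip boxes.

```python
from typing import List, Optional, Tuple

BASE_MARGIN = 16  # same value as _BASE_MARGIN in the diff

Grid = List[List[int]]  # rows of pixels; 0 = unlit (stand-in for a PIL image)


def padded_canvas(declared: Tuple[int, int], extent: Tuple[int, int]) -> Grid:
    """Blank canvas padded out to the run's largest panel plus a margin."""
    width = max(declared[0], extent[0]) + BASE_MARGIN
    height = max(declared[1], extent[1]) + BASE_MARGIN
    return [[0] * width for _ in range(height)]


def overflow_bbox(canvas: Grid,
                  declared: Tuple[int, int]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom) of lit pixels outside the panel."""
    dw, dh = declared
    hits = [
        (x, y)
        for y, row in enumerate(canvas)
        for x, value in enumerate(row)
        if value and (x >= dw or y >= dh)
    ]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    # Exclusive right/bottom edges, matching the convention of PIL's getbbox().
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


canvas = padded_canvas(declared=(64, 32), extent=(256, 32))
canvas[10][20] = 1    # inside the 64x32 panel, so not an overflow
canvas[10][200] = 1   # x=200 only fits a 256-wide build
```

Because the canvas is padded to the widest size in the run, the pixel at x=200 survives and is reported instead of being clipped away on the 64-wide panel.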
@@ -0,0 +1,314 @@
+"""
+Plugin safety harness.
+
+Renders a plugin across every declared screen (mode) and every requested matrix
+size, capturing crashes and overflow. Used by scripts/check_plugin.py and the
+pytest matrix test to guarantee a plugin change doesn't break a screen at a size
+the author didn't try.
+
+The render flow mirrors scripts/render_plugin.py (same PluginLoader call), but
+this module adds: multi-size iteration, per-mode rendering, overflow detection
+via BoundsCheckingDisplayManager, and golden-image comparison.
+"""
+
+import contextlib
+import http.client
+import inspect
+import socket
+import ssl
+import urllib.error
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+from PIL import Image, ImageChops
+
+from src.logging_config import get_logger
+from .bounds_display_manager import BoundsCheckingDisplayManager
+from .loading import load_config_defaults, load_manifest
+from .sizes import DEFAULT_TEST_SIZES, safe_mode_filename, size_label
+
+logger = get_logger("[Plugin Harness]")
+
+
+def _tolerated_update_errors() -> Tuple[type, ...]:
+    """Exception types from update() we treat as a tolerated no-connectivity
+    failure (expected in CI / headless dev) rather than a real plugin bug.
+
+    Anything NOT in this set is a genuine regression — a plugin that lets a
+    non-network exception escape update() should fail the harness, not pass
+    green because display() happened to survive.
+    """
+    types: List[type] = [
+        ConnectionError, TimeoutError,        # builtins
+        socket.gaierror, socket.timeout,      # DNS / socket timeouts
+        ssl.SSLError,
+        urllib.error.URLError,
+        http.client.HTTPException,
+    ]
+    try:  # requests is optional; cover its whole error tree when present
+        import requests
+        types.append(requests.exceptions.RequestException)
+    except ImportError:  # pragma: no cover - requests not installed
+        logger.debug("requests not installed; its connectivity errors won't be specifically tolerated")
+    return tuple(types)
+
+
+_TOLERATED_UPDATE_ERRORS = _tolerated_update_errors()
+
+
+@dataclass
+class RenderResult:
+    """Outcome of rendering one (size, mode) of a plugin."""
+    plugin_id: str
+    width: int
+    height: int
+    mode: str
+    image: Optional[Image.Image] = None
+    error: Optional[str] = None          # fatal: load/display crash, or a non-network update() error
+    update_error: Optional[str] = None   # tolerated: connectivity error from update() (no network in CI)
+    overflow: Optional[Tuple[int, int, int, int]] = None  # bbox past the panel
+    # golden comparison (populated only when a golden was provided)
+    golden_checked: bool = False
+    golden_ok: Optional[bool] = None
+    golden_diff_pixels: int = 0
+    golden_max_delta: int = 0
+
+    @property
+    def size_label(self) -> str:
+        return size_label(self.width, self.height)
+
+    @property
+    def ok(self) -> bool:
+        """Phase-1 pass: rendered without crashing and without overflow, and if a
+        golden was checked it matched."""
+        if self.error is not None or self.overflow is not None:
+            return False
+        if self.golden_checked and self.golden_ok is False:
+            return False
+        return True
+
+
+def list_modes(plugin_instance: Any, manifest: Dict[str, Any], plugin_id: str) -> List[str]:
+    """Enumerate a plugin's screens: instance.modes wins, then manifest
+    display_modes, then the plugin id as a single mode."""
+    modes = getattr(plugin_instance, "modes", None)
+    if modes:
+        return [str(m) for m in modes]
+    declared = manifest.get("display_modes")
+    if declared:
+        return [str(m) for m in declared]
+    return [plugin_id]
+
+
+def _instantiate(plugin_id: str, manifest: Dict[str, Any], plugin_dir: Path,
+                 config: Dict[str, Any], mock_data: Dict[str, Any],
+                 display_manager: Any) -> Any:
+    """Load and construct a plugin instance with mocked managers."""
+    from src.plugin_system.plugin_loader import PluginLoader
+    from src.plugin_system.testing import MockCacheManager, MockPluginManager
+
+    cache_manager = MockCacheManager()
+    for key, value in (mock_data or {}).items():
+        cache_manager.set(key, value)
+
+    loader = PluginLoader()
+    plugin_instance, _module = loader.load_plugin(
+        plugin_id=plugin_id,
+        manifest=manifest,
+        plugin_dir=plugin_dir,
+        config=config,
+        display_manager=display_manager,
+        cache_manager=cache_manager,
+        plugin_manager=MockPluginManager(),
+        install_deps=False,
+    )
+    return plugin_instance
+
+
+def _render_mode(plugin_instance: Any, mode: str) -> None:
+    """Render a specific screen. Prefer an explicit display_mode kwarg; otherwise
+    drive the plugin's internal mode state machine (first display() call renders
+    modes[current_mode_index] when current_display_mode is None)."""
+    sig = inspect.signature(plugin_instance.display)
+    if "display_mode" in sig.parameters:
+        plugin_instance.display(force_clear=True, display_mode=mode)
+        return
+
+    modes = getattr(plugin_instance, "modes", None)
+    if modes and mode in modes:
+        plugin_instance.current_mode_index = list(modes).index(mode)
+    if hasattr(plugin_instance, "current_display_mode"):
+        plugin_instance.current_display_mode = None
+    plugin_instance.display(force_clear=False)
+
+
+def _freeze(freeze_time: Optional[str]):
+    """Context manager that freezes wall-clock time when freeze_time is given,
+    so time-dependent plugins (clocks, countdowns) render deterministic goldens."""
+    if not freeze_time:
+        return contextlib.nullcontext()
+    try:
+        from freezegun import freeze_time as _ft
+    except ImportError as e:  # pragma: no cover - only hit without the dep
+        raise RuntimeError(
+            "freeze_time requires the 'freezegun' package (pip install freezegun)"
+        ) from e
+    return _ft(freeze_time)
+
+
+def render_plugin_matrix(
+    plugin_id: str,
+    plugin_dir: Path,
+    config: Optional[Dict[str, Any]] = None,
+    mock_data: Optional[Dict[str, Any]] = None,
+    sizes: Optional[List[Tuple[int, int]]] = None,
+    run_update: bool = True,
+    freeze_time: Optional[str] = None,
+) -> List[RenderResult]:
+    """Render every (size, mode) combination for a plugin.
+
+    Returns a flat list of RenderResult. A fresh plugin instance is built per
+    (size, mode) so state never leaks between screens. Pass freeze_time (e.g.
+    "2025-08-01 15:25:00") to make time-dependent plugins reproducible.
+    """
+    plugin_dir = Path(plugin_dir)
+    manifest = load_manifest(plugin_dir)
+    # Start from config_schema.json defaults so the plugin behaves like a real
+    # install; explicit caller config still wins over a schema default.
+    config = {"enabled": True, **load_config_defaults(plugin_dir), **(config or {})}
+    sizes = sizes or DEFAULT_TEST_SIZES
+    results: List[RenderResult] = []
+
+    # The largest panel in this run. Every (smaller) canvas is padded out to it
+    # so a coordinate meant for the biggest configuration is still caught when
+    # rendering a smaller one, instead of being clipped into a false pass.
+    extent = (max(w for w, _ in sizes), max(h for _, h in sizes))
+
+    with _freeze(freeze_time):
+        for width, height in sizes:
+            results.extend(_render_size(
+                plugin_id, manifest, plugin_dir, config, mock_data or {},
+                width, height, run_update, extent,
+            ))
+
+    return results
+
+
+def _render_size(plugin_id, manifest, plugin_dir, config, mock_data,
+                 width, height, run_update, extent) -> List[RenderResult]:
+    """Render every mode at one size. A fresh instance per mode avoids state leaks."""
+    results: List[RenderResult] = []
+
+    # Discover modes once per size (instance build can depend on config).
+    try:
+        probe_dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
+        probe = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, probe_dm)
+        modes = list_modes(probe, manifest, plugin_id)
+    except Exception as e:  # noqa: BLE001 — surface any load failure as a result
+        return [RenderResult(plugin_id, width, height, "<load>", error=repr(e))]
+
+    for mode in modes:
+        result = RenderResult(plugin_id, width, height, mode)
+        dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
+        try:
+            inst = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, dm)
+            if run_update:
+                try:
+                    inst.update()
+                except _TOLERATED_UPDATE_ERRORS as e:
+                    # Expected when CI / headless dev has no network: record it
+                    # (surfaced in the report) but don't fail the run.
+                    result.update_error = repr(e)
+                    logger.debug("update() connectivity error for %s [%s]: %s", plugin_id, mode, e)
+                except Exception as e:  # noqa: BLE001 — a non-network update() failure is a real bug
+                    # A regression in update() must not pass green just because
+                    # display() survives, so treat it as a failure of this render.
+                    result.error = repr(e)
+                    logger.warning("update() raised a non-connectivity error for %s [%s]: %s",
+                                   plugin_id, mode, e)
+            if result.error is None:
+                _render_mode(inst, mode)
+                result.image = dm.get_image()
+                result.overflow = dm.check_overflow()
+        except Exception as e:  # noqa: BLE001 — a display crash is a real failure
+            result.error = repr(e)
+        results.append(result)
+
+    return results
+
+
+# ---------------------------------------------------------------------------
+# Golden-image comparison
+# ---------------------------------------------------------------------------
+
+def compare_images(rendered: Image.Image, golden: Image.Image,
+                   max_delta: int = 0, max_diff_pixels: int = 0) -> Tuple[bool, int, int]:
+    """Compare two images. Returns (ok, diff_pixel_count, max_per_channel_delta).
+
+    Tolerances default to exact match; bump them only to absorb known platform
+    anti-aliasing noise (requires a pinned Pillow + bundled fonts for stability).
+    """
+    if rendered.size != golden.size:
+        return False, rendered.size[0] * rendered.size[1], 255
+    a = rendered.convert("RGB")
+    b = golden.convert("RGB")
+    diff = ImageChops.difference(a, b)
+    bbox = diff.getbbox()
+    if bbox is None:
+        return True, 0, 0
+    # Count pixels whose largest per-channel delta exceeds the allowed tolerance,
+    # and track the worst delta seen (for reporting).
+    diff_pixels = 0
+    observed_max = 0
+    for px in diff.crop(bbox).getdata():
+        m = max(px) if isinstance(px, tuple) else px
+        if m > observed_max:
+            observed_max = m
+        if m > max_delta:
+            diff_pixels += 1
+    # Pass when the number of out-of-tolerance pixels is within budget.
+    ok = diff_pixels <= max_diff_pixels
+    return ok, diff_pixels, observed_max
+
+
+def golden_path(golden_dir: Path, width: int, height: int, mode: str) -> Path:
+    """Location of a golden image: <golden_dir>/<WxH>/<mode>.png.
+
+    The mode is sanitized to a safe basename so a mode name with '/' or '..'
+    can't read or write outside the golden directory.
+    """
+    return Path(golden_dir) / size_label(width, height) / f"{safe_mode_filename(mode)}.png"
+
+
+def compare_to_goldens(results: List[RenderResult], golden_dir: Path,
+                       max_delta: int = 0, max_diff_pixels: int = 0) -> List[RenderResult]:
+    """Compare rendered results against committed goldens, mutating each result's
+    golden_* fields. Results with no golden file on disk are left unchecked."""
+    for r in results:
+        if r.image is None:
+            continue
+        gp = golden_path(golden_dir, r.width, r.height, r.mode)
+        if not gp.exists():
+            continue
+        r.golden_checked = True
+        with Image.open(gp) as g:
+            ok, diff_pixels, observed_max = compare_images(
+                r.image, g, max_delta=max_delta, max_diff_pixels=max_diff_pixels)
+        r.golden_ok = ok
+        r.golden_diff_pixels = diff_pixels
+        r.golden_max_delta = observed_max
+    return results
+
+
+def write_goldens(results: List[RenderResult], golden_dir: Path) -> int:
+    """Write each successfully-rendered result to its golden path. Returns count."""
+    written = 0
+    for r in results:
+        if r.image is None or r.error is not None:
+            continue
+        gp = golden_path(golden_dir, r.width, r.height, r.mode)
+        gp.parent.mkdir(parents=True, exist_ok=True)
+        r.image.save(gp, format="PNG")
+        written += 1
+    return written
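The two golden-image tolerances interact in a way that is easy to misread: `max_delta` decides whether a single pixel counts as different, and `max_diff_pixels` is the budget of such pixels. A minimal sketch of that counting rule, assuming plain RGB tuples instead of PIL images (`compare_pixels` is a hypothetical name):

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def compare_pixels(rendered: Sequence[Pixel], golden: Sequence[Pixel],
                   max_delta: int = 0, max_diff_pixels: int = 0) -> Tuple[bool, int, int]:
    """Return (ok, out_of_tolerance_pixel_count, worst_channel_delta)."""
    if len(rendered) != len(golden):
        # Mirrors the size-mismatch branch: everything counts as different.
        return False, len(rendered), 255
    diff_pixels = 0
    observed_max = 0
    for a, b in zip(rendered, golden):
        worst = max(abs(ca - cb) for ca, cb in zip(a, b))
        observed_max = max(observed_max, worst)
        if worst > max_delta:
            diff_pixels += 1
    return diff_pixels <= max_diff_pixels, diff_pixels, observed_max


golden = [(0, 0, 0), (255, 255, 255), (10, 20, 30)]
rendered = [(0, 0, 0), (253, 255, 255), (10, 20, 90)]

exact = compare_pixels(rendered, golden)
lenient = compare_pixels(rendered, golden, max_delta=2, max_diff_pixels=1)
```

Here `exact` fails with two differing pixels, while `lenient` passes: the 2-unit drift falls within `max_delta`, and the one remaining bad pixel fits the budget. The worst delta is reported either way, so a lenient pass still shows how far off the render was.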
@@ -0,0 +1,82 @@
+"""
+Shared helpers for loading a plugin headlessly.
+
+Used by scripts/render_plugin.py, scripts/check_plugin.py, and the harness so
+plugin discovery / manifest / config-default logic lives in exactly one place.
+"""
+
+import json
+from pathlib import Path
+from typing import Any, Dict, Optional, Sequence, Union
+
+
+def find_plugin_dir(plugin_id: str, search_dirs: Sequence[Union[str, Path]]) -> Optional[Path]:
+    """Find a plugin directory by searching multiple paths."""
+    from src.plugin_system.plugin_loader import PluginLoader
+    loader = PluginLoader()
+    for search_dir in search_dirs:
+        search_path = Path(search_dir)
+        if not search_path.exists():
+            continue
+        result = loader.find_plugin_directory(plugin_id, search_path)
+        if result:
+            return Path(result)
+    return None
+
+
+def load_manifest(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Load and return manifest.json from a plugin directory."""
+    manifest_path = Path(plugin_dir) / 'manifest.json'
+    if not manifest_path.exists():
+        raise FileNotFoundError(f"No manifest.json in {plugin_dir}")
+    with open(manifest_path, 'r') as f:
+        return json.load(f)
+
+
+def load_config_defaults(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Extract default values from a plugin's config_schema.json (empty if none)."""
+    schema_path = Path(plugin_dir) / 'config_schema.json'
+    if not schema_path.exists():
+        return {}
+    with open(schema_path, 'r') as f:
+        schema = json.load(f)
+    defaults: Dict[str, Any] = {}
+    for key, prop in schema.get('properties', {}).items():
+        if isinstance(prop, dict) and 'default' in prop:
+            defaults[key] = prop['default']
+    return defaults
+
+
+def load_harness_spec(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Optional per-plugin harness settings from <plugin>/test/harness.json.
+
+    Lets a plugin opt into golden-image testing by declaring how to render it
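+    deterministically. All keys optional:
+        {
+          "config":     {...},            # config overrides
+          "mock_data":  "fixtures/mock.json",  # path (relative to plugin dir) to cache fixtures
+          "freeze_time": "2025-08-01 15:25:00",
+          "skip_update": false,
+          "sizes":      [[8, 16], [64, 64]]  # panel sizes to render instead of the default sample
+        }
+    When "mock_data" is set, the parsed fixture is added to the returned spec
+    under "mock_data_contents". Returns {} when no harness.json exists.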
+    """
+    spec_path = Path(plugin_dir) / 'test' / 'harness.json'
+    if not spec_path.exists():
+        return {}
+    with open(spec_path, 'r') as f:
+        spec = json.load(f)
+
+    # Resolve mock_data path and inline its contents for convenience.
+    mock_rel = spec.get('mock_data')
+    if mock_rel:
+        mock_path = Path(plugin_dir) / mock_rel
+        if not mock_path.exists():
+            # A declared-but-missing fixture is a harness config error: failing
+            # loudly beats silently rendering the plugin with no mock data.
+            raise FileNotFoundError(
+                f"harness.json references mock_data '{mock_rel}' but "
+                f"{mock_path} does not exist"
+            )
+        with open(mock_path, 'r') as mf:
+            spec['mock_data_contents'] = json.load(mf)
+    return spec
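How schema defaults combine with caller config is worth seeing on a concrete dict. The sketch below mirrors the default-extraction loop above but takes an already-parsed schema; `schema_defaults`, the schema contents, and `caller_config` are hypothetical and only illustrate the merge order the harness uses.

```python
from typing import Any, Dict


def schema_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collect top-level 'default' values from a JSON-schema-style dict."""
    return {
        key: prop["default"]
        for key, prop in schema.get("properties", {}).items()
        if isinstance(prop, dict) and "default" in prop
    }


# Hypothetical schema and caller config, for illustration only.
schema = {
    "properties": {
        "enabled": {"type": "boolean", "default": False},
        "scroll_speed": {"type": "integer", "default": 2},
        "title": {"type": "string"},  # no default, so it is skipped
    }
}
caller_config = {"scroll_speed": 5}

# Later entries win: baseline < schema defaults < explicit caller config.
merged = {"enabled": True, **schema_defaults(schema), **caller_config}
```

One consequence of this ordering: a schema that declares a default for `enabled` overrides the baseline `True`, so in this example `merged["enabled"]` ends up `False` unless the caller sets it explicitly.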
@@ -63,11 +63,23 @@ class MockCacheManager:
    """Mock cache manager for testing."""
    
    def __init__(self):
+        import shutil
+        import tempfile
+        import weakref
        self._cache: Dict[str, Any] = {}
        self._cache_timestamps: Dict[str, float] = {}
        self.get_calls = []
        self.set_calls = []
        self.delete_calls = []
+        # Real temp dir for plugins that write/read files under cache_dir.
+        # Registered for cleanup so each mock instance doesn't leak a tmp dir.
+        self.cache_dir = tempfile.mkdtemp(prefix="ledmatrix-mock-cache-")
+        self._finalizer = weakref.finalize(
+            self, shutil.rmtree, self.cache_dir, ignore_errors=True)
+
+    def cleanup(self) -> None:
+        """Remove the temp cache directory created for this instance."""
+        self._finalizer()
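The cleanup pattern used here, `weakref.finalize` bound to a temp directory, can be exercised on its own. This is a minimal sketch with a hypothetical `TempDirOwner` class standing in for the mock; only standard-library calls are used.

```python
import os
import shutil
import tempfile
import weakref


class TempDirOwner:
    """Hypothetical stand-in for an object that owns a temp directory."""

    def __init__(self) -> None:
        self.path = tempfile.mkdtemp(prefix="finalize-demo-")
        # The callback must not reference self, or the object could never be
        # garbage collected; passing the path string keeps it independent.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self.path, ignore_errors=True)

    def cleanup(self) -> None:
        """Remove the directory now; calling this again is a no-op."""
        self._finalizer()


owner = TempDirOwner()
created = os.path.isdir(owner.path)
owner.cleanup()
removed = not os.path.exists(owner.path)
owner.cleanup()  # a finalizer runs at most once, so this does nothing
still_alive = owner._finalizer.alive
```

The same finalizer also fires when the object is garbage collected or at interpreter exit, which is why an explicit `cleanup()` is optional rather than required.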
    
    def get(self, key: str, max_age: Optional[float] = None) -> Optional[Any]:
        """Get a value from cache."""
@@ -0,0 +1,120 @@
+"""
+LED matrix sizes the plugin safety harness renders against.
+
+There is no fixed set of "supported" panel sizes — an RGB matrix build can be
+any width/height and configuration (square, rectangle, 2x2, 4x4, 8x2, long
+strips, tall stacks, ...). Plugins are expected to read width/height
+dynamically and lay themselves out accordingly, so the harness's job is to
+prove a plugin survives a *spread* of shapes, not a canonical list.
+
+`DEFAULT_TEST_SIZES` is therefore a representative SAMPLE chosen to span the
+axes of variation (narrow, wide, square, tall, small, long), not an
+exhaustive or authoritative list. Callers can override it entirely:
+
+  - CLI:        scripts/check_plugin.py --sizes 8x16,64x64,256x32
+  - pytest:     LEDMATRIX_TEST_SIZES="8x16,64x64" env var (all plugins), or
+                per-plugin test/harness.json {"sizes": [[8, 16], [64, 64]]}
+
+so anyone can point the harness at the exact panel(s) their build uses.
+"""
+
+import os
+from typing import Iterable, List, Optional, Sequence, Tuple, Union
+
+# A spread of real panel-grid arrangements (each module is 64x32), not a list of
+# "blessed" sizes. Each entry exercises a different layout assumption a plugin
+# might accidentally bake in. Annotations are the panel grid (cols x rows).
+DEFAULT_TEST_SIZES: List[Tuple[int, int]] = [
+    (64, 32),    # 1x1 — single panel, the tightest common rectangle
+    (128, 32),   # 2x1 — the baseline most plugins are tuned for
+    (64, 64),    # 1x2 — stacked, exercises tall-narrow centering
+    (128, 64),   # 2x2 — block, icon scaling / vertical centering
+    (256, 32),   # 4x1 — long strip, wide horizontal layout
+    (128, 96),   # 2x3 — tall, exercises vertical overflow
+    (256, 128),  # 4x4 — large block, both dimensions big at once
+]
+
+# Backwards-compatible alias. Prefer DEFAULT_TEST_SIZES in new code — the old
+# name implied these were the only valid panel sizes, which they are not.
+SUPPORTED_SIZES = DEFAULT_TEST_SIZES
+
+
+def size_label(width: int, height: int) -> str:
+    """Human/path-friendly label for a size, e.g. '128x32'."""
+    return f"{width}x{height}"
+
+
+def parse_size_token(token: str) -> Tuple[int, int]:
+    """Parse a single 'WxH' token into an (int, int) pair.
+
+    Raises ValueError (with a user-friendly message) on malformed input so
+    callers can surface it however they like.
+    """
+    cleaned = token.strip().lower()
+    if "x" not in cleaned:
+        raise ValueError(f"Invalid size '{token}' (expected WxH, e.g. 128x32)")
+    w, h = cleaned.split("x", 1)
+    try:
+        width, height = int(w), int(h)
+    except ValueError as exc:
+        raise ValueError(
+            f"Invalid size '{token}' (expected numeric WxH, e.g. 128x32)"
+        ) from exc
+    if width <= 0 or height <= 0:
+        raise ValueError(
+            f"Invalid size '{token}' (width and height must be positive, e.g. 128x32)"
+        )
+    return (width, height)
+
+
+def coerce_sizes(
+    value: Union[str, Iterable[Sequence[int]], None]
+) -> Optional[List[Tuple[int, int]]]:
+    """Normalize a size spec into a list of (w, h) tuples, or None if empty.
+
+    Accepts a comma-separated 'WxH,WxH' string (CLI / env var) or an iterable
+    of [w, h] / (w, h) pairs (harness.json). Returns None when value is falsy
+    so callers can fall back to the default sample.
+    """
+    if not value:
+        return None
+    if isinstance(value, str):
+        return [parse_size_token(tok) for tok in value.split(",") if tok.strip()] or None
+    sizes: List[Tuple[int, int]] = []
+    for pair in value:
+        w, h = pair  # raises if not a 2-element sequence
+        width, height = int(w), int(h)
+        if width <= 0 or height <= 0:
+            raise ValueError(f"Invalid size pair {pair!r} (width and height must be positive)")
+        sizes.append((width, height))
+    return sizes or None
+
+
+def resolve_test_sizes(
+    spec_sizes: Union[str, Iterable[Sequence[int]], None] = None,
+) -> List[Tuple[int, int]]:
+    """Decide which sizes to render, by precedence:
+
+    1. LEDMATRIX_TEST_SIZES env var — a global "test on my hardware" override
+       that wins for every plugin.
+    2. spec_sizes — e.g. a per-plugin harness.json "sizes" list.
+    3. DEFAULT_TEST_SIZES — the representative sample.
+    """
+    env = coerce_sizes(os.environ.get("LEDMATRIX_TEST_SIZES"))
+    if env:
+        return env
+    spec = coerce_sizes(spec_sizes)
+    if spec:
+        return spec
+    return list(DEFAULT_TEST_SIZES)
+
+
+def safe_mode_filename(mode: str) -> str:
+    """A filesystem-safe basename for a plugin mode.
+
+    Mode names come from plugin metadata/render state, so a value containing
+    '/' or '..' could otherwise escape the intended output directory. Collapse
+    anything that isn't alphanumeric / dash / underscore to '_'.
+    """
+    cleaned = "".join(ch if ch.isalnum() or ch in ("-", "_") else "_" for ch in mode)
+    return cleaned or "mode"
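The size precedence described in this module (environment override, then per-plugin list, then the default sample) can be sketched compactly. The names below (`parse_sizes`, `pick_sizes`, `DEFAULT_SAMPLE`) are hypothetical simplifications, and the environment is passed in as a mapping so the example stays deterministic.

```python
from typing import List, Mapping, Optional, Tuple

Size = Tuple[int, int]
DEFAULT_SAMPLE: List[Size] = [(64, 32), (128, 32)]  # shortened for the example


def parse_sizes(text: Optional[str]) -> List[Size]:
    """Parse 'WxH,WxH' into positive (w, h) pairs; empty input gives []."""
    sizes: List[Size] = []
    for token in (text or "").split(","):
        token = token.strip().lower()
        if not token:
            continue
        w, sep, h = token.partition("x")
        # isdecimal() rejects signs and empty strings; zero is rejected explicitly.
        if not sep or not w.isdecimal() or not h.isdecimal() or int(w) == 0 or int(h) == 0:
            raise ValueError(f"Invalid size '{token}' (expected WxH, e.g. 128x32)")
        sizes.append((int(w), int(h)))
    return sizes


def pick_sizes(env: Mapping[str, str], spec_sizes: Optional[List[Size]]) -> List[Size]:
    """Environment override first, then the per-plugin list, then the sample."""
    return parse_sizes(env.get("LEDMATRIX_TEST_SIZES")) or spec_sizes or list(DEFAULT_SAMPLE)
```

For example, `pick_sizes({"LEDMATRIX_TEST_SIZES": "8x16, 64x64"}, [(32, 32)])` ignores the per-plugin list entirely, which matches the "test on my hardware" intent of the global override.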