Add cross-size/cross-screen plugin safety harness (#361)

* feat(testing): add cross-size/cross-screen plugin safety harness

Render every plugin across all supported matrix sizes (64x32, 128x32, 128x64, 256x32) and every declared screen, failing on crashes, content drawn past the panel edge, or visual drift vs committed golden images.

- BoundsCheckingDisplayManager: oversized-canvas overflow detection
- harness.py: multi-size/multi-screen render engine + golden compare
- scripts/check_plugin.py: CLI (functional+bounds, --out-dir, --update-golden, --freeze-time); render_plugin.py refactored onto shared loading helpers
- test/plugins/test_harness.py + test_plugin_matrix.py (parametrized, honors per-plugin test/harness.json; skips when no plugins present)
- MockCacheManager.cache_dir so cache-dir-using plugins load headlessly
- .github/workflows/test.yml + docs/plugin-safety-harness.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): address PR review feedback on plugin safety harness

- check_plugin: friendly error for non-numeric --sizes; reject non-object --config / --mock-data JSON; sanitize the plugin mode before using it as a filename; stop --update-golden from masking crash/overflow failures
- bounds_display_manager: pad the canvas out to the largest supported panel (not a fixed 16px) so far-overshoot coordinates are caught, not clipped
- harness: merge config_schema defaults inside render_plugin_matrix; surface update() failures as a non-fatal warning + result field instead of a debug log; sanitize mode in golden_path
- loading: fail fast when harness.json references a missing mock_data fixture
- mocks: clean up the per-instance temp cache dir via weakref.finalize
- test_plugin_matrix: add a discovery guard that fails when LEDMATRIX_REQUIRE_PLUGINS=1 but no plugins are found (still skips locally); type hints
- bound test deps with upper version pins for deterministic CI

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(testing): render plugins across arbitrary panel sizes, not a fixed list

Addresses maintainer feedback that there is no canonical set of supported panel sizes: a build can be any size/configuration (square, 2x2, 4x4, 8x2, long strips, tall stacks).

- sizes.py: SUPPORTED_SIZES -> DEFAULT_TEST_SIZES (back-compat alias kept), reframed as a representative SAMPLE of real panel-grid arrangements rather than an authoritative list; add parse_size_token / coerce_sizes / resolve_test_sizes helpers
- sizes are now fully overridable: LEDMATRIX_TEST_SIZES env (global, e.g. test on your exact hardware) > per-plugin harness.json "sizes" > default sample; CLI --sizes unchanged
- bounds_display_manager: pad the canvas to the largest panel IN THE CURRENT RUN (via overflow_extent) instead of a hardcoded max, so cross-size overflow detection scales to whatever sizes a run uses
- harness: compute the per-run extent and thread it into the bounds manager
- tests: arbitrary-shape + size-parsing/precedence coverage
- docs: rewrite "Supported sizes" -> "Sizes: a sample, not a fixed list"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): fail the harness on non-connectivity update() errors

Addresses the remaining review thread: recording every update() exception as a non-fatal warning still let a real update() regression pass green as long as display() survived. update() failures are now classified: a tolerated set of connectivity errors (ConnectionError/TimeoutError/socket/ssl/urllib/http/requests) is recorded non-fatally (expected with no network in CI), while any other exception is treated as a genuine bug and fails that render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(security): pin actions to SHAs and disable checkout credential persistence

Addresses the CodeRabbit/zizmor workflow-hardening finding: pin actions/checkout and actions/setup-python to full commit SHAs and set persist-credentials: false on checkout to reduce supply-chain and token-exposure risk.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(testing): validate positive sizes; narrow requests import except

Two review findings:

- sizes.py: parse_size_token / coerce_sizes now reject non-positive dimensions (0x32, -64x32) with a clear message instead of passing invalid sizes downstream (CodeRabbit).
- harness.py: the optional `requests` import now catches ImportError specifically and logs instead of `except Exception: pass`, clearing the Codacy medium "Try, Except, Pass" (harness.py L52) and Ruff S110/BLE001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-08-01 08:48:05 +00:00 · 2026-06-05 14:32:52 -04:00
parent 122e6d6863
commit 313e35a98f
13 changed files with 1360 additions and 38 deletions
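The size precedence described in the message (LEDMATRIX_TEST_SIZES env > per-plugin harness.json "sizes" > default sample, with non-positive dimensions rejected) can be sketched as below. This is a minimal reconstruction from the behavior the TestSizeParsing tests assert, not the actual sizes.py code; the contents of DEFAULT_TEST_SIZES here are an assumption borrowed from the first commit's size list.

```python
import os
from typing import Iterable, List, Optional, Sequence, Tuple, Union

Size = Tuple[int, int]

# Assumed representative sample; the real DEFAULT_TEST_SIZES may differ.
DEFAULT_TEST_SIZES: Tuple[Size, ...] = ((64, 32), (128, 32), (128, 64), (256, 32))
ENV_VAR = "LEDMATRIX_TEST_SIZES"


def _positive(w: int, h: int, source: object) -> Size:
    """Reject 0 or negative dimensions instead of passing them downstream."""
    if w <= 0 or h <= 0:
        raise ValueError(f"size {source!r}: width and height must be positive")
    return (w, h)


def parse_size_token(token: str) -> Size:
    """Parse one 'WxH' token; case-insensitive, surrounding whitespace ignored."""
    parts = token.strip().lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"expected WxH, got {token!r}")
    try:
        w, h = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"non-numeric size {token!r}") from None
    return _positive(w, h, token)


def coerce_sizes(
    spec: Union[None, str, Iterable[Union[str, Sequence[int]]]],
) -> Optional[List[Size]]:
    """Normalize 'WxH,WxH' or [[w, h], ...] to [(w, h), ...]; empty -> None."""
    if spec is None:
        return None
    items = spec.split(",") if isinstance(spec, str) else spec
    sizes: List[Size] = []
    for item in items:
        if isinstance(item, str):
            if item.strip():  # tolerate "" and trailing commas like "64x32,"
                sizes.append(parse_size_token(item))
        else:
            w, h = (int(v) for v in item)
            sizes.append(_positive(w, h, item))
    return sizes or None


def resolve_test_sizes(spec_sizes=None) -> List[Size]:
    """Precedence: $LEDMATRIX_TEST_SIZES > per-plugin harness.json > default sample."""
    for candidate in (os.environ.get(ENV_VAR), spec_sizes):
        sizes = coerce_sizes(candidate)
        if sizes:
            return sizes
    return list(DEFAULT_TEST_SIZES)
```

Returning None (rather than an empty list) for an empty spec is what lets each level of the precedence chain fall through to the next one.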
@@ -0,0 +1,182 @@
+"""
+Unit tests for the plugin safety harness primitives:
+bounds detection, image comparison, and mode enumeration.
+
+These don't load real plugins, so they run anywhere (including core CI where
+plugin-repos is empty).
+"""
+
+import pytest
+from PIL import Image
+
+from src.plugin_system.testing.bounds_display_manager import BoundsCheckingDisplayManager
+from src.plugin_system.testing.harness import (
+    _TOLERATED_UPDATE_ERRORS, compare_images, list_modes,
+)
+from src.plugin_system.testing.sizes import (
+    DEFAULT_TEST_SIZES, coerce_sizes, parse_size_token, resolve_test_sizes,
+)
+
+
+class TestBoundsDetection:
+    def test_reports_declared_size_not_canvas_size(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        assert dm.width == 64 and dm.height == 32
+        assert dm.matrix.width == 64 and dm.matrix.height == 32
+        # Backing canvas is padded out past the declared panel so far-overshoot
+        # coordinates land on-canvas and get flagged instead of clipped.
+        canvas_w, canvas_h = dm.image.size
+        assert canvas_w > 64 and canvas_h > 32
+
+    def test_far_overshoot_on_small_panel_is_detected(self):
+        # A coordinate meant for a wide build (x past 64) must still be caught
+        # when the declared panel is only 64 wide.
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[0] >= 64
+
+    def test_in_bounds_drawing_has_no_overflow(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([0, 0, 63, 31], fill=(255, 255, 255))
+        assert dm.check_overflow() is None
+
+    def test_right_overflow_is_detected(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        # Draw a few pixels past the right edge.
+        dm.draw.rectangle([60, 5, 70, 10], fill=(255, 0, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[0] >= 64  # overflow starts at or past the declared width
+
+    def test_bottom_overflow_is_detected(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([5, 30, 10, 40], fill=(0, 255, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[3] > 32  # overflow extends past the declared height
+
+    def test_declared_image_is_cropped_to_panel(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        assert dm.get_image().size == (64, 32)
+
+    def test_snapshot_saves_cropped_panel(self, tmp_path):
+        dm = BoundsCheckingDisplayManager(width=128, height=32)
+        out = tmp_path / "snap.png"
+        dm.save_snapshot(str(out))
+        with Image.open(out) as img:
+            assert img.size == (128, 32)
+
+
+class TestArbitraryPanelSizes:
+    """The harness must handle any panel shape, not a fixed supported list."""
+
+    def test_overflow_extent_pads_to_largest_in_run(self):
+        # A wide run (extent 256) means content at x=200 on a 64-wide panel is
+        # caught; the same draw with a small extent would be clipped (false pass).
+        wide = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(256, 32))
+        wide.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        assert wide.check_overflow() is not None
+
+        tight = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(64, 32))
+        tight.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        assert tight.check_overflow() is None  # clipped beyond the small canvas
+
+    def test_unusual_shapes_report_their_declared_size(self):
+        for w, h in [(8, 2), (6, 6), (200, 8), (64, 96)]:
+            dm = BoundsCheckingDisplayManager(width=w, height=h)
+            assert dm.width == w and dm.height == h
+            assert dm.matrix.width == w and dm.matrix.height == h
+
+
+class TestUpdateErrorClassification:
+    """update() may fail for lack of network (tolerated) but a logic bug must
+    not pass green just because display() survives."""
+
+    def test_connectivity_errors_are_tolerated(self):
+        import socket
+        import urllib.error
+        for exc in (ConnectionError("x"), TimeoutError("x"), socket.gaierror("x"),
+                    urllib.error.URLError("x")):
+            assert isinstance(exc, _TOLERATED_UPDATE_ERRORS)
+
+    def test_logic_errors_are_not_tolerated(self):
+        for exc in (ValueError("x"), KeyError("x"), AttributeError("x"), TypeError("x")):
+            assert not isinstance(exc, _TOLERATED_UPDATE_ERRORS)
+
+
+class TestSizeParsing:
+    def test_parse_size_token_ok(self):
+        assert parse_size_token(" 128X32 ") == (128, 32)
+
+    def test_parse_size_token_rejects_garbage(self):
+        with pytest.raises(ValueError):
+            parse_size_token("128xabc")
+        with pytest.raises(ValueError):
+            parse_size_token("128-32")
+
+    def test_rejects_non_positive_dimensions(self):
+        for bad in ("0x32", "-64x32", "64x0", "64x-1"):
+            with pytest.raises(ValueError):
+                parse_size_token(bad)
+        with pytest.raises(ValueError):
+            coerce_sizes([[0, 32]])
+        with pytest.raises(ValueError):
+            coerce_sizes("64x-1")
+
+    def test_coerce_sizes_from_string_and_pairs(self):
+        assert coerce_sizes("8x16,64x64") == [(8, 16), (64, 64)]
+        assert coerce_sizes([[8, 16], (64, 64)]) == [(8, 16), (64, 64)]
+        assert coerce_sizes(None) is None
+        assert coerce_sizes("") is None
+
+    def test_resolve_precedence_env_then_spec_then_default(self, monkeypatch):
+        monkeypatch.delenv("LEDMATRIX_TEST_SIZES", raising=False)
+        assert resolve_test_sizes(None) == list(DEFAULT_TEST_SIZES)
+        assert resolve_test_sizes([[8, 16]]) == [(8, 16)]
+        monkeypatch.setenv("LEDMATRIX_TEST_SIZES", "5x5")
+        # env wins over a per-plugin spec
+        assert resolve_test_sizes([[8, 16]]) == [(5, 5)]
+
+
+class TestCompareImages:
+    def test_identical_images_match(self):
+        a = Image.new("RGB", (16, 16), (10, 20, 30))
+        b = a.copy()
+        ok, diff_pixels, max_delta = compare_images(a, b)
+        assert ok and diff_pixels == 0 and max_delta == 0
+
+    def test_different_images_fail_at_zero_tolerance(self):
+        a = Image.new("RGB", (16, 16), (0, 0, 0))
+        b = a.copy()
+        b.putpixel((1, 1), (255, 255, 255))
+        ok, diff_pixels, max_delta = compare_images(a, b)
+        assert not ok and diff_pixels == 1 and max_delta == 255
+
+    def test_tolerance_absorbs_small_noise(self):
+        a = Image.new("RGB", (16, 16), (100, 100, 100))
+        b = a.copy()
+        b.putpixel((2, 2), (103, 100, 100))  # delta 3
+        ok, _, max_delta = compare_images(a, b, max_delta=5, max_diff_pixels=0)
+        assert ok and max_delta == 3
+
+    def test_size_mismatch_fails(self):
+        a = Image.new("RGB", (16, 16))
+        b = Image.new("RGB", (32, 16))
+        ok, _, _ = compare_images(a, b)
+        assert not ok
+
+
+class TestListModes:
+    def test_instance_modes_take_precedence(self):
+        inst = type("P", (), {"modes": ["a", "b"]})()
+        assert list_modes(inst, {"display_modes": ["x"]}, "pid") == ["a", "b"]
+
+    def test_falls_back_to_manifest_display_modes(self):
+        inst = type("P", (), {})()
+        assert list_modes(inst, {"display_modes": ["x", "y"]}, "pid") == ["x", "y"]
+
+    def test_falls_back_to_plugin_id(self):
+        inst = type("P", (), {})()
+        assert list_modes(inst, {}, "pid") == ["pid"]
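The bounds tests above rely on one idea: draw onto a canvas padded past the declared panel (out to the largest panel in the run), then report anything lit outside the panel. A minimal Pillow sketch of that idea follows. OverflowCanvas and its default extent of (256, 64) are hypothetical stand-ins, not the real BoundsCheckingDisplayManager.

```python
from typing import Optional, Tuple

from PIL import Image, ImageDraw

BBox = Tuple[int, int, int, int]


class OverflowCanvas:
    """Hypothetical stand-in: a panel-sized view over an oversized canvas."""

    def __init__(self, width: int, height: int,
                 overflow_extent: Tuple[int, int] = (256, 64)) -> None:
        self.width, self.height = width, height
        # Pad out to the largest panel in the run so far-overshoot draws
        # (e.g. x=200 meant for a 256-wide build) land on-canvas, not clipped.
        canvas = (max(width, overflow_extent[0]), max(height, overflow_extent[1]))
        self.image = Image.new("RGB", canvas, (0, 0, 0))
        self.draw = ImageDraw.Draw(self.image)

    def get_image(self) -> Image.Image:
        """What the real panel would show: the declared region only."""
        return self.image.crop((0, 0, self.width, self.height))

    def check_overflow(self) -> Optional[BBox]:
        """Bounding box of lit pixels outside the declared panel, or None."""
        outside = self.image.copy()
        outside.paste((0, 0, 0), (0, 0, self.width, self.height))  # blank the panel
        return outside.getbbox()
```

Two limits of this sketch: content drawn in pure black off-panel is invisible to `getbbox()`, and negative coordinates are still clipped because the padding only extends right and down.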
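TestCompareImages pins down a contract of `(ok, diff_pixels, max_delta)` with `max_delta` / `max_diff_pixels` tolerances, and the commit message says the mode is sanitized before it becomes a golden filename. The sketch below reproduces that contract with Pillow's ImageChops. It assumes a pixel counts as different only when its worst channel delta exceeds `max_delta`, which is what the "delta 3 within tolerance 5" test implies. sanitize_mode, golden_file and the -1 sentinel for size mismatches are illustrative assumptions, not the harness's code.

```python
import re
from pathlib import Path
from typing import Tuple

from PIL import Image, ImageChops


def compare_images(
    actual: Image.Image,
    expected: Image.Image,
    max_delta: int = 0,
    max_diff_pixels: int = 0,
) -> Tuple[bool, int, int]:
    """Return (ok, diff_pixels, observed_max_delta).

    A pixel counts as different only if its worst channel delta exceeds
    max_delta; the images match if at most max_diff_pixels pixels differ.
    """
    if actual.size != expected.size:
        return False, -1, -1  # sentinel: sizes differ, nothing to compare
    diff = ImageChops.difference(actual.convert("RGB"), expected.convert("RGB"))
    r, g, b = diff.split()
    worst = ImageChops.lighter(ImageChops.lighter(r, g), b)  # per-pixel max channel
    observed_max = worst.getextrema()[1]
    over = worst.point(lambda v: 255 if v > max_delta else 0)
    diff_pixels = over.histogram()[255]
    return diff_pixels <= max_diff_pixels, diff_pixels, observed_max


def sanitize_mode(mode: str) -> str:
    """Make a display-mode name safe to use as a single filename component."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", mode) or "_"


def golden_file(golden_root: Path, size: Tuple[int, int], mode: str) -> Path:
    """Hypothetical helper: <golden_root>/<WxH>/<mode>.png."""
    w, h = size
    return golden_root / f"{w}x{h}" / f"{sanitize_mode(mode)}.png"
```

Because "/" is replaced, a mode such as "../evil" cannot escape the per-size golden directory.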
@@ -0,0 +1,115 @@
+"""
+Cross-size / cross-screen plugin safety test.
+
+For every discovered plugin, render every declared screen at each test size
+(LEDMATRIX_TEST_SIZES, else harness.json "sizes", else the default sample) and
+assert it loads, renders without crashing, stays in bounds, and matches any goldens.
+
+Plugin discovery (if an id appears in several dirs, the first dir wins):
+  - $LEDMATRIX_PLUGINS_DIR  (os.pathsep-separated list of dirs), else
+  - <project_root>/plugin-repos and <project_root>/plugins
+
+A plugin opts into golden-image checks by adding test/golden/<WxH>/<mode>.png
+(and usually test/harness.json for deterministic config / mock data / time).
+"""
+
+import os
+from pathlib import Path
+from typing import Dict, List
+
+import pytest
+
+from src.plugin_system.testing.harness import (
+    render_plugin_matrix, compare_to_goldens,
+)
+from src.plugin_system.testing.loading import load_config_defaults, load_harness_spec
+from src.plugin_system.testing.sizes import resolve_test_sizes
+
+PROJECT_ROOT = Path(__file__).resolve().parents[2]
+
+# Set LEDMATRIX_REQUIRE_PLUGINS=1 in any CI/hardware pipeline where plugins are
+# expected to be present, so a discovery drift (empty search path) fails loudly
+# instead of silently skipping and losing this safety signal.
+_REQUIRE_PLUGINS = os.environ.get("LEDMATRIX_REQUIRE_PLUGINS") == "1"
+
+
+def _plugin_search_dirs() -> List[Path]:
+    env = os.environ.get("LEDMATRIX_PLUGINS_DIR")
+    if env:
+        return [Path(p) for p in env.split(os.pathsep) if p]
+    return [PROJECT_ROOT / "plugin-repos", PROJECT_ROOT / "plugins"]
+
+
+def _discover() -> Dict[str, Path]:
+    """Map plugin_id -> plugin_dir for all plugins on the search path."""
+    found: Dict[str, Path] = {}
+    for base in _plugin_search_dirs():
+        if not base.exists():
+            continue
+        for child in sorted(base.iterdir()):
+            if (child / "manifest.json").exists() and child.name not in found:
+                found[child.name] = child
+    return found
+
+
+_PLUGINS = _discover()
+
+
+@pytest.mark.plugin
+def test_plugins_were_discovered() -> None:
+    """Guard against silently skipping the whole matrix when discovery drifts.
+
+    Local dev and the plugin-less core CI legitimately have no plugins, so we
+    skip there; but when LEDMATRIX_REQUIRE_PLUGINS=1 an empty search path is a
+    hard failure rather than a green no-op.
+    """
+    if _PLUGINS:
+        return
+    search = [str(p) for p in _plugin_search_dirs()]
+    if _REQUIRE_PLUGINS:
+        pytest.fail(
+            "LEDMATRIX_REQUIRE_PLUGINS=1 but no plugins were discovered on the "
+            f"search path: {search}"
+        )
+    pytest.skip(f"no plugins found on the search path: {search}")
+
+
+@pytest.mark.plugin
+@pytest.mark.skipif(not _PLUGINS, reason="no plugins found on the search path")
+@pytest.mark.parametrize("plugin_id", sorted(_PLUGINS))
+def test_plugin_renders_across_sizes_and_screens(plugin_id: str) -> None:
+    plugin_dir = _PLUGINS[plugin_id]
+    spec = load_harness_spec(plugin_dir)
+
+    config = {"enabled": True}
+    config.update(load_config_defaults(plugin_dir))
+    config.update(spec.get("config", {}))
+
+    # Sizes: LEDMATRIX_TEST_SIZES env (test on real hardware) wins, then the
+    # plugin's own harness.json "sizes", else the default representative sample.
+    sizes = resolve_test_sizes(spec.get("sizes"))
+
+    results = render_plugin_matrix(
+        plugin_id=plugin_id,
+        plugin_dir=plugin_dir,
+        config=config,
+        mock_data=spec.get("mock_data_contents", {}),
+        sizes=sizes,
+        run_update=not spec.get("skip_update", False),
+        freeze_time=spec.get("freeze_time"),
+    )
+    compare_to_goldens(results, plugin_dir / "test" / "golden")
+
+    failures = []
+    for r in results:
+        if r.error is not None:
+            failures.append(f"{r.size_label} {r.mode}: crashed: {r.error}")
+        elif r.overflow is not None:
+            failures.append(f"{r.size_label} {r.mode}: overflow past panel bbox={r.overflow}")
+        elif r.golden_checked and r.golden_ok is False:
+            failures.append(
+                f"{r.size_label} {r.mode}: golden drift {r.golden_diff_pixels}px "
+                f"(max Δ={r.golden_max_delta})"
+            )
+
+    assert not failures, f"{plugin_id} failed:\n  " + "\n  ".join(failures)
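The matrix test reads `config`, `sizes`, `mock_data_contents`, `skip_update` and `freeze_time` from the spec returned by load_harness_spec, and the commit message says loading fails fast when harness.json names a missing mock_data fixture. A hypothetical loader showing that behavior is sketched below. It assumes the spec lives at test/harness.json, that the fixture path is relative to test/, and that the parsed fixture is stored under `mock_data_contents`; none of this is confirmed by the diff.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_harness_spec_sketch(plugin_dir: Path) -> Dict[str, Any]:
    """Hypothetical loader for <plugin_dir>/test/harness.json."""
    spec_file = plugin_dir / "test" / "harness.json"
    if not spec_file.exists():
        return {}  # no spec: the harness falls back to schema defaults
    spec = json.loads(spec_file.read_text(encoding="utf-8"))
    if not isinstance(spec, dict):
        raise ValueError(f"{spec_file}: top level must be a JSON object")
    fixture = spec.get("mock_data")
    if fixture is not None:
        fixture_path = spec_file.parent / fixture
        if not fixture_path.is_file():
            # Fail fast rather than silently rendering with no mock data.
            raise FileNotFoundError(
                f"{spec_file}: mock_data fixture not found: {fixture_path}"
            )
        spec["mock_data_contents"] = json.loads(
            fixture_path.read_text(encoding="utf-8")
        )
    return spec
```

Failing at load time surfaces a typo in the fixture name as one clear error instead of a confusing render or golden failure on every size.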