diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
new file mode 100644
index 00000000..1b3a7f71
--- /dev/null
+++ b/.github/workflows/test.yml
@@ -0,0 +1,33 @@
+name: Tests
+
+on:
+  pull_request:
+  push:
+    branches: [main]
+
+jobs:
+  plugin-safety:
+    name: Plugin safety harness + unit tests
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false
+
+      - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: "3.12"
+          cache: pip
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt -r requirements-test.txt
+          pip install RGBMatrixEmulator
+
+      - name: Run harness + visual rendering tests
+        run: |
+          pytest --no-cov \
+            test/plugins/test_harness.py \
+            test/plugins/test_visual_rendering.py \
+            test/plugins/test_plugin_matrix.py
diff --git a/docs/plugin-safety-harness.md b/docs/plugin-safety-harness.md
new file mode 100644
index 00000000..eb127f18
--- /dev/null
+++ b/docs/plugin-safety-harness.md
@@ -0,0 +1,136 @@
+# Plugin Safety Harness
+
+Renders a plugin across **every declared screen (mode)** and **a spread of
+matrix sizes**, and fails if any combination crashes, draws past the panel edge,
+or — for plugins that ship golden images — drifts visually. The goal: change a
+plugin without breaking a size or screen you didn't think to test.
+
+## Sizes: a sample, not a fixed list
+
+There is **no fixed set of supported panel sizes** — an RGB matrix build can be
+any width/height and configuration (square, rectangle, 2×2, 4×4, 8×2, long
+strips, tall stacks). Plugins are expected to read dimensions dynamically
+(`self.display_manager.matrix.width/height`) and lay themselves out
+accordingly, so a hardcoded coordinate or unscaled font shows up as a failure
+here.
+
+The harness therefore renders against a **representative sample** that spans the
+axes of variation (`DEFAULT_TEST_SIZES` in `src/plugin_system/testing/sizes.py`),
+not an authoritative list.
+
+Each module is 64×32; entries are real panel-grid arrangements (cols × rows):
+
+| Size    | Grid | Why it's in the sample                     |
+|---------|------|--------------------------------------------|
+| 64×32   | 1×1  | single panel — tightest common rectangle   |
+| 128×32  | 2×1  | the baseline most plugins are tuned for    |
+| 64×64   | 1×2  | stacked — tall-narrow centering            |
+| 128×64  | 2×2  | block — icon scaling / vertical centering  |
+| 256×32  | 4×1  | long strip — wide horizontal layout        |
+| 128×96  | 2×3  | tall — vertical overflow                   |
+| 256×128 | 4×4  | large block — both dimensions big at once  |
+
+**Override the sizes entirely** to test your actual hardware (or any shape):
+
+```bash
+# CLI — one-off:
+python scripts/check_plugin.py --plugin clock-simple --sizes 8x16,64x64,256x32
+
+# pytest — force every plugin onto your panel(s):
+LEDMATRIX_TEST_SIZES="8x16,128x128" pytest test/plugins/test_plugin_matrix.py
+
+# Per-plugin — declare the shapes a plugin targets in its test/harness.json:
+#   { "sizes": [[8, 16], [64, 64]] }
+```
+
+Precedence: `LEDMATRIX_TEST_SIZES` env (global) → per-plugin `harness.json`
+`sizes` → the default sample. Bounds checking adapts to whatever sizes a run
+uses — the backing canvas is padded out to the **largest** panel in the run, so
+a coordinate meant for a big build is still caught when rendering a small one.
+
+## Quick start
+
+```bash
+# Functional + bounds check across all sizes/screens:
+python scripts/check_plugin.py --plugin clock-simple
+
+# Every discovered plugin:
+python scripts/check_plugin.py --all
+
+# Dump PNGs to eyeball each size/screen:
+python scripts/check_plugin.py --plugin ledmatrix-weather --out-dir /tmp/preview
+```
+
+Exit code is non-zero if any `(plugin, size, screen)` fails. Plugins are
+discovered in `plugins/` and `plugin-repos/` (override with `--plugin-dir`).
+
+## What it checks (Phase 1 — always on)
+
+1. **Loads** and builds its mode list.
+2. **Renders every screen** at every size without raising. A connectivity error
+   from `update()` (no network in CI) is tolerated; any other `update()` exception
+   or a crash in `display()` fails, so `display()` must handle the no-data state.
+3. **Bounds**: nothing is drawn past the right/bottom edge. Implemented by
+   `BoundsCheckingDisplayManager`, which backs the declared panel with an
+   oversized canvas and flags any pixels that land in the margin. (Left/top
+   overflow at negative coordinates and BDF text are not flagged — golden images
+   cover those.)
+
+## Golden images (Phase 2 — opt-in per plugin)
+
+A plugin opts in by committing reference PNGs and (usually) a small harness spec:
+
+```
+plugins/<id>/test/harness.json          # how to render deterministically
+plugins/<id>/test/fixtures/mock.json     # optional cached data
+plugins/<id>/test/golden/<WxH>/<mode>.png
+```
+
+`test/harness.json` keys (all optional):
+
+```json
+{
+  "config":      { "timezone": "UTC" },
+  "mock_data":   "fixtures/mock.json",
+  "freeze_time": "2025-08-01 15:25:00",
+  "skip_update": false,
+  "sizes":       [[128, 32], [128, 64]]
+}
+```
+
+Generate / refresh goldens after an intentional visual change, then review the
+diff before committing:
+
+```bash
+python scripts/check_plugin.py --plugin clock-simple --update-golden \
+  --config '{"timezone":"UTC"}' --freeze-time "2025-08-01 15:25:00"
+```
+
+Comparison is exact by default (`compare_images` in `harness.py` accepts a
+tolerance for known anti-aliasing noise). Determinism requires a pinned Pillow
+and the bundled fonts — keep both stable when regenerating goldens.
+
+## Tests & CI
+
+- `test/plugins/test_harness.py` — unit tests for bounds detection, image
+  comparison, and mode enumeration (run anywhere).
+- `test/plugins/test_plugin_matrix.py` — parametrized over discovered plugins ×
+  sizes × screens; honors each plugin's `test/harness.json` and goldens. Skips
+  when no plugins are present (e.g. a fresh core checkout); set
+  `LEDMATRIX_REQUIRE_PLUGINS=1` in a pipeline where plugins must be present to
+  turn an empty discovery into a hard failure instead. Point it at the monorepo
+  with `LEDMATRIX_PLUGINS_DIR=/path/to/ledmatrix-plugins/plugins`.
+- `.github/workflows/test.yml` — runs the harness + visual tests on every PR.
+
+The plugin monorepo has its own `Plugin Safety` workflow that runs this harness
+against changed plugins on every PR.
+
+## Developer workflow
+
+1. Change the plugin on a branch.
+2. `python scripts/check_plugin.py --plugin <id> --out-dir /tmp/preview` and
+   eyeball the PNGs.
+3. Intentional visual change? `--update-golden`, review diffs, commit goldens.
+4. (Monorepo) bump `manifest.json` version and let the pre-commit hook sync
+   `plugins.json`.
+5. Push — CI re-runs the harness across all sizes and gates the PR.
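Review note on docs/plugin-safety-harness.md: the size precedence described under "Sizes" (`LEDMATRIX_TEST_SIZES` env, then per-plugin `harness.json` `sizes`, then the default sample) can be sketched as below. This is only an illustration under stated assumptions: `parse_size` and `resolve_sizes_sketch` are hypothetical names, not the `parse_size_token` / `resolve_test_sizes` functions in `sizes.py`, whose bodies this diff does not show, and `DEFAULT_SAMPLE` simply copies the table from the doc.

```python
from typing import Any, Dict, List, Optional, Tuple

Size = Tuple[int, int]

# Copied from the sample table in the doc; the real list lives in sizes.py.
DEFAULT_SAMPLE: List[Size] = [
    (64, 32), (128, 32), (64, 64), (128, 64), (256, 32), (128, 96), (256, 128),
]


def parse_size(token: str) -> Size:
    """Parse a 'WxH' token into (W, H). Hypothetical helper for this sketch."""
    w, sep, h = token.strip().lower().partition("x")
    if not sep:
        raise ValueError(f"expected WxH, got {token!r}")
    width, height = int(w), int(h)  # int() raises ValueError on junk
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {token!r}")
    return width, height


def resolve_sizes_sketch(env_value: Optional[str],
                         harness_spec: Optional[Dict[str, Any]]) -> List[Size]:
    """Apply the documented precedence: env, then harness.json, then default."""
    # 1. A non-empty env override applies globally to every plugin.
    if env_value and env_value.strip():
        sizes = [parse_size(t) for t in env_value.split(",") if t.strip()]
        if sizes:
            return sizes
    # 2. Otherwise use the shapes the plugin declares for itself.
    declared = (harness_spec or {}).get("sizes")
    if declared:
        return [(int(w), int(h)) for w, h in declared]
    # 3. Fall back to the representative sample.
    return list(DEFAULT_SAMPLE)
```

In a test run the first argument would come from `os.environ.get("LEDMATRIX_TEST_SIZES")` and the second from the plugin's parsed `test/harness.json`.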
diff --git a/requirements-test.txt b/requirements-test.txt
new file mode 100644
index 00000000..f0b5eb27
--- /dev/null
+++ b/requirements-test.txt
@@ -0,0 +1,8 @@
+# Test-only dependencies for the plugin safety harness and pytest suite.
+# Install alongside requirements.txt:  pip install -r requirements.txt -r requirements-test.txt
+# Upper bounds cap the major version so a new major release can't silently
+# change golden-image / time-sensitive test behavior between CI runs.
+pytest>=7.4,<9
+pytest-cov>=4.1,<7
+jsonschema>=4.0,<5       # manifest validation
+freezegun>=1.2,<2        # deterministic time for golden-image tests
diff --git a/scripts/check_plugin.py b/scripts/check_plugin.py
new file mode 100644
index 00000000..e8afbe44
--- /dev/null
+++ b/scripts/check_plugin.py
@@ -0,0 +1,217 @@
+#!/usr/bin/env python3
+"""
+Plugin safety checker.
+
+Renders a plugin across every declared screen (mode) and a sample of matrix
+sizes, and fails if any screen crashes, overflows the panel, or (for plugins with
+committed golden images) drifts visually.
+
+Usage:
+    # Functional + bounds check across all sizes/modes:
+    python scripts/check_plugin.py --plugin clock-simple
+
+    # Every discovered plugin:
+    python scripts/check_plugin.py --all
+
+    # Dump PNGs for each size/mode so you can eyeball them:
+    python scripts/check_plugin.py --plugin ledmatrix-weather --out-dir /tmp/preview
+
+    # Refresh committed golden images after an intentional visual change:
+    python scripts/check_plugin.py --plugin clock-simple --update-golden \
+        --mock-data plugins/clock-simple/test/fixtures/mock.json
+
+Exit code is non-zero if any (plugin, size, mode) fails.
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Dict, List, Optional
+
+PROJECT_ROOT = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(PROJECT_ROOT))
+
+os.environ['EMULATOR'] = 'true'
+
+from src.logging_config import get_logger  # noqa: E402
+from src.plugin_system.testing.loading import (  # noqa: E402
+    find_plugin_dir, load_config_defaults,
+)
+from src.plugin_system.testing.harness import (  # noqa: E402
+    RenderResult, render_plugin_matrix, compare_to_goldens, write_goldens,
+)
+from src.plugin_system.testing.sizes import (  # noqa: E402
+    DEFAULT_TEST_SIZES, parse_size_token, safe_mode_filename, size_label,
+)
+
+logger = get_logger("[Check Plugin]")
+
+DEFAULT_SEARCH_DIRS = [
+    str(PROJECT_ROOT / 'plugins'),
+    str(PROJECT_ROOT / 'plugin-repos'),
+]
+
+
+def discover_plugins(search_dirs: List[str]) -> List[str]:
+    """All plugin ids found across the search dirs (dirs containing manifest.json)."""
+    found = []
+    for d in search_dirs:
+        base = Path(d)
+        if not base.exists():
+            continue
+        for child in sorted(base.iterdir()):
+            if (child / 'manifest.json').exists() and child.name not in found:
+                found.append(child.name)
+    return found
+
+
+def parse_sizes(spec: Optional[str]):
+    if not spec:
+        return DEFAULT_TEST_SIZES
+    sizes = []
+    for token in spec.split(','):
+        if not token.strip():
+            continue
+        try:
+            sizes.append(parse_size_token(token))
+        except ValueError as exc:
+            raise SystemExit(str(exc)) from exc
+    return sizes
+
+
+def check_one(plugin_id: str, search_dirs: List[str], sizes, mock_data: Dict,
+              config: Dict, run_update: bool, out_dir: Optional[Path],
+              update_golden: bool, golden_dir_override: Optional[Path],
+              freeze_time: Optional[str]) -> List[RenderResult]:
+    plugin_dir = find_plugin_dir(plugin_id, search_dirs)
+    if not plugin_dir:
+        logger.error("Plugin '%s' not found in: %s", plugin_id, search_dirs)
+        return [RenderResult(plugin_id, 0, 0, "<not-found>", error="plugin directory not found")]
+
+    # Start from config_schema defaults so plugins behave like a real install.
+    full_config = {"enabled": True}
+    full_config.update(load_config_defaults(plugin_dir))
+    full_config.update(config)
+
+    results = render_plugin_matrix(
+        plugin_id=plugin_id, plugin_dir=plugin_dir, config=full_config,
+        mock_data=mock_data, sizes=sizes, run_update=run_update,
+        freeze_time=freeze_time,
+    )
+
+    golden_dir = golden_dir_override or (plugin_dir / 'test' / 'golden')
+    if update_golden:
+        written = write_goldens(results, golden_dir)
+        logger.info("Wrote %d golden image(s) for %s to %s", written, plugin_id, golden_dir)
+    else:
+        compare_to_goldens(results, golden_dir)
+
+    if out_dir:
+        for r in results:
+            if r.image is None:
+                continue
+            dest = out_dir / plugin_id / size_label(r.width, r.height)
+            dest.mkdir(parents=True, exist_ok=True)
+            r.image.save(dest / f"{safe_mode_filename(r.mode)}.png", format="PNG")
+
+    return results
+
+
+def print_report(all_results: Dict[str, List[RenderResult]]) -> bool:
+    """Print a per-plugin grid. Returns True if everything passed."""
+    everything_ok = True
+    for plugin_id, results in all_results.items():
+        print(f"\n=== {plugin_id} ===")
+        for r in results:
+            if r.ok:
+                status = "PASS"
+                detail = ""
+                if r.golden_checked:
+                    detail = " (golden ✓)"
+                if r.update_error is not None:
+                    detail += f" (update warn: {r.update_error})"
+            else:
+                everything_ok = False
+                if r.error is not None:
+                    status, detail = "FAIL", f" error={r.error}"
+                elif r.overflow is not None:
+                    status, detail = "FAIL", f" overflow bbox={r.overflow}"
+                elif r.golden_ok is False:
+                    status = "FAIL"
+                    detail = f" golden drift: {r.golden_diff_pixels}px (max Δ={r.golden_max_delta})"
+                else:
+                    status, detail = "FAIL", ""
+            print(f"  [{status}] {r.size_label:>7}  {r.mode}{detail}")
+    print()
+    return everything_ok
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Check a plugin renders safely across sizes & screens")
+    group = parser.add_mutually_exclusive_group(required=True)
+    group.add_argument('--plugin', '-p', help='Plugin id to check')
+    group.add_argument('--all', action='store_true', help='Check every discovered plugin')
+    parser.add_argument('--plugin-dir', '-d', default=None, help='Directory to search for plugins')
+    parser.add_argument('--sizes', default=None, help='Comma-separated WxH list (default: DEFAULT_TEST_SIZES sample)')
+    parser.add_argument('--config', '-c', default='{}', help='Plugin config overrides as JSON')
+    parser.add_argument('--mock-data', '-m', default=None, help='Path to JSON file with mock cache data')
+    parser.add_argument('--out-dir', '-o', default=None, help='Also dump rendered PNGs here')
+    parser.add_argument('--skip-update', action='store_true', help='Skip calling update()')
+    parser.add_argument('--update-golden', action='store_true', help='Write/refresh golden images')
+    parser.add_argument('--golden-dir', default=None, help='Override golden dir (default: <plugin>/test/golden)')
+    parser.add_argument('--freeze-time', default=None,
+                        help='Freeze wall clock, e.g. "2025-08-01 15:25:00" (for time-dependent plugins)')
+    args = parser.parse_args()
+
+    search_dirs = [args.plugin_dir] if args.plugin_dir else DEFAULT_SEARCH_DIRS
+    sizes = parse_sizes(args.sizes)
+
+    try:
+        config = json.loads(args.config)
+    except json.JSONDecodeError as e:
+        logger.error("Invalid --config JSON: %s", e)
+        return 2
+    if not isinstance(config, dict):
+        logger.error("--config must be a JSON object, got %s", type(config).__name__)
+        return 2
+
+    mock_data = {}
+    if args.mock_data:
+        mock_path = Path(args.mock_data)
+        if not mock_path.exists():
+            logger.error("Mock data file not found: %s", args.mock_data)
+            return 2
+        with open(mock_path) as f:
+            mock_data = json.load(f)
+        if not isinstance(mock_data, dict):
+            logger.error("--mock-data must be a JSON object (key -> cache value), got %s",
+                         type(mock_data).__name__)
+            return 2
+
+    plugin_ids = discover_plugins(search_dirs) if args.all else [args.plugin]
+    if not plugin_ids:
+        logger.error("No plugins found in: %s", search_dirs)
+        return 2
+
+    out_dir = Path(args.out_dir) if args.out_dir else None
+    golden_dir_override = Path(args.golden_dir) if args.golden_dir else None
+
+    all_results: Dict[str, List[RenderResult]] = {}
+    for plugin_id in plugin_ids:
+        all_results[plugin_id] = check_one(
+            plugin_id=plugin_id, search_dirs=search_dirs, sizes=sizes,
+            mock_data=mock_data, config=config, run_update=not args.skip_update,
+            out_dir=out_dir, update_golden=args.update_golden,
+            golden_dir_override=golden_dir_override, freeze_time=args.freeze_time,
+        )
+
+    # When refreshing goldens we skip drift comparison, but a crash or overflow
+    # still means the plugin is broken — never let --update-golden mask that.
+    ok = print_report(all_results)
+    return 0 if ok else 1
+
+
+if __name__ == '__main__':
+    sys.exit(main())
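Review note on scripts/check_plugin.py: `main()` turns a malformed `--config` value into a logged error and exit code 2, but a malformed `--mock-data` file would surface as an uncaught `json.JSONDecodeError` from `json.load`. One possible way to treat both the same is sketched below. `load_mock_data_sketch` is a hypothetical helper, not part of this diff; the caller would catch `ValueError`, log it, and return 2.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_mock_data_sketch(path: str) -> Dict[str, Any]:
    """Load a mock-cache JSON file, raising ValueError with a readable
    message for a missing file, invalid JSON, or a non-object payload."""
    mock_path = Path(path)
    if not mock_path.is_file():
        raise ValueError(f"Mock data file not found: {path}")
    try:
        with open(mock_path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        # Same failure class as a bad --config string: a usage error.
        raise ValueError(f"Invalid JSON in mock data file {path}: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(
            f"--mock-data must be a JSON object (key -> cache value), "
            f"got {type(data).__name__}"
        )
    return data
```

Raising one exception type keeps the three failure cases on a single code path in the caller, which matches how the `--config` branch already reports problems.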
diff --git a/scripts/render_plugin.py b/scripts/render_plugin.py
index 818941a3..bcacc4b8 100644
--- a/scripts/render_plugin.py
+++ b/scripts/render_plugin.py
@@ -17,7 +17,6 @@ import os
 import json
 import argparse
 from pathlib import Path
-from typing import Any, Dict, Optional, Sequence, Union
 
 # Add project root to path
 PROJECT_ROOT = Path(__file__).resolve().parent.parent
@@ -28,49 +27,15 @@ os.environ['EMULATOR'] = 'true'
 
 # Import logger after path setup so src.logging_config is importable
 from src.logging_config import get_logger  # noqa: E402
+from src.plugin_system.testing.loading import (  # noqa: E402
+    find_plugin_dir, load_manifest, load_config_defaults,
+)
 logger = get_logger("[Render Plugin]")
 
 MIN_DIMENSION = 1
 MAX_DIMENSION = 512
 
 
-def find_plugin_dir(plugin_id: str, search_dirs: Sequence[Union[str, Path]]) -> Optional[Path]:
-    """Find a plugin directory by searching multiple paths."""
-    from src.plugin_system.plugin_loader import PluginLoader
-    loader = PluginLoader()
-    for search_dir in search_dirs:
-        search_path = Path(search_dir)
-        if not search_path.exists():
-            continue
-        result = loader.find_plugin_directory(plugin_id, search_path)
-        if result:
-            return Path(result)
-    return None
-
-
-def load_manifest(plugin_dir: Path) -> Dict[str, Any]:
-    """Load and return manifest.json from plugin directory."""
-    manifest_path = plugin_dir / 'manifest.json'
-    if not manifest_path.exists():
-        raise FileNotFoundError(f"No manifest.json in {plugin_dir}")
-    with open(manifest_path, 'r') as f:
-        return json.load(f)
-
-
-def load_config_defaults(plugin_dir: Path) -> Dict[str, Any]:
-    """Extract default values from config_schema.json."""
-    schema_path = plugin_dir / 'config_schema.json'
-    if not schema_path.exists():
-        return {}
-    with open(schema_path, 'r') as f:
-        schema = json.load(f)
-    defaults: Dict[str, Any] = {}
-    for key, prop in schema.get('properties', {}).items():
-        if 'default' in prop:
-            defaults[key] = prop['default']
-    return defaults
-
-
 def main() -> int:
     """Load a plugin, call update() + display(), and save the result as a PNG image."""
     parser = argparse.ArgumentParser(description='Render a plugin display to a PNG image')
diff --git a/src/plugin_system/testing/__init__.py b/src/plugin_system/testing/__init__.py
index ebcf5d60..f436d772 100644
--- a/src/plugin_system/testing/__init__.py
+++ b/src/plugin_system/testing/__init__.py
@@ -7,13 +7,22 @@ Provides base classes and utilities for testing LEDMatrix plugins.
 from .plugin_test_base import PluginTestCase
 from .mocks import MockDisplayManager, MockCacheManager, MockConfigManager, MockPluginManager
 from .visual_display_manager import VisualTestDisplayManager
+from .bounds_display_manager import BoundsCheckingDisplayManager
+from .sizes import (
+    DEFAULT_TEST_SIZES, SUPPORTED_SIZES, resolve_test_sizes, size_label,
+)
 
 __all__ = [
     'PluginTestCase',
     'VisualTestDisplayManager',
+    'BoundsCheckingDisplayManager',
     'MockDisplayManager',
     'MockCacheManager',
     'MockConfigManager',
     'MockPluginManager',
+    'DEFAULT_TEST_SIZES',
+    'SUPPORTED_SIZES',
+    'resolve_test_sizes',
+    'size_label',
 ]
 
diff --git a/src/plugin_system/testing/bounds_display_manager.py b/src/plugin_system/testing/bounds_display_manager.py
new file mode 100644
index 00000000..ccbf393f
--- /dev/null
+++ b/src/plugin_system/testing/bounds_display_manager.py
@@ -0,0 +1,129 @@
+"""
+Bounds-checking display manager.
+
+A VisualTestDisplayManager that draws onto an oversized canvas (the declared
+panel size plus a right/bottom margin) while still reporting the declared size
+to the plugin. Content that a plugin draws past the right or bottom edge lands
+in the margin instead of being silently clipped by PIL, so the harness can
+detect overflow — the classic symptom of hardcoded coordinates or fonts/icons
+that don't scale down to a smaller panel.
+
+Limitations (documented on purpose):
+- Overflow past the LEFT or TOP edge (negative coordinates) is still clipped by
+  PIL and not detected here. The dominant real-world breakage is content that is
+  too wide/tall for a smaller panel, which this catches.
+- BDF text is clipped to the declared bounds by the parent's bitmap drawer, so
+  BDF overflow is not flagged. Golden-image regression covers those plugins.
+- If a plugin replaces the canvas with its own image (display_manager.image = ...),
+  the margin can't be measured and overflow is reported as undetermined (None).
+"""
+
+from typing import Optional, Tuple
+
+from .sizes import DEFAULT_TEST_SIZES
+from .visual_display_manager import VisualTestDisplayManager, _MatrixProxy
+
+# Smallest extra band kept on the right/bottom so a few pixels of overflow are
+# still visible even on the largest panel in a run.
+_BASE_MARGIN = 16
+# Fallback overflow reference when a caller doesn't pass one: the largest shape
+# in the default sample. We extend every (smaller) canvas out to at least this
+# size so content drawn at a coordinate meant for a bigger build — e.g. x=200 on
+# a 64-wide panel — lands in the padded region and is flagged, instead of being
+# clipped off-canvas and read as a false pass.
+_DEFAULT_EXTENT_WIDTH = max(w for w, _ in DEFAULT_TEST_SIZES)
+_DEFAULT_EXTENT_HEIGHT = max(h for _, h in DEFAULT_TEST_SIZES)
+
+
+class BoundsCheckingDisplayManager(VisualTestDisplayManager):
+    """Detects drawing that overflows the declared panel size."""
+
+    # Kept for backwards compatibility; real padding is computed per-axis below.
+    MARGIN = _BASE_MARGIN
+
+    def __init__(self, width: int = 128, height: int = 32,
+                 overflow_extent: Optional[Tuple[int, int]] = None):
+        self._declared_width = int(width)
+        self._declared_height = int(height)
+        # Pad the canvas out to at least `overflow_extent` (the largest panel
+        # this run cares about) plus a base margin, so coordinates meant for a
+        # bigger build are caught — not clipped — when rendering a smaller panel.
+        # Defaults to the largest shape in the sample when no run is known.
+        ext_w, ext_h = overflow_extent or (_DEFAULT_EXTENT_WIDTH, _DEFAULT_EXTENT_HEIGHT)
+        self._canvas_width = max(self._declared_width, int(ext_w)) + _BASE_MARGIN
+        self._canvas_height = max(self._declared_height, int(ext_h)) + _BASE_MARGIN
+        # Parent builds the (oversized) backing canvas + fonts.
+        super().__init__(self._canvas_width, self._canvas_height)
+        # Plugins must see the DECLARED size, not the padded canvas size.
+        self.matrix = _MatrixProxy(self._declared_width, self._declared_height)
+
+    # -- declared dimensions (override parent's image-derived properties) --
+
+    @property
+    def width(self) -> int:
+        return self._declared_width
+
+    @property
+    def height(self) -> int:
+        return self._declared_height
+
+    @property
+    def display_width(self) -> int:
+        return self._declared_width
+
+    @property
+    def display_height(self) -> int:
+        return self._declared_height
+
+    # -- overflow detection --
+
+    def _canvas_is_padded(self) -> bool:
+        return self.image.size == (self._canvas_width, self._canvas_height)
+
+    def check_overflow(self) -> Optional[Tuple[int, int, int, int]]:
+        """Bounding box (in full-canvas coords) of any drawing beyond the
+        declared panel, or None if nothing overflowed / undetermined."""
+        if not self._canvas_is_padded():
+            return None
+
+        exp_w = self._canvas_width
+        exp_h = self._canvas_height
+        boxes = []
+
+        right = self.image.crop((self._declared_width, 0, exp_w, exp_h)).getbbox()
+        if right:
+            boxes.append((right[0] + self._declared_width, right[1],
+                          right[2] + self._declared_width, right[3]))
+
+        bottom = self.image.crop((0, self._declared_height, exp_w, exp_h)).getbbox()
+        if bottom:
+            boxes.append((bottom[0], bottom[1] + self._declared_height,
+                          bottom[2], bottom[3] + self._declared_height))
+
+        if not boxes:
+            return None
+        return (
+            min(b[0] for b in boxes), min(b[1] for b in boxes),
+            max(b[2] for b in boxes), max(b[3] for b in boxes),
+        )
+
+    # -- snapshot/image accessors return the cropped, true-panel image --
+
+    def declared_image(self):
+        """The visible panel: the canvas cropped to the declared size."""
+        if self._canvas_is_padded():
+            return self.image.crop((0, 0, self._declared_width, self._declared_height))
+        return self.image
+
+    def save_snapshot(self, path: str) -> None:
+        self.declared_image().save(path, format='PNG')
+
+    def get_image(self):
+        return self.declared_image()
+
+    def get_image_base64(self) -> str:
+        import base64
+        import io
+        buffer = io.BytesIO()
+        self.declared_image().save(buffer, format='PNG')
+        return base64.b64encode(buffer.getvalue()).decode('utf-8')
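Review note on bounds_display_manager.py: the margin technique in `check_overflow()` can be illustrated without Pillow by using a plain 2D list as the canvas. This is a simplified sketch, not the class above: `overflow_bbox` is a hypothetical function, pixels are integers where 0 means unlit, and the returned box uses exclusive right/bottom coordinates, which is the convention of PIL's `getbbox()`.

```python
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]


def overflow_bbox(canvas: List[List[int]],
                  declared_w: int, declared_h: int) -> Optional[BBox]:
    """Bounding box (full-canvas coords) of lit pixels outside the declared
    panel, or None when nothing was drawn in the right/bottom margin."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(canvas):
        for x, value in enumerate(row):
            # A pixel overflows when it is lit and lies past either edge.
            if value and (x >= declared_w or y >= declared_h):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# An 8x6 backing canvas for a declared 4x3 panel.
canvas = [[0] * 8 for _ in range(6)]
canvas[1][1] = 255   # inside the panel: ignored
canvas[1][5] = 255   # x=5 is past the right edge (declared width 4)
canvas[4][2] = 255   # y=4 is past the bottom edge (declared height 3)
result = overflow_bbox(canvas, declared_w=4, declared_h=3)
print(result)
```

Scanning every pixel outside the panel gives the same box as taking the union of the right-strip and bottom-strip boxes, because those two strips together cover exactly the margin.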
diff --git a/src/plugin_system/testing/harness.py b/src/plugin_system/testing/harness.py
new file mode 100644
index 00000000..21c44235
--- /dev/null
+++ b/src/plugin_system/testing/harness.py
@@ -0,0 +1,314 @@
+"""
+Plugin safety harness.
+
+Renders a plugin across every declared screen (mode) and a sample of matrix
+sizes, capturing crashes and overflow. Used by scripts/check_plugin.py and the
+pytest matrix test to guarantee a plugin change doesn't break a screen at a size
+the author didn't try.
+
+The render flow mirrors scripts/render_plugin.py (same PluginLoader call), but
+this module adds: multi-size iteration, per-mode rendering, overflow detection
+via BoundsCheckingDisplayManager, and golden-image comparison.
+"""
+
+import contextlib
+import http.client
+import inspect
+import socket
+import ssl
+import urllib.error
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+from PIL import Image, ImageChops
+
+from src.logging_config import get_logger
+from .bounds_display_manager import BoundsCheckingDisplayManager
+from .loading import load_config_defaults, load_manifest
+from .sizes import DEFAULT_TEST_SIZES, safe_mode_filename, size_label
+
+logger = get_logger("[Plugin Harness]")
+
+
+def _tolerated_update_errors() -> Tuple[type, ...]:
+    """Exception types from update() we treat as a tolerated no-connectivity
+    failure (expected in CI / headless dev) rather than a real plugin bug.
+
+    Anything NOT in this set is a genuine regression — a plugin that lets a
+    non-network exception escape update() should fail the harness, not pass
+    green because display() happened to survive.
+    """
+    types: List[type] = [
+        ConnectionError, TimeoutError,        # builtins
+        socket.gaierror, socket.timeout,      # DNS / socket timeouts
+        ssl.SSLError,
+        urllib.error.URLError,
+        http.client.HTTPException,
+    ]
+    try:  # requests is optional; cover its whole error tree when present
+        import requests
+        types.append(requests.exceptions.RequestException)
+    except ImportError:  # pragma: no cover - requests not installed
+        logger.debug("requests not installed; its connectivity errors won't be specifically tolerated")
+    return tuple(types)
+
+
+_TOLERATED_UPDATE_ERRORS = _tolerated_update_errors()
+
+
+@dataclass
+class RenderResult:
+    """Outcome of rendering one (size, mode) of a plugin."""
+    plugin_id: str
+    width: int
+    height: int
+    mode: str
+    image: Optional[Image.Image] = None
+    error: Optional[str] = None          # fatal: load/display crash, or a non-network update() error
+    update_error: Optional[str] = None   # tolerated: connectivity error from update() (no network in CI)
+    overflow: Optional[Tuple[int, int, int, int]] = None  # bbox past the panel
+    # golden comparison (populated only when a golden was provided)
+    golden_checked: bool = False
+    golden_ok: Optional[bool] = None
+    golden_diff_pixels: int = 0
+    golden_max_delta: int = 0
+
+    @property
+    def size_label(self) -> str:
+        return size_label(self.width, self.height)
+
+    @property
+    def ok(self) -> bool:
+        """Phase-1 pass: rendered without crashing and without overflow, and if a
+        golden was checked it matched."""
+        if self.error is not None or self.overflow is not None:
+            return False
+        if self.golden_checked and self.golden_ok is False:
+            return False
+        return True
+
+
+def list_modes(plugin_instance: Any, manifest: Dict[str, Any], plugin_id: str) -> List[str]:
+    """Enumerate a plugin's screens: instance.modes wins, then manifest
+    display_modes, then the plugin id as a single mode."""
+    modes = getattr(plugin_instance, "modes", None)
+    if modes:
+        return [str(m) for m in modes]
+    declared = manifest.get("display_modes")
+    if declared:
+        return [str(m) for m in declared]
+    return [plugin_id]
+
+
+def _instantiate(plugin_id: str, manifest: Dict[str, Any], plugin_dir: Path,
+                 config: Dict[str, Any], mock_data: Dict[str, Any],
+                 display_manager: Any) -> Any:
+    """Load and construct a plugin instance with mocked managers."""
+    from src.plugin_system.plugin_loader import PluginLoader
+    from src.plugin_system.testing import MockCacheManager, MockPluginManager
+
+    cache_manager = MockCacheManager()
+    for key, value in (mock_data or {}).items():
+        cache_manager.set(key, value)
+
+    loader = PluginLoader()
+    plugin_instance, _module = loader.load_plugin(
+        plugin_id=plugin_id,
+        manifest=manifest,
+        plugin_dir=plugin_dir,
+        config=config,
+        display_manager=display_manager,
+        cache_manager=cache_manager,
+        plugin_manager=MockPluginManager(),
+        install_deps=False,
+    )
+    return plugin_instance
+
+
+def _render_mode(plugin_instance: Any, mode: str) -> None:
+    """Render a specific screen. Prefer an explicit display_mode kwarg; otherwise
+    drive the plugin's internal mode state machine (first display() call renders
+    modes[current_mode_index] when current_display_mode is None)."""
+    sig = inspect.signature(plugin_instance.display)
+    if "display_mode" in sig.parameters:
+        plugin_instance.display(force_clear=True, display_mode=mode)
+        return
+
+    modes = getattr(plugin_instance, "modes", None)
+    if modes and mode in modes:
+        plugin_instance.current_mode_index = list(modes).index(mode)
+    if hasattr(plugin_instance, "current_display_mode"):
+        plugin_instance.current_display_mode = None
+    plugin_instance.display(force_clear=False)
+
+
+def _freeze(freeze_time: Optional[str]):
+    """Context manager that freezes wall-clock time when freeze_time is given,
+    so time-dependent plugins (clocks, countdowns) render deterministic goldens."""
+    if not freeze_time:
+        return contextlib.nullcontext()
+    try:
+        from freezegun import freeze_time as _ft
+    except ImportError as e:  # pragma: no cover - only hit without the dep
+        raise RuntimeError(
+            "freeze_time requires the 'freezegun' package (pip install freezegun)"
+        ) from e
+    return _ft(freeze_time)
+
+
+def render_plugin_matrix(
+    plugin_id: str,
+    plugin_dir: Path,
+    config: Optional[Dict[str, Any]] = None,
+    mock_data: Optional[Dict[str, Any]] = None,
+    sizes: Optional[List[Tuple[int, int]]] = None,
+    run_update: bool = True,
+    freeze_time: Optional[str] = None,
+) -> List[RenderResult]:
+    """Render every (size, mode) combination for a plugin.
+
+    Returns a flat list of RenderResult. A fresh plugin instance is built per
+    (size, mode) so state never leaks between screens. Pass freeze_time (e.g.
+    "2025-08-01 15:25:00") to make time-dependent plugins reproducible.
+    """
+    plugin_dir = Path(plugin_dir)
+    manifest = load_manifest(plugin_dir)
+    # Start from config_schema.json defaults so the plugin behaves like a real
+    # install; explicit caller config still wins over a schema default.
+    config = {"enabled": True, **load_config_defaults(plugin_dir), **(config or {})}
+    sizes = sizes or DEFAULT_TEST_SIZES
+    results: List[RenderResult] = []
+
+    # The largest panel in this run. Every (smaller) canvas is padded out to it
+    # so a coordinate meant for the biggest configuration is still caught when
+    # rendering a smaller one, instead of being clipped into a false pass.
+    extent = (max(w for w, _ in sizes), max(h for _, h in sizes))
+
+    with _freeze(freeze_time):
+        for width, height in sizes:
+            results.extend(_render_size(
+                plugin_id, manifest, plugin_dir, config, mock_data or {},
+                width, height, run_update, extent,
+            ))
+
+    return results
+
+
+def _render_size(plugin_id, manifest, plugin_dir, config, mock_data,
+                 width, height, run_update, extent) -> List[RenderResult]:
+    """Render every mode at one size. A fresh instance per mode avoids state leaks."""
+    results: List[RenderResult] = []
+
+    # Discover modes once per size (instance build can depend on config).
+    try:
+        probe_dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
+        probe = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, probe_dm)
+        modes = list_modes(probe, manifest, plugin_id)
+    except Exception as e:  # noqa: BLE001 — surface any load failure as a result
+        return [RenderResult(plugin_id, width, height, "<load>", error=repr(e))]
+
+    for mode in modes:
+        result = RenderResult(plugin_id, width, height, mode)
+        dm = BoundsCheckingDisplayManager(width=width, height=height, overflow_extent=extent)
+        try:
+            inst = _instantiate(plugin_id, manifest, plugin_dir, config, mock_data, dm)
+            if run_update:
+                try:
+                    inst.update()
+                except _TOLERATED_UPDATE_ERRORS as e:
+                    # Expected when CI / headless dev has no network: record it
+                    # (surfaced in the report) but don't fail the run.
+                    result.update_error = repr(e)
+                    logger.debug("update() connectivity error for %s [%s]: %s", plugin_id, mode, e)
+                except Exception as e:  # noqa: BLE001 — a non-network update() failure is a real bug
+                    # A regression in update() must not pass green just because
+                    # display() survives, so treat it as a failure of this render.
+                    result.error = repr(e)
+                    logger.warning("update() raised a non-connectivity error for %s [%s]: %s",
+                                   plugin_id, mode, e)
+            if result.error is None:
+                _render_mode(inst, mode)
+                result.image = dm.get_image()
+                result.overflow = dm.check_overflow()
+        except Exception as e:  # noqa: BLE001 — a display crash is a real failure
+            result.error = repr(e)
+        results.append(result)
+
+    return results
+
+
+# ---------------------------------------------------------------------------
+# Golden-image comparison
+# ---------------------------------------------------------------------------
+
+def compare_images(rendered: Image.Image, golden: Image.Image,
+                   max_delta: int = 0, max_diff_pixels: int = 0) -> Tuple[bool, int, int]:
+    """Compare two images. Returns (ok, diff_pixel_count, max_per_channel_delta).
+
+    Tolerances default to exact match; bump them only to absorb known platform
+    anti-aliasing noise (requires a pinned Pillow + bundled fonts for stability).
+    """
+    if rendered.size != golden.size:
+        return False, rendered.size[0] * rendered.size[1], 255
+    a = rendered.convert("RGB")
+    b = golden.convert("RGB")
+    diff = ImageChops.difference(a, b)
+    bbox = diff.getbbox()
+    if bbox is None:
+        return True, 0, 0
+    # Count pixels whose largest per-channel delta exceeds the allowed tolerance,
+    # and track the worst delta seen (for reporting).
+    diff_pixels = 0
+    observed_max = 0
+    for px in diff.crop(bbox).getdata():
+        m = max(px) if isinstance(px, tuple) else px
+        if m > observed_max:
+            observed_max = m
+        if m > max_delta:
+            diff_pixels += 1
+    # Pass when the number of out-of-tolerance pixels is within budget.
+    ok = diff_pixels <= max_diff_pixels
+    return ok, diff_pixels, observed_max
+
+
+def golden_path(golden_dir: Path, width: int, height: int, mode: str) -> Path:
+    """Location of a golden image: <golden_dir>/<WxH>/<mode>.png.
+
+    The mode is sanitized to a safe basename so a mode name with '/' or '..'
+    can't read or write outside the golden directory.
+    """
+    return Path(golden_dir) / size_label(width, height) / f"{safe_mode_filename(mode)}.png"
+
+
+def compare_to_goldens(results: List[RenderResult], golden_dir: Path,
+                       max_delta: int = 0, max_diff_pixels: int = 0) -> List[RenderResult]:
+    """Compare rendered results against committed goldens, mutating each result's
+    golden_* fields. Results with no golden file on disk are left unchecked."""
+    for r in results:
+        if r.image is None:
+            continue
+        gp = golden_path(golden_dir, r.width, r.height, r.mode)
+        if not gp.exists():
+            continue
+        r.golden_checked = True
+        with Image.open(gp) as g:
+            ok, diff_pixels, observed_max = compare_images(
+                r.image, g, max_delta=max_delta, max_diff_pixels=max_diff_pixels)
+        r.golden_ok = ok
+        r.golden_diff_pixels = diff_pixels
+        r.golden_max_delta = observed_max
+    return results
+
+
+def write_goldens(results: List[RenderResult], golden_dir: Path) -> int:
+    """Write each successfully-rendered result to its golden path. Returns count."""
+    written = 0
+    for r in results:
+        if r.image is None or r.error is not None:
+            continue
+        gp = golden_path(golden_dir, r.width, r.height, r.mode)
+        gp.parent.mkdir(parents=True, exist_ok=True)
+        r.image.save(gp, format="PNG")
+        written += 1
+    return written
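Note on the two tolerances in compare_images: max_delta is a per-pixel threshold (largest per-channel difference that is still treated as noise), while max_diff_pixels is a budget for how many pixels may exceed that threshold. The sketch below restates that counting rule on plain lists of RGB tuples so it runs without Pillow. The helper name count_out_of_tolerance and the sample pixel values are illustrative assumptions, not part of the patch.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def count_out_of_tolerance(rendered: Sequence[Pixel], golden: Sequence[Pixel],
                           max_delta: int = 0) -> Tuple[int, int]:
    """Hypothetical helper: return (diff_pixels, observed_max) for two pixel lists.

    A pixel counts as different only when its largest per-channel delta is
    strictly greater than max_delta; observed_max tracks the worst delta seen,
    whether or not it was within tolerance.
    """
    diff_pixels = 0
    observed_max = 0
    for a, b in zip(rendered, golden):
        worst = max(abs(x - y) for x, y in zip(a, b))
        observed_max = max(observed_max, worst)
        if worst > max_delta:
            diff_pixels += 1
    return diff_pixels, observed_max


golden = [(100, 100, 100)] * 4
rendered = [(100, 100, 100), (103, 100, 100), (100, 100, 140), (100, 100, 100)]

# Exact match required: both changed pixels count.
exact = count_out_of_tolerance(rendered, golden, max_delta=0)

# A delta of 3 is absorbed as noise, the delta of 40 is not.
tolerant = count_out_of_tolerance(rendered, golden, max_delta=5)

# The comparison passes only when the out-of-tolerance count fits the budget.
passes_with_budget_0 = tolerant[0] <= 0
passes_with_budget_1 = tolerant[0] <= 1
```

Raising max_delta hides small per-pixel drift everywhere, whereas raising max_diff_pixels permits a few arbitrarily large differences, so the two knobs are not interchangeable.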
diff --git a/src/plugin_system/testing/loading.py b/src/plugin_system/testing/loading.py
new file mode 100644
index 00000000..ed692d5f
--- /dev/null
+++ b/src/plugin_system/testing/loading.py
@@ -0,0 +1,82 @@
+"""
+Shared helpers for loading a plugin headlessly.
+
+Used by scripts/render_plugin.py, scripts/check_plugin.py, and the harness so
+plugin discovery / manifest / config-default logic lives in exactly one place.
+"""
+
+import json
+from pathlib import Path
+from typing import Any, Dict, Optional, Sequence, Union
+
+
+def find_plugin_dir(plugin_id: str, search_dirs: Sequence[Union[str, Path]]) -> Optional[Path]:
+    """Find a plugin directory by searching multiple paths."""
+    from src.plugin_system.plugin_loader import PluginLoader
+    loader = PluginLoader()
+    for search_dir in search_dirs:
+        search_path = Path(search_dir)
+        if not search_path.exists():
+            continue
+        result = loader.find_plugin_directory(plugin_id, search_path)
+        if result:
+            return Path(result)
+    return None
+
+
+def load_manifest(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Load and return manifest.json from a plugin directory."""
+    manifest_path = Path(plugin_dir) / 'manifest.json'
+    if not manifest_path.exists():
+        raise FileNotFoundError(f"No manifest.json in {plugin_dir}")
+    with open(manifest_path, 'r', encoding='utf-8') as f:
+        return json.load(f)
+
+
+def load_config_defaults(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Extract default values from a plugin's config_schema.json (empty if none)."""
+    schema_path = Path(plugin_dir) / 'config_schema.json'
+    if not schema_path.exists():
+        return {}
+    with open(schema_path, 'r', encoding='utf-8') as f:
+        schema = json.load(f)
+    defaults: Dict[str, Any] = {}
+    for key, prop in schema.get('properties', {}).items():
+        if isinstance(prop, dict) and 'default' in prop:
+            defaults[key] = prop['default']
+    return defaults
+
+
+def load_harness_spec(plugin_dir: Union[str, Path]) -> Dict[str, Any]:
+    """Optional per-plugin harness settings from <plugin>/test/harness.json.
+
+    Lets a plugin opt into golden-image testing by declaring how to render it
+    deterministically. All keys optional ("sizes" is described in sizes.py):
+        {
+          "config":     {...},            # config overrides
+          "mock_data":  "fixtures/mock.json",  # path (relative to plugin dir) to cache fixtures
+          "freeze_time": "2025-08-01 15:25:00",
+          "skip_update": false
+        }
+    Returns {} when no harness.json exists.
+    """
+    spec_path = Path(plugin_dir) / 'test' / 'harness.json'
+    if not spec_path.exists():
+        return {}
+    with open(spec_path, 'r', encoding='utf-8') as f:
+        spec = json.load(f)
+
+    # Resolve the mock_data path and inline its contents as 'mock_data_contents'.
+    mock_rel = spec.get('mock_data')
+    if mock_rel:
+        mock_path = Path(plugin_dir) / mock_rel
+        if not mock_path.exists():
+            # A declared-but-missing fixture is a harness config error: failing
+            # loudly beats silently rendering the plugin with no mock data.
+            raise FileNotFoundError(
+                f"harness.json references mock_data '{mock_rel}' but "
+                f"{mock_path} does not exist"
+            )
+        with open(mock_path, 'r', encoding='utf-8') as mf:
+            spec['mock_data_contents'] = json.load(mf)
+    return spec
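Note on load_config_defaults and the config merge: only defaults declared directly under the schema's top-level "properties" are collected, and the harness then layers the caller's config on top. The sketch below reproduces that behaviour on an in-memory dict. The function name top_level_defaults and the sample schema keys (brightness, layout, api_key) are made up for illustration.

```python
from typing import Any, Dict


def top_level_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the default extraction in loading.py."""
    defaults: Dict[str, Any] = {}
    for key, prop in schema.get("properties", {}).items():
        if isinstance(prop, dict) and "default" in prop:
            defaults[key] = prop["default"]
    return defaults


# Illustrative schema: one top-level default, one nested default, one key with none.
schema = {
    "properties": {
        "brightness": {"type": "integer", "default": 80},
        "layout": {
            "type": "object",
            "properties": {"font_size": {"type": "integer", "default": 8}},
        },
        "api_key": {"type": "string"},
    }
}

defaults = top_level_defaults(schema)

# Same precedence as render_plugin_matrix: enabled flag, then schema defaults,
# then explicit caller config, which wins on conflicts.
config = {"enabled": True, **defaults, **{"brightness": 40}}
```

Under these assumptions the nested font_size default is not picked up, so a plugin that relies on nested defaults would need them supplied through the "config" key of its harness.json.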
diff --git a/src/plugin_system/testing/mocks.py b/src/plugin_system/testing/mocks.py
index a9b0f2bd..df436acd 100644
--- a/src/plugin_system/testing/mocks.py
+++ b/src/plugin_system/testing/mocks.py
@@ -63,11 +63,23 @@ class MockCacheManager:
     """Mock cache manager for testing."""
     
     def __init__(self):
+        import shutil
+        import tempfile
+        import weakref
         self._cache: Dict[str, Any] = {}
         self._cache_timestamps: Dict[str, float] = {}
         self.get_calls = []
         self.set_calls = []
         self.delete_calls = []
+        # Real temp dir for plugins that write/read files under cache_dir.
+        # Registered for cleanup so each mock instance doesn't leak a tmp dir.
+        self.cache_dir = tempfile.mkdtemp(prefix="ledmatrix-mock-cache-")
+        self._finalizer = weakref.finalize(
+            self, shutil.rmtree, self.cache_dir, ignore_errors=True)
+
+    def cleanup(self) -> None:
+        """Remove the temp cache directory created for this instance."""
+        self._finalizer()
     
     def get(self, key: str, max_age: Optional[float] = None) -> Optional[Any]:
         """Get a value from cache."""
diff --git a/src/plugin_system/testing/sizes.py b/src/plugin_system/testing/sizes.py
new file mode 100644
index 00000000..793dcbc3
--- /dev/null
+++ b/src/plugin_system/testing/sizes.py
@@ -0,0 +1,120 @@
+"""
+LED matrix sizes the plugin safety harness renders against.
+
+There is no fixed set of "supported" panel sizes — an RGB matrix build can be
+any width/height and configuration (square, rectangle, 2x2, 4x4, 8x2, long
+strips, tall stacks, ...). Plugins are expected to read width/height
+dynamically and lay themselves out accordingly, so the harness's job is to
+prove a plugin survives a *spread* of shapes, not a canonical list.
+
+`DEFAULT_TEST_SIZES` is therefore a representative SAMPLE chosen to span the
+axes of variation (narrow, wide, square, tall, small, long), not an
+exhaustive or authoritative list. Callers can override it entirely:
+
+  - CLI:        scripts/check_plugin.py --sizes 8x16,64x64,256x32
+  - pytest:     LEDMATRIX_TEST_SIZES="8x16,64x64" env var (all plugins), or
+                per-plugin test/harness.json {"sizes": [[8, 16], [64, 64]]}
+
+so anyone can point the harness at the exact panel(s) their build uses.
+"""
+
+import os
+from typing import Iterable, List, Optional, Sequence, Tuple, Union
+
+# A spread of real panel-grid arrangements (each module is 64x32), not a list of
+# "blessed" sizes. Each entry exercises a different layout assumption a plugin
+# might accidentally bake in. Annotations are the panel grid (cols x rows).
+DEFAULT_TEST_SIZES: List[Tuple[int, int]] = [
+    (64, 32),    # 1x1 — single panel, the tightest common rectangle
+    (128, 32),   # 2x1 — the baseline most plugins are tuned for
+    (64, 64),    # 1x2 — stacked, exercises tall-narrow centering
+    (128, 64),   # 2x2 — block, icon scaling / vertical centering
+    (256, 32),   # 4x1 — long strip, wide horizontal layout
+    (128, 96),   # 2x3 — tall, exercises vertical overflow
+    (256, 128),  # 4x4 — large block, both dimensions big at once
+]
+
+# Backwards-compatible alias. Prefer DEFAULT_TEST_SIZES in new code — the old
+# name implied these were the only valid panel sizes, which they are not.
+SUPPORTED_SIZES = DEFAULT_TEST_SIZES
+
+
+def size_label(width: int, height: int) -> str:
+    """Human/path-friendly label for a size, e.g. '128x32'."""
+    return f"{width}x{height}"
+
+
+def parse_size_token(token: str) -> Tuple[int, int]:
+    """Parse a single 'WxH' token into an (int, int) pair.
+
+    Raises ValueError (with a user-friendly message) on malformed input so
+    callers can surface it however they like.
+    """
+    cleaned = token.strip().lower()
+    if "x" not in cleaned:
+        raise ValueError(f"Invalid size '{token}' (expected WxH, e.g. 128x32)")
+    w, h = cleaned.split("x", 1)
+    try:
+        width, height = int(w), int(h)
+    except ValueError as exc:
+        raise ValueError(
+            f"Invalid size '{token}' (expected numeric WxH, e.g. 128x32)"
+        ) from exc
+    if width <= 0 or height <= 0:
+        raise ValueError(
+            f"Invalid size '{token}' (width and height must be positive, e.g. 128x32)"
+        )
+    return (width, height)
+
+
+def coerce_sizes(
+    value: Union[str, Iterable[Sequence[int]], None]
+) -> Optional[List[Tuple[int, int]]]:
+    """Normalize a size spec into a list of (w, h) tuples, or None if empty.
+
+    Accepts a comma-separated 'WxH,WxH' string (CLI / env var) or an iterable
+    of [w, h] / (w, h) pairs (harness.json). Returns None when value is falsy
+    so callers can fall back to the default sample.
+    """
+    if not value:
+        return None
+    if isinstance(value, str):
+        return [parse_size_token(tok) for tok in value.split(",") if tok.strip()] or None
+    sizes: List[Tuple[int, int]] = []
+    for pair in value:
+        w, h = pair  # raises if not a 2-element sequence
+        width, height = int(w), int(h)
+        if width <= 0 or height <= 0:
+            raise ValueError(f"Invalid size pair {pair!r} (width and height must be positive)")
+        sizes.append((width, height))
+    return sizes or None
+
+
+def resolve_test_sizes(
+    spec_sizes: Union[str, Iterable[Sequence[int]], None] = None,
+) -> List[Tuple[int, int]]:
+    """Decide which sizes to render, by precedence:
+
+    1. LEDMATRIX_TEST_SIZES env var — a global "test on my hardware" override
+       that wins for every plugin.
+    2. spec_sizes — e.g. a per-plugin harness.json "sizes" list.
+    3. DEFAULT_TEST_SIZES — the representative sample.
+    """
+    env = coerce_sizes(os.environ.get("LEDMATRIX_TEST_SIZES"))
+    if env:
+        return env
+    spec = coerce_sizes(spec_sizes)
+    if spec:
+        return spec
+    return list(DEFAULT_TEST_SIZES)
+
+
+def safe_mode_filename(mode: str) -> str:
+    """A filesystem-safe basename for a plugin mode.
+
+    Mode names come from plugin metadata/render state, so a value containing
+    '/' or '..' could otherwise escape the intended output directory. Collapse
+    anything that isn't alphanumeric / dash / underscore to '_'.
+    """
+    cleaned = "".join(ch if ch.isalnum() or ch in ("-", "_") else "_" for ch in mode)
+    return cleaned or "mode"
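Note on safe_mode_filename: the sanitizer maps every character outside alphanumerics, dash and underscore to an underscore, which keeps golden paths inside the golden directory but also means two distinct mode names can land on the same file. The sketch below copies the rule into a standalone function to show both effects. The function name sanitize_mode, the mode names, and the test/golden prefix are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def sanitize_mode(mode: str) -> str:
    """Standalone copy of the sanitizing rule, for illustration only."""
    cleaned = "".join(ch if ch.isalnum() or ch in ("-", "_") else "_" for ch in mode)
    return cleaned or "mode"


# Path-traversal characters are collapsed, so the name stays a plain basename.
traversal = sanitize_mode("../../etc/passwd")

# Ordinary mode names pass through unchanged; an empty name gets a placeholder.
plain = sanitize_mode("scores-2_up")
empty = sanitize_mode("")

# Distinct mode names can collide after sanitizing.
collides = sanitize_mode("nfl/live") == sanitize_mode("nfl_live")

# Layout used for goldens: <golden_dir>/<WxH>/<mode>.png
golden = PurePosixPath("test/golden") / f"{128}x{32}" / f"{traversal}.png"
```

If a plugin ever declares modes that differ only in punctuation, their goldens would overwrite each other under this rule, which may be worth a uniqueness check at the call site.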
diff --git a/test/plugins/test_harness.py b/test/plugins/test_harness.py
new file mode 100644
index 00000000..11dec7e7
--- /dev/null
+++ b/test/plugins/test_harness.py
@@ -0,0 +1,182 @@
+"""
+Unit tests for the plugin safety harness primitives:
+bounds detection, image comparison, and mode enumeration.
+
+These don't load real plugins, so they run anywhere (including core CI where
+plugin-repos is empty).
+"""
+
+import pytest
+from PIL import Image
+
+from src.plugin_system.testing.bounds_display_manager import BoundsCheckingDisplayManager
+from src.plugin_system.testing.harness import (
+    _TOLERATED_UPDATE_ERRORS, compare_images, list_modes,
+)
+from src.plugin_system.testing.sizes import (
+    DEFAULT_TEST_SIZES, coerce_sizes, parse_size_token, resolve_test_sizes,
+)
+
+
+class TestBoundsDetection:
+    def test_reports_declared_size_not_canvas_size(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        assert dm.width == 64 and dm.height == 32
+        assert dm.matrix.width == 64 and dm.matrix.height == 32
+        # Backing canvas is padded out past the declared panel so far-overshoot
+        # coordinates land on-canvas and get flagged instead of clipped.
+        canvas_w, canvas_h = dm.image.size
+        assert canvas_w > 64 and canvas_h > 32
+
+    def test_far_overshoot_on_small_panel_is_detected(self):
+        # A coordinate meant for a wide build (x past 64) must still be caught
+        # when the declared panel is only 64 wide.
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[0] >= 64
+
+    def test_in_bounds_drawing_has_no_overflow(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([0, 0, 63, 31], fill=(255, 255, 255))
+        assert dm.check_overflow() is None
+
+    def test_right_overflow_is_detected(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        # Draw a few pixels past the right edge.
+        dm.draw.rectangle([60, 5, 70, 10], fill=(255, 0, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[0] >= 64  # overflow starts at or past the declared width
+
+    def test_bottom_overflow_is_detected(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        dm.draw.rectangle([5, 30, 10, 40], fill=(0, 255, 0))
+        bbox = dm.check_overflow()
+        assert bbox is not None
+        assert bbox[3] > 32  # overflow extends past the declared height
+
+    def test_declared_image_is_cropped_to_panel(self):
+        dm = BoundsCheckingDisplayManager(width=64, height=32)
+        assert dm.get_image().size == (64, 32)
+
+    def test_snapshot_saves_cropped_panel(self, tmp_path):
+        dm = BoundsCheckingDisplayManager(width=128, height=32)
+        out = tmp_path / "snap.png"
+        dm.save_snapshot(str(out))
+        with Image.open(out) as img:
+            assert img.size == (128, 32)
+
+
+class TestArbitraryPanelSizes:
+    """The harness must handle any panel shape, not a fixed supported list."""
+
+    def test_overflow_extent_pads_to_largest_in_run(self):
+        # A wide run (extent 256) means content at x=200 on a 64-wide panel is
+        # caught; the same draw with a small extent would be clipped (false pass).
+        wide = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(256, 32))
+        wide.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        assert wide.check_overflow() is not None
+
+        tight = BoundsCheckingDisplayManager(width=64, height=32, overflow_extent=(64, 32))
+        tight.draw.rectangle([200, 5, 210, 10], fill=(255, 0, 0))
+        assert tight.check_overflow() is None  # clipped beyond the small canvas
+
+    def test_unusual_shapes_report_their_declared_size(self):
+        for w, h in [(8, 2), (6, 6), (200, 8), (64, 96)]:
+            dm = BoundsCheckingDisplayManager(width=w, height=h)
+            assert dm.width == w and dm.height == h
+            assert dm.matrix.width == w and dm.matrix.height == h
+
+
+class TestUpdateErrorClassification:
+    """update() may fail for lack of network (tolerated) but a logic bug must
+    not pass green just because display() survives."""
+
+    def test_connectivity_errors_are_tolerated(self):
+        import socket
+        import urllib.error
+        for exc in (ConnectionError("x"), TimeoutError("x"), socket.gaierror("x"),
+                    urllib.error.URLError("x")):
+            assert isinstance(exc, _TOLERATED_UPDATE_ERRORS)
+
+    def test_logic_errors_are_not_tolerated(self):
+        for exc in (ValueError("x"), KeyError("x"), AttributeError("x"), TypeError("x")):
+            assert not isinstance(exc, _TOLERATED_UPDATE_ERRORS)
+
+
+class TestSizeParsing:
+    def test_parse_size_token_ok(self):
+        assert parse_size_token(" 128X32 ") == (128, 32)
+
+    def test_parse_size_token_rejects_garbage(self):
+        with pytest.raises(ValueError):
+            parse_size_token("128xabc")
+        with pytest.raises(ValueError):
+            parse_size_token("128-32")
+
+    def test_rejects_non_positive_dimensions(self):
+        for bad in ("0x32", "-64x32", "64x0", "64x-1"):
+            with pytest.raises(ValueError):
+                parse_size_token(bad)
+        with pytest.raises(ValueError):
+            coerce_sizes([[0, 32]])
+        with pytest.raises(ValueError):
+            coerce_sizes("64x-1")
+
+    def test_coerce_sizes_from_string_and_pairs(self):
+        assert coerce_sizes("8x16,64x64") == [(8, 16), (64, 64)]
+        assert coerce_sizes([[8, 16], (64, 64)]) == [(8, 16), (64, 64)]
+        assert coerce_sizes(None) is None
+        assert coerce_sizes("") is None
+
+    def test_resolve_precedence_env_then_spec_then_default(self, monkeypatch):
+        monkeypatch.delenv("LEDMATRIX_TEST_SIZES", raising=False)
+        assert resolve_test_sizes(None) == list(DEFAULT_TEST_SIZES)
+        assert resolve_test_sizes([[8, 16]]) == [(8, 16)]
+        monkeypatch.setenv("LEDMATRIX_TEST_SIZES", "5x5")
+        # env wins over a per-plugin spec
+        assert resolve_test_sizes([[8, 16]]) == [(5, 5)]
+
+
+class TestCompareImages:
+    def test_identical_images_match(self):
+        a = Image.new("RGB", (16, 16), (10, 20, 30))
+        b = a.copy()
+        ok, diff_pixels, max_delta = compare_images(a, b)
+        assert ok and diff_pixels == 0 and max_delta == 0
+
+    def test_different_images_fail_at_zero_tolerance(self):
+        a = Image.new("RGB", (16, 16), (0, 0, 0))
+        b = a.copy()
+        b.putpixel((1, 1), (255, 255, 255))
+        ok, diff_pixels, max_delta = compare_images(a, b)
+        assert not ok and diff_pixels == 1 and max_delta == 255
+
+    def test_tolerance_absorbs_small_noise(self):
+        a = Image.new("RGB", (16, 16), (100, 100, 100))
+        b = a.copy()
+        b.putpixel((2, 2), (103, 100, 100))  # delta 3
+        ok, _, max_delta = compare_images(a, b, max_delta=5, max_diff_pixels=0)
+        assert ok and max_delta == 3
+
+    def test_size_mismatch_fails(self):
+        a = Image.new("RGB", (16, 16))
+        b = Image.new("RGB", (32, 16))
+        ok, _, _ = compare_images(a, b)
+        assert not ok
+
+
+class TestListModes:
+    def test_instance_modes_take_precedence(self):
+        inst = type("P", (), {"modes": ["a", "b"]})()
+        assert list_modes(inst, {"display_modes": ["x"]}, "pid") == ["a", "b"]
+
+    def test_falls_back_to_manifest_display_modes(self):
+        inst = type("P", (), {})()
+        assert list_modes(inst, {"display_modes": ["x", "y"]}, "pid") == ["x", "y"]
+
+    def test_falls_back_to_plugin_id(self):
+        inst = type("P", (), {})()
+        assert list_modes(inst, {}, "pid") == ["pid"]
diff --git a/test/plugins/test_plugin_matrix.py b/test/plugins/test_plugin_matrix.py
new file mode 100644
index 00000000..788754fa
--- /dev/null
+++ b/test/plugins/test_plugin_matrix.py
@@ -0,0 +1,115 @@
+"""
+Cross-size / cross-screen plugin safety test.
+
+For every discovered plugin, render every declared screen at every selected
+matrix size and assert it: loads, renders without crashing, stays within the
+panel bounds, and — for plugins that ship golden images — matches them.
+
+Plugin discovery (first match wins):
+  - $LEDMATRIX_PLUGINS_DIR  (os.pathsep-separated list of dirs), else
+  - <project_root>/plugin-repos and <project_root>/plugins
+
+A plugin opts into golden-image checks by adding test/golden/<WxH>/<mode>.png
+(and usually test/harness.json for deterministic config / mock data / time).
+"""
+
+import os
+from pathlib import Path
+from typing import Dict, List
+
+import pytest
+
+from src.plugin_system.testing.harness import (
+    render_plugin_matrix, compare_to_goldens,
+)
+from src.plugin_system.testing.loading import load_config_defaults, load_harness_spec
+from src.plugin_system.testing.sizes import resolve_test_sizes
+
+PROJECT_ROOT = Path(__file__).resolve().parents[2]
+
+# Set LEDMATRIX_REQUIRE_PLUGINS=1 in any CI/hardware pipeline where plugins are
+# expected to be present, so a discovery drift (empty search path) fails loudly
+# instead of silently skipping and losing this safety signal.
+_REQUIRE_PLUGINS = os.environ.get("LEDMATRIX_REQUIRE_PLUGINS") == "1"
+
+
+def _plugin_search_dirs() -> List[Path]:
+    env = os.environ.get("LEDMATRIX_PLUGINS_DIR")
+    if env:
+        return [Path(p) for p in env.split(os.pathsep) if p]
+    return [PROJECT_ROOT / "plugin-repos", PROJECT_ROOT / "plugins"]
+
+
+def _discover() -> Dict[str, Path]:
+    """Map plugin_id -> plugin_dir for all plugins on the search path."""
+    found: Dict[str, Path] = {}
+    for base in _plugin_search_dirs():
+        if not base.exists():
+            continue
+        for child in sorted(base.iterdir()):
+            if (child / "manifest.json").exists() and child.name not in found:
+                found[child.name] = child
+    return found
+
+
+_PLUGINS = _discover()
+
+
+@pytest.mark.plugin
+def test_plugins_were_discovered() -> None:
+    """Guard against silently skipping the whole matrix when discovery drifts.
+
+    Local dev and the plugin-less core CI legitimately have no plugins, so we
+    skip there; but when LEDMATRIX_REQUIRE_PLUGINS=1 an empty search path is a
+    hard failure rather than a green no-op.
+    """
+    if _PLUGINS:
+        return
+    search = [str(p) for p in _plugin_search_dirs()]
+    if _REQUIRE_PLUGINS:
+        pytest.fail(
+            "LEDMATRIX_REQUIRE_PLUGINS=1 but no plugins were discovered on the "
+            f"search path: {search}"
+        )
+    pytest.skip(f"no plugins found on the search path: {search}")
+
+
+@pytest.mark.plugin
+@pytest.mark.skipif(not _PLUGINS, reason="no plugins found on the search path")
+@pytest.mark.parametrize("plugin_id", sorted(_PLUGINS))
+def test_plugin_renders_across_sizes_and_screens(plugin_id: str) -> None:
+    plugin_dir = _PLUGINS[plugin_id]
+    spec = load_harness_spec(plugin_dir)
+
+    config = {"enabled": True}
+    config.update(load_config_defaults(plugin_dir))
+    config.update(spec.get("config", {}))
+
+    # Sizes: LEDMATRIX_TEST_SIZES env (test on real hardware) wins, then the
+    # plugin's own harness.json "sizes", else the default representative sample.
+    sizes = resolve_test_sizes(spec.get("sizes"))
+
+    results = render_plugin_matrix(
+        plugin_id=plugin_id,
+        plugin_dir=plugin_dir,
+        config=config,
+        mock_data=spec.get("mock_data_contents", {}),
+        sizes=sizes,
+        run_update=not spec.get("skip_update", False),
+        freeze_time=spec.get("freeze_time"),
+    )
+    compare_to_goldens(results, plugin_dir / "test" / "golden")
+
+    failures = []
+    for r in results:
+        if r.error is not None:
+            failures.append(f"{r.size_label} {r.mode}: crashed: {r.error}")
+        elif r.overflow is not None:
+            failures.append(f"{r.size_label} {r.mode}: overflow past panel bbox={r.overflow}")
+        elif r.golden_checked and r.golden_ok is False:
+            failures.append(
+                f"{r.size_label} {r.mode}: golden drift {r.golden_diff_pixels}px "
+                f"(max Δ={r.golden_max_delta})"
+            )
+
+    assert not failures, f"{plugin_id} failed:\n  " + "\n  ".join(failures)