LEDMatrix/src/plugin_system
Chuck 65e3e8319b fix(plugin_manager): prevent permanent ERROR state after update timeout (#316)
* fix(plugin_manager): prevent permanent ERROR state after update timeout

When execute_update() fails (timeout or unhandled exception), the plugin
state was set to ERROR with no recovery path. can_execute() returns False
for ERROR state, so the plugin's update() was never called again, leaving
it showing stale data indefinitely.

Instead, update plugin_last_update so the plugin waits one configured
interval before retrying, and keep the state ENABLED so recovery is
automatic on the next cycle.
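
The before/after behaviour can be sketched in a few lines. This is a minimal illustration, not the plugin manager's actual code: `MiniScheduler`, `due()`, `on_failure_old()` and `on_failure_new()` are hypothetical names, and only `can_execute()`, the ENABLED/ERROR states and the "stamp last update on failure" idea come from the description above.

```python
from enum import Enum


class PluginState(Enum):
    ENABLED = "enabled"
    ERROR = "error"


class MiniScheduler:
    """Hypothetical stand-in for one plugin's slot in the update loop."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.state = PluginState.ENABLED
        self.last_update = 0.0

    def can_execute(self) -> bool:
        # As described above: a plugin in ERROR state is never run again.
        return self.state is PluginState.ENABLED

    def due(self, now: float) -> bool:
        # A plugin runs only if it is executable and its interval has elapsed.
        return self.can_execute() and now - self.last_update >= self.interval

    def on_failure_old(self) -> None:
        # Old behaviour: ERROR with no recovery path.
        self.state = PluginState.ERROR

    def on_failure_new(self, now: float) -> None:
        # New behaviour: wait one interval, stay ENABLED, retry automatically.
        self.last_update = now
        self.state = PluginState.ENABLED
```

With the old handler `due()` never returns True again; with the new one it returns True once a full interval has passed since the failure.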

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plugin_manager): address PR review — failure timestamp and error context

- Use time.time() at the point of failure instead of reusing current_time
  (captured before execution), so a full retry interval always elapses
  after a timeout instead of an interval shortened by the execution time

- Add PluginStateManager.set_error_info() to persist structured error context
  without changing plugin state; call it in both failure branches so
  get_error_info() / get_state_info() surface recoverable errors alongside
  ERROR-state errors

- Add warning log on the success=False branch (was previously silent)

- Pass a descriptive Exception (not a generic "Plugin execution failed") to
  health_tracker.record_failure() in the timeout/executor-error path
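
A worked example of the timestamp point, with illustrative numbers (a 60 s interval and an update that times out after 30 s; neither value comes from the codebase). `seconds_until_retry` is a hypothetical helper written only for this arithmetic.

```python
def seconds_until_retry(stamp: float, failure_time: float, interval: float) -> float:
    """Time left, measured from the failure, until the plugin is due again."""
    return max(0.0, stamp + interval - failure_time)


interval = 60.0
current_time = 1000.0            # captured before execution started
failure_time = current_time + 30.0  # the update timed out 30 s later

# Reusing the pre-execution timestamp shortens the wait by the execution time.
reused = seconds_until_retry(current_time, failure_time, interval)    # 30.0

# Stamping at the point of failure gives the full interval.
at_failure = seconds_until_retry(failure_time, failure_time, interval)  # 60.0
```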

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plugin_manager): atomic state+error write via set_state_with_error

The two-step set_state() / set_error_info() sequence left a window where
readers could observe ENABLED state without the accompanying error context.

Add threading.RLock to PluginStateManager and a new set_state_with_error()
method that holds the lock for both the state-transition write and the
_error_info write together. The method inlines the state-transition logic
instead of calling set_state(), deliberately skipping set_state()'s side
effect of clearing _error_info for non-ERROR states: the recoverable
error dict is exactly what must stay stored.

Replace both paired set_state / set_error_info call sites in
run_scheduled_updates() with the single atomic method.
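
A rough sketch of the atomic write, assuming a much simpler state manager than the real one. `StateManagerSketch` and `snapshot()` are hypothetical, and state history and transition validation are omitted. Only the `RLock`, the `_states` / `_error_info` attributes and the `set_state()` / `set_state_with_error()` method names are taken from the description above.

```python
import threading
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class PluginState(Enum):
    ENABLED = "enabled"
    ERROR = "error"


class StateManagerSketch:
    """Hypothetical, reduced version of PluginStateManager."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._states: Dict[str, PluginState] = {}
        self._error_info: Dict[str, Dict[str, Any]] = {}

    def set_state(self, plugin_id: str, state: PluginState) -> None:
        with self._lock:
            self._states[plugin_id] = state
            if state is not PluginState.ERROR:
                # The side effect that set_state_with_error must avoid.
                self._error_info.pop(plugin_id, None)

    def set_state_with_error(
        self, plugin_id: str, state: PluginState, error_info: Dict[str, Any]
    ) -> None:
        # Both writes happen under one lock acquisition, so a reader that also
        # takes the lock never sees the new state without its error context.
        with self._lock:
            self._states[plugin_id] = state
            self._error_info[plugin_id] = dict(error_info)

    def snapshot(
        self, plugin_id: str
    ) -> Tuple[Optional[PluginState], Optional[Dict[str, Any]]]:
        # Hypothetical reader that observes state and error info together.
        with self._lock:
            info = self._error_info.get(plugin_id)
            return self._states.get(plugin_id), (dict(info) if info is not None else None)
```

Calling `set_state_with_error("clock", PluginState.ENABLED, {...})` leaves the plugin ENABLED with its error dict intact, whereas a later plain `set_state("clock", PluginState.ENABLED)` clears it.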

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plugin_state): lock _error_info accesses and store defensive copies

Three verified issues:

- set_error_info wrote _error_info without holding _lock and stored the
  caller's dict by reference, allowing races and post-write mutation
- set_state_with_error stored error_info by reference (lock was already held)
- get_error_info read _error_info without _lock and returned the live
  reference, letting callers mutate the stored snapshot

Implicit fourth fix: set_state also wrote _error_info without _lock; locking
get_error_info while leaving that writer unguarded would have created a new
race, so set_state is now wrapped in _lock too for consistency.

Changes:
- set_state: wrap entire body in self._lock (covers _states, _state_history,
  and _error_info writes atomically; ERROR-path _error_info value was already
  a fresh dict literal so no copy needed)
- set_error_info: acquire self._lock + store dict(error_info) shallow copy
- set_state_with_error: store dict(error_info) shallow copy (lock already held)
- get_error_info: acquire self._lock + return dict(info) copy or None

All stored values are flat dicts of strings/floats/bools, so a shallow copy
is sufficient and deepcopy is not needed.
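
The copy-in / copy-out pattern, shown in isolation. `ErrorInfoStore` is a hypothetical class used only for illustration; the method names `set_error_info()` and `get_error_info()` and the `dict(...)` shallow copies follow the change list above, the rest is assumed.

```python
import threading
from typing import Any, Dict, Optional


class ErrorInfoStore:
    """Hypothetical holder for per-plugin error dicts."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._error_info: Dict[str, Dict[str, Any]] = {}

    def set_error_info(self, plugin_id: str, error_info: Dict[str, Any]) -> None:
        with self._lock:
            # Copy on the way in: later mutation of the caller's dict
            # cannot change what is stored.
            self._error_info[plugin_id] = dict(error_info)

    def get_error_info(self, plugin_id: str) -> Optional[Dict[str, Any]]:
        with self._lock:
            info = self._error_info.get(plugin_id)
            # Copy on the way out: callers get a snapshot, not the live dict.
            return dict(info) if info is not None else None
```

A shallow copy protects only the top level, which is enough here because the values are described as flat strings, floats and bools.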

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plugin_manager): apply recovery logic to update_all_plugins; extract helper

update_all_plugins still set PluginState.ERROR on both failure paths, leaving
it inconsistent with the run_scheduled_updates fix from the same PR.

Extract _record_update_failure(plugin_id, exc=None) to hold all shared failure
logic: capture actual failure time, build structured error_info, log the retry
warning, stamp plugin_last_update, call set_state_with_error(ENABLED), and
forward to health_tracker. Replace all four failure sites (two in
run_scheduled_updates, two in update_all_plugins) with calls to this helper.
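
One possible shape for the shared helper, following the steps listed above. Everything except the `_record_update_failure(plugin_id, exc=None)` signature and the `plugin_last_update` name is an assumption: `ManagerSketch` is hypothetical, the `error_info` keys are invented for illustration, and the two lists stand in for `set_state_with_error()` and `health_tracker.record_failure()`.

```python
import logging
import time
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger("plugin_manager_sketch")


class ManagerSketch:
    """Hypothetical manager with stand-ins for its collaborators."""

    def __init__(self) -> None:
        self.plugin_last_update: Dict[str, float] = {}
        # Stand-in for PluginStateManager.set_state_with_error calls.
        self.state_writes: List[Tuple[str, str, Dict[str, Any]]] = []
        # Stand-in for health_tracker.record_failure calls.
        self.failures: List[Tuple[str, Exception]] = []

    def _record_update_failure(
        self, plugin_id: str, exc: Optional[Exception] = None
    ) -> None:
        # Capture the actual failure time, not the pre-execution timestamp.
        failure_time = time.time()
        error = exc if exc is not None else Exception(f"Plugin {plugin_id} update failed")
        # Assumed keys; the real error_info layout is not shown in the source.
        error_info = {"error": str(error), "timestamp": failure_time, "recoverable": True}
        logger.warning("Plugin %s update failed, retrying after its interval: %s",
                       plugin_id, error)
        # Stamp the failure so the plugin waits one full interval before retrying.
        self.plugin_last_update[plugin_id] = failure_time
        # Keep the plugin ENABLED while recording the error context.
        self.state_writes.append((plugin_id, "ENABLED", error_info))
        self.failures.append((plugin_id, error))
```

Each of the four failure sites would then reduce to a single call such as `self._record_update_failure(plugin_id, exc)`.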

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Chuck <chuck@example.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:22:12 -04:00