mirror of
https://github.com/ChuckBuilds/LEDMatrix.git
synced 2026-04-10 13:02:59 +00:00
fix(plugins): stop reconciliation install loop, slow plugin list, uninstall resurrection (#309)
* fix(plugins): stop reconciliation install loop, slow plugin list, and uninstall resurrection Three interacting bugs reported by a user (Discord/ericepe) on a fresh install: 1. The state reconciler retried failed auto-repairs on every HTTP request, pegging CPU and flooding logs with "Plugin not found in registry: github / youtube". Root cause: ``_run_startup_reconciliation`` reset ``_reconciliation_started`` to False on any unresolved inconsistency, so ``@app.before_request`` re-fired the entire pass on the next request. Fix: run reconciliation exactly once per process; cache per-plugin unrecoverable failures inside the reconciler so even an explicit re-trigger stays cheap; add a registry pre-check to skip the expensive GitHub fetch when we already know the plugin is missing; expose ``force=True`` on ``/plugins/state/reconcile`` so users can retry after fixing the underlying issue. 2. Uninstalling a plugin via the UI succeeded but the plugin reappeared. Root cause: a race between ``store_manager.uninstall_plugin`` (removes files) and ``cleanup_plugin_config`` (removes config entry) — if reconciliation fired in the gap it saw "config entry with no files" and reinstalled. Fix: reorder uninstall to clean config FIRST, drop a short-lived "recently uninstalled" tombstone on the store manager that the reconciler honors, and pass ``store_manager`` to the manual ``/plugins/state/reconcile`` endpoint (it was previously omitted, which silently disabled auto-repair entirely). 3. ``GET /plugins/installed`` was very slow on a Pi4 (UI hung on "connecting to display" for minutes, ~98% CPU). Root causes: per-request ``discover_plugins()`` + manifest re-read + four ``git`` subprocesses per plugin (``rev-parse``, ``--abbrev-ref``, ``config``, ``log``). Fix: mtime-gate ``discover_plugins()`` and drop the per-plugin manifest re-read in the endpoint; cache ``_get_local_git_info`` keyed on ``.git/HEAD`` mtime so subprocesses only run when the working copy actually moved; bump registry cache TTL from 5 to 15 minutes and fall back to stale cache on transient network failure. Tests: 16 reconciliation cases (including 5 new ones covering the unrecoverable cache, force-reconcile path, transient-failure handling, and recently-uninstalled tombstone) and 8 new store_manager cache tests covering tombstone TTL, git-info mtime cache hit/miss, and the registry stale-cache fallback. All 24 pass; the broader 288-test suite continues to pass with no new failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf(plugins): parallelize Plugin Store browse and extend metadata cache TTLs Follow-up to the previous commit addressing the Plugin Store browse path specifically. Most users install plugins via the store (ZIP extraction, no .git directory) so the git-info mtime cache from the previous commit was a no-op for them; their pain was coming from /plugins/store/list. Root cause. search_plugins() enriched each returned plugin with three serial GitHub fetches: _get_github_repo_info (repo API), _get_latest_commit_info (commits API), _fetch_manifest_from_github (raw.githubusercontent.com). Fifteen plugins × three requests × serial HTTP = 30–45 sequential round trips on every cold browse. On a Pi4 over WiFi that translated directly into the "connecting to display" hang users reported. The commit and manifest caches had a 5-minute TTL, so even a brief absence re-paid the full cost. Changes. - ``search_plugins``: fan out per-plugin enrichment through a ``ThreadPoolExecutor`` (max 10 workers, stays well under unauthenticated GitHub rate limits). Apply category/tag/query filters before enrichment so we never waste requests on plugins that will be filtered out. ``executor.map`` preserves input order, which the UI depends on. - ``commit_cache_timeout`` and ``manifest_cache_timeout``: 5 min → 30 min. Keeps the cache warm across a realistic session while still picking up upstream updates in a reasonable window. - ``_get_github_repo_info`` and ``_get_latest_commit_info``: stale-on-error fallback. On a network failure or a 403 we now prefer a previously- cached value over the zero-default, matching the pattern already in ``fetch_registry``. Flaky Pi WiFi no longer causes star counts to flip to 0 and commit info to disappear. Tests (5 new in test_store_manager_caches.py). - ``test_results_preserve_registry_order`` — the parallel map must still return plugins in input order. - ``test_filters_applied_before_enrichment`` — category/tag/query filters run first so we don't waste HTTP calls. - ``test_enrichment_runs_concurrently`` — peak-concurrency check plus a wall-time bound that would fail if the code regressed to serial. - ``test_repo_info_stale_on_network_error`` — repo info falls back to stale cache on RequestException. - ``test_commit_info_stale_on_network_error`` — commit info falls back to stale cache on RequestException. All 29 tests (16 reconciliation, 13 store_manager caches) pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf(plugins): drop redundant per-plugin manifest.json fetch in search_plugins Benchmarking the previous parallelization commit on a real Pi4 revealed that the 10x speedup I expected was only ~1.1x. Profiling showed two plugins (football-scoreboard, ledmatrix-flights) each spent 5 seconds inside _fetch_manifest_from_github — not on the initial HTTP call, but on the three retries in _http_get_with_retries with exponential backoff after transient DNS failures. Even with the thread pool, those 5-second tail latencies stayed in the wave and dominated wall time. The per-plugin manifest fetch in search_plugins is redundant anyway. The registry's plugins.json already carries ``description`` (it is generated from each plugin's manifest by update_registry.py at release time), and ``last_updated`` is filled in from the commit info that we already fetch in the same loop. Dropping the manifest fetch eliminates one of the three per-plugin HTTPS round trips entirely, which also eliminates the DNS-retry tail. The _fetch_manifest_from_github helper itself is preserved — it is still used by the install path. Tests unchanged (the search_plugins tests mock all three helpers and still pass); this drop only affects the hot-path call sequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: lock down install/update/uninstall invariants Regression guard for the caching and tombstone changes in this PR: - ``install_plugin`` must not be gated by the uninstall tombstone. The tombstone only exists to keep the state reconciler from resurrecting a freshly-uninstalled plugin; explicit user-initiated installs via the store UI go straight to ``install_plugin()`` and must never be blocked. Test: mark a plugin recently uninstalled, stub out the download, call ``install_plugin``, and assert the download step was reached. - ``get_plugin_info(force_refresh=True)`` must forward force_refresh through to both ``_get_latest_commit_info`` and ``_fetch_manifest_from_github``, so that install_plugin and update_plugin (both of which call get_plugin_info with force_refresh=True) continue to bypass the 30-min cache TTLs introduced inc03eb8db. Without this, bumping the commit cache TTL could cause users to install or update to a commit older than what GitHub actually has. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(plugins): address review findings — transactional uninstall, registry error propagation, payload hardening Three real bugs surfaced by review, plus one nitpick. Each was verified against the current code before fixing. 1. fetch_registry silently swallowed network errors, breaking the reconciler (CONFIRMED BUG). The stale-cache fallback I added inc03eb8dbmade fetch_registry return {"plugins": []} on network failure when no cache existed — which is exactly the state on a fresh boot with flaky WiFi. The reconciler's _auto_repair_missing_plugin code assumed an exception meant "transient, don't mark unrecoverable" and expected to never see a silent empty-dict result. With the silent fallback in place on a fresh boot, it would see "no candidates in registry" and mark every config-referenced plugin permanently unrecoverable. Fix: add ``raise_on_failure: bool = False`` to fetch_registry. UI callers keep the stale-cache-fallback default. The reconciler's _auto_repair_missing_plugin now calls it with raise_on_failure=True so it can distinguish a genuine registry miss from a network error. 2. Uninstall was not transactional (CONFIRMED BUG). Two distinct failure modes silently left the system in an inconsistent state: (a) If ``cleanup_plugin_config`` raised, the code logged a warning and proceeded to delete files anyway, leaving an orphan install with no config entry. (b) If ``uninstall_plugin`` returned False or raised AFTER cleanup had already succeeded, the config was gone but the files were still on disk — another orphan state. Fix: introduce ``_do_transactional_uninstall`` shared by both the queue and direct paths. Flow: - snapshot plugin's entries in main config + secrets - cleanup_plugin_config; on failure, ABORT before touching files - uninstall_plugin; on failure, RESTORE the snapshot, then raise Both queue and direct endpoints now delegate to this helper and surface clean errors to the user instead of proceeding past failure. 3. /plugins/state/reconcile crashed on non-object JSON bodies (CONFIRMED BUG). The previous code did ``payload.get('force', False)`` after ``request.get_json(silent=True) or {}``. If a client sent a bare string or array as the JSON body, payload would be that string or list and .get() would raise AttributeError. Separately, ``bool("false")`` is True, so string-encoded booleans were mis-handled. Fix: guard ``isinstance(payload, dict)`` and route the value through the existing ``_coerce_to_bool`` helper. 4. Nitpick: use ``assert_called_once_with`` in test_force_reconcile_clears_unrecoverable_cache. The existing test worked in practice (we call reset_mock right before) but the stricter assertion catches any future regression where force=True might double-fire the install. Tests added (19 new, 48 total passing): - TestFetchRegistryRaiseOnFailure (4): flag propagates both RequestException and JSONDecodeError, wins over stale cache, and the default behavior is unchanged for existing callers. - test_real_store_manager_empty_registry_on_network_failure (1): the key regression test — uses the REAL PluginStoreManager (not a Mock) with ConnectionError at the HTTP helper layer, and verifies the reconciler does NOT poison _unrecoverable_missing_on_disk. - TestTransactionalUninstall (4): cleanup failure aborts before file removal; file removal failure (both False return and raise) restores the config snapshot; happy path still succeeds. - TestReconcileEndpointPayload (8): bare string / array / null JSON bodies, missing force key, boolean true/false, and string-encoded "true"/"false" all handled correctly. All 342 tests in the broader sweep still pass (2 pre-existing TestDottedKeyNormalization failures reproduce on main and are unrelated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: address review nitpicks in store_manager + test Four small cleanups, each verified against current code: 1. ``_git_info_cache`` type annotation was ``Dict[str, tuple]`` — too loose. Tightened to ``Dict[str, Tuple[float, Dict[str, str]]]`` to match what ``_get_local_git_info`` actually stores (mtime + the sha/short_sha/branch/... dict it returns). Added ``Tuple`` to the typing imports. 2. The ``search_plugins`` early-return condition ``if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4`` parses correctly under Python's precedence (``and`` > ``or``) but is visually ambiguous. Added explicit parentheses to make the intent — "single plugin, OR small batch that doesn't need commit info" — obvious at a glance. Semantics unchanged. 3. Replaced a Unicode multiplication sign (×) with ASCII 'x' in the commit_cache_timeout comment. 4. Removed a dead ``concurrent_workers = []`` declaration from ``test_enrichment_runs_concurrently``. It was left over from an earlier sketch of the concurrency check — the final test uses only ``peak_lock`` and ``peak``. All 48 tests still pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(plugins): address second review pass — cache correctness and rollback Verified each finding against the current code. All four inline issues were real bugs; nitpicks 5-7 were valid improvements. 1. _get_latest_commit_info overwrote a good cached value with None on all-branches-404 (CONFIRMED BUG). The final line of the branch loop unconditionally wrote ``self.commit_info_cache[cache_key] = (time.time(), None)``, which clobbered any previously-good entry on a single transient failure (e.g. an odd 5xx, a temporary DNS hiccup during the branches_to_try loop). Fix: if there's already a good prior value, bump its timestamp into the backoff window and return it instead. Only cache None when we never had a good value. 2. _get_local_git_info cache did not invalidate on fast-forward (CONFIRMED BUG). Caching on ``.git/HEAD`` mtime alone is wrong: a ``git pull`` that fast-forwards the current branch updates ``.git/refs/heads/<branch>`` (or packed-refs) but leaves HEAD's contents and mtime untouched. The cache would then serve a stale SHA indefinitely. Fix: introduce ``_git_cache_signature`` which reads HEAD contents, resolves ``ref: refs/heads/<name>`` to the corresponding loose ref file, and builds a signature tuple of (head_contents, head_mtime, resolved_ref_mtime, packed_refs_mtime). A fast-forward bumps the ref file's mtime, which invalidates the signature and re-runs git. 3. test_install_plugin_is_not_blocked_by_tombstone swallowed all exceptions (CONFIRMED BUG in test). ``try: self.sm.install_plugin("bar") except Exception: pass`` could hide a real regression in install_plugin that happens to raise. Fix: the test now writes a COMPLETE valid manifest stub (id, name, class_name, display_modes, entry_point) and stubs _install_dependencies, so install_plugin runs all the way through and returns True. The assertion is now ``assertTrue(result)`` with no exception handling. 4. Uninstall rollback missed unload/reload (CONFIRMED BUG). Previous flow: cleanup → unload (outside try/except) → uninstall → rollback config on failure. Problem: if ``unload_plugin`` raised, the exception propagated without restoring config. And if ``uninstall_plugin`` failed after a successful unload, the rollback restored config but left the plugin unloaded at runtime — inconsistent. Fix: record ``was_loaded`` before touching runtime state, wrap ``unload_plugin`` in the same try/except that covers ``uninstall_plugin``, and on any failure call a ``_rollback`` local that (a) restores the config snapshot and (b) calls ``load_plugin`` to reload the plugin if it was loaded before we touched it. 5. Nitpick: ``_unrecoverable_missing_on_disk: set`` → ``Set[str]``. Matches the existing ``Dict``/``List`` style in state_reconciliation.py. 6. Nitpick: stale-cache fallbacks in _get_github_repo_info and _get_latest_commit_info now bump the cached entry's timestamp by a 60s failure backoff. Without this, a cache entry whose TTL just expired would cause every subsequent request to re-hit the network until it came back, amplifying the failure. Introduced ``_record_cache_backoff`` helper and applied it consistently. 7. Nitpick: replaced the flaky wall-time assertion in test_enrichment_runs_concurrently with just the deterministic ``peak["count"] >= 2`` signal. ``peak["count"]`` can only exceed 1 if two workers were inside the critical section simultaneously, which is definitive proof of parallelism. The wall-time check was tight enough (<200ms) to occasionally fail on CI / low-power boxes. Tests (6 new, 54 total passing): - test_cache_invalidates_on_fast_forward_of_current_branch: builds a loose-ref layout under a temp .git/, verifies a first call populates the cache, a second call with unchanged state hits the cache, and a simulated fast-forward (overwriting ``.git/refs/heads/main`` with a new SHA and mtime) correctly re-runs git. - test_commit_info_preserves_good_cache_on_all_branches_404: seeds a good cached entry, mocks requests.get to always return 404, and verifies the cache still contains the good value afterwards. - test_repo_info_stale_bumps_timestamp_into_backoff: seeds an expired cache, triggers a ConnectionError, then verifies a second lookup does NOT re-hit the network (proves the timestamp bump happened). - test_repo_info_stale_on_403_also_backs_off: same for the 403 path. - test_file_removal_failure_reloads_previously_loaded_plugin: plugin starts loaded, uninstall_plugin returns False, asserts load_plugin was called during rollback. - test_unload_failure_restores_config_and_does_not_call_uninstall: unload_plugin raises, asserts uninstall_plugin was never called AND config was restored AND load_plugin was NOT called (runtime state never changed, so no reload needed). Broader test sweep: 348/348 pass (2 pre-existing TestDottedKeyNormalization failures reproduce on main, unrelated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(plugins): address third review pass — cache signatures, backoff, isolation All four findings verified as real issues against the current code. 1. _git_cache_signature was missing .git/config (CONFIRMED GAP). The cached ``result`` dict from _get_local_git_info includes ``remote_url``, which is read from ``.git/config``. But the cache signature only tracked HEAD + refs — so a config-only change (e.g. ``git remote set-url origin https://...``) would leave the stale URL cached indefinitely. This matters for the monorepo-migration detection in update_plugin. Fix: add ``config_contents`` and ``config_mtime`` to the signature tuple. Config reads use the same OSError-guarded pattern as the HEAD read. 2. fetch_registry stale fallback didn't bump registry_cache_time (CONFIRMED BUG). The other caches already had the failure-backoff pattern added in the previous review pass (via ``_record_cache_backoff``), but the registry cache's stale-fallback branches silently returned the cached payload without updating ``registry_cache_time``. Next request saw the same expired TTL, re-hit the network, failed again — amplifying the original transient failure. Fix: bump ``self.registry_cache_time`` forward by the existing ``self._failure_backoff_seconds`` (reused — no new constant needed) in both the RequestException and JSONDecodeError stale branches. Kept the ``raise_on_failure=True`` path untouched so the reconciler still gets the exception. 3. _make_client() in the uninstall/reconcile test helper leaked MagicMocks into the api_v3 singleton (CONFIRMED RISK). Every test call replaced api_v3.config_manager, .plugin_manager, .plugin_store_manager, etc. with MagicMocks and never restored them. If any later test in the same pytest run imported api_v3 expecting original state (or None), it would see the leftover mocks. Fix: _make_client now snapshots the original attributes (with a sentinel to distinguish "didn't exist" from "was None") and returns a cleanup callable. Both setUp methods call self.addCleanup(cleanup) so state is restored even if the test raises. On cleanup, sentinel entries trigger delattr rather than setattr to preserve the "attribute was never set" case. 4. Snapshot helpers used broad ``except Exception`` (CONFIRMED). _snapshot_plugin_config caught any exception from get_raw_file_content, which could hide programmer errors (TypeError, AttributeError) behind the "best-effort snapshot" fallback. The legitimate failure modes are filesystem errors (covered by OSError; FileNotFoundError is a subclass, IOError is an alias in Python 3) and ConfigError (what config_manager wraps all load failures in). Fix: narrow to ``(OSError, ConfigError)`` in both snapshot blocks. ConfigError was already imported at line 20 of api_v3.py. Tests added (4 new, 58 total passing): - test_cache_invalidates_on_git_config_change: builds a realistic loose-ref layout, writes .git/config with an "old" remote URL, exercises _get_local_git_info, then rewrites .git/config with a "new" remote URL + new mtime, calls again, and asserts the cache invalidated and returned the new URL. - test_stale_fallback_bumps_timestamp_into_backoff: seeds an expired registry cache, triggers ConnectionError, verifies first call serves stale, then asserts a second call makes ZERO new HTTP requests (proves registry_cache_time was bumped forward). - test_snapshot_survives_config_read_error: raises ConfigError from get_raw_file_content and asserts the uninstall still completes successfully — the narrow exception list still catches this case. - test_snapshot_does_not_swallow_programmer_errors: raises a TypeError from get_raw_file_content (not in the narrow list) and asserts it propagates up to a 500, AND that uninstall_plugin was never called (proves the exception was caught at the right level). Broader test sweep: 352/352 pass (2 pre-existing TestDottedKeyNormalization failures reproduce on main, unrelated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Chuck <chuck@example.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -667,8 +667,20 @@ import threading as _threading
|
||||
_reconciliation_lock = _threading.Lock()
|
||||
|
||||
def _run_startup_reconciliation() -> None:
|
||||
"""Run state reconciliation in background to auto-repair missing plugins."""
|
||||
global _reconciliation_done, _reconciliation_started
|
||||
"""Run state reconciliation in background to auto-repair missing plugins.
|
||||
|
||||
Reconciliation runs exactly once per process lifetime, regardless of
|
||||
whether every inconsistency could be auto-fixed. Previously, a failed
|
||||
auto-repair (e.g. a config entry referencing a plugin that no longer
|
||||
exists in the registry) would reset ``_reconciliation_started`` to False,
|
||||
causing the ``@app.before_request`` hook to re-trigger reconciliation on
|
||||
every single HTTP request — an infinite install-retry loop that pegged
|
||||
the CPU and flooded the log. Unresolved issues are now left in place for
|
||||
the user to address via the UI; the reconciler itself also caches
|
||||
per-plugin unrecoverable failures internally so repeated reconcile calls
|
||||
stay cheap.
|
||||
"""
|
||||
global _reconciliation_done
|
||||
from src.logging_config import get_logger
|
||||
_logger = get_logger('reconciliation')
|
||||
|
||||
@@ -684,18 +696,22 @@ def _run_startup_reconciliation() -> None:
|
||||
result = reconciler.reconcile_state()
|
||||
if result.inconsistencies_found:
|
||||
_logger.info("[Reconciliation] %s", result.message)
|
||||
if result.reconciliation_successful:
|
||||
if result.inconsistencies_fixed:
|
||||
plugin_manager.discover_plugins()
|
||||
_reconciliation_done = True
|
||||
else:
|
||||
_logger.warning("[Reconciliation] Finished with unresolved issues, will retry")
|
||||
with _reconciliation_lock:
|
||||
_reconciliation_started = False
|
||||
if result.inconsistencies_fixed:
|
||||
plugin_manager.discover_plugins()
|
||||
if not result.reconciliation_successful:
|
||||
_logger.warning(
|
||||
"[Reconciliation] Finished with %d unresolved issue(s); "
|
||||
"will not retry automatically. Use the Plugin Store or the "
|
||||
"manual 'Reconcile' action to resolve.",
|
||||
len(result.inconsistencies_manual),
|
||||
)
|
||||
except Exception as e:
|
||||
_logger.error("[Reconciliation] Error: %s", e, exc_info=True)
|
||||
with _reconciliation_lock:
|
||||
_reconciliation_started = False
|
||||
finally:
|
||||
# Always mark done — we do not want an unhandled exception (or an
|
||||
# unresolved inconsistency) to cause the @before_request hook to
|
||||
# retrigger reconciliation on every subsequent request.
|
||||
_reconciliation_done = True
|
||||
|
||||
# Initialize health monitor and run reconciliation on first request
|
||||
@app.before_request
|
||||
|
||||
@@ -1714,9 +1714,23 @@ def get_installed_plugins():
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
# Re-discover plugins to ensure we have the latest list
|
||||
# This handles cases where plugins are added/removed after app startup
|
||||
api_v3.plugin_manager.discover_plugins()
|
||||
# Re-discover plugins only if the plugins directory has actually
|
||||
# changed since our last scan, or if the caller explicitly asked
|
||||
# for a refresh. The previous unconditional ``discover_plugins()``
|
||||
# call (plus a per-plugin manifest re-read) made this endpoint
|
||||
# O(plugins) in disk I/O on every page refresh, which on an SD-card
|
||||
# Pi4 with ~15 plugins was pegging the CPU and blocking the UI
|
||||
# "connecting to display" spinner for minutes.
|
||||
force_refresh = request.args.get('refresh', '').lower() in ('1', 'true', 'yes')
|
||||
plugins_dir_path = Path(api_v3.plugin_manager.plugins_dir)
|
||||
try:
|
||||
current_mtime = plugins_dir_path.stat().st_mtime if plugins_dir_path.exists() else 0
|
||||
except OSError:
|
||||
current_mtime = 0
|
||||
last_mtime = getattr(api_v3, '_installed_plugins_dir_mtime', None)
|
||||
if force_refresh or last_mtime != current_mtime:
|
||||
api_v3.plugin_manager.discover_plugins()
|
||||
api_v3._installed_plugins_dir_mtime = current_mtime
|
||||
|
||||
# Get all installed plugin info from the plugin manager
|
||||
all_plugin_info = api_v3.plugin_manager.get_all_plugin_info()
|
||||
@@ -1729,17 +1743,10 @@ def get_installed_plugins():
|
||||
for plugin_info in all_plugin_info:
|
||||
plugin_id = plugin_info.get('id')
|
||||
|
||||
# Re-read manifest from disk to ensure we have the latest metadata
|
||||
manifest_path = Path(api_v3.plugin_manager.plugins_dir) / plugin_id / "manifest.json"
|
||||
if manifest_path.exists():
|
||||
try:
|
||||
with open(manifest_path, 'r', encoding='utf-8') as f:
|
||||
fresh_manifest = json.load(f)
|
||||
# Update plugin_info with fresh manifest data
|
||||
plugin_info.update(fresh_manifest)
|
||||
except Exception as e:
|
||||
# If we can't read the fresh manifest, use the cached one
|
||||
logger.warning("[PluginStore] Could not read fresh manifest for %s: %s", plugin_id, e)
|
||||
# Note: we intentionally do NOT re-read manifest.json here.
|
||||
# discover_plugins() above already reparses manifests on change;
|
||||
# re-reading on every request added ~1 syscall+json.loads per
|
||||
# plugin per request for no benefit.
|
||||
|
||||
# Get enabled status from config (source of truth)
|
||||
# Read from config file first, fall back to plugin instance if config doesn't have the key
|
||||
@@ -2369,14 +2376,30 @@ def reconcile_plugin_state():
|
||||
|
||||
from src.plugin_system.state_reconciliation import StateReconciliation
|
||||
|
||||
# Pass the store manager so auto-repair of missing-on-disk plugins
|
||||
# can actually run. Previously this endpoint silently degraded to
|
||||
# MANUAL_FIX_REQUIRED because store_manager was omitted.
|
||||
reconciler = StateReconciliation(
|
||||
state_manager=api_v3.plugin_state_manager,
|
||||
config_manager=api_v3.config_manager,
|
||||
plugin_manager=api_v3.plugin_manager,
|
||||
plugins_dir=Path(api_v3.plugin_manager.plugins_dir)
|
||||
plugins_dir=Path(api_v3.plugin_manager.plugins_dir),
|
||||
store_manager=api_v3.plugin_store_manager,
|
||||
)
|
||||
|
||||
result = reconciler.reconcile_state()
|
||||
# Allow the caller to force a retry of previously-unrecoverable
|
||||
# plugins (e.g. after the registry has been updated or a typo fixed).
|
||||
# Non-object JSON bodies (e.g. a bare string or array) must fall
|
||||
# through to the default False instead of raising AttributeError,
|
||||
# and string booleans like "false" must coerce correctly — hence
|
||||
# the isinstance guard plus _coerce_to_bool.
|
||||
force = False
|
||||
if request.is_json:
|
||||
payload = request.get_json(silent=True)
|
||||
if isinstance(payload, dict):
|
||||
force = _coerce_to_bool(payload.get('force', False))
|
||||
|
||||
result = reconciler.reconcile_state(force=force)
|
||||
|
||||
return success_response(
|
||||
data={
|
||||
@@ -2799,6 +2822,181 @@ def update_plugin():
|
||||
status_code=500
|
||||
)
|
||||
|
||||
def _snapshot_plugin_config(plugin_id: str):
|
||||
"""Capture the plugin's current config and secrets entries for rollback.
|
||||
|
||||
Returns a tuple ``(main_entry, secrets_entry)`` where each element is
|
||||
the plugin's dict from the respective file, or ``None`` if the plugin
|
||||
was not present there. Used by the transactional uninstall path so we
|
||||
can restore state if file removal fails after config cleanup has
|
||||
already succeeded.
|
||||
"""
|
||||
main_entry = None
|
||||
secrets_entry = None
|
||||
# Narrow exception list: filesystem errors (FileNotFoundError is a
|
||||
# subclass of OSError, IOError is an alias for OSError in Python 3)
|
||||
# and ConfigError, which is what ``get_raw_file_content`` wraps all
|
||||
# load failures in. Programmer errors (TypeError, AttributeError,
|
||||
# etc.) are intentionally NOT caught — they should surface loudly.
|
||||
try:
|
||||
main_config = api_v3.config_manager.get_raw_file_content('main')
|
||||
if plugin_id in main_config:
|
||||
import copy as _copy
|
||||
main_entry = _copy.deepcopy(main_config[plugin_id])
|
||||
except (OSError, ConfigError) as e:
|
||||
logger.warning("[PluginUninstall] Could not snapshot main config for %s: %s", plugin_id, e)
|
||||
try:
|
||||
import os as _os
|
||||
if _os.path.exists(api_v3.config_manager.secrets_path):
|
||||
secrets_config = api_v3.config_manager.get_raw_file_content('secrets')
|
||||
if plugin_id in secrets_config:
|
||||
import copy as _copy
|
||||
secrets_entry = _copy.deepcopy(secrets_config[plugin_id])
|
||||
except (OSError, ConfigError) as e:
|
||||
logger.warning("[PluginUninstall] Could not snapshot secrets for %s: %s", plugin_id, e)
|
||||
return (main_entry, secrets_entry)
|
||||
|
||||
|
||||
def _restore_plugin_config(plugin_id: str, snapshot) -> None:
|
||||
"""Best-effort restoration of a snapshot taken by ``_snapshot_plugin_config``.
|
||||
|
||||
Called on the unhappy path when ``cleanup_plugin_config`` already
|
||||
succeeded but the subsequent file removal failed. If the restore
|
||||
itself fails, we log loudly — the caller still sees the original
|
||||
uninstall error and the user can reconcile manually.
|
||||
"""
|
||||
main_entry, secrets_entry = snapshot
|
||||
if main_entry is not None:
|
||||
try:
|
||||
main_config = api_v3.config_manager.get_raw_file_content('main')
|
||||
main_config[plugin_id] = main_entry
|
||||
api_v3.config_manager.save_raw_file_content('main', main_config)
|
||||
logger.warning("[PluginUninstall] Restored main config entry for %s after uninstall failure", plugin_id)
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
"[PluginUninstall] FAILED to restore main config entry for %s after uninstall failure: %s",
|
||||
plugin_id, e, exc_info=True,
|
||||
)
|
||||
if secrets_entry is not None:
|
||||
try:
|
||||
import os as _os
|
||||
if _os.path.exists(api_v3.config_manager.secrets_path):
|
||||
secrets_config = api_v3.config_manager.get_raw_file_content('secrets')
|
||||
else:
|
||||
secrets_config = {}
|
||||
secrets_config[plugin_id] = secrets_entry
|
||||
api_v3.config_manager.save_raw_file_content('secrets', secrets_config)
|
||||
logger.warning("[PluginUninstall] Restored secrets entry for %s after uninstall failure", plugin_id)
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
"[PluginUninstall] FAILED to restore secrets entry for %s after uninstall failure: %s",
|
||||
plugin_id, e, exc_info=True,
|
||||
)
|
||||
|
||||
|
||||
def _do_transactional_uninstall(plugin_id: str, preserve_config: bool) -> None:
|
||||
"""Run the full uninstall as a best-effort transaction.
|
||||
|
||||
Order:
|
||||
1. Mark tombstone (so any reconciler racing with us cannot resurrect
|
||||
the plugin mid-flight).
|
||||
2. Snapshot existing config + secrets entries (for rollback).
|
||||
3. Run ``cleanup_plugin_config``. If this raises, re-raise — files
|
||||
have NOT been touched, so aborting here leaves a fully consistent
|
||||
state: plugin is still installed and still in config.
|
||||
4. Unload the plugin from the running plugin manager.
|
||||
5. Call ``store_manager.uninstall_plugin``. If it returns False or
|
||||
raises, RESTORE the snapshot (so config matches disk) and then
|
||||
propagate the failure.
|
||||
6. Invalidate schema cache and remove from the state manager only
|
||||
after the file removal succeeds.
|
||||
|
||||
Raises on any failure so the caller can return an error to the user.
|
||||
"""
|
||||
if hasattr(api_v3.plugin_store_manager, 'mark_recently_uninstalled'):
|
||||
api_v3.plugin_store_manager.mark_recently_uninstalled(plugin_id)
|
||||
|
||||
snapshot = _snapshot_plugin_config(plugin_id) if not preserve_config else (None, None)
|
||||
|
||||
# Step 1: config cleanup. If this fails, bail out early — the plugin
|
||||
# files on disk are still intact and the caller will get a clear
|
||||
# error.
|
||||
if not preserve_config:
|
||||
try:
|
||||
api_v3.config_manager.cleanup_plugin_config(plugin_id, remove_secrets=True)
|
||||
except Exception as cleanup_err:
|
||||
logger.error(
|
||||
"[PluginUninstall] Config cleanup failed for %s; aborting uninstall (files untouched): %s",
|
||||
plugin_id, cleanup_err, exc_info=True,
|
||||
)
|
||||
raise
|
||||
|
||||
# Remember whether the plugin was loaded *before* we touched runtime
|
||||
# state — we need this so we can reload it on rollback if file
|
||||
# removal fails after we've already unloaded it.
|
||||
was_loaded = bool(
|
||||
api_v3.plugin_manager and plugin_id in api_v3.plugin_manager.plugins
|
||||
)
|
||||
|
||||
def _rollback(reason_err):
|
||||
"""Undo both the config cleanup AND the unload."""
|
||||
if not preserve_config:
|
||||
_restore_plugin_config(plugin_id, snapshot)
|
||||
if was_loaded and api_v3.plugin_manager:
|
||||
try:
|
||||
api_v3.plugin_manager.load_plugin(plugin_id)
|
||||
except Exception as reload_err:
|
||||
logger.error(
|
||||
"[PluginUninstall] FAILED to reload %s after uninstall rollback: %s",
|
||||
plugin_id, reload_err, exc_info=True,
|
||||
)
|
||||
|
||||
# Step 2: unload if loaded. Also part of the rollback boundary — if
|
||||
# unload itself raises, restore config and surface the error.
|
||||
if was_loaded:
|
||||
try:
|
||||
api_v3.plugin_manager.unload_plugin(plugin_id)
|
||||
except Exception as unload_err:
|
||||
logger.error(
|
||||
"[PluginUninstall] unload_plugin raised for %s; restoring config snapshot: %s",
|
||||
plugin_id, unload_err, exc_info=True,
|
||||
)
|
||||
if not preserve_config:
|
||||
_restore_plugin_config(plugin_id, snapshot)
|
||||
# Plugin was never successfully unloaded, so no reload is
|
||||
# needed here — runtime state is still what it was before.
|
||||
raise
|
||||
|
||||
# Step 3: remove files. If this fails, roll back the config cleanup
|
||||
# AND reload the plugin so the user doesn't end up with an orphaned
|
||||
# install (files on disk + no config entry + plugin no longer
|
||||
# loaded at runtime).
|
||||
try:
|
||||
success = api_v3.plugin_store_manager.uninstall_plugin(plugin_id)
|
||||
except Exception as uninstall_err:
|
||||
logger.error(
|
||||
"[PluginUninstall] uninstall_plugin raised for %s; rolling back: %s",
|
||||
plugin_id, uninstall_err, exc_info=True,
|
||||
)
|
||||
_rollback(uninstall_err)
|
||||
raise
|
||||
|
||||
if not success:
|
||||
logger.error(
|
||||
"[PluginUninstall] uninstall_plugin returned False for %s; rolling back",
|
||||
plugin_id,
|
||||
)
|
||||
_rollback(None)
|
||||
raise RuntimeError(f"Failed to uninstall plugin {plugin_id}")
|
||||
|
||||
# Past this point the filesystem and config are both in the
|
||||
# "uninstalled" state. Clean up the cheap in-memory bookkeeping.
|
||||
if api_v3.schema_manager:
|
||||
api_v3.schema_manager.invalidate_cache(plugin_id)
|
||||
if api_v3.plugin_state_manager:
|
||||
api_v3.plugin_state_manager.remove_plugin_state(plugin_id)
|
||||
|
||||
|
||||
@api_v3.route('/plugins/uninstall', methods=['POST'])
|
||||
def uninstall_plugin():
|
||||
"""Uninstall plugin"""
|
||||
@@ -2821,49 +3019,28 @@ def uninstall_plugin():
|
||||
# Use operation queue if available
|
||||
if api_v3.operation_queue:
|
||||
def uninstall_callback(operation):
|
||||
"""Callback to execute plugin uninstallation."""
|
||||
# Unload the plugin first if it's loaded
|
||||
if api_v3.plugin_manager and plugin_id in api_v3.plugin_manager.plugins:
|
||||
api_v3.plugin_manager.unload_plugin(plugin_id)
|
||||
|
||||
# Uninstall the plugin
|
||||
success = api_v3.plugin_store_manager.uninstall_plugin(plugin_id)
|
||||
|
||||
if not success:
|
||||
error_msg = f'Failed to uninstall plugin {plugin_id}'
|
||||
"""Callback to execute plugin uninstallation transactionally."""
|
||||
try:
|
||||
_do_transactional_uninstall(plugin_id, preserve_config)
|
||||
except Exception as err:
|
||||
error_msg = f'Failed to uninstall plugin {plugin_id}: {err}'
|
||||
if api_v3.operation_history:
|
||||
api_v3.operation_history.record_operation(
|
||||
"uninstall",
|
||||
plugin_id=plugin_id,
|
||||
status="failed",
|
||||
error=error_msg
|
||||
error=error_msg,
|
||||
)
|
||||
raise Exception(error_msg)
|
||||
# Re-raise so the operation_queue marks this op as failed.
|
||||
raise
|
||||
|
||||
# Invalidate schema cache
|
||||
if api_v3.schema_manager:
|
||||
api_v3.schema_manager.invalidate_cache(plugin_id)
|
||||
|
||||
# Clean up plugin configuration if not preserving
|
||||
if not preserve_config:
|
||||
try:
|
||||
api_v3.config_manager.cleanup_plugin_config(plugin_id, remove_secrets=True)
|
||||
except Exception as cleanup_err:
|
||||
logger.warning("[PluginUninstall] Failed to cleanup config for %s: %s", plugin_id, cleanup_err)
|
||||
|
||||
# Remove from state manager
|
||||
if api_v3.plugin_state_manager:
|
||||
api_v3.plugin_state_manager.remove_plugin_state(plugin_id)
|
||||
|
||||
# Record in history
|
||||
if api_v3.operation_history:
|
||||
api_v3.operation_history.record_operation(
|
||||
"uninstall",
|
||||
plugin_id=plugin_id,
|
||||
status="success",
|
||||
details={"preserve_config": preserve_config}
|
||||
details={"preserve_config": preserve_config},
|
||||
)
|
||||
|
||||
return {'success': True, 'message': f'Plugin {plugin_id} uninstalled successfully'}
|
||||
|
||||
# Enqueue operation
|
||||
@@ -2878,55 +3055,32 @@ def uninstall_plugin():
|
||||
message=f'Plugin {plugin_id} uninstallation queued'
|
||||
)
|
||||
else:
|
||||
# Fallback to direct uninstall
|
||||
# Unload the plugin first if it's loaded
|
||||
if api_v3.plugin_manager and plugin_id in api_v3.plugin_manager.plugins:
|
||||
api_v3.plugin_manager.unload_plugin(plugin_id)
|
||||
|
||||
# Uninstall the plugin
|
||||
success = api_v3.plugin_store_manager.uninstall_plugin(plugin_id)
|
||||
|
||||
if success:
|
||||
# Invalidate schema cache
|
||||
if api_v3.schema_manager:
|
||||
api_v3.schema_manager.invalidate_cache(plugin_id)
|
||||
|
||||
# Clean up plugin configuration if not preserving
|
||||
if not preserve_config:
|
||||
try:
|
||||
api_v3.config_manager.cleanup_plugin_config(plugin_id, remove_secrets=True)
|
||||
except Exception as cleanup_err:
|
||||
logger.warning("[PluginUninstall] Failed to cleanup config for %s: %s", plugin_id, cleanup_err)
|
||||
|
||||
# Remove from state manager
|
||||
if api_v3.plugin_state_manager:
|
||||
api_v3.plugin_state_manager.remove_plugin_state(plugin_id)
|
||||
|
||||
# Record in history
|
||||
if api_v3.operation_history:
|
||||
api_v3.operation_history.record_operation(
|
||||
"uninstall",
|
||||
plugin_id=plugin_id,
|
||||
status="success",
|
||||
details={"preserve_config": preserve_config}
|
||||
)
|
||||
|
||||
return success_response(message=f'Plugin {plugin_id} uninstalled successfully')
|
||||
else:
|
||||
# Fallback to direct uninstall — same transactional helper.
|
||||
try:
|
||||
_do_transactional_uninstall(plugin_id, preserve_config)
|
||||
except Exception as err:
|
||||
if api_v3.operation_history:
|
||||
api_v3.operation_history.record_operation(
|
||||
"uninstall",
|
||||
plugin_id=plugin_id,
|
||||
status="failed",
|
||||
error=f'Failed to uninstall plugin {plugin_id}'
|
||||
error=f'Failed to uninstall plugin {plugin_id}: {err}',
|
||||
)
|
||||
|
||||
return error_response(
|
||||
ErrorCode.PLUGIN_UNINSTALL_FAILED,
|
||||
f'Failed to uninstall plugin {plugin_id}',
|
||||
status_code=500
|
||||
f'Failed to uninstall plugin {plugin_id}: {err}',
|
||||
status_code=500,
|
||||
)
|
||||
|
||||
if api_v3.operation_history:
|
||||
api_v3.operation_history.record_operation(
|
||||
"uninstall",
|
||||
plugin_id=plugin_id,
|
||||
status="success",
|
||||
details={"preserve_config": preserve_config},
|
||||
)
|
||||
return success_response(message=f'Plugin {plugin_id} uninstalled successfully')
|
||||
|
||||
except Exception as e:
|
||||
logger.exception("[PluginUninstall] Unhandled exception")
|
||||
from src.web_interface.errors import WebInterfaceError
|
||||
|
||||
Reference in New Issue
Block a user