Files
LEDMatrix/web_interface/static/v3/plugins_manager.js
Chuck 1a0f1c8015 fix: service control buttons and AP-mode SSH lockout post-install (#326)
* fix: service control buttons and AP-mode SSH lockout post-install

Two user-reported issues after fresh install:

1. All service buttons (Start/Stop/Restart Display, Restart Web Service)
   failed silently — only Reboot worked.

   Root cause: sudoers rules use `ledmatrix.service` (with suffix) but
   api_v3.py called `sudo systemctl start ledmatrix` (no suffix). sudo
   does exact string matching, so every service action was rejected with
   returncode=1. Also missing from sudoers: ledmatrix-web, journalctl,
   and is-active entries.

   Fix:
   - Add `.service` suffix to all 8 sudo systemctl call sites in
     api_v3.py (_ensure_display_service_running, _stop_display_service,
     and all execute_system_action branches).
   - Add timeout=15 to all subprocess.run calls in execute_system_action
     (previously could hang indefinitely).
   - Add missing sudoers rules to first_time_install.sh and
     configure_web_sudo.sh: ledmatrix-web.service start/stop/restart,
     is-active for both name forms, and journalctl -u/-t ledmatrix rules.

2. SSH and web UI became inaccessible after ~1 hour even though the
   display kept running.

   Root cause: wifi_monitor_daemon restarts NetworkManager after 5
   consecutive internet failures (~2.5 min). Each NM restart drops WiFi
   briefly. During that window check_and_manage_ap_mode() increments
   _disconnected_checks but the daemon never reset it after the restart.
   After 3 such NM-restart cycles, _disconnected_checks reached 3 and
   AP mode activated — changing the Pi from WiFi client to hotspot
   (192.168.4.1) and killing SSH on the old IP.

   Fix:
   - Reset wifi_manager._disconnected_checks = 0 in the daemon
     immediately after a successful NM restart so the brief drop it
     causes doesn't count toward AP-mode activation.
   - Increase _disconnected_checks_required from 3 to 6 (90s → 3min)
     as an additional buffer against transient network flaps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert: restore AP-mode grace period to 90s (3 checks)

The counter reset after NM restart already fully prevents the SSH-lockout
cascade: _disconnected_checks can never accumulate across NM restarts
because it is reset to 0 before the next daemon iteration runs.

The 3→6 increase provided no additional fix for the described problem and
caused a UX regression: fresh Pi devices with no WiFi configured would
wait 3 minutes instead of 90 seconds for the LEDMatrix-Setup hotspot to
appear.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address five valid review findings; skip two

Fixed:
- march-madness/requirements.txt: Pillow>=10.3.0 (patches CVE-2024-28219;
  10.3.0 is the actual fix version — reviewer cited 12.2.0 but that risks
  breaking API changes without test coverage)
- wifi_monitor_daemon.py: add missing `import subprocess`; subprocess.run
  and CalledProcessError would NameError at runtime on the NM restart path
- wifi_manager.py: validate ap_idle_timeout_minutes before arithmetic —
  coerce to int, clamp 1–1440, fall back to 15 on bad config values
- wifi_manager.py: call _remove_nm_dnsmasq_captive_conf() on all three
  rollback paths in _enable_ap_mode_nmcli_hotspot() and in the top-level
  except block so stale dnsmasq drop-ins are never left behind
- api_v3.py: fix wrong_password prefix strip — removeprefix("wrong_password:")
  then lstrip() handles both "wrong_password: msg" and "wrong_password:msg"
- plugins_manager.js: add .catch() to loadInstalledPlugins().then() to
  surface failures instead of silently dropping unhandled rejections

Skipped:
- WiFiManager AP state persistence: architectural overhaul; _is_ap_mode_active()
  already derives from live system state, not in-memory variables
- Absolute subprocess paths in api_v3.py: paths vary by distro (/usr/bin vs
  /bin); web service has a normal PATH; sudoers already use resolved paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address five review findings (NM retry loop, start_display message, code quality)

- wifi_monitor_daemon: reset _consecutive_internet_failures = 0 in both
  NM-restart exception handlers; previously both left the counter at threshold,
  causing an immediate retry on the next iteration instead of waiting another
  full backoff period

- api_v3: fix start_display failure message — when mode is set and systemctl
  returns non-zero, message now includes the failure reason and a hint rather
  than always reporting success phrasing

- wifi_manager: move _redirect_backend from class variable to instance variable
  in __init__ alongside _ap_enabled_at; class-level default shadowed correctly
  in practice (single instance) but was misleading

- wifi_manager: narrow broad except Exception in _check_internet_connectivity
  to (subprocess.SubprocessError, OSError) for ping and OSError for HTTP
  (urllib.error.URLError is an OSError subclass in Python 3)

- wifi_manager: remove redundant local 'import re as _re' in _validate_ap_config;
  re is already imported at module level (line 37)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address five review findings (Pillow CVEs, daemon exception narrowing, timeout handling, plugin store)

- march-madness/requirements.txt: Pillow>=12.2.0 (patches CVE-2026-42308
  and CVE-2026-42310; previous floor of 10.3.0 was insufficient)

- wifi_monitor_daemon: narrow final except Exception to
  (subprocess.SubprocessError, OSError) so programming errors in the NM
  restart block are no longer silently swallowed

- api_v3/execute_system_action: add explicit subprocess.TimeoutExpired
  handler before the generic Exception catch; returns action-specific
  message with 'status','message','returncode','stdout','stderr' fields
  so the UI receives a precise, actionable payload instead of the generic
  'Failed to execute system action' string

- plugins_manager.js: move searchPluginStore into .finally() so the
  plugin store renders regardless of whether loadInstalledPlugins succeeds
  or fails; .catch() still logs the error

- first_time_install.sh: add safe_plugin_rm.sh NOPASSWD rule to the
  /tmp/ledmatrix_web_sudoers block; configure_web_sudo.sh had this rule
  but the standalone installer never granted it, leaving plugin removal
  broken after first-time install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(api): resolve sudo/systemctl/reboot/poweroff paths at startup

Use shutil.which() with safe fallbacks for the four privileged binaries
instead of relying on bare names being resolved by the subprocess shell
search. Resolves paths once at module load rather than per-call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Chuck <chuck@example.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 17:58:51 -04:00

358 KiB