4 Commits

Author SHA1 Message Date
Chuck
1a3e6f0685 fix: address five review findings (NM retry loop, start_display message, code quality)
- wifi_monitor_daemon: reset _consecutive_internet_failures = 0 in both
  NM-restart exception handlers; previously both left the counter at threshold,
  causing an immediate retry on the next iteration instead of waiting another
  full backoff period

- api_v3: fix start_display failure message — when mode is set and systemctl
  returns non-zero, message now includes the failure reason and a hint rather
  than always reporting success phrasing

- wifi_manager: move _redirect_backend from class variable to instance variable
  in __init__ alongside _ap_enabled_at; class-level default shadowed correctly
  in practice (single instance) but was misleading

- wifi_manager: narrow broad except Exception in _check_internet_connectivity
  to (subprocess.SubprocessError, OSError) for ping and OSError for HTTP
  (urllib.error.URLError is an OSError subclass in Python 3)

- wifi_manager: remove redundant local 'import re as _re' in _validate_ap_config;
  re is already imported at module level (line 37)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 13:25:12 -04:00
Chuck
5b6137f5f4 fix: address five valid review findings; skip two
Fixed:
- march-madness/requirements.txt: Pillow>=10.3.0 (patches CVE-2024-28219;
  10.3.0 is the actual fix version — reviewer cited 12.2.0 but that risks
  breaking API changes without test coverage)
- wifi_monitor_daemon.py: add missing `import subprocess`; subprocess.run
  and CalledProcessError would NameError at runtime on the NM restart path
- wifi_manager.py: validate ap_idle_timeout_minutes before arithmetic —
  coerce to int, clamp 1–1440, fall back to 15 on bad config values
- wifi_manager.py: call _remove_nm_dnsmasq_captive_conf() on all three
  rollback paths in _enable_ap_mode_nmcli_hotspot() and in the top-level
  except block so stale dnsmasq drop-ins are never left behind
- api_v3.py: fix wrong_password prefix strip — removeprefix("wrong_password:")
  then lstrip() handles both "wrong_password: msg" and "wrong_password:msg"
- plugins_manager.js: add .catch() to loadInstalledPlugins().then() to
  surface failures instead of silently dropping unhandled rejections

Skipped:
- WiFiManager AP state persistence: architectural overhaul; _is_ap_mode_active()
  already derives from live system state, not in-memory variables
- Absolute subprocess paths in api_v3.py: paths vary by distro (/usr/bin vs
  /bin); web service has a normal PATH; sudoers already use resolved paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 12:27:41 -04:00
Chuck
f97573c368 revert: restore AP-mode grace period to 90s (3 checks)
The counter reset after NM restart already fully prevents the SSH-lockout
cascade: _disconnected_checks can never accumulate across NM restarts
because it is reset to 0 before the next daemon iteration runs.

The 3→6 increase provided no additional fix for the described problem and
caused a UX regression: fresh Pi devices with no WiFi configured would
wait 3 minutes instead of 90 seconds for the LEDMatrix-Setup hotspot to
appear.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 12:03:17 -04:00
Chuck
9a74db6de3 fix: service control buttons and AP-mode SSH lockout post-install
Two user-reported issues after fresh install:

1. All service buttons (Start/Stop/Restart Display, Restart Web Service)
   failed silently — only Reboot worked.

   Root cause: sudoers rules use `ledmatrix.service` (with suffix) but
   api_v3.py called `sudo systemctl start ledmatrix` (no suffix). sudo
   does exact string matching, so every service action was rejected with
   returncode=1. Also missing from sudoers: ledmatrix-web, journalctl,
   and is-active entries.

   Fix:
   - Add `.service` suffix to all 8 sudo systemctl call sites in
     api_v3.py (_ensure_display_service_running, _stop_display_service,
     and all execute_system_action branches).
   - Add timeout=15 to all subprocess.run calls in execute_system_action
     (previously could hang indefinitely).
   - Add missing sudoers rules to first_time_install.sh and
     configure_web_sudo.sh: ledmatrix-web.service start/stop/restart,
     is-active for both name forms, and journalctl -u/-t ledmatrix rules.

2. SSH and web UI became inaccessible after ~1 hour even though the
   display kept running.

   Root cause: wifi_monitor_daemon restarts NetworkManager after 5
   consecutive internet failures (~2.5 min). Each NM restart drops WiFi
   briefly. During that window check_and_manage_ap_mode() increments
   _disconnected_checks but the daemon never reset it after the restart.
   After 3 such NM-restart cycles, _disconnected_checks reached 3 and
   AP mode activated — changing the Pi from WiFi client to hotspot
   (192.168.4.1) and killing SSH on the old IP.

   Fix:
   - Reset wifi_manager._disconnected_checks = 0 in the daemon
     immediately after a successful NM restart so the brief drop it
     causes doesn't count toward AP-mode activation.
   - Increase _disconnected_checks_required from 3 to 6 (90s → 3min)
     as an additional buffer against transient network flaps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 11:47:56 -04:00
7 changed files with 62 additions and 36 deletions

View File

@@ -1086,6 +1086,7 @@ SYSTEMCTL_PATH=$(which systemctl)
REBOOT_PATH=$(which reboot)
POWEROFF_PATH=$(which poweroff)
BASH_PATH=$(which bash)
JOURNALCTL_PATH=$(which journalctl 2>/dev/null || true)
# Create sudoers content
cat > /tmp/ledmatrix_web_sudoers << EOF
@@ -1101,10 +1102,22 @@ $ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH restart ledmatrix.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH enable ledmatrix.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH disable ledmatrix.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH status ledmatrix.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH is-active ledmatrix
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH is-active ledmatrix.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH start ledmatrix-web.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH stop ledmatrix-web.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH restart ledmatrix-web.service
$ACTUAL_USER ALL=(ALL) NOPASSWD: $PYTHON_PATH $PROJECT_ROOT_DIR/display_controller.py
$ACTUAL_USER ALL=(ALL) NOPASSWD: $BASH_PATH $PROJECT_ROOT_DIR/start_display.sh
$ACTUAL_USER ALL=(ALL) NOPASSWD: $BASH_PATH $PROJECT_ROOT_DIR/stop_display.sh
EOF
if [ -n "$JOURNALCTL_PATH" ]; then
cat >> /tmp/ledmatrix_web_sudoers << EOF
$ACTUAL_USER ALL=(ALL) NOPASSWD: $JOURNALCTL_PATH -u ledmatrix.service *
$ACTUAL_USER ALL=(ALL) NOPASSWD: $JOURNALCTL_PATH -u ledmatrix *
$ACTUAL_USER ALL=(ALL) NOPASSWD: $JOURNALCTL_PATH -t ledmatrix *
EOF
fi
if [ -f "$SUDOERS_FILE" ] && cmp -s /tmp/ledmatrix_web_sudoers "$SUDOERS_FILE"; then
echo "Sudoers configuration already up to date"

View File

@@ -1,4 +1,4 @@
requests>=2.28.0
Pillow>=10.2.0
Pillow>=10.3.0
pytz>=2022.1
numpy>=1.24.0

View File

@@ -89,9 +89,9 @@ TEMP_SUDOERS="/tmp/ledmatrix_web_sudoers_$$"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH status ledmatrix.service"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH is-active ledmatrix"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH is-active ledmatrix.service"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH start ledmatrix-web"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH stop ledmatrix-web"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH restart ledmatrix-web"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH start ledmatrix-web.service"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH stop ledmatrix-web.service"
echo "$WEB_USER ALL=(ALL) NOPASSWD: $SYSTEMCTL_PATH restart ledmatrix-web.service"
# Optional: journalctl (non-critical — skip if not found)
if [ -n "$JOURNALCTL_PATH" ]; then

View File

@@ -10,6 +10,7 @@ import sys
import time
import logging
import signal
import subprocess
from pathlib import Path
# Add project root to path (parent of scripts/utils/)
@@ -146,12 +147,18 @@ class WiFiMonitorDaemon:
capture_output=True, timeout=20, check=True
)
self._consecutive_internet_failures = 0
# NM restart causes a brief WiFi drop; reset the AP-mode grace
# counter so that transient disconnect doesn't count toward
# triggering AP mode.
self.wifi_manager._disconnected_checks = 0
except subprocess.CalledProcessError as e:
logger.error(f"NetworkManager restart failed (rc={e.returncode}); "
"keeping failure counter unchanged")
"resetting failure counter to avoid tight retry loop")
self._consecutive_internet_failures = 0
except Exception as e:
logger.error(f"NetworkManager restart error: {e}; "
"keeping failure counter unchanged")
"resetting failure counter to avoid tight retry loop")
self._consecutive_internet_failures = 0
else:
self._consecutive_internet_failures = 0
else:

View File

@@ -144,6 +144,8 @@ class WiFiManager:
# Timestamp set when AP mode is enabled; used for the idle-timeout check
self._ap_enabled_at: Optional[float] = None
# Which redirect backend was used (iptables/nftables/None); set per-instance
self._redirect_backend: Optional[str] = None
logger.info(f"WiFi Manager initialized - nmcli: {self.has_nmcli}, iwlist: {self.has_iwlist}, "
f"hostapd: {self.has_hostapd}, dnsmasq: {self.has_dnsmasq}, "
@@ -691,9 +693,8 @@ class WiFiManager:
def _validate_ap_config(self) -> Tuple[str, int]:
"""Return a sanitized (ssid, channel) pair from config, falling back to defaults."""
import re as _re
ssid = str(self.config.get("ap_ssid", DEFAULT_AP_SSID))
if not ssid or len(ssid) > 32 or not _re.match(r'^[\x20-\x7E]+$', ssid):
if not ssid or len(ssid) > 32 or not re.match(r'^[\x20-\x7E]+$', ssid):
logger.warning(f"AP SSID '{ssid}' is invalid, falling back to default")
ssid = DEFAULT_AP_SSID
try:
@@ -705,10 +706,6 @@ class WiFiManager:
channel = DEFAULT_AP_CHANNEL
return ssid, channel
# Tracks which redirect backend was used so teardown uses the same one.
# Value is "iptables", "nftables", or None (not set up).
_redirect_backend: Optional[str] = None
def _setup_iptables_redirect(self) -> bool:
"""
Add port 80 → 5000 redirect rules for the captive portal.
@@ -936,14 +933,14 @@ class WiFiManager:
if r.returncode == 0:
logger.debug("Internet connectivity confirmed via ping 8.8.8.8")
return True
except Exception:
except (subprocess.SubprocessError, OSError):
pass
try:
import urllib.request as _ureq
_ureq.urlopen("http://connectivity-check.ubuntu.com/", timeout=timeout)
logger.debug("Internet connectivity confirmed via HTTP check")
return True
except Exception:
except OSError:
pass
logger.debug("Internet connectivity check failed (both ping and HTTP)")
return False
@@ -2074,6 +2071,7 @@ class WiFiManager:
if up_result.returncode != 0:
error_msg = up_result.stderr.strip() or up_result.stdout.strip()
logger.error(f"Failed to bring up AP connection: {error_msg}")
self._remove_nm_dnsmasq_captive_conf()
subprocess.run(["nmcli", "connection", "delete", "LEDMatrix-Setup-AP"],
capture_output=True, timeout=10)
self._show_led_message("AP mode failed", duration=5)
@@ -2085,6 +2083,7 @@ class WiFiManager:
# need to add the iptables port-redirect rules for the captive portal.
if not self._setup_iptables_redirect():
logger.error("Captive-portal redirect setup failed; rolling back AP profile")
self._remove_nm_dnsmasq_captive_conf()
subprocess.run(["nmcli", "connection", "down", "LEDMatrix-Setup-AP"],
capture_output=True, timeout=10)
subprocess.run(["nmcli", "connection", "delete", "LEDMatrix-Setup-AP"],
@@ -2102,6 +2101,7 @@ class WiFiManager:
else:
logger.error("AP mode started but not verified by status check — rolling back")
self._teardown_iptables_redirect()
self._remove_nm_dnsmasq_captive_conf()
subprocess.run(["nmcli", "connection", "down", "LEDMatrix-Setup-AP"],
capture_output=True, timeout=10)
subprocess.run(["nmcli", "connection", "delete", "LEDMatrix-Setup-AP"],
@@ -2111,6 +2111,7 @@ class WiFiManager:
except Exception as e:
logger.error(f"Error starting AP mode with nmcli: {e}")
self._remove_nm_dnsmasq_captive_conf()
self._show_led_message("Setup mode error", duration=5)
return False, str(e)
@@ -2498,7 +2499,10 @@ address=/detectportal.firefox.com/192.168.4.1
# Idle-timeout check: disable AP if no client has connected within the window.
# Only applies when AP is active and we haven't just decided to enable/disable it.
if ap_active and self._ap_enabled_at is not None:
idle_timeout_min = self.config.get("ap_idle_timeout_minutes", 15)
try:
idle_timeout_min = max(1, min(1440, int(self.config.get("ap_idle_timeout_minutes", 15))))
except (TypeError, ValueError):
idle_timeout_min = 15
elapsed = time.time() - self._ap_enabled_at
if elapsed > idle_timeout_min * 60 and not self._has_ap_clients():
logger.info(

View File

@@ -218,7 +218,7 @@ def _ensure_display_service_running():
if status.get('active'):
status['started'] = False
return status
result = _run_systemctl_command(['sudo', 'systemctl', 'start', 'ledmatrix'])
result = _run_systemctl_command(['sudo', 'systemctl', 'start', 'ledmatrix.service'])
service_status = _get_display_service_status()
result['started'] = result.get('returncode') == 0
result['active'] = service_status.get('active')
@@ -227,7 +227,7 @@ def _ensure_display_service_running():
def _stop_display_service():
"""Stop the ledmatrix display service."""
result = _run_systemctl_command(['sudo', 'systemctl', 'stop', 'ledmatrix'])
result = _run_systemctl_command(['sudo', 'systemctl', 'stop', 'ledmatrix.service'])
status = _get_display_service_status()
result['active'] = status.get('active')
result['status'] = status
@@ -1716,33 +1716,34 @@ def execute_system_action():
if mode:
# For on-demand modes, we would need to integrate with the display controller
# For now, just start the display service
result = subprocess.run(['sudo', 'systemctl', 'start', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'start', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
return jsonify({
'status': 'success' if result.returncode == 0 else 'error',
'message': f'Started display in {mode} mode',
'message': f'Started display in {mode} mode' if result.returncode == 0
else f'Failed to start display in {mode} mode: {result.stderr.strip() or "check sudo systemctl status ledmatrix.service"}',
'returncode': result.returncode,
'stdout': result.stdout,
'stderr': result.stderr
})
else:
result = subprocess.run(['sudo', 'systemctl', 'start', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'start', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
elif action == 'stop_display':
result = subprocess.run(['sudo', 'systemctl', 'stop', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'stop', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
elif action == 'enable_autostart':
result = subprocess.run(['sudo', 'systemctl', 'enable', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'enable', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
elif action == 'disable_autostart':
result = subprocess.run(['sudo', 'systemctl', 'disable', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'disable', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
elif action == 'reboot_system':
result = subprocess.run(['sudo', 'reboot'],
capture_output=True, text=True)
capture_output=True, text=True, timeout=10)
elif action == 'shutdown_system':
result = subprocess.run(['sudo', 'poweroff'],
capture_output=True, text=True)
capture_output=True, text=True, timeout=10)
elif action == 'git_pull':
# Use PROJECT_ROOT instead of hardcoded path
project_dir = str(PROJECT_ROOT)
@@ -1823,12 +1824,11 @@ def execute_system_action():
'stderr': result.stderr
})
elif action == 'restart_display_service':
result = subprocess.run(['sudo', 'systemctl', 'restart', 'ledmatrix'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'restart', 'ledmatrix.service'],
capture_output=True, text=True, timeout=15)
elif action == 'restart_web_service':
# Try to restart the web service (assuming it's ledmatrix-web.service)
result = subprocess.run(['sudo', 'systemctl', 'restart', 'ledmatrix-web'],
capture_output=True, text=True)
result = subprocess.run(['sudo', 'systemctl', 'restart', 'ledmatrix-web.service'],
capture_output=True, text=True, timeout=15)
else:
return jsonify({'status': 'error', 'message': f'Unknown action: {action}'}), 400
@@ -7136,7 +7136,7 @@ def connect_wifi():
# Propagate structured error type so the captive portal UI can show
# "Wrong password — try again" instead of a generic failure message.
error_type = "wrong_password" if (message or "").startswith("wrong_password:") else "connection_failed"
clean_message = (message or "").removeprefix("wrong_password: ") or "Failed to connect to network"
clean_message = (message or "").removeprefix("wrong_password:").lstrip() or "Failed to connect to network"
return jsonify({
'status': 'error',
'message': clean_message,

View File

@@ -1227,6 +1227,8 @@ function initializePlugins() {
// searchPluginStore renders Installed/Reinstall badges against it.
loadInstalledPlugins().then(() => {
searchPluginStore(!isReswapWarm);
}).catch(err => {
console.error('[PluginStore] loadInstalledPlugins failed:', err);
});
// Setup search functionality (with guard against duplicate listeners)