mirror of
https://github.com/ChuckBuilds/LEDMatrix.git
synced 2026-05-13 09:13:32 +00:00
* fix: service control buttons and AP-mode SSH lockout post-install
Two user-reported issues after fresh install:
1. All service buttons (Start/Stop/Restart Display, Restart Web Service)
failed silently — only Reboot worked.
Root cause: sudoers rules use `ledmatrix.service` (with suffix) but
api_v3.py called `sudo systemctl start ledmatrix` (no suffix). sudo
does exact string matching, so every service action was rejected with
returncode=1. Also missing from sudoers: ledmatrix-web, journalctl,
and is-active entries.
Fix:
- Add `.service` suffix to all 8 sudo systemctl call sites in
api_v3.py (_ensure_display_service_running, _stop_display_service,
and all execute_system_action branches).
- Add timeout=15 to all subprocess.run calls in execute_system_action
(previously could hang indefinitely).
- Add missing sudoers rules to first_time_install.sh and
configure_web_sudo.sh: ledmatrix-web.service start/stop/restart,
is-active for both name forms, and journalctl -u/-t ledmatrix rules.
2. SSH and web UI became inaccessible after ~1 hour even though the
display kept running.
Root cause: wifi_monitor_daemon restarts NetworkManager after 5
consecutive internet failures (~2.5 min). Each NM restart drops WiFi
briefly. During that window check_and_manage_ap_mode() increments
_disconnected_checks but the daemon never reset it after the restart.
After 3 such NM-restart cycles, _disconnected_checks reached 3 and
AP mode activated — changing the Pi from WiFi client to hotspot
(192.168.4.1) and killing SSH on the old IP.
Fix:
- Reset wifi_manager._disconnected_checks = 0 in the daemon
immediately after a successful NM restart so the brief drop it
causes doesn't count toward AP-mode activation.
- Increase _disconnected_checks_required from 3 to 6 (90s → 3min)
as an additional buffer against transient network flaps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
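Issue 1 comes down to sudo comparing the requested command against each NOPASSWD rule. The sketch below models that comparison under a deliberately simplified assumption: a rule matches only if the whole command string is identical. Real sudoers matching also supports wildcards and argument-less rules, and this sketch ignores both. `ALLOWED_RULES`, the `/usr/bin/systemctl` path and the helper names are hypothetical. They are not copied from api_v3.py or the installer scripts.

```python
import shlex

# Hypothetical NOPASSWD command strings, modelled on the rules the commit describes.
# Simplification: a rule matches only on an exact, whole-string comparison.
ALLOWED_RULES = {
    "/usr/bin/systemctl start ledmatrix.service",
    "/usr/bin/systemctl stop ledmatrix.service",
    "/usr/bin/systemctl restart ledmatrix.service",
    "/usr/bin/systemctl restart ledmatrix-web.service",
}

SYSTEMCTL = "/usr/bin/systemctl"  # assumed location


def unit_name(name: str) -> str:
    """Append the .service suffix unless it is already present."""
    return name if name.endswith(".service") else f"{name}.service"


def systemctl_argv(action: str, unit: str) -> list[str]:
    """Build the argv a caller would hand to subprocess.run()."""
    return ["sudo", SYSTEMCTL, action, unit_name(unit)]


def permitted(argv: list[str]) -> bool:
    """True if the command after 'sudo' matches one rule exactly."""
    return shlex.join(argv[1:]) in ALLOWED_RULES
```

Under this model, `permitted(systemctl_argv("start", "ledmatrix"))` is True. The pre-fix argv `["sudo", "/usr/bin/systemctl", "start", "ledmatrix"]` is rejected. So is any ledmatrix-web action with no rule at all, which mirrors the missing sudoers entries.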
* revert: restore AP-mode grace period to 90s (3 checks)
The counter reset after NM restart already fully prevents the SSH-lockout
cascade: _disconnected_checks can never accumulate across NM restarts
because it is reset to 0 before the next daemon iteration runs.
The 3→6 increase provided no additional fix for the described problem and
caused a UX regression: fresh Pi devices with no WiFi configured would
wait 3 minutes instead of 90 seconds for the LEDMatrix-Setup hotspot to
appear.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
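A toy model of the AP-mode grace counter can check the revert's argument. The model rests on two assumptions, both implied by the first commit message. First, each NM restart produces one brief drop that a single daemon iteration sees. Second, nothing else clears `_disconnected_checks` between restarts. `cycles_until_ap` is a hypothetical helper, not WiFiManager code.

```python
def cycles_until_ap(max_cycles: int, reset_after_restart: bool,
                    required: int = 3, drop_checks: int = 1):
    """Return the NM-restart cycle on which AP mode would activate, or None.

    Simplified model (an assumption, not WiFiManager's implementation):
    - each NM restart causes a WiFi drop observed by `drop_checks` iterations;
    - each such iteration increments the grace counter (_disconnected_checks);
    - nothing else clears the counter between restarts.
    """
    counter = 0
    for cycle in range(1, max_cycles + 1):
        if reset_after_restart:
            counter = 0  # the daemon zeroes the counter right after a successful restart
        for _ in range(drop_checks):
            counter += 1
            if counter >= required:
                return cycle
    return None


CHECK_INTERVAL_S = 30
# Grace period in seconds for the two thresholds discussed in the commits.
grace_seconds = {n: n * CHECK_INTERVAL_S for n in (3, 6)}
```

Under these assumptions, `cycles_until_ap(10, reset_after_restart=False)` returns 3, which is the cascade from the original report. `cycles_until_ap(10, reset_after_restart=True)` returns None. Raising `required` from 3 to 6 would only postpone the cascade, while doubling the hotspot wait on an unconfigured device from 90 s to 180 s.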
* fix: address five valid review findings; skip two
Fixed:
- march-madness/requirements.txt: Pillow>=10.3.0 (patches CVE-2024-28219;
10.3.0 is the actual fix version — reviewer cited 12.2.0 but that risks
breaking API changes without test coverage)
- wifi_monitor_daemon.py: add missing `import subprocess`; subprocess.run
and CalledProcessError would NameError at runtime on the NM restart path
- wifi_manager.py: validate ap_idle_timeout_minutes before arithmetic —
coerce to int, clamp 1–1440, fall back to 15 on bad config values
- wifi_manager.py: call _remove_nm_dnsmasq_captive_conf() on all three
rollback paths in _enable_ap_mode_nmcli_hotspot() and in the top-level
except block so stale dnsmasq drop-ins are never left behind
- api_v3.py: fix wrong_password prefix strip — removeprefix("wrong_password:")
then lstrip() handles both "wrong_password: msg" and "wrong_password:msg"
- plugins_manager.js: add .catch() to loadInstalledPlugins().then() to
surface failures instead of silently dropping unhandled rejections
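Here is a sketch of the ap_idle_timeout_minutes validation from the list above. Only three rules come from the commit: the int coercion, the 1 to 1440 clamp and the fallback of 15. The function names, and the choice to treat NaN or infinity as bad values, are assumptions.

```python
DEFAULT_IDLE_MINUTES = 15
MIN_IDLE_MINUTES = 1
MAX_IDLE_MINUTES = 1440  # 24 hours


def sanitize_idle_timeout(raw) -> int:
    """Coerce a config value to whole minutes, clamped to [1, 1440].

    Values int() cannot convert (None, "abc", NaN, infinity) fall back to the default.
    """
    try:
        minutes = int(raw)
    except (TypeError, ValueError, OverflowError):
        return DEFAULT_IDLE_MINUTES
    return max(MIN_IDLE_MINUTES, min(MAX_IDLE_MINUTES, minutes))


def idle_timeout_seconds(raw) -> int:
    """Seconds before an idle AP is torn down; always safe for arithmetic."""
    return sanitize_idle_timeout(raw) * 60
```

Clamping keeps a numeric but out-of-range setting meaningful: 5000 becomes the 24-hour cap instead of being silently replaced by the default.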
Skipped:
- WiFiManager AP state persistence: architectural overhaul; _is_ap_mode_active()
already derives from live system state, not in-memory variables
- Absolute subprocess paths in api_v3.py: paths vary by distro (/usr/bin vs
/bin); web service has a normal PATH; sudoers already use resolved paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
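The wrong_password fix pairs two string methods. `str.removeprefix` (Python 3.9+) strips the prefix only when it is present, and `lstrip()` then absorbs an optional space after the colon. A minimal sketch follows; the function name and the None return for other messages are assumptions, not api_v3.py's actual code.

```python
PREFIX = "wrong_password:"


def wrong_password_detail(message: str):
    """Return the human-readable reason from a 'wrong_password:' message, else None."""
    if not message.startswith(PREFIX):
        return None
    # removeprefix drops exactly "wrong_password:"; lstrip then removes the optional
    # space, so "wrong_password: msg" and "wrong_password:msg" both yield "msg".
    return message.removeprefix(PREFIX).lstrip()
```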
* fix: address five review findings (NM retry loop, start_display message, code quality)
- wifi_monitor_daemon: reset _consecutive_internet_failures = 0 in both
NM-restart exception handlers; previously both left the counter at threshold,
causing an immediate retry on the next iteration instead of waiting another
full backoff period
- api_v3: fix start_display failure message — when mode is set and systemctl
returns non-zero, message now includes the failure reason and a hint rather
than always reporting success phrasing
- wifi_manager: move _redirect_backend from class variable to instance variable
in __init__ alongside _ap_enabled_at; class-level default shadowed correctly
in practice (single instance) but was misleading
- wifi_manager: narrow broad except Exception in _check_internet_connectivity
to (subprocess.SubprocessError, OSError) for ping and OSError for HTTP
(urllib.error.URLError is an OSError subclass in Python 3)
- wifi_manager: remove redundant local 'import re as _re' in _validate_ap_config;
re is already imported at module level (line 37)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
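The exception narrowing above relies on the standard library's class hierarchy:

- A missing binary raises FileNotFoundError, which is an OSError.
- `subprocess.TimeoutExpired` and `CalledProcessError` both derive from `subprocess.SubprocessError`.
- `urllib.error.URLError` derives from OSError.

Below is a sketch of a ping-style probe and an HTTP check written that way. `probe`, `http_ok` and their 5 s default timeouts are assumptions, not WiFiManager's implementation.

```python
import subprocess
import sys
import urllib.request


def probe(argv: list[str], timeout: float = 5.0) -> bool:
    """Run a connectivity probe command; True only if it exits 0 in time.

    Expected runtime failures (binary missing, timeout) map to False.
    Anything else, such as a TypeError from a malformed argv, propagates.
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (subprocess.SubprocessError, OSError):
        return False
    return result.returncode == 0


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """HTTP reachability check; URLError, HTTPError and socket timeouts are OSErrors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        return False


if __name__ == "__main__":
    # Use the running interpreter as a stand-in probe so the demo needs no network.
    print(probe([sys.executable, "-c", "pass"]))          # True
    print(probe(["definitely-not-a-real-probe-binary"]))  # False (FileNotFoundError)
```

On a Pi the probe might be called as `probe(["ping", "-c", "1", "-W", "2", "8.8.8.8"])`. Configuration mistakes such as a malformed URL (ValueError) are no longer swallowed, which is the point of narrowing.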
* fix: address five review findings (Pillow CVEs, daemon exception narrowing, timeout handling, plugin store)
- march-madness/requirements.txt: Pillow>=12.2.0 (patches CVE-2026-42308
and CVE-2026-42310; previous floor of 10.3.0 was insufficient)
- wifi_monitor_daemon: narrow final except Exception to
(subprocess.SubprocessError, OSError) so programming errors in the NM
restart block are no longer silently swallowed
- api_v3/execute_system_action: add explicit subprocess.TimeoutExpired
handler before the generic Exception catch; returns action-specific
message with 'status','message','returncode','stdout','stderr' fields
so the UI receives a precise, actionable payload instead of the generic
'Failed to execute system action' string
- plugins_manager.js: move searchPluginStore into .finally() so the
plugin store renders regardless of whether loadInstalledPlugins succeeds
or fails; .catch() still logs the error
- first_time_install.sh: add safe_plugin_rm.sh NOPASSWD rule to the
/tmp/ledmatrix_web_sudoers block; configure_web_sudo.sh had this rule
but the standalone installer never granted it, leaving plugin removal
broken after first-time install
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
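Here is a sketch of the TimeoutExpired handling described for execute_system_action. The five payload keys come from the commit message. Several details are assumptions: the `status` values, the message wording, returning None as `returncode` when no exit code exists, and the function name. One standard-library detail is worth handling: output attached to `TimeoutExpired` is bytes (or None) even when `text=True` was requested.

```python
import subprocess
import sys


def _text(value) -> str:
    """Normalise captured output to str (TimeoutExpired may carry bytes or None)."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def run_system_action(action: str, argv: list[str], timeout: float = 15) -> dict:
    """Run argv and return a payload with status/message/returncode/stdout/stderr."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # Listed before the generic handler below; otherwise it would never be reached.
        return {
            "status": "error",
            "message": f"{action} timed out after {timeout} seconds",
            "returncode": None,
            "stdout": _text(exc.stdout),
            "stderr": _text(exc.stderr),
        }
    except Exception as exc:  # last-resort catch, e.g. FileNotFoundError
        return {
            "status": "error",
            "message": f"Failed to execute {action}: {exc}",
            "returncode": None,
            "stdout": "",
            "stderr": "",
        }
    ok = result.returncode == 0
    return {
        "status": "success" if ok else "error",
        "message": f"{action} {'succeeded' if ok else 'failed'}",
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    print(run_system_action("demo", [sys.executable, "-c", "print('ok')"])["status"])  # success
```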
* refactor(api): resolve sudo/systemctl/reboot/poweroff paths at startup
Use shutil.which() with safe fallbacks for the four privileged binaries
instead of passing bare names and relying on subprocess's PATH lookup at
exec time. Paths are resolved once at module load rather than per call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
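Here is a sketch of resolving the privileged binaries once at import time with `shutil.which()`. The fallback locations are assumptions (typical Debian and Raspberry Pi OS paths), and so is the `resolve_binary` helper. The commit only states that safe fallbacks exist.

```python
import shutil


def resolve_binary(name: str, fallback: str) -> str:
    """Return the path shutil.which() finds for `name` on PATH, else `fallback`."""
    return shutil.which(name) or fallback


# Resolved once at import time rather than on every call.
# The fallback paths are assumed, not taken from api_v3.py.
SUDO = resolve_binary("sudo", "/usr/bin/sudo")
SYSTEMCTL = resolve_binary("systemctl", "/usr/bin/systemctl")
REBOOT = resolve_binary("reboot", "/sbin/reboot")
POWEROFF = resolve_binary("poweroff", "/sbin/poweroff")
```

A call site would then build commands such as `[SUDO, SYSTEMCTL, "restart", "ledmatrix.service"]`.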
---------
Co-authored-by: Chuck <chuck@example.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
243 lines
11 KiB
Python
Executable File
#!/usr/bin/env python3
"""
WiFi Monitor Daemon

Monitors WiFi connection status and automatically enables/disables access point mode
when there is no active WiFi connection.
"""

import sys
import time
import logging
import signal
import subprocess
from pathlib import Path

# Add project root to path (parent of scripts/utils/)
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from src.wifi_manager import WiFiManager

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler('/var/log/ledmatrix-wifi-monitor.log')
    ]
)

logger = logging.getLogger(__name__)


class WiFiMonitorDaemon:
    """Daemon to monitor WiFi and manage AP mode"""

    def __init__(self, check_interval=30):
        """
        Initialize the WiFi monitor daemon

        Args:
            check_interval: Seconds between WiFi status checks
        """
        self.check_interval = check_interval
        self.wifi_manager = WiFiManager()
        self.running = True
        self.last_state = None
        # Counts consecutive checks where nmcli says "connected" but internet is unreachable.
        # After _nm_restart_threshold failures, NetworkManager is restarted as a recovery step.
        self._consecutive_internet_failures = 0
        self._nm_restart_threshold = 5  # ~2.5 min at 30s interval

        # Register signal handlers for graceful shutdown
        signal.signal(signal.SIGINT, self._signal_handler)
        signal.signal(signal.SIGTERM, self._signal_handler)

    def _signal_handler(self, signum, frame):
        """Handle shutdown signals"""
        logger.info(f"Received signal {signum}, shutting down...")
        self.running = False

    def run(self):
        """Main daemon loop"""
        logger.info("WiFi Monitor Daemon started")
        logger.info(f"Check interval: {self.check_interval} seconds")

        # Log initial configuration
        auto_enable = self.wifi_manager.config.get("auto_enable_ap_mode", True)
        ap_ssid = self.wifi_manager.config.get("ap_ssid", "LEDMatrix-Setup")
        logger.info(f"Configuration: auto_enable_ap_mode={auto_enable}, ap_ssid={ap_ssid}")

        # Log initial status
        initial_status = self.wifi_manager.get_wifi_status()
        initial_ethernet = self.wifi_manager._is_ethernet_connected()
        logger.info(f"Initial status: WiFi connected={initial_status.connected}, "
                    f"Ethernet connected={initial_ethernet}, AP active={initial_status.ap_mode_active}")
        if initial_status.connected:
            logger.info(f" WiFi SSID: {initial_status.ssid}, IP: {initial_status.ip_address}, Signal: {initial_status.signal}%")

        while self.running:
            try:
                # Get current status before checking
                status = self.wifi_manager.get_wifi_status()
                ethernet_connected = self.wifi_manager._is_ethernet_connected()

                # Check WiFi status and manage AP mode
                state_changed = self.wifi_manager.check_and_manage_ap_mode()

                # Get updated status after check
                updated_status = self.wifi_manager.get_wifi_status()
                updated_ethernet = self.wifi_manager._is_ethernet_connected()

                current_state = {
                    'connected': updated_status.connected,
                    'ethernet_connected': updated_ethernet,
                    'ap_active': updated_status.ap_mode_active,
                    'ssid': updated_status.ssid
                }

                # Log state changes with detailed information
                if current_state != self.last_state:
                    logger.info("=== State Change Detected ===")
                    if updated_status.connected:
                        logger.info(f"WiFi connected: {updated_status.ssid} (IP: {updated_status.ip_address}, Signal: {updated_status.signal}%)")
                    else:
                        logger.info("WiFi disconnected (no active connection)")

                    if updated_ethernet:
                        logger.info("Ethernet connected")
                    else:
                        logger.debug("Ethernet not connected")

                    if updated_status.ap_mode_active:
                        logger.info(f"AP mode ACTIVE - SSID: {ap_ssid} (IP: 192.168.4.1)")
                    else:
                        logger.debug("AP mode inactive")

                    if state_changed:
                        logger.info("AP mode state was changed by check_and_manage_ap_mode()")

                    logger.info("=============================")
                    self.last_state = current_state.copy()
                else:
                    # Log periodic status (less verbose)
                    if updated_status.connected:
                        logger.debug(f"Status check: WiFi={updated_status.ssid} ({updated_status.signal}%), "
                                     f"Ethernet={updated_ethernet}, AP={updated_status.ap_mode_active}")
                    else:
                        logger.debug(f"Status check: WiFi=disconnected, Ethernet={updated_ethernet}, AP={updated_status.ap_mode_active}")

                # Escalating recovery: if nmcli reports connected but actual internet
                # is unreachable for several consecutive checks, restart NetworkManager.
                # This is done HERE (not inside check_and_manage_ap_mode) to keep the
                # AP-enable trigger clean and avoid false-positive AP enables from
                # transient packet loss on otherwise working WiFi.
                if updated_status.connected and not updated_status.ap_mode_active:
                    if not self.wifi_manager.check_internet_connectivity():
                        self._consecutive_internet_failures += 1
                        logger.warning(
                            f"Internet unreachable despite nmcli connection "
                            f"({self._consecutive_internet_failures}/{self._nm_restart_threshold})"
                        )
                        if self._consecutive_internet_failures >= self._nm_restart_threshold:
                            logger.warning("Restarting NetworkManager to recover internet connectivity")
                            try:
                                subprocess.run(
                                    ["/usr/bin/systemctl", "restart", "NetworkManager"],
                                    capture_output=True, timeout=20, check=True
                                )
                                self._consecutive_internet_failures = 0
                                # NM restart causes a brief WiFi drop; reset the AP-mode grace
                                # counter so that transient disconnect doesn't count toward
                                # triggering AP mode.
                                self.wifi_manager._disconnected_checks = 0
                            except subprocess.CalledProcessError as e:
                                logger.error(f"NetworkManager restart failed (rc={e.returncode}); "
                                             "resetting failure counter to avoid tight retry loop")
                                self._consecutive_internet_failures = 0
                            except (subprocess.SubprocessError, OSError) as e:
                                logger.error(f"NetworkManager restart error: {e}; "
                                             "resetting failure counter to avoid tight retry loop")
                                self._consecutive_internet_failures = 0
                    else:
                        self._consecutive_internet_failures = 0
                else:
                    self._consecutive_internet_failures = 0

                # Sleep until next check
                time.sleep(self.check_interval)

            except KeyboardInterrupt:
                logger.info("Received keyboard interrupt, shutting down...")
                self.running = False
                break
            except Exception as e:
                logger.error(f"Error in monitor loop: {e}", exc_info=True)
                logger.error(f"Error details - type: {type(e).__name__}, args: {e.args}")
                # Log current state for debugging
                try:
                    error_status = self.wifi_manager.get_wifi_status()
                    logger.error(f"State at error: WiFi={error_status.connected}, AP={error_status.ap_mode_active}")
                except Exception as state_error:
                    logger.error(f"Could not get state at error: {state_error}")
                # Continue running even if there's an error
                time.sleep(self.check_interval)

        logger.info("WiFi Monitor Daemon stopped")

        # Ensure AP mode is disabled on shutdown if WiFi or Ethernet is connected
        logger.info("Performing cleanup on shutdown...")
        try:
            status = self.wifi_manager.get_wifi_status()
            ethernet_connected = self.wifi_manager._is_ethernet_connected()
            logger.info(f"Final status: WiFi={status.connected}, Ethernet={ethernet_connected}, AP={status.ap_mode_active}")

            if (status.connected or ethernet_connected) and status.ap_mode_active:
                if status.connected:
                    logger.info(f"Disabling AP mode on shutdown (WiFi is connected to {status.ssid})")
                elif ethernet_connected:
                    logger.info("Disabling AP mode on shutdown (Ethernet is connected)")

                success, message = self.wifi_manager.disable_ap_mode()
                if success:
                    logger.info(f"AP mode disabled successfully: {message}")
                else:
                    logger.warning(f"Failed to disable AP mode: {message}")
            else:
                logger.debug("AP mode cleanup not needed (not active or no network connection)")
        except Exception as e:
            logger.error(f"Error during shutdown cleanup: {e}", exc_info=True)


def main():
    """Main entry point"""
    import argparse

    parser = argparse.ArgumentParser(description='WiFi Monitor Daemon for LED Matrix')
    parser.add_argument(
        '--interval',
        type=int,
        default=30,
        help='Check interval in seconds (default: 30)'
    )
    parser.add_argument(
        '--foreground',
        action='store_true',
        help='Run in foreground (for debugging)'
    )

    args = parser.parse_args()

    daemon = WiFiMonitorDaemon(check_interval=args.interval)

    try:
        daemon.run()
    except Exception as e:
        logger.error(f"Fatal error: {e}", exc_info=True)
        sys.exit(1)


if __name__ == '__main__':
    main()
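The escalating-recovery block in run() is easier to reason about and to unit-test as a pure function of the current observations and the failure counter. The sketch below is a hypothetical refactoring that mirrors those branches for one loop iteration. `recovery_step`, `Step` and `restart_iterations` are illustrative names, not code in the repository. Whether the NetworkManager restart succeeds is passed in rather than performed.

```python
from typing import NamedTuple


class Step(NamedTuple):
    failures: int          # new value of _consecutive_internet_failures
    restart_nm: bool       # whether this iteration restarts NetworkManager
    reset_ap_grace: bool   # whether _disconnected_checks is zeroed


def recovery_step(connected: bool, ap_active: bool, internet_ok: bool,
                  failures: int, threshold: int = 5,
                  restart_succeeds: bool = True) -> Step:
    """One iteration of the daemon's escalation logic, without side effects."""
    if not connected or ap_active:
        return Step(0, False, False)       # outer else branch
    if internet_ok:
        return Step(0, False, False)       # inner else branch
    failures += 1
    if failures < threshold:
        return Step(failures, False, False)
    # Threshold reached: restart NM. The counter is reset on success and on
    # failure, so the next attempt waits another full `threshold` iterations.
    # The AP grace counter is only reset after a successful restart.
    return Step(0, True, restart_succeeds)


def restart_iterations(n_iterations: int, threshold: int = 5) -> list[int]:
    """1-based iterations that restart NM while internet stays down and restarts fail."""
    failures, hits = 0, []
    for i in range(1, n_iterations + 1):
        step = recovery_step(True, False, False, failures, threshold,
                             restart_succeeds=False)
        failures = step.failures
        if step.restart_nm:
            hits.append(i)
    return hits
```

`restart_iterations(12)` returns `[5, 10]`. At the default 30 s interval, that is one restart attempt every 150 s while the outage lasts, which matches the "~2.5 min" comment in `__init__`, rather than one attempt per iteration.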