Improve first-time install error diagnostics and resilience (#369)

* fix(install): don't let outer ERR trap mask first_time_install.sh failures

set +e alone doesn't suppress bash's ERR trap, so any non-zero exit from
first_time_install.sh inside the one-shot installer immediately triggered
the outer on_error handler with a generic "Main installation, line 370"
message — before the script could report the real exit code or point to
logs/. Suspend the trap for that block so the existing if/else handling
runs instead.

* feat(install): surface root cause of web dependency install failures

install_dependencies_apt.py previously reported only which packages
failed, not why - the actual apt/pip error was discarded (apt) or
could scroll out of the on_error log tail (pip), leaving "Step 7:
Install web interface dependencies (line 915)" as the only visible
detail.

Capture command output for each install attempt and print a compact
DEPENDENCY INSTALLATION FAILURES summary with the last lines of error
output per package. Also run the installer with `python3 -u` for
real-time, correctly-ordered logging, and widen the on_error tail from
50 to 100 lines so the summary isn't cut off.

* feat(install): harden first-time install against common Pi failure modes

- wait_for_apt_lock: apt_update/apt_install now wait (up to 3min) for
  unattended-upgrades to release the dpkg lock instead of failing
  outright with "Command failed after 3 attempts" right after first boot.
- check_disk_space: new pre-flight check (Step 1) so a full SD card fails
  fast with a clear message instead of a cryptic mid-build error.
- Step 6: wrap rpi-rgb-led-matrix git clone/submodule operations in retry
  for resilience to transient network issues.
- Step 6: capture `pip install .` build output and print the last 50
  lines on failure, so the actual cmake/compiler error is visible instead
  of just "Failed to install rpi-rgb-led-matrix Python package".

* fix(install): bound subprocess output and dedupe apt update in dependency installer

Address coderabbitai review on PR #369:
- _run() now streams combined stdout/stderr to a temp file and returns
  only the last ERROR_TAIL_LINES lines, instead of buffering full
  output in memory (Codacy also flagged the previous capture_output
  call as a subprocess-without-static-string security issue; the new
  call is annotated as safe since cmd is built from hardcoded args).
- `apt update` now runs once in main() instead of once per package
  needing an apt fallback.

* fix(install): suppress remaining Codacy subprocess false-positive

Codacy's Semgrep-based check still flagged the cmd-built subprocess.run
call as "without a static string" even with the Bandit nosec applied.
Add a nosemgrep marker alongside it - cmd is always a hardcoded
apt/pip argument list, never user input.

* fix(install): correctly detect already-installed dateutil/websocket-client

Address remaining coderabbitai findings on PR #369:
- check_package_installed() did __import__(package_name) directly, but
  python-dateutil and websocket-client import as dateutil/websocket. Both
  always failed the "already installed" check and were reinstalled on
  every run. Add an IMPORT_NAME_MAP for the mismatched names.
- _run() still read the entire temp file into memory before slicing the
  tail. Stream it line-by-line into a deque(maxlen=ERROR_TAIL_LINES)
  instead so memory use stays bounded for very chatty commands.

---------

Co-authored-by: Chuck <chuck@example.com>
This commit is contained in:
Chuck
2026-06-11 18:12:35 -04:00
committed by GitHub
parent cf28a8c0d5
commit 5beef0aa01
3 changed files with 213 additions and 94 deletions

View File

@@ -340,9 +340,14 @@ main() {
echo ""
# Execute with proper error handling and non-interactive mode
# Temporarily disable errexit to capture exit code instead of exiting immediately
# Temporarily disable errexit AND the ERR trap to capture exit code instead of
# exiting immediately. `set +e` alone does not suppress the ERR trap, so without
# `trap '' ERR` a non-zero exit from first_time_install.sh would trigger on_error
# here with the generic "Main installation" message instead of the detailed
# if/else handling below.
set +e
trap '' ERR
# Check /tmp permissions - only fix if actually wrong (common in automated scenarios)
# When running manually, /tmp usually has correct permissions (1777)
TMP_PERMS=$(stat -c '%a' /tmp 2>/dev/null || echo "unknown")
@@ -370,6 +375,7 @@ main() {
sudo -E env TMPDIR=/tmp LEDMATRIX_ASSUME_YES=1 bash ./first_time_install.sh -y </dev/null
fi
INSTALL_EXIT_CODE=$?
trap 'on_error $LINENO' ERR # Re-enable ERR trap
set -e # Re-enable errexit
if [ $INSTALL_EXIT_CODE -eq 0 ]; then