
Raspberry Pi 5: Rare system freeze correlated with kernel log flooding and USB/UART buffer exhaustion #7184

@Boldfor

Description


On a Raspberry Pi 5, the system can become fully unresponsive under certain workload conditions.
When this happens:

  • SSH becomes unreachable
  • most userland processes and containers stop responding
  • openHAB may still update some internal state
  • Homematic device communication ceases entirely
  • only a hard power-cycle restores operation

The behavior does not resemble an immediate kernel panic, out-of-memory kill, or userspace crash.
Instead, it is preceded by repeated kernel log messages that appear to originate from a UART/driver path, followed by systemd-journald warnings and an eventual stall.


Observed Behavior

In every recorded instance of this issue, shortly before the unresponsiveness, the kernel log shows a repeating pattern similar to:

raw-uart raw-uart1: generic_raw_uart_handle_rx_char(): rx fifo full
eq3loop: eq3loop_write_master() mmd_hmip: not enough space in the buffers
eq3loop: eq3loop_write_master() return error

These are followed by journald messages indicating buffer overruns and watchdog timeouts:

systemd-journald: /dev/kmsg buffer overrun, some messages lost
systemd-journald.service: Watchdog timeout
systemd-journald.service: Killing process systemd-journal

After these messages, thousands of kernel messages are lost, journald is restarted, and the system eventually becomes largely unresponsive.

Expected Behavior

Under heavy load, or in the presence of occasional driver buffer exhaustion, the system should remain responsive and recover gracefully. Specifically:

  • driver buffer exhaustion should be handled without log flooding
  • kernel log streams should be rate limited to avoid overwhelming the logging subsystem (see the sketch after this list)
  • unrelated kernel subsystems (network, scheduling, file systems) should not be impacted by issues in a single hardware driver path
  • the system should remain responsive even under high I/O pressure
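As an illustration of the rate-limiting point above, here is a minimal C sketch of how a driver RX path could throttle a recurring "fifo full" diagnostic using the kernel's ratelimit helpers. The struct and function names (raw_uart_port, raw_uart_report_rx_full, and so on) are hypothetical placeholders, not taken from the actual raw-uart/eq3loop sources; ratelimit_state_init(), __ratelimit(), and dev_warn() are real kernel APIs.

/*
 * Hypothetical sketch: rate-limit a per-port "rx fifo full" warning
 * so a stalled consumer cannot flood the kernel log. Names are
 * placeholders; only the ratelimit/device APIs are real.
 */
#include <linux/device.h>
#include <linux/jiffies.h>
#include <linux/ratelimit.h>

struct raw_uart_port {                    /* hypothetical driver state */
    struct device *dev;
    struct ratelimit_state rx_full_rs;    /* per-port limiter */
    unsigned long rx_full_events;         /* total events, incl. suppressed */
};

static void raw_uart_init_ratelimit(struct raw_uart_port *port)
{
    /* Allow at most 3 warnings per 5-second window. */
    ratelimit_state_init(&port->rx_full_rs, 5 * HZ, 3);
}

static void raw_uart_report_rx_full(struct raw_uart_port *port)
{
    port->rx_full_events++;

    /* __ratelimit() returns nonzero while output is still allowed. */
    if (__ratelimit(&port->rx_full_rs))
        dev_warn(port->dev, "rx fifo full (%lu events so far)\n",
                 port->rx_full_events);
}

The one-line dev_warn_ratelimited() helper would achieve a similar effect with a shared per-callsite limiter; keeping explicit per-port state as above additionally allows reporting how many events were suppressed.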

Workloads Observed

The issue has been observed under the following conditions:

  • Raspberry Pi 5 with NVMe root filesystem
  • RaspberryMatic running inside Docker
  • Other containers including openHAB, InfluxDB, Frigate
  • NVMe subjected to high sustained I/O (e.g., video timeline scrubbing, large historical data queries); a sketch for generating comparable I/O pressure follows this list
  • RF traffic concurrently active via a USB-connected Homematic stick

The problem is rare and not reliably reproducible; it occurs under combinations of high I/O and sustained driver activity.
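To approximate the sustained NVMe I/O component of this workload during reproduction attempts, a simple userspace loop that rewrites and fsyncs a large file can be used. This is only a sketch; the path /mnt/nvme-test/pressure.bin and the chunk/file sizes are assumed placeholders, not values from the incidents above.

/*
 * Sketch: generate sustained write pressure on the NVMe filesystem.
 * Path and sizes are placeholders; adjust to the system under test.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK     (4L * 1024 * 1024)      /* 4 MiB per write() */
#define FILE_SIZE (512L * 1024 * 1024)    /* rewrite a 512 MiB file */

int main(void)
{
    static char buf[CHUNK];
    memset(buf, 0xA5, sizeof(buf));

    for (;;) {
        int fd = open("/mnt/nvme-test/pressure.bin",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }
        for (long done = 0; done < FILE_SIZE; done += CHUNK) {
            if (write(fd, buf, CHUNK) != CHUNK) {
                perror("write");
                close(fd);
                return EXIT_FAILURE;
            }
        }
        fsync(fd);                        /* push data to the device */
        close(fd);
    }
}

Running this loop while Homematic RF traffic is active would approximate the high-I/O-plus-driver-activity combination under which the freezes were observed.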

System information

  • Platform: Raspberry Pi 5 (ARM64 / aarch64)
  • Kernel: 6.6.x+rpt-rpi
  • OS: Raspberry Pi OS 64-bit
  • Root filesystem on NVMe (Pineboards AI Bundle M-Key)
  • Docker engine (dockerd / containerd)
  • Homematic RF stick connected directly by USB
  • Other RF devices on a powered USB hub

Full journal logs around the freeze events are attached and show consistent patterns across multiple incidents:

  • dmesg_after_reboot.log
  • journal_prevboot.log
  • kernel.log
  • lastlog.txt
  • lsmod.log
  • wtmp.txt

These logs consistently contain the repeating uart/eq3loop buffer-exhaustion messages preceding the journal overruns and the subsequent system stall.

Additional context

  • USB power supply issues were ruled out by testing with both passive and active hubs
  • No undervoltage or CPU throttling flags detected via firmware telemetry
  • The system is normally stable for extended periods (weeks–months) between incidents
  • The issue does not appear tied to specific userland containers or services alone

Additional observations

I have observed two closely related failure modes:

  1. A complete system hang requiring a hard power cycle
  2. A partial failure where Homematic commands are delayed by ~15–30 seconds, followed by execution of multiple queued commands almost simultaneously

In the second case, the system remains partially responsive for a short time
(SSH sometimes still works, non-Homematic services continue to operate),
but the Homematic communication path appears stalled.

In at least one incident, this delayed state escalated into an automatic reboot
rather than a permanent freeze. This suggests a kernel-level failure that
progresses over time rather than an immediate hard lockup.

Shortly before both types of failures, kernel logs show sustained flooding of:

raw-uart: generic_raw_uart_handle_rx_char(): rx fifo full
eq3loop: not enough space in the buffers

followed by /dev/kmsg buffer overruns and journald watchdog events.

This suggests that the delayed Homematic behavior may be an early warning sign
of the same kernel-level issue that later results in a full system hang or reboot. A small monitoring sketch for catching the flood early follows below.
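A userspace watcher along the following lines could test that hypothesis by flagging the flood before journald is overwhelmed. It is a sketch under stated assumptions: the 50-messages-per-second threshold and the matched substring are guesses, while the /dev/kmsg semantics (one record per read(), EPIPE when old records are overwritten, SEEK_END to skip history) follow the kernel's dev-kmsg interface.

/*
 * Sketch: watch /dev/kmsg and report when "rx fifo full" messages
 * exceed an assumed per-second threshold. Needs permission to read
 * /dev/kmsg (typically root).
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define THRESHOLD 50   /* assumed: matches/second that count as a flood */

int main(void)
{
    int fd = open("/dev/kmsg", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/kmsg");
        return 1;
    }
    lseek(fd, 0, SEEK_END);          /* skip old records, watch new ones */

    char rec[8192];
    time_t window = time(NULL);
    unsigned count = 0;

    for (;;) {
        ssize_t n = read(fd, rec, sizeof(rec) - 1);  /* one record per read */
        if (n < 0) {
            if (errno == EPIPE)      /* records overwritten; keep going */
                continue;
            perror("read");
            return 1;
        }
        rec[n] = '\0';

        if (strstr(rec, "rx fifo full"))
            count++;

        /* The window only advances when records arrive; fine for a sketch. */
        time_t now = time(NULL);
        if (now != window) {
            if (count >= THRESHOLD)
                fprintf(stderr, "kmsg flood: %u 'rx fifo full' msgs/s\n",
                        count);
            count = 0;
            window = now;
        }
    }
}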
