Description
On a Raspberry Pi 5, the system can become fully unresponsive under certain workload conditions.
When this happens:
- SSH becomes unreachable
- most userland processes and containers stop responding
- openHAB may still update some internal state
- Homematic device communication ceases entirely
- only a hard power-cycle restores operation
The behavior does not resemble an immediate kernel panic, out-of-memory kill, or userspace crash.
Instead, it is preceded by repeated kernel log messages that appear to originate from a UART/driver path, followed by systemd-journald warnings and an eventual stall.
Observed Behavior
In every recorded instance of this issue, shortly before the system becomes unresponsive, the kernel log shows a repeating pattern similar to:
raw-uart raw-uart1: generic_raw_uart_handle_rx_char(): rx fifo full
eq3loop: eq3loop_write_master() mmd_hmip: not enough space in the buffers
eq3loop: eq3loop_write_master() return error
This is followed by systemd-journald messages indicating buffer overruns and watchdog timeouts:
systemd-journald: /dev/kmsg buffer overrun, some messages lost
systemd-journald.service: Watchdog timeout
systemd-journald.service: Killing process systemd-journal
After these messages, thousands of kernel messages are lost, journald is restarted, and the system eventually becomes largely unresponsive.
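For context on why this pattern can escalate, here is a minimal, simplified sketch (not the actual raw-uart/eq3loop source; the struct, function names, and fifo size are illustrative assumptions) of the kind of rx path that produces such a flood: if every dropped byte emits an unthrottled kernel error line, sustained RF traffic while the downstream consumer (eq3loop/mmd_hmip) is stalled can saturate /dev/kmsg and starve journald.

```c
#include <linux/device.h>
#include <linux/kfifo.h>

/* Illustrative only: not the real driver structures or symbol names. */
struct raw_uart_port {
	struct device *dev;
	DECLARE_KFIFO(rx_fifo, unsigned char, 256);	/* size is an assumption */
};

static void handle_rx_char(struct raw_uart_port *port, unsigned char c)
{
	if (kfifo_is_full(&port->rx_fifo)) {
		/*
		 * One error line per dropped byte: under sustained
		 * back-pressure this floods the kernel log ring buffer.
		 */
		dev_err(port->dev, "handle_rx_char(): rx fifo full\n");
		return;
	}
	kfifo_put(&port->rx_fifo, c);
}
```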
Expected Behavior
Under heavy load, or in the presence of occasional driver buffer exhaustion, the system should remain responsive and recover gracefully. Specifically:
- driver buffer exhaustion should be handled without log flooding
- kernel log streams should be rate limited to avoid overwhelming the logging subsystem (see the sketch after this list)
- unrelated kernel subsystems (network, scheduling, file systems) should not be impacted by issues in a single hardware driver path
- the system should remain responsive even under high I/O pressure
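As a concrete illustration of the rate-limiting expectation, the same condition could be reported through the kernel's built-in ratelimiting helpers, e.g. dev_err_ratelimited() (or printk_ratelimited()), which cap the message rate instead of emitting one line per dropped byte. This is again a hedged sketch reusing the illustrative struct from above, not a patch against the actual driver:

```c
#include <linux/device.h>
#include <linux/kfifo.h>

static void handle_rx_char_ratelimited(struct raw_uart_port *port,
				       unsigned char c)
{
	if (kfifo_is_full(&port->rx_fifo)) {
		/*
		 * Same information, but the kernel's ratelimit machinery
		 * suppresses repeats, so a stalled consumer cannot flood
		 * /dev/kmsg and trip journald's watchdog.
		 */
		dev_err_ratelimited(port->dev,
				    "handle_rx_char(): rx fifo full, dropping\n");
		return;
	}
	kfifo_put(&port->rx_fifo, c);
}
```

Whether ratelimiting alone would be sufficient, or whether the underlying stall in the eq3loop consumer also needs to be addressed, I cannot tell from the logs.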
Workloads Observed
The issue has been observed under the following conditions:
- Raspberry Pi 5 with NVMe root filesystem
- RaspberryMatic running inside Docker
- Other containers including openHAB, InfluxDB, Frigate
- NVMe subjected to high sustained I/O (e.g., video timeline scrub, large historical data queries)
- RF traffic concurrently active via a USB-connected Homematic stick
The problem is rare and not reliably reproducible; it occurs under combinations of high I/O and sustained driver activity.
System information
- Platform: Raspberry Pi 5 (ARM64 / aarch64)
- Kernel: 6.6.x+rpt-rpi
- OS: Raspberry Pi OS 64-bit
- Root filesystem on NVMe (Pineboards AI Bundle M-Key)
- Docker engine (dockerd / containerd)
- Homematic RF stick connected directly by USB
- Other RF devices on a powered USB hub
Full journal logs around the freeze events are attached and show consistent patterns across multiple incidents:
dmesg_after_reboot.log
journal_prevboot.log
kernel.log
lastlog.txt
lsmod.log
wtmp.txt
These logs consistently contain the repeating UART/eq3loop buffer exhaustion messages preceding the journal overruns and subsequent system stall.
Additional context
- USB power supply issues have been ruled out by testing with both passive and active (powered) hubs
- No undervoltage or CPU throttling flags were reported by firmware telemetry
- The system is normally stable for extended periods (weeks–months) between incidents
- The issue does not appear to be tied to any specific userland container or service alone
Additional observations
I have observed two closely related failure modes:
- A complete system hang requiring a hard power cycle
- A partial failure where Homematic commands are delayed by ~15–30 seconds, followed by execution of multiple queued commands almost simultaneously
In the second case, the system remains partially responsive for a short time
(SSH sometimes still works, non-Homematic services continue to operate),
but the Homematic communication path appears stalled.
In at least one incident, this delayed state escalated into an automatic reboot
rather than a permanent freeze. This suggests a kernel-level failure that
progresses over time rather than an immediate hard lockup.
Shortly before both types of failures, kernel logs show sustained flooding of:
raw-uart: generic_raw_uart_handle_rx_char(): rx fifo full
eq3loop: not enough space in the buffers
followed by /dev/kmsg buffer overruns and journald watchdog events.
This indicates that the delayed Homematic behavior may be an early warning sign
of the same kernel-level issue that later results in a full system hang or reboot.