plat-qcom: Add diagnostic logging support by zelvam95 · Pull Request #7 · qualcomm-linux/optee_os

zelvam95 · 2026-04-04T17:20:23Z

This patch series introduces a diagnostic ring buffer driver for Qualcomm
platforms, enabling persistent early boot logging that survives warm resets
and facilitates post-mortem analysis of system failures.

The implementation is structured as two patches:

Core driver with self-contained memory management
Platform enablement for Kodiak and Lemans with optimized memory layout

Key capabilities:

Persistent circular buffer in IMEM survives warm resets
Automatic overflow handling with wrap counter
Download mode detection preserves crash diagnostics
Self-contained driver with automatic memory registration
Seamless OP-TEE trace subsystem integration

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)
Lemans: Buffer at 0x146AF000 (within 200KB IMEM)
Common: Metadata at 0x14680720, TCSR at 0x1FD3000

This implementation enables comprehensive system debugging by providing
persistent logs accessible to bootloaders and diagnostic tools across
system resets.

b49020 · 2026-04-06T08:57:28Z

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)

Lemans: Buffer at 0x146AF000 (within 200KB IMEM)

Common: Metadata at 0x14680720, TCSR at 0x1FD3000

@zelvam95 how is this allocation done on IMEM for the diag buffer? 0x14680000 is actually the load address for TF-A BL2/ U-Boot SPL on Lemans platform as TZ image. I believe there is overlap going on here.

zelvam95 · 2026-04-06T08:59:29Z

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)

Lemans: Buffer at 0x146AF000 (within 200KB IMEM)

Common: Metadata at 0x14680720, TCSR at 0x1FD3000

@zelvam95 how is this allocation done on IMEM for the diag buffer? 0x14680000 is actually the load address for TF-A BL2/ U-Boot SPL on Lemans platform as TZ image. I believe there is overlap going on here.

Hi @b49020,

I referred to QTEE Code & in QTEE, this region is being used for DIAG Buffer. Its reserved/marked down for that purpose. Have we changed the layout/do we have a separate layout for our current open source boot loader stack?

b49020 · 2026-04-06T09:13:19Z

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)

Lemans: Buffer at 0x146AF000 (within 200KB IMEM)

Common: Metadata at 0x14680720, TCSR at 0x1FD3000

@zelvam95 how is this allocation done on IMEM for the diag buffer? 0x14680000 is actually the load address for TF-A BL2/ U-Boot SPL on Lemans platform as TZ image. I believe there is overlap going on here.

Hi @b49020,

I referred to QTEE Code & in QTEE, this region is being used for DIAG Buffer. Its reserved/marked down for that purpose. Have we changed the layout/do we have a separate layout for our current open source boot loader stack?

This DIAG buffer is internal to QTEE memory layout where it is put on IMEM I think. How does the crashdump flow gets to know this address? Is it hard coded in the tooling somewhere for each SoC?

zelvam95 · 2026-04-06T09:41:27Z

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)

Lemans: Buffer at 0x146AF000 (within 200KB IMEM)

Common: Metadata at 0x14680720, TCSR at 0x1FD3000

@zelvam95 how is this allocation done on IMEM for the diag buffer? 0x14680000 is actually the load address for TF-A BL2/ U-Boot SPL on Lemans platform as TZ image. I believe there is overlap going on here.

Hi @b49020,
I referred to QTEE Code & in QTEE, this region is being used for DIAG Buffer. Its reserved/marked down for that purpose. Have we changed the layout/do we have a separate layout for our current open source boot loader stack?

This DIAG buffer is internal to QTEE memory layout where it is put on IMEM I think. How does the crashdump flow gets to know this address? Is it hard coded in the tooling somewhere for each SoC?

Its not hard coded; Its rather retrieved from another IMEM address. For QTEE DIAG Log, they write the DIAG Buffer start address at offset 0x720 & crashscope/tools read that offset, understand where the DIAG Log is present and then does the parsing.

b49020 · 2026-04-06T09:59:03Z

Memory layout:

Kodiak: Buffer at 0x14696000 (within 100KB IMEM)

Lemans: Buffer at 0x146AF000 (within 200KB IMEM)

Common: Metadata at 0x14680720, TCSR at 0x1FD3000

@zelvam95 how is this allocation done on IMEM for the diag buffer? 0x14680000 is actually the load address for TF-A BL2/ U-Boot SPL on Lemans platform as TZ image. I believe there is overlap going on here.

Hi @b49020,
I referred to QTEE Code & in QTEE, this region is being used for DIAG Buffer. Its reserved/marked down for that purpose. Have we changed the layout/do we have a separate layout for our current open source boot loader stack?

This DIAG buffer is internal to QTEE memory layout where it is put on IMEM I think. How does the crashdump flow gets to know this address? Is it hard coded in the tooling somewhere for each SoC?

Its not hard coded; Its rather retrieved from another IMEM address. For QTEE DIAG Log, they write the DIAG Buffer start address at offset 0x720 & crashscope/tools read that offset, understand where the DIAG Log is present and then does the parsing.

Does that mean whole of IMEM is reserved for the crashdump tooling? Isn't XBL using any of the IMEM regions too? I think the memory layout has to be studied further. Would you be able to put the memory layout on a confluence page such that we can have further alignment?

zelvam95 · 2026-04-07T18:51:29Z

Kodiak:

100K starting 0x14680000

Start Address Page Size Region
14680000 24K Monitor Code/Data
14686000 20K Reserved
1468B000 40K - 64 bytes QSEE Code/Data
14694FBC 0 bytes Dummy cookie to check overlap. New cookies will be added below this cookie.
14694FC0 4 bytes Device Storage Type
14694FC4 48 bytes Measured Boot Info
14694FF4 4 bytes Sec Mem Info
14694FF8 8 bytes STACK_CANARY_INFO
14695000 4K OEM_IMEM
14696000 12K TZ Diag

Lemans:

200K starting 0x14680000

Start Address Page Size Region
14680000 24K Monitor Code/Data
14686000 36K Reserved
146A8000 24K - 20 bytes QSEE Code/Data
146ADFEC 4 bytes Dummy cookie to check overlay
146ADFF0 4 bytes Device Storage Type
146ADFF4 4 bytes Sec Mem Info
146ADFF8 8 bytes STACK_CANARY_INFO
146AE000 4K OEM_IMEM
146AF000 12K TZ Diag

zelvam95 · 2026-04-07T18:55:36Z

Hi @b49020,

This info is documented and available internally already. I have shared high level split above. The OCIMEM is split across different sub systems for different use-cases as required. We can review this and align on this.

b49020 · 2026-04-08T05:37:37Z

@zelvam95 thanks for gathering that info, I have updated TF-A BL2 for Lemans to load and execute from pIMEM instead here: qualcomm-linux/trusted-firmware-a#15. Feel free to test your changes with that.

b49020 · 2026-04-08T05:38:54Z

Also, TF-A BL2 on Kodiak was already executing from pIMEM as you can see here: https://github.com/qualcomm-linux/trusted-firmware-a/blob/qcom-next/plat/qti/hoya/kodiak/rb3gen2/inc/platform_def.h#L15

core/drivers/qcom_diag_log.c

ldts · 2026-04-10T16:04:59Z

core/drivers/qcom_diag_log.c

+	return (reg_value == DLOAD_MAGIC_COOKIE);
+}
+
+void qcom_diag_log_init(void)


should this function return an error code?

Even if we return error code, all we can do is log it in calling place... I think since we're already logging here during successful init + magic is updated to indicate any initialization failure which can be used by tooling/HLOS drivers as required, so I think we needn't necessarily return an error here.

Let me know if you feel its better to return for any reason? & we can plan accordingly;

yes, at least do generate a WARN on diag being configured but not available.
Tracing errors is important on secure platforms.

core/arch/arm/plat-qcom/main.c

core/arch/arm/plat-qcom/platform_config.h

core/drivers/qcom_diag_log.c

core/include/drivers/qcom_diag_log.h

ldts · 2026-04-11T08:12:39Z

core/drivers/qcom_diag_log.c

+#define DLOAD_MAGIC_COOKIE	0x10       /* Download mode detection value */
+
+/*
+ * struct diag - Main diagnostic region structure


no need for these comments. IMO we should use standard semantics for the circular buffer instead of these definitions.

Would you mind using the usual head, tail? maybe check how linux implements a simple one (include/linux/circ_buf.h)

struct diag { uint32_t magic; uint32_t ring_len; /* must be power of 2 */ uint32_t head; /* write index, monotonically increasing */ uint32_t tail; /* read index, monotonically increasing */ uint32_t overrun; struct mutex lock; uint8_t log_buf[]; };

could we also provide a reader or it makes no sense?

Otherwise, if we are not going to provide a reader lets not call them overruns and we dont need the tail...they are just wraps I think?

for (p = str; *p; p++) { head = diag->head++; if (head && !(head & (diag->ring_len - 1))) diag->wrap++; diag->log_buf[head & (diag->ring_len - 1)] = *p; } dsb(); mutex_unlock(&diag->lock);

Reader not required. version was suggested during wider discussion with SSG Team.

This log driver would be used across arch + there will be tooling built around this. We don't need mutex around this. Mutex is there in the calling place for diag_log_putc (plat_trace_ext_puts). Similarly tail is not required.

ring_offset & ring_len also have a purpose & it will be used by HLOS/tooling infra built around this. We can potentially introduce more variables in the future version before log_buf & ring_offset would print to exact start of log_buf.

Also, ring_len gives out the log buf size which can be used by tooling infra.

This struct was aligned on based on discussion with SSG Team & we plan to re-use same infra/structure across TF-A & OP-TEE. I got a comment to make both version & magic as uint32_t which I will be updating.

I dont understand your points

the diag region is global memory. every write access to it must be protected by a mutex. the mutex must belong to the structure.

the identifiers/names in the structure seem to come from a school primer on computer science. we can/should do better. if there is no reader this is not really a ring buffer, it is just a circular buffer (at least for me).

These sort of definitions seem more appropriate to me (layout may vary)

struct circ_wo_buf { uint8_t *buf; size_t size; unsigned int head; unsigned int wrap; struct mutex lock; };

struct diag_buf { uint16_t version; uint 16_t magic struct circ_wo_buf wo_cbuf; /*maybe also *wo_cbuf in case address varies with version */ }

Hi Jorge,

Reg. variable naming/structure definition:

I was clarifying on the layout as to why its required and about the tooling requirements downstream for retaining a particular layout.

Also, the current struct variables like offset, len etc. namings are already used in BCM's implementation which is available upstream. Its a local variable there & I have made it as part of the header, thats the main diff between QCOM vs BCM's impl..

I feel different persons might have different perspective/preferences in this. Do you feel the BCM's impl is not upto the mark and their naming is also like from a school primer? I used that as reference since thats the only implementation available upstream which uses this plat_trace_ext_puts.

Anyways, I am open to make minor modifications as suggested above with nested structure + slight name changes..

Reg. mutex - Your concern about global memory access is valid in general, but in this specific case:

The trace infrastructure already provides serialization: All calls to qcom_diag_log_puts() go through trace_ext_puts() which ensures serialized access.

Single controlled entry point: Unlike general-purpose drivers, this has a single entry point through the trace infrastructure, which acts as the serialization gatekeeper.

This follows the established pattern in OP-TEE:

BCM's elog implementation (plat-bcm/bcm_elog.c) uses the same approach: global memory buffer, no mutex, relies on trace infrastructure

Note: Would request you to please check the trace_ext_puts function -> where plat_trace_ext_puts is called & also check the BCM's implementation of log driver where a similar diag log impl is already available and upstreamed.

struct circ_wo_buf { unsigned int buf_off; size_t size; unsigned int head; unsigned int wrap; uint8_t buf[]; };

struct diag_buf { uint32_t version; uint32_t magic; struct circ_wo_buf wo_cbuf; };

Planning to update above way now. We're using "flexible array" at the end & so can't update as uint8_t* buf at the start. buf_off is to give the "offset" to buf[]. In future, even if we add new fields, the external toolings etc would read the buf_off to identify buf[] start and so wouldn't have issues.

Let me know what you think of this struct + mutex requirements.

WRT to the lock, yes, drop it from the circular buffer definition - I didnt notice we were using the trace infrastructure (I should have clone the repo and read it)

WRT to the names, yes, do use head for semantics and define a circular buffer explicitly.
These are not slight name changes - these are software semantics that make the code readable - the distance between the word "offset" or "head" is similar to the difference between an atom and a horse.

I am not part of the SSG discussions on the structure defintions, but do not use a flat layout - use abstractions...ie see below for another example....

struct diag_hdr { uint16_t version; uint16_t magic; }; struct circ_buf_conf { uint32_t base; uint32_t size; }; struct circ_wo_buf { uint32_t wrap; uint32_t head; uint8_t buf[]; }; struct diag { struct diag_hdr hdr; struct circ_buf_conf conf; struct circ_wo_buf wo_cbuf; };

Yes, I am using trace infra. Mutex part is clear now :).

Reg. structure variable naming:
I used "BCMs" implementation of the log driver as reference for this & in their implementation if you check, you'd notice they are using "offset" instead of "head" for a very similar log driver that they have. I assumed what's already upstream available can be used as a standard for reference but probably not all changes upstream follow these software semantics. I mentioned them as slight name changes because otherwise if it was considered as part of sw semantics, then they should have knocked down the BCMs impl when it was posted for review?

Note - I am just trying to highlight the point of reference - that it was already avail upstream.

Anyways, You can check the latest patch where I have modified to using head instead of offset. I will incrementally do changes as suggested above with abstractions & post same for review.

yeah, don't rely on what's upstream :) as you know (better than I do!) we have many platforms to upstream so the better our abstractions (the more indirections) the easier will be to extend and maintain.

Add diagnostic ring buffer driver for Qualcomm platforms to enable persistent logging across system resets and crashes. The driver provides a circular buffer in reserved memory that persists across warm resets, allowing post-mortem analysis of system state before crashes or unexpected resets. Key features: - Circular ring buffer with wraparound tracking - Download mode detection to preserve logs during crash dumps Signed-off-by: Selvam Sathappan Periakaruppan <speriaka@qti.qualcomm.com>

Platform-specific target headers with memory configurations for Kodiak (0x14696000) and Lemans (0x146AF000) Signed-off-by: Selvam Sathappan Periakaruppan <speriaka@qti.qualcomm.com>

Enable the diagnostic ring buffer driver for Kodiak and Lemans platforms. Integration points: - plat_trace_ext_puts(): Routes trace output to diagnostic buffer - plat_console_init(): Initializes diagnostic buffer before UART console - CFG_QCOM_DIAG_LOG: Enabled in debug builds (CFG_TEE_CORE_DEBUG=y) The diagnostic buffer is initialized early in the boot process to capture logs from the earliest stages, providing valuable debugging information for system crashes and unexpected resets. Signed-off-by: Selvam Sathappan Periakaruppan <speriaka@qti.qualcomm.com>

ldts · 2026-04-13T07:47:02Z

core/drivers/qcom/diag/diag_log.c

+
+	for (p = str; *p; p++) {
+		diag->wo_cbuf.buf[diag->wo_cbuf.head++] = *p;
+		if (diag->wo_cbuf.head >= diag->wo_cbuf.size) {


it is safer to mask rather than this sort of checks to avoid corrupting memory on glitches.

we can use this sort of construct as long as we can guarantee the size of the buffer is ^2 I think. then there is zero risk or running over the memory

for (p = str; *p; p++) { head = diag->head++; if (head && !(head & (diag->wo_cbuf.size - 1))) diag->wrap++; diag->log_buf[head & (diag->wo_cbuf.size - 1)] = *p; }

Unfortunately, we cannot guarantee power-of-2 as size for these buffers. There are many existing platforms where this buffer size is not power of 2. Also, We have some header which we have to reduce from DIAG SIZE. So, lets say even if DIAG Size overall is 8KB (power of 2). If we subtract the header. Then the actual circ buffer size would become 4KB.

In our current case, the DIAG size is 12KB
Wasted space: Rounds down to power-of-2 (e.g., 12KB → 8KB usable)
~33% loss: For 12KB DIAG_SIZE, loses 4KB

For 8KB diag log we'd lose 50% and so on.

This region is ideally expected to be access protected (write access is restricted to AP-S). So, would it be safe to assume SEL1/SEL3 will not corrupt this? Or at least the risk is comparatively low? (Though ideally it is possible in some cases?)

um, ok...it is a shame that the buffers cant be a power of 2.

ldts · 2026-04-13T08:29:32Z

Please remove the driver initialization from main.c - just initialize it in the driver itself (driver_init, early_init or whatever is appropiate).
main should just implement plat_trace_ext_puts.

then qcom_diag_log_puts should keep a static pointer to a valid diag. And we shouldnt need to check for the magic every time...

zelvam95 · 2026-04-13T10:25:44Z

Please remove the driver initialization from main.c - just initialize it in the driver itself (driver_init, early_init or whatever is appropiate). main should just implement plat_trace_ext_puts.

I would ideally prefer if this diag log is available as early as possible & I assumed that it'd be good to have this at same point of console init. Even BCM's implementation which is upstream, follows exact same approach where its init is done from main.c.

In PRODUCTION -> Console logs would be disabled & DIAG Logs could be our only way for debug & ideally it'd be better to have this enabled as early as possible to capture everything?

then qcom_diag_log_puts should keep a static pointer to a valid diag. And we shouldnt need to check for the magic every time...

This part of the comment I will check further & try to handle this.

ldts · 2026-04-13T11:29:06Z

Please remove the driver initialization from main.c - just initialize it in the driver itself (driver_init, early_init or whatever is appropiate). main should just implement plat_trace_ext_puts.

I would ideally prefer if this diag log is available as early as possible & I assumed that it'd be good to have this at same point of console init. Even BCM's implementation which is upstream, follows exact same approach where its init is done from main.c.

In PRODUCTION -> Console logs would be disabled & DIAG Logs could be our only way for debug & ideally it'd be better to have this enabled as early as possible to capture everything?

then qcom_diag_log_puts should keep a static pointer to a valid diag. And we shouldnt need to check for the magic every time...

This part of the comment I will check further & try to handle this.

so why a driver? why not just a log feature...drivers (except the console) typically have the syscalls to decide when to load.

@b49020 also do we need to integrate the tracing feature with the FFA console? I dont see any hooks there...do you know why?

b49020 · 2026-04-13T12:02:25Z

@b49020 also do we need to integrate the tracing feature with the FFA console?

There isn't anything like FF-A console since there isn't any ABI requirement with TF-A there. But as you said for this diag feature, let's add support in the common platform code rather than a driver since it's not really running in a driver context but all the OP-TEE contexts.

ldts · 2026-04-13T12:07:15Z

@b49020 also do we need to integrate the tracing feature with the FFA console?

There isn't anything like FF-A console since there isn't any ABI requirement with TF-A there. But as you said for this diag feature, let's add support in the common platform code rather than a driver since it's not really running in a driver context but all the OP-TEE contexts.

there is exactly an FF-A console (as an alternative to the console or the semihosting console)
OP-TEE/optee_os@d4a8769

none provide the hooks to trace.

b49020 · 2026-04-13T12:21:48Z

there is exactly an FF-A console (as an alternative to the console or the semihosting console)

Ah I see, that's news to me. Not sure how that FF-A console manages sync among different SPs as each can clobber other console logs. Maybe there is SP identifier.

Anyhow, let's not tie this with FF-A since this feature is expected to work on platforms which don't support FF-A too. BTW, this is related to Qualcomm crashdump tooling as well, so we need to support diagnostic logs that the tooling understands during a crash.

Also, thinking about it more the driver vs platform code. I think similar to UART drivers, let's keep the diag as a special driver code.

b49020 · 2026-04-13T12:23:02Z

core/drivers/qcom/diag/kodiak/diag_target.h

+#ifndef DIAG_TARGET_H
+#define DIAG_TARGET_H
+
+#define IMEM_BASE		UL(0x14680000)


Let's move this macro to platform specific config file instead.

b49020 · 2026-04-13T12:24:25Z

core/drivers/qcom/diag/kodiak/diag_target.h

+
+#define TCSR_BOOT_MISC_DETECT	UL(0x1FD3000)
+
+#define DIAG_BASE		UL(0x14696000)


Let's move this macro to platform specific config file too. After that all other macros can be moved to common header and avoid any duplication.

ldts · 2026-04-13T12:41:50Z

Also, thinking about it more the driver vs platform code. I think similar to UART drivers, let's keep the diag as a special driver code.

I dont think it should be a driver...it is just writing raw to a qcom specific memory region (no protocols, no registers, no locking, nothing...just init it early and it is good to go.)

zelvam95 · 2026-04-13T16:13:20Z

Also, thinking about it more the driver vs platform code. I think similar to UART drivers, let's keep the diag as a special driver code.

I dont think it should be a driver...it is just writing raw to a qcom specific memory region (no protocols, no registers, no locking, nothing...just init it early and it is good to go.)

I originally thought its similar to UART driver, and kept it here; But as Jorge pointed out, it doesn't really have any registers + no locking/etc. as well & the current logging drivers avail in upstream also has put it under plat-*/ and not drivers/... So I think moving it there makes sense & planning to do that change.

@b49020, Hope thats fine. If incase there are any stronger reasons to put it under driver, please do let us know so we can plan accordingly;

ldts · 2026-04-14T07:34:53Z

just for future reference: OP-TEE/optee_os#7775

zelvam95 force-pushed the feature/optee-diag-ring-buffer branch from cd68754 to 995ec2b Compare April 4, 2026 17:36

b49020 requested review from b49020 and ldts April 6, 2026 07:16

b49020 force-pushed the qcom-next branch from b8aa854 to 8038358 Compare April 10, 2026 08:15