
Deny Idle entry of CPU and CPU domain when it already has IPI pending #444

Open
smankad-oss wants to merge 3 commits into qualcomm-linux:qcom-6.18.y from smankad-oss:qcom-6.18.y


Conversation


@smankad-oss smankad-oss commented Apr 9, 2026

Introduce a helper function to check for pending IPIs on a CPU.

When governors used during cpuidle try to find the most optimal idle state
for a CPU or a group of CPUs, they are known to fail quite often. One
reason is that they do not take into account whether an IPI has been
scheduled for any of the CPUs affected by the selected idle state.

A CPU can receive an IPI from another CPU while it is executing
cpuidle_select() or is about to execute it. The selection does not account
for pending interrupts, so the CPU may go on to enter the selected idle state
only to exit immediately.
Make use of the helper function in cpuidle to deny idle entry when an IPI is already pending.

CRs-Fixed: 4374430

Link to CPUIdle change: https://lore.kernel.org/r/20260403-cpuidle_ipi-v2-1-b3e44b032e2c@oss.qualcomm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Maulik Shah <maulik.shah@oss.qualcomm.com>
Signed-off-by: Sneh Mankad <sneh.mankad@oss.qualcomm.com>

@smankad-oss smankad-oss requested a review from a team April 9, 2026 04:32

@shashim-quic shashim-quic left a comment


Link: tag missing in the first 2 commits. Please add them to establish a clear source.

@smankad-oss
Author

> Link: tag missing in the first 2 commits. Please add them to establish a clear source.

Added the links, Shiraz.

shashim-quic previously approved these changes Apr 10, 2026
@shashim-quic

> Link: tag missing in the first 2 commits. Please add them to establish a clear source.
>
> Added the links, Shiraz.

Also check why qcom-6.18.y-check is failing. Make sure you have associated the appropriate changes with the mainline component.
This is to enforce the mainline-first policy; otherwise, your PR won't be picked for merge.

storulf and others added 3 commits April 10, 2026 10:58
When governors used during cpuidle try to find the most optimal idle state
for a CPU or a group of CPUs, they are known to fail quite often. One
reason is that they do not take into account whether an IPI has been
scheduled for any of the CPUs affected by the selected idle state.

To enable pending IPIs to be taken into account for cpuidle decisions,
introduce a new helper function, cpus_peek_for_pending_ipi().
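The helper's contract, as described here, is to report whether any CPU in a given set has an IPI pending; the word "peek" suggests it only inspects that state without handling anything. The following is a minimal userspace model of that contract, not the kernel code. It assumes a CPU mask is a 64-bit bitmap and that each CPU's pending IPIs are tracked as a plain counter; `pending_ipis[]`, `raise_ipi_model()` and the `_model` suffix are hypothetical, and the real helper's signature and internals may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8 /* hypothetical: number of CPUs in this model */

/* Hypothetical stand-in for per-CPU pending IPI state: number of raised
 * IPIs that have not yet been handled on each CPU. */
static unsigned int pending_ipis[NR_CPUS_MODEL];

/* Model of cpus_peek_for_pending_ipi(): return true if any CPU set in
 * 'mask' has at least one pending IPI. It only reads the state; nothing
 * is consumed or acknowledged. */
static bool cpus_peek_for_pending_ipi_model(uint64_t mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if ((mask & (UINT64_C(1) << cpu)) && pending_ipis[cpu] > 0)
			return true;
	}
	return false;
}

/* Hypothetical helper for the model: some CPU raises an IPI at 'cpu'. */
static void raise_ipi_model(unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL);
	pending_ipis[cpu]++;
}
```

For example, after `raise_ipi_model(7)`, a peek at CPUs 0-3 (`0x0F`) still reports nothing pending, while a peek at CPUs 4-7 (`0xF0`) reports a pending IPI. Taking a mask rather than a single CPU is what lets the same check serve both a single CPU and a whole cluster.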

Link: https://lore.kernel.org/all/20251105095415.17269-2-ulf.hansson@linaro.org/
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sneh Mankad <sneh.mankad@oss.qualcomm.com>
… IPIs

When the genpd governor for CPUs tries to select the most optimal idle
state for a group of CPUs managed in a PM domain, it fails far too often.

On a Dragonboard 410c, which is an arm64 based platform with 4 CPUs in one
cluster that is using PSCI OS-initiated mode, we can observe that we often
fail when trying to enter the selected idle state. This is certainly a
suboptimal behaviour that leads to many unnecessary requests being sent to
the PSCI FW.

A simple dd operation that reads from the eMMC, generating some IRQs and
I/O handling, helps us understand the problem while we monitor the rejected
counters in debugfs for the corresponding idle states of the genpd in
question.

Menu governor:
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             1451           437        91         149        0
S1             65194          558        149        172        0
dd if=/dev/mmcblk0 of=/dev/null bs=1M count=500
524288000 bytes (500.0MB) copied, 3.562698 seconds, 140.3MB/s
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             2694           1073       265        892        1
S1             74567          829        561        790        0

The dd completed in ~3.6 seconds and the rejected count increased by 586.
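The reported increase is the sum of the per-state Rejected deltas between the two debugfs snapshots. For the menu governor run above: (265 - 91) + (561 - 149) = 174 + 412 = 586. A small C helper, hypothetical and only for illustration, that performs the same computation:

```c
#include <assert.h>
#include <stddef.h>

/* Sum of (after - before) over all idle states' Rejected counters, i.e.
 * how many more idle state requests were rejected between two snapshots
 * of the genpd idle_states debugfs file. */
static unsigned long rejected_delta(const unsigned long *before,
				    const unsigned long *after,
				    size_t nr_states)
{
	unsigned long total = 0;

	for (size_t i = 0; i < nr_states; i++) {
		/* Counters only grow, so after >= before for each state. */
		assert(after[i] >= before[i]);
		total += after[i] - before[i];
	}
	return total;
}
```

Applied to the other snapshot pairs in this commit message, the same arithmetic gives 1916 for Teo before the change, and 10 (menu) and 4 (Teo) after it.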

Teo governor:
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             4976           2096       392        1721       2
S1             160661         1893       1309       1904       0
dd if=/dev/mmcblk0 of=/dev/null bs=1M count=500
524288000 bytes (500.0MB) copied, 3.543225 seconds, 141.1MB/s
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             5192           2194       433        1830       2
S1             167677         2891       3184       4729       0

The dd completed in ~3.5 seconds and the rejected count increased by 1916.

The main reason for the above problem is a pending IPI for one of the CPUs
affected by the idle state that the genpd governor selected, which causes
the PSCI FW to refuse to enter it. To improve the behaviour, let's take
pending IPIs for CPUs into account in the genpd governor, so that we fall
back to the shallower per-CPU idle state.
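The governor-side decision can be sketched as a simplified userspace model. This is not the genpd governor code: the domain is modelled as a bitmap of CPUs, `domain_state_allowed()` and `ipi_pending_mask` are hypothetical names, and `any_ipi_pending()` stands in for cpus_peek_for_pending_ipi(). The "all CPUs idle" condition is a modelling assumption included only to make the example complete. When the function returns false, the last CPU uses its own shallower per-CPU idle state instead of requesting the cluster state from the PSCI FW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmap of CPUs that currently have an unhandled IPI. */
static uint64_t ipi_pending_mask;

/* Stand-in for cpus_peek_for_pending_ipi() on a bitmap of CPUs. */
static bool any_ipi_pending(uint64_t cpus)
{
	return (ipi_pending_mask & cpus) != 0;
}

/*
 * Hypothetical model of the decision: the cluster (domain) idle state is
 * allowed only if every CPU in the domain is idle (modelling assumption)
 * and none of them has a pending IPI. Returning false means the CPU falls
 * back to its per-CPU idle state.
 */
static bool domain_state_allowed(uint64_t domain_cpus, uint64_t idle_cpus)
{
	/* Not all CPUs in the domain are idle: keep the domain powered. */
	if ((domain_cpus & idle_cpus) != domain_cpus)
		return false;

	/* A pending IPI would wake a CPU right away and the firmware would
	 * reject the request (the Rejected counter), so don't ask for it. */
	if (any_ipi_pending(domain_cpus))
		return false;

	return true;
}
```

With a 4-CPU cluster (`0x0F`) fully idle, the cluster state is allowed until an IPI is pending on any of those CPUs; an IPI pending on a CPU outside the domain does not affect the decision.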

Re-testing with this change shows significantly improved behaviour.

Menu governor:
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             2556           878        19         368        1
S1             69974          596        10         152        0
dd if=/dev/mmcblk0 of=/dev/null bs=1M count=500
524288000 bytes (500.0MB) copied, 3.522010 seconds, 142.0MB/s
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             3360           1320       28         819        1
S1             70168          710        11         267        0

The dd completed in ~3.5 seconds and the rejected count increased by 10.

Teo governor:
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             5145           1861       39         938        1
S1             188887         3117       51         1975       0
dd if=/dev/mmcblk0 of=/dev/null bs=1M count=500
524288000 bytes (500.0MB) copied, 3.653100 seconds, 136.9MB/s
cat /sys/kernel/debug/pm_genpd/power-domain-cluster/idle_states
State          Time Spent(ms) Usage      Rejected   Above      Below
S0             5260           1923       42         1002       1
S1             190849         4033       52         2892       0

The dd completed in ~3.7 seconds and the rejected count increased by 4.

Note that the rejected counters in genpd are also accumulated in the
rejected counters managed by cpuidle, though on a per-CPU idle state
basis. Comparing these counters before and after this change through
cpuidle's sysfs interface shows similar improvements.

Link: https://lore.kernel.org/all/20251105095415.17269-3-ulf.hansson@linaro.org/
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sneh Mankad <sneh.mankad@oss.qualcomm.com>
…t pending

A CPU can receive an IPI from another CPU while it is executing
cpuidle_select() or is about to execute it. The selection does not account
for pending interrupts, so the CPU may go on to enter the selected idle state
only to exit immediately.

Example trace collected when there is a cross-CPU IPI:

 [000] 154.892148: sched_waking: comm=sugov:4 pid=491 prio=-1 target_cpu=007
 [000] 154.892148: ipi_raise: target_mask=00000000,00000080 (Function call interrupts)
 [007] 154.892162: cpu_idle: state=2 cpu_id=7
 [007] 154.892208: cpu_idle: state=4294967295 cpu_id=7
 [007] 154.892211: irq_handler_entry: irq=2 name=IPI
 [007] 154.892211: ipi_entry: (Function call interrupts)
 [007] 154.892213: sched_wakeup: comm=sugov:4 pid=491 prio=-1 target_cpu=007
 [007] 154.892214: ipi_exit: (Function call interrupts)
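Reading the timestamps as microseconds makes the wasted entry concrete: CPU 7 enters idle state 2 at 154.892162, 14 µs after CPU 0 raised the IPI at 154.892148, and leaves it at 154.892208 (state=4294967295 is -1 printed as an unsigned 32-bit value, which the cpu_idle tracepoint uses to mark idle exit), only 46 µs later, to handle that IPI. The check below only re-does this arithmetic; the timestamps are copied from the trace and converted to integer microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps from the example trace, in microseconds (seconds * 1e6). */
static const uint64_t t_ipi_raise_us  = 154892148; /* [000] ipi_raise        */
static const uint64_t t_idle_enter_us = 154892162; /* [007] cpu_idle state=2 */
static const uint64_t t_idle_exit_us  = 154892208; /* [007] cpu_idle exit    */

/* How long CPU 7 stayed in the idle state it entered. */
static uint64_t idle_residency_us(void)
{
	return t_idle_exit_us - t_idle_enter_us;
}

/* How long before the idle entry the IPI towards CPU 7 had been raised. */
static uint64_t ipi_raised_before_entry_us(void)
{
	return t_idle_enter_us - t_ipi_raise_us;
}
```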

This impacts performance, and the idle state counters are incremented for entries that bring no benefit.

commit ccde652 ("smp: Introduce a helper function to check for pending
IPIs") already introduced a helper function to check for pending IPIs, and
it is used in the pmdomain governor to deny the cluster-level idle state
when there is a pending IPI on any of the cluster's CPUs.

This, however, does not stop a CPU from entering a CPU-level idle state. Make
use of the same helper in cpuidle to deny idle entry when an IPI is already pending.
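The per-CPU side can be sketched in the same userspace-model style. This is an illustration under stated assumptions, not the actual cpuidle patch: `cpuidle_try_enter_model()`, `select_state_model()`, the `-EBUSY` return value and the counters are hypothetical, and this PR page does not show exactly where the real check sits in the cpuidle path. The point is the ordering: after the governor has selected a state, peek for a pending IPI on this CPU and back out before entering, rather than entering and exiting immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8 /* hypothetical CPU count for the model */

/* Hypothetical per-CPU flag: an IPI has been raised but not yet handled. */
static bool ipi_pending[MODEL_NR_CPUS];

/* Hypothetical counters: idle entries performed and entries denied. */
static unsigned int idle_entries, idle_denied;

/* Stand-in for the governor's choice (cpuidle_select() in the kernel). */
static int select_state_model(void)
{
	return 2; /* e.g. the idle state index seen in the trace above */
}

/*
 * Model of the idle path with the new check: select a state, then refuse
 * to enter it if this CPU already has an IPI pending, because the IPI
 * would wake the CPU immediately anyway.
 * Returns the entered state index, or -EBUSY if entry was denied.
 */
static int cpuidle_try_enter_model(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	int state = select_state_model();

	if (ipi_pending[cpu]) {
		idle_denied++;
		return -EBUSY; /* return to handle the IPI instead */
	}

	idle_entries++;
	return state;
}
```

Checking after selection, rather than only before it, covers the case described above where the IPI arrives while cpuidle_select() is still running.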

With this change, glmark2 [1] off-screen scores improve by 25% to 30% on the
Qualcomm lemans-evk board, which is arm64-based and has two clusters of
4 CPUs each.

[1] https://github.com/glmark2/glmark2

Link: https://lore.kernel.org/r/20260403-cpuidle_ipi-v2-1-b3e44b032e2c@oss.qualcomm.com
Signed-off-by: Maulik Shah <maulik.shah@oss.qualcomm.com>
Signed-off-by: Sneh Mankad <sneh.mankad@oss.qualcomm.com>
@smankad-oss
Author

> Link: tag missing in the first 2 commits. Please add them to establish a clear source.
>
> Added the links, Shiraz.
>
> Also check why qcom-6.18.y-check is failing. Make sure you have associated the appropriate changes with the mainline component. This is to enforce the mainline-first policy; otherwise, your PR won't be picked for merge.

I believe it is due to the CR not being mentioned in the PR description. I have added it and will re-trigger the check. Thanks for pointing it out.
