[fix issue 1888] by adding host checks #1889

Open
Ashwin-Prabhakar wants to merge 1 commit into qualcomm-linux:master from Ashwin-Prabhakar:fix-cc1plus-RAM-exhausion-1888

Conversation

@Ashwin-Prabhakar

While building large recipes on a laptop that meets all the host requirements, some recipes fail with a cc1plus failure caused by RAM exhaustion. This change fixes that by explicitly setting pressure thresholds and a thread-count limit.

Comment thread ci/base.yml
host: |
CPU_COUNT = "${@oe.utils.cpu_count(at_least=2)}"
THREAD_COUNT = "${@oe.utils.cpu_count(at_least=2, at_most=20)}"
BB_NUMBER_THREADS ?= "${CPU_COUNT}"
Contributor

The default value for this is already oe.utils.cpu_count(). Why do we need to change it?

Author

You are correct. This is not needed.

@sjakki88 commented Apr 14, 2026

Can we have dynamic configs that work on both developer and CI machines? A 20-thread cap on a high-end configuration will increase the build time.

CPU_COUNT = "${@oe.utils.cpu_count(at_least=2)}"
# Detect system RAM (in GB)
RAM_GB = "${@int(oe.utils.total_memory()/1024/1024)}"
# Allow one thread per 3 GB RAM
SAFE_THREADS = "${@min(${CPU_COUNT}, int(${RAM_GB} / 3))}"
BB_NUMBER_THREADS = "${SAFE_THREADS}"
PARALLEL_MAKE = "-j ${SAFE_THREADS}"

Example outcomes

Machine                   Result
64 cores / 64 GB RAM      21 threads
64 cores / 128 GB RAM     42 threads
32 cores / 32 GB RAM      10 threads
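
The rule above (one thread per 3 GB of RAM, capped by the core count) can be checked with a short Python sketch. The function name `safe_threads` and the `gb_per_thread` parameter are illustrative and not part of BitBake or OpenEmbedded, and the floor of one thread is an added assumption.

```python
def safe_threads(cpu_count: int, ram_gb: int, gb_per_thread: int = 3) -> int:
    """Return a thread count limited by both the cores and the installed RAM."""
    ram_limited = ram_gb // gb_per_thread
    # Assumption: never drop below one thread, even on very small hosts.
    return max(1, min(cpu_count, ram_limited))


for cores, ram in [(64, 64), (64, 128), (32, 32)]:
    print(f"{cores} cores / {ram} GB RAM -> {safe_threads(cores, ram)} threads")
```

Running it reproduces the 21, 42 and 10 thread results listed in the table.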

Comment thread ci/base.yml
CPU_COUNT = "${@oe.utils.cpu_count(at_least=2)}"
THREAD_COUNT = "${@oe.utils.cpu_count(at_least=2, at_most=20)}"
BB_NUMBER_THREADS ?= "${CPU_COUNT}"
BB_NUMBER_PARSE_THREADS ?= "${CPU_COUNT}"
Contributor

The default value for this is multiprocessing.cpu_count() or os.cpu_count(). Why do we need to change it?

Author

I pulled this from qli1.7 and this is also not necessary.

Comment thread ci/base.yml
BB_PRESSURE_MAX_CPU = "900000"
BB_PRESSURE_MAX_IO = "900000"
BB_PRESSURE_MAX_MEMORY = "900000"
PARALLEL_MAKE ?= "-j ${THREAD_COUNT} -l ${THREAD_COUNT}"
Contributor

The default value for this is -j oe.utils.cpu_count(). Why do we need to change it?

Do you know why -l isn't used in bitbake? Is there some restriction in the make version?

Author

The only changes needed are THREAD_COUNT, BB_PRESSURE_MAX_CPU, BB_PRESSURE_MAX_IO, BB_PRESSURE_MAX_MEMORY and PARALLEL_MAKE. I am not sure why -l isn't used.

THREAD_COUNT = "${@oe.utils.cpu_count(at_least=2, at_most=20)}"
BB_PRESSURE_MAX_CPU = "900000"
BB_PRESSURE_MAX_IO = "900000"
BB_PRESSURE_MAX_MEMORY = "900000"
PARALLEL_MAKE ?= "-j ${THREAD_COUNT} -l ${THREAD_COUNT}"

The above alone may be sufficient, or we could also add the BB_LOADFACTOR_MAX variable, which monitors the system load and pauses new task execution when the threshold is exceeded. Let me know your preferred approach and I will modify my change accordingly.

Contributor

-l is supposed to be a load average float, not a cpu count.
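
For reference, GNU make documents `-l` (`--load-average`) as a floating-point load limit: no new job is started while other jobs are running and the load average is at least that value. A minimal Python model of that rule, using a hypothetical helper name:

```python
def may_start_job(load_average: float, max_load: float, running_jobs: int) -> bool:
    """Model of make's -l rule: hold back new jobs while the load is too high."""
    if running_jobs == 0:
        # With nothing running, a job is still started so the build can progress.
        return True
    return load_average < max_load


print(may_start_job(load_average=18.5, max_load=20.0, running_jobs=4))  # True
print(may_start_job(load_average=20.0, max_load=20.0, running_jobs=4))  # False
```

Passing a thread count as the limit therefore only works by coincidence, when the desired load ceiling happens to equal the number of threads.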

Contributor

Since https://git.yoctoproject.org/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237 was never merged, the jobserver load balancing happens per recipe, not per build. And from my experiments with it, the jobserver will only look at load after it has spawned the first batch of jobs, making it almost useless for preventing peak loads.
It has been a few years since I looked at it, maybe GNU make has fixed their jobserver since.

Author

We can leave meta-qcom unchanged and introduce a distro-level build policy in meta-qcom-distro via qcom-distro-build-policy.conf. This does not affect the current build behavior, which seems to work well in most cases. We can then introduce an opt-in variable, BB_PRESSURE_PROFILE, which can be set to "safety" in local.conf. When it is set, we activate a safety mode that uses Linux PSI back-pressure to prioritize build completion and host stability, reducing out-of-memory failures at the cost of longer build times.
Would like your thoughts on this approach.
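
A rough Python sketch of the opt-in logic, assuming the proposed BB_PRESSURE_PROFILE variable and reusing the 900000 thresholds from this PR as placeholder values. The helper name and the threshold choices are illustrative only; nothing here exists in meta-qcom-distro today.

```python
# Placeholder thresholds copied from this PR; real values would need tuning.
SAFETY_PROFILE = {
    "BB_PRESSURE_MAX_CPU": "900000",
    "BB_PRESSURE_MAX_IO": "900000",
    "BB_PRESSURE_MAX_MEMORY": "900000",
}


def pressure_overrides(profile: str) -> dict:
    """Return the extra settings for the requested profile (hypothetical helper)."""
    if profile == "safety":
        return dict(SAFETY_PROFILE)
    # Any other value, including unset, leaves the current behavior untouched.
    return {}


print(pressure_overrides("safety"))
print(pressure_overrides(""))
```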

Contributor

I am not against using the BB pressure feature, but we need to understand clearly how it works and how to tweak it. We will not merge a config that helps 'on your laptop' only ;-) but if we find a config that is better for everyone, then sure, we should.

We are seeing lots of spurious issues even on our builders, where builds just get aborted, so we might have issues already.

We need to prove which config to use.

Author

Clear. Did not know that we were also facing sporadic failures in our builders.

Contributor

Thanks to clang and rust!

@vkraleti
Contributor

vkraleti commented Apr 7, 2026

Missing SoB in the commit message. Limit each line in the commit message to 72-75 characters.

Comment thread ci/base.yml
WATCHDOG_RUNTIME_SEC:pn-systemd = "30"
host: |
CPU_COUNT = "${@oe.utils.cpu_count(at_least=2)}"
THREAD_COUNT = "${@oe.utils.cpu_count(at_least=2, at_most=20)}"
Contributor

Why is it limited to 20?

Author

The same limit was also set in qli1.7. I thought it was more reasonable to use ~70% of the total threads available on my system than to have the build fail quietly.

Contributor

But this is not just about your machine, since we are setting that for everyone here. Our GitHub runners have 64 cores, and we have some build machines with 192 cores. We cannot hardcode the max to 20.
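
To make the effect of the cap concrete, here is a small Python stand-in for the clamping implied by the `at_least` and `at_most` arguments in the snippet above. `clamp_cpu_count` is a hypothetical helper, not the OpenEmbedded implementation, and the 28-core entry is only an illustrative laptop size.

```python
def clamp_cpu_count(cpus: int, at_least: int, at_most: int) -> int:
    """Clamp a detected CPU count into the range [at_least, at_most]."""
    return max(min(cpus, at_most), at_least)


for cpus in (28, 64, 192):
    used = clamp_cpu_count(cpus, at_least=2, at_most=20)
    print(f"{cpus} cores -> {used} threads ({used / cpus:.0%} of the machine)")
```

With a cap of 20, a 64-core runner would leave roughly two thirds of its cores idle, and a 192-core builder about 90%.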

Author

Sure. Is there a reasonable percentage of the available threads that we can set so that all recipes build safely? Say, 90% of the total available threads.

Contributor

I like the idea of using the Linux pressure output. That way we let the system monitor itself and adjust. I've started the build of this PR so that we can see what the pressure settings do in the log.

Comment thread ci/base.yml
BB_NUMBER_PARSE_THREADS ?= "${CPU_COUNT}"
BB_PRESSURE_MAX_CPU = "900000"
BB_PRESSURE_MAX_IO = "900000"
BB_PRESSURE_MAX_MEMORY = "900000"
Contributor

How were these values selected?

Author

This is 90% of the maximum Linux pressure stall information (PSI) value. The /proc/pressure/cpu, /proc/pressure/io and /proc/pressure/memory files are monitored; 1000000 is the max value, and this sets the threshold to a reasonable 90% of that. This was also taken from the qli1.7 defaults.
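
For context, each file under /proc/pressure reports lines such as `some avg10=0.00 avg60=0.00 avg300=0.00 total=12345`, where `total` is the accumulated stall time in microseconds. The Python sketch below parses such a line and compares the growth of `total` between two samples against a threshold, which is how the BitBake documentation appears to describe the BB_PRESSURE_MAX_* check. The helper names and sample values are made up for illustration, and BitBake's actual scheduler logic may differ.

```python
def parse_psi_total(line: str) -> int:
    """Extract the cumulative stall time (microseconds) from one PSI line."""
    fields = dict(item.split("=") for item in line.split()[1:])
    return int(fields["total"])


def exceeds_pressure(prev_line: str, curr_line: str, threshold: float) -> bool:
    """True when stall time grew by more than `threshold` between two samples."""
    delta = parse_psi_total(curr_line) - parse_psi_total(prev_line)
    return delta > threshold


# Made-up samples, assumed to be taken roughly one second apart.
before = "some avg10=1.20 avg60=0.80 avg300=0.30 total=5000000"
after = "some avg10=9.50 avg60=2.10 avg300=0.60 total=5950000"
print(exceeds_pressure(before, after, threshold=900000))
```

Under that reading, and since one second contains 1,000,000 microseconds, a threshold of 900000 would only trigger when tasks were stalled for about 90% of the sampling interval.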
