Skip to content

Conversation

@rucoder
Copy link

@rucoder rucoder commented Nov 26, 2025

Description

Remove sysconf(_SC_NPROCESSORS_CONF) validation that causes false failures when the library runs in a process with cgroup v2 CPU limits.

When cpu.max is set in cgroup v2, sysconf(_SC_NPROCESSORS_CONF) returns the cgroup-limited CPU count rather than the actual system CPU count. However, scanning /sys/devices/system/cpu always returns the real number of cores in the system, causing a mismatch and spurious check failures.

By removing the sysconf() call and relying solely on the sysfs scan, we ensure consistent CPU topology detection regardless of cgroup constraints.

Affected parts

  • library
  • pqos utility
  • rdtset utility
  • other: (please specify)

Motivation and Context

https://github.com/lf-edge/eve uses libpqos from privileged system container pkg/pillar. This is a houskeeping conitainer that creates user workloads and it is restricted only to core 0 but has full visibility of /sys/fs so the library did not initilize

How Has This Been Tested?

This was tested from EVE OS on a real device with intel RDT support

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Remove sysconf(_SC_NPROCESSORS_CONF) validation that causes false
failures when the library runs in a process with cgroup v2 CPU limits.

When cpu.max is set in cgroup v2, sysconf(_SC_NPROCESSORS_CONF) returns
the cgroup-limited CPU count rather than the actual system CPU count.
However, scanning /sys/devices/system/cpu always returns the real number
of cores in the system, causing a mismatch and spurious check failures.

By removing the sysconf() call and relying solely on the sysfs scan, we
ensure consistent CPU topology detection regardless of cgroup constraints.

Signed-off-by: Mikhail Malyshev <[email protected]>
@rstorozh rstorozh requested a review from Copilot December 5, 2025 13:29
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes CPU detection failures in cgroup v2 constrained environments by removing the sysconf(_SC_NPROCESSORS_CONF) validation check that caused false positives when CPU limits were applied.

Key Changes:

  • Removed sysconf() validation that conflicted with sysfs scanning in cgroup v2 environments
  • Enhanced error messages with more specific context including the SYSTEM_CPU path
  • Added debug/error logging for CPU topology detection failures

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +260 to +262
num_cpus = scandir(SYSTEM_CPU, &namelist, filter_cpu, cpu_sort);

if (pqos_set_no_files_limit(max_core_count)) {
if (pqos_set_no_files_limit(num_cpus)) {
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scandir() result is not validated before being used in pqos_set_no_files_limit(). If scandir() fails and returns a negative value, this will be passed to pqos_set_no_files_limit() which expects a positive count. The validation at line 278 occurs after this call. Move the validation before line 262 to prevent passing invalid values.

Copilot uses AI. Check for mistakes.
num_cpus = scandir(SYSTEM_CPU, &namelist, filter_cpu, cpu_sort);
if (num_cpus <= 0 || max_core_count < num_cpus) {
LOG_ERROR("Failed to read proc cpus!\n");
if (num_cpus <= 0 ) {
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space before closing parenthesis. Remove the trailing space for consistency with code style.

Copilot uses AI. Check for mistakes.
Comment on lines +289 to +290
LOG_DEBUG("Logical core %u is offline, skipping\n", lcore);
continue;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation uses spaces instead of consistent indentation style. The continue statement and surrounding braces at lines 288-291 should match the indentation pattern used elsewhere in the function (which appears to use 8-space indentation based on the surrounding code).

Suggested change
LOG_DEBUG("Logical core %u is offline, skipping\n", lcore);
continue;
LOG_DEBUG("Logical core %u is offline, skipping\n", lcore);
continue;

Copilot uses AI. Check for mistakes.
if (retval != PQOS_RETVAL_OK)
break;
if (retval != PQOS_RETVAL_OK) {
LOG_ERROR("Failed to get socket for lcore %u\n", lcore);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added error logging blocks at lines 294-297, 300-303, and 307-310 use inconsistent indentation (spaces instead of tabs). Ensure these blocks match the indentation style used in the rest of the file for consistency.

Copilot uses AI. Check for mistakes.
if (retval != PQOS_RETVAL_OK)
break;
if (retval != PQOS_RETVAL_OK) {
LOG_ERROR("Failed to get NUMA node for lcore %u\n", lcore);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added error logging blocks at lines 294-297, 300-303, and 307-310 use inconsistent indentation (spaces instead of tabs). Ensure these blocks match the indentation style used in the rest of the file for consistency.

Copilot uses AI. Check for mistakes.
if (retval != PQOS_RETVAL_OK)
break;
if (retval != PQOS_RETVAL_OK) {
LOG_ERROR("Failed to get cache IDs for lcore %u\n", lcore);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added error logging blocks at lines 294-297, 300-303, and 307-310 use inconsistent indentation (spaces instead of tabs). Ensure these blocks match the indentation style used in the rest of the file for consistency.

Copilot uses AI. Check for mistakes.
@rstorozh
Copy link
Contributor

rstorozh commented Dec 8, 2025

@rucoder Please make sure that your patch follows the project's style.
You should use the following commands for this:
$ export CHECKPATCH=<path_to_checkpatch>checkpatch.pl
$ make style
Here, is the path to Linux Kernel's checkpatch.pl utility. For example <linux_kernel_repo>/scripts/checkpatch.pl

Also make sure that unit-tests are passed:
$ cd unit-test
<intel-cmt-cat/unit-test> make clean
<intel-cmt-cat/unit-test> make
<intel-cmt-cat/unit-test> make run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants