Add support for CPU overcommit

Caffeine currently assumes, implicitly, that on a given target node it can dedicate at least one CPU core (or at least one hardware thread) to each image in a multi-image run. If a user runs more images on a node than there are available cores or hardware threads (i.e. "CPU overcommit"), performance might degrade. Running with CPU overcommit is probably never recommended for production runs, but it can occasionally be useful when using a laptop/workstation to debug defects that only arise at larger image counts.

The runtime can detect CPU overcommit in many cases. The actual impact of overcommit on system performance is a complicated topic that depends on many details, including the communication transports in use and the OS process scheduling policies. Once overcommit has been detected (or indicated by a user setting), the runtime can make adjustments (e.g. in some busy-wait synchronization algorithms) to share core resources more cooperatively and hopefully avoid some of the worst performance penalties in heavy overcommit scenarios.
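
To make the idea concrete, here is a minimal sketch in C of one possible detection check and one possible busy-wait adjustment. It is not Caffeine's implementation: the function names (`online_cpus`, `is_overcommitted`, `wait_for_flag`) are hypothetical, and the sketch assumes the runtime already knows how many images share the node. It also assumes `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is on Linux and macOS, though it is an extension rather than part of core POSIX) and ignores affinity masks and container CPU limits, which is one reason detection only works "in many cases".

```c
#include <assert.h>
#include <sched.h>      /* sched_yield (POSIX) */
#include <stdatomic.h>  /* C11 atomics */
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>     /* sysconf */

/* Hypothetical helper: number of CPUs currently online on this node.
 * Assumption: _SC_NPROCESSORS_ONLN is supported; falls back to 1 otherwise. */
static long online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: true when more images share the node than there
 * are CPUs to run them. An unknown CPU count (<= 0) reports false. */
static bool is_overcommitted(long images_on_node, long cpus_available) {
    return cpus_available > 0 && images_on_node > cpus_available;
}

/* Illustrative busy-wait: spin until *flag becomes nonzero. When the node
 * is overcommitted, yield the core on every failed poll so that another
 * image (possibly the one that will set the flag) gets a chance to run. */
static void wait_for_flag(atomic_int *flag, bool overcommitted) {
    assert(flag != NULL);
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (overcommitted) {
            sched_yield();
        }
    }
}
```

A caller might compute `is_overcommitted(images_on_node, online_cpus())` once at startup, let a user setting override the result, and pass it to every wait loop. Yielding on each poll is only one of several possible policies; bounded spinning followed by a yield or sleep is another, and which works best depends on the transport and the OS scheduler.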

This issue exists to track progress on this topic.

Add support for CPU overcommit #222
