Skip to content

Conversation

@pdamianov-dev
Copy link
Contributor

This is related to the creation of a suitable framework to run pressure tests against node images as part of Node Hardening efforts.

Supporting documents:

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new “CRI pressure” execution path that can modify kubelet flags via a privileged DaemonSet before running ClusterLoader2, enabling node pressure testing for node hardening work.

Changes:

  • Switches the k8s-resource-pressure topology to use a new cri-pressure engine template.
  • Adds a new pipeline step template that applies a kubelet-modifying DaemonSet before running the benchmark.
  • Extends cri.py with a modify-kubelet subcommand and adds DaemonSet-manifest generation in clusterloader2/utils.py.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

File Description
steps/topology/k8s-resource-pressure/execute-clusterloader2.yml Routes the topology to the new cri-pressure execution template.
steps/engine/clusterloader2/cri-pressure/execute.yml Adds a pre-benchmark step to apply a kubelet config updater DaemonSet, then runs CL2.
modules/python/clusterloader2/utils.py Adds DaemonSet YAML generation for updating kubelet flags (but also modifies imports/initialization).
modules/python/clusterloader2/cri/cri.py Adds modify-kubelet CLI command and wires it to the DaemonSet generator.

export DS=$(PYTHONPATH=$PYTHONPATH:$(pwd) python3 $PYTHON_SCRIPT_FILE modify-kubelet \
--custom_kubelet_config "$CUSTOM_KUBELET_CONFIG")

printf '%s' "$DS" | kubectl apply -f -
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After applying the kubelet-config-updater DaemonSet, the pipeline proceeds immediately. Add an explicit wait (e.g., kubectl rollout status ds/kubelet-config-updater -n kube-system --timeout=...) so the benchmark doesn’t start before kubelet restarts/flag updates take effect.

Suggested change
printf '%s' "$DS" | kubectl apply -f -
printf '%s' "$DS" | kubectl apply -f -
kubectl rollout status ds/kubelet-config-updater -n kube-system --timeout=10m

Copilot uses AI. Check for mistakes.
Comment on lines +390 to +392
parser_modify_kubelet.add_argument(
"--custom_kubelet_config", type=str, help="Custom kubelet flags in string format"
)
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--custom_kubelet_config is optional, but modify-kubelet depends on it to generate the DaemonSet. Make this argument required (or validate non-empty and exit with a clear error) to avoid applying a manifest with an empty/None flags string.

Copilot uses AI. Check for mistakes.
Comment on lines +386 to +390
# Sub-command for modify-kubelet
parser_modify_kubelet = subparsers.add_parser(
"modify-kubelet", help="Add custom flags to kubelet and apply via daemonset"
)
parser_modify_kubelet.add_argument(
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing unit tests covering main() subcommands in modules/python/tests/test_cri.py, but the new modify-kubelet subcommand isn’t covered. Add a test that exercises arg parsing and verifies it invokes the right handler (and ideally validates the manifest output path).

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 5
from xml.dom import minidom
import json
import os
import docker
from clients.docker_client import DockerClient
from utils.logger_config import get_logger, setup_logging

setup_logging()
logger = get_logger(__name__)

Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports / module-level initialization for Docker + logging were removed, but this module still references DockerClient, docker.errors.ContainerError, and logger (e.g., in run_cl2_command and process_cl2_reports). This will raise NameError at runtime when running clusterloader2. Re-introduce the needed imports and define logger (or switch these functions to use logging.getLogger(__name__) and ensure docker/DockerClient are imported).

Copilot uses AI. Check for mistakes.
Comment on lines +209 to +213
labels:
app: kubelet-config-updater
spec:
hostPID: true
containers:
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DaemonSet is privileged with host mounts but has no node scoping. As written it can run on every node and modify kubelet configuration cluster-wide. Add a nodeSelector/affinity (e.g., cri-resource-consume=true) and at minimum constrain to Linux nodes to reduce blast radius.

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +226
echo "Updating kubelet configuration..."
export CUSTOM_EVICTION_FLAGS="{custom_kubelet_flags}"
export kubelet_flags_eof_key="--node-ip"
for kvp in $CUSTOM_EVICTION_FLAGS; do
key=${{kvp%%=*}}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom_kubelet_flags is interpolated directly into a shell script (export CUSTOM_EVICTION_FLAGS="..."). If the input contains quotes/newlines or shell metacharacters, it can break the manifest or lead to unintended command execution on nodes. Prefer passing the flags as data (e.g., base64-encode and decode in the container, or use single-quoted YAML + robust escaping) and validate the expected --key=value format before generating the manifest.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants