Skip to content

Conversation

@chmill-zz
Copy link
Contributor

What this PR does / why we need it:

this is adding node-exporter into the vhdbuild by default. At the end of the vhdbuild we systemctl disable to allow a fresh start during cse letting the node-exporter-startup script run and gather up node specific details needed.

Which issue(s) this PR fixes:

Fixes #

@chmill-zz chmill-zz changed the title Nodeexportershift feat: node-exporter into vhd build Jan 22, 2026
@github-actions github-actions bot added the components This pull request updates cached components on Linux or Windows VHDs label Jan 22, 2026
@chmill-zz chmill-zz force-pushed the nodeexportershift branch 2 times, most recently from 68a2704 to d494735 Compare January 23, 2026 21:59
@@ -0,0 +1,5 @@
tls_server_config:
cert_file: "/etc/kubernetes/certs/kubeletserver.crt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these paths aren't necessarily correct - they depend on whether kubelet serving certificate rotation is enabled - when it's disabled these paths are correct, however when it's enabled both cert_file and key_file should point towards: /var/lib/kubelet/pki/kubelet-server-current.pem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting this is copy paste from the aks-vm-extension repo and what is on my node today. I don't see any where it's touched, just a static file.

root@aks-sys-41317600-vmss000000:/etc/node-exporter.d# cat web-config.yml
tls_server_config:
  cert_file: "/etc/kubernetes/certs/kubeletserver.crt"
  key_file: "/etc/kubernetes/certs/kubeletserver.key"
  client_auth_type: "RequireAndVerifyClientCert"
  client_ca_file: "/etc/kubernetes/certs/ca.crt"

i think we could address this in the node-exporter-startup.sh and check for the existence of /var/lib/kubelet/pki/kubelet-server-current.pem and if it exists use it. And if not use /etc/kubernetes/certs/kubeletserver.crt

@@ -0,0 +1,5 @@
[Path]
PathModified=/var/lib/kubelet/pki/kubelet.crt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you trying to point towards kubelet's client cert? if so this should be: /var/lib/kubelet/pki/kubelet-client-current.pem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm starting to question if this node-exporter-restart.path and node-exporter-restart.service are even needed.

https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md

its reading like node-exporter will be reading the web-config with each request so changes are picked up right away.

This is another config that is copy paste from aks-vm-extensions, and looking like lacking real ownership/maintaining.


# specify tls config path, if it exists
if [ -f $PUBLIC_SETTINGS_PATH ]
then
Copy link
Contributor

@cameronmeissner cameronmeissner Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we tend to follow the style of if [ condition ]; then

configureNodeExporter() {
echo "Configuring Node Exporter"
# Check for skip file to determine if node-exporter was installed on this VHD
if [ ! -f /etc/node-exporter.d/skip_vhd_node_exporter ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the skip file isn't there, meaning we don't want to skip, we end up skipping? this seems backwards

Copy link
Contributor

@cameronmeissner cameronmeissner Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, rather than commit and manage an empty flag file in version control, can we instead pivot off some other node exporter asset file's existence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip file is for the extension logic as a signal that the component is agentbaker managed and extension should skip all managing.

existing nodes should have all components from extension previously. If we were to check for node-exporter running it should always be there vhd or extension install originally and we wouldn't be able to tell which. The skip file becomes a clear sign that this was vhd baked and extension should ignore.

once we get to a point where extension isn't needed anymore we can easily remove it

mv "/opt/bin/${packageName}-${packageVersion}" "/opt/bin/${packageName}"
chmod a+x /opt/bin/${packageName}
rm -rf /opt/bin/${packageName}-* &
elif [ -f "/usr/local/bin/${packageName}-${packageVersion}" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change here? we should be using /opt/bin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think a few things got messed up when i decided to involve master in some rebasing and not main... everyone is pretty upset by that move.

# Check for the skip sentinel file - this is essential for e2e testing
# If this file doesn't exist, e2e tests will silently skip node-exporter validation. Which is bad.
local skip_file="/etc/node-exporter.d/skip_vhd_node_exporter"
if [ ! -f "$skip_file" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this seems backwards

@@ -0,0 +1,32 @@
#!/bin/sh

if [ "$(cat /etc/os-release | grep ^ID= | cut -c 4-)" = "flatcar" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the flatcar check? it looks like we don't install node exporter on flatcar according to what's currently in install-dependencies.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for the initial setup in AB i'm just looking at getting ubuntu and mariner working. Plan is to get all distros and this is copy paste from existing extension so eventually needed

exit "${ret}"
}
# Limit parallel image pulls - count only image_pids, not BCC_PID
while [ "${#image_pids[@]}" -ge "$parallel_container_image_pull_limit" ]; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated changes?

[ -n "${VHD_LOGS_FILEPATH:-}" ] && echo " - node-exporter ${NODE_EXPORTER_VERSION}-${NODE_EXPORTER_REVISION}" >> "${VHD_LOGS_FILEPATH}"

rm -rf "${NODE_EXPORTER_BUILD_ROOT}"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A random question. Would you prefer to write this logic in bash or golang?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

components This pull request updates cached components on Linux or Windows VHDs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants