feat: node-exporter into vhd build #7704
base: main
Changes from all commits
1ae3542
ec60291
3108c0a
d9de63e
8b28957
863a916
cde4d47
83ac99e
371cd28
87674db
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,6 +76,8 @@ ERR_ENABLE_MANAGED_GPU_EXPERIENCE=123 # Error confguring managed GPU experience | |
| # Error code 124 is returned when a `timeout` command times out, and --preserve-status is not specified: https://man7.org/linux/man-pages/man1/timeout.1.html | ||
| ERR_VHD_BUILD_ERROR=125 # Reserved for VHD CI exit conditions | ||
|
|
||
| ERR_NODE_EXPORTER_START_FAIL=128 # Error starting or enabling node-exporter service | ||
|
|
||
| ERR_SWAP_CREATE_FAIL=130 # Error allocating swap file | ||
| ERR_SWAP_CREATE_INSUFFICIENT_DISK_SPACE=131 # Error insufficient disk space for swap file creation | ||
|
|
||
|
|
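A minimal sketch of how the new exit code might be surfaced when the service fails to come up. Only the value `ERR_NODE_EXPORTER_START_FAIL=128` comes from the diff; the `start_node_exporter` helper and the `SYSTEMCTL` override are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch only. start_node_exporter and the SYSTEMCTL override are hypothetical;
# only the ERR_NODE_EXPORTER_START_FAIL=128 value comes from the diff.
ERR_NODE_EXPORTER_START_FAIL=128

start_node_exporter() {
    # SYSTEMCTL can be pointed at a stub for testing; defaults to systemctl.
    systemctl_bin="${SYSTEMCTL:-systemctl}"
    if ! "$systemctl_bin" enable --now node-exporter.service; then
        echo "failed to enable or start node-exporter" >&2
        return "$ERR_NODE_EXPORTER_START_FAIL"
    fi
    return 0
}
```

A caller could then run `start_node_exporter || exit $?` so the VHD build or provisioning step reports 128 rather than a generic failure.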
@@ -938,10 +940,10 @@ fallbackToKubeBinaryInstall() { | |
| if [ "${SHOULD_ENFORCE_KUBE_PMC_INSTALL}" = "true" ]; then | ||
| echo "Kube PMC install is enforced, skipping fallback to kube binary install for ${packageName}" | ||
| return 1 | ||
| elif [ -f "/opt/bin/${packageName}-${packageVersion}" ]; then | ||
| mv "/opt/bin/${packageName}-${packageVersion}" "/opt/bin/${packageName}" | ||
| chmod a+x /opt/bin/${packageName} | ||
| rm -rf /opt/bin/${packageName}-* & | ||
| elif [ -f "/usr/local/bin/${packageName}-${packageVersion}" ]; then | ||
|
Contributor
why the change here? we should be using
Contributor
Author
yeah i think a few things got messed up when i decided to involve |
||
| mv "/usr/local/bin/${packageName}-${packageVersion}" "/usr/local/bin/${packageName}" | ||
| chmod a+x /usr/local/bin/${packageName} | ||
| rm -rf /usr/local/bin/${packageName}-* & | ||
| return 0 | ||
| else | ||
| echo "No binary fallback found for ${packageName} version ${packageVersion}" | ||
|
|
||
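The thread above questions which directory the fallback should operate on. A minimal sketch of the same rename logic with the directory passed in as a parameter, so the choice lives in one place. `promote_versioned_binary` and its arguments are hypothetical and not part of the PR.

```shell
#!/bin/sh
# Sketch only: promote_versioned_binary and its bin_dir parameter are
# hypothetical and not part of the PR.
promote_versioned_binary() {
    bin_dir="$1"
    name="$2"
    version="$3"
    src="${bin_dir}/${name}-${version}"

    if [ ! -f "$src" ]; then
        echo "No binary fallback found for ${name} version ${version}" >&2
        return 1
    fi

    mv "$src" "${bin_dir}/${name}"
    chmod a+x "${bin_dir}/${name}"
    # Remove leftover versioned copies. The diff runs this in the background
    # with '&'; it is synchronous here so the result is deterministic.
    rm -rf "${bin_dir}/${name}"-*
    return 0
}
```

Note that the `${name}-*` glob removes anything in the directory sharing that prefix, so it is only safe when no unrelated binary is named that way.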
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| tls_server_config: | ||
| cert_file: "/etc/kubernetes/certs/kubeletserver.crt" | ||
|
Contributor
these paths aren't necessarily correct: they depend on whether kubelet serving certificate rotation is enabled. when it's disabled these paths are correct, however when it's enabled both
Contributor
Author
interesting, this is copy-pasted from the aks-vm-extension repo and matches what is on my node today. I don't see anywhere it's touched, it's just a static file. i think we could address this in node-exporter-startup.sh and check for the existence of |
||
| key_file: "/etc/kubernetes/certs/kubeletserver.key" | ||
| client_auth_type: "RequireAndVerifyClientCert" | ||
| client_ca_file: "/etc/kubernetes/certs/ca.crt" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| [Path] | ||
| # Watch server cert paths - one will exist depending on whether kubelet serving cert rotation is enabled | ||
| # Rotation enabled: kubelet-server-current.pem (symlink updated on rotation) | ||
| # Rotation disabled: kubeletserver.crt (static cert) | ||
| PathModified=/var/lib/kubelet/pki/kubelet-server-current.pem | ||
| PathModified=/etc/kubernetes/certs/kubeletserver.crt | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| [Service] | ||
| Type=oneshot | ||
| ExecStart=/bin/systemctl restart node-exporter.service |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| [Unit] | ||
| Description=Prometheus Node Exporter | ||
| Documentation=https://github.com/prometheus/node_exporter | ||
|
|
||
| [Service] | ||
| ExecStart=/opt/bin/node-exporter-startup.sh | ||
|
|
||
| Restart=on-failure | ||
| RestartSec=10 | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| #!/bin/sh | ||
|
|
||
| if [ "$(cat /etc/os-release | grep ^ID= | cut -c 4-)" = "flatcar" ]; then | ||
|
Contributor
do we need the flatcar check? it looks like we don't install node exporter on flatcar according to what's currently in install-dependencies.sh
Contributor
Author
yeah, for the initial setup in AB i'm just looking at getting ubuntu and mariner working. the plan is to cover all distros, and this is copy-pasted from the existing extension, so the check will be needed eventually |
||
| NODE_IP=$(ip -o -4 addr show dev eth0 | awk '{print $4}' | cut -d '/' -f 1) | ||
| else | ||
| NODE_IP=$(hostname -I | awk '{print $1}') | ||
| fi | ||
|
|
||
| TLS_CONFIG_PATH="/etc/node-exporter.d/web-config.yml" | ||
| TLS_CONFIG_ARG="" | ||
| KUBELET_DEFAULTS="/etc/default/kubelet" | ||
|
|
||
| # Detect TLS cert paths from kubelet configuration | ||
| # Priority: rotation cert > static cert paths from kubelet flags > skip TLS | ||
| CERT_FILE="" | ||
| KEY_FILE="" | ||
|
|
||
| # Check for rotation cert first (used when --rotate-server-certificates=true) | ||
| if [ -f "/var/lib/kubelet/pki/kubelet-server-current.pem" ]; then | ||
| CERT_FILE="/var/lib/kubelet/pki/kubelet-server-current.pem" | ||
| KEY_FILE="/var/lib/kubelet/pki/kubelet-server-current.pem" | ||
| elif [ -f "$KUBELET_DEFAULTS" ]; then | ||
| # Parse kubelet flags for static cert paths | ||
| KUBELET_FLAGS=$(grep "^KUBELET_FLAGS=" "$KUBELET_DEFAULTS" | cut -d'=' -f2-) | ||
| TLS_CERT=$(echo "$KUBELET_FLAGS" | grep -o '\--tls-cert-file=[^ ]*' | cut -d'=' -f2) | ||
| TLS_KEY=$(echo "$KUBELET_FLAGS" | grep -o '\--tls-private-key-file=[^ ]*' | cut -d'=' -f2) | ||
|
|
||
| if [ -n "$TLS_CERT" ] && [ -n "$TLS_KEY" ] && [ -f "$TLS_CERT" ] && [ -f "$TLS_KEY" ]; then | ||
| CERT_FILE="$TLS_CERT" | ||
| KEY_FILE="$TLS_KEY" | ||
| fi | ||
| fi | ||
|
|
||
| # Configure TLS if we found valid cert paths | ||
| if [ -n "$CERT_FILE" ] && [ -n "$KEY_FILE" ]; then | ||
| cat > "$TLS_CONFIG_PATH" <<EOF | ||
| tls_server_config: | ||
| cert_file: "$CERT_FILE" | ||
| key_file: "$KEY_FILE" | ||
| client_auth_type: "RequireAndVerifyClientCert" | ||
| client_ca_file: "/etc/kubernetes/certs/ca.crt" | ||
| EOF | ||
| TLS_CONFIG_ARG="--web.config.file=${TLS_CONFIG_PATH}" | ||
| fi | ||
|
|
||
| exec /opt/bin/node-exporter \ | ||
| --web.listen-address=${NODE_IP}:19100 \ | ||
| ${TLS_CONFIG_ARG} \ | ||
| --no-collector.wifi \ | ||
| --no-collector.hwmon \ | ||
| --collector.cpu.info \ | ||
| --collector.filesystem.mount-points-exclude="^/(dev|proc|sys|run/containerd/.+|var/lib/docker/.+|var/lib/kubelet/.+)($|/)" \ | ||
| --collector.netclass.ignored-devices="^(azv.*|veth.*|[a-f0-9]{15})$" \ | ||
| --collector.netclass.netlink \ | ||
| --collector.netdev.device-exclude="^(azv.*|veth.*|[a-f0-9]{15})$" \ | ||
| --no-collector.arp.netlink | ||
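The cert selection in the startup script above can be sketched as two small functions that are easy to exercise in isolation. This is a sketch under assumptions, not the PR's code: `extract_flag_value` and `select_server_cert` are hypothetical names, and the flag string is assumed to be space-separated `--name=value` pairs without quoting.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of the PR.

# Print the value of --<name>=<value> from a space-separated flag string.
# Prints nothing if the flag is absent.
extract_flag_value() {
    flag_name="$1"
    flags="$2"
    # Split on spaces so the flag is only matched at the start of a token,
    # then strip the "--name=" prefix and keep the rest of the token intact.
    printf '%s\n' "$flags" | tr ' ' '\n' | sed -n "s/^--${flag_name}=//p" | head -n 1
}

# Print "<cert> <key>" using the same priority as the script:
# rotation cert first, then static paths from kubelet flags, else fail.
select_server_cert() {
    rotation_pem="$1"
    flags="$2"
    if [ -f "$rotation_pem" ]; then
        # With rotation enabled the same PEM holds both cert and key.
        printf '%s %s\n' "$rotation_pem" "$rotation_pem"
        return 0
    fi
    cert=$(extract_flag_value tls-cert-file "$flags")
    key=$(extract_flag_value tls-private-key-file "$flags")
    if [ -n "$cert" ] && [ -n "$key" ] && [ -f "$cert" ] && [ -f "$key" ]; then
        printf '%s %s\n' "$cert" "$key"
        return 0
    fi
    return 1
}
```

Stripping the prefix with `sed` instead of `cut -d'=' -f2` keeps values that themselves contain `=` intact. That does not matter for file paths, but it makes the helper reusable for other flags.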
if the skip file isn't there, meaning we don't want to skip, we end up skipping? this seems backwards
also, rather than commit and manage an empty flag file in version control, can we instead pivot off some other node exporter asset file's existence?
the skip file is a signal for the extension logic: it marks the component as agentbaker-managed, so the extension should skip managing it entirely.
existing nodes should already have all components installed by the extension. If we checked for node-exporter running instead, it would always be there, whether it originally came from the vhd or from an extension install, and we wouldn't be able to tell which. The skip file is a clear sign that this was vhd-baked and the extension should ignore it.
once we get to a point where the extension isn't needed anymore we can easily remove it
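A minimal sketch of the extension-side check described above, assuming the skip file is passed in as a path. The real file location is not shown in this excerpt, and `extension_should_manage` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: extension_should_manage is hypothetical, and the skip-file
# path is supplied by the caller because the real path is not shown here.
extension_should_manage() {
    skip_file="$1"
    if [ -e "$skip_file" ]; then
        # Skip file present: the component was baked into the VHD and is
        # agentbaker-managed, so the extension leaves it alone.
        return 1
    fi
    # No skip file: the component came from the extension, keep managing it.
    return 0
}
```

With this shape, a missing skip file means "manage as before", which keeps existing extension-installed nodes unchanged.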