-
Notifications
You must be signed in to change notification settings - Fork 244
feat: node-exporter into vhd build #7704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0d52887 to
60cc001
Compare
f0f23e8 to
eb8e316
Compare
68a2704 to
d494735
Compare
d494735 to
277e340
Compare
277e340 to
1ae3542
Compare
| @@ -0,0 +1,5 @@ | |||
| tls_server_config: | |||
| cert_file: "/etc/kubernetes/certs/kubeletserver.crt" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these paths aren't necessarily correct - they depend on whether kubelet serving certificate rotation is enabled - when it's disabled these paths are correct, however when it's enabled both cert_file and key_file should point towards: /var/lib/kubelet/pki/kubelet-server-current.pem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting this is copy paste from the aks-vm-extension repo and what is on my node today. I don't see any where it's touched, just a static file.
root@aks-sys-41317600-vmss000000:/etc/node-exporter.d# cat web-config.yml
tls_server_config:
cert_file: "/etc/kubernetes/certs/kubeletserver.crt"
key_file: "/etc/kubernetes/certs/kubeletserver.key"
client_auth_type: "RequireAndVerifyClientCert"
client_ca_file: "/etc/kubernetes/certs/ca.crt"
i think we could address this in the node-exporter-startup.sh and check for the existence of /var/lib/kubelet/pki/kubelet-server-current.pem and if it exists use it. And if not use /etc/kubernetes/certs/kubeletserver.crt
| @@ -0,0 +1,5 @@ | |||
| [Path] | |||
| PathModified=/var/lib/kubelet/pki/kubelet.crt | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you trying to point towards kubelet's client cert? if so this should be: /var/lib/kubelet/pki/kubelet-client-current.pem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm starting to question if this node-exporter-restart.path and node-exporter-restart.service are even needed.
https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md
its reading like node-exporter will be reading the web-config with each request so changes are picked up right away.
This is another config that is copy paste from aks-vm-extensions, and looking like lacking real ownership/maintaining.
|
|
||
| # specify tls config path, if it exists | ||
| if [ -f $PUBLIC_SETTINGS_PATH ] | ||
| then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we tend to follow the style of if [ condition ]; then
| configureNodeExporter() { | ||
| echo "Configuring Node Exporter" | ||
| # Check for skip file to determine if node-exporter was installed on this VHD | ||
| if [ ! -f /etc/node-exporter.d/skip_vhd_node_exporter ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the skip file isn't there, meaning we don't want to skip, we end up skipping? this seems backwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, rather than commit and manage an empty flag file in version control, can we instead pivot off some other node exporter asset file's existence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip file is for the extension logic as a signal that the component is agentbaker managed and extension should skip all managing.
existing nodes should have all components from extension previously. If we were to check for node-exporter running it should always be there vhd or extension install originally and we wouldn't be able to tell which. The skip file becomes a clear sign that this was vhd baked and extension should ignore.
once we get to a point where extension isn't needed anymore we can easily remove it
| mv "/opt/bin/${packageName}-${packageVersion}" "/opt/bin/${packageName}" | ||
| chmod a+x /opt/bin/${packageName} | ||
| rm -rf /opt/bin/${packageName}-* & | ||
| elif [ -f "/usr/local/bin/${packageName}-${packageVersion}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change here? we should be using /opt/bin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think a few things got messed up when i decided to involve master in some rebasing and not main... everyone is pretty upset by that move.
| # Check for the skip sentinel file - this is essential for e2e testing | ||
| # If this file doesn't exist, e2e tests will silently skip node-exporter validation. Which is bad. | ||
| local skip_file="/etc/node-exporter.d/skip_vhd_node_exporter" | ||
| if [ ! -f "$skip_file" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this seems backwards
| @@ -0,0 +1,32 @@ | |||
| #!/bin/sh | |||
|
|
|||
| if [ "$(cat /etc/os-release | grep ^ID= | cut -c 4-)" = "flatcar" ]; then | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the flatcar check? it looks like we don't install node exporter on flatcar according to what's currently in install-dependencies.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah for the initial setup in AB i'm just looking at getting ubuntu and mariner working. Plan is to get all distros and this is copy paste from existing extension so eventually needed
| exit "${ret}" | ||
| } | ||
| # Limit parallel image pulls - count only image_pids, not BCC_PID | ||
| while [ "${#image_pids[@]}" -ge "$parallel_container_image_pull_limit" ]; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated changes?
| [ -n "${VHD_LOGS_FILEPATH:-}" ] && echo " - node-exporter ${NODE_EXPORTER_VERSION}-${NODE_EXPORTER_REVISION}" >> "${VHD_LOGS_FILEPATH}" | ||
|
|
||
| rm -rf "${NODE_EXPORTER_BUILD_ROOT}" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A random question. Would you prefer to write this logic in bash or golang?
What this PR does / why we need it:
this is adding node-exporter into the vhdbuild by default. At the end of the vhdbuild we systemctl disable to allow a fresh start during cse letting the node-exporter-startup script run and gather up node specific details needed.
Which issue(s) this PR fixes:
Fixes #