PT-2488 - full refactor of a pt-k8s-debug-collector#1054
svetasmirnova
left a comment
I have just started the review and pointed out a few obvious things.
In addition to the comments, I think it makes sense to add tests for the newly added files, such as logs.go, paths.go, etc.
This refactor replaces all of the `kubectl` CLI calls with the Go SDK for k8s. Additionally, the dumper now has a new structure, a new logger, tar file path control, and a multithreaded approach for downloading and exporting files from multiple pods.
df8e2ee to
99ad957
This commit introduces changes in the pt-k8s-debug-collector integration tests. It is now possible to test the tool against an already running cluster, or to deploy all needed resources automatically with k3d.
When I try to open the archive I receive an error. The old version of pt-k8s-debug-collector works fine.
You removed the errors.txt file from the top directory of the archive; it contained the errors that the tool received while taking the dump. I see there is a dumper.log file there instead. What is it supposed to do?
I've renamed it back to
@svetasmirnova fixed the issue with tar. It was a bug that appears only if one of the pods is unreachable.
Co-authored-by: Sveta Smirnova <svetasmirnova@users.noreply.github.com>
…om:percona/percona-toolkit into PT-2448-pt-k8s-debug-collector-refactoring
svetasmirnova
left a comment
I see you removed the fix for https://perconadev.atlassian.net/browse/PT-2299 and the test for it. Can you add them back? The commit was e1390c4.
}
if _, err := tw.Write(content); err != nil {
	return errors.Wrapf(err, "write content to %s", location)
const CONCURRENT_EXPORT_WORKERS = 5
Why is this a constant?
This constant limits the number of concurrent requests to the k8s API.
If you remove this semaphore, a goroutine will be created:
- for every cluster-scoped resource
- for every namespace * every namespace-scoped resource
In a real cluster (e.g., 100+ namespaces and 50+ resource types), this can easily result in 5,000+ concurrent goroutines.
That can lead to Kubernetes API rate limiting, increased memory usage, timeouts, and an accidental self-inflicted DoS on the cluster.
Resulting archive changes:
- Added a `cluster-scoped` folder for resources that are cluster-wide.
- The root now contains a full log file, instead of just errors.
The contributed code is licensed under GPL v2.0
Contributor Licence Agreement (CLA) is signed
util/update-modules has been run
(`/lib` was not changed)
Documentation updated
Test suite update