

PT-2488 - full refactor of a pt-k8s-debug-collector #1054

Open
BON4 wants to merge 31 commits into 3.x from PT-2448-pt-k8s-debug-collector-refactoring

@BON4 BON4 commented Jan 9, 2026

This refactor replaces all of the kubectl CLI calls with the Go SDK for Kubernetes. Additionally, the dumper now has a new structure, a new logger, control over the tar file path, and a multithreaded approach to downloading and exporting files from multiple pods.
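The multithreaded collection described above can be sketched with goroutines and a `sync.WaitGroup`. This is a minimal illustration, not the PR's code: `collectAll`, `podResult`, `collectFunc` and `errUnreachable` are hypothetical names, and the per-pod download is a stub where the real tool would call the Kubernetes Go client. It shows one design choice: an error from one pod is recorded and joined with the others (`errors.Join`, Go 1.20+) instead of aborting the whole dump.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errUnreachable is a hypothetical sentinel for a pod that cannot be reached.
var errUnreachable = errors.New("pod unreachable")

// podResult holds whatever was collected from a single pod.
type podResult struct {
	Pod   string
	Files map[string][]byte
}

// collectFunc is a hypothetical stand-in for the code that downloads files
// from one pod (via the Kubernetes Go client in the real tool).
type collectFunc func(pod string) (map[string][]byte, error)

// collectAll runs collect for every pod in its own goroutine. A failure on
// one pod is wrapped and returned alongside the successful results.
func collectAll(pods []string, collect collectFunc) ([]podResult, error) {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []podResult
		errs    []error
	)
	for _, pod := range pods {
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			files, err := collect(pod)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, fmt.Errorf("pod %s: %w", pod, err))
				return
			}
			results = append(results, podResult{Pod: pod, Files: files})
		}(pod)
	}
	wg.Wait()
	// Goroutines finish in arbitrary order; sort for a stable archive layout.
	sort.Slice(results, func(i, j int) bool { return results[i].Pod < results[j].Pod })
	return results, errors.Join(errs...) // nil when no pod failed
}

func main() {
	pods := []string{"cluster1-pxc-0", "cluster1-pxc-1", "cluster1-pxc-2"}
	results, err := collectAll(pods, func(pod string) (map[string][]byte, error) {
		if pod == "cluster1-pxc-2" {
			return nil, errUnreachable
		}
		return map[string][]byte{"logs.txt": []byte("...")}, nil
	})
	for _, r := range results {
		fmt.Println("collected", r.Pod)
	}
	if err != nil {
		fmt.Println("errors:", err)
	}
}
```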

Resulting archive changes:

  • Added a cluster-scoped folder for resources that are cluster-wide.

  • The archive root now contains a log of the whole run instead of an errors-only file.

  • The contributed code is licensed under GPL v2.0

  • Contributor Licence Agreement (CLA) is signed

  • util/update-modules has been run
    (/lib was not changed)

  • Documentation updated

  • Test suite updated
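A small sketch of how an archive layout could separate namespaced and cluster-wide resources. The folder name `cluster-scoped` comes from the description above; the function `archivePath` and the `<resource>.yaml` file naming are assumptions for illustration only. It uses `path` rather than `path/filepath` because tar entry names always use forward slashes, whatever the host OS.

```go
package main

import (
	"fmt"
	"path"
)

// clusterScopedDir is the folder for cluster-wide resources (nodes, CRDs, ...).
const clusterScopedDir = "cluster-scoped"

// archivePath returns where a resource dump would live inside the archive
// (hypothetical layout). Namespaced resources go under <root>/<namespace>/,
// cluster-wide ones (empty namespace) under <root>/cluster-scoped/.
func archivePath(root, namespace, resource string) string {
	dir := namespace
	if namespace == "" {
		dir = clusterScopedDir
	}
	return path.Join(root, dir, resource+".yaml")
}

func main() {
	fmt.Println(archivePath("cluster-dump", "pxc", "pods"))
	fmt.Println(archivePath("cluster-dump", "", "nodes"))
}
```

A helper this small is also easy to cover with table-driven tests, which fits the reviewer's request for tests on files such as paths.go.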

Collaborator

@svetasmirnova svetasmirnova left a comment

I have just started the review and flagged a few obvious things.

In addition to the comments, I think adding tests for the newly added files, such as logs.go, paths.go, etc., makes sense.

@BON4 BON4 changed the title PT-2421 - full refactor of a pt-k8s-debug-collector PT-2488 - full refactor of a pt-k8s-debug-collector Jan 15, 2026
BON4 added 7 commits January 16, 2026 09:32
This refactor replaces all of the `kubectl` CLI calls
with the Go SDK for Kubernetes. Additionally, the dumper now has a new structure,
a new logger, tar file path control, and a multithreaded approach
to downloading and exporting files from multiple pods.
BON4 added 2 commits February 6, 2026 19:05
This commit introduces changes to the pt-k8s-debug-collector integration
tests.
Now it is possible to test the tool against an already running cluster, or
to deploy all needed resources automatically with k3d.
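One way the two test modes could be selected, sketched under assumptions: the environment variable `PT_K8S_EXISTING_CLUSTER` and the helper `planClusterSetup` are hypothetical, not taken from the PR. The sketch only builds the arguments for `k3d cluster create <name>` rather than running k3d, and it takes `getenv` as a parameter so the decision can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// useExistingEnv is a hypothetical variable name; the real suite may differ.
const useExistingEnv = "PT_K8S_EXISTING_CLUSTER"

// clusterSetup describes how the integration tests obtain a cluster.
type clusterSetup struct {
	UseExisting bool     // reuse the cluster from the current kubeconfig context
	K3dArgs     []string // arguments for k3d when a disposable cluster is needed
}

// planClusterSetup decides between reusing a running cluster and creating a
// throwaway one with k3d.
func planClusterSetup(getenv func(string) string, name string) clusterSetup {
	if getenv(useExistingEnv) != "" {
		return clusterSetup{UseExisting: true}
	}
	return clusterSetup{K3dArgs: []string{"cluster", "create", name}}
}

func main() {
	s := planClusterSetup(os.Getenv, "pt-k8s-debug-collector-test")
	if s.UseExisting {
		fmt.Println("using the cluster from the current kubeconfig context")
		return
	}
	fmt.Println("would run: k3d", s.K3dArgs)
}
```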
@svetasmirnova
Collaborator

When I try to open the archive, I receive an error:

$ kubectl get pods -n pxc
NAME                                               READY   STATUS         RESTARTS      AGE
cluster1-haproxy-0                                 2/2     Running        3 (33m ago)   49m
cluster1-haproxy-1                                 2/2     Running        0             31m
cluster1-haproxy-2                                 2/2     Running        0             28m
cluster1-pxc-0                                     3/3     Running        0             49m
cluster1-pxc-1                                     3/3     Running        0             31m
cluster1-pxc-2                                     2/3     ErrImagePull   0             17m
percona-xtradb-cluster-operator-6756dbf588-8vrfk   1/1     Running        0             51m

sveta@s76:~/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector$ ../../../bin/pt-k8s-debug-collector --forwardport=3333
INFO[0000] Checking for updates                         
INFO[0000] Contacting version check API at https://v.percona.com/. Timeout set to 3s 
sveta@s76:~/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector$ tar -xzf cluster-dump.tar.gz 
tar: cluster-dump/pxc/cluster1-pxc-2: Cannot open: File exists
tar: Exiting with failure status due to previous errors

The old version of pt-k8s-debug-collector works fine:

sveta@s76:~/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector$ pt-k8s-debug-collector --forwardport=3333
2026/02/16 15:57:21 Start collecting cluster data
2026/02/16 15:58:01 Done
sveta@s76:~/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector$ tar -xzf cluster-dump.tar.gz 
sveta@s76:~/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector$ ls cluster-dump
default  errors.txt  kube-node-lease  kube-public  kube-system  nodes.yaml  pxc

@svetasmirnova
Collaborator

You removed the errors.txt file from the top directory of the archive. It contained the errors the tool received while taking the dump. I see there is a dumper.log file there instead. What is it supposed to do?

@BON4
Author

BON4 commented Feb 16, 2026

I've renamed it back to errors.txt for compatibility.
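If both a full log and an errors-only errors.txt are wanted, Go's `log/slog` (Go 1.21+) can fan each record out to two handlers with different level thresholds. This is a sketch under assumptions, not the tool's logger, which may be built differently: `teeHandler`, `newDumpLogger` and `logLineCounts` are hypothetical names, and the two `io.Writer`s stand in for the full log file and errors.txt.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log/slog"
	"strings"
)

// teeHandler sends each record to all, and also to errs when the record's
// level passes errs' own threshold.
type teeHandler struct{ all, errs slog.Handler }

func (h teeHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.all.Enabled(ctx, l) || h.errs.Enabled(ctx, l)
}

func (h teeHandler) Handle(ctx context.Context, r slog.Record) error {
	if h.all.Enabled(ctx, r.Level) {
		if err := h.all.Handle(ctx, r.Clone()); err != nil {
			return err
		}
	}
	if h.errs.Enabled(ctx, r.Level) {
		return h.errs.Handle(ctx, r.Clone())
	}
	return nil
}

func (h teeHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return teeHandler{h.all.WithAttrs(attrs), h.errs.WithAttrs(attrs)}
}

func (h teeHandler) WithGroup(name string) slog.Handler {
	return teeHandler{h.all.WithGroup(name), h.errs.WithGroup(name)}
}

// newDumpLogger writes INFO and above to fullLog and only ERROR to errorsTxt.
func newDumpLogger(fullLog, errorsTxt io.Writer) *slog.Logger {
	return slog.New(teeHandler{
		all:  slog.NewTextHandler(fullLog, &slog.HandlerOptions{Level: slog.LevelInfo}),
		errs: slog.NewTextHandler(errorsTxt, &slog.HandlerOptions{Level: slog.LevelError}),
	})
}

// logLineCounts logs one info and one error record and reports how many
// lines reached each destination.
func logLineCounts() (full, errs int) {
	var fullBuf, errBuf bytes.Buffer
	lg := newDumpLogger(&fullBuf, &errBuf)
	lg.Info("collecting pod", "pod", "cluster1-pxc-0")
	lg.Error("pod unreachable", "pod", "cluster1-pxc-2")
	return strings.Count(fullBuf.String(), "\n"), strings.Count(errBuf.String(), "\n")
}

func main() {
	full, errs := logLineCounts()
	fmt.Printf("full log: %d lines, errors.txt: %d lines\n", full, errs)
}
```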

@BON4
Author

BON4 commented Feb 16, 2026

@svetasmirnova I fixed the issue with tar. It was a bug that appeared only when one of the pods was unreachable.
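The root cause is not described in the thread, but one plausible reading of `Cannot open: File exists` is an archive containing both a regular-file entry and a directory (or entries beneath it) at the same path, such as `cluster-dump/pxc/cluster1-pxc-2`. The following is a hypothetical guard around `archive/tar`, not the PR's fix: `guardedTar`, `WriteFile` and `newMemoryArchive` are illustrative names. It rejects such clashes while the archive is being written, instead of letting `tar -x` fail later.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"path"
)

// guardedTar wraps tar.Writer and refuses entries whose paths clash as
// file versus directory.
type guardedTar struct {
	tw    *tar.Writer
	files map[string]bool // regular-file entries written so far
	dirs  map[string]bool // parent directories implied by those entries
}

// newMemoryArchive returns a guardedTar writing into an in-memory buffer.
func newMemoryArchive() (*guardedTar, *bytes.Buffer) {
	var buf bytes.Buffer
	g := &guardedTar{tw: tar.NewWriter(&buf), files: map[string]bool{}, dirs: map[string]bool{}}
	return g, &buf
}

// WriteFile adds a regular file after checking for file/directory clashes.
func (g *guardedTar) WriteFile(name string, content []byte) error {
	name = path.Clean(name)
	if g.dirs[name] {
		return fmt.Errorf("%s: already used as a directory", name)
	}
	if g.files[name] {
		return fmt.Errorf("%s: duplicate entry", name)
	}
	for dir := path.Dir(name); dir != "." && dir != "/"; dir = path.Dir(dir) {
		if g.files[dir] {
			return fmt.Errorf("%s: parent %s is already a file", name, dir)
		}
	}
	hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(content)), Typeflag: tar.TypeReg}
	if err := g.tw.WriteHeader(hdr); err != nil {
		return err
	}
	if _, err := g.tw.Write(content); err != nil {
		return err
	}
	g.files[name] = true
	for dir := path.Dir(name); dir != "." && dir != "/"; dir = path.Dir(dir) {
		g.dirs[dir] = true
	}
	return nil
}

func main() {
	g, buf := newMemoryArchive()
	// Hypothetical sequence: a placeholder for an unreachable pod written at
	// the pod's directory path, then a file added beneath that same path.
	if err := g.WriteFile("cluster-dump/pxc/cluster1-pxc-2", []byte("pod unreachable")); err != nil {
		fmt.Println("error:", err)
	}
	if err := g.WriteFile("cluster-dump/pxc/cluster1-pxc-2/logs.txt", []byte("...")); err != nil {
		fmt.Println("error:", err)
	}
	if err := g.tw.Close(); err != nil {
		fmt.Println("close:", err)
	}
	fmt.Println("archive bytes:", buf.Len())
}
```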

Co-authored-by: Sveta Smirnova <svetasmirnova@users.noreply.github.com>
BON4 added 2 commits February 16, 2026 16:16
…om:percona/percona-toolkit into PT-2448-pt-k8s-debug-collector-refactoring
Collaborator

@svetasmirnova svetasmirnova left a comment

I see you removed the fix for https://perconadev.atlassian.net/browse/PT-2299 and its test. Can you add them back? The commit was e1390c4.

}
if _, err := tw.Write(content); err != nil {
return errors.Wrapf(err, "write content to %s", location)
const CONCURRENT_EXPORT_WORKERS = 5
Collaborator

Why is this a constant?

Author

This constant limits the number of concurrent requests to the Kubernetes API.
Without this semaphore, a goroutine would be created:

  • for every cluster-scoped resource
  • for every namespace-scoped resource in every namespace

In a real cluster (e.g., 100+ namespaces and 50+ resource types), this can easily result in 5,000+ concurrent goroutines.
That can lead to Kubernetes API rate limiting, increased memory usage, timeouts, and even an accidental, self-inflicted DoS on the cluster.
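The arithmetic behind the concern: 100 namespaces × 50 namespace-scoped resource types is already 100 × 50 = 5,000 requests, before any cluster-scoped ones are counted. A buffered channel used as a counting semaphore caps how many run at once. This sketch reuses the name `CONCURRENT_EXPORT_WORKERS` from the diff; `exportAll` and the job strings are hypothetical. It reports the peak number of in-flight exports so the bound can be checked. `golang.org/x/sync/errgroup` with `SetLimit` is an alternative that gives the same bound.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CONCURRENT_EXPORT_WORKERS mirrors the constant discussed in the review.
const CONCURRENT_EXPORT_WORKERS = 5

// exportAll runs export for each job with at most CONCURRENT_EXPORT_WORKERS
// calls in flight and returns the highest concurrency observed.
func exportAll(jobs []string, export func(string)) int {
	sem := make(chan struct{}, CONCURRENT_EXPORT_WORKERS)
	var (
		wg             sync.WaitGroup
		inFlight, peak atomic.Int64
	)
	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while the limit is reached, so goroutines stay bounded too
		go func(job string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this export ends
			n := inFlight.Add(1)
			for { // record the peak with a compare-and-swap loop
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			export(job)
			inFlight.Add(-1) // runs before the deferred release above
		}(job)
	}
	wg.Wait()
	return int(peak.Load())
}

func main() {
	namespaces := []string{"default", "pxc", "kube-system"}
	resources := []string{"pods", "services", "statefulsets", "configmaps"}
	var jobs []string
	for _, ns := range namespaces {
		for _, r := range resources {
			jobs = append(jobs, ns+"/"+r)
		}
	}
	var done atomic.Int64
	peak := exportAll(jobs, func(string) { done.Add(1) })
	fmt.Printf("%d jobs, peak concurrency %d\n", done.Load(), peak)
}
```

The channel capacity is the only place the limit is used, so making it configurable later would be a small change.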


6 participants