K8SPG-943: check operator panic in e2e test #1441
mayankshah1607 wants to merge 4 commits into main
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Pull request overview
This PR adds operator panic detection to E2E test cleanup phases. The change introduces a new check_operator_panic function that searches operator logs for panic messages and calls this function before destroying the operator in test cleanup scripts.
Changes:
- Added `check_operator_panic` function to `e2e-tests/functions` that checks operator logs for "Observed a panic" messages
- Updated 29 E2E test cleanup files to call `check_operator_panic` before `destroy_operator`
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| e2e-tests/functions | Adds new check_operator_panic function to detect panics in operator logs |
| e2e-tests/tests//99-.yaml | Updates 29 test cleanup scripts to check for operator panics before destroying the operator |

    check_operator_panic() {
        local operator_pod=$(get_operator_pod)
        if kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -q "Observed a panic"; then
            echo "Detected panic in operator"
            exit 1
        fi
    }

The function doesn't verify that the operator pod exists before checking for panics. If get_operator_pod returns an empty string (no operator pod found), the kubectl logs command will fail, but the error will be masked by the pipeline to grep, causing the function to silently succeed without checking for panics.
Add validation that operator_pod is non-empty before attempting to retrieve logs. Consider also handling the case where kubectl logs fails due to the pod not existing or not being ready yet.
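
The masking effect described in this comment can be reproduced without a cluster. The sketch below is illustrative only: `failing_logs` is a hypothetical stand-in for a `kubectl logs` call that fails (for example, because the pod name is empty), and `naive_check` mirrors the shape of the check under review rather than reproducing it.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a failing `kubectl logs` call.
failing_logs() {
    echo "error: pod not found" >&2
    return 1
}

# Same shape as the reviewed check: without pipefail, the `if` only sees
# the exit status of grep, the last command in the pipeline.
naive_check() {
    if failing_logs 2>/dev/null | grep -q "Observed a panic"; then
        echo "panic"
    else
        echo "no panic"
    fi
}

naive_check   # prints "no panic" even though log retrieval failed
```

Because the log command's failure never reaches the `if`, the check reports "no panic" for logs it never read.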

    local operator_pod=$(get_operator_pod)
    if kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -q "Observed a panic"; then

- Problem: `check_operator_panic` can silently succeed even when it fails to actually check logs (e.g., `get_operator_pod` returns empty/multiple pod names, or `kubectl logs` fails), because the `kubectl logs | grep -q` pipeline is used without `pipefail` and without validating `$operator_pod`.
- Why it matters: This creates false negatives, which defeats the purpose of adding a panic check to E2E cleanup steps.
- Fix: Validate that exactly one operator pod name is resolved; if none (or more than one), fail with a clear message. Also treat log retrieval failures as errors (enable `set -o pipefail` for the pipeline or check `kubectl logs` exit status before grepping).
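
The "exactly one pod" validation from the Fix item could be factored into a small helper. This is a minimal sketch, assuming the pod lookup returns names separated by whitespace or newlines; `require_single_pod` is a hypothetical name, not a function in `e2e-tests/functions`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the pod name and succeed only if the argument
# contains exactly one whitespace-separated name.
require_single_pod() {
    local -a pods=()
    # -d '' makes read consume the whole input, so names on separate lines
    # are split too; read returns non-zero at end of input, hence `|| true`.
    read -r -d '' -a pods <<<"$1" || true
    if [[ ${#pods[@]} -ne 1 ]]; then
        echo "expected 1 operator pod, got ${#pods[@]}: $1" >&2
        return 1
    fi
    echo "${pods[0]}"
}
```

A caller might use it as `operator_pod=$(require_single_pod "$(get_operator_pod)") || exit 1`. Using `read -a` instead of an unquoted array assignment avoids accidental glob expansion of the input.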
Suggested change:

    - local operator_pod=$(get_operator_pod)
    - if kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -q "Observed a panic"; then
    + local operator_pod
    + operator_pod=$(get_operator_pod)
    + # Ensure we have exactly one operator pod to check.
    + if [[ -z "$operator_pod" ]]; then
    +     echo "Failed to detect operator pod: no operator pod found in namespace '${OPERATOR_NS:-$NAMESPACE}'" >&2
    +     exit 1
    + fi
    + # Split the result into an array to detect multiple pods.
    + local pods_array=()
    + # shellcheck disable=SC2206
    + pods_array=($operator_pod)
    + if [[ ${#pods_array[@]} -ne 1 ]]; then
    +     echo "Failed to detect operator pod: expected 1 pod, got ${#pods_array[@]}: $operator_pod" >&2
    +     exit 1
    + fi
    + operator_pod="${pods_array[0]}"
    + # Retrieve logs and fail if we cannot get them.
    + local logs
    + if ! logs=$(kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator 2>&1); then
    +     echo "Failed to retrieve logs for operator pod '$operator_pod' in namespace '${OPERATOR_NS:-$NAMESPACE}'" >&2
    +     echo "$logs" >&2
    +     exit 1
    + fi
    + if grep -q "Observed a panic" <<<"$logs"; then

commit: 1418e39
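
Capturing the logs first and grepping afterwards keeps "could not read logs" distinct from "no panic found". Note that enabling `pipefail` on the original `if kubectl logs ... | grep -q ...` line would not be enough by itself: a failed pipeline only makes the `if` condition false, so the check would still pass quietly. The sketch below shows the capture-then-grep pattern in isolation. `scan_for_panic` and the `logs_*` stubs are hypothetical names used so the example runs without a cluster, and it returns status codes instead of calling `exit` so that each outcome can be inspected.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run the given log-producing command and return
#   0 if no panic marker is found,
#   1 if a panic marker is found,
#   2 if the logs could not be retrieved.
scan_for_panic() {
    local logs
    # Declaring `local` separately keeps the command's exit status visible.
    if ! logs=$("$@" 2>&1); then
        echo "Failed to retrieve logs: $logs" >&2
        return 2
    fi
    if grep -q "Observed a panic" <<<"$logs"; then
        echo "Detected panic in operator" >&2
        return 1
    fi
    return 0
}

# Stubs standing in for `kubectl logs ...` (assumptions for illustration).
logs_ok()    { echo "reconcile complete"; }
logs_panic() { echo "Observed a panic: nil pointer dereference"; }
logs_fail()  { echo "pod not found"; return 1; }
```

In a cleanup script, any non-zero status from such a helper would be treated as a test failure, for example `scan_for_panic kubectl logs -n "$ns" "$pod" -c operator || exit 1`, where `$ns` and `$pod` are placeholders.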