tyk-operator-conf secret creation via post-install hook breaks atomic deploys and prevents adding operator to existing installations #440

@m4s-b3n

Describe the bug

The tyk-operator-conf secret is created by the tyk-bootstrap chart via a post-install Helm hook. This design causes two significant issues:

  1. Atomic deploys fail: helm install --atomic implies --wait, and with --wait Helm does not run post-install hooks until all release resources are ready. The tyk-operator pods can never become ready, because the tyk-operator-conf secret they reference is only created by the post-install hook. The wait therefore times out and the atomic install is rolled back.

  2. Cannot add the operator to an existing installation: post-install hooks run only during helm install, not helm upgrade. Users who initially deployed without the operator therefore cannot enable it later, because the bootstrap job never runs again to create the required secret.
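Which phases a release's hooks are registered for can be checked by filtering the output of helm get hooks for the helm.sh/hook annotation. A minimal sketch, assuming Helm 3 on PATH; the function name list_hook_phases is hypothetical:

```shell
#!/bin/sh
# Sketch: print the distinct helm.sh/hook phase lists declared by a release's hooks.
# Assumes Helm 3 is installed and the caller can read the release.
list_hook_phases() {
  release="$1"
  namespace="$2"
  # `helm get hooks` prints the hook manifests stored with the release.
  # Keep only the "helm.sh/hook" annotation (not hook-weight or
  # hook-delete-policy), strip the key and any quotes, and de-duplicate.
  helm get hooks "$release" -n "$namespace" \
    | grep -E '^[[:space:]]*"?helm\.sh/hook"?:' \
    | sed -E 's/^[^:]*:[[:space:]]*//; s/"//g' \
    | sort -u
}
```

For example, list_hook_phases tyk-stack tyk. If the bootstrap job is the only hook in the release, the expected output is just post-install, with no pre-upgrade or post-upgrade entry, which matches the behavior described above.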

Expected behavior

  • Atomic deploys should work when global.components.operator: true
  • Users should be able to enable the operator on existing installations via helm upgrade

Current behavior

  • Operator pods fail on initial install until the post-install hook completes (breaks --atomic)
  • Enabling the operator on an existing installation requires manually creating the tyk-operator-conf secret

Steps to reproduce

Scenario 1 - Atomic install failure

helm install tyk-stack tyk-helm/tyk-stack -n tyk --atomic \
  --set global.components.operator=true \
  --set global.secrets.useSecretName=my-tyk-secrets
  # ... other values

Result: Install fails because operator pods can't start without the tyk-operator-conf secret.

Note: Even though the operator license is provided via global.secrets.useSecretName, the bootstrap job still creates a separate tyk-operator-conf secret that the operator deployment requires.
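The license is therefore readable from the user-provided secret before any Tyk component is running, without calling the Dashboard. A minimal sketch, assuming the key inside that secret is named OperatorLicense (an assumption; check the key name documented for global.secrets.useSecretName in your chart version); the function name read_operator_license is hypothetical:

```shell
#!/bin/sh
# Sketch: read one key from an existing Kubernetes secret and decode it.
# The default key "OperatorLicense" is an assumption about the tyk-stack
# chart's expected secret layout, not a confirmed name.
read_operator_license() {
  secret_name="$1"
  namespace="$2"
  key="${3:-OperatorLicense}"
  # kubectl returns secret data base64-encoded; jsonpath selects one key.
  kubectl get secret "$secret_name" -n "$namespace" \
    -o "jsonpath={.data.$key}" | base64 -d
}
```

For example, read_operator_license my-tyk-secrets tyk prints the decoded license. The base64 -d step is needed because kubectl returns secret data base64-encoded.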

Scenario 2 - Adding operator to existing installation

# Initial install without operator
helm install tyk-stack tyk-helm/tyk-stack -n tyk \
  --set global.components.operator=false \
  --set global.secrets.useSecretName=my-tyk-secrets

# Later, try to enable operator
helm upgrade tyk-stack tyk-helm/tyk-stack -n tyk \
  --set global.components.operator=true \
  --set global.secrets.useSecretName=my-tyk-secrets

Result: Operator pods fail because the tyk-operator-conf secret is never created (the post-install hook doesn't run on upgrade).

The secret provided via global.secrets.useSecretName contains the operator license key, but the bootstrap job that reads this and creates tyk-operator-conf only runs on initial install.
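Until the hook timing changes, scenario 2 can be worked around by creating tyk-operator-conf by hand before running the upgrade. A minimal sketch, assuming the secret uses the key names from the Tyk Operator documentation (TYK_AUTH, TYK_ORG, TYK_MODE, TYK_URL, TYK_OPERATOR_LICENSEKEY) and that the caller already has the Dashboard API key and org ID. The function name create_operator_conf is hypothetical, and the key names should be compared against a tyk-operator-conf secret produced by the bootstrap job on a fresh install:

```shell
#!/bin/sh
# Sketch: create or update tyk-operator-conf by hand before enabling the operator.
# Key names follow the Tyk Operator docs as an assumption; verify them against
# a secret created by the bootstrap job. TYK_MODE defaults to "pro"
# (Dashboard mode), which is also an assumption.
# Required environment: TYK_AUTH (Dashboard user API key), TYK_ORG (org ID),
# TYK_URL (Dashboard URL reachable from the operator), TYK_OPERATOR_LICENSEKEY.
create_operator_conf() {
  namespace="$1"
  # Fail early if any required value is missing or empty.
  for var in TYK_AUTH TYK_ORG TYK_URL TYK_OPERATOR_LICENSEKEY; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done
  # Render the secret client-side and apply it, so running this again
  # updates the existing secret instead of failing with "already exists".
  kubectl create secret generic tyk-operator-conf -n "$namespace" \
    --from-literal=TYK_AUTH="$TYK_AUTH" \
    --from-literal=TYK_ORG="$TYK_ORG" \
    --from-literal=TYK_MODE="${TYK_MODE:-pro}" \
    --from-literal=TYK_URL="$TYK_URL" \
    --from-literal=TYK_OPERATOR_LICENSEKEY="$TYK_OPERATOR_LICENSEKEY" \
    --dry-run=client -o yaml \
    | kubectl apply -n "$namespace" -f -
}
```

Set the four required variables, run create_operator_conf tyk, then run the helm upgrade from scenario 2. Rendering with --dry-run=client and piping into kubectl apply makes the step safe to repeat.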

Suggested solutions

  1. Use a pre-install,pre-upgrade hook instead of post-install: Change the bootstrap job to run as a pre-install,pre-upgrade hook. This ensures the tyk-operator-conf secret exists before the operator deployment is created. The challenge is that the bootstrap job currently waits for the Dashboard to be ready so it can fetch TYK_AUTH and TYK_ORG, so it would need to be redesigned (e.g., create the secret from values provided via global.secrets.useSecretName rather than fetching them from the Dashboard API).

  2. Create the secret via a pre-install hook with values from the user-provided secret: Since users already provide credentials via global.secrets.useSecretName, a pre-install hook could create tyk-operator-conf by copying relevant values from that secret, rather than bootstrapping Dashboard first.

  3. Document the limitation: At minimum, document that:

    • --atomic installs are not supported when operator is enabled
    • Adding operator to existing installations requires manual secret creation
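Solutions 1 and 2 both amount to creating tyk-operator-conf before the operator Deployment exists, from values already available at install time. The script below sketches what a pre-install,pre-upgrade hook Job could run: it copies selected keys from the user-provided secret into tyk-operator-conf and applies the result, so it behaves the same on install and upgrade. Every DEST=SOURCE key mapping is an assumption about the chart, and copy_secret_keys is a hypothetical name:

```shell
#!/bin/sh
# Sketch of a script a pre-install,pre-upgrade hook Job could run.
# Copies selected keys from a user-provided secret into a target secret.
# Usage: copy_secret_keys SRC_SECRET DST_SECRET NAMESPACE DEST_KEY=SRC_KEY...
# The key mappings passed in are assumptions about the chart, not confirmed names.
copy_secret_keys() {
  src="$1"; dst="$2"; ns="$3"
  shift 3
  n=$#
  # Append one --from-literal argument per mapping after the original pairs.
  for pair in "$@"; do
    dest_key="${pair%%=*}"
    src_key="${pair#*=}"
    value="$(kubectl get secret "$src" -n "$ns" \
      -o "jsonpath={.data.$src_key}" | base64 -d)"
    if [ -z "$value" ]; then
      echo "key $src_key missing or empty in secret $src" >&2
      return 1
    fi
    set -- "$@" "--from-literal=$dest_key=$value"
  done
  # Drop the original DEST=SRC pairs, keeping only the --from-literal args.
  shift "$n"
  # Render client-side and apply, so the same script works on install
  # (secret absent) and on upgrade (secret already present).
  kubectl create secret generic "$dst" -n "$ns" "$@" \
    --dry-run=client -o yaml \
    | kubectl apply -n "$ns" -f -
}
```

For example, copy_secret_keys my-tyk-secrets tyk-operator-conf tyk TYK_OPERATOR_LICENSEKEY=OperatorLicense. TYK_AUTH and TYK_ORG would also have to be present in the source secret under known keys, which is the redesign solution 1 calls for. The Job's service account would need RBAC permission to get the source secret and to get, create, and patch the target. Because pre-install hooks run before regular chart resources are created, that ServiceAccount and its Role would themselves need to be hooks with a lower helm.sh/hook-weight.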

Environment

  • Helm chart version: tyk-stack (latest)
  • Kubernetes version: N/A (design issue)

Additional context

The core issue is that the bootstrap job runs as a post-install hook, meaning it executes after all resources (including the operator deployment) are created. The operator deployment immediately fails because it references a secret that doesn't exist yet.

The tyk-k8s-bootstrap application has logic to detect existing organizations and skip recreation, so idempotency is already handled. However, the fundamental timing issue (post vs pre) needs to be addressed for atomic deploys to work.

For scenario 2 (adding the operator to an existing installation), adding post-upgrade to the hook would help, but only if the operator deployment also tolerates the secret being absent at first (e.g., by marking its secretRef as optional), so that the initial pod failure is handled gracefully.

Labels: bug (Something isn't working), external