@wezell commented Jan 29, 2026

ref: #34442

Below is documentation:

dotCMS Environment Cloning

This feature allows you to use the dotCMS Docker image as an init container to clone data and assets from another running dotCMS environment. This is useful for:

  • Setting up development environments from production/staging
  • Creating test environments with real data
  • Initializing new dotCMS instances with existing content

How It Works

The 10-import-env.sh script runs at container startup (before dotCMS itself starts) and:

  1. Downloads the database backup from the source environment via the Maintenance API
  2. Downloads the assets archive from the source environment
  3. Imports the database into PostgreSQL
  4. Extracts assets to the shared data directory
  5. Exits the container so it can be restarted to run dotCMS.

Once the import has completed and the container restarts, the script detects the completion marker, skips re-importing, and lets dotCMS start. This enables the "init container" pattern in Kubernetes: the import runs once, then the main dotCMS container starts with the imported data.

Environment Variables

Required

| Variable | Description |
| --- | --- |
| DOT_IMPORT_ENVIRONMENT | URL of the source dotCMS environment (e.g., https://demo.dotcms.com) |
| DOT_IMPORT_API_TOKEN | API token for authentication (Bearer token) |
| DOT_IMPORT_USERNAME_PASSWORD | Alternative: Basic auth credentials in user:password format |

Note: Either DOT_IMPORT_API_TOKEN or DOT_IMPORT_USERNAME_PASSWORD is required.

Database Connection

| Variable | Description |
| --- | --- |
| DB_BASE_URL | JDBC URL for the target PostgreSQL (e.g., jdbc:postgresql://host/dbname) |
| DB_USERNAME | PostgreSQL username |
| DB_PASSWORD | PostgreSQL password |

Optional

| Variable | Default | Description |
| --- | --- | --- |
| DOT_IMPORT_DROP_DB | false | Drop the existing database schema before import |
| DOT_IMPORT_MAX_ASSET_SIZE | 100mb | Maximum asset file size to download |
| DOT_IMPORT_ALL_ASSETS | false | Include non-live (working/archived) assets |
| DOT_IMPORT_IGNORE_ASSET_ERRORS | true | Continue if asset extraction has errors |

Usage Examples

Docker Standalone

# Create environment file
cat > app.env << 'EOF'
DOT_IMPORT_ENVIRONMENT=https://demo.dotcms.com
DOT_IMPORT_USERNAME_PASSWORD=admin@dotcms.com:admin
DOT_IMPORT_DROP_DB=true

DB_BASE_URL=jdbc:postgresql://localhost:5432/dotcms
DB_USERNAME=dotcmsdbuser
DB_PASSWORD=password
EOF

# Run dotCMS with environment cloning
docker run --env-file app.env \
  -v ./data:/data \
  -p 8080:8082 \
  dotcms/dotcms:latest

Kubernetes Init Container

apiVersion: v1
kind: Pod
metadata:
  name: dotcms
spec:
  initContainers:
    - name: clone-environment
      image: dotcms/dotcms:latest
      env:
        - name: DOT_IMPORT_ENVIRONMENT
          value: "https://source.dotcms.com"
        - name: DOT_IMPORT_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: dotcms-secrets
              key: import-api-token
        - name: DB_BASE_URL
          value: "jdbc:postgresql://postgres:5432/dotcms"
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: dotcms-secrets
              key: db-username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: dotcms-secrets
              key: db-password
      volumeMounts:
        - name: shared-data
          mountPath: /data/shared
  containers:
    - name: dotcms
      image: dotcms/dotcms:latest
      # ... main dotCMS container config
      volumeMounts:
        - name: shared-data
          mountPath: /data/shared
  volumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: dotcms-data

Note: You can also skip the init container and let dotCMS perform the clone on its first startup. In that case, adjust your probes and add an appropriate startup delay so the pod is not cycled before the import finishes (tolerable in development, but not recommended for production).
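
As a rough illustration of that approach, a generous startup probe keeps Kubernetes from restarting the pod mid-import. The probe path below is an assumption (use whatever endpoint your deployment already targets), and the threshold should be tuned to your import size:

startupProbe:
  httpGet:
    path: /api/v1/probes/alive   # assumed probe endpoint
    port: 8082
  periodSeconds: 30
  failureThreshold: 60           # allows up to ~30 minutes for the clone and first boot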

Behavior Details

Idempotency

  • The script creates an import_complete.txt marker file after a successful import
  • Subsequent container starts skip the import if this marker exists (see the sketch below)
  • Delete the marker file to force a re-import
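
A minimal sketch of that check, assuming the marker lives under $IMPORT_DATA_DIR (the exact paths and names in 10-import-env.sh may differ):

IMPORT_COMPLETE="$IMPORT_DATA_DIR/import_complete.txt"
if [ -f "$IMPORT_COMPLETE" ]; then
  echo "Import already completed, starting dotCMS normally"
  exit 0   # exit code 0: no import needed
fi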

Locking

  • A lock.txt file prevents concurrent imports (important for Kubernetes)
  • Lock files older than 30 minutes are considered stale and removed
  • Pods wait 3 minutes and exit if a lock is held by another process

Database Safety

  • The script checks whether the target database already contains data (via an inode count; see the sketch below)
  • Import is skipped if data exists (unless DOT_IMPORT_DROP_DB=true)
  • Use DOT_IMPORT_DROP_DB=true to wipe and reimport
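
Illustratively (DB_HOST and DB_NAME are assumed to be derived from DB_BASE_URL, and the actual query in the script may differ):

INODES=$(psql -h "$DB_HOST" -d "$DB_NAME" -U "$DB_USERNAME" -qtAX \
  -c "SELECT count(*) FROM inode" 2>/dev/null || echo 0)
if [ "$INODES" -gt 0 ] && [ "$DOT_IMPORT_DROP_DB" != "true" ]; then
  echo "Target database already contains data ($INODES inodes), skipping import"
  exit 0
fi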

Downloaded Files

  • Database and asset backups are cached in $IMPORT_DATA_DIR
  • File names include an MD5 hash of the source URL (see the sketch below)
  • Delete cached files to force a re-download
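
The cache naming works roughly like this sketch (illustrative; the real script may differ, but the troubleshooting globs further down match this pattern):

URL_HASH=$(echo -n "$DOT_IMPORT_ENVIRONMENT" | md5sum | cut -d' ' -f1)
DB_BACKUP="$IMPORT_DATA_DIR/${URL_HASH}_dotcms_db.sql.gz"
ASSETS_ZIP="$IMPORT_DATA_DIR/${URL_HASH}_assets.zip"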

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | No import needed (already complete or not configured) |
| 1 | Error during import |
| 13 | Import completed successfully (signals init container completion) |
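
Because 13 is non-zero, anything invoking the script directly must translate it. A hypothetical wrapper for a Kubernetes init container (where only exit 0 counts as success; the script path is assumed):

/srv/utils/10-import-env.sh
rc=$?
if [ "$rc" -eq 13 ]; then
  exit 0    # import finished: report success to Kubernetes
fi
exit "$rc"  # 0 = nothing to do, anything else = a real failure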

Troubleshooting

Import stuck or failed

  1. Check for stale lock file: ls -la /data/shared/import/lock.txt
  2. Remove lock if stale: rm /data/shared/import/lock.txt

Force reimport

  1. Remove the completion marker: rm /data/shared/import/import_complete.txt
  2. Optionally remove cached downloads to re-download:
    rm /data/shared/import/*_assets.zip
    rm /data/shared/import/*_dotcms_db.sql.gz

Authentication failures

  • Verify DOT_IMPORT_API_TOKEN or DOT_IMPORT_USERNAME_PASSWORD is correct (a quick check is sketched below)
  • Ensure the user has access to the Maintenance API endpoints:
    • /api/v1/maintenance/_downloadAssets
    • /api/v1/maintenance/_downloadDb
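
A quick way to sanity-check the credential (illustrative; whether these endpoints answer HEAD requests is an assumption, so fall back to a plain GET and interrupt the download if not):

# 200 means the token is accepted; 401/403 indicates an auth problem
curl -sS -o /dev/null -w "%{http_code}\n" -I \
  -H "Authorization: Bearer $DOT_IMPORT_API_TOKEN" \
  "$DOT_IMPORT_ENVIRONMENT/api/v1/maintenance/_downloadDb"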

Source Environment Requirements

The source dotCMS environment must:

  1. Be accessible over HTTPS/HTTP from the target environment

Remove the unconditional `exit 0` so the entrypoint continues to source
startup scripts, clarify the import script’s exit-13 success path, and
install `libarchive-tools` to support asset unpacking during imports.

ref: #34442

@wezell linked an issue Jan 29, 2026 that may be closed by this pull request: [FEATURE] Make it easy for dotCMS to clone another environment.
@spbolton commented
I am fine with this. The lack of a Helm chart to help with templating this init container makes integrating it a pain, but it is fine for starting with a few instances. There are a couple of concerns that may need addressing and clarifying, taking into account how this would effectively work when there is more than one replica in the StatefulSet; it is more of an issue when trying to run this during an upgrade, where pods are being replaced while there are still active connections.

PR #34443 Review - dotCMS Environment Cloning Feature

Summary

This PR adds environment cloning functionality using an init container pattern. The implementation works for initial deployment scenarios with EFS shared storage and OrderedReady pod management, but has critical limitations that should be addressed before production use.

Overall Assessment: ⚠️ Request Changes - Works for intended use case but needs improvements for robustness


✅ What Works (With EFS + OrderedReady)

  • Per-pod volume isolation: Eliminated (EFS is shared)
  • Lock file race condition: Mitigated (OrderedReady prevents concurrent execution)
  • Database import race: Mitigated (sequential pod startup)
  • Initial deployment: Works correctly for first-time setup

🚨 Critical Issues (Must Fix)

1. Lock File Race Condition - Fundamental Flaw

Problem: The lock file mechanism has a race condition. It only "works" because OrderedReady prevents concurrent execution, not because the lock is correct.

Current Code (10-import-env.sh lines 178-195):

# Check if lock exists
if [ -f "$IMPORT_IN_PROCESS" ]; then
  # ... check lock age ...
fi

# Create lock (NOT ATOMIC with check above)
mkdir -p $IMPORT_DATA_DIR && touch $IMPORT_IN_PROCESS

Why It's Flawed:

  • Check-then-create is not atomic
  • Two processes can both see "no lock" and both create it
  • Only works because OrderedReady prevents Pod-1 from starting until Pod-0 is Ready

Impact: If podManagementPolicy: Parallel is used (or OrderedReady fails), multiple pods will import simultaneously → database corruption

Recommendation: Use atomic directory creation:

LOCK_DIR="$IMPORT_DATA_DIR/.lock"
if mkdir "$LOCK_DIR" 2>/dev/null; then
  # We got the lock
  trap "rmdir '$LOCK_DIR' 2>/dev/null" EXIT
else
  # Lock exists, check if stale
  # ... existing stale lock logic ...
fi

Alternative: Use PostgreSQL advisory locks (automatically released on connection close)
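
A sketch of that alternative (hypothetical: the lock key 4242 is arbitrary, run-import.sh is a placeholder for the import steps, and the session must stay open while the import runs, which is exactly what makes the lock self-cleaning if the pod dies):

psql -h "$DB_HOST" -d "$DB_NAME" -U "$DB_USERNAME" <<'SQL'
-- blocks until no other session holds the lock
SELECT pg_advisory_lock(4242);
-- run the import from inside the session (hypothetical helper)
\! /srv/utils/run-import.sh
-- released here, or automatically if the session dies
SELECT pg_advisory_unlock(4242);
SQL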


2. Doesn't Work on Existing StatefulSets with Multiple Replicas

Problem: Cannot safely run on existing StatefulSets during rolling updates. Old pods remain active while new pods run import, causing database conflicts.

Scenario:

Rolling Update:
  Pod-2 (old) deleted
  Pod-2 (new) created
    ├─ Init container runs 10-import-env.sh
    ├─ Sees import_complete.txt doesn't exist (or was deleted)
    ├─ Starts importing database
    └─ Pod-0 and Pod-1 (old) still running, connected to DB
    → CONFLICT: Database operations conflict with active connections

Impact:

  • Database corruption risk
  • Active sessions killed
  • Service disruption
  • Data loss

Recommendation: Add active connection check before import:

# Note: DB_HOST and DB_NAME are assumed to be derived from DB_BASE_URL
check_active_connections() {
  ACTIVE=$(psql -h "$DB_HOST" -d "$DB_NAME" -U "$DB_USERNAME" -qtAX -c \
    "SELECT count(*) FROM pg_stat_activity 
     WHERE datname = '$DB_NAME' 
     AND pid != pg_backend_pid()
     AND state != 'idle'" 2>/dev/null || echo "0")
  
  if [ "$ACTIVE" -gt 0 ]; then
    echo "ERROR: Database has $ACTIVE active connections"
    echo "Cannot import while database is in use"
    echo "This script is designed for initial deployment only"
    echo "Stop all pods before performing refresh"
    exit 1
  fi
}

# Call before import
check_active_connections || exit 1

Also: Document this limitation clearly in the PR description and script comments


⚠️ High Priority Issues (Should Fix)

3. Stale Lock File Handling

Problem: If Pod-0 crashes during import, lock file remains for up to 30 minutes, blocking progress.

Current Behavior:

  • Pod restarts, sees lock (< 30 min old), waits 3 min, exits
  • Repeats until lock is 30 minutes old
  • No way to manually recover

Recommendation:

  • Reduce timeout from 30 to 10 minutes
  • Add trap handler for cleanup on exit
  • Consider database advisory locks (auto-released on connection close)

4. Partial Import on Failure

Problem: If import fails partway (e.g., asset extraction fails), database may be imported but assets not extracted, leaving inconsistent state.

Recommendation:

  • Add cleanup on failure (trap ERR; see the sketch below)
  • Only create import_complete.txt if all steps succeed
  • Document recovery procedures
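
A minimal sketch of the first two points (variable names taken from the script excerpts above; bash-specific):

# remove partial state so a retry starts clean
cleanup_on_failure() {
  echo "Import failed, removing partial import state" >&2
  rm -f "$IMPORT_DATA_DIR/import_complete.txt" "$IMPORT_IN_PROCESS"
}
set -Ee   # exit on error; -E lets functions inherit the ERR trap
trap cleanup_on_failure ERR
# ... import steps ...
touch "$IMPORT_DATA_DIR/import_complete.txt"   # only reached if all steps succeed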

📝 Documentation & Clarity Issues

5. OrderedReady Dependency Not Documented

Issue: The script depends on OrderedReady but this isn't clearly documented.

Recommendation: Add to PR description:

Important: This feature requires podManagementPolicy: OrderedReady (default). If Parallel is used, multiple pods may import simultaneously, causing database corruption.

6. Exit Code 13 - Non-Standard

Issue: Exit code 13 is non-standard (typically EACCES). Monitoring systems may misinterpret as error.

Recommendation:

  • Document exit code 13 clearly
  • Or use exit code 0 and check for marker file in entrypoint
  • Add to troubleshooting section

7. EFS Requirement Not Explicit

Issue: Script assumes shared storage but doesn't validate.

Recommendation:

  • Document EFS requirement clearly
  • Add validation or at least warning if per-pod volumes detected
  • Add to "Requirements" section

✅ Positive Aspects

  1. Idempotency: Good use of import_complete.txt marker
  2. Error Handling: Basic error handling present
  3. Documentation: PR description is comprehensive
  4. Use Case: Solves real problem (environment cloning)
  5. Init Container Pattern: Correct approach for production

🔧 Recommended Changes

Must Fix (Before Merge)

  1. ✅ Fix lock file race condition (atomic operations)
  2. ✅ Add active connection check before import
  3. ✅ Document OrderedReady requirement
  4. ✅ Document EFS requirement

Should Fix (Before Production)

  1. ⚠️ Improve stale lock handling (shorter timeout, better cleanup)
  2. ⚠️ Add partial import cleanup on failure
  3. ⚠️ Document exit code 13 behavior

🧪 Testing Recommendations

Must Test:

  • Initial deployment with 3 replicas (sequential startup)
  • Pod restart during import (stale lock handling)
  • Rolling update on existing StatefulSet (should fail gracefully with active connection check)
  • Per-pod volumes scenario (should detect/warn)

Should Test:

  • Large database import (timeout handling)
  • Network failure during download (recovery)
  • Asset extraction failure (cleanup)

📊 Risk Assessment

| Scenario | Risk Level | Works? | Notes |
| --- | --- | --- | --- |
| Initial deployment (EFS + OrderedReady) | ✅ Low | Yes | Works correctly |
| Existing StatefulSet (rolling update) | 🚨 High | No | Conflicts with active connections |
| Without OrderedReady | 🚨 High | No | Lock race condition causes failures |
| Per-pod volumes | 🚨 High | No | Each pod imports independently |

💡 Additional Suggestions

  1. Consider database advisory locks instead of file-based locks (more reliable, auto-cleanup)
  2. Add pod ordinal check for extra safety (only pod-0 imports on initial deployment; see the sketch below)
  3. Add structured logging for better observability
  4. Add metrics/telemetry for import operations
  5. Consider resume capability for failed imports (checkpoint progress)
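
For suggestion 2, a sketch relying on the StatefulSet hostname convention (pods are named <statefulset>-<ordinal>):

ORDINAL="${HOSTNAME##*-}"   # e.g. dotcms-1 -> 1
if [ "$ORDINAL" != "0" ]; then
  echo "Pod ordinal $ORDINAL: deferring import to pod 0"
  exit 0
fi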

Final Recommendation

Request Changes - Address critical issues before merge:

  1. Fix lock file race condition (atomic operations)
  2. Add active connection check (prevent conflicts on existing StatefulSets)
  3. Document dependencies (OrderedReady, EFS)

The feature is useful and the init container pattern is correct, but the implementation needs these fixes for production robustness.


Context Notes

  • Storage: Assumes /data/shared is EFS (shared across pods)
  • Pod Management: Requires OrderedReady (default StatefulSet policy)
  • Use Cases: Initial deployment or total refresh (intentional database/filesystem replacement)
  • Pattern: Init container (as documented in PR description)
