fix: add npm auth debug step and fix publish retry logic#835
fix: add npm auth debug step and fix publish retry logic#835carlos-alm merged 3 commits intomainfrom
Conversation
GitHub Actions sets errexit by default, which kills the bash script on first npm publish failure before the retry wrapper can fire. Disable errexit and capture exit codes explicitly. Also adds a debug step that runs npm whoami before publish to surface OIDC auth issues early.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: Fix publish retry logic and add debug stepTodo List:
Analysis of Changes:
Phase 1: Root Cause Analysis ✅Problem Being Solved: The 3.9.0 release failed with E404 errors on platform packages, and the retry logic (3 attempts with exponential backoff) never executed because GitHub Actions' default Why This Approach: This addresses the actual root cause rather than symptoms. The E404 suggests an authentication issue (OIDC token problems), and the retry logic failing to execute is a legitimate bash behavior issue. The debug step will surface auth problems early, and disabling Risk Assessment: ✅ LOW RISK - Changes are surgical and justified. Adding debug output and fixing broken retry logic are both defensive improvements. No functionality is being suppressed or hidden. Phase 2: Configuration Change Detection ✅Configuration Changes: None that weaken safety. The changes are:
Assessment: ✅ NO HIGH-RISK CHANGES - No linting rules disabled, no test thresholds lowered, no strict mode flags relaxed. The Phase 3: Test Integrity ✅Test Changes: None. This PR only modifies the publish workflow. RED FLAGS: None detected. No test files modified, no assertions weakened, no test logic changed. Phase 4: Code Quality ✅Code Analysis:
Potential Issues: None significant. The implementation follows bash best practices. Phase 5: Backlog ComplianceZero-dep: ✓ - No new runtime dependencies added Tier: 1 (zero-dep ✓ + foundation-aligned ✓) Final AssessmentRoot Cause AnalysisProblem Being Solved: Failed 3.9.0 publish due to E404 auth errors + non-functional retry logic caused by bash Backlog Compliance
Critical ConcernsNone. This is a straightforward technical fix addressing legitimate CI/CD issues. Final Recommendation
|
Greptile SummaryThis PR fixes the 3.9.0 publish failure by disabling both Confidence Score: 5/5Safe to merge — all prior P1 concerns resolved, no new issues found. All three previously flagged issues (pipefail in platform publish, pipefail in main publish, and .npmrc credential exposure) are correctly fixed. The retry logic now correctly captures exit codes with No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant CI as GitHub Actions
participant Debug as Debug npm auth
participant Plat as Publish platform packages
participant Main as Publish main package
CI->>Debug: if not already_published
Debug->>Debug: npm --version / node --version
Debug->>Debug: npm whoami (OIDC check)
Debug->>Debug: grep registry ~/.npmrc
CI->>Plat: if not already_published (set +eo pipefail)
loop Each platform artifact
Plat->>Plat: npm view pkg@version (skip if exists)
loop npm_publish_retry (max 3 attempts)
Plat->>Plat: ::group:: attempt N/3
Plat->>Plat: npm publish --provenance
Plat->>Plat: capture rc / ::endgroup::
alt rc == 0
Plat-->>Plat: return 0
else rc != 0 and attempt < 3
Plat->>Plat: warn + sleep (15s → 30s → 60s)
end
end
Plat->>Plat: FAILED=1 on retry exhaustion
end
Plat->>CI: exit $FAILED
CI->>Main: if not already_published (set +eo pipefail)
loop npm_publish_retry (max 3 attempts)
Main->>Main: ::group:: attempt N/3
Main->>Main: npm publish --provenance
Main->>Main: capture rc / ::endgroup::
end
Main->>CI: exit 1 on retry exhaustion
Reviews (2): Last reviewed commit: "fix: address review feedback on publish ..." | Re-trigger Greptile |
.github/workflows/publish.yml
Outdated
| NPM_TAG: ${{ needs.compute-version.outputs.npm_tag }} | ||
| shell: bash | ||
| run: | | ||
| set +e # Disable errexit so retry logic works |
There was a problem hiding this comment.
pipefail still active after set +e
set +e disables errexit but leaves pipefail on (GitHub Actions enables both by default). The publish commands themselves are fine since they don't use pipes, but the platform=$(basename "$artifact_dir" | sed 's/^native-//') line uses a pipe: if basename or sed fails, pipefail would set the loop body's exit status to non-zero. With set +e that won't kill the step, but it could silently produce an empty $platform and trigger the "Unknown platform artifact" warning. Easy to prevent by also disabling pipefail.
| set +e # Disable errexit so retry logic works | |
| set +eo pipefail # Disable errexit and pipefail so retry logic works |
There was a problem hiding this comment.
Fixed in 7a47aea — both publish steps now use set +eo pipefail to disable both errexit and pipefail, preventing silent empty-variable issues from the piped basename | sed command.
.github/workflows/publish.yml
Outdated
| run: npm publish --access public --provenance --tag "$NPM_TAG" | ||
| shell: bash | ||
| run: | | ||
| set +e # Disable errexit so retry logic works |
There was a problem hiding this comment.
Same
pipefail caveat applies here
The "Publish main package" step should mirror the fix above for consistency. The npm_publish_retry function doesn't use pipes, but matching the header makes the intent explicit.
| set +e # Disable errexit so retry logic works | |
| set +eo pipefail # Disable errexit and pipefail so retry logic works |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed in 7a47aea — the main package publish step now also uses set +eo pipefail to match the platform publish step.
.github/workflows/publish.yml
Outdated
| cat ~/.npmrc 2>/dev/null || echo "no ~/.npmrc" | ||
| cat .npmrc 2>/dev/null || echo "no .npmrc" |
There was a problem hiding this comment.
.npmrc contents exposed in logs
cat ~/.npmrc dumps the full file to GitHub Actions logs, visible to anyone with repo read access. With OIDC the file holds only an env-var reference (not a literal token), so the current risk is low. If auth is ever switched to a static token the credential would be logged in plain text. Consider filtering to registry-URL lines rather than printing the whole file.
There was a problem hiding this comment.
Fixed in 7a47aea — replaced cat ~/.npmrc / cat .npmrc with grep 'registry' to only show registry-URL lines, avoiding potential credential exposure if auth is ever switched to static tokens.
- Disable pipefail alongside errexit in both publish steps so piped commands (basename | sed) cannot silently produce empty variables - Filter .npmrc output to registry lines only instead of dumping the full file, avoiding potential credential exposure - Guard mkdir/cp of .node artifacts with an explicit file check, setting FAILED=1 and skipping the platform if the copy fails
|
Addressed the outside-diff comment about silent |
|
Resolved merge conflicts with main. The inline retry logic from this PR was replaced by main's shared |
Summary
npm whoamibefore publish to surface OIDC auth failures early instead of getting a cryptic E404set -eo pipefailby default for bash steps, which kills the script on the firstnpm publishfailure before the retry wrapper can fire. Now disableserrexitand captures exit codes explicitly::group::for cleaner CI logsContext
The 3.9.0 publish failed with E404 on platform packages. The retry logic (3 attempts, exponential backoff) never fired — the entire step completed in 6s, confirming
errexitkilled the script immediately. This PR fixes the retries and adds diagnostic output to identify the root cause of the auth failure.Test plan
npm whoamioutput (confirms OIDC auth works or surfaces the failure)