Closed
107 changes: 85 additions & 22 deletions .github/workflows/cron-trivy.yaml
@@ -14,16 +14,13 @@ on:

 permissions:
   contents: read
-  security-events: write

 jobs:
-  # Run Trivy on the latest container and update the security code scanning results tab.
-  trivy-latest:
-    # Matrix job that pulls the latest image for each supported architecture via the multi-arch latest manifest.
-    # We then re-tag it locally to ensure that when Trivy runs it does not pull the latest for the wrong architecture.
-    name: ${{ matrix.arch }} container scan
+  # Pull and export the image in a separate job so Trivy never runs with
+  # registry credentials present.
+  prepare-images:
+    name: ${{ matrix.arch }} container fetch
     runs-on: [ ubuntu-latest ]
-    continue-on-error: true
     strategy:
       fail-fast: false
       # Matrix of architectures to test along with their local tags for special character substitution
@@ -54,7 +51,50 @@ jobs:
         run: |
           docker tag fluent/fluent-bit:latest local/fluent-bit:${{ matrix.local_tag }}

-      # Deliberately chosen master here to keep up-to-date.
+      - name: Export image for isolated scanning
+        run: |
+          docker save local/fluent-bit:${{ matrix.local_tag }} \
+            -o fluent-bit-${{ matrix.local_tag }}.tar
+
+      - name: Upload image artifact
+        uses: actions/upload-artifact@v7
+        with:
+          name: fluent-bit-image-${{ matrix.local_tag }}
+          path: fluent-bit-${{ matrix.local_tag }}.tar
+          if-no-files-found: error
+
+  # Run Trivy with no registry credentials and no GitHub write permissions.
+  trivy-latest:
+    needs: prepare-images

[P1] Avoid gating all scans on the full fetch matrix

Because trivy-latest now needs: prepare-images, GitHub will wait for the entire fetch matrix to succeed before starting any scan leg. Since prepare-images no longer has continue-on-error: true, a single failure while pulling/exporting one architecture (for example, a missing linux/arm/v7 image or a transient registry/artifact error) causes all three scan jobs and all SARIF uploads to be skipped. In the previous workflow only the failing architecture was lost, so this change turns a partial outage into a complete loss of security-scan coverage.
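
One possible way to address the concern above is sketched below. It is not part of the PR: the `if: ${{ !cancelled() }}` condition, the workflow name, and the simplified fetch steps are assumptions made for illustration, while the job names, matrix values, artifact names, and action versions are copied from the diff. With this condition the scan job still starts when some fetch legs fail, and only the leg whose image artifact is missing fails at its own download step.

```yaml
name: scan-gating-sketch   # hypothetical workflow name
on: workflow_dispatch

permissions:
  contents: read

jobs:
  prepare-images:
    name: ${{ matrix.arch }} container fetch
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - arch: linux/amd64
            local_tag: x86_64
          - arch: linux/arm64
            local_tag: arm64
          - arch: linux/arm/v7
            local_tag: arm32
    steps:
      # Simplified stand-in for the PR's pull, re-tag and export steps.
      - name: Pull and export image
        run: |
          docker pull --platform ${{ matrix.arch }} fluent/fluent-bit:latest
          docker save fluent/fluent-bit:latest -o fluent-bit-${{ matrix.local_tag }}.tar
      - name: Upload image artifact
        uses: actions/upload-artifact@v7
        with:
          name: fluent-bit-image-${{ matrix.local_tag }}
          path: fluent-bit-${{ matrix.local_tag }}.tar
          if-no-files-found: error

  trivy-latest:
    needs: prepare-images
    # Assumption: start the scan legs even if some fetch legs failed.
    # A leg without an image artifact then fails at its download step only.
    if: ${{ !cancelled() }}
    name: ${{ matrix.arch }} container scan
    runs-on: ubuntu-latest
    permissions: {}
    strategy:
      fail-fast: false
      matrix:
        include:
          - arch: linux/amd64
            local_tag: x86_64
          - arch: linux/arm64
            local_tag: arm64
          - arch: linux/arm/v7
            local_tag: arm32
    steps:
      - name: Download image artifact
        uses: actions/download-artifact@v5
        with:
          name: fluent-bit-image-${{ matrix.local_tag }}
      - name: Load image from artifact
        run: docker load -i fluent-bit-${{ matrix.local_tag }}.tar
```

The trade-off in this sketch is that a failed fetch shows up twice (once in the fetch leg, once in the matching scan leg), but the remaining architectures keep their scan coverage.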


+    name: ${{ matrix.arch }} container scan
+    runs-on: [ ubuntu-latest ]
+    continue-on-error: true
+    permissions: {}
Comment on lines +67 to +72
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In GitHub Actions, what does job-level continue-on-error: true do for failed steps and matrix jobs, and what is the default exit-code behavior of aquasecurity/trivy-action when exit-code is omitted?

💡 Result:

In GitHub Actions, setting continue-on-error: true at the job level prevents the workflow run from failing when that job fails, so jobs that depend on it (via needs) still run. Within the job itself, execution stops at the first failed step, and later steps do not run unless they carry their own condition such as if: ${{ always() }}. The failed step is still recorded as failed (steps.<step_id>.outcome and steps.<step_id>.conclusion are both 'failure') and the failure stays visible in the job log, but for dependency purposes the job is treated as successful (needs.<job_id>.result is reported as 'success').

For matrix jobs (strategy.matrix), continue-on-error is evaluated for each matrix leg, whereas strategy.fail-fast applies to the whole matrix. A failing leg that has continue-on-error: true does not fail the workflow run and does not cancel the other legs.

When exit-code is omitted in aquasecurity/trivy-action, the default is 0. Trivy exits with code 0 by default even if vulnerabilities are found (unless overridden via exit-code: '1' or config). This means the step succeeds regardless of findings, allowing workflows to continue and upload results (e.g., SARIF for code scanning).
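
The per-leg behavior of continue-on-error described above can be illustrated with a small, self-contained workflow. This is a sketch rather than anything taken from the PR: the workflow name, the job name `scan`, and the matrix key `allow_failure` are hypothetical, and only the architecture values come from the diff.

```yaml
name: continue-on-error-sketch   # hypothetical workflow name
on: workflow_dispatch

jobs:
  scan:
    name: ${{ matrix.arch }} container scan
    runs-on: ubuntu-latest
    # Evaluated per matrix leg: only legs flagged allow_failure may fail
    # without failing the workflow run.
    continue-on-error: ${{ matrix.allow_failure }}
    strategy:
      # Applies to the whole matrix: a failed leg does not cancel the others.
      fail-fast: false
      matrix:
        include:
          - arch: linux/amd64
            allow_failure: false
          - arch: linux/arm64
            allow_failure: false
          - arch: linux/arm/v7
            allow_failure: true   # hypothetical: tolerate failures here only
    steps:
      - name: Placeholder scan step
        run: echo "scanning ${{ matrix.arch }}"
```

Scoping the tolerance to a single leg this way keeps failures on the other architectures visible, in contrast to a blanket job-level continue-on-error: true.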

🏁 Script executed:

cat -n .github/workflows/cron-trivy.yaml | sed -n '60,130p'

Repository: fluent/fluent-bit

Length of output: 3232


Don't mask scan and SARIF upload failures with job-level continue-on-error.

If you want findings to stay advisory, keep that policy in Trivy's exit-code setting, not on the whole job. At the job level, this mainly swallows real breakage—artifact download/load failures, DB fetch issues, or SARIF upload failures—so coverage can be lost without the workflow failing. The orchestration you need (fan-out/fan-in across matrix jobs) is already handled by fail-fast: false and if: ${{ always() }}.

Suggested changes
  trivy-latest:
    needs: prepare-images
    name: ${{ matrix.arch }} container scan
    runs-on: [ ubuntu-latest ]
-   continue-on-error: true
    permissions: {}
    strategy:
      fail-fast: false
  upload-trivy-results:
    needs: trivy-latest
    name: ${{ matrix.arch }} SARIF upload
    runs-on: [ ubuntu-latest ]
    if: ${{ always() }}
-   continue-on-error: true
    permissions:
      contents: read
      security-events: write
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  trivy-latest:
    needs: prepare-images
    name: ${{ matrix.arch }} container scan
    runs-on: [ ubuntu-latest ]
-   continue-on-error: true
    permissions: {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cron-trivy.yaml around lines 67 - 72, Remove the job-level
"continue-on-error: true" from the trivy-latest matrix job (the job named
"trivy-latest") so scan/artifact/SARIF upload failures are not masked; keep
advisory behavior by configuring Trivy's own exit-code settings in the Trivy
step instead, and rely on existing orchestration controls (fail-fast: false on
the matrix and downstream steps using if: ${{ always() }}) to prevent the whole
workflow from failing while still surfacing real job-level errors.
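
The advisory policy recommended in this comment can be sketched as a Trivy step with an explicit exit-code instead of a job-level continue-on-error. This is an illustrative fragment under assumptions, not the PR's code: the workflow name, the job name, the scanned image reference, the severity list, and the output file name are hypothetical, while the pinned action SHA is copied from the diff.

```yaml
name: trivy-advisory-sketch   # hypothetical workflow name
on: workflow_dispatch

permissions:
  contents: read

jobs:
  trivy-advisory:
    runs-on: ubuntu-latest
    steps:
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@57a97c7e7821a5776cebc9bb87c984fa69cba8f1
        with:
          # Assumption: scan a public image directly to keep the sketch short.
          image-ref: fluent/fluent-bit:latest
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          # Findings stay advisory: Trivy returns 0 even when it reports
          # vulnerabilities, so only real breakage fails the step.
          exit-code: '0'
```

Because 0 is already the action's default, stating it explicitly mainly documents the intent; switching to '1' later would turn findings into a failing step without touching the job-level settings.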

+    strategy:
+      fail-fast: false
+      # Matrix of architectures to test along with their local tags for special character substitution
+      matrix:
+        # The architecture for the container runtime to pull.
+        arch: [ linux/amd64, linux/arm64, linux/arm/v7 ]
+        # In a few cases we need the arch without slashes so provide a descriptive extra field for that.
+        # We could also extract or modify this via a regex but this seemed simpler and easier to follow.
+        include:
+          - arch: linux/amd64
+            local_tag: x86_64
+          - arch: linux/arm64
+            local_tag: arm64
+          - arch: linux/arm/v7
+            local_tag: arm32
+    steps:
+      - name: Download image artifact
+        uses: actions/download-artifact@v5
+        with:
+          name: fluent-bit-image-${{ matrix.local_tag }}
+
+      - name: Load image from artifact
+        run: |
+          docker load -i fluent-bit-${{ matrix.local_tag }}.tar
+
       - name: Run Trivy vulnerability scanner for any major issues
         uses: aquasecurity/trivy-action@57a97c7e7821a5776cebc9bb87c984fa69cba8f1
         with:
@@ -67,25 +107,48 @@ jobs:
           template: '@/contrib/sarif.tpl'
           output: trivy-results-${{ matrix.local_tag }}.sarif

-      # Show all detected issues.
-      # Note this will show a lot more, including major un-fixed ones.
-      - name: Run Trivy vulnerability scanner for local output
-        uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8
+      - name: Upload Trivy results artifact
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v7
         with:
-          image-ref: local/fluent-bit:${{ matrix.local_tag }}
-          format: table
+          name: trivy-results-${{ matrix.local_tag }}.sarif
+          path: trivy-results-${{ matrix.local_tag }}.sarif
+          if-no-files-found: warn

+  # Upload SARIF in a dedicated job with the minimal write permission needed.
+  upload-trivy-results:
+    needs: trivy-latest
+    name: ${{ matrix.arch }} SARIF upload
+    runs-on: [ ubuntu-latest ]
+    if: ${{ always() }}
+    continue-on-error: true
+    permissions:
+      contents: read
+      security-events: write
+    strategy:
+      fail-fast: false
+      # Matrix of architectures to test along with their local tags for special character substitution
+      matrix:
+        # The architecture for the container runtime to pull.
+        arch: [ linux/amd64, linux/arm64, linux/arm/v7 ]
+        # In a few cases we need the arch without slashes so provide a descriptive extra field for that.
+        # We could also extract or modify this via a regex but this seemed simpler and easier to follow.
+        include:
+          - arch: linux/amd64
+            local_tag: x86_64
+          - arch: linux/arm64
+            local_tag: arm64
+          - arch: linux/arm/v7
+            local_tag: arm32
+    steps:
+      - name: Download Trivy results artifact
+        uses: actions/download-artifact@v5
+        with:
+          name: trivy-results-${{ matrix.local_tag }}.sarif
+
       - name: Upload Trivy scan results to GitHub Security tab
         uses: github/codeql-action/upload-sarif@v4
         with:
           sarif_file: trivy-results-${{ matrix.local_tag }}.sarif
           category: ${{ matrix.arch }} container
           wait-for-processing: true
-
-      # In case we need to analyse the uploaded files for some reason.
-      - name: Detain results for debug if needed
-        uses: actions/upload-artifact@v7
-        with:
-          name: trivy-results-${{ matrix.local_tag }}.sarif
-          path: trivy-results-${{ matrix.local_tag }}.sarif
-          if-no-files-found: error