Add containerized shell toolbox #60

Merged
anticomputer merged 7 commits into main from anticomputer/shell-toolbox on Mar 5, 2026

@anticomputer (Contributor) commented Mar 3, 2026

Adds an MCP server (container_shell.py) that manages a single Docker container per process lifetime. The container starts on the first shell_exec call and stops on exit via atexit. An optional host directory is mounted at /workspace via CONTAINER_WORKSPACE.
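
For readers who want to picture the lifecycle described above, here is a minimal sketch of a lazy-start container with atexit cleanup. It is not the code from container_shell.py: the helper names `ensure_container` and `stop_container`, the `sleep infinity` keep-alive command, the generated container name, and the injectable `run` parameter are all assumptions made for illustration.

```python
import atexit
import subprocess
import uuid

# Module-level state: name of the running container, or None if not started.
_container_name = None


def ensure_container(image, workspace=None, run=subprocess.run):
    """Start the container on first use and return its name (hypothetical helper)."""
    global _container_name
    if _container_name is not None:
        return _container_name  # already running: reuse it
    name = f"shell-{uuid.uuid4().hex[:8]}"
    cmd = ["docker", "run", "-d", "--rm", "--name", name]
    if workspace:
        # Optional host directory, mounted at /workspace inside the container.
        cmd += ["-v", f"{workspace}:/workspace"]
    # Assumption: keep the container alive with a long-running no-op command.
    cmd += [image, "sleep", "infinity"]
    result = run(cmd, capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError(f"docker run failed: {result.stderr.strip()}")
    _container_name = name
    atexit.register(stop_container, run)  # stop the container when the process exits
    return name


def stop_container(run=subprocess.run):
    """Stop the container if one was started; safe to call more than once."""
    global _container_name
    if _container_name is None:
        return
    run(["docker", "stop", "--time", "5", _container_name], capture_output=True, text=True)
    _container_name = None
```

Passing `run` as a parameter is only there so the sketch can be exercised without Docker; a real server could call `subprocess.run` directly.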

Four profiles, each with a Dockerfile, toolbox YAML, and demo taskflow:

  • base — debian:bookworm-slim with binutils, file, xxd, python3, curl, git
  • malware-analysis — extends base with radare2, binwalk, yara, exiftool, checksec, capstone, pwntools, volatility3
  • network-analysis — extends base with nmap, tcpdump, tshark, netcat, dnsutils, jq, httpie
  • sast — extends base with semgrep, pyan3, universal-ctags, GNU global, cscope, graphviz, ripgrep, fd, tree

shell_exec is listed under confirm: in all toolbox YAMLs. Set headless: true on a task to skip confirmation in automated pipelines.

Images are built with scripts/build_container_images.sh. Demos run with scripts/run_container_shell_demo.sh <base|malware|network|sast>.

16 unit tests, all mocked (no Docker required to run tests).
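
As an illustration of how tests like these can run without Docker, the sketch below patches `subprocess.run` with `unittest.mock`. The function `docker_exec` is a hypothetical stand-in for the code under test, not the PR's `shell_exec`, and the test name is invented.

```python
import subprocess
from unittest import mock


def docker_exec(container, command, timeout=30):
    """Hypothetical stand-in for the code under test: run a command via docker exec."""
    cmd = ["docker", "exec", container, "bash", "-c", command]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.stdout


def test_docker_exec_is_mocked():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="hello\n", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        out = docker_exec("demo", "echo hello")
    assert out == "hello\n"
    # The docker binary is never invoked; only the argument list is inspected.
    run.assert_called_once()
    assert run.call_args.args[0][:2] == ["docker", "exec"]
```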

Adds an MCP server that manages a single Docker container per process
lifetime, exposing a shell_exec tool for running arbitrary CLI commands
in an isolated environment with an optional host workspace mount.

Three profiles are provided, each with a Dockerfile, toolbox YAML, and
demo taskflow:

- base: debian:bookworm-slim + binutils, file, xxd, python3, curl, git
- malware-analysis: extends base with radare2, binwalk, yara, exiftool,
  checksec, capstone, pwntools, volatility3
- network-analysis: extends base with nmap, tcpdump, tshark, netcat,
  dnsutils, jq, httpie

New files:
- src/seclab_taskflows/mcp_servers/container_shell.py
- src/seclab_taskflows/containers/{base,malware_analysis,network_analysis}/Dockerfile
- src/seclab_taskflows/toolboxes/container_shell_{base,malware_analysis,network_analysis}.yaml
- src/seclab_taskflows/taskflows/container_shell/{README.md,demo_base,demo_malware_analysis,demo_network_analysis}.yaml
- scripts/build_container_images.sh
- scripts/run_container_shell_demo.sh
- tests/test_container_shell.py (14 tests, all mocked)
@anticomputer requested a review from m-y-mo as a code owner March 3, 2026 20:49
Copilot AI review requested due to automatic review settings March 3, 2026 20:49

Copilot AI left a comment

Pull request overview

Adds a new container_shell MCP server plus supporting Docker images, toolboxes, and demo taskflows to enable running confirmed shell commands inside a long-lived (per-process) Docker container with an optional host workspace mount.

Changes:

  • Introduces src/seclab_taskflows/mcp_servers/container_shell.py (shell_exec) that lazily starts/stops a Docker container and executes commands via docker exec.
  • Adds three container profiles (base / malware-analysis / network-analysis) with Dockerfiles and corresponding toolbox YAMLs.
  • Adds demo taskflows + README and scripts to build images and run the demos; includes mocked unit tests for the new server/toolboxes.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/seclab_taskflows/mcp_servers/container_shell.py | New MCP server implementing lazy-start container execution via Docker. |
| tests/test_container_shell.py | Mocked unit tests for container lifecycle and shell_exec, plus toolbox YAML loading. |
| src/seclab_taskflows/toolboxes/container_shell_base.yaml | Toolbox wiring for base container image + confirmation guardrail. |
| src/seclab_taskflows/toolboxes/container_shell_malware_analysis.yaml | Toolbox wiring for malware-analysis container profile. |
| src/seclab_taskflows/toolboxes/container_shell_network_analysis.yaml | Toolbox wiring for network-analysis container profile. |
| src/seclab_taskflows/taskflows/container_shell/demo_base.yaml | Headless demo taskflow using the base container toolbox. |
| src/seclab_taskflows/taskflows/container_shell/demo_malware_analysis.yaml | Headless demo taskflow for static malware triage workflow. |
| src/seclab_taskflows/taskflows/container_shell/demo_network_analysis.yaml | Headless demo taskflow for pcap analysis workflow. |
| src/seclab_taskflows/taskflows/container_shell/README.md | Documentation for profiles, building images, and running demos. |
| src/seclab_taskflows/containers/base/Dockerfile | Base Debian image with core CLI + Python utilities. |
| src/seclab_taskflows/containers/malware_analysis/Dockerfile | Malware analysis image extending base with RE/forensics tooling. |
| src/seclab_taskflows/containers/network_analysis/Dockerfile | Network analysis image extending base with recon/pcap tooling. |
| scripts/build_container_images.sh | Helper script to build the container images. |
| scripts/run_container_shell_demo.sh | Helper script to run one of the demo taskflows end-to-end. |

Comments suppressed due to low confidence (1)

src/seclab_taskflows/containers/malware_analysis/Dockerfile:12

  • This Dockerfile downloads and installs a .deb from GitHub Releases using curl and the moving releases/latest endpoint without any checksum or signature verification. If the GitHub repo or the network path is compromised, an attacker can serve a malicious .deb that will be installed as root into your analysis image, leading to arbitrary code execution in every container built from it. Pin the radare2 artifact to a specific immutable version and verify its integrity (e.g., via a known-good checksum or publisher signature) before installing.
# radare2 is not in Debian bookworm apt; install prebuilt deb from GitHub releases
RUN ARCH=$(dpkg --print-architecture) \
    && R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
        | grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
    && R2_VER="${R2_TAG#v}" \
    && curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
        -o /tmp/r2.deb \
    && apt-get install -y /tmp/r2.deb \

A colon in the workspace path breaks Docker's volume mount syntax
(host:container[:options]), silently changing mount behaviour.
Raise RuntimeError early in _start_container() if the colon is present.
Adds a corresponding test.
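
The colon check described in this commit can be sketched as follows. The helper name `validate_workspace` and the exact error text are assumptions; the PR itself performs the check inside `_start_container()`.

```python
def validate_workspace(path):
    """Return a docker -v mount spec for path, rejecting paths that contain a colon.

    Docker parses the spec as host:container[:options], so an extra colon in the
    host path would shift the fields instead of failing loudly.
    """
    if ":" in path:
        raise RuntimeError(f"workspace path must not contain a colon: {path!r}")
    return f"{path}:/workspace"
```

For example, `validate_workspace("/home/user/samples")` yields `/home/user/samples:/workspace`, while a path such as `/tmp/a:b` raises before `docker run` is ever invoked.
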
SLF001 (private member accessed) is expected in tests that exercise
module internals directly. Suppress it via per-file-ignores for tests/*.

PLW0603 (global statement used for assignment) is the correct pattern
for the module-level container ID state. Add to the global ignore list
alongside the existing PLW0602 exemption.
Copilot AI review requested due to automatic review settings March 3, 2026 20:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/seclab_taskflows/containers/malware_analysis/Dockerfile:13

  • This Dockerfile downloads and installs a prebuilt radare2 .deb directly from GitHub using the mutable releases/latest tag with no checksum or signature verification (curl ... radare2_${R2_VER}_${ARCH}.deb, followed by apt-get install -y /tmp/r2.deb). If the GitHub account, release artifacts, or the network path are compromised, a malicious .deb would be transparently installed into your analysis image and run with container privileges, enabling supply-chain compromise of any environment using this image. Pin the download to a specific trusted release (e.g., by hard-coding a tag/version) and verify the artifact’s integrity (checksum or signature) before installation.
RUN ARCH=$(dpkg --print-architecture) \
    && R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
        | grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
    && R2_VER="${R2_TAG#v}" \
    && curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
        -o /tmp/r2.deb \
    && apt-get install -y /tmp/r2.deb \
    && rm /tmp/r2.deb
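
The integrity check that the review asks for amounts to comparing a digest of the downloaded file against a value recorded when the version was pinned. The sketch below shows that comparison in Python purely to illustrate the logic; in a Dockerfile the same step would more likely be a shell one-liner. The function names are hypothetical, and no real radare2 checksum is given here.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Raise RuntimeError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
```

The expected digest has to come from a trusted source fixed at review time; fetching it from the same location as the artifact would not add protection.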


S101 started firing after a ruff version bump in CI, including against
the pre-existing test_00.py. Use of assert is standard pytest practice;
suppress it for tests/* alongside SLF001.
- seclab-shell-sast image extends base with semgrep, pyan3, universal-ctags,
  GNU global, cscope, graphviz, ripgrep, fd, tree
- Toolbox YAML with server_prompt documenting Python and C call graph workflows
- Demo taskflow: tree, fd, semgrep, ctags, pyan3, gtags then summarise findings
- Runner generates a demo Python file with a shell=True anti-pattern if workspace
  is empty, so semgrep has something to find out of the box
- build_container_images.sh and run_container_shell_demo.sh updated for sast target
- test_toolbox_yaml_valid_sast added (16/16 passing)
Copilot AI review requested due to automatic review settings March 3, 2026 22:01

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (2)

src/seclab_taskflows/taskflows/container_shell/README.md:9

  • PR description says “Three profiles…”, but this README (and the added toolboxes/taskflows) describe four profiles including sast. Please align the PR description and documentation so they describe the same set of supported profiles.
Four container profiles are provided. Each has its own Dockerfile, toolbox
YAML, and demo taskflow.

src/seclab_taskflows/containers/malware_analysis/Dockerfile:12

  • This Dockerfile downloads and installs a radare2 .deb directly from GitHub using the releases/latest endpoint and curl without any version pinning or integrity verification. If the radareorg/radare2 project or its release artifacts are compromised, a malicious package could be pulled into your image and executed at build time and whenever this container runs. To harden the supply chain, pin radare2 to a specific trusted version and verify the downloaded artifact (for example with a checksum or signature) before installing it, or obtain it from a vetted package repository.
RUN ARCH=$(dpkg --print-architecture) \
    && R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
        | grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
    && R2_VER="${R2_TAG#v}" \
    && curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
        -o /tmp/r2.deb \
    && apt-get install -y /tmp/r2.deb \


- Rename _container_id → _container_name throughout (it stores the name
  set via --name, not the Docker-assigned container ID)
- Add empty-image guard in _start_container: raise clear RuntimeError
  when CONTAINER_IMAGE is not set rather than passing an empty string
  to docker run
- Add 30s timeout to docker run subprocess call in _start_container
- Log warning in _stop_container when docker stop fails instead of
  silently ignoring a non-zero returncode
- Default _DEFAULT_WORKDIR to /workspace unconditionally (all images
  set WORKDIR /workspace; the previous "/" fallback when no workspace
  was mounted was inconsistent with the container image defaults)
- Add SPDX headers to container_shell.py, test_container_shell.py,
  and all three Dockerfiles that were missing them
- Remove unused importlib import from test_container_shell.py
- Fix dead sast workspace existence check in run_container_shell_demo.sh
  (mkdir -p always creates workspace so the old condition was never true;
  now checks the actual target path when a specific target is provided)
- Update build_container_images.sh usage comment to include sast
- Clarify malware analysis toolbox prompt: /workspace is bind-mounted
  RW from the host, not an isolated environment
- Update README CONTAINER_TIMEOUT defaults to mention sast profile (60s)
- Add test_start_container_rejects_empty_image and
  test_stop_container_clears_name_on_failure test cases
Copilot AI review requested due to automatic review settings March 4, 2026 02:08

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (3)

src/seclab_taskflows/mcp_servers/container_shell.py:66

  • _stop_container() calls subprocess.run([...docker stop...]) with no timeout and no handling for missing docker binary. Since this runs at atexit, a hang or exception here can delay shutdown or print noisy tracebacks. Add a reasonable timeout and catch OSError/TimeoutExpired while still clearing _container_name.
    logging.debug(f"Stopping container: {_container_name}")
    result = subprocess.run(
        ["docker", "stop", "--time", "5", _container_name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
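
One way to address this comment is sketched below, assuming a module-level `_container_name` like the one in the PR. The function name `stop_container_safely`, the 15-second outer timeout, the log messages, and the injectable `run` parameter are illustrative choices, not the merged code.

```python
import logging
import subprocess

_container_name = "demo"  # hypothetical module state for this example


def stop_container_safely(run=subprocess.run):
    """Stop the container without hanging or raising during interpreter exit."""
    global _container_name
    if _container_name is None:
        return
    # Clear the state first so it is reset on every path, including failures.
    name, _container_name = _container_name, None
    try:
        result = run(
            ["docker", "stop", "--time", "5", name],
            capture_output=True,
            text=True,
            timeout=15,  # assumption: a little longer than docker's own 5s grace period
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # FileNotFoundError (docker missing) is a subclass of OSError.
        logging.warning("docker stop failed for %s: %s", name, exc)
        return
    if result.returncode != 0:
        logging.warning("docker stop exited %d: %s", result.returncode, result.stderr.strip())
```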

src/seclab_taskflows/mcp_servers/container_shell.py:98

  • In shell_exec, only TimeoutExpired is caught around subprocess.run(...). If docker is not installed or cannot be executed, FileNotFoundError/OSError will propagate and likely crash the MCP server. Catch these exceptions and return a structured error string similar to the startup failure path.
    cmd = ["docker", "exec", "-w", workdir, _container_name, "bash", "-c", command]
    logging.debug(f"Executing: {' '.join(cmd)}")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"[exit code: timeout after {timeout}s]"
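
A sketch of the suggested handling, under the assumption that errors should come back as strings in the same bracketed style as the timeout message. The signature (explicit `container` argument, injectable `run`) and the wording of the error and success strings are hypothetical and differ from the PR's `shell_exec`.

```python
import subprocess


def shell_exec(container, command, workdir="/workspace", timeout=30, run=subprocess.run):
    """Run a command in the container, reporting failures as strings instead of raising."""
    cmd = ["docker", "exec", "-w", workdir, container, "bash", "-c", command]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"[exit code: timeout after {timeout}s]"
    except OSError as exc:
        # Covers FileNotFoundError when the docker binary is missing.
        return f"[error: could not run docker: {exc}]"
    return f"{result.stdout}{result.stderr}[exit code: {result.returncode}]"
```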

src/seclab_taskflows/containers/malware_analysis/Dockerfile:16

  • This Dockerfile downloads and installs a prebuilt radare2 .deb from GitHub Releases using releases/latest without any pinning or checksum/signature verification, which is a supply-chain risk. If the radareorg/radare2 repository or the release artifact is compromised, a malicious package will be installed into your analysis container with the same privileges as any tooling you run. To mitigate this, pin R2_TAG to a known-good version and verify the downloaded .deb against a trusted checksum or signature (or vendor it / use a trusted package repository) before installing.
RUN ARCH=$(dpkg --print-architecture) \
    && R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
        | grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
    && R2_VER="${R2_TAG#v}" \
    && curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
        -o /tmp/r2.deb \
    && apt-get install -y /tmp/r2.deb \
    && rm /tmp/r2.deb


Comment on lines +28 to +29
CONTAINER_TIMEOUT = int(os.environ.get("CONTAINER_TIMEOUT", "30"))


Copilot AI Mar 4, 2026

CONTAINER_TIMEOUT is parsed with int(...) at import time. If the env var is set to an empty or non-numeric value, the whole MCP server will fail to start with a ValueError (an unset variable falls back to "30"). Consider parsing defensively (fallback to a default) and/or validating with a clearer error message.

Suggested change
CONTAINER_TIMEOUT = int(os.environ.get("CONTAINER_TIMEOUT", "30"))
def _get_container_timeout(default: int = 30) -> int:
    """Parse CONTAINER_TIMEOUT from the environment defensively."""
    raw_value = os.environ.get("CONTAINER_TIMEOUT")
    if not raw_value:
        return default
    try:
        timeout = int(raw_value)
        if timeout <= 0:
            raise ValueError
    except ValueError:
        logging.warning(
            "Invalid CONTAINER_TIMEOUT value %r; falling back to default %ss",
            raw_value,
            default,
        )
        return default
    return timeout


CONTAINER_TIMEOUT = _get_container_timeout()

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +50
    logging.debug(f"Starting container: {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        msg = f"docker run failed: {result.stderr.strip()}"
        raise RuntimeError(msg)

Copilot AI Mar 4, 2026

docker run is executed via subprocess.run(...) without handling FileNotFoundError/OSError (docker missing) or TimeoutExpired. Either case will currently raise and crash the server/tool call. Wrap this call in try/except and surface a clear RuntimeError (or a user-facing error string from shell_exec) so failures are reported cleanly.

This issue also appears in the following locations of the same file:

  • line 60
  • line 92
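
Since the comment flags the same gap at three call sites, one option is a single wrapper that every docker invocation goes through. The helper below is a hypothetical sketch (the name `run_docker`, its messages, and the injectable `run` parameter are not from the PR); it converts launch failures into RuntimeError so callers have one exception type to handle.

```python
import subprocess


def run_docker(args, timeout=30, run=subprocess.run):
    """Run a docker subcommand, converting launch failures into RuntimeError."""
    cmd = ["docker", *args]
    try:
        return run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError as exc:
        raise RuntimeError("docker executable not found on PATH") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"docker {args[0]} timed out after {timeout}s") from exc
    except OSError as exc:
        raise RuntimeError(f"could not execute docker: {exc}") from exc
```

A non-zero exit status is still returned to the caller as a normal result, so each call site keeps its own handling of `returncode` and `stderr`.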

@anticomputer merged commit e5bde72 into main Mar 5, 2026
13 checks passed
@anticomputer deleted the anticomputer/shell-toolbox branch March 5, 2026 17:40