High-scale cloud object storage inspection and crawl engine
Gonimbus is a Go-first library + CLI + server for large-scale inspection and crawl of cloud object storage. It produces machine-friendly outputs (JSONL baseline) and favors prefix-first listing with doublestar matching to stay fast and predictable.
Scale: Tested against buckets holding 32M+ objects. Path-scoped index builds reduce listing costs by 99%+ on date-partitioned data.
- CLI: Run validated crawl/inspect jobs from manifests; stream JSONL to stdout/files or index sinks
- Server: Long-running runner with streaming results; intended to live near the data and accept remote job submissions
- Library: Embeddable components (matcher, crawler, outputs, provider backends) for Go apps
go install github.com/3leaps/gonimbus/cmd/gonimbus@latest
# Quick inspection of an S3 prefix
gonimbus inspect s3://my-bucket/path/to/data/
# Run a crawl job from manifest
gonimbus crawl --job crawl-manifest.yaml
# Check environment and auth
gonimbus doctor
# Check with specific AWS profile
gonimbus doctor --provider s3 --profile my-sso-profile
# Start server mode
gonimbus serve
git clone https://github.com/3leaps/gonimbus.git
cd gonimbus
make bootstrap
make build
./bin/gonimbus version
- S3/S3-compatible: First-class support with access key/secret
- AWS profiles: Assume-role chains, SSO, cached tokens
- GCS: Fast-follow (v0.2.x)
Uses SDK default auth chains - no reinventing the wheel:
- AWS: env vars, shared config/credentials, profiles, SSO, web identity/IRSA
- Enterprise SSO: --profile flag with aws sso login workflow
- Raw keys supported as explicit fallback (Wasabi, DigitalOcean Spaces)
See docs/auth/aws-profiles.md for enterprise authentication patterns.
- Doublestar semantics over normalized keys
- Derives strongest possible list prefix per pattern (critical for scale)
- Include/exclude pattern support
- Path-scoped index builds for date-partitioned data (see docs/user-guide/index.md)
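As a rough illustration of why prefix derivation matters, the sketch below takes the literal part of a pattern before its first glob metacharacter and treats it as the list prefix. `ListPrefix` is a hypothetical helper written for this README, not Gonimbus's matcher API, and it assumes patterns contain no escaped metacharacters.

```go
package main

import (
	"fmt"
	"strings"
)

// ListPrefix returns the longest literal prefix of a doublestar-style
// pattern: everything before the first glob metacharacter.
// Hypothetical helper for illustration; escaped metacharacters are not handled.
func ListPrefix(pattern string) string {
	// Stop at the first character that opens a wildcard, class, or alternation.
	i := strings.IndexAny(pattern, "*?[{")
	if i < 0 {
		return pattern // No wildcard: the whole pattern is a literal key.
	}
	return pattern[:i]
}

func main() {
	for _, p := range []string{
		"data/2025/01/**/*.parquet",
		"logs/app-*/2025/*.log",
		"**/*.json",
	} {
		fmt.Printf("%-28s -> list prefix %q\n", p, ListPrefix(p))
	}
}
```

A pattern that starts with a wildcard yields an empty prefix, which means listing the whole bucket. That is the case a strong prefix is meant to avoid at scale.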
Stream-friendly JSONL records:
{"type":"gonimbus.object.v1","ts":"2025-01-15T10:30:00.000Z","job_id":"abc123","provider":"s3","data":{...}}
Two output modes for content access:
- Content inspection (JSONL-only): content head reads first N bytes with base64 encoding. See docs/releases/v0.1.6.md.
- Content streaming (mixed framing): stream get delivers full content with JSONL headers + raw bytes. See docs/releases/v0.1.5.md.
Optional DuckDB sink for local indexing.
See docs/user-guide/examples/README.md for copy/paste recipes (advanced filtering, s3-compatible endpoints, and more as this project grows).
For automated workflow testing and validation, see fulseed - a companion tool for building reproducible test scenarios.
# Explore workflow (no index required)
gonimbus tree <uri> # Prefix summary (directory-like view)
gonimbus inspect <uri> # Quick inspection with filters
gonimbus crawl --job <path> # Full crawl to JSONL
# Index workflow (for large buckets)
gonimbus index init # Initialize local index database
gonimbus index build --job <path> # Build index from crawl
gonimbus index build --background --job <path> # Background build with job tracking
gonimbus index query <uri> # Query indexed objects by pattern
gonimbus index list # List local indexes
gonimbus index doctor # Validate index integrity
gonimbus index gc # Clean up old indexes
# Job management (for long-running builds)
gonimbus index jobs list # List running and recent jobs
gonimbus index jobs status <id> # Check job state and progress
gonimbus index jobs logs <id> # Stream job logs
gonimbus index jobs stop <id> # Safe cancellation
gonimbus index jobs gc # Clean up old job records
# Content inspection (JSONL-only, for routing decisions)
gonimbus content head <uri> # Read first N bytes (base64 in JSONL)
# Content streaming (for pipeline integration)
gonimbus stream head <uri> # Object metadata (JSONL)
gonimbus stream get <uri> # Stream full content (JSONL + raw bytes)
# Operations
gonimbus transfer --job <path> # Copy/move objects between buckets
gonimbus preflight --job <path> # Verify permissions before transfer
gonimbus doctor # Environment/auth checks
gonimbus serve # Run server mode
gonimbus version # Version info
# Safety latch: hard-disable provider-side mutations
# gonimbus --readonly <command>
Gonimbus uses three-layer configuration via gofulmen:
- Template Defaults: config/gonimbus/v1.0.0/gonimbus-defaults.yaml
- User Overrides: ~/.config/3leaps/gonimbus.yaml
- Runtime: Environment variables (GONIMBUS_*) and CLI flags
GONIMBUS_PORT=8080 # Server port
GONIMBUS_HOST=localhost # Server host
GONIMBUS_LOG_LEVEL=info # Log level (trace/debug/info/warn/error)
GONIMBUS_METRICS_PORT=9090 # Metrics port
GONIMBUS_READONLY=1 # Disable provider-side mutations
Copy .env.example to .env for local development.
When running in server mode:
- GET /health/* - Liveness/readiness probes
- GET /version - Full version info with SSOT versions
- GET /metrics - Prometheus metrics
- Mounts, sync engines, FUSE/desktop UX
- "List everything by default" for broad patterns (scale requires explicit sharding)
- Pinning/offline queues
make help # Show all targets
make bootstrap # Install dependencies
make build # Build binary
make test # Run unit tests
make test-cloud # Run cloud integration tests (requires moto)
make lint # Run linting
make check-all # Lint + test
Cloud integration tests run against a local S3-compatible endpoint (moto):
make moto-start # Start moto server (Docker)
make test-cloud # Run cloud integration tests
make moto-stop # Stop moto server
See docs/development/ for detailed development guides including testing strategy.
See docs/architecture.md for component design:
- Provider Layer
- Match Layer (Cloud-Doublestar)
- Crawl Engine
- Job Manifest schemas
- Output formats
Gonimbus is part of the Fulmen ecosystem:
Level 4: Production Apps (Gonimbus)
Level 3: DX Tools (goneat, fulward)
Level 2: Templates (forge-workhorse-*)
Level 1: Libraries (gofulmen, pyfulmen)
Level 0: Crucible (SSOT - schemas, standards)
- gofulmen - Config path API, three-layer config, schema validation, Crucible shim
- AWS SDK v2 - Default configuration loading
Licensed under the Apache License 2.0. See LICENSE for details.
Trademarks: "Fulmen" and "3 Leaps" are trademarks of 3 Leaps, LLC.
Built with lightning by the 3 Leaps team
Part of the Fulmen Ecosystem