RFC: Bid Screening — 2-stage provider pre-filtering for faster deployments #3056

@baktun14

Summary

2-stage bid screening: local DB filtering (fast) + provider-side dry-run (accurate). See akash-network/provider#386 and #3055.


Provider shouldBid Algorithm — Stage Mapping

flowchart TD
    A["GroupSpec"] --> B["🟢 1. ValidateBasic"]
    B --> C["🟢 2. MatchAttributes"]
    C --> D["🔵 3. bidAttrs.SubsetOf"]
    D --> E["🟢 4. MatchResourcesReq"]
    E --> F["⚪ 5. maxGroupVolumes"]
    F --> G["🟢 6. SignedBy"]
    G --> H["🟡 7. DryRunReserve"]
    H --> I["🔵 8. CalculatePrice"]
    I --> J["🔵 9. CanReserveHostnames"]
    J --> K["✅ passed, price"]
    B -->|fail| X["❌ validation error"]
    C -->|fail| X2["❌ incompatible attrs"]
    D -->|fail| X3["❌ order attrs"]
    E -->|fail| X4["❌ resource reqs"]
    F -->|fail| X5["❌ volumes limit"]
    G -->|fail| X6["❌ signature"]
    H -->|fail| X7["❌ capacity"]
    I -->|fail| X8["❌ pricing"]
    J -->|fail| X9["❌ hostname"]
    style B fill:#16a34a,color:#fff
    style C fill:#16a34a,color:#fff
    style E fill:#16a34a,color:#fff
    style G fill:#16a34a,color:#fff
    style D fill:#2563eb,color:#fff
    style H fill:#ca8a04,color:#fff
    style I fill:#2563eb,color:#fff
    style J fill:#2563eb,color:#fff
    style F fill:#6b7280,color:#fff
    style K fill:#16a34a,color:#fff
    style X fill:#dc2626,color:#fff
    style X2 fill:#dc2626,color:#fff
    style X3 fill:#dc2626,color:#fff
    style X4 fill:#dc2626,color:#fff
    style X5 fill:#dc2626,color:#fff
    style X6 fill:#dc2626,color:#fff
    style X7 fill:#dc2626,color:#fff
    style X8 fill:#dc2626,color:#fff
    style X9 fill:#dc2626,color:#fff

🟢 Stage 1 (DB) | 🔵 Stage 2 (provider) | 🟡 Both | ⚪ Skipped | 🔴 Failure

| # | Provider code | Stage 1 (DB) | Stage 2 (provider) | Notes |
|---|---|---|---|---|
| 1 | ValidateBasic() | ✅ Zod schema | | Also re-validated cpu>0, memory>0, count>0 |
| 2 | MatchAttributes(providerAttrs) | ✅ providerAttribute JOIN+HAVING | | On-chain attrs from indexer |
| 3 | bidAttrs.SubsetOf(requirements) | | ✅ Provider only | Runtime config, not in DB |
| 4 | MatchResourcesRequirements | ✅ providerAttributeSignature JOIN | | Signed-attribute check |
| 5 | Storage > maxGroupVolumes | | ✅ Provider only | Provider config, low risk |
| 6 | SignedBy (allOf+anyOf) | ✅ providerAttributeSignature | | Full replication |
| 7 | DryRunReserve(gspec) | ⚠️ Snapshot approx | ✅ Real K8s scheduling | Snapshots ~min old |
| 8 | CalculatePrice() | | ✅ Returns DecCoin | Provider-specific |
| 9 | CanReserveHostnames() | | ✅ Provider only | Provider-local state |

Stage 1 = 5/9 checks (superset, no false negatives) | Stage 1+2 = 9/9 (exact)
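
To make the SignedBy (allOf+anyOf) check concrete, here is a minimal sketch of how Stage 1 could evaluate it once the auditors who signed a provider's attributes have been loaded. It assumes allOf requires every listed auditor, anyOf requires at least one, and empty lists impose no constraint. The `SignedByRequirement` type and `satisfiesSignedBy` function are hypothetical names for illustration, not taken from the console or provider code.

```typescript
// Hypothetical shape of a deployment's signedBy requirement.
interface SignedByRequirement {
  allOf: string[]; // every listed auditor must have signed
  anyOf: string[]; // at least one listed auditor must have signed
}

// `auditors` is the set of auditor addresses that signed the provider's
// attributes (assumed to come from the providerAttributeSignature rows).
function satisfiesSignedBy(
  req: SignedByRequirement,
  auditors: ReadonlySet<string>,
): boolean {
  // An empty allOf list is trivially satisfied (every() on [] is true).
  const allOk = req.allOf.every((a) => auditors.has(a));
  // An empty anyOf list imposes no constraint.
  const anyOk =
    req.anyOf.length === 0 || req.anyOf.some((a) => auditors.has(a));
  return allOk && anyOk;
}
```

In SQL the same rule would presumably be expressed with a JOIN on the signature table; the function above only illustrates the predicate.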


Architecture

flowchart TD
    A[SDL] --> B["POST /v1/bid-screening"]
    subgraph S1["Stage 1 — DB (~65ms)"]
        B --> C[Dynamic SQL]
        C --> D[(Indexer DB)]
        D --> E{Matches?}
        E -->|0| F[Diagnosis]
        E -->|N| G[Ranked list]
    end
    subgraph S2["Stage 2 — Provider Calls"]
        G --> H["5-10 concurrent"]
        H --> I{3+ passed?}
        I -->|Yes| J[Return early]
        I -->|No, more| H
        I -->|Done| K{Any passed?}
    end
    J --> L[Price range + count]
    K -->|Yes| L
    K -->|0 passed| M
    F --> M[Blockers + feedback]

Stage 1 — DB Pre-filtering (#3055)

POST /v1/bid-screening — filters indexer DB for providers that could fulfill a deployment. Dynamic SQL with conditional JOINs (GPU/storage/attributes/signatures only when needed). ~65ms across all scenarios. Returns ranked providers + constraint diagnosis when 0 matches.
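
The conditional-JOIN idea can be sketched as a small query builder that adds a JOIN only when the request needs it. This is an illustration under assumptions, not the code from #3055: the table and column names (`provider`, `providerSnapshot`, `providerGpu`, `isOnline`, `availableCpu`, and so on) and the `ScreeningRequest` shape are hypothetical, and the attribute match uses one common JOIN+HAVING pattern that assumes the requested attribute keys are distinct.

```typescript
// Hypothetical request shape; field names are assumptions for this sketch.
interface ScreeningRequest {
  cpuMillis: number;
  memoryBytes: number;
  gpuModel?: string;
  attributes?: { key: string; value: string }[];
}

interface BuiltQuery {
  text: string;      // SQL with $1, $2, ... placeholders
  values: unknown[]; // positional parameter values
}

function buildScreeningQuery(req: ScreeningRequest): BuiltQuery {
  const values: unknown[] = [];
  // Registers a value and returns its positional placeholder.
  const param = (v: unknown): string => {
    values.push(v);
    return `$${values.length}`;
  };

  const joins: string[] = [];
  const where: string[] = [
    `p."isOnline" = true`,
    `s."availableCpu" >= ${param(req.cpuMillis)}`,
    `s."availableMemory" >= ${param(req.memoryBytes)}`,
  ];
  let having = "";

  // GPU JOIN is added only when the deployment asks for a GPU.
  if (req.gpuModel !== undefined) {
    joins.push(`JOIN "providerGpu" g ON g."owner" = p."owner"`);
    where.push(`g."model" = ${param(req.gpuModel)}`);
  }

  // Attribute JOIN+HAVING: a provider must match every requested pair.
  const attrs = req.attributes ?? [];
  if (attrs.length > 0) {
    joins.push(`JOIN "providerAttribute" pa ON pa."owner" = p."owner"`);
    const pairs = attrs.map(
      (a) => `(pa."key" = ${param(a.key)} AND pa."value" = ${param(a.value)})`,
    );
    where.push(`(${pairs.join(" OR ")})`);
    having = `HAVING COUNT(DISTINCT pa."key") = ${param(attrs.length)}`;
  }

  const text = [
    `SELECT p."owner", p."leaseCount"`,
    `FROM "provider" p`,
    `JOIN "providerSnapshot" s ON s."owner" = p."owner"`,
    ...joins,
    `WHERE ${where.join(" AND ")}`,
    `GROUP BY p."owner", p."leaseCount"`,
    having,
    `ORDER BY p."leaseCount" DESC`,
  ]
    .filter((line) => line !== "")
    .join("\n");

  return { text, values };
}
```

A CPU-only request produces a query with no GPU or attribute JOIN and two parameters, while a request with a GPU model and two attributes adds both JOINs plus the HAVING clause. All user input goes through placeholders rather than string interpolation.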


Stage 2 — Provider Calls

Calls each provider's /v1/bid-screening for real inventory + pricing.

Auth — should be unauthenticated

The current provider PR guards the endpoint with requireOwner (mTLS/JWT). But bid screening is read-only: it does not reserve resources or create state, so it should be unauthenticated like /v1/status. Action: raise on akash-network/provider#386.

Early-return strategy

  • 5-10 concurrent provider calls per batch
  • Return as soon as 3 providers pass (enough for price range + choice)
  • Stage 1 ranks by leaseCount — first batch most likely to pass
  • 5s per-provider timeout, 15s global cap
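
The strategy above can be sketched as a batched loop with a per-provider timeout and a global deadline. This is a sketch under assumptions: `ScreenResult`, `ScreenFn`, `ScreenOptions`, `withTimeout`, and `screenProviders` are hypothetical names, and `screen` stands in for the HTTP call to a provider's /v1/bid-screening. The loop checks the pass count between batches, as in the architecture flowchart; returning mid-batch the moment the third provider passes would need a different structure. Timeouts and errors are treated as "did not pass".

```typescript
interface ScreenResult {
  provider: string;
  passed: boolean;
}

// Stand-in for the real provider call (hypothetical signature).
type ScreenFn = (provider: string) => Promise<ScreenResult>;

interface ScreenOptions {
  batchSize: number;            // e.g. 5-10
  enough: number;               // e.g. 3
  perProviderTimeoutMs: number; // e.g. 5000
  globalTimeoutMs: number;      // e.g. 15000
}

// Resolves with `fallback` if `p` rejects or does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(fallback); },
    );
  });
}

async function screenProviders(
  ranked: string[], // Stage 1 output, best candidates first
  screen: ScreenFn,
  opts: ScreenOptions,
): Promise<ScreenResult[]> {
  const passed: ScreenResult[] = [];
  const deadline = Date.now() + opts.globalTimeoutMs;

  for (
    let i = 0;
    i < ranked.length && passed.length < opts.enough;
    i += opts.batchSize
  ) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // global cap reached

    // Never wait longer than the time left under the global cap.
    const timeout = Math.min(opts.perProviderTimeoutMs, remaining);
    const batch = ranked.slice(i, i + opts.batchSize);
    const results = await Promise.all(
      batch.map((provider) =>
        withTimeout(screen(provider), timeout, { provider, passed: false }),
      ),
    );
    passed.push(...results.filter((r) => r.passed));
  }
  return passed;
}
```

Because Stage 1 ranks by leaseCount, the first batch is the one most likely to yield enough passes, so in the common case only one round of provider calls is made.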

Open questions

  • Cache Stage 2 results briefly (~30s)?
  • Retry timed-out providers once or skip?

POC Results (72 online, 1887 total providers)

| Scenario | Matches | Time |
|---|---|---|
| Small CPU | 50 | 65ms |
| Medium CPU | 50 | 66ms |
| Large CPU | 30 | 67ms |
| Oversized | 4 | 64ms |
| GPU A100 | 2 | 65ms |
| GPU RTX4090 | 4 | 66ms |
| GPU 8xH100 | 0 | 65ms |
| NVMe storage | 27 | 67ms |
| Auditor filter | 27 | 68ms |
| Region filter | 4 | 66ms |

Consistently ~65ms across scenarios: latency is dominated by the DB round-trip, not by SQL complexity.
