Riatix - Benchmark Suite for Point-Read Latency

Overview

This benchmark suite measures and compares point-read latency across multiple Azure data systems, specifically Cosmos DB serverless and Azure Storage (Blob), under identical, region-aligned conditions.

It was designed to validate the hypothesis that for workloads dominated by single-record lookups, a SQL + Blob Storage architecture offers lower latency and greater cost efficiency than a fully document-based store such as Cosmos DB.

Motivation

Enterprise systems often evolve toward schema-flexible or semi-structured data stores. However, most production access patterns remain point reads by ID, for example:

  • Retrieving a user profile by UserID
  • Fetching a configuration document by TenantID
  • Looking up a device telemetry snapshot by (LocationID, DeviceID)

While NoSQL systems such as Cosmos DB provide global distribution and flexible schema, they incur structural latency costs tied to consistency, partition routing, and RU accounting.

This project quantifies those trade-offs through direct, controlled measurements.

Architecture Under Test

The benchmark follows a three-step process:

  • Generate a synthetic dataset of small JSON documents (~1 KB each) simulating IoT telemetry data.
  • Ingest the dataset into two different storage architectures:
    • SQL + Blob Storage (Document Registry pattern)
    • Cosmos DB container with hierarchical partitioning
  • Run concurrent point-read benchmarks against both systems, measuring per-read latency.

1. Document Registry (SQL + Blob Model)

Component          Description
SQL Table          Maps (PartitionId, DocumentId) -> BlobUri
Blob Storage       Stores the actual JSON documents (~1 KB each)
Access Pattern     Two-step lookup: SQL SELECT -> Blob Download
Expected Behavior  Minimal control-plane overhead, network-limited latency

[Diagram: SQL + Blob Storage document registry]
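To make the access pattern concrete, here is a minimal C# sketch of the two-step point read. It assumes a registry table named DocumentRegistry with PartitionId, DocumentId, and BlobUri columns, and a stored URI that already carries its own authorization (for example, a SAS token); it illustrates the pattern rather than the suite's actual implementation.

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Data.SqlClient;

// Minimal sketch of the two-step point read. Table and column names are
// assumptions; the suite's actual schema may differ.
static async Task<string> ReadDocumentAsync(
    string partitionId, string documentId, string sqlConnectionString)
{
    // Step 1: SQL SELECT resolves (PartitionId, DocumentId) to a blob URI.
    string? blobUri;
    await using (var conn = new SqlConnection(sqlConnectionString))
    {
        await conn.OpenAsync();
        await using var cmd = new SqlCommand(
            "SELECT BlobUri FROM DocumentRegistry WHERE PartitionId = @p AND DocumentId = @d",
            conn);
        cmd.Parameters.AddWithValue("@p", partitionId);
        cmd.Parameters.AddWithValue("@d", documentId);
        blobUri = (string?)await cmd.ExecuteScalarAsync();
    }

    // Step 2: Blob download fetches the ~1 KB JSON payload.
    // Assumes the stored URI already embeds authorization (e.g., a SAS token).
    var blobClient = new BlobClient(new Uri(blobUri!));
    var response = await blobClient.DownloadContentAsync();
    return response.Value.Content.ToString();
}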

2. Cosmos DB Model

Component          Description
Container          iotreadings (partitioned by stateId, deviceId) with hierarchical partitioning
Item               JSON document identical to Blob content
Access Pattern     Direct ReadItemAsync(partitionKey, id)
Expected Behavior  Consistent reads, higher per-request overhead

[Diagram: Cosmos DB partitioned container]
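For comparison, here is a minimal sketch of the Cosmos DB point read using the .NET SDK's hierarchical partition key builder. The container and property names follow the description above, but the snippet is illustrative rather than the suite's exact code.

using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Minimal sketch of a Cosmos DB point read with a hierarchical partition key
// (stateId, deviceId). Illustrative only; the benchmark's actual code may differ.
static async Task<string> ReadReadingAsync(
    Container container, string stateId, string deviceId, string id)
{
    PartitionKey partitionKey = new PartitionKeyBuilder()
        .Add(stateId)
        .Add(deviceId)
        .Build();

    // Direct point read: no query engine involved, charged a fixed RU cost per item.
    ItemResponse<dynamic> response =
        await container.ReadItemAsync<dynamic>(id, partitionKey);
    return response.Resource.ToString();
}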

Synthetic Dataset

Weather Data - IoT Telemetry Simulation

  • States: 50 U.S. states (AK, TX, CA, etc.)
  • Devices per State: 1000
  • Reads per Device: 100
  • Total Documents: 5,000,000+
  • Document Size: < 1 KB each

Example document:

{
    "id": "AK_sensor-AK-0000_00003",
    "stateId": "AK",
    "deviceId": "sensor-AK-0000",
    "timestamp": "20251117011804035",
    "temperature": 19.073329476673777,
    "humidity": 33.547862546307904,
    "battery": 81,
    "status": "ok",
    "geo": {
        "lat": 0,
        "lon": 0
    }    
}

Documents are grouped by state and device to simulate IoT telemetry ingestion.
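The id format in the example above ({stateId}_{deviceId}_{readingIndex}) suggests how each document is composed. The sketch below shows one way a single reading could be generated; field names mirror the example document, while the value ranges are assumptions rather than the generator's actual logic.

using System;
using System.Text.Json;

// Illustrative composition of one synthetic reading. Field names follow the
// example document; value ranges are assumptions.
static string BuildReading(string stateId, int deviceIndex, int readingIndex, Random rng)
{
    string deviceId = $"sensor-{stateId}-{deviceIndex:D4}";
    var doc = new
    {
        id = $"{stateId}_{deviceId}_{readingIndex:D5}",
        stateId,
        deviceId,
        timestamp = DateTime.UtcNow.ToString("yyyyMMddHHmmssfff"),
        temperature = rng.NextDouble() * 40,    // assumed range, degrees C
        humidity = rng.NextDouble() * 100,      // assumed range, percent
        battery = rng.Next(0, 101),
        status = "ok",
        geo = new { lat = 0, lon = 0 }
    };
    return JsonSerializer.Serialize(doc);
}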

Build & Run

This section explains how to build, configure, and run the suite end-to-end: synthetic data generation, Cosmos DB bulk upload, and the cross-store latency benchmark.

Contents

  • Prerequisites
  • Required environment variables
  • Benchmark Runner
  • CLI Usage
    1. Generate synthetic data
    2. Upload to Cosmos DB
    3. Run the benchmark
  • Results Summary
  • Observations & Interpretation

Prerequisites

  • .NET SDK 9.0+
  • Azure resources (same region recommended, e.g., East US 2):
    • Azure Storage Account (Blob)
      • VNET-integrated with a private endpoint and public access; default access tier: Hot; replication: Zone-redundant (ZRS)
    • Azure Cosmos DB for NoSQL (Serverless)
      • The database and container are auto-created
      • VNET-integrated with a private endpoint and public access; 4,000 RU/s throughput
    • SQL Server / Azure SQL (for the registry table)
    • Azure VM for running the benchmark (recommended)
  • Network access (SQL firewall rules as needed)

Required environment variables

Set these for default runs (or pass explicit CLI options):

  • AZURE_SQL_CONNECTION_STRING
    • Example: Server=tcp:<your-server>.database.windows.net,1433;Initial Catalog=<database>;User ID=<user>;Password=<password>;Encrypt=True;...
  • AZURE_STORAGE_CONNECTION_STRING
    • From your Storage Account “Access keys”
  • AZURE_COSMOSDB_CONNECTION_STRING
    • From Cosmos DB Keys (primary connection string)

How to set (examples):

  • Windows (PowerShell): $env:AZURE_SQL_CONNECTION_STRING="..."
  • Linux/macOS: export AZURE_SQL_CONNECTION_STRING="..."

Benchmark Runner

The Benchmark Runner performs concurrent point-read tests against both systems.

  • Sample Size: 50,000 to 100,000 random document IDs
  • Parallelism: Configurable (default 20)
  • Region: East US 2 (VM, Cosmos DB, and Storage Account co-located)
  • Metrics Captured:
    • Per-read latency (ms)
    • Payload size
    • P50 / P95 / P99 latency percentiles
    • Average latency (mean)
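A minimal sketch of how such a concurrent measurement loop and its percentile summary could be implemented, assuming a readFunc delegate that performs one point read against either store (all names here are illustrative, not the suite's actual code):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Minimal sketch of the concurrent measurement loop; illustrative only.
static async Task<(double Avg, double P50, double P95, double P99)> MeasureAsync(
    Func<string, Task> readFunc, IReadOnlyList<string> ids, int parallelism)
{
    var latencies = new ConcurrentBag<double>();

    await Parallel.ForEachAsync(ids,
        new ParallelOptions { MaxDegreeOfParallelism = parallelism },
        async (id, _) =>
        {
            var sw = Stopwatch.StartNew();
            await readFunc(id);                          // one point read
            sw.Stop();
            latencies.Add(sw.Elapsed.TotalMilliseconds); // per-read latency in ms
        });

    double[] sorted = latencies.ToArray();
    Array.Sort(sorted);

    // Nearest-rank percentile over the sorted latencies.
    double Pct(double p) => sorted[(int)Math.Ceiling(p / 100.0 * sorted.Length) - 1];
    return (sorted.Average(), Pct(50), Pct(95), Pct(99));
}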

CLI Usage

Build and Publish the project:

From the project root directory, run:

dotnet publish .\Riatix.DocumentRegistry.Benchmark\Riatix.DocumentRegistry.Benchmark.csproj -c Release -o ./publish

The dotnet publish command restores, builds, and publishes the project.

Navigate to the publish directory:

cd ./publish

The following commands are available:

Synthetic data generation (optionally writing to Blob Storage):

dotnet rixbm.dll generate --target-container "iotreadings"

See help for options:
dotnet rixbm.dll generate --help

Description:
  Generate synthetic data for testing.

Usage:
  rixbm generate [options]

Options:
  --target-container <target-container> (REQUIRED)         The local path [<executing_directory>\<target-container>] or Azure storage container target to save generated files.  The container will be created in Azure Storage account if it does not exist.
  --storage-connection-string <storage-connection-string>  The connection string for Azure Blob Storage. If not provided, a default value from environment variable named 'AZURE_STORAGE_CONNECTION_STRING' will be used.
  --sql-connection-string <sql-connection-string>          The connection string for SQL Database. If not provided, a default value from environment variable named 'AZURE_SQL_CONNECTION_STRING' will be used.
  --report-dir <report-dir>                                The directory to save reports. [default: <executing_directory>\reports]
  --devices-per-state <devices-per-state>                  The number of devices to generate per state. The 'state' represents the US states [default: 1000]
  --readings-per-device <readings-per-device>              The number of readings to generate per device. [default: 100]
  --batch-size <batch-size>                                The number of metadata rows to insert per SQL batch. [default: 500]
  --write-to-blob                                          Whether to write generated files to Azure Blob Storage. Files will always be written to local disk.
  --test-run                                               Whether to run the sample test. Overrides --readings-per-device, --devices-per-state, and --batch-size.      

Note: If you choose not to write to Blob Storage, files are saved to local disk at <executing_directory>\iotreadings and metadata is inserted into the SQL database. You can upload these files to Azure Blob Storage later using your own tools or AzCopy (see the example below).
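For example, an AzCopy upload of the locally generated files might look like the following (the account name, container, and SAS token are placeholders):

azcopy copy "<executing_directory>\iotreadings" "https://<storage-account>.blob.core.windows.net/iotreadings?<sas-token>" --recursive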

Upload to Cosmos DB:

dotnet rixbm.dll cosmosdbupload --source-dir "<executing_directory>\iotreadings"

See help for options:
dotnet rixbm.dll cosmosdbupload --help

Description:
  Upload files to Cosmos DB.

Usage:
  rixbm cosmosdbupload [options]

Options:
  --source-dir <source-dir>                                        The directory containing the files to upload. [default:
                                                                   <executing_directory>\iotreadings]
  --cosmosdb-connection-string <cosmosdb-connection-string>        The connection string for Cosmos DB. If not provided, a default value from environment variable named 'AZURE_COSMOSDB_CONNECTION_STRING' will be used.
  --cosmosdb-name <cosmosdb-name>                                  The name of the Cosmos DB database. If not provided, a default value of 'iotdb' will be used. [default: iotdb]
  --cosmosdb-container-name <cosmosdb-container-name>              The name of the Cosmos DB container. If not provided, a default value of 'iotreadings' will be used. [default: iotreadings]
  --cosmosdb-retries-per-document <cosmosdb-retries-per-document>  The number of retries for each document upload. [default: 5]
  --reports-dir <reports-dir>                                      The directory to save reports. [default: <executing_directory>\reports]
  --test-run                                                       Whether to run the sample test.

Run the benchmark:

dotnet rixbm.dll benchmark --sample-size 1000

See help for options:
dotnet rixbm.dll benchmark --help

Description:
  Run benchmark tests.

Usage:
  rixbm benchmark [options]

Options:
  --storage-connection-string <storage-connection-string>    The connection string for Azure Blob Storage. If not provided, a default value from environment variable named 'AZURE_STORAGE_CONNECTION_STRING' will be used.
  --sql-connection-string <sql-connection-string>            The connection string for SQL Database. If not provided, a default value from environment variable named 'SQL_CONNECTION_STRING' will be used.
  --cosmosdb-connection-string <cosmosdb-connection-string>  The connection string for Cosmos DB. If not provided, a default value from environment variable named 'AZURE_COSMOSDB_CONNECTION_STRING' will be used.
  --cosmosdb-name <cosmosdb-name>                            The name of the Cosmos DB database. If not provided, a default value of 'iotdb' will be used. [default: iotdb]
  --cosmosdb-container-name <cosmosdb-container-name>        The name of the Cosmos DB container. If not provided, a default value of 'iotreadings' will be used. [default: iotreadings]
  --sample-size <sample-size>                                The number of documents to use for the benchmark tests. [default: 500000]
  --parallelism <parallelism>                                The degree of parallelism to use for the benchmark tests. Defaults to the number of processors on the machine. [default: 20]
  --test-run                                                 Whether to run the sample test.

Benchmark results are written to:

<executing_directory>\reports\latency_results.csv
<executing_directory>\reports\latency_summary.json

Results Summary

Environment:

  • VM: East US 2 (same region as Cosmos DB and Storage Account)
  • Runtime: .NET 9.0
  • OS: Unix 6.14.0.1012 [Ubuntu 24 LTS]

Networking: Both Storage Account and Cosmos DB configured with public access enabled.

Metric    Storage Account  Cosmos DB  Ratio (Cosmos/Blob)
Average   7.70 ms          93.33 ms   12.1x slower
P50       3.87 ms          82.45 ms   21.3x slower
P95       30.05 ms         213.43 ms  7.1x slower
P99       58.15 ms         264.87 ms  4.6x slower
Samples   100,000          100,000    -

[Chart: Latency comparison, public networking]

Networking: Both Storage Account and Cosmos DB configured with private endpoints (VNET integration).

Metric        Storage (Public)  Storage (Private)  Cosmos DB (Public)  Cosmos DB (Private)  Delta (Private - Public, Storage)
Average (ms)  7.70              16.05              93.33               93.60                +8.35 (2.1x slower)
P50 (ms)      3.87              10.10              82.45               81.74                +6.23 (2.6x slower)
P95 (ms)      30.05             52.92              213.43              214.97               +22.87 (1.8x slower)
P99 (ms)      58.15             76.57              264.87              269.94               +18.42 (1.3x slower)

[Chart: Latency comparison, private endpoints (VNET integration)]

Observation: Even under identical regional conditions, Cosmos DB point reads were consistently slower than equivalent reads from Blob Storage: roughly 12x on average, 21x at the median, and still about 5x at P99. It could be that Cosmos DB's consistency and metadata guarantees introduce a structural latency floor.

Observations & Interpretation

Blob + SQL Model Advantages

  • Network-limited latency (3 to 8 ms median)
  • Predictable cost per operation (no RUs)
  • Simple, deterministic architecture
  • Ideal for large-scale read-heavy workloads

Cosmos DB Model Characteristics

  • Predictable but higher base latency (80 to 100 ms)
  • RU-based cost and consistency trade-offs
  • Suited for multi-region writes and flexible schema ingestion

Architectural Implications

Use Case                               Recommended Store
Point reads by ID, read-heavy          SQL + Blob Storage
Analytical or transactional workloads  Cosmos DB
Multi-region conflict resolution       Cosmos DB
Massive immutable reads                Blob Storage

Artifacts Produced

File                    Description
latency_comparison.csv  Raw per-sample latency measurements
latency_summary.json    Summary statistics (avg, P50, P95, P99)

Example Summary File

{
  "StorageAccount": {
    "averageMs": 7.70,
    "p50Ms": 3.87,
    "p95Ms": 30.05,
    "p99Ms": 58.15,
    "samples": 100000,
    "generatedUtc": "2025-11-18T19:36:24Z",
    "machine": "rixbmvm",
    "os": "Unix 6.14.0.1012",
    "clr": "9.0.11"
  },
  "CosmosDB": {
    "averageMs": 93.33,
    "p50Ms": 82.45,
    "p95Ms": 213.43,
    "p99Ms": 264.87,
    "samples": 100000,
    "generatedUtc": "2025-11-18T19:36:24Z",
    "machine": "rixbmvm",
    "os": "Unix 6.14.0.1012",
    "clr": "9.0.11"
  }
}

Future Extensions

If this architecture proves viable for your point-read workloads, it could be extended to build a highly resilient, globally distributed data access layer:

  1. Multi-Cloud SQL + Blob replication with geo-failover
  2. Cross-Cloud data access layer for hybrid scenarios
  3. Integration with edge computing nodes for local caching
  4. Azure Identity integration (service principals, managed identities) in place of connection strings for secure access (see the sketch below)
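As a sketch of extension 4, client construction with DefaultAzureCredential could look like this (the endpoints are placeholders; the suite itself currently uses connection strings):

using System;
using Azure.Identity;
using Azure.Storage.Blobs;
using Microsoft.Azure.Cosmos;

// Illustrative only: endpoint values are placeholders, and the identity used
// must be granted the appropriate data-plane roles on each resource.
var credential = new DefaultAzureCredential();

var blobService = new BlobServiceClient(
    new Uri("https://mystorageaccount.blob.core.windows.net"), credential);

var cosmosClient = new CosmosClient(
    "https://mycosmosaccount.documents.azure.com:443/", credential);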

Evidence

License

MIT License (c) 2025 Riatix. Created for empirical analysis of Azure data store performance. Use freely with attribution.
