This repository serves as the central guideline for evaluating and benchmarking independent open-source product analytics tools.
The objective is to identify a solution that grants full infrastructure control while handling high-volume event ingestion. We are looking for a self-hosted stack that combines robust raw data access with ready-to-use visualization interfaces.
All candidates in this repository are evaluated against these five non-negotiable pillars:
- Open Source / On-Premises: Must be deployable on our own infrastructure.
- Infrastructure: Native support for Docker and Kubernetes (K8s).
- Scalability: Capability to ingest and process high-volume event streams.
- Data Accessibility: Direct access to raw event data (SQL/ClickHouse/API) for custom modeling.
- Visualization: Includes a pre-built, feature-rich dashboard for immediate insights.
The repository is organized as a collection of isolated services. Each directory is a self-contained module, whether an analytics suite, an ingestion pipeline, or a traffic generator. No dependencies are shared between modules, which keeps performance testing unbiased.
Open-Product-Engagement-Stack
│
├── _test-website/ # Service: Traffic Generator & Tracking Verification
├── countly/ # Service: Analytics Candidate
├── jitsu/ # Service: Ingestion Candidate
├── matomo/ # Service: Analytics Candidate
├── openreplay/ # Service: Analytics Candidate
├── posthog/ # Service: Analytics Candidate
├── rudderstack/ # Service: Ingestion Candidate
└── README.md
- Analytics & Ingestion Services: Each candidate directory contains `k8s/` manifests for production and `docker/` compose files for local verification, along with specific architectural documentation.
- Traffic Generator (`_test-website`): A standalone service responsible for generating synthetic user traffic. It includes a static site, tracking snippets, and Nginx configuration. It is used to simulate user journeys locally to verify that the analytics containers are receiving and visualizing data correctly.
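As a rough illustration of what "simulating user journeys" can mean, the sketch below builds synthetic page-view events in memory. It is not the `_test-website` implementation, which relies on a static site and tracking snippets. The page paths, the event fields, and the `drop_off` parameter are all assumptions made up for this example.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical journey: these paths are illustrative, not taken from _test-website.
JOURNEY = ["/", "/pricing", "/signup", "/welcome"]


def simulate_journey(rng, start, drop_off=0.3):
    """Return the page-view events of one synthetic visitor.

    The visitor walks JOURNEY in order and abandons before each later step
    with probability `drop_off`, so funnel reports have something to show.
    """
    visitor_id = str(uuid.UUID(int=rng.getrandbits(128)))
    events = []
    timestamp = start
    for step, path in enumerate(JOURNEY):
        if step > 0 and rng.random() < drop_off:
            break
        events.append({
            "visitor_id": visitor_id,
            "event": "page_view",
            "path": path,
            "timestamp": timestamp.isoformat(),
        })
        # Think time between pages, in seconds.
        timestamp += timedelta(seconds=rng.randint(2, 30))
    return events


def simulate_traffic(visitors, seed=0):
    """Return one event list per visitor; the same seed gives the same traffic."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return [
        simulate_journey(rng, start + timedelta(seconds=i))
        for i in range(visitors)
    ]
```

Seeding the generator keeps runs reproducible, so the same synthetic traffic can be replayed against each candidate and the resulting dashboards compared like for like.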
The following tools are being benchmarked. The list includes full Analytics Suites (Dashboard + Storage) and pure Ingestion Pipelines.
| Tool | Focus & Tech Stack | GitHub Stars | Demo | Pros | Cons |
|---|---|---|---|---|---|
| PostHog | Product Suite (ClickHouse, Postgres, Redis, Kafka) | | Link | Unmatched feature set (Flags, Heatmaps, Session Rec); Direct SQL access to ClickHouse. | High operational overhead at scale (>100k events/mo); K8s maintenance is complex. |
| Matomo | Web Analytics (MySQL/MariaDB, PHP) | | Link | "Gold standard" for GDPR/Compliance; Mature data ownership model; Easy setup. | UI performance degrades with massive datasets; Interface feels legacy/dated. |
| OpenReplay | Session Replay (Postgres, Redis, ClickHouse, MinIO) | | | Best self-hosted replay; Includes DevTools (Network/Console logs) for debugging. | High storage requirements (DOM/Video data); Complex microservice architecture. |
| Countly | Mobile & IoT (MongoDB, Node.js) | | Link | Excellent Mobile SDKs; Granular user-level tracking; Extensible plugin system. | High RAM usage at scale; Android documentation lags behind iOS. |
| RudderStack | CDP / Routing (Postgres [Config], Warehouse [Data]) | | Link | Warehouse-first approach; Extensive integration library; Decouples data from tools. | Steep learning curve; Self-hosted control plane is complex to configure. |
| Jitsu | Ingestion / ETL (Redis, Go, Warehouse destination) | | | Extremely fast ingestion; Scriptable JS transforms; Simple Docker deployment. | Smaller community ecosystem; Documentation is less "enterprise" polished. |
To ensure a unified decision-making process, every tool must be tested against the following dimensions:
- Throughput: Maximum events per second (EPS) before ingestion lag occurs.
- Query Speed: Time to render complex funnel queries on datasets >10M events.
- Resource Usage: CPU/RAM footprint under load in a Kubernetes environment.
- Raw Access: Ease of querying the underlying database (e.g., ClickHouse, Postgres) directly.
- Exportability: Capability to export bulk data to a data lake without proprietary locks.
- Retention: Configurable data retention policies to manage storage costs.
- Deployment: Ease of K8s deployment (Helm chart maturity).
- Maintenance: Complexity of upgrades, migrations, and backups.
- High Availability: Support for clustering and horizontal scaling.
- Dashboards: Quality of out-of-the-box visualization (Funnels, Retention, Trends).
- SDK Support: Availability and stability of SDKs for our client platforms.
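The Throughput dimension above can be made concrete with a stepped load test: raise the offered event rate step by step and record the ingestion lag at each step. The sketch below is one possible way to reduce such a run to a single number. The `LoadStep` type, the 5-second lag threshold, and the sample figures are assumptions for illustration, not measurements of any tool listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadStep:
    offered_eps: int     # events per second sent during this step
    lag_seconds: float   # delay until the events became queryable


def max_sustained_eps(steps, lag_threshold=5.0):
    """Return the highest offered rate reached before lag first exceeds the threshold.

    Steps are checked from the lowest to the highest rate. The scan stops at
    the first breach, so a lucky low-lag reading at a higher rate is ignored.
    """
    best = 0
    for step in sorted(steps, key=lambda s: s.offered_eps):
        if step.lag_seconds > lag_threshold:
            break
        best = step.offered_eps
    return best


# Placeholder numbers, invented for this example only.
steps = [
    LoadStep(500, 0.4),
    LoadStep(1000, 0.9),
    LoadStep(2000, 3.1),
    LoadStep(4000, 12.5),
]
print(max_sustained_eps(steps))  # 2000
```

Applying the same threshold and the same step sizes to every candidate keeps the resulting figures comparable across tools.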