DuckDB Elixir

A DuckDB client for Elixir, providing a powerful Relation API for analytical queries.

Status: 🚀 Core Relation API implemented and tested - Ready for evaluation

About

DuckdbEx brings DuckDB's analytical power to Elixir with:

  • Lazy Relation API: Composable, chainable query building inspired by Python's DuckDB API
  • CLI-based Architecture: Uses DuckDB CLI via erlexec for maximum portability
  • Idiomatic Elixir: Functional, pipe-friendly API with pattern matching
  • Comprehensive Testing: TDD approach with 158 passing tests
  • Python API Alignment: SQL and Relation behavior mirror duckdb-python where feasible with the CLI backend

Installation

Requirements

DuckdbEx runs the DuckDB CLI via erlexec.

  • Recommended: mix duckdb_ex.install (installs to your project’s priv/duckdb/duckdb, or priv/duckdb/duckdb.exe on Windows)
  • Or install the DuckDB CLI (duckdb) and ensure it is on PATH, or set DUCKDB_PATH to the binary.
  • Optional: set DUCKDB_EX_EXEC_AS_ROOT=1 to run the CLI as root (helpful in containers that require it).

# Add to mix.exs
def deps do
  [
    {:duckdb_ex, "~> 0.2.0"}
  ]
end

# Then fetch dependencies and install the DuckDB CLI into priv/duckdb/
mix deps.get
mix duckdb_ex.install

Quick Start

💡 New! Check out the examples/README.md guide for 8 comprehensive, runnable examples:

mix run examples/00_quickstart.exs
mix run examples/01_basic_queries.exs
mix run examples/02_tables_and_data.exs
# ... and more!

Configuration

  • Resolution order: config :duckdb_ex, :duckdb_path → DUCKDB_PATH → priv/duckdb/duckdb → duckdb in PATH → /usr/local/bin/duckdb
  • DUCKDB_EX_EXEC_AS_ROOT - Set to 1 or true to run the CLI as root (useful in some containerized environments).
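
For example, in config/config.exs (the binary path shown here is purely illustrative):

import Config

# Point DuckdbEx at a specific DuckDB CLI binary
config :duckdb_ex, :duckdb_path, "/opt/duckdb/duckdb"

Alternatively, set the environment variables at runtime, e.g. export DUCKDB_PATH=/opt/duckdb/duckdb.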

Basic Connection and Queries

# Connect to in-memory database
{:ok, conn} = DuckdbEx.Connection.connect(:memory)

# Or connect to a file
{:ok, conn} = DuckdbEx.Connection.connect("/path/to/database.duckdb")

# Execute SQL directly
{:ok, result} = DuckdbEx.Connection.execute_result(conn, "SELECT 42 as answer")

# Fetch all rows as tuples
{:ok, rows} = DuckdbEx.Connection.fetch_all(conn, "SELECT * FROM users")
# => [{1, "Alice"}, ...]

# Fetch single row
{:ok, row} = DuckdbEx.Connection.fetch_one(conn, "SELECT * FROM users LIMIT 1")
# => {1, "Alice"}

# Close connection
DuckdbEx.Connection.close(conn)

Relation API - Lazy Query Building

The Relation API allows you to build complex queries through method chaining, with execution deferred until you fetch results:

# Create a relation (no execution yet)
relation = DuckdbEx.Connection.table(conn, "orders")

# Chain operations (still no execution)
result = relation
|> DuckdbEx.Relation.filter("amount > 100")
|> DuckdbEx.Relation.project(["customer_name", "amount", "order_date"])
|> DuckdbEx.Relation.order("amount DESC")
|> DuckdbEx.Relation.limit(10)
|> DuckdbEx.Relation.fetch_all()  # Executes here

# Result: Top 10 orders over $100
{:ok, rows} = result

Working with Relations

Creating Relations

# From a table or view
relation = DuckdbEx.Connection.table(conn, "products")

# From SQL
relation = DuckdbEx.Connection.sql(conn, "SELECT * FROM generate_series(1, 100)")

# From range (using DuckDB's range function)
relation = DuckdbEx.Connection.sql(conn, "SELECT * FROM range(10)")

# From values
relation = DuckdbEx.Connection.values(conn, [1, "a"])
relation = DuckdbEx.Connection.values(conn, [{1, "a"}, {2, "b"}])

Filtering Data

# Simple filter
relation
|> DuckdbEx.Relation.filter("price > 50")
|> DuckdbEx.Relation.fetch_all()

# Chain multiple filters (AND logic)
relation
|> DuckdbEx.Relation.filter("price > 50")
|> DuckdbEx.Relation.filter("category = 'Electronics'")
|> DuckdbEx.Relation.fetch_all()

# Complex conditions
relation
|> DuckdbEx.Relation.filter("price > 50 AND (category = 'Electronics' OR category = 'Computers')")
|> DuckdbEx.Relation.fetch_all()

Selecting Columns

# Select specific columns
relation
|> DuckdbEx.Relation.project(["name", "price"])
|> DuckdbEx.Relation.fetch_all()

# Use expressions
relation
|> DuckdbEx.Relation.project([
  "name",
  "price",
  "price * 1.1 as price_with_tax",
  "upper(category) as category_upper"
])
|> DuckdbEx.Relation.fetch_all()

Sorting and Limiting

# Order by column
relation
|> DuckdbEx.Relation.order("price DESC")
|> DuckdbEx.Relation.fetch_all()

# Multiple columns
relation
|> DuckdbEx.Relation.order("category ASC, price DESC")
|> DuckdbEx.Relation.fetch_all()

# Limit results
relation
|> DuckdbEx.Relation.limit(100)
|> DuckdbEx.Relation.fetch_all()

# Top-N query
relation
|> DuckdbEx.Relation.order("revenue DESC")
|> DuckdbEx.Relation.limit(10)
|> DuckdbEx.Relation.fetch_all()

Aggregations

Simple Aggregations

# Count all rows
relation
|> DuckdbEx.Relation.aggregate("count(*) as total")
|> DuckdbEx.Relation.fetch_all()
# => {:ok, [{1000}]}

# Multiple aggregations
relation
|> DuckdbEx.Relation.aggregate([
  "count(*) as count",
  "sum(amount) as total",
  "avg(amount) as average",
  "min(amount) as minimum",
  "max(amount) as maximum"
])
|> DuckdbEx.Relation.fetch_all()

GROUP BY Aggregations

# Group by single column
DuckdbEx.Connection.table(conn, "sales")
|> DuckdbEx.Relation.aggregate(
  "sum(amount) as total_sales",
  group_by: ["region"]
)
|> DuckdbEx.Relation.fetch_all()

# Group by multiple columns
DuckdbEx.Connection.table(conn, "sales")
|> DuckdbEx.Relation.aggregate(
  ["sum(amount) as total", "count(*) as count"],
  group_by: ["region", "year"]
)
|> DuckdbEx.Relation.fetch_all()

# With filtering and ordering
DuckdbEx.Connection.table(conn, "products")
|> DuckdbEx.Relation.filter("price > 10")  # WHERE clause
|> DuckdbEx.Relation.aggregate(
  ["sum(price) as total", "count(*) as count"],
  group_by: ["category"]
)
|> DuckdbEx.Relation.filter("total > 1000")  # HAVING clause
|> DuckdbEx.Relation.order("total DESC")
|> DuckdbEx.Relation.fetch_all()

Convenience Aggregate Methods

# Count rows
relation |> DuckdbEx.Relation.count() |> DuckdbEx.Relation.fetch_all()
# => {:ok, [{100}]}

# Sum a column
relation |> DuckdbEx.Relation.sum("amount") |> DuckdbEx.Relation.fetch_all()
# => {:ok, [{45000}]}

# Average
relation |> DuckdbEx.Relation.avg("price") |> DuckdbEx.Relation.fetch_all()
# => {:ok, [{42.5}]}

# Min/Max
relation |> DuckdbEx.Relation.min("temperature") |> DuckdbEx.Relation.fetch_all()
relation |> DuckdbEx.Relation.max("score") |> DuckdbEx.Relation.fetch_all()

Complete Examples

E-commerce Analytics

{:ok, conn} = DuckdbEx.Connection.connect(:memory)

# Create and populate table
DuckdbEx.Connection.execute(conn, """
  CREATE TABLE orders (
    order_id INTEGER,
    customer_name VARCHAR,
    product_category VARCHAR,
    amount DECIMAL(10,2),
    order_date DATE
  )
""")

DuckdbEx.Connection.execute(conn, """
  INSERT INTO orders VALUES
    (1, 'Alice', 'Electronics', 999.99, '2024-01-15'),
    (2, 'Bob', 'Books', 29.99, '2024-01-16'),
    (3, 'Alice', 'Electronics', 49.99, '2024-01-17'),
    (4, 'Charlie', 'Furniture', 599.99, '2024-01-18'),
    (5, 'Bob', 'Electronics', 299.99, '2024-01-19')
""")

# Analyze: Top customers by total spending in Electronics
{:ok, top_customers} =
  conn
  |> DuckdbEx.Connection.table("orders")
  |> DuckdbEx.Relation.filter("product_category = 'Electronics'")
  |> DuckdbEx.Relation.aggregate(
    ["sum(amount) as total_spent", "count(*) as order_count"],
    group_by: ["customer_name"]
  )
  |> DuckdbEx.Relation.filter("total_spent > 100")
  |> DuckdbEx.Relation.order("total_spent DESC")
  |> DuckdbEx.Relation.fetch_all()

# Result:
# [
#   {"Alice", 1049.98, 2},
#   {"Bob", 299.99, 1}
# ]

Time Series Analysis

# Daily sales aggregation with statistical measures
{:ok, daily_stats} =
  conn
  |> DuckdbEx.Connection.table("sales")
  |> DuckdbEx.Relation.aggregate(
    [
      "date_trunc('day', timestamp) as day",
      "sum(amount) as daily_total",
      "avg(amount) as daily_avg",
      "stddev_pop(amount) as daily_stddev",
      "count(*) as transaction_count"
    ],
    group_by: ["date_trunc('day', timestamp)"]
  )
  |> DuckdbEx.Relation.order("day DESC")
  |> DuckdbEx.Relation.limit(30)
  |> DuckdbEx.Relation.fetch_all()

Data Pipeline

defmodule DataPipeline do
  def process_sales_data(conn) do
    # Reusable base relation
    base = DuckdbEx.Connection.table(conn, "raw_sales")

    # High-value customers
    high_value = base
    |> DuckdbEx.Relation.filter("total_purchases > 1000")
    |> DuckdbEx.Relation.project(["customer_id", "email"])

    # Recent activity
    recent = base
    |> DuckdbEx.Relation.filter("order_date > '2024-01-01'")
    |> DuckdbEx.Relation.aggregate(
      "count(*) as recent_orders",
      group_by: ["customer_id"]
    )

    # Execute both queries
    {:ok, high_value_customers} = DuckdbEx.Relation.fetch_all(high_value)
    {:ok, recent_activity} = DuckdbEx.Relation.fetch_all(recent)

    {high_value_customers, recent_activity}
  end
end

Working with DuckDB Functions

# Use DuckDB's built-in functions
conn
|> DuckdbEx.Connection.sql("SELECT * FROM range(100)")
|> DuckdbEx.Relation.filter("range % 2 = 0")  # Even numbers only
|> DuckdbEx.Relation.project(["range", "range * range as squared"])
|> DuckdbEx.Relation.fetch_all()

# Generate test data
conn
|> DuckdbEx.Connection.sql("SELECT * FROM generate_series(1, 1000) as id")
|> DuckdbEx.Relation.project([
  "id",
  "random() as random_value",
  "case when id % 2 = 0 then 'even' else 'odd' end as parity"
])
|> DuckdbEx.Relation.aggregate(
  ["avg(random_value) as avg_random", "count(*) as count"],
  group_by: ["parity"]
)
|> DuckdbEx.Relation.fetch_all()

API Reference

DuckdbEx.Connection

  • connect(database, opts \\ []) - Open database connection
  • execute(conn, sql, params \\ []) - Execute SQL query
  • execute_result(conn, sql, params \\ []) - Execute and return result struct
  • executemany(conn, sql, params_list) - Execute SQL with multiple parameter sets
  • fetch_all(conn, sql) - Execute and fetch all rows
  • fetch_one(conn, sql) - Execute and fetch first row
  • fetch_many(conn, sql, count) - Execute and fetch N rows
  • close(conn) - Close connection
  • sql(conn, sql) - Create relation from SQL
  • table(conn, table_name) - Create relation from table
  • view(conn, view_name) - Create relation from view
  • values(conn, values) - Create relation from values
  • read_csv/read_json/read_parquet - Create relation from files
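
Most of these appear in the sections above; executemany/3 and fetch_many/3 do not, so here is a hedged sketch. The positional ? placeholders, the shape of each parameter set, and the return value of executemany/3 are assumptions, not documented behavior:

{:ok, conn} = DuckdbEx.Connection.connect(:memory)
DuckdbEx.Connection.execute(conn, "CREATE TABLE users (id INTEGER, name VARCHAR)")

# One INSERT statement, several parameter sets
_result =
  DuckdbEx.Connection.executemany(conn, "INSERT INTO users VALUES (?, ?)", [
    [1, "Alice"],
    [2, "Bob"],
    [3, "Carol"]
  ])

# Fetch only the first two rows
{:ok, rows} = DuckdbEx.Connection.fetch_many(conn, "SELECT * FROM users ORDER BY id", 2)
# => {:ok, [{1, "Alice"}, {2, "Bob"}]}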

DuckdbEx.Relation

Transformations (lazy, return new relation):

  • project(relation, columns) - Select columns
  • filter(relation, condition) - Filter rows
  • limit(relation, n, offset \\ 0) - Limit results
  • order(relation, order_by) - Sort results
  • sort(relation, columns) - Sort alias (list or string)
  • unique(relation, columns) - Distinct values for columns
  • aggregate(relation, expressions, opts \\ []) - Aggregate data
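
Most of these are covered in the examples above; sort/2 and unique/2 are not, so here is a brief sketch based on the signatures listed here:

# Distinct, sorted categories
conn
|> DuckdbEx.Connection.table("products")
|> DuckdbEx.Relation.unique(["category"])
|> DuckdbEx.Relation.sort(["category"])
|> DuckdbEx.Relation.fetch_all()

# limit/3 also takes an offset, e.g. the second page of 10 rows
conn
|> DuckdbEx.Connection.table("products")
|> DuckdbEx.Relation.order("id")
|> DuckdbEx.Relation.limit(10, 10)
|> DuckdbEx.Relation.fetch_all()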

Convenience Aggregates:

  • count(relation) - Count rows
  • sum(relation, column) - Sum column
  • avg(relation, column) - Average column
  • min(relation, column) - Minimum value
  • max(relation, column) - Maximum value

Execution (trigger query execution):

  • execute(relation) - Execute and return result struct
  • fetch_all(relation) - Execute and fetch all rows
  • fetch_one(relation) - Execute and fetch first row
  • fetch_many(relation, n) - Execute and fetch N rows

Table/View Operations:

  • create(relation, table_name) / to_table/2
  • create_view(relation, view_name, opts \\ []) / to_view/3
  • insert_into(relation, table_name)
  • insert(relation, values)
  • update(relation, set_map, condition \\ nil)
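
None of these appear in the examples above, so here is a hedged sketch built from the signatures listed; return values are not documented in this README:

filtered =
  conn
  |> DuckdbEx.Connection.table("orders")
  |> DuckdbEx.Relation.filter("amount > 100")

# Materialize the relation as a new table
DuckdbEx.Relation.create(filtered, "big_orders")

# Or expose it as a view for later queries
DuckdbEx.Relation.create_view(filtered, "big_orders_view")

# Append the relation's rows to an existing, compatible table
DuckdbEx.Relation.insert_into(filtered, "orders_archive")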

Export:

  • to_csv(relation, path, opts \\ [])
  • to_parquet(relation, path, opts \\ [])
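
A sketch of a round trip (the available opts keywords are not listed in this README, and the read_parquet/2 arguments are an assumption):

orders = DuckdbEx.Connection.table(conn, "orders")

# Write query results straight to disk
DuckdbEx.Relation.to_csv(orders, "/tmp/orders.csv")
DuckdbEx.Relation.to_parquet(orders, "/tmp/orders.parquet")

# Read the Parquet file back as a new relation
{:ok, rows} =
  conn
  |> DuckdbEx.Connection.read_parquet("/tmp/orders.parquet")
  |> DuckdbEx.Relation.fetch_all()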

DuckdbEx.Result

  • fetch_all(result) - Get all rows as list of tuples
  • fetch_one(result) - Get first row as tuple
  • fetch_many(result, n) - Get N rows as list of tuples
  • row_count(result) - Get number of rows
  • columns(result) - Get column names
  • to_tuples(result) - Convert rows to tuples
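
A short sketch of working with the result struct (only the function names above are documented; the exact return shapes shown are assumptions):

{:ok, result} = DuckdbEx.Connection.execute_result(conn, "SELECT 1 AS id, 'Alice' AS name")

columns = DuckdbEx.Result.columns(result)    # e.g. ["id", "name"]
count = DuckdbEx.Result.row_count(result)    # e.g. 1
first = DuckdbEx.Result.fetch_one(result)    # first row as a tuple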

Architecture

DuckdbEx drives the DuckDB CLI as an external process via erlexec rather than using native NIFs:

Advantages:

  • ✅ Maximum portability (works everywhere DuckDB CLI works)
  • ✅ No compilation needed
  • ✅ Easy to debug and maintain
  • ✅ Handles core SQL features automatically

Trade-offs:

  • JSON serialization overhead (minimal for analytical queries)
  • No zero-copy data transfer
  • No native UDF registration or embedded APIs (Arrow/Polars/Nx)

This architecture is a good fit for analytical workloads, where query execution time dominates and the JSON serialization overhead is negligible by comparison.

Examples

The examples/ directory contains 8 comprehensive, runnable examples demonstrating all features:

  • 00_quickstart.exs - Your first DuckDB query
  • 01_basic_queries.exs - Simple queries, math, strings, dates
  • 02_tables_and_data.exs - CREATE, INSERT, UPDATE, DELETE
  • 03_transactions.exs - Transaction management
  • 04_relations_api.exs - Lazy query building
  • 05_csv_parquet_json.exs - Reading/writing files
  • 06_analytics_window_functions.exs - Advanced analytics
  • 07_persistent_database.exs - File-based databases

Run any example with mix run examples/<name>.exs (for example, mix run examples/00_quickstart.exs).
See examples/README.md for detailed descriptions and more information. Run everything with examples/run_all.sh.

Guides

  • docs/guides/installation.md
  • docs/guides/configuration.md
  • docs/guides/connections.md
  • docs/guides/relations.md
  • docs/guides/data_io.md
  • docs/guides/types_expressions.md
  • docs/guides/results.md
  • docs/guides/errors.md
  • docs/guides/performance_limitations.md
  • docs/guides/migration_from_python.md
  • docs/guides/testing_contributing.md

Testing

# Run all tests
mix test

# Run specific test file
mix test test/duckdb_ex/relation_test.exs

# Run with coverage
mix test --cover

# Run with specific seed
mix test --seed 123456

Current test suite: 158 tests, 100% passing (after the performance optimizations described below).

Development Status

✅ Implemented

Core Connection API:

  • Connection management (connect, close, default connection helpers)
  • Query execution (execute, executemany) with parameter binding
  • Result fetching (fetch_all, fetch_one, fetch_many)
  • Statement parsing (extract_statements)
  • DB-API metadata (description, rowcount)
  • Cursor/duplicate support (Cursor wrapper)
  • Exception hierarchy (27 types)
  • Transaction management (begin, commit, rollback, transaction helper)
  • Checkpoint support
  • Read-only connections
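
Transactions are not covered in the Quick Start; a minimal sketch using plain SQL (the begin/commit/rollback and transaction helpers listed above offer the same control as convenience functions):

{:ok, conn} = DuckdbEx.Connection.connect("/path/to/database.duckdb")

DuckdbEx.Connection.execute(conn, "BEGIN TRANSACTION")
DuckdbEx.Connection.execute(conn, "INSERT INTO accounts VALUES (1, 100.0)")
DuckdbEx.Connection.execute(conn, "UPDATE accounts SET balance = balance - 10 WHERE id = 1")
DuckdbEx.Connection.execute(conn, "COMMIT")  # or "ROLLBACK" to discard the changes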

Relation API - Basic Operations:

  • Relation creation (sql, table, view, values)
  • Projections (project)
  • Filtering (filter)
  • Ordering (order, sort)
  • Limiting (limit with offset)
  • Lazy evaluation

Relation API - Aggregations:

  • Generic aggregation (aggregate)
  • GROUP BY support
  • HAVING clause (via filter after aggregate)
  • Convenience methods (count, sum, avg, min, max)
  • Custom aggregate expressions (e.g., stddev_pop, variance) via aggregate/2

Relation API - Advanced:

  • Joins (inner, left, right, outer, cross)
  • Set operations (union, intersect, except)
  • Distinct operations

Relation API - Mutations:

  • Create tables/views from relations (create, create_view)
  • Insert rows (insert, insert_into)
  • Update table relations (update)

File Format Support:

  • CSV reading/writing (read_csv, to_csv)
  • Parquet reading/writing (read_parquet, to_parquet)
  • JSON reading (read_json)
  • Direct file querying via SQL table functions

Performance:

  • Optimized query execution (100-200x faster via completion markers)
  • Tests run in ~1 second (previously took minutes due to timeouts)

📋 Planned (Phase 2+)

  • Explorer DataFrame integration
  • Prepared statements
  • Extensions management
  • Streaming results

Contributing

This project follows strict Test-Driven Development (TDD):

  1. RED: Write failing tests first
  2. GREEN: Implement minimal code to pass tests
  3. REFACTOR: Improve code while keeping tests green
  4. DOCUMENT: Add comprehensive docs and examples

All contributions should:

  • Include comprehensive tests
  • Follow existing code style
  • Reference Python API where applicable
  • Maintain 100% test pass rate

Comparison with Python API

# Python DuckDB
import duckdb
conn = duckdb.connect()
rel = conn.table('users')
result = (rel
  .filter('age > 25')
  .project(['name', 'email'])
  .order('name')
  .limit(10)
  .fetchall())

# Elixir DuckDB
{:ok, conn} = DuckdbEx.Connection.connect(:memory)
{:ok, rows} = conn
|> DuckdbEx.Connection.table("users")
|> DuckdbEx.Relation.filter("age > 25")
|> DuckdbEx.Relation.project(["name", "email"])
|> DuckdbEx.Relation.order("name")
|> DuckdbEx.Relation.limit(10)
|> DuckdbEx.Relation.fetch_all()

API is intentionally similar for easy migration!

Performance

DuckdbEx uses a completion marker approach for deterministic query completion detection instead of timeouts:

  • 100-200x faster query execution (7-12ms vs 1000-2000ms per query)
  • Full test suite runs in ~1 second (158 tests)
  • No arbitrary timeouts or guessing
  • Proper error handling for aborted transactions

How It Works

Instead of waiting for timeouts, we append a marker query after each command:

-- Your query
SELECT * FROM users;
-- Completion marker (added automatically)
SELECT '__DUCKDB_COMPLETE__' as __status__;

When we see the marker in the output, we know DuckDB is done. The marker is stripped before returning results to you.

Why This Approach

  • Deterministic: We know exactly when queries complete
  • Fast: No waiting for arbitrary timeouts
  • Reliable: Works for all query types (SELECT, DDL, DML)
  • Error-aware: Special handling for aborted transactions

Performance Considerations

  • DuckDB excels at analytical queries on large datasets
  • Relation API allows DuckDB to optimize entire query tree
  • JSON overhead is minimal compared to query execution time
  • Best for OLAP workloads, not OLTP

Requirements

  • Elixir 1.18+
  • Erlang/OTP 27+
  • DuckDB CLI installed (PATH or DUCKDB_PATH)

License

MIT License - see LICENSE file for details.


Made with ❤️ for the Elixir and DuckDB communities