Skip to content

con.table() materialises the full result set when called on a registered view #358

@RobinL

Description

@RobinL

What happens?

Calling con.table(name) on a name registered via con.register() immediately executes and materialises the entire result set, rather than returning a lazy relation. The equivalent con.sql("FROM name") returns a lazy relation with no immediate execution.

The description (from the docs) of the con.table() method is:

"Create a relation object for the named table"

So I can see an argument that this is not a bug. And the user probably should be using con.view(name).

But if it is not a bug, I think con.table() should raise an error when called against a view, or the docs should explicitly say that it will create a table corresponding to the view.

However, I think it is probably unintentional, and it's a bit of a footgun - DuckDB is usually lazy, and nowhere does it explicitly say that a call to con.table(name) will materialise the full table.

I am not a c++ programmer, and the following comes from investigation using an LLM, but the c++ seems to substantiate the view that this behaviour is unintentional

Evidence this behaviour is unintentional

Why this appears to be a code oversight rather than intended behaviour

RunQuery (the internal method that con.sql() routes through) already has a lazy path. It only takes that path when the params argument is None:

// In RunQuery — the lazy path is explicitly gated on params being None:
if (py::none().is(params)) {
    // creates a lazy QueryRelation — no execution yet
}
// otherwise: executes immediately and materialises into a ColumnDataCollection

When con.sql() is exposed to Python, params is explicitly defaulted to None to ensure the lazy path is taken:

m.def("sql", &DuckDBPyConnection::RunQuery,
      py::arg("query"), py::kw_only(), py::arg("alias") = "", py::arg("params") = py::none());

However, when con.table() falls back to RunQuery for views, it omits the params argument entirely:

unique_ptr<DuckDBPyRelation> DuckDBPyConnection::Table(const string &tname) {
    try {
        return make_uniq<DuckDBPyRelation>(connection.Table(...));  // fails for views
    } catch (const CatalogException &) {
        auto sql_query = StringUtil::Format("from %s", ...);
        return RunQuery(py::str(sql_query), tname);  // <-- params not passed
    }
}

When params is omitted, the C++ default kicks in — which is py::list(), not py::none(). This means the lazy branch is never taken, and the result is always fully materialised.

The lazy infrastructure is in place and working correctly for con.sql(). The fallback in Table() appears to have simply omitted the py::none() argument that would enable it.

To Reproduce

Run with uv run this_script.py

# /// script
# dependencies = [
#   "duckdb==1.4.4",
#   "psutil",
# ]
# ///

"""
Reproducer for: con.table() on a registered view silently materialises all rows.

con.register() creates a temporary VIEW. When con.table() is called on that view,
it falls back to RunQuery() but passes py::list() (the C++ default) instead of
py::none() for the `params` argument. This skips the lazy QueryRelation path and
instead fully executes + materialises the query.

Expected: con.table("name") returns a lazy relation (same as con.sql("FROM name")).
Actual:   con.table("name") materialises the entire result set, taking O(n) time.
"""

import os
import time
import duckdb
import psutil


def rss_mb():
    """Current process RSS in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024


con = duckdb.connect()

# Create a table with 100M rows (~800 MB of BIGINT data)
con.sql("CREATE TABLE big AS SELECT range AS id FROM range(100_000_000)")

rel = con.sql("SELECT * FROM big")
con.register("my_view", rel)

# --- con.sql(): lazy, no materialisation ---
mem_before_sql = rss_mb()
start = time.perf_counter()
r1 = con.sql("FROM my_view")
t_sql = time.perf_counter() - start
mem_after_sql = rss_mb()

# --- con.table(): materialises all rows ---
mem_before_table = rss_mb()
start = time.perf_counter()
r2 = con.table("my_view")
t_table = time.perf_counter() - start
mem_after_table = rss_mb()

# --- Show the relation types (proves materialisation) ---
print("=== Relation types ===")
print(
    f"  con.sql()   returns: {type(r1.rel).__name__ if hasattr(r1, 'rel') else type(r1).__name__}"
)
print(
    f"  con.table() returns: {type(r2.rel).__name__ if hasattr(r2, 'rel') else type(r2).__name__}"
)
print()

# --- Timing ---
print("=== Timing ===")
print(f"  con.sql('FROM my_view'):  {t_sql:.4f}s")
print(f"  con.table('my_view'):     {t_table:.4f}s")
print(f"  table() is {t_table / t_sql:.0f}x slower")
print()

# --- Memory ---
print("=== Process RSS memory ===")
print(f"  Before con.sql():   {mem_before_sql:.0f} MB")
print(
    f"  After  con.sql():   {mem_after_sql:.0f} MB  (delta: +{mem_after_sql - mem_before_sql:.0f} MB)"
)
print(f"  Before con.table(): {mem_before_table:.0f} MB")
print(
    f"  After  con.table(): {mem_after_table:.0f} MB  (delta: +{mem_after_table - mem_before_table:.0f} MB)"
)
print()

# --- EXPLAIN shows the difference ---
print("=== EXPLAIN con.sql() result (lazy — references the view) ===")
print(r1.explain())
print()
print(
    "=== EXPLAIN con.table() result (materialised — scans a ColumnDataCollection) ==="
)
print(r2.explain())

OS:

Mac OS Tahoe 26.3

DuckDB Package Version:

1.4.4

Python Version:

3.14.0

Full Name:

Robin Linacre

Affiliation:

UK Ministry of Justice

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have

Ref: moj-analytical-services/uk_address_matcher#253, where I first fell foul of this behaviour

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions