-
Notifications
You must be signed in to change notification settings - Fork 2k
A few TPCH benchmark queries are incorrect causing issues when scale factor > 1 #21368
Description
Describe the bug
I noticed during my work to migrate many of the benchmarks to be primarily sql based that tpch query 11 had no results when the sf wasn't 1. During the investigation I uncovered a similar issue with duckdb and discovered that a ticket for that db has also been created (duckdb/duckdb#17965). quoting from that ticket:
The definition of TPC-H Q11 (per https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf, page 47) selects:
SELECT
sum(ps_supplycost * ps_availqty) * [FRACTION]Where [FRACTION] is a substitution parameter defined as (2.4.3.11):
2.4.11.3 Substitution Parameters
Values for the following substitution parameter must be generated and used to build the executable query text:
[...]
2. FRACTION is chosen as 0.0001 / SF.
However, in the SQL for query 11 the fractional multiplier is hard-coded as 0.0001. For any scale factor other than SF1 this results in either too much (SF > 1) or too little (SF < 1) filtering, and the result is incorrect.
As well during a validation of the tpch queries I noticed as few other queries where slightly incorrect in that what should have been calculated intervals were hardcoded. For example query 14 has
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-10-01';when in fact it should be
where l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;;There are a few queries with a similar issue.
To Reproduce
RUST_LOG=info ./bench.sh tpch10 query=11
The output will have no rows.
Expected behavior
No response
Additional context
No response