@slabko slabko commented Jan 9, 2026

TODO

  • Not scalar functions, per se, but this should also be fixed: the {ts ...} escape sequence generates DateTime values without fractional seconds, while other drivers generate timestamps with fractional seconds. For example,
    SELECT {ts '2024-01-15 10:00:00.500'} generates 2024-01-15 10:00:00.500 in the SQL Server ODBC Driver, but 2024-01-15 10:00:00 in the ClickHouse ODBC driver.
  • If statement parsing fails, the driver must forward the full, unmodified statement to ClickHouse. Execution errors are acceptable in this case: keeping the original query in ClickHouse logs and error output is required to debug queries generated by Power BI and other external tools accurately, without having to infer or reconstruct what the original query was.
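
One possible shape for the first TODO item, as a minimal sketch only: it assumes the escape could be rewritten to `toDateTime64` with a scale equal to the number of fractional digits in the literal. The function name `rewrite_ts_escape` and the regular expression are hypothetical and are not taken from the driver's code.

```python
import re

# Hypothetical pattern for the ODBC timestamp escape: {ts 'YYYY-MM-DD HH:MM:SS[.fraction]'}
TS_ESCAPE = re.compile(
    r"\{\s*ts\s+'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?'\s*\}",
    re.IGNORECASE,
)


def rewrite_ts_escape(sql):
    """Rewrite {ts '...'} escapes, keeping fractional seconds when present."""

    def replace(match):
        seconds, fraction = match.group(1), match.group(2)
        if fraction is None:
            # No fractional part: a plain DateTime is enough.
            return f"toDateTime('{seconds}')"
        # Assumption: the scale is the number of fractional digits written.
        return f"toDateTime64('{seconds}.{fraction}', {len(fraction)})"

    return TS_ESCAPE.sub(replace, sql)


print(rewrite_ts_escape("SELECT {ts '2024-01-15 10:00:00.500'}"))
```

With this rule the literal from the example above keeps its `.500` part instead of being truncated to whole seconds.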

Notes on the implementation

  1. The RAND function produces the same number if it is called multiple times within the same query. This limitation can be resolved, but it requires reworking the query parser in the ODBC driver. Additionally, the RAND function does not currently support a seed parameter.
  2. The driver currently does not support the Time and Time64 data types. As a result, some functions do not work as expected:
  • CURRENT_TIME and CURTIME return a datetime value (using now64()), which binds correctly to ODBC’s SQL_TIME_STRUCT, but can cause unexpected issues in SQL statements that rely on these functions returning only a time value.
  • CONVERT does not work with the SQL_TIME parameter.
  3. The WEEK function generally works, but it does not produce results consistent with other ODBC drivers. Specifically, it uses ClickHouse’s toISOWeek() function, which counts the first and last week of the year only if it has four or more days in that year. Because of this, the results differ slightly from what is expected (see the table below). This can be fixed; however, it would break many existing reports, so the current trade-off is to leave it as is.

    Date        Expected  Actual
    2018-01-01  1         1
    2022-01-01  1         52
    2023-01-01  1         52
    2018-12-31  53        1
    2023-12-31  53        52
    2024-12-31  53        52
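
The difference between the two numbering rules can be reproduced outside the driver. The sketch below is an illustration, not driver code: `iso_week` uses Python's ISO 8601 calendar (the rule `toISOWeek()` follows), while `jan1_week` assumes the "Expected" column follows a convention where week 1 always contains January 1 and weeks start on Sunday. Both function names are hypothetical.

```python
from datetime import date


def iso_week(d):
    """ISO 8601 week number: week 1 is the week containing the year's first Thursday."""
    return d.isocalendar()[1]


def jan1_week(d):
    """Assumed 'expected' convention: week 1 contains January 1, weeks start on Sunday."""
    # date.weekday() is Monday=0 .. Sunday=6; shift so that Sunday=0.
    jan1_offset = (date(d.year, 1, 1).weekday() + 1) % 7
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1 + jan1_offset) // 7 + 1


for d in (date(2018, 1, 1), date(2022, 1, 1), date(2018, 12, 31), date(2023, 12, 31)):
    print(d.isoformat(), jan1_week(d), iso_week(d))
```

Under the ISO rule, January 1 can still belong to the last week of the previous year and December 31 to week 1 of the next year, which is where the mismatches at year boundaries come from.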

TODO

String

  • ASCII( string_exp ) 
  • BIT_LENGTH( string_exp ) 
  • CHAR( code ) 
  • CHAR_LENGTH( string_exp ) 
  • CHARACTER_LENGTH( string_exp ) 
  • CONCAT( string_exp1,string_exp2) 
  • DIFFERENCE( string_exp1,string_exp2)
  • INSERT( string_exp1, start, length, string_exp2) 
  • LCASE( string_exp ) 
  • LEFT( string_exp, count) 
  • LENGTH( string_exp ) 
  • LOCATE( string_exp1, string_exp2[, start]) 
  • LTRIM( string_exp ) 
  • OCTET_LENGTH( string_exp ) 
  • POSITION( character_exp IN character_exp) 
  • REPEAT( string_exp, count) 
  • REPLACE( string_exp1, string_exp2, string_exp3) 
  • RIGHT( string_exp, count) 
  • RTRIM( string_exp ) 
  • SOUNDEX( string_exp ) 
  • SPACE( count ) 
  • SUBSTRING( string_exp, start, length ) 
  • UCASE( string_exp ) 

Numeric

  • ABS( numeric_exp ) 
  • ACOS( float_exp ) 
  • ASIN( float_exp ) 
  • ATAN( float_exp ) 
  • ATAN2( float_exp1, float_exp2) 
  • CEILING( numeric_exp ) 
  • COS( float_exp ) 
  • COT( float_exp ) 
  • DEGREES( numeric_exp ) 
  • EXP( float_exp ) 
  • FLOOR( numeric_exp ) 
  • LOG( float_exp ) 
  • LOG10( float_exp ) 
  • MOD( integer_exp1, integer_exp2) 
  • PI( ) 
  • POWER( numeric_exp, integer_exp) 
  • RADIANS( numeric_exp ) 
  • RAND([integer_exp]) 
  • ROUND( numeric_exp, integer_exp) 
  • SIGN( numeric_exp ) 
  • SIN( float_exp ) 
  • SQRT( float_exp ) 
  • TAN( float_exp ) 
  • TRUNCATE( numeric_exp, integer_exp) 

Date/Time

  • CURRENT_DATE( ) 
  • CURRENT_TIME[( time-precision )] 
  • CURTIME( ) 
  • CURRENT_TIMESTAMP [( timestamp-precision )] 
  • CURDATE( ) 
  • DAYNAME( date_exp ) 
  • DAYOFMONTH( date_exp ) 
  • DAYOFWEEK( date_exp ) 
  • DAYOFYEAR( date_exp ) 
  • EXTRACT( extract-field FROM extract-source ) 
  • HOUR( time_exp ) 
  • MINUTE( time_exp ) 
  • MONTH( date_exp ) 
  • MONTHNAME( date_exp ) 
  • NOW( ) 
  • QUARTER( date_exp ) 
  • SECOND( time_exp ) 
  • TIMESTAMPADD( interval, integer_exp, timestamp_exp ) 
  • TIMESTAMPDIFF( interval, timestamp_exp1, timestamp_exp2 ) 
  • WEEK( date_exp ) 
  • YEAR( date_exp ) 

System

  • DATABASE( ) 
  • IFNULL( exp, value ) 
  • USER( ) 

Conversion

  • CONVERT( value, type )

@slabko slabko force-pushed the scalar-functions branch 2 times, most recently from f82bc3d to 9929001 Compare January 9, 2026 19:23
slabko added 3 commits January 9, 2026 20:57
Scalar functions are declared in the same namespace (enum) as other
tokens, such as COMMA, SPACE, LPARENT, etc. This makes scalar function
names conflict with token names. For example, there is a SPACE token
and a SPACE scalar function. To avoid naming conflicts, all scalar
functions now have an FN_ prefix.

slabko commented Jan 9, 2026

There seems to be a problem in our staging Cloud environment: it swaps the positions of haystack and needle. This should return 7, but it returns 0:

SELECT locate('World', 'Hello World')

This, however, returns 7, although it should return 0 (according to the documentation and all other non-staging instances):

SELECT locate('Hello World', 'World')

UPDATE: It turns out that function_locate_has_mysql_compatible_argument_order is disabled on staging, so locate takes its arguments in the same order as position does.

The order of parameters in `locate` can be configured, which might cause
problems in environments with non-default settings. `position` always
works the same way in that respect.

Previously, the driver processed ODBC escape sequences before replacing
ODBC parameter markers (?) with ClickHouse positional parameters. This
caused incorrect parameter ordering for functions whose argument order
differs between the ODBC standard and ClickHouse (for example, LOCATE →
position).

Example (old behavior):

{FN LOCATE(?, ?)}
  → position(?, ?)
  → position({odbc_parameter_1:String}, {odbc_parameter_2:String})

At this point, the original parameter order is lost, making it
impossible to reorder arguments correctly. For LOCATE, the correct
mapping requires the second parameter to come first.

New behavior:

{FN LOCATE(?, ?)}
  → {FN LOCATE({odbc_parameter_1:String}, {odbc_parameter_2:String})}
  → position({odbc_parameter_2:String}, {odbc_parameter_1:String})

This change fixes the issue by replacing ODBC parameter markers before
processing ODBC escape sequences. The parser and lexer were also
extended to support ClickHouse positional parameter syntax.
For example:

SELECT {FN INSERT(?, ?, ?, ?)}

Additionally, add a mapping from ODBC's INSERT to ClickHouse's overlayUTF8.
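
The ordering fix described in the commit message above can be sketched in two passes. This is a minimal illustration, not the driver's parser: `bind_markers`, `rewrite_locate`, and `translate` are hypothetical names, every parameter is typed as `String` as in the commit message example, and the regular expressions ignore cases a real lexer must handle (for instance a `?` inside a string literal, or nested expressions as arguments).

```python
import re


def bind_markers(sql):
    """Pass 1: replace each ODBC `?` marker with a numbered ClickHouse parameter."""
    counter = 0

    def number(_match):
        nonlocal counter
        counter += 1
        return f"{{odbc_parameter_{counter}:String}}"

    return re.sub(r"\?", number, sql)


# An argument is either an already-numbered parameter or a simple quoted literal.
ARG = r"(\{odbc_parameter_\d+:String\}|'[^']*')"
LOCATE = re.compile(
    r"\{\s*FN\s+LOCATE\(\s*" + ARG + r"\s*,\s*" + ARG + r"\s*\)\s*\}",
    re.IGNORECASE,
)


def rewrite_locate(sql):
    """Pass 2: LOCATE(needle, haystack) becomes position(haystack, needle)."""
    return LOCATE.sub(r"position(\2, \1)", sql)


def translate(sql):
    # Numbering happens first, so swapping arguments later cannot change
    # which bound value each placeholder refers to.
    return rewrite_locate(bind_markers(sql))


print(translate("SELECT {FN LOCATE(?, ?)}"))
```

Because the placeholders carry their original index before any reordering takes place, the second ODBC parameter still binds to the haystack after the swap.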
@slabko slabko force-pushed the scalar-functions branch 2 times, most recently from b47d18a to 7e0e0b1 Compare January 20, 2026 17:26
@slabko slabko force-pushed the scalar-functions branch 2 times, most recently from 3dd0ea0 to 7032aa1 Compare January 22, 2026 13:42
@slabko slabko marked this pull request as ready for review January 26, 2026 09:28
@slabko slabko requested a review from mzitnik as a code owner January 26, 2026 09:28
Previously, the parser would produce a partially processed query if it managed
to parse part of the function. This led to confusing queries and even more
confusing error messages. It was also almost impossible to decipher what the
original query was.

Now, the parser either parses the entire query or returns the original query
unchanged. This will still produce an error message, but at least it will be
clear what the original query was, making it easier to debug.
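
The all-or-nothing behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the driver's implementation: the `KNOWN` mapping covers only two functions, and `ParseError`, `translate_strict`, and `translate_or_passthrough` are hypothetical names.

```python
import re


class ParseError(Exception):
    """Raised when any part of the statement cannot be translated."""


# Hypothetical, deliberately tiny mapping used only for this illustration.
KNOWN = {"UCASE": "upper", "LCASE": "lower"}
ESCAPE = re.compile(r"\{\s*FN\s+(\w+)\((.*?)\)\s*\}", re.IGNORECASE)


def translate_strict(sql):
    def replace(match):
        name = match.group(1).upper()
        if name not in KNOWN:
            raise ParseError(f"unsupported function: {name}")
        return f"{KNOWN[name]}({match.group(2)})"

    out = ESCAPE.sub(replace, sql)
    # Any escape sequence left over means part of the query was not parsed.
    if re.search(r"\{\s*FN\b", out, re.IGNORECASE):
        raise ParseError("unparsed escape sequence")
    return out


def translate_or_passthrough(sql):
    """Return the fully translated query, or the original text untouched."""
    try:
        return translate_strict(sql)
    except ParseError:
        return sql


print(translate_or_passthrough("SELECT {FN UCASE('a')}"))
print(translate_or_passthrough("SELECT {FN UCASE('a')}, {FN NOPE('b')}"))
```

The second call returns the statement exactly as written, including the `UCASE` escape that could have been translated on its own, so the server-side error refers to the query the client actually sent.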
@slabko slabko merged commit 5aea560 into master Jan 28, 2026
20 checks passed
@slabko slabko deleted the scalar-functions branch January 28, 2026 07:48