Fix: Continuous Aggregate watermark in the future on full-refresh #17

Open
rogiervandergeer wants to merge 1 commit into sdebruyn:main from rogiervandergeer:fix/ca-refresh-end-offset

@rogiervandergeer

When performing a full-refresh on a continuous aggregate with a refresh policy, the watermark is sometimes incorrectly set to a future timestamp. This happens because the do_refresh_continuous_aggregate macro does not pass the end_offset from the model's configuration to the refresh_continuous_aggregate function, so the argument defaults to null. See also

This PR addresses the issue by:

  • Modifying the do_refresh_continuous_aggregate macro to correctly pass the end_offset from the model's refresh policy configuration. If no policy is defined, or end_offset is not set, it defaults to null.
  • Adding a new functional test test_continuous_aggregate_watermark.py to verify that the continuous aggregate watermark is always less than the current time after a full refresh, even when no data has been materialized.
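
The macro change in the first bullet can be illustrated with a small Python sketch. This is not the macro from this PR (which is written in Jinja and not shown here). The helper names `refresh_window_end` and `build_refresh_call`, the `refresh_policy` dictionary shape, and the choice to express the window end as `now() - end_offset` are all assumptions made for illustration.

```python
from typing import Optional


def refresh_window_end(refresh_policy: Optional[dict]) -> str:
    """Return the SQL expression used as the end of the refresh window.

    Hypothetical helper: mirrors the fallback described above, where a
    missing policy or a missing end_offset results in NULL.
    """
    end_offset = (refresh_policy or {}).get("end_offset")
    if end_offset is None:
        return "NULL"
    # Assumption: the offset is applied relative to the current time,
    # the same way a refresh policy interprets end_offset.
    return f"now() - {end_offset}"


def build_refresh_call(relation: str, refresh_policy: Optional[dict] = None) -> str:
    """Render a manual refresh statement for the given continuous aggregate."""
    window_end = refresh_window_end(refresh_policy)
    # window_start is left as NULL here, since a full refresh covers all history.
    return f"CALL refresh_continuous_aggregate('{relation}', NULL, {window_end});"


if __name__ == "__main__":
    print(build_refresh_call("my_cagg", {"end_offset": "interval '1 hour'"}))
    print(build_refresh_call("my_cagg"))
```

With an `end_offset` of `interval '1 hour'`, the rendered statement ends the refresh window one hour before the current time instead of leaving it open-ended, which is the behavior the fix aims for.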

This ensures that continuous aggregates behave as expected during full refreshes, preventing data visibility issues caused by a future watermark.

When performing a full-refresh on a continuous aggregate with a refresh policy, the watermark was sometimes incorrectly set to a future timestamp. This occurred because the `do_refresh_continuous_aggregate` macro was not passing the `end_offset` from the model's configuration to the `refresh_continuous_aggregate` function, defaulting it to `null`.

This commit addresses the issue by:
- Modifying the `do_refresh_continuous_aggregate` macro to correctly pass the `end_offset` from the model's configuration. If `end_offset` is not explicitly defined, it defaults to `null`.
- Adding a new functional test `test_continuous_aggregate_watermark.py` to verify that the continuous aggregate watermark is always less than the current time after a full refresh, even when no data has been materialized.
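
The condition that the new test verifies can be sketched as a plain comparison. How `test_continuous_aggregate_watermark.py` actually reads the watermark from the database is not shown in this PR description, so the sketch below only covers the final check; the function name `watermark_in_past` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def watermark_in_past(watermark: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the watermark lies strictly before the current time.

    Hypothetical helper: the watermark is assumed to be already fetched
    and converted to a timezone-aware datetime.
    """
    if watermark.tzinfo is None:
        # Comparing naive and aware datetimes raises TypeError, so fail early
        # with a clearer message.
        raise ValueError("watermark must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    return watermark < now


if __name__ == "__main__":
    reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # A watermark one hour in the past passes the check.
    print(watermark_in_past(reference - timedelta(hours=1), now=reference))
    # A watermark in the future is the failure case described in this commit.
    print(watermark_in_past(reference + timedelta(days=1), now=reference))
```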

This ensures that continuous aggregates behave as expected during full refreshes, preventing data visibility issues caused by a future watermark.