Fix bug with handling of null values in dictionaries #70
Merged
Collaborator
Author
Unfortunately:
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #70 +/- ##
==========================================
+ Coverage 82.80% 83.02% +0.22%
==========================================
Files 15 15
Lines 1128 1143 +15
Branches 1128 1143 +15
==========================================
+ Hits 934 949 +15
Misses 132 132
Partials 62 62
☔ View full report in Codecov by Sentry.
Collaborator
Let's add a test case to show that we return nulls in the dictionary keys in cases when we would produce null values. I wonder if we should switch to using dictionary builders immediately if we're going through the effort to re-pack dictionaries at the end anyway.
Collaborator
Author
I pushed a test and simplified to just set keys to null as you suggested. I can confirm the test used to fail.
davidhewitt approved these changes on Feb 4, 2025
Currently, given the query `json_get_text(col, 'a')` on the data `['{"x": 0}', '{"x": 0}', '{"a": 1}']`, where `col` is a dictionary-encoded column originally with keys `[0, 0, 1]` and values `['{"x": 0}', '{"a": 1}']`, we return a dictionary with keys `[0, 0, 1]` and values `[null, 1]`.

But if you look at how `arrow-rs` builds up dictionaries, it always puts the nulls in the keys, not the values. The spec does not require this, but I think that things in `arrow-rs` or DataFusion assume it is so (based on panics I've seen in prod).

This PR works around those bugs elsewhere while we investigate them further.
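
To make the "nulls in the keys, not the values" idea concrete, here is a minimal sketch in plain Rust. It is only a model: `DictColumn` and `nulls_into_keys` are hypothetical names invented for illustration, not the `arrow-rs` `DictionaryArray` API and not necessarily how this PR implements the fix. It assumes a dictionary can be modeled as a vector of optional keys indexing into a vector of optional values.

```rust
/// A simplified stand-in for a dictionary-encoded string column.
/// This is a plain-Rust model for illustration, not an arrow-rs type.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    /// One entry per row: an index into `values`, or `None` for a null row.
    keys: Vec<Option<usize>>,
    /// The dictionary values; an entry may itself be `None` (a null value).
    values: Vec<Option<String>>,
}

impl DictColumn {
    /// Decode to the logical rows that the dictionary represents.
    fn decode(&self) -> Vec<Option<String>> {
        self.keys
            .iter()
            .map(|&k| k.and_then(|i| self.values[i].clone()))
            .collect()
    }

    /// Hypothetical helper: set every key that points at a null value to
    /// null, so that nullness is carried by the keys. The values are left
    /// untouched; any null value simply becomes unreferenced.
    fn nulls_into_keys(&self) -> DictColumn {
        let keys = self
            .keys
            .iter()
            .map(|&k| k.filter(|&i| self.values[i].is_some()))
            .collect();
        DictColumn {
            keys,
            values: self.values.clone(),
        }
    }
}
```

Applied to a column whose keys are `[0, 0, 1]` and whose first dictionary value is null, the helper yields keys `[null, null, 1]`, while the decoded rows stay the same. Consumers that only inspect the key null mask then see the first two rows as null.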