fix(7407): added check for x1a in fread.c to avoid segfault #7570
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #7570 +/- ##
=======================================
Coverage 98.99% 98.99%
=======================================
Files 87 87
Lines 16729 16733 +4
=======================================
+ Hits 16561 16565 +4
Misses 168 168
src/fread.c
Outdated
if ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) return true;
if (!commentChar) return false;
return *ch == commentChar;
return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
The idea of the code above was to make the complicated single-line OR statement easier to grasp, so we would favor removing line 352 and adding a case for `if (*ch == '\x1A') return true;`
Got it. In the file, that code came after the long single-line return, so it was dead code, which is why I removed it. I'll do the reverse:
keep those lines and remove this one.
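As a rough illustration of the trade-off in this thread, the sketch below puts the single-line return next to the early-return chain with the extra `'\x1A'` case and counts disagreements over every byte value. The globals, the simplified `eol()`, and the helpers `count_mismatches()` and `check_equivalence()` are stand-ins invented for this example, not the actual `fread.c` code; the leading `if (*ch == sep)` line is assumed from the suggestion further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for fread.c state (assumptions for this sketch). */
static char sep = ',';
static char commentChar = '#';
static const char *eof = NULL;

/* Hypothetical eol(): only recognises a bare '\n' or '\r'.
   The real eol() in fread.c does more work than this. */
static bool eol(const char **pch) {
  return **pch == '\n' || **pch == '\r';
}

/* The single-line form from the diff under review. */
static bool end_of_field_oneline(const char *ch) {
  return *ch == sep || *ch == '\x1A' ||
         ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) ||
         (commentChar && *ch == commentChar);
}

/* The early-return form with one extra case for 0x1A. */
static bool end_of_field_steps(const char *ch) {
  if (*ch == sep) return true;
  if (*ch == '\x1A') return true;
  if ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) return true;
  if (!commentChar) return false;
  return *ch == commentChar;
}

/* Compare both forms for all 256 byte values, with and without a
   comment character, and with the byte at eof or before it. */
static int count_mismatches(void) {
  int mismatches = 0;
  char buf[2] = {0, 0};
  for (int with_comment = 0; with_comment < 2; with_comment++) {
    commentChar = with_comment ? '#' : '\0';
    for (int at_eof = 0; at_eof < 2; at_eof++) {
      eof = at_eof ? buf : buf + 1;
      for (int b = 0; b < 256; b++) {
        buf[0] = (char)b;
        if (end_of_field_oneline(buf) != end_of_field_steps(buf)) mismatches++;
      }
    }
  }
  return mismatches;
}

/* Aborts if the two forms ever disagree. */
static void check_equivalence(void) {
  assert(count_mismatches() == 0);
}
```

Under these assumptions, calling `check_equivalence()` from a `main` passes silently: the two forms agree, so the choice between them is about readability rather than behavior.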
src/fread.c
Outdated
if ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) return true;
if (!commentChar) return false;
return *ch == commentChar;
return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
- return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
+ if (*ch == sep) return true;
+ if (ch == eof) return true; // Check eof first to avoid reading past #7407
+ if ((uint8_t)*ch <= 13 && eol(&ch)) return true;
+ if (!commentChar) return false;
+ return *ch == commentChar;
While I couldn't come up with an example other than 0x1A, it seems safer to check for eof instead of checking for 0x1A specifically.
Yeah, adding the eof check should also work. The bug was that `ch == eof` was ANDed with `(uint8_t)*ch <= 13`, but `\x1A` (26) is greater than 13, so the eof test was never reached.
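The failure mode described in this exchange can be sketched in isolation. The example below assumes, as the reply implies, that the byte sitting at `eof` can be `0x1A`; the buffer, the simplified `eol()`, and the helpers `field_length()` and `demo_sub_at_eof()` are hypothetical names for this illustration, and comment-character handling is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for fread.c state (assumptions for this sketch). */
static const char sep = ',';
static const char *eof = NULL;

/* Hypothetical eol(): only recognises a bare '\n' or '\r'. */
static bool eol(const char **pch) {
  return **pch == '\n' || **pch == '\r';
}

/* Pre-fix shape: the eof test sits behind the "<= 13" guard. */
static bool end_of_field_old(const char *ch) {
  if (*ch == sep) return true;
  return (uint8_t)*ch <= 13 && (ch == eof || eol(&ch));
}

/* Suggested shape: eof is tested before looking at the byte's value. */
static bool end_of_field_new(const char *ch) {
  if (*ch == sep) return true;
  if (ch == eof) return true;
  return (uint8_t)*ch <= 13 && eol(&ch);
}

/* Scan forward until pred() reports end of field. Returns the number of
   bytes scanned, or -1 if no end was found within `limit` bytes. */
static ptrdiff_t field_length(const char *start, size_t limit,
                              bool (*pred)(const char *)) {
  for (size_t i = 0; i < limit; i++) {
    if (pred(start + i)) return (ptrdiff_t)i;
  }
  return -1;
}

/* Four data bytes, then 0x1A at eof, then padding standing in for
   whatever memory happens to follow the input. */
static void demo_sub_at_eof(void) {
  char buf[] = {'a', 'a', 'a', 'a', '\x1A', 'x', 'x', 'x'};
  eof = buf + 4;
  /* Old form walks straight past eof: 0x1A (26) fails the "<= 13" guard. */
  assert(field_length(buf, sizeof buf, end_of_field_old) == -1);
  /* New form stops exactly at eof. */
  assert(field_length(buf, sizeof buf, end_of_field_new) == 4);
}
```

In this toy setup the padding bytes keep the overrun inside the buffer; in a real parser nothing guarantees that, which is presumably how the long line in #7407 ended in a segfault.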
NEWS.md
Outdated
3. `fread("file://...")` works for file URIs with spaces, [#7550](https://github.com/Rdatatable/data.table/issues/7550). Thanks @aitap for the report and @MichaelChirico for the PR.

4. `fread(text = paste0("foo\n", strrep("a", 4096*100), "\x1a"))` gives seg fault [#7407](https://github.com/Rdatatable/data.table/issues/7407)which is solved by adding check for `\x1A` at `end_of_field`. Thanks @aitap for the report.
Maybe something like:
`fread(text=)` could segfault when reading text input ending with a `\x1a` (ASCII SUB) character after a long line,
inst/tests/tests.Rraw
Outdated
# 7407 Test for fread() handling \x1A (ASCII SUB) at end of input
fread_sub_test_txt = paste0("foo\n", strrep("a", 4096 * 100), "\x1A")
test(2358.1, {
We only use `{` in tests if it makes them clearer and more concise.
Setting up the string with `txt = paste0("foo\n", strrep("a", 4096 * 100), "\x1A")` is fine.
For the test, maybe check `nchar(fread(txt))`, which ensures that the whole string is read?
caa5e6b to b87930a
Updated NEWS.md with fixes and enhancements for fread and sum functions.
Fixed formatting and clarified the segfault issue for fread.
Closes #7407
Adding `ch == eof` to `end_of_field` to fix seg fault at fread.
Removed single-line condition code from `end_of_field`.