Skip to content

Conversation

@manmita
Copy link
Contributor

@manmita manmita commented Jan 6, 2026

Closes #7407

Adding ch ==eof to end_of_field to fix seg fault at fread
Removed single lined condition code from end_of_field

@codecov
Copy link

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.99%. Comparing base (b472d86) to head (868e5be).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7570   +/-   ##
=======================================
  Coverage   98.99%   98.99%           
=======================================
  Files          87       87           
  Lines       16729    16733    +4     
=======================================
+ Hits        16561    16565    +4     
  Misses        168      168           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MichaelChirico MichaelChirico removed their request for review January 6, 2026 22:57
src/fread.c Outdated
if ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) return true;
if (!commentChar) return false;
return *ch == commentChar;
return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the code above was that the complicated single line OR statement gets easier to grasp, so we would favor removing line 352 and add an case for if (*ch == '\x1A') return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, in the file that code was written after the long single line one so it was dead code, hence removed it but I'll do the reverse,

keep those lines and remove this one

src/fread.c Outdated
if ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) return true;
if (!commentChar) return false;
return *ch == commentChar;
return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
Copy link
Member

@ben-schwen ben-schwen Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return *ch == sep || *ch == '\x1A'|| ((uint8_t)*ch <= 13 && (ch == eof || eol(&ch))) || (commentChar && *ch == commentChar);
if (*ch == sep) return true;
if (ch == eof) return true; // Check eof first to avoid reading past #7407
if ((uint8_t)*ch <= 13 && eol(&ch)) return true;
if (!commentChar) return false;
return *ch == commentChar;

While I couldn't come up with another example than 0x1A it seems safer to check for eof instead of checking for 0x1A specifically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, adding eof should also work, the bug was that ch==eof was in and with ch<=13 but x1A is greater

NEWS.md Outdated

3. `fread("file://...")` works for file URIs with spaces, [#7550](https://github.com/Rdatatable/data.table/issues/7550). Thanks @aitap for the report and @MichaelChirico for the PR.

4. `fread(text = paste0("foo\n", strrep("a", 4096*100), "\x1a"))` gives seg fault [#7407](https://github.com/Rdatatable/data.table/issues/7407)which is solved by adding check for `\x1A` at `end_of_field`. Thanks @aitap for the report.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like

fread(text=) could segfault when reading text input ending with a \x1a (ASCII SUB) character after a long line,


# 7407 Test for fread() handling \x1A (ASCII SUB) at end of input
fread_sub_test_txt = paste0("foo\n", strrep("a", 4096 * 100), "\x1A")
test(2358.1, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use { in tests if they make the tests clearer and concise.

Setting up the str with txt = paste0("foo\n", strrep("a", 4096 * 100), "\x1A") is fine.
For the test maybe test for nchar(fread(txt)) which ensure that the whole string is read?

@manmita manmita force-pushed the fix/7407_xia_bug_fix branch from caa5e6b to b87930a Compare January 8, 2026 19:59
manmita and others added 4 commits January 9, 2026 01:33
Updated NEWS.md with fixes and enhancements for fread and sum functions.
Fixed formatting and clarified the segfault issue for fread.
@manmita manmita requested a review from ben-schwen January 8, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fread() allergic to \x1a bytes

2 participants