GH-49614: [C++] Fix silent truncation in base64_decode on invalid input#49660
GH-49614: [C++] Fix silent truncation in base64_decode on invalid input#49660Reranko05 wants to merge 3 commits intoapache:mainfrom
Conversation
|
|
4670ec5 to
5c7db64
Compare
|
Thanks for the feedback. I’ve updated the implementation and tests.
All tests pass locally. Please let me know if any further adjustments are needed. |
5c7db64 to
8f053b7
Compare
kou
left a comment
There was a problem hiding this comment.
Could you use arrow::Result<std::string> return type instead of using ARROW_LOG()?
|
FYI: You can run CI on your fork by enabling GitHub Actions on your fork. |
Thanks for the suggestion @kou ! Just to clarify, would you prefer changing the existing base64_decode API to return arrow::Resultstd::string, or introducing a separate checked variant while keeping the current API unchanged? I want to make sure the approach aligns with existing usage and expectations. |
|
"changing the existing base64_decode API to return arrow::Resultstd::string". |
|
@kou I checked the current usages of Updating to I can proceed with the API change and update the affected call sites accordingly. |
|
Hi @kou, just following up on this. I can proceed with updating Happy to proceed based on your guidance. |
|
Oh, sorry. I forgot to reply this... Yes. Let's proceed with |
8f053b7 to
34a388c
Compare
|
Hi @kou, thanks for confirming! I’ve updated All tests are passing locally. Please let me know if you’d like any changes or adjustments. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a correctness issue in Arrow’s C++ base64 decoder by ensuring malformed base64 input is detected instead of producing silently truncated/partial output.
Changes:
- Adds pre-validation for base64 input (length, padding placement, invalid characters) in
base64_decode. - Changes
base64_decodeAPI to returnarrow::Result<std::string>withStatus::Invalidon malformed input. - Adds unit tests covering valid/invalid decoding cases (and adjusts a couple of
ToCharsassertions).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| cpp/src/arrow/vendored/base64.cpp | Adds base64 input validation and switches decode to return errors instead of partial output. |
| cpp/src/arrow/util/base64.h | Updates public API signature of base64_decode to return Result<std::string>. |
| cpp/src/arrow/util/string_test.cc | Adds tests for base64 decode validity/error cases and tweaks ToChars expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
34a388c to
ed84348
Compare
|
Hi @kou, I have addressed all review comments:
All tests pass locally. |
|
Could you enable GitHub Actions on your fork to run CI on your fork too? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0446f41 to
e35212c
Compare
|
Could you push e.g.: git remote add https://github.com/apache/arrow.git
git fetch --all --tags --prune --force
git push --tags origin |
e658d8f to
22e6c2c
Compare
|
Hi @kou, I have addressed all review comments. All checks are passing now |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
22e6c2c to
98e24de
Compare
|
Hi @kou, I have addressed the remaining feedback by using |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
kou
left a comment
There was a problem hiding this comment.
+1
@dmitry-chirkov-dremio Do you want to review this before we merge this?
Yes, give me 24h. |
dmitry-chirkov-dremio
left a comment
There was a problem hiding this comment.
Looks good to me however won't approve until conversations are resolved (including Copilot's comments). Leaving couple of comments of my own as well.
Good job, @Reranko05 - almost there.
cc @kou
98e24de to
3dddf70
Compare
Thank you @dmitry-chirkov-dremio, I've addressed all review comments and resolved outstanding threads. |
Rationale for this change
arrow::util::base64_decodepreviously allowed invalid input to be processed, which could result in silently truncated or incorrect output without signaling an error. This can lead to unintended data corruption.What changes are included in this PR?
base64_decodeto returnarrow::Result<std::string>instead ofstd::stringStatus::Invalid) for invalid input instead of producing partial outputResult<std::string>Are these changes tested?
Yes. Unit tests have been added to verify:
Are there any user-facing changes?
arrow::Result<std::string>instead ofstd::stringStatus::Invalid) instead of returning partial or incorrect outputThis PR contains a "Critical Fix".
This change fixes a correctness issue where invalid base64 input could result in silently truncated output, leading to incorrect data being produced. The fix ensures such inputs are detected and reported properly.