Conversation
…d surrogate characters
Member
Author
Yay!
Member
I can't read Java, though? 🫠
Siro256
reviewed
Jan 28, 2026
Comment on lines
59
to
67
private static int calcBufferLength(String src) {
    int srcCodePointCount = src.codePointCount(0, src.length());

    int offset = src.offsetByCodePoints(0, srcCodePointCount - 1);
    int lastCodePoint = src.codePointAt(offset);
    boolean isSevenBitsCode = lastCodePoint <= SEVEN_BITS_CP_FINAL;

    return (isSevenBitsCode ? (srcCodePointCount - 1) * 15 + 7 : srcCodePointCount * 15) / 8;
}
Member
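Rather than sweeping over every character to count code points, it might be better to assume the input is valid Base32768 encoding and compute the buffer length from the character count?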
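A minimal sketch of why the character count could stand in for the code point count. It assumes the encoded text contains only BMP characters, which the `char`-indexed tables elsewhere in this PR suggest but which is not confirmed here. Under that assumption `String.length()` and `codePointCount` agree, and they differ only when surrogate pairs appear. The class and method names are hypothetical and not part of the PR.

```java
// Hypothetical helper for illustration; not part of the PR.
final class CodePointCountSketch {
    /** Returns true when src contains no surrogate pairs, so length() equals the code point count. */
    static boolean lengthMatchesCodePointCount(String src) {
        return src.codePointCount(0, src.length()) == src.length();
    }
}
```

When this holds for all valid input, `src.length()` is available in constant time and the full scan done by `codePointCount` can be skipped.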
Member
Author
Comment on lines
128
to
135
final int n = src.length();
if (n == 0) return new byte[0];

int srcCodePointCount = src.codePointCount(0, src.length());
byte[] bytes = new byte[calcBufferLength(src)];

// Not thread-safe
final var ref = new Object() {
    int srcIndex = 0;
    int dstIndex = 0;
    int buf = 0;
    int bufCount = 0;
};
src.codePoints().forEachOrdered(codePoint -> {
    Integer byteBase = TABLE.get(codePoint & ~31);
    if (byteBase == null) throw new IllegalBase32768TextException(ref.srcIndex + 1, codePoint);

    if (codePoint <= SEVEN_BITS_CP_FINAL) {
        if (ref.srcIndex != srcCodePointCount - 1) throw new IllegalBase32768TextException(codePoint);
        ref.buf = (ref.buf << 7) + byteBase + codePoint % 32;
        ref.bufCount += 7;
    } else {
        ref.buf = (ref.buf << 15) + byteBase + codePoint % 32;
        ref.bufCount += 15;
final char last = src.charAt(n - 1);
final int lastBits = LAST_BITS[last] & 0xFF;
if (lastBits == 0) throw new IllegalBase32768TextException(n - 1, last);

final int outLen = ((n - 1) * 15 + lastBits) >>> 3;
Member
It might be good to use this as the implementation of calcBufferLength?
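The length formula in the quoted lines can be sketched in isolation as follows. This is an illustration rather than the PR's actual `calcBufferLength`: the class name, method name, and parameters are hypothetical, and it assumes every character except the last carries 15 bits while the last carries either 7 or 15.

```java
// Hypothetical sketch for illustration; not the PR's actual calcBufferLength.
final class BufferLengthSketch {
    /**
     * @param charCount number of UTF-16 chars in the encoded text
     * @param lastBits  bits carried by the final char, assumed to be 7 or 15
     * @return number of whole bytes the text decodes to
     */
    static int decodedLength(int charCount, int lastBits) {
        if (charCount == 0) return 0;
        // All chars but the last carry 15 bits; the shift divides by 8
        // and discards any leftover padding bits.
        return ((charCount - 1) * 15 + lastBits) >>> 3;
    }
}
```

For example, two characters ending in a 7-bit character give (15 + 7) >>> 3 = 2 bytes, and a single 15-bit character gives 15 >>> 3 = 1 byte.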
Member
Author
It now has the same implementation as calcBufferLength, but if the original implementation was not what was intended, the same may be true here.
Comment on lines
22
to
23
private static final char[] DECODE = new char[1 << 16];
private static final byte[] LAST_BITS = new byte[1 << 16];
Member
If the tables for the 15-bit range and the 7-bit range are split, I think the table size could be reduced from 2^17 to 2^11 + 2^8.
Member
Author
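For now, as the approach that needs the fewest changes, I trimmed DECODE as far as I could. I think this gives roughly the same reduction as splitting the tables would; what do you think? 🤔
I made LAST_BITS block-based, which shrinks it considerably. It adds one shift operation, but the performance impact should be within the margin of error.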
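A rough sketch of what a block-based `LAST_BITS` lookup could look like. It assumes the alphabet is laid out in blocks of 32 consecutive characters aligned to multiples of 32, as the `codePoint & ~31` masking in the earlier code suggests. The class name, table name, and the two block starts are placeholders chosen for illustration, not the real Base32768 alphabet or the PR's actual code.

```java
// Hypothetical sketch; the block starts below are placeholders, not the real alphabet.
final class LastBitsByBlockSketch {
    // One entry per 32-char block: 2^16 / 32 = 2^11 entries instead of 2^16.
    private static final byte[] LAST_BITS_BY_BLOCK = new byte[1 << 11];

    static {
        LAST_BITS_BY_BLOCK[0x0180 >>> 5] = 7;   // placeholder block of 7-bit chars
        LAST_BITS_BY_BLOCK[0x4E00 >>> 5] = 15;  // placeholder block of 15-bit chars
    }

    /** Returns 7 or 15 for a char in a registered block, or 0 for any other char. */
    static int lastBits(char c) {
        // The extra shift maps all 32 chars of a block to the same table entry.
        return LAST_BITS_BY_BLOCK[c >>> 5] & 0xFF;
    }
}
```

The trade-off is the one described in the comment: one additional shift per lookup in exchange for a table 32 times smaller.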
Member
Author
I don't know why I didn't do it this way from the start, but that aside, I'm worried I may have broken something.
The tests pass, though...
Co-authored-by: Siro / MIYAGI Naoki <[email protected]>
…r length calculation