@frankslin
Contributor

No description provided.

@frankslin frankslin force-pushed the feature/support-dictionary-comments branch 2 times, most recently from ee7f55c to 5cae95e Compare January 15, 2026 01:51
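This commit fully implements the comment syntax and sorting rules for txt dictionaries, including a backward-compatible API design and command-line tool support.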

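 ## Comment syntax support

**Basic syntax:**
- Comment line: an entire line that starts with #
- Dictionary entry line: a tab-separated key/value pair
- Blank line: a line with no visible characters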
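
The three line types above can be told apart with a small classifier. This is a minimal sketch rather than the code from this PR: `LineKind` and `ClassifyLine` are hypothetical names, and it assumes that "no visible characters" means the line holds only spaces, tabs, or line-ending characters.

```cpp
#include <string>

// Hypothetical names for illustration; not taken from the OpenCC sources.
enum class LineKind { Blank, Comment, Entry };

// Classifies one line of a text dictionary.
// Assumption: a blank line may still contain spaces, tabs, or CR/LF.
LineKind ClassifyLine(const std::string& line) {
  if (line.find_first_not_of(" \t\r\n") == std::string::npos) {
    return LineKind::Blank;
  }
  if (line[0] == '#') {
    return LineKind::Comment;
  }
  // Anything else is treated as an entry; checking for the tab separator
  // is left to the entry parser.
  return LineKind::Entry;
}
```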

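**Comment block categories:**
- Header block: the comment block at the start of the file (everything before the last blank line that precedes the first dictionary entry)
- Footer block: the comment block at the end of the file (after the last dictionary entry)
- Attached block: a comment block directly adjacent to a dictionary entry line (no blank line in between)
- Floating block: a free-standing comment block (any comment block that does not meet the attach condition)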
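
One way to read the four categories is as a single pass over runs of consecutive comment lines. The sketch below is an interpretation of the rules, not the PR's implementation: `BlockKind`, `Block`, and `ClassifyCommentBlocks` are hypothetical names, "attached" is assumed to mean "immediately followed by an entry line", and each comment run inside the header region is reported as its own Header block.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not taken from the OpenCC sources.
enum class BlockKind { Header, Footer, Attached, Floating };

struct Block {
  BlockKind kind;
  std::vector<std::string> lines;
};

static bool IsBlank(const std::string& s) {
  return s.find_first_not_of(" \t\r\n") == std::string::npos;
}
static bool IsComment(const std::string& s) { return !s.empty() && s[0] == '#'; }
static bool IsEntry(const std::string& s) { return !IsBlank(s) && !IsComment(s); }

std::vector<Block> ClassifyCommentBlocks(const std::vector<std::string>& lines) {
  const std::size_t n = lines.size();
  std::size_t firstEntry = n, lastEntry = n;
  for (std::size_t i = 0; i < n; ++i) {
    if (IsEntry(lines[i])) {
      if (firstEntry == n) firstEntry = i;
      lastEntry = i;
    }
  }
  // The header region ends at the last blank line before the first entry.
  std::size_t headerEnd = 0;
  for (std::size_t i = 0; i < firstEntry; ++i) {
    if (IsBlank(lines[i])) headerEnd = i;
  }

  std::vector<Block> blocks;
  std::size_t i = 0;
  while (i < n) {
    if (!IsComment(lines[i])) { ++i; continue; }
    const std::size_t start = i;
    Block block{BlockKind::Floating, {}};  // Floating unless a rule below matches
    while (i < n && IsComment(lines[i])) block.lines.push_back(lines[i++]);
    if (start < headerEnd) {
      block.kind = BlockKind::Header;
    } else if (lastEntry != n && start > lastEntry) {
      block.kind = BlockKind::Footer;
    } else if (i < n && IsEntry(lines[i])) {
      block.kind = BlockKind::Attached;  // assumption: attaches to the entry below it
    }
    blocks.push_back(block);
  }
  return blocks;
}
```

Note that a comment run with no blank line between it and the first entry is classified as Attached rather than Header, which follows from the "last blank line" wording of the header rule.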

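**Sorting rules:**
- The smallest sortable unit is a dictionary entry plus its attached comment block
- Header/Footer blocks stay fixed at the start/end of the file
- Only the keys of dictionary entries are sorted, using a stable sort
- Floating blocks are inserted at their anchor positions after sorting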
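
The first and third rules can be illustrated with `std::stable_sort` over units that carry their own comments. This is a sketch under assumptions: `Unit` and `SortUnits` are hypothetical names, keys are compared byte-wise with `std::string::operator<` (the comparator in the real code may differ), and floating blocks are left out because the anchor rule is not spelled out here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sortable unit: one entry plus the comment lines attached to it.
struct Unit {
  std::vector<std::string> comments;
  std::string key;
  std::string value;
};

// Sorts units by key only. Because the sort is stable, entries that share a
// key keep their original relative order, and comments travel with their entry.
void SortUnits(std::vector<Unit>& units) {
  std::stable_sort(units.begin(), units.end(),
                   [](const Unit& a, const Unit& b) { return a.key < b.key; });
}
```

Header and footer blocks would simply be kept outside the vector being sorted, so they stay at the start and end of the file.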

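 ## Backward-compatible design

**Default behavior (preserveComments=false):**
- Fully compatible with previous versions
- A line starting with # throws an exception (the original behavior)
- Comment structure is neither parsed nor stored

**New behavior (preserveComments=true):**
- Lines starting with # are recognized as comments and do not raise an error
- Comment block structure is stored for use in sorting and serialization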
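
The two modes can be sketched as one parser with a defaulted flag, so existing callers keep the old behavior without changes. This is an illustration only: `ParseResult` and `ParseLines` are hypothetical, the exception type is an assumption, and skipping empty lines in both modes is also an assumption rather than something stated above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for illustration; not taken from the OpenCC sources.
struct ParseResult {
  std::vector<std::pair<std::string, std::string>> entries;
  std::vector<std::string> comments;  // filled only when preserveComments is true
};

ParseResult ParseLines(const std::vector<std::string>& lines,
                       bool preserveComments = false) {
  ParseResult result;
  for (const std::string& line : lines) {
    if (line.empty()) continue;  // assumption: empty lines are skipped in both modes
    if (line[0] == '#') {
      if (!preserveComments) {
        // Default mode: a comment line is an error, as before.
        throw std::runtime_error("unexpected comment line: " + line);
      }
      result.comments.push_back(line);
      continue;
    }
    const std::size_t tab = line.find('\t');
    if (tab == std::string::npos) {
      throw std::runtime_error("missing tab separator: " + line);
    }
    result.entries.emplace_back(line.substr(0, tab), line.substr(tab + 1));
  }
  return result;
}
```

Defaulting the flag to `false` is what keeps the change source-compatible: code that never passes the argument still sees an exception on `#` lines.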

 ## API changes

**Core API:**
- Lexicon::ParseLexiconFromFile(FILE* fp, bool preserveComments = false)
- TextDict::NewFromFile(FILE* fp, bool preserveComments = false)
- TextDict::NewFromSortedFile(FILE* fp, bool preserveComments = false)
- ConvertDictionary(..., bool preserveComments = false)

**Command-line tool:**
opencc_dict gains a -p, --preserve-comments option

Usage example:
```bash
 # Default behavior (backward compatible): fails on files that contain comments
opencc_dict -i input.txt -o output.txt -f text -t text

 # Preserve comments and sort
opencc_dict -i input.txt -o output.txt -f text -t text --preserve-comments
```

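 ## Implementation details

**Data structures:**
- CommentBlock: a comment block
- AnnotatedEntry: a dictionary entry together with its comments
- Lexicon now stores header/footer/annotated/floating blocks

**Core logic:**
- Rewrote ParseLexiconFromFile to support both parsing modes
- Implemented SortWithAnnotations so that comment blocks move with their entries
- Modified TextDict::SerializeToFile to emit comment blocks and blank lines correctly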
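
To make the serialization step concrete, here is a rough sketch of how such structures could be written back out. The type names `CommentBlock` and `AnnotatedEntry` come from the list above, but their fields, the `AnnotatedLexicon` wrapper, and the `Serialize` function are assumptions made for illustration; the real definitions in the PR may look different. Floating blocks are omitted.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Field layouts below are hypothetical, not the PR's actual definitions.
struct CommentBlock {
  std::vector<std::string> lines;
};

struct AnnotatedEntry {
  CommentBlock attached;  // comment lines directly above the entry
  std::string key;
  std::string value;
};

struct AnnotatedLexicon {
  CommentBlock header;
  std::vector<AnnotatedEntry> entries;
  CommentBlock footer;
};

static void WriteBlock(std::ostream& out, const CommentBlock& block) {
  for (const std::string& line : block.lines) out << line << '\n';
}

std::string Serialize(const AnnotatedLexicon& lexicon) {
  std::ostringstream out;
  if (!lexicon.header.lines.empty()) {
    WriteBlock(out, lexicon.header);
    out << '\n';  // blank line keeps the header from attaching to the first entry
  }
  for (const AnnotatedEntry& entry : lexicon.entries) {
    WriteBlock(out, entry.attached);
    out << entry.key << '\t' << entry.value << '\n';
  }
  WriteBlock(out, lexicon.footer);
  return out.str();
}
```

The blank line after the header matters for round-tripping: without it, a reparse would read the header as a block attached to the first entry.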

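 ## Tests

Added a test suite (LexiconAnnotationTest) covering:
- ParseCommentLines: parsing comment lines
- ParseAttachedComment: parsing attached comments
- ParseFloatingComment: parsing floating comments
- ParseFooterComment: parsing footer comments
- SerializeWithAnnotations: serialization with comments
- SortWithAnnotations: sorting with comments
- DefaultBehaviorIgnoresComments: verifies the default behavior
- DefaultBehaviorRejectsCommentLines: verifies backward compatibility

All 8 tests pass. Manual testing of the command-line tool showed it working correctly.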
@frankslin frankslin force-pushed the feature/support-dictionary-comments branch from 5cae95e to aca33bf Compare January 15, 2026 01:53
Add standardized headers listing the official config usage for each top-level dictionary file.
@frankslin frankslin force-pushed the feature/support-dictionary-comments branch from aca33bf to 34b4af5 Compare January 15, 2026 02:01
@frankslin frankslin force-pushed the feature/support-dictionary-comments branch from d96a31d to 2ed1fd4 Compare January 16, 2026 03:23
@BYVoid BYVoid merged commit 5810b60 into BYVoid:master Jan 16, 2026
25 checks passed
@frankslin frankslin deleted the feature/support-dictionary-comments branch January 16, 2026 18:10