Performance/reduce allocations, increase test coverage, configure code coverage, fix race condition#115
Conversation
Introduce DecodeMapTests with extensive NUnit test coverage for the DecodeMap class, verifying token ID assignment, reuse, string reconstruction, and handling of edge cases such as empty segments and separators. Ensures correct round-trip behavior and robustness across various scenarios.
Refactored DecodeMap to use non-nullable types and a List for token ID management, replaced string.Split with manual scope parsing for better performance, and optimized token assignment and string reconstruction logic.
Introduced StringUtilsTests with NUnit covering SubstringAtIndexes, SliceAtIndexes, IsValidHexColor, StrCmp, and StrArrCmp. Tests include normal, edge, and error cases to ensure correct handling of valid/invalid inputs and nulls.
Replaced regex-based hex color validation with fast, allocation-free character checks using new helper methods. Added argument validation to substring and slice methods. Updated string comparison logic to use reference equality. Improves performance and reduces allocations.
Upgraded the BenchmarkDotNet NuGet package in TextMateSharp.Benchmarks.csproj from version 0.14.0 to 0.15.8 to ensure compatibility with the latest features and improvements. No other changes were made.
Streamlined variable usage by assigning local variables to reduce repeated method calls and improve readability. Simplified logic in CompileCaptures with early null checks, single-pass maximum ID calculation, and proper list initialization. Improved CompilePatterns by precomputing pattern count, consistent handling of includes, and more efficient return logic. Benchmark results: // Global total time: 00:01:15 (75.82 sec), executed benchmarks: 2 // // * Summary * // // BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7840/25H2/2025Update/HudsonValley2) // AMD Ryzen 9 7945HX with Radeon Graphics 2.50GHz, 1 CPU, 32 logical and 16 physical cores // // .NET SDK 10.0.103 // [Host] : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4 // //| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio | //|------------- |---------:|----------:|----------:|-------------:|--------:|-------:|-------:|----------:|------------:| //| PreOptimized | 2.116 μs | 0.0419 μs | 0.0827 μs | baseline | | 0.4158 | 0.0038 | 6.8 KB | | //| Optimized | 2.035 μs | 0.0407 μs | 0.0841 μs | 1.04x faster | 0.06x | 0.4120 | 0.0038 | 6.75 KB | 1.01x less |
Introduces RegexSourceTests covering EscapeRegExpCharacters, HasCaptures, and ReplaceCaptures methods. Tests include special character escaping, capture detection, replacement logic, edge cases, and error handling. Utilizes NUnit and Moq for assertions and mocking. No changes to production code.
Refactored RegexSource.cs to improve performance, safety, and code clarity. Key changes include using spans for efficient string parsing, adding null checks, making the regex field readonly, optimizing capture index parsing, and reducing string allocations. These updates modernize the code and enhance reliability. Benchmark Results: // Global total time: 00:11:40 (700.15 sec), executed benchmarks: 12 // * Summary * // // BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7840/25H2/2025Update/HudsonValley2) // AMD Ryzen 9 7945HX with Radeon Graphics 2.50GHz, 1 CPU, 32 logical and 16 physical cores // // .NET SDK 10.0.103 // [Host] : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4 // //| Method | Categories | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio | //|------------------------------------------ |-------------------------------- |----------:|----------:|----------:|----------:|-------------:|--------:|-------:|----------:|------------:| //| EscapeRegExpCharacters_Plain | EscapeRegExpCharacters_Plain | 16.19 ns | 0.269 ns | 0.238 ns | 16.17 ns | baseline | | 0.0086 | 144 B | | //| EscapeRegExpCharacters_Plain_Optimized | EscapeRegExpCharacters_Plain | 16.47 ns | 0.364 ns | 0.357 ns | 16.46 ns | 1.02x slower | 0.03x | 0.0086 | 144 B | 1.00x more | //| | | | | | | | | | | | //| EscapeRegExpCharacters_Specials | EscapeRegExpCharacters_Specials | 85.03 ns | 2.223 ns | 6.379 ns | 84.67 ns | baseline | | 0.0248 | 416 B | | //| EscapeRegExpCharacters_Specials_Optimized | EscapeRegExpCharacters_Specials | 83.73 ns | 3.020 ns | 8.762 ns | 84.32 ns | 1.03x faster | 0.13x | 0.0248 | 416 B | 1.00x more | //| | | | | | | | | | | | //| HasCaptures_NoCapture | HasCaptures_NoCapture | 55.80 ns | 0.672 ns | 0.561 ns | 55.89 ns | baseline | | - | - | NA | //| HasCaptures_NoCapture_Optimized | HasCaptures_NoCapture | 55.79 ns | 0.725 ns | 0.678 ns | 55.67 ns | 1.00x faster | 0.02x | - | - | NA | //| | | | | | | | | | | | //| HasCaptures_WithCapture | HasCaptures_WithCapture | 99.10 ns | 2.801 ns | 8.169 ns | 99.42 ns | baseline | | 0.0162 | 272 B | | //| HasCaptures_WithCapture_Optimized | HasCaptures_WithCapture | 71.60 ns | 1.669 ns | 4.816 ns | 70.85 ns | 1.39x faster | 0.15x | - | - | NA | //| | | | | | | | | | | | //| ReplaceCaptures_Command | ReplaceCaptures_Command | 525.11 ns | 12.614 ns | 36.192 ns | 511.48 ns | baseline | | 0.0668 | 1128 B | | //| ReplaceCaptures_Command_Optimized | ReplaceCaptures_Command | 536.34 ns | 11.231 ns | 31.860 ns | 534.76 ns | 1.03x slower | 0.09x | 0.0591 | 1000 B | 1.13x less | //| | | | | | | | | | | | //| ReplaceCaptures_Numeric | ReplaceCaptures_Numeric | 420.10 ns | 9.319 ns | 25.666 ns | 414.95 ns | baseline | | 0.0525 | 880 B | | //| ReplaceCaptures_Numeric_Optimized | ReplaceCaptures_Numeric | 375.61 ns | 9.771 ns | 28.034 ns | 377.30 ns | 1.12x faster | 0.11x | 0.0496 | 832 B | 1.06x less |
Added atomic SetModel enforcement and safe model publication. Introduced GetModelIfAvailable to prevent premature TMModel callbacks. Improved documentation on lock ordering and initialization contracts.
|
Fixes #113 |
|
Thanks a lot for the work here. I'll take a look into it today. The added tests, race condition fix in AbstractLineList, and the extra validation/robustness improvements are valuable work thanks! On the performance side, I appreciate the detailed benchmarks. The gains are small overall but the lib already was quite optimized. Thanks again for your contribution! |
|
@danipen I created the extensive unit tests based on the pre-optimized code. This was done so i could ensure i didnt introduce any regressions. Also, i wasnt sure where in the library would make the most significant performance impact (memory or speed) so i just picked a couple files randomly. if you could point me to several files that could make such an impact Im happy to try and make more meaningful perf improvements. |
|
The last times I worked on allocation/perf improvements, I started from profiler data and focused strictly on the hottest paths. After a few rounds of that, most of the obvious wins were already addressed, so at this point the core is fairly optimized. That said, if you’d like to dig deeper using a profiler and see if something new pops up, please go ahead. Fresh eyes often spot things I might have missed. Really appreciate the initiative and the quality of the work you’ve put into this. |
|
absolutely! happy to contribute. i have TextMateSharp and AvaloniaEdit integrated into my MermaidPad desktop app (so i benefit from this too 😆) I was going to run a profiler or use memory dumps with WinDbg but wasnt sure the best way to run the app to profile it. i thought about using the benchmark app to profile against but i would expect that to skew things and send me down a rabbit hole. how did you run it when you profiled it? |
|
I used both Visual Studio integrated memory profiler in windows and dot memory in macos. |
right but did you use a real world app that uses TextMateSharp, some test harness, or something else like your benchmark harness? |
|
No I didn't. For that I'd stress the lib using AvaloniaEdit |
|
ok that makes sense. is there a specific app using AE that you used? i know theres the demo app. maybe that would work - unless you have a stress harness for AE you could share. |
|
There are some yeah, but you can easily stress the demo app pasting quite big files and scrolling also changing the grammar. |
|
lmk if you need any other changes here. I was waiting to submit the other PR's until this was in. |
|
Sorry for the delay. I had a quite busy week. Tomorrow I will try to take a look into the changes. |
no worries! I appreciate it. |
… API incompatibilities
danipen
left a comment
There was a problem hiding this comment.
All good!
I found a build issue while reviewing this locally:
Build failure on netstandard2.0 (RegexSource.cs)
commandSpan.SequenceEqual("downcase") and commandSpan.SequenceEqual("upcase") don't compile when targeting netstandard2.0. The implicit conversion from string to ReadOnlySpan<char> only exists in .NET Core 2.1+ runtimes — it's not provided by the System.Memory NuGet polyfill. CI didn't catch this because the solution build resolves the library through the net8.0 test projects, which masks the missing conversion.
Fixed by adding explicit .AsSpan() calls:
commandSpan.SequenceEqual("downcase".AsSpan())
commandSpan.SequenceEqual("upcase".AsSpan())I also added a CI step to dotnet.yml that builds both netstandard2.0 projects (TextMateSharp and TextMateSharp.Grammars) individually before the full solution build, so this kind of issue gets caught automatically going forward.
Apologies for any trouble with the PR. It is never my intention to cause a repo owner additional work :) I'm using VS Enterprise 2026 - I don't understand why that wasn't caught by the compiler on my box. 🤔 I've seen other compiler errors caught if I accidentally have code that isn't backwards compatible with NET Standard 2.0....oh well |
|
No worries |
PR Classification
Adds comprehensive unit tests, fixes a race condition, improves internal utility robustness, reduces memory allocations, and enhances code safety and maintainability.
Fixes #113
PR Summary
This pull request introduces extensive unit tests for core internal classes, refactors utility and model logic for correctness and performance, and strengthens thread safety and validation. It also updates code coverage configuration and dependencies.
AbstractLineListMain Benchmark Results:
TokenizeAllLines:Before
After
TextMateSharp.Benchmarks.BigFileTokenizationBenchmark-report-github.md
RegexSource Benchmarks
RuleFactory Benchmarks
DecodeMap Benchmarks