Fix #116 and Performance/reduce allocations by udlose · Pull Request #121 · danipen/TextMateSharp

udlose · 2026-03-02T12:46:04Z

Fixes #116 (separate PR coming for #117 )

This pull request introduces comprehensive improvements to the AttributedScopeStack and StateStack classes, focusing on performance, correctness, and test coverage. The most significant changes include optimizing equality and hashing for AttributedScopeStack, refactoring scope parsing logic for efficiency and correctness, and adding robust unit tests for StateStack's string representation. These changes enhance the reliability and speed of grammar stack operations in TextMateSharp.

Performance and Correctness Improvements

AttributedScopeStack now precomputes hash codes for each instance, enabling fast equality checks and preventing unnecessary traversal of the parent chain. The equality logic has also been updated to use these hash codes and to compare scope paths using ordinal string comparison for correctness. [1] [2]
The scope matching logic in AttributedScopeStack is refactored to avoid unnecessary string allocations and to improve correctness, especially for edge cases involving scope selectors.

Refactoring and Efficiency

The logic for pushing multiple scopes in AttributedScopeStack is rewritten to use ReadOnlySpan<char> for efficient, allocation-free parsing, and to correctly handle trailing spaces and empty segments. [1] [2]
The GenerateScopes method now precomputes the stack depth to pre-size the resulting list, reducing memory allocations and improving performance.

Testing and Validation

Added a comprehensive suite of unit tests for StateStack.ToString in StateStackTests.cs, covering single and multi-depth stacks, special rule IDs, boundary conditions, and repeated calls for consistency. This ensures the correctness and reliability of the stack's string representation.

Code Quality and Maintenance

The Equals method in StateStack is refactored for clarity and correctness, using pattern matching and improved structure.
The ToString method in StateStack is rewritten for efficiency, using a single pass to collect rule IDs and a StringBuilder to assemble the output, reducing allocations and improving readability.

Minor Fixes

Minor cleanup in SimpleJSON.cs to simplify dictionary assignment logic.

Benchmarks

Before:
TextMateSharp.Benchmarks.BigFileTokenizationBenchmark-report-github.md

After:

TextMateSharp.Benchmarks.BigFileTokenizationBenchmark-report-github-ScopeStack work - final.md

Converted test class to public and added extensive unit tests for AttributedScopeStack, covering constructors, equality, hash codes, scope name retrieval, attribute merging, and push operations. Includes edge cases, reflection-based tests, and helpers for stack creation and metadata. Validates robustness and correctness of all key behaviors.

…l stacks (null Parent / null ScopePath). Add parameter validation to PushAtributed to throw ArgumetNullException instead of a NullReferenceException

…le TFM #ifdef for GetHashCode - Refactored AttributedScopeStack for performance and clarity: precomputed hash codes, optimized Equals/GetHashCode, and improved scope matching logic. - Changed handling of empty parentScopes to "always matches" instead of throwing. - Enhanced Push logic to handle multi-scope strings and trailing spaces correctly. - Expanded and clarified unit tests for scope matching and stack manipulation, including new edge cases. - Improved resilience to malformed theme data and overall test coverage.

…hrow ArgumentOutOfRangeException thru the Matches method if an empty scopesList param was passed. This was because there was no length check on the collection before the collection was indexed. I opted to change Matches to return true if parentScopes is empty. Let me know if you agree with this change. Updated the Matches method to return true when parentScopes is null or empty, ensuring correct behavior when no parent scopes are provided.

Updated .gitignore to exclude Visual Studio Live Unit Testing files, specifically those with the .lutconfig extension, to prevent committing generated test artifacts.

Simplified dictionary add/update logic by directly assigning values without checking for key existence. Replaced ContainsKey with TryGetValue in Remove(string aKey) for efficiency. Refactored Remove(JSONNode aNode) to avoid LINQ and exception handling, using a straightforward iteration to find and remove the node. These changes improve code clarity and maintainability.

Introduced StateStackTests with extensive NUnit test coverage for the StateStack.ToString() method. Tests include various stack depths, special rule IDs, boundary conditions, repeated calls, and correct rule ordering. No changes to existing code.

Refactored the StateStack class to improve code clarity and performance. Updated using directives, modernized the Equals method with pattern matching, and reimplemented ToString using StringBuilder and an array of RuleIds for efficiency. Removed the old recursive string construction logic. Reformatted GetHashCode for readability. These changes enhance maintainability and performance.

Simplified type checking and casting for IncludeOnlyRule, BeginEndRule, and BeginWhileRule by using C# pattern matching. This improves code readability and removes redundant casts and variable declarations.

Introduce ParsedThemeTests and ThemeTests to cover a wide range of scenarios for theme parsing and management. Tests include handling of null/empty/whitespace includes, color ID management for various formats, GUI color dictionary behavior, color map correctness, scope matching, and rule overwriting. Utilizes Moq and NUnit for thorough coverage and reliability.

Added extensive unit tests for ParsedTheme and Theme, covering edge cases, concurrency, color and font style parsing, rule sorting, and default rule handling. Improved test structure with region markers and helper methods. Ensured robust validation of theme logic, including thread-safety and handling of malformed or unusual input.

Refactored theme parsing and caching to minimize memory allocations and improve performance. - Replaced String.Split/LINQ with span-based parsing for scopes and font styles. - Introduced thread-safe, lock-free caching using ConcurrentDictionary and atomic operations. - Improved sorting and default rule handling to avoid unnecessary allocations. - Enhanced code clarity with comments and better variable naming. These changes make theme handling faster, more scalable, and robust for concurrent use.

…ing allocations Added a fast hash code check to AttributedScopeStack equality for O(1) rejection of non-equal stacks. Refactored GenerateScopes to use a preallocated List<string> and reverse it, improving clarity and efficiency.

Removed unnecessary result list variable in ParseTheme. Now returns a new empty list directly on early exit, improving code clarity and efficiency.

Introduced RawTests in the TextMateSharp.Tests.Internal.Grammars.Parser namespace using NUnit. The new test suite covers merging, property accessors, captures, patterns, injections, repository, boolean flags, file type handling, deep cloning, and enumeration for the Raw class. These tests ensure correct behavior, deep copy semantics, and robust handling of edge cases, significantly improving test coverage for Raw and related interfaces.

Refactored Raw.cs to use pattern matching for type checks, replaced string concatenation with ToString(), and ensured proper initialization of file type lists. Optimized list and dictionary allocations, made ConvertToDictionary static, and improved code clarity by removing redundant casts and branches.

Added MatcherBuilderTests and NameMatcherTests for thorough coverage of matcher expression parsing, priority, and edge cases using NUnit and Moq. Updated MatcherTests.cs to fix namespace and fully qualify Matcher.CreateMatchers usage. Improves reliability and test coverage for matcher logic.

Added null checks to constructors and methods for safety. Optimized list initializations with expected capacities. Refactored MatcherBuilder parsing logic for clarity and efficiency. Made utility methods static/private where appropriate. Improved tokenizer immutability and input validation. Enhanced RegExpSource and RegExpSourceList with more efficient iteration and allocation. Removed unnecessary usings and improved code readability.

Refactored identifier matching to use a foreach loop instead of LINQ's All, ensuring lastIndex is updated correctly after each match. This prevents re-scanning scopes and improves matching accuracy. Added comments to clarify the bug and the fix.

Improved list initialization with expected capacity in theme rule handling for better performance. Fixed grammar package resource name construction in ResourceLoader. Minor formatting and whitespace adjustments for consistency.

…d-safe test assertion Updated the test assertion in ParsedThemeTests to use Volatile.Read when checking totalCalls. This ensures thread-safe access to the variable in multi-threaded scenarios, improving reliability of the test.

danipen · 2026-03-02T17:31:32Z

Other than the two comments above the PR overall LGTM. Thanks!

udlose · 2026-03-02T19:46:05Z

Other than the two comments above the PR overall LGTM. Thanks!

@danipen I don't see any comments from you

src/TextMateSharp/Internal/Matcher/IMatchesName.cs

src/TextMateSharp/Themes/Theme.cs

danipen · 2026-03-02T19:48:44Z

Other than the two comments above the PR overall LGTM. Thanks!

@danipen I don't see any comments from you

Sorry I didn't submit the review. Check now please.

Refactored ParsedTheme and Theme classes to eliminate the unused 'priority' parameter from ParseTheme, ParseInclude, and LookupThemeRules methods. Updated all method calls and related test cases accordingly. This cleanup simplifies method signatures and removes dead code.

Removed the unused second argument (0) from the ParsedTheme.ParseTheme call in ThemeParsingTest.cs, reflecting the updated method signature that now only requires the theme parameter.

udlose · 2026-03-02T20:30:32Z

@danipen how do you want me to handle unit tests where I'm testing for inequality but there is a super small risk of the hash of the different object being equal which could cause the test to fail?

example:

        public void GetHashCode_DifferentName_LikelyProducesDifferentHashCodes()
        {
            // arrange
            ParsedThemeRule rule1 = CreateSampleRule();
            ParsedThemeRule rule2 = new ParsedThemeRule(
                AlternateName,
                SampleScope,
                new List<string> { "parent1", "parent2" },
                SampleIndex,
                SampleFontStyle,
                SampleForeground,
                SampleBackground);

            // act/assert
            // Note: Not guaranteed, but should happen most of the time
            Assert.AreNotEqual(rule1.GetHashCode(), rule2.GetHashCode());
        }

I'd prefer to keep them as re-running would pass them, but if you're not cool with having something that could (very rarely) fail, I can remove them. But i did document the behavior.

danipen · 2026-03-03T06:51:54Z

I prefer not having non-deterministic unit tests.

udlose · 2026-03-04T01:28:15Z

I prefer not having non-deterministic unit tests.

cool - I can respect that! i will remove them

udlose · 2026-03-04T13:20:14Z

@danipen I just noticed that the upstream java implementation declares StateStack (and some other classes in the .internal.grammar package as final (sealed in .NET). So if you wanted to have parity with the upstream impl, we should include a breaking change by declaring these classes as sealed. Also, notice it's in the .internal.grammar package, suggesting it's not meant for public extension anyways. Implementers could still implement IStateStack if desired.

I don't know which is more important to you:

parity with upstream or
avoiding any breaking API changes (though you mentioned that you are intending for these changes to go out into a 3.0 version).

Here is a list of upstream differences I found - I only looked in this folder/namespace. So, there may also be others:

class name	sealed/unsealed (has parity with TextMateSharp)	class visibility (has parity with TextMateSharp)
`AttributedScopeStack`	sealed ❌	internal ❌
`BalancedBracketSelectors`	sealed ❌	public ✅
`BasicScopeAttributes`	sealed ❌	internal ❌
`BasicScopeAttributesProvider`	sealed ❌	internal ❌
`Grammar`	sealed ❌	public ✅
`LineTokenizer`	sealed ❌	internal ❌
`LineTokens`	sealed ❌	internal ❌
`TokenTypeMatcher`	sealed ❌	internal ✅
`TokenizeLineResult`	sealed ❌	internal ❌

Do you want me to implement these changes for these classes?
if so, do you want those changes in a separate PR (I would assume so)?

danipen · 2026-03-04T13:32:25Z

Do you want me to implement these changes for these classes?

I guess nobody is using this lib to extend it so I guess the breaking change will be minor impact. Anyway as we're incrementing the version after these changes we are ok.

if so, do you want those changes in a separate PR (I would assume so)?

Yes please.

udlose · 2026-03-04T14:51:40Z

since you are cool with the breaking changes I described above, do you also want me to hide the members that are used to calculate GetHashCode? Most of the classes I've seen are not immutable which breaks the "contract" with GetHashCode.

BTW, if I'm suggesting too many changes, please let me know :)

danipen · 2026-03-04T14:59:01Z

do you also want me to hide the members that are used to calculate GetHashCode?

If are hidable, probably yeah, makes sense to hide.

BTW, if I'm suggesting too many changes, please let me know :)

Appreciated the work. Thanks again.

… tests that could produce hash collisions for inequal objects - Add comprehensive IEquatable and operator ==/!= tests - Remove hash code difference tests for similar content - Use C# collection expressions for cleaner initialization - Refactor array usage in method invocations - Move reflection helper for private Equals to end of file - Improve test naming and coverage; no production code changes

udlose · 2026-03-04T15:19:14Z

do you also want me to hide the members that are used to calculate GetHashCode?

If are hidable, probably yeah, makes sense to hide.

BTW, if I'm suggesting too many changes, please let me know :)

Appreciated the work. Thanks again.

Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?
what if the upstream declares them as public (thus not hidden)? Should I break the upstream contract or the GetHashCode contract? My feeling is that the upstream implementation is actually incorrect if the fields are mutable.

Refactored AttributedScopeStack to implement IEquatable<T> and provide strongly-typed Equals methods. Added operator overloads for == and !=, and improved null safety using ReferenceEquals. Refactored ComputeHashCode for clarity and consistency, and added XML documentation to clarify method purposes. These changes improve value equality semantics and support for hash-based collections.

danipen · 2026-03-04T15:49:35Z

Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?

Hidable: currently internal or public but not used elsewhere.

the upstream implementation is actually incorrect if the fields are mutable.

Can we respect the upstream public definition but expose as read-only (private set) to avoid the issue?

udlose · 2026-03-04T15:53:10Z

Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?

Hidable: currently internal or public but not used elsewhere.

the upstream implementation is actually incorrect if the fields are mutable.

Can we respect the upstream public definition but expose as read-only (private set) to avoid the issue?

yes. I should have been more clear - when I said "hide" I meant to make the class immutable by making the setters private - and if the field/property was already internal or public, the get for the field/property would match the upstream

Expanded StateStackTests to cover GetHashCode, Equals, IEquatable, operator overloads, HasSameRuleAs, WithContentNameScopesList, WithEndRule, Pop/SafePop, and Reset. Tests verify correct behavior for equality, hash code determinism, structural equivalence, null handling, dictionary usage, and edge cases including deep stack scenarios. Helper methods added for test scope stack creation.

danipen · 2026-03-04T16:56:33Z

All good then

Added comprehensive tests for AttributedScopeStack, including equality (reflexivity, symmetry, deep equivalence, null scope path), hash code consistency, operator overloads (==, !=), and ToString formatting for various stack depths and null values. Improves reliability and correctness in edge scenarios.

…tream implementation Override ToString() in AttributedScopeStack to return a space-separated string of scope names from root to leaf. Includes XML documentation for the new method.

Refactored StateStack to implement IEquatable<StateStack> with efficient, null-safe equality and hash code logic. Equality now uses iterative structural comparison and precomputed hash codes for performance. Added operator overloads for == and !=, and replaced GetHashCode with a multiply-accumulate scheme. WithContentNameScopesList uses null-safe comparison. Enhanced HasSameRuleAs to traverse parent chain and check both RuleId and enter position, throwing ArgumentNullException for null input. Added XML documentation for clarity.

udlose · 2026-03-04T18:21:50Z

current unit test coverage is:

Initialized List<Predicate<T>> matchers with a capacity of 4 in two places to improve memory usage and performance when adding multiple matchers.

danipen · 2026-03-04T18:27:11Z

current unit test coverage is:

Quite decent! (you increased a lot actually)

…) checks that would be triggered since we overload ==, !=, etc. and are unable to use 'is not' because we are restricted to C# 8.0 Replaced standard null checks with ReferenceEquals to avoid issues with overloaded equality operators and improve performance. Added explanatory comments and made minor code style adjustments for clarity and consistency.

udlose · 2026-03-04T18:51:12Z

all done with this one - benchmarks posted above.

danipen

LGTM

udlose added 21 commits February 28, 2026 16:48

Fixes danipen#116 AttributedScopeStack.GetHashCode() throws for norma…

1c2b5f3

…l stacks (null Parent / null ScopePath). Add parameter validation to PushAtributed to throw ArgumetNullException instead of a NullReferenceException

Add .gitignore rules for Visual Studio Live Unit Testing files

6ad857b

Updated .gitignore to exclude Visual Studio Live Unit Testing files, specifically those with the .lutconfig extension, to prevent committing generated test artifacts.

Refactor rule type checks to use pattern matching to avoid casts

a3d00bc

Simplified type checking and casting for IncludeOnlyRule, BeginEndRule, and BeginWhileRule by using C# pattern matching. This improves code readability and removes redundant casts and variable declarations.

Remove unnecessary List allocation on success path in ParseTheme method

a67a707

Removed unnecessary result list variable in ParseTheme. Now returns a new empty list directly on early exit, improving code clarity and efficiency.

Optimize rule list allocation and fix resource naming

3e538dd

Improved list initialization with expected capacity in theme rule handling for better performance. Fixed grammar package resource name construction in ResourceLoader. Minor formatting and whitespace adjustments for consistency.

danipen requested changes Mar 2, 2026

View reviewed changes

src/TextMateSharp/Internal/Matcher/IMatchesName.cs Outdated Show resolved Hide resolved

src/TextMateSharp/Themes/Theme.cs Outdated Show resolved Hide resolved

udlose added 3 commits March 2, 2026 14:02

PR review feedback (remove extraneous comment)

6423dc5

Update ParseTheme call to match new method signature

467acde

Removed the unused second argument (0) from the ParsedTheme.ParseTheme call in ThemeParsingTest.cs, reflecting the updated method signature that now only requires the theme parameter.

udlose marked this pull request as ready for review March 2, 2026 21:12

udlose requested a review from danipen March 2, 2026 21:39

udlose added 3 commits March 4, 2026 11:06

Add ToString() to AttributedScopeStack for scope display to match ups…

9f53a61

…tream implementation Override ToString() in AttributedScopeStack to return a space-separated string of scope names from root to leaf. Includes XML documentation for the new method.

udlose added 2 commits March 4, 2026 12:22

Add unit test for param validation in HasSameRuleAs

29e09fe

Optimize matcher list allocation with initial capacity

b3972dc

Initialized List<Predicate<T>> matchers with a capacity of 4 in two places to improve memory usage and performance when adding multiple matchers.

danipen approved these changes Mar 4, 2026

View reviewed changes

danipen merged commit 7915749 into danipen:master Mar 4, 2026
5 checks passed

Uh oh!

Conversation

udlose commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance and Correctness Improvements

Refactoring and Efficiency

Testing and Validation

Code Quality and Maintenance

Minor Fixes

Benchmarks

Uh oh!

danipen commented Mar 2, 2026

Uh oh!

udlose commented Mar 2, 2026

Uh oh!

Uh oh!

Uh oh!

danipen commented Mar 2, 2026

Uh oh!

udlose commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danipen commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

udlose commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen commented Mar 4, 2026

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen commented Mar 4, 2026

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen commented Mar 4, 2026

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen commented Mar 4, 2026

Uh oh!

udlose commented Mar 4, 2026

Uh oh!

danipen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

udlose commented Mar 2, 2026 •

edited

Loading

udlose commented Mar 2, 2026 •

edited

Loading

danipen commented Mar 3, 2026 •

edited

Loading

udlose commented Mar 4, 2026 •

edited

Loading

danipen commented Mar 4, 2026 •

edited

Loading