Skip to content

Fix #116 and Performance/reduce allocations#121

Merged
danipen merged 33 commits intodanipen:masterfrom
udlose:performance/reduce-allocations
Mar 4, 2026
Merged

Fix #116 and Performance/reduce allocations#121
danipen merged 33 commits intodanipen:masterfrom
udlose:performance/reduce-allocations

Conversation

@udlose
Copy link
Contributor

@udlose udlose commented Mar 2, 2026

Fixes #116 (separate PR coming for #117 )

This pull request introduces comprehensive improvements to the AttributedScopeStack and StateStack classes, focusing on performance, correctness, and test coverage. The most significant changes include optimizing equality and hashing for AttributedScopeStack, refactoring scope parsing logic for efficiency and correctness, and adding robust unit tests for StateStack's string representation. These changes enhance the reliability and speed of grammar stack operations in TextMateSharp.

Performance and Correctness Improvements

  • AttributedScopeStack now precomputes hash codes for each instance, enabling fast equality checks and preventing unnecessary traversal of the parent chain. The equality logic has also been updated to use these hash codes and to compare scope paths using ordinal string comparison for correctness. [1] [2]
  • The scope matching logic in AttributedScopeStack is refactored to avoid unnecessary string allocations and to improve correctness, especially for edge cases involving scope selectors.

Refactoring and Efficiency

  • The logic for pushing multiple scopes in AttributedScopeStack is rewritten to use ReadOnlySpan<char> for efficient, allocation-free parsing, and to correctly handle trailing spaces and empty segments. [1] [2]
  • The GenerateScopes method now precomputes the stack depth to pre-size the resulting list, reducing memory allocations and improving performance.

Testing and Validation

  • Added a comprehensive suite of unit tests for StateStack.ToString in StateStackTests.cs, covering single and multi-depth stacks, special rule IDs, boundary conditions, and repeated calls for consistency. This ensures the correctness and reliability of the stack's string representation.

Code Quality and Maintenance

  • The Equals method in StateStack is refactored for clarity and correctness, using pattern matching and improved structure.
  • The ToString method in StateStack is rewritten for efficiency, using a single pass to collect rule IDs and a StringBuilder to assemble the output, reducing allocations and improving readability.

Minor Fixes

  • Minor cleanup in SimpleJSON.cs to simplify dictionary assignment logic.

Benchmarks

Before:
TextMateSharp.Benchmarks.BigFileTokenizationBenchmark-report-github.md

After:

TextMateSharp.Benchmarks.BigFileTokenizationBenchmark-report-github-ScopeStack work - final.md

Converted test class to public and added extensive unit tests for AttributedScopeStack, covering constructors, equality, hash codes, scope name retrieval, attribute merging, and push operations. Includes edge cases, reflection-based tests, and helpers for stack creation and metadata. Validates robustness and correctness of all key behaviors.
…l stacks (null Parent / null ScopePath). Add parameter validation to PushAtributed to throw ArgumetNullException instead of a NullReferenceException
…le TFM #ifdef for GetHashCode

- Refactored AttributedScopeStack for performance and clarity: precomputed hash codes, optimized Equals/GetHashCode, and improved scope matching logic.
- Changed handling of empty parentScopes to "always matches" instead of throwing.
- Enhanced Push logic to handle multi-scope strings and trailing spaces correctly.
- Expanded and clarified unit tests for scope matching and stack manipulation, including new edge cases.
- Improved resilience to malformed theme data and overall test coverage.
…hrow ArgumentOutOfRangeException thru the Matches method if an empty scopesList param was passed. This was because there was no length check on the collection before the collection was indexed. I opted to change Matches to return true if parentScopes is empty. Let me know if you agree with this change.

Updated the Matches method to return true when parentScopes is null or empty, ensuring correct behavior when no parent scopes are provided.
Updated .gitignore to exclude Visual Studio Live Unit Testing files, specifically those with the .lutconfig extension, to prevent committing generated test artifacts.
Simplified dictionary add/update logic by directly assigning values without checking for key existence. Replaced ContainsKey with TryGetValue in Remove(string aKey) for efficiency. Refactored Remove(JSONNode aNode) to avoid LINQ and exception handling, using a straightforward iteration to find and remove the node. These changes improve code clarity and maintainability.
Introduced StateStackTests with extensive NUnit test coverage
for the StateStack.ToString() method. Tests include various
stack depths, special rule IDs, boundary conditions, repeated
calls, and correct rule ordering. No changes to existing code.
Refactored the StateStack class to improve code clarity and performance. Updated using directives, modernized the Equals method with pattern matching, and reimplemented ToString using StringBuilder and an array of RuleIds for efficiency. Removed the old recursive string construction logic. Reformatted GetHashCode for readability. These changes enhance maintainability and performance.
Simplified type checking and casting for IncludeOnlyRule, BeginEndRule, and BeginWhileRule by using C# pattern matching. This improves code readability and removes redundant casts and variable declarations.
Introduce ParsedThemeTests and ThemeTests to cover a wide range of scenarios for theme parsing and management. Tests include handling of null/empty/whitespace includes, color ID management for various formats, GUI color dictionary behavior, color map correctness, scope matching, and rule overwriting. Utilizes Moq and NUnit for thorough coverage and reliability.
Added extensive unit tests for ParsedTheme and Theme, covering edge cases, concurrency, color and font style parsing, rule sorting, and default rule handling. Improved test structure with region markers and helper methods. Ensured robust validation of theme logic, including thread-safety and handling of malformed or unusual input.
Refactored theme parsing and caching to minimize memory allocations and improve performance.
- Replaced String.Split/LINQ with span-based parsing for scopes and font styles.
- Introduced thread-safe, lock-free caching using ConcurrentDictionary and atomic operations.
- Improved sorting and default rule handling to avoid unnecessary allocations.
- Enhanced code clarity with comments and better variable naming.
These changes make theme handling faster, more scalable, and robust for concurrent use.
…ing allocations

Added a fast hash code check to AttributedScopeStack equality for O(1) rejection of non-equal stacks. Refactored GenerateScopes to use a preallocated List<string> and reverse it, improving clarity and efficiency.
Removed unnecessary result list variable in ParseTheme. Now returns a new empty list directly on early exit, improving code clarity and efficiency.
Introduced RawTests in the TextMateSharp.Tests.Internal.Grammars.Parser namespace using NUnit. The new test suite covers merging, property accessors, captures, patterns, injections, repository, boolean flags, file type handling, deep cloning, and enumeration for the Raw class. These tests ensure correct behavior, deep copy semantics, and robust handling of edge cases, significantly improving test coverage for Raw and related interfaces.
Refactored Raw.cs to use pattern matching for type checks, replaced string concatenation with ToString(), and ensured proper initialization of file type lists. Optimized list and dictionary allocations, made ConvertToDictionary static, and improved code clarity by removing redundant casts and branches.
Added MatcherBuilderTests and NameMatcherTests for thorough
coverage of matcher expression parsing, priority, and edge
cases using NUnit and Moq. Updated MatcherTests.cs to fix
namespace and fully qualify Matcher.CreateMatchers usage.
Improves reliability and test coverage for matcher logic.
Added null checks to constructors and methods for safety. Optimized list initializations with expected capacities. Refactored MatcherBuilder parsing logic for clarity and efficiency. Made utility methods static/private where appropriate. Improved tokenizer immutability and input validation. Enhanced RegExpSource and RegExpSourceList with more efficient iteration and allocation. Removed unnecessary usings and improved code readability.
Refactored identifier matching to use a foreach loop instead of LINQ's All, ensuring lastIndex is updated correctly after each match. This prevents re-scanning scopes and improves matching accuracy. Added comments to clarify the bug and the fix.
Improved list initialization with expected capacity in theme rule handling for better performance. Fixed grammar package resource name construction in ResourceLoader. Minor formatting and whitespace adjustments for consistency.
…d-safe test assertion

Updated the test assertion in ParsedThemeTests to use Volatile.Read when checking totalCalls. This ensures thread-safe access to the variable in multi-threaded scenarios, improving reliability of the test.
@danipen
Copy link
Owner

danipen commented Mar 2, 2026

Other than the two comments above the PR overall LGTM. Thanks!

@udlose
Copy link
Contributor Author

udlose commented Mar 2, 2026

Other than the two comments above the PR overall LGTM. Thanks!

@danipen I don't see any comments from you

@danipen
Copy link
Owner

danipen commented Mar 2, 2026

Other than the two comments above the PR overall LGTM. Thanks!

@danipen I don't see any comments from you

Sorry I didn't submit the review. Check now please.

udlose added 3 commits March 2, 2026 14:02
Refactored ParsedTheme and Theme classes to eliminate the unused 'priority' parameter from ParseTheme, ParseInclude, and LookupThemeRules methods. Updated all method calls and related test cases accordingly. This cleanup simplifies method signatures and removes dead code.
Removed the unused second argument (0) from the ParsedTheme.ParseTheme call in ThemeParsingTest.cs, reflecting the updated method signature that now only requires the theme parameter.
@udlose
Copy link
Contributor Author

udlose commented Mar 2, 2026

@danipen how do you want me to handle unit tests where I'm testing for inequality but there is a super small risk of the hash of the different object being equal which could cause the test to fail?

example:

        public void GetHashCode_DifferentName_LikelyProducesDifferentHashCodes()
        {
            // arrange
            ParsedThemeRule rule1 = CreateSampleRule();
            ParsedThemeRule rule2 = new ParsedThemeRule(
                AlternateName,
                SampleScope,
                new List<string> { "parent1", "parent2" },
                SampleIndex,
                SampleFontStyle,
                SampleForeground,
                SampleBackground);

            // act/assert
            // Note: Not guaranteed, but should happen most of the time
            Assert.AreNotEqual(rule1.GetHashCode(), rule2.GetHashCode());
        }

I'd prefer to keep them as re-running would pass them, but if you're not cool with having something that could (very rarely) fail, I can remove them. But i did document the behavior.

@udlose udlose marked this pull request as ready for review March 2, 2026 21:12
@udlose udlose requested a review from danipen March 2, 2026 21:39
@danipen
Copy link
Owner

danipen commented Mar 3, 2026

I prefer not having non-deterministic unit tests.

@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

I prefer not having non-deterministic unit tests.

cool - I can respect that! i will remove them

@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

@danipen I just noticed that the upstream java implementation declares StateStack (and some other classes in the .internal.grammar package as final (sealed in .NET). So if you wanted to have parity with the upstream impl, we should include a breaking change by declaring these classes as sealed. Also, notice it's in the .internal.grammar package, suggesting it's not meant for public extension anyways. Implementers could still implement IStateStack if desired.

I don't know which is more important to you:

  1. parity with upstream or
  2. avoiding any breaking API changes (though you mentioned that you are intending for these changes to go out into a 3.0 version).

Here is a list of upstream differences I found - I only looked in this folder/namespace. So, there may also be others:

class name sealed/unsealed (has parity with TextMateSharp) class visibility (has parity with TextMateSharp)
AttributedScopeStack sealed ❌ internal ❌
BalancedBracketSelectors sealed ❌ public ✅
BasicScopeAttributes sealed ❌ internal ❌
BasicScopeAttributesProvider sealed ❌ internal ❌
Grammar sealed ❌ public ✅
LineTokenizer sealed ❌ internal ❌
LineTokens sealed ❌ internal ❌
TokenTypeMatcher sealed ❌ internal ✅
TokenizeLineResult sealed ❌ internal ❌
  1. Do you want me to implement these changes for these classes?
  2. if so, do you want those changes in a separate PR (I would assume so)?

@danipen
Copy link
Owner

danipen commented Mar 4, 2026

Do you want me to implement these changes for these classes?

I guess nobody is using this lib to extend it so I guess the breaking change will be minor impact. Anyway as we're incrementing the version after these changes we are ok.

if so, do you want those changes in a separate PR (I would assume so)?

Yes please.

@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

since you are cool with the breaking changes I described above, do you also want me to hide the members that are used to calculate GetHashCode? Most of the classes I've seen are not immutable which breaks the "contract" with GetHashCode.

BTW, if I'm suggesting too many changes, please let me know :)

@danipen
Copy link
Owner

danipen commented Mar 4, 2026

do you also want me to hide the members that are used to calculate GetHashCode?

If are hidable, probably yeah, makes sense to hide.

BTW, if I'm suggesting too many changes, please let me know :)

Appreciated the work. Thanks again.

… tests that could produce hash collisions for inequal objects

- Add comprehensive IEquatable and operator ==/!= tests
- Remove hash code difference tests for similar content
- Use C# collection expressions for cleaner initialization
- Refactor array usage in method invocations
- Move reflection helper for private Equals to end of file
- Improve test naming and coverage; no production code changes
@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

do you also want me to hide the members that are used to calculate GetHashCode?

If are hidable, probably yeah, makes sense to hide.

BTW, if I'm suggesting too many changes, please let me know :)

Appreciated the work. Thanks again.

  1. Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?
  2. what if the upstream declares them as public (thus not hidden)? Should I break the upstream contract or the GetHashCode contract? My feeling is that the upstream implementation is actually incorrect if the fields are mutable.

Refactored AttributedScopeStack to implement IEquatable<T> and provide strongly-typed Equals methods. Added operator overloads for == and !=, and improved null safety using ReferenceEquals. Refactored ComputeHashCode for clarity and consistency, and added XML documentation to clarify method purposes. These changes improve value equality semantics and support for hash-based collections.
@danipen
Copy link
Owner

danipen commented Mar 4, 2026

Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?

Hidable: currently internal or public but not used elsewhere.

the upstream implementation is actually incorrect if the fields are mutable.

Can we respect the upstream public definition but expose as read-only (private set) to avoid the issue?

@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

Can you clarify what you mean by 'if are hidable'? What defines if a field/property is hidable in your mind?

Hidable: currently internal or public but not used elsewhere.

the upstream implementation is actually incorrect if the fields are mutable.

Can we respect the upstream public definition but expose as read-only (private set) to avoid the issue?

yes. I should have been more clear - when I said "hide" I meant to make the class immutable by making the setters private - and if the field/property was already internal or public, the get for the field/property would match the upstream

Expanded StateStackTests to cover GetHashCode, Equals, IEquatable, operator overloads, HasSameRuleAs, WithContentNameScopesList, WithEndRule, Pop/SafePop, and Reset. Tests verify correct behavior for equality, hash code determinism, structural equivalence, null handling, dictionary usage, and edge cases including deep stack scenarios. Helper methods added for test scope stack creation.
@danipen
Copy link
Owner

danipen commented Mar 4, 2026

All good then

udlose added 3 commits March 4, 2026 11:06
Added comprehensive tests for AttributedScopeStack, including equality (reflexivity, symmetry, deep equivalence, null scope path), hash code consistency, operator overloads (==, !=), and ToString formatting for various stack depths and null values. Improves reliability and correctness in edge scenarios.
…tream implementation

Override ToString() in AttributedScopeStack to return a space-separated string of scope names from root to leaf. Includes XML documentation for the new method.
Refactored StateStack to implement IEquatable<StateStack> with efficient, null-safe equality and hash code logic. Equality now uses iterative structural comparison and precomputed hash codes for performance. Added operator overloads for == and !=, and replaced GetHashCode with a multiply-accumulate scheme. WithContentNameScopesList uses null-safe comparison. Enhanced HasSameRuleAs to traverse parent chain and check both RuleId and enter position, throwing ArgumentNullException for null input. Added XML documentation for clarity.
@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

current unit test coverage is:
image

udlose added 2 commits March 4, 2026 12:22
Initialized List<Predicate<T>> matchers with a capacity of 4 in two places to improve memory usage and performance when adding multiple matchers.
@danipen
Copy link
Owner

danipen commented Mar 4, 2026

current unit test coverage is:

Quite decent! (you increased a lot actually)

…) checks that would be triggered since we overload ==, !=, etc. and are unable to use 'is not' because we are restricted to C# 8.0

Replaced standard null checks with ReferenceEquals to avoid issues with overloaded equality operators and improve performance. Added explanatory comments and made minor code style adjustments for clarity and consistency.
@udlose
Copy link
Contributor Author

udlose commented Mar 4, 2026

all done with this one - benchmarks posted above.

Copy link
Owner

@danipen danipen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danipen danipen merged commit 7915749 into danipen:master Mar 4, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AttributedScopeStack.GetHashCode() throws for normal stacks (null Parent / null ScopePath)

2 participants