
@gorsing (Contributor) commented Jan 9, 2026

Adds a specialized fast path to array equality for arrays of primitive element types.

When both element types are identical once qualifiers are ignored (integral types, char/wchar/dchar, bool),
equality is optimized by:

  • Returning early when both slices reference the same memory (same pointer and, as already checked, same length)
  • Comparing elements directly through .ptr indexing, which skips bounds checks

All other cases continue to use the existing generic comparison logic.
This change is a pure performance optimization with no semantic impact.

@dlang-bot (Contributor)

Thanks for your pull request and interest in making D better, @gorsing! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#22366"

@dkorpel (Contributor) commented Jan 9, 2026

I'd assume LDC is able to inline and optimize the loop without explicitly using lhs.ptr. Do you have a benchmark / assembly output to show an improvement?

@gorsing (Contributor, Author) commented Jan 9, 2026

You are right that LDC is excellent at optimizing loops. However, the main goal of this PR is to introduce an Identity Check optimization.

Currently, comparing a large array to itself takes O(N) time. Since the lengths are already known to be equal at that point, adding if (lhs.ptr == rhs.ptr) makes it O(1) for identity cases.

Here is a benchmark showing the impact on identity matches (10M elements):

  • Old version: ~4500 μs (linear scan)
  • New version: ~0.1 μs (identity shortcut)

For the loop itself, I'm happy to use the standard lhs[i] if LDC's output is identical, but the pointer comparison at the start provides a large win for a common pattern: comparing an array with itself or with another slice of the same memory.
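To make the boundary of the shortcut concrete: it only fires when both slices share the same .ptr and the same .length, which is exactly what D's `is` operator tests for dynamic arrays. Equal contents stored in different memory still need the O(N) scan. A minimal sketch; the helper name `sameMemory` is hypothetical and not part of druntime.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of druntime): true when both slices view
// exactly the same memory. For dynamic arrays, `is` compares .ptr and .length.
bool sameMemory(T)(const(T)[] a, const(T)[] b)
{
    return a is b;
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] view = a;          // same ptr, same length
    int[] copy = a.dup;      // same contents, different memory
    int[] head = a[0 .. 2];  // same ptr, shorter length

    assert(sameMemory(a, view));  // identity shortcut applies: O(1)
    assert(!sameMemory(a, copy)); // must fall back to the O(N) element loop
    assert(a == copy);            // ...which still reports equality
    assert(!sameMemory(a, head)); // same ptr alone is not enough
    assert(a != head);            // the length check rejects it first

    writeln("identity vs. equality checks passed");
}
```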

My draft benchmark (for https://run.dlang.io/):

import std.stdio;
import std.datetime.stopwatch;
import std.range : iota;
import std.array : array;

// old
bool __equals_old(T1, T2)(scope T1[] lhs, scope T2[] rhs) @trusted
{
    if (lhs.length != rhs.length)
        return false;
    if (lhs.length == 0)
        return true;

    static bool isEqualOld(T1, T2)(scope T1[] lhs, scope T2[] rhs, size_t length)
    {
        pragma(inline, true) static ref at(T)(scope T[] r, size_t i) @trusted
        {
            static if (is(T == void))
                return (cast(ubyte[]) r)[i];
            else
                return r[i];
        }

        foreach (const i; 0 .. length)
        {
            if (at(lhs, i) != at(rhs, i))
                return false;
        }
        return true;
    }

    return isEqualOld!(T1, T2)(lhs, rhs, lhs.length);
}

// new
bool __equals_new(T1, T2)(scope T1[] lhs, scope T2[] rhs) @trusted
{
    if (lhs.length != rhs.length)
        return false;
    if (lhs.length == 0)
        return true;

    // first check
    static if (is(immutable T1 == immutable T2) && (__traits(isIntegral, T1)
            || is(T1 == char) || is(T1 == bool)))
    {
        if (lhs.ptr == rhs.ptr)
            return true; // <-- main hook
    }

    static bool isEqualNew(T1, T2)(scope T1[] lhs, scope T2[] rhs, size_t length)
    {
        static if (is(immutable T1 == immutable T2) && (__traits(isIntegral, T1)
                || is(T1 == char) || is(T1 == bool)))
        {
            foreach (i; 0 .. length)
            {
                if (lhs.ptr[i] != rhs.ptr[i])
                    return false;
            }
            return true;
        }
        else
        {
            // old logic
            foreach (const i; 0 .. length)
            {
                if (lhs[i] != rhs[i])
                    return false;
            }
            return true;
        }
    }

    return isEqualNew!(T1, T2)(lhs, rhs, lhs.length);
}

void main()
{
    // Size 10000000
    enum size = 10_000_000;
    auto arr1 = iota(size).array;
    auto arr2 = arr1;
    writeln("Testing Identity Comparison (O(1) vs O(N))...");

    // old
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    auto resOld = __equals_old(arr1, arr2);
    sw.stop();
    auto timeOld = sw.peek.total!"usecs";
    writeln("Old Version: ", timeOld, " μs");

    // new
    sw.reset();
    sw.start();
    auto resNew = __equals_new(arr1, arr2);
    sw.stop();
    auto timeNew = sw.peek.total!"usecs";
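    writeln("New Version: ", timeNew, " μs");

    // use both results so the optimizer cannot elide either call
    writeln("Results agree: ", resOld == resNew);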

    if (timeNew < timeOld)
    {
        writeln("\nSuccess! New version is ", timeOld / (timeNew + 0.1),
                "x faster for identity match.");
    }
}

@dkorpel (Contributor) commented Jan 9, 2026

> However, the main goal of this PR is to introduce an Identity Check optimization.

An identity check is a one-line change; the extra special-casing looks redundant.

@dkorpel (Contributor) commented Jan 9, 2026

The identity check isn't valid for types with a custom opEquals or for floating-point types (NaN compares unequal to itself), so it should be limited to integral or class types. I don't see a reason to check for immutable.
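Both failure modes are easy to reproduce. In the sketch below (the struct `Never` is a made-up example, not from the PR), a slice is compared with an alias of itself, so an unconditional pointer shortcut would return true, yet the element-wise comparison the language requires returns false.

```d
import std.stdio : writeln;

// Made-up type whose opEquals is deliberately not reflexive.
struct Never
{
    int x;
    bool opEquals(const Never rhs) const { return false; }
}

void main()
{
    // NaN compares unequal to everything, including itself.
    double[] d = [double.nan, 1.0];
    double[] sameD = d;
    assert(d is sameD);  // same ptr and length
    assert(d != sameD);  // but element-wise == fails on the NaN

    // A user-defined opEquals decides equality, not the memory address.
    Never[] n = [Never(1)];
    Never[] sameN = n;
    assert(n is sameN);
    assert(n != sameN);  // opEquals returns false

    writeln("an identity shortcut would change both results");
}
```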

@dkorpel (Contributor) commented Jan 9, 2026

Could you now undo the changes to bool isEqual(T1, T2) please?

@gorsing (Contributor, Author) commented Jan 9, 2026

I have finalized the implementation by addressing all feedback:

  • Moved the length == 0 check to the top for all types.
  • The identity-check optimization is now a simple one-line addition inside a static if block.
  • Restricted the optimization to safe types (integral, character, boolean, and class types) to avoid issues with NaN and custom opEquals.
  • Removed the redundant immutable type check and the .ptr-based element access from the helper function.

This version is clean, efficient, and passes all existing tests.
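The type restriction can be written as a compile-time predicate. This is a sketch: `identityShortcutOK` is a hypothetical name, and the condition mirrors the one in the updated benchmark's __equals_new below rather than the exact druntime source. The static asserts document which element types qualify.

```d
// Hypothetical predicate: may `lhs.ptr == rhs.ptr` short-circuit `lhs == rhs`?
// Mirrors the condition used in the updated benchmark's __equals_new.
enum bool identityShortcutOK(T1, T2) =
    is(T1 == T2) &&
    (__traits(isIntegral, T1) || is(T1 == class) ||
     is(T1 == char) || is(T1 == wchar) || is(T1 == dchar) || is(T1 == bool));

class Widget {}
struct WithOpEquals { bool opEquals(const WithOpEquals) const { return true; } }

// Integral, character, boolean and class element types qualify.
static assert(identityShortcutOK!(int, int));
static assert(identityShortcutOK!(ulong, ulong));
static assert(identityShortcutOK!(char, char));
static assert(identityShortcutOK!(bool, bool));
static assert(identityShortcutOK!(Widget, Widget));

// Floating point (NaN) and structs with a custom opEquals do not.
static assert(!identityShortcutOK!(double, double));
static assert(!identityShortcutOK!(WithOpEquals, WithOpEquals));

// Mismatched element types never qualify, even int vs. long.
static assert(!identityShortcutOK!(int, long));

void main() {}
```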

@gorsing (Contributor, Author) commented Jan 9, 2026

Updated draft benchmark:

import std.stdio;
import std.datetime.stopwatch;
import std.range : iota;
import std.array : array;
import std.algorithm : equal;

bool __equals_new(T1, T2)(scope T1[] lhs, scope T2[] rhs) @trusted
{
    if (lhs.length != rhs.length) return false;
    if (lhs.length == 0) return true;

    static if (is(T1 == T2) && 
              (__traits(isIntegral, T1) || is(T1 == class) || is(T1 == char) || 
               is(T1 == wchar) || is(T1 == dchar) || is(T1 == bool)))
    {
        if (lhs.ptr == rhs.ptr) return true;
    }

    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

bool __equals_old(T1, T2)(scope T1[] lhs, scope T2[] rhs) @trusted
{
    if (lhs.length != rhs.length) return false;
    if (lhs.length == 0) return true;

    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

void runBenchmark(T)(string label, T[] arr1, T[] arr2)
{
    enum iterations = 100;
    size_t hits; // consume every result so the optimizer cannot drop the calls

    writeln("--- ", label, " ---");

    // Old
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    foreach (_; 0 .. iterations)
        hits += __equals_old(arr1, arr2) ? 1 : 0;
    sw.stop();
    auto timeOld = sw.peek.total!"hnsecs";
    writeln("Old Version (total ", iterations, " runs): ", timeOld / 10.0, " μs");

    // New
    sw.reset();
    sw.start();
    foreach (_; 0 .. iterations)
        hits += __equals_new(arr1, arr2) ? 1 : 0;
    sw.stop();
    auto timeNew = sw.peek.total!"hnsecs";
    writeln("New Version (total ", iterations, " runs): ", timeNew / 10.0, " μs");

    // +1 hnsec avoids division by zero when the new version is too fast to measure
    double speedup = cast(double) timeOld / (timeNew + 1);
    writefln("Speedup: %.2f x", speedup);
    writeln("Equal results: ", hits, " of ", 2 * iterations);
    writeln();
}

void main()
{
    enum size = 1_000_000;
    
    auto iArr = iota(size).array;
    runBenchmark("Integers: Identity Match (O(1) expected)", iArr, iArr);

    // .dup keeps char[]; .array would auto-decode the string into dchar[]
    auto sArr = "some very long string ".dup;
    runBenchmark("Strings: Identity Match", sArr, sArr);

    auto iArr2 = iota(size).array;
    auto iArr3 = iota(size).array;
    iArr3[$-1] = 0;
    runBenchmark("Integers: Different Content (O(N) expected)", iArr2, iArr3);

    double[] dArr = [double.nan, 1.0, 2.0];
    runBenchmark("Doubles: NaN Check (Optimization must be disabled)", dArr, dArr);
}
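The same measurement can also be written with Phobos' std.datetime.stopwatch.benchmark, which calls each function n times and returns one Duration per function. A sketch under these assumptions: the names equalsOld, equalsNew and sink are placeholders, the functions are simplified to same-typed arrays, and every result is added into sink so the optimizer cannot discard the calls.

```d
import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.range : iota;
import std.stdio : writefln;

// Placeholder versions of the two strategies, simplified to T[] vs T[].
bool equalsOld(T)(const(T)[] lhs, const(T)[] rhs)
{
    if (lhs.length != rhs.length) return false;
    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

bool equalsNew(T)(const(T)[] lhs, const(T)[] rhs)
{
    if (lhs.length != rhs.length) return false;
    // Identity shortcut, only for types where it cannot change the result.
    static if (__traits(isIntegral, T) || is(T == class))
    {
        if (lhs.ptr == rhs.ptr) return true;
    }
    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

size_t sink; // every result is added here so the calls stay observable

void main()
{
    auto arr = iota(1_000_000).array;

    void runOld() { sink += equalsOld(arr, arr) ? 1 : 0; }
    void runNew() { sink += equalsNew(arr, arr) ? 1 : 0; }

    auto times = benchmark!(runOld, runNew)(100);

    writefln("old: %s", times[0]);
    writefln("new: %s", times[1]);
    writefln("sink = %s", sink); // 200 if every comparison returned true
}
```

Using nested functions follows the pattern shown in the Phobos documentation for benchmark; the per-function Duration values can then be compared directly.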

@gorsing force-pushed the refactoring_core_internal_array_equality branch 2 times, most recently from a091421 to a1d0249 (January 9, 2026 19:59)
@gorsing force-pushed the refactoring_core_internal_array_equality branch from a1d0249 to ad4fe6f (January 9, 2026 20:03)
@gorsing requested a review from dkorpel (January 9, 2026 20:06)