@tgpfeiffer

This is an attempt to fix #30

(Disclaimer: I implemented this using some support from Copilot.)

This takes the straightforward route and replaces the u64 in the ImageHash struct with a Vec<u8>. I can't judge the performance impact because I don't have numbers from before this PR, but on my machine the benchmarks now look like this:

$ python compare.py
Found 20 images in ../../../imgs/test/. Running benchmarks for 100 runs...

Benchmarking dHash...

Benchmark Results for dHash (in nanoseconds per image):

Metric      imgdd (ns)  imagehash (ns)
----------------------------------------------
min_time    1316625.500000 3898336.000000
max_time    2176291.550000 4588001.600000
avg_time    1470818.250000 3979595.414500
median_time 1410523.850000 3951972.775000

Percentage Difference (imgdd vs. imagehash):

Metric      Difference (%) 
----------------------------
min_time    66.23          
max_time    52.57          
avg_time    63.04          
median_time 64.31          


Benchmarking aHash...

Benchmark Results for aHash (in nanoseconds per image):

Metric      imgdd (ns)  imagehash (ns)
----------------------------------------------
min_time    1296290.500000 3864925.150000
max_time    2131858.950000 4151851.300000
avg_time    1462870.159000 3938232.084500
median_time 1409924.325000 3938366.125000

Percentage Difference (imgdd vs. imagehash):

Metric      Difference (%) 
----------------------------
min_time    66.46          
max_time    48.65          
avg_time    62.85          
median_time 64.20          


Benchmarking pHash...

Benchmark Results for pHash (in nanoseconds per image):

Metric      imgdd (ns)  imagehash (ns)
----------------------------------------------
min_time    1340875.600000 4332915.000000
max_time    1787807.250000 4927223.450000
avg_time    1455338.274500 4425243.710500
median_time 1418410.925000 4392500.725000

Percentage Difference (imgdd vs. imagehash):

Metric      Difference (%) 
----------------------------
min_time    69.05          
max_time    63.72          
avg_time    67.11          
median_time 67.71          


Benchmarking pHash256...

Benchmark Results for pHash256 (in nanoseconds per image):

Metric      imgdd (ns)  imagehash (ns)
----------------------------------------------
min_time    1337245.050000 4625274.400000
max_time    1836423.200000 6072616.000000
avg_time    1479914.261000 4774408.488000
median_time 1429946.300000 4707407.375000

Percentage Difference (imgdd vs. imagehash):

Metric      Difference (%) 
----------------------------
min_time    71.09          
max_time    69.76          
avg_time    69.00          
median_time 69.62

...
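For readers who have not opened the diff, here is a minimal sketch of what a byte-vector hash type could look like once the u64 is replaced by a Vec<u8>. It is not the code from this PR: the field name `bytes` and the methods `from_bits`, `hamming_distance`, and `to_hex` are assumptions made for illustration, as is the MSB-first bit packing.

```rust
/// Hypothetical stand-in for the crate's hash type. The field and
/// method names are assumptions, not the actual imgddcore API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ImageHash {
    /// Hash bits packed MSB-first; the length depends on the algorithm.
    bytes: Vec<u8>,
}

impl ImageHash {
    /// Packs a slice of bits into bytes, most significant bit first.
    /// A trailing partial byte is padded with zero bits.
    pub fn from_bits(bits: &[bool]) -> Self {
        let mut bytes = vec![0u8; (bits.len() + 7) / 8];
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                bytes[i / 8] |= 1 << (7 - (i % 8));
            }
        }
        ImageHash { bytes }
    }

    /// Number of differing bits, or None when the two hashes have
    /// different lengths and therefore cannot be compared.
    pub fn hamming_distance(&self, other: &ImageHash) -> Option<u32> {
        if self.bytes.len() != other.bytes.len() {
            return None;
        }
        Some(
            self.bytes
                .iter()
                .zip(&other.bytes)
                .map(|(a, b)| (a ^ b).count_ones())
                .sum(),
        )
    }

    /// Lowercase hex representation, two characters per byte.
    pub fn to_hex(&self) -> String {
        self.bytes.iter().map(|b| format!("{:02x}", b)).collect()
    }
}
```

With a fixed u64, two hashes always had the same width. With a variable-length vector, a comparison has to decide what to do about mismatched lengths, which is why this sketch returns an Option.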

I've opted for the smallest possible change to the code, which makes the external interface a bit ugly: I added an optional hash_size parameter to all the exposed functions, and every algorithm except pHash ignores it. A cleaner approach would probably be an enum in which each variant corresponds to one algorithm and carries that algorithm's parameters. However, that would change the interface in a backwards-incompatible way and require more work on the Python/Rust boundary, so I wanted to get your feedback first. Please let me know what you think, @aastopher.
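To make the enum idea concrete, here is a rough sketch. None of it exists in the crate: `HashAlgorithm`, its variants, `output_bits`, and `from_name` are hypothetical names. It also assumes that aHash and dHash keep their 64-bit output and that hash_size follows the imagehash convention of a hash_size × hash_size bit result (so hash_size = 16 gives the 256 bits of the pHash256 benchmark).

```rust
/// Hypothetical per-algorithm parameter enum. Variant and field
/// names are assumptions for discussion, not existing imgddcore API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashAlgorithm {
    AHash,
    DHash,
    /// Assumed convention: the hash has hash_size * hash_size bits.
    PHash { hash_size: usize },
}

impl HashAlgorithm {
    /// Number of bits in the resulting hash (64 assumed for aHash/dHash).
    pub fn output_bits(&self) -> usize {
        match *self {
            HashAlgorithm::AHash | HashAlgorithm::DHash => 64,
            HashAlgorithm::PHash { hash_size } => hash_size * hash_size,
        }
    }

    /// Maps a name plus an optional hash_size onto a variant.
    /// Unlike an ignored parameter, a hash_size passed to an algorithm
    /// that does not support it is reported as an error.
    pub fn from_name(name: &str, hash_size: Option<usize>) -> Result<Self, String> {
        match (name, hash_size) {
            ("ahash", None) => Ok(HashAlgorithm::AHash),
            ("dhash", None) => Ok(HashAlgorithm::DHash),
            ("phash", size) => Ok(HashAlgorithm::PHash {
                hash_size: size.unwrap_or(8),
            }),
            ("ahash", Some(_)) | ("dhash", Some(_)) => {
                Err(format!("{name} does not take a hash_size"))
            }
            _ => Err(format!("unknown algorithm: {name}")),
        }
    }
}
```

The trade-off is the one described above: parameters that do not apply to an algorithm can no longer be passed silently, but every caller, including the Python bindings, would have to switch to the new type.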

@deepsource-io

deepsource-io bot commented Jan 12, 2026

Here's the code health analysis summary for commits 3dcfca5..a47380b. View details on DeepSource ↗.

Analysis Summary

Analyzer    Status       Summary    Link
----------------------------------------------
Rust        ✅ Success               View Check ↗



Development

Successfully merging this pull request may close these issues.

imgddcore: can we parameterize phash hash_size?
