Conversation


@Meow Meow commented Sep 25, 2025

Original PR: #401

This PR replaces the old "image intensities" reverse image search, and has come about due to the confluence of several key factors within the past year:

  • Computer vision utilities like those in PyTorch have become more accessible than ever, with native language bindings like tch-rs removing the need for a Python server
  • The self-distillation vision transformers DINOv2 and DINOv2 with registers have been released, which come with pretrained weights that extract semantic features from images without the need for a fine-tuned head. The authors claim these models extract robust features for any type of downstream task as-is. I believe they are underselling how good they are, and found the recall to be excellent during model selection.
  • The OpenSearch project has released the k-NN plugin, which enables nearest neighbor search over dense vectors, like the kind representing the CLS token of a ViT.

Together, these factors are used to implement a reverse image search system that uses semantic meaning in the images to identify them, rather than their overall appearance. To establish what is meant by this, here are some examples of an original image and matches found when executing on Derpibooru:

| Demo | Result |
| --- | --- |
| Line art | 2025-01-12 18-41-24 |
| Hamburger | 2025-01-12 18-41-48 |
| Trixie | 2025-01-12 18-42-25 |
| Scenery | 2025-01-12 18-44-24 |

That DINOv2 extracts semantic features can be seen in attention maps generated for these images. The code to generate these attention maps can be found in this repository. They have been reprocessed at a higher scale for visibility:

| Scaled original | Attention map |
| --- | --- |
| 442297 | 442297_attention |
| 1110529 | 1110529_attention |
| 1188964 | 1188964_attention |
| 3515313 | 3515313_attention |

The system works as follows:

  1. The image or video is previewed into a raw RGB bitmap
  2. The bitmap is resampled to the model's target dimensions
  3. The classification (CLS) vector is retrieved from the model
  4. The classification vector is normalized, which converts the k-NN search into one ordered by cosine similarity, and delivered back to the application
  5. For indexing, the normalized vector is stored as a nested field in the image search index; for search, the nearest neighbors are retrieved using an HNSW index
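The normalization in step 4 can be illustrated with a minimal sketch (plain Python with made-up 4-dimensional vectors; real DINOv2 embeddings are hundreds of dimensions): after L2-normalization, a plain inner product between vectors equals the cosine similarity of the originals, so a k-NN index ordered by inner product (or by Euclidean distance, which is monotonic in it for unit vectors) returns neighbors in cosine order.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical stand-ins for CLS vectors of a query image and an indexed image.
query = [0.5, -1.0, 2.0, 0.25]
doc = [0.4, -0.9, 1.8, 0.3]

qn, dn = l2_normalize(query), l2_normalize(doc)
inner = sum(x * y for x, y in zip(qn, dn))

# The inner product of the normalized vectors matches the cosine similarity
# of the originals, since cosine similarity is invariant under scaling.
assert abs(inner - cosine_similarity(query, doc)) < 1e-9
```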

Indexing the classification vector using a nested field allows for the possibility of extracting multiple vectors from each image, and the database table has been set up to allow this should it be desired in the future.
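As a sketch of what such a mapping could look like (field names, dimension, and engine choice here are illustrative assumptions, not the actual mapping from this PR; since the stored vectors are unit-normalized, L2-distance ordering coincides with cosine-similarity ordering):

```json
{
  "mappings": {
    "properties": {
      "image_vectors": {
        "type": "nested",
        "properties": {
          "type": { "type": "keyword" },
          "vector": {
            "type": "knn_vector",
            "dimension": 768,
            "method": {
              "name": "hnsw",
              "engine": "faiss",
              "space_type": "l2"
            }
          }
        }
      }
    }
  }
}
```

The extra `type` keyword field is one hypothetical way to distinguish multiple vectors per image (e.g. full image vs. crops) should that be desired later.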

I have pre-computed the DINOv2 with registers features for ~3.5M images on Derpibooru, ~400K images on Furbooru, and ~35K images on Tantabus. Batch inference was run on a 3060 Ti using code from this repository, with the entire process heavily bottlenecked by memory copy bandwidth and image decode performance rather than the GPU execution itself. However, the inference code is efficient enough to run on a CPU in less than 0.5 seconds per image, and this is what is implemented in the repository (with the expectation that there will be no GPU requirement on the server).

This PR must not be merged until OpenSearch releases version 2.19, as 2.18 contains a critical bug that prevents the system from working in all cases. Other bugs relating to filtering may or may not also be fixed in the 2.19 release, but have been worked around for now.

Meow: we're on OpenSearch 3.2.0 now

This PR must also not be merged until its dependents #389 and #400 are merged.

Meow: these are merged now

Fixes #331 (method outdated)


Meow commented Sep 25, 2025

It appears the reverse search no longer functions correctly when trying to search for an image using its thumbnail as the reverse-search image:

  • image vectors are not created upon image creation
  • it's possible to create them via `Philomena.ImageVectors.BatchProcessor.all_missing("full", batch_size: 32)`
  • but even if they're created, reverse-search does not work, and even providing the original image to the reverse searcher again doesn't appear to produce any results

@liamwhite liamwhite marked this pull request as draft September 25, 2025 14:21
@liamwhite

I think this should not be merged until opensearch-project/k-NN#2222 is properly addressed


liamwhite commented Sep 25, 2025

Also, it'd be good to implement a rudimentary form of batching (merge together up to 8 requests or on a 100ms timer?) to improve the efficiency of performing evaluations
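A minimal sketch of such a size-or-deadline micro-batcher (plain Python with placeholder names; the "up to 8 requests or 100 ms" thresholds come from the comment above, and nothing here is actual Philomena code):

```python
import time

class MicroBatcher:
    """Flush when a batch reaches max_size items, or when max_wait seconds
    have passed since the first pending item arrived (checked via poll())."""

    def __init__(self, max_size=8, max_wait=0.1, clock=time.monotonic):
        self.max_size = max_size
        self.max_wait = max_wait
        self.clock = clock       # injectable for testing
        self.pending = []
        self.first_at = None

    def submit(self, request):
        """Queue a request; return a full batch if the size threshold is hit."""
        if not self.pending:
            self.first_at = self.clock()
        self.pending.append(request)
        if len(self.pending) >= self.max_size:
            return self.flush()
        return None

    def poll(self):
        """Called from a timer; return the batch if the deadline has elapsed."""
        if self.pending and self.clock() - self.first_at >= self.max_wait:
            return self.flush()
        return None

    def flush(self):
        batch, self.pending, self.first_at = self.pending, [], None
        return batch
```

In the real system, each flushed batch would be fed to the model in a single forward pass, which amortizes per-call overhead when requests arrive close together.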


Meow commented Sep 25, 2025

> I think this should not be merged until opensearch-project/k-NN#2222 is properly addressed

Right, I kind of assumed this would have been fixed by them by now. Wow, they're slow.

Are you sure it's wise to wait and not just do it? This issue seems to have almost zero traction or movement; I'm not convinced they'll fix it in any reasonably timely manner, and it'd be a shame to potentially hold a feature for months or even years because some other maintainers have other priorities.

> Also, it'd be good to implement a rudimentary form of batching (merge together up to 8 requests or on a 100ms timer?) to improve the efficiency of performing evaluations

Maybe. I'm not convinced that condition would ever be met, though, save for some sort of attack. But in that case, I'd argue a 1-second window is wiser. People already wait several seconds for image upload; what's one second more?

@liamwhite

> Are you sure it's wise to wait and not just do it?

Yes because the reverse search feature will straight up not work properly without it being fixed


Meow commented Sep 25, 2025

> > Are you sure it's wise to wait and not just do it?
>
> Yes because the reverse search feature will straight up not work properly without it being fixed

Did you not have a workaround? You mentioned it in the original PR.

@liamwhite

The workaround causes incomplete results when a separate filter is applied. A previous bug (which I am not yet sure has been fixed) caused the search engine to segfault when rebalancing vector documents between shards.


Development

Successfully merging this pull request may close these issues.

Reverse search improvement: store non-transparent intensities of transparent images