@davidkopp
Contributor

The JSON file for the search index had a size of 2 MB on my site:

Image

I think this is too big. With the changes in this PR, the search index on my site is reduced to 292 kB:

image

Changes:

  • remove newlines and whitespace
  • remove link tags from markdown links properly (striptags(true) | link was not sufficient)
  • reduce the length of the content per note in the search index to 500 characters
  • minify the whole json
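
The four changes above can be illustrated with a minimal sketch. This is not the code from the PR (the project builds its index from templates), only a Python approximation of the same steps. The function names `clean_content` and `build_index`, the `title`/`content` entry shape, and the regex-based tag stripping are all assumptions made for illustration.

```python
import json
import re
from typing import Dict, List, Optional

MAX_CONTENT_CHARS = 500  # cutoff proposed in this PR


def clean_content(html: str, max_chars: Optional[int] = MAX_CONTENT_CHARS) -> str:
    """Strip tags, collapse whitespace, and optionally truncate (hypothetical helper)."""
    # Replace every tag, including <a ...> link wrappers, with a space so
    # the inner text survives and adjacent words do not run together.
    text = re.sub(r"<[^>]+>", " ", html)
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text if max_chars is None else text[:max_chars]


def build_index(notes: List[Dict[str, str]]) -> str:
    """Serialize the notes as minified JSON (assumed entry shape: title + content)."""
    entries = [
        {"title": note["title"], "content": clean_content(note["content"])}
        for note in notes
    ]
    # Separators without trailing spaces produce the most compact JSON output.
    return json.dumps(entries, separators=(",", ":"), ensure_ascii=False)
```

Passing `max_chars=None` keeps the full note text, so the same helper covers both the truncated and the untruncated index.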

@oleeskild
Owner

Nice, great start!
Do you have a good justification for the 500 character cutoff? I think it's reasonable to assume that the search will cover the entire note, and not just the first 500 characters.
Maybe we could default to not reducing the length, but introduce a config value somewhere to reduce it for those who need it.
Thoughts?

@davidkopp
Contributor Author

My goal was to reduce the index size, with the trade-off of slightly worse search results. I chose to use only the first 500 characters because I assume they already contain the most relevant keywords.

But it makes sense to make it configurable.
As this is only relevant if the search feature is enabled, would it make sense to put it under "Enable search" in the default note settings? Or would you prefer it as part of the advanced settings?
At the moment I think a simple boolean flag would be sufficient for this, like SEARCH_INDEX_USE_FULL_NOTE or SEARCH_INDEX_TRUNCATE_CONTENT. I'm not sure whether anybody would want a character limit other than 500.
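
A rough sketch of how such a boolean flag could behave, written in Python for illustration only. The name SEARCH_INDEX_TRUNCATE_CONTENT is one of the two candidates proposed above and is not an existing setting; reading it from an environment-style mapping, the accepted truthy values, and the helper names `content_limit` and `index_content` are assumptions. The default keeps the full note, as suggested earlier in the conversation.

```python
import os
from typing import Mapping, Optional

TRUNCATE_FLAG = "SEARCH_INDEX_TRUNCATE_CONTENT"  # proposed name, not an existing setting
TRUNCATED_LENGTH = 500  # fixed limit applied when the flag is on


def content_limit(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the character limit, or None when the full note should be indexed."""
    raw = env.get(TRUNCATE_FLAG, "false").strip().lower()
    # Assumed truthy spellings; anything else (or an unset flag) means no truncation.
    return TRUNCATED_LENGTH if raw in ("1", "true", "yes") else None


def index_content(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Apply the limit to one note's already-cleaned text (hypothetical helper)."""
    limit = content_limit(env)
    return text if limit is None else text[:limit]
```

With the flag unset, `index_content` returns the note unchanged; setting it to `true` cuts the indexed text to 500 characters.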
