This repository was archived by the owner on Jun 22, 2020. It is now read-only.

Description
There are some cases where it is unclear what to treat as a "document", i.e. the fundamental unit of search indexing in Elasticsearch.
Two prototypical cases:
- Really long documents, like reports running to a hundred pages or more. These are hard because they often cover multiple topics, and it's hard to get Elasticsearch to tell us where in that sort of document a hit occurs. The temptation is to split them into chapters or individual pages for indexing, but then a reader who finds a hit on one page may want to keep reading the rest of the document.
- Smushed-together documents. Sometimes FOIA responses show up as one (or a few) PDFs with multiple responsive documents all squished together, and those documents are sometimes several pages long themselves. Indexing, say, five unrelated documents as a single document is not a good idea, since they have nothing in common. But splitting on pages, again, separates the pages of multi-page documents.
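Both cases come down to splitting on the right boundaries rather than always on whole files or single pages. A minimal Python sketch, assuming the page text has already been extracted and the section boundaries are already known (e.g. marked by hand for a FOIA PDF); the function name and the `parent_id` / `section_number` / `section_count` / `first_page` / `last_page` fields are hypothetical, not part of any existing codebase:

```python
from typing import Dict, List, Tuple


def split_into_sections(
    doc_id: str, pages: List[str], ranges: List[Tuple[int, int]]
) -> List[Dict]:
    """Turn one file's pages into one index record per section.

    `ranges` holds (first_page, last_page) pairs, 1-based and inclusive.
    For a smushed-together FOIA PDF, pass one range per responsive document;
    for a long report, pass one range per chapter or per page.
    """
    records = []
    for section_number, (first, last) in enumerate(ranges, start=1):
        # Reject ranges that fall outside the file or run backwards.
        if first < 1 or last > len(pages) or first > last:
            raise ValueError(f"bad page range {(first, last)} for {doc_id}")
        records.append({
            "_id": f"{doc_id}-{section_number}",
            # Hypothetical field linking every section back to its original file,
            # so the interface can still offer the whole document.
            "parent_id": doc_id,
            "section_number": section_number,
            "section_count": len(ranges),
            "first_page": first,
            "last_page": last,
            # The section's pages are joined into one searchable text field.
            "text": "\n".join(pages[first - 1:last]),
        })
    return records


pages = ["memo p1", "memo p2", "email p1", "letter p1", "letter p2", "letter p3"]
records = split_into_sections("foia-123", pages, [(1, 2), (3, 3), (4, 6)])
```

For a long report, per-page records are just `[(n, n) for n in range(1, len(pages) + 1)]`; either way, keeping `parent_id` on every record is what lets a reader get back to the full document after landing on one section.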
Possible solutions:
- add an additional field in Elasticsearch and a button in the interface to go to the next/previous page of a multi-page document (regardless of type).
- continue to tweak Elasticsearch to store hit locations so the detail view can scroll you to where your hits occur.
- other ideas?
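The first two ideas could share one index layout. A rough sketch, assuming each indexed record carries hypothetical `parent_id`, `section_number` and `section_count` fields and a typeless (Elasticsearch 7.x-style) mapping: the next/prev button becomes a `bool` query with `term` filters for the neighbouring section, and storing offsets on the `text` field (`"index_options": "offsets"`) plus a highlight request with `number_of_fragments: 0` returns the whole field with hits wrapped in `<em>` tags, which the detail view could scroll to. Only plain request bodies are built here; sending them with a client is left out.

```python
from typing import Dict, Optional

# Hypothetical mapping; the field names are assumptions, not an existing schema.
# "index_options": "offsets" stores character offsets for each term, which
# Elasticsearch's unified highlighter can use on long text fields.
SECTION_MAPPING = {
    "mappings": {
        "properties": {
            "parent_id": {"type": "keyword"},
            "section_number": {"type": "integer"},
            "section_count": {"type": "integer"},
            "text": {"type": "text", "index_options": "offsets"},
        }
    }
}


def adjacent_section_query(
    parent_id: str, section_number: int, section_count: int, step: int
) -> Optional[Dict]:
    """Query body for the next (step=1) or previous (step=-1) section.

    Returns None when there is no neighbour, so the UI can hide the button.
    """
    target = section_number + step
    if target < 1 or target > section_count:
        return None
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"parent_id": parent_id}},
                    {"term": {"section_number": target}},
                ]
            }
        }
    }


def hit_locations_query(text_query: str) -> Dict:
    """Search body that also asks for the full `text` field with hits highlighted.

    number_of_fragments=0 returns the entire field with matches marked,
    rather than short snippets.
    """
    return {
        "query": {"match": {"text": text_query}},
        "highlight": {"fields": {"text": {"number_of_fragments": 0}}},
    }
```

Filtering on `parent_id` keeps navigation independent of how a file was split (pages, chapters, or hand-marked FOIA documents), which fits the "regardless of type" goal.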