
Have OH LLM answers be more aware of scope limitations, with instructions and clearer access to OH counts #3282

Draft

jrochkind wants to merge 12 commits into master from oh_category_count_extraction

Conversation

Contributor

jrochkind commented Feb 3, 2026

Ref #3276

  • We want to add real counts of oral histories searched to the Claude instructions, so the LLM has an idea of what (small) portion of the total corpus it has actually analyzed, and can properly explain what it can and can't do.

    • We want the count to be live and accurate for the sub-collection chosen from the radio buttons.
    • We use Rails caching so we aren't looking it up on every request.
    • We extract the lookup previously used for the counts next to the radio buttons into a CategoryWithChunksCount service object (see the sketch after this list).
    • We use this to embed the actual count into the Claude instructions.
  • We move the count into the "user prompt" (the one that carries the chunks) rather than the "system prompt" (see the second sketch below).

    • That way the system prompt stays more cacheable, should we want to cache it on the LLM side later for efficiency.
    • Being closer to the chunks it describes might also help.
  • We expand the instructions to Claude to ask it to add clarifications/disclaimers that it is only providing examples and can't give exhaustive or quantitative results over the entire collection.

    • It is instructed to use some judgment about when these disclaimers are necessary -- they do end up necessary for most of our sample questions, which seems reasonable.
    • I asked Claude itself for advice on how to formulate these instructions. I don't accept its suggestions uncritically: I push back and ask it to revise, then adapt the result myself when putting it into code, especially to keep the instructions short. The prompt can't get too long or it inhibits the LLM from following it, and originally Claude wanted me to add so much! See conversation here
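
To make the extraction concrete, here is a minimal sketch of what the service object could look like, assuming hypothetical model and column names (`OralHistoryChunk`, `category`, `work_id`) that stand in for whatever lookup previously lived next to the radio-button counts:

```ruby
# A minimal sketch of the extracted service object; the real class in this
# PR may differ in detail.
class CategoryWithChunksCount
  CACHE_TTL = 12.hours # assumed expiry; any acceptable staleness works

  def initialize(category)
    # category is the sub-collection chosen from the radio buttons
    @category = category
  end

  # Number of oral histories in this category that have indexed chunks,
  # cached with Rails.cache so we don't run the lookup on every question.
  def count
    Rails.cache.fetch(["oh_chunks_count", @category], expires_in: CACHE_TTL) do
      # assumed query; stands in for whatever produced the counts shown
      # next to the radio buttons
      OralHistoryChunk.where(category: @category).distinct.count(:work_id)
    end
  end
end
```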

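And a rough illustration of folding the live count and the scope instruction into the user prompt (the one carrying the chunks) rather than the system prompt; the method name, wording, and chunk interface here are all assumptions, not the actual prompt text in this PR:

```ruby
# Illustrative only: interpolating the live count into the user prompt keeps
# the system prompt stable and cacheable. Names and wording are assumptions.
def user_prompt(question, chunks, category)
  total = CategoryWithChunksCount.new(category).count

  <<~PROMPT
    The excerpts below are drawn from only a few of the #{total} oral
    histories being searched. When the question asks for exhaustive or
    quantitative results over the whole collection, explain that you can
    only offer examples from the excerpts provided.

    #{chunks.map(&:text).join("\n\n---\n\n")}

    Question: #{question}
  PROMPT
end
```
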
@jrochkind jrochkind marked this pull request as draft February 3, 2026 15:56
@jrochkind
Contributor Author

You know what, since the number varies, let's move it to the user prompt, so we can later add caching for the system prompt.

@jrochkind
Contributor Author

OK, this gets even more important when trying to address #3276: the LLM needs to know the size of what it's searching. So I'll bring this in now.

jrochkind changed the title from "Add real counts to Oral History AI instructions, by extracting counting/caching code" to "Have OH LLM answers be more aware of scope limitations, with instructions and clearer access to OH counts" Feb 5, 2026
…more flexibility, they were feeling a bit robotic
@jrochkind
Contributor Author

OK, just to throw everything and the kitchen sink at it for stakeholder review, we added a programmatically generated (not LLM-generated) prefix saying how many oral histories the chunks came from; a rough sketch is below.

Good: Transparency, letting people know how it works, giving a sense of the limits of comprehensiveness.

Bad: Busy? Confusing? Over-complicated?

Let's see what stakeholders think. If we decide we're targeting this only at internal users, the tendency might be to not worry about over-complicating, for better or worse?
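
The prefix itself could be as simple as something like this sketch (wording and helper names invented for illustration, not the actual copy shown in the screenshot):

```ruby
# Rough sketch of the programmatic (non-LLM) prefix displayed above the answer.
def answer_prefix(chunks, category)
  sources = chunks.map(&:work_id).uniq.size
  total   = CategoryWithChunksCount.new(category).count
  "This answer is based on excerpts from #{sources} of the #{total} " \
    "oral histories searched."
end
```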

Screenshot 2026-02-05 at 7 35 35 PM
