-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Learning Objective
Bucket: Architecture
Focus: Identifying
Source Paper: Eliciting Behaviors in Multi-Turn Conversations
Published: 2025-12-29
Date: 2025-12-31
Core Research Question from Paper:
Identifying specific and often complex behaviors from large language models (LLMs) in conversational settings is crucial for their evaluation?
Active Recall (NO NOTES)
1. Definition + Boundary
Identifying is _; it is not _.
(Define the concept from the paper in your own words without looking at the source)
2. Failure Statement
The system/approach fails when _ because _.
(What are the known limitations or failure modes discussed in the paper?)
3. Mental Model
(Reconstruct the system architecture, data flow, or conceptual framework from memory)
(Include: components, interactions, feedback loops, uncertainty points)
Option 1: Mermaid Diagram
graph TD
A[Component A] --> B[Component B]
B --> C[Component C]
Option 2: ASCII Diagram
[Input] --> [Process] --> [Output]
^ |
| v
[Feedback]
4. Mechanism (Causal Chain)
(Write 3–5 linked causal statements explaining why this approach works)
- Input/Trigger →
- Process/Transform →
- Intermediate Effect →
- Feedback/Constraint →
- Output/Result
5. Constraints & Trade-offs
-
Computational Constraints:
-
Architectural Constraints:
-
Alignment/Safety Constraints:
-
Chosen trade-off and justification:
6. Transfer Test
Scenario: How would this approach perform in:
- Different modality (code → images, text → audio)?
- Different scale (10x parameters, 100x data)?
- Different domain (medical, legal, scientific)?
Prediction:
Failure hypothesis:
Self-Assessment (Rubric)
| Dimension | Score (0–4) | Notes |
|---|---|---|
| Conceptual Clarity | ||
| Mental Model Integrity | ||
| Causal Understanding | ||
| Failure Awareness | ||
| Trade-off Judgment | ||
| Transfer Ability | ||
| Calibration & Honesty |
Initial Confidence (0–100%): 50
Falsification Plan
Experiment design:
(One experiment or eval that could prove the paper's claims wrong or reveal hidden assumptions)
Expected result if correct:
Expected result if wrong:
Research Context
Related work mentioned in paper:
-
Open questions from the paper:
Carry-Forward Insight
(One sentence for Future Me about what matters most from this concept)
Delayed Recall (Fill 24-72 hours later)
- What did I forget?
- What was oversimplified?
- What was wrong?
- What surprised me when I re-read?
Completion Checklist
- Explained aloud without notes
- Identified ≥1 real failure mode from the paper
- Made a falsifiable claim about the approach
- Drew architecture/flow from memory
- Scored honestly
- Linked to ≥1 related paper or technique
Confidence Delta Reflection (Fill After Review)
- Initial confidence: 50%
- Reviewer signal (over / under / calibrated):
- My assessment:
- What I will adjust next time:
- Calibration error: ±____%
Implementation Notes (Optional)
Code experiment to try:
// Minimal reproduction or test of the core mechanismEval to run:
- Dataset:
- Metric:
- Baseline: