Evaluating LLM Answers with Citations: Practical Signals
For document QA systems, “looks good” isn’t enough. You need measurable signals that answers are grounded in the source. Here are the checks I use in practice.
Signals I Track
- Citation Presence: answer must include at least one source anchor
- Anchor Validity: anchors resolve to real pages/tables/sections
- Overlap Score: lexical (token-level) overlap between each cited chunk and the answer text that cites it
- Faithfulness Heuristics: penalize claims not supported by the retrieved context
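As a rough illustration, the four signals above can be computed with nothing but the standard library. This is a minimal sketch, not a reference implementation. It assumes a hypothetical anchor format like `[S1]`, a `chunks` dict mapping anchor ids to retrieved chunk text, and a crude token-set overlap. The names `citation_signals` and `unsupported_sentences` and the 0.5 threshold are illustrative choices, not part of any library.

```python
import re

# Hypothetical anchor format: [S1], [S2], ... (adapt to your own scheme).
ANCHOR_RE = re.compile(r"\[(S\d+)\]")


def tokenize(text):
    """Lowercased alphanumeric tokens, with citation anchors removed."""
    return set(re.findall(r"[a-z0-9]+", ANCHOR_RE.sub(" ", text).lower()))


def citation_signals(answer, chunks):
    """Compute presence, anchor validity, and overlap for one answer.

    chunks: dict mapping anchor id (e.g. "S1") to retrieved chunk text.
    """
    anchors = ANCHOR_RE.findall(answer)
    valid = [a for a in anchors if a in chunks]

    answer_tokens = tokenize(answer)
    cited_tokens = set()
    for a in valid:
        cited_tokens |= tokenize(chunks[a])

    return {
        # Citation presence: at least one anchor appears in the answer.
        "has_citation": bool(anchors),
        # Anchor validity: share of anchors that resolve to a known chunk.
        "anchor_validity": len(valid) / len(anchors) if anchors else 0.0,
        # Overlap: share of answer tokens found in the validly cited chunks.
        "overlap": (
            len(answer_tokens & cited_tokens) / len(answer_tokens)
            if answer_tokens
            else 0.0
        ),
    }


def unsupported_sentences(answer, chunks, min_support=0.5):
    """Crude faithfulness heuristic: flag sentences with little lexical
    support anywhere in the retrieved context."""
    context = set()
    for text in chunks.values():
        context |= tokenize(text)

    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = tokenize(sentence)
        if tokens and len(tokens & context) / len(tokens) < min_support:
            flagged.append(sentence)
    return flagged


chunks = {"S1": "The warranty lasts two years from the date of purchase."}
answer = "The warranty lasts two years [S1]. It covers water damage [S9]."
signals = citation_signals(answer, chunks)
flagged = unsupported_sentences(answer, chunks)
```

In this example `[S9]` does not resolve to a chunk, so anchor validity is 0.5, and the second sentence is flagged as unsupported. Lexical overlap will miss faithful paraphrases and reward copied but irrelevant text, so treat these numbers as triage signals and tune thresholds on your own data.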
Workflow
- Retrieve top‑k chunks and generate answer with citation placeholders
- Post‑validate citations; drop or relabel low‑confidence answers
- Log metrics per query type; sample for human review
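The post-validation and logging steps above might look something like the following sketch. It assumes each answer already has a signals dict with `has_citation`, `anchor_validity`, and `overlap` keys (hypothetical names), and it picks one possible policy: drop answers with missing or broken citations, relabel weakly overlapping ones as low confidence. The function names, labels, and the 0.5 cutoff are all assumptions for illustration.

```python
import random
from collections import defaultdict


def label_answer(signals, min_overlap=0.5):
    """Map per-answer signals to a label (one possible policy)."""
    # Missing or unresolvable citations: drop the answer.
    if not signals["has_citation"] or signals["anchor_validity"] < 1.0:
        return "drop"
    # Citations resolve but lexical support is weak: relabel.
    if signals["overlap"] < min_overlap:
        return "low_confidence"
    return "accept"


def summarize(records):
    """Count labels per query type.

    records: iterable of (query_type, label) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query_type, label in records:
        counts[query_type][label] += 1
    return {qt: dict(labels) for qt, labels in counts.items()}


def sample_for_review(records, k, seed=0):
    """Draw up to k records for human review, reproducibly."""
    records = list(records)
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


records = [
    ("lookup", label_answer({"has_citation": True, "anchor_validity": 1.0, "overlap": 0.8})),
    ("lookup", label_answer({"has_citation": False, "anchor_validity": 0.0, "overlap": 0.0})),
    ("table", label_answer({"has_citation": True, "anchor_validity": 1.0, "overlap": 0.2})),
]
summary = summarize(records)
review_batch = sample_for_review(records, k=2)
```

Keeping the label policy in one small function makes it easy to adjust thresholds per query type later. The fixed seed keeps review samples reproducible between runs.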
Grounded answers build trust. These lightweight checks catch failure modes such as missing citations, broken anchors, and unsupported claims early, without heavy infra.