Eval harness for HR policy RAG assistants
Use AI to generate test questions, compare answer evidence, and explain source gaps. Keep employment-impacting decisions, eligibility judgments, and policy exceptions outside full automation.
Why this workflow matters
Internal HR assistants look easy until policies differ by location, employee type, effective date, or benefits provider. A useful eval harness checks whether the answer is sourced, current, permission-safe, and escalated when the assistant should not answer.
Inputs and outputs
Inputs
- Policy question set
- Source documents and effective dates
- User role and location metadata
- Expected citation and escalation rules
- Prior incorrect answers
Outputs
- Answer-quality report
- Missing-source list
- Freshness and permission failures
- Escalation examples
- Reviewer notes for policy owners
Current manual workflow
Start by modeling the work as it happens now.
- Gather real employee questions, policy-owner examples, and known edge cases.
- Attach the expected source, answer boundaries, effective dates, and escalation rules to each test case.
- Run the assistant for different employee roles, locations, and permission contexts.
- Score citations, freshness, refusal behavior, permission handling, and escalation.
- Send failures to the policy owner with the source text and exact assistant answer.
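The labelling step above can be captured in a small record type. The following is a minimal sketch, assuming Python dataclasses; `PolicyTestCase`, its field names, and `validate` are hypothetical and not part of any existing harness.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyTestCase:
    """One eval case. Every field name here is illustrative, not a fixed schema."""
    question: str
    role: str                          # e.g. "employee", "contractor", "manager"
    location: str                      # e.g. "UK", "PT"
    as_of: date                        # date on which the question is asked
    expected_source_id: Optional[str]  # None when no source should be cited
    must_escalate: bool                # True when a human owner should answer
    answer_boundaries: tuple = ()      # claims the answer must not make


def validate(case: PolicyTestCase) -> list[str]:
    """Return labelling problems so incomplete cases are caught before a run."""
    problems = []
    if not case.question.strip():
        problems.append("empty question")
    if case.expected_source_id is None and not case.must_escalate:
        problems.append("case has neither an expected source nor an escalation rule")
    return problems
```

Running `validate` over the question set before each eval run keeps unlabelled cases from being scored as passes.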
Where AI helps
Use models for the judgment-heavy work that surrounds the deterministic checks, not for the checks themselves.
- Generate realistic question variants from approved policy text.
- Compare the assistant answer against cited source passages.
- Cluster failures by stale source, missing permission, unsupported claim, or missing escalation.
- Draft policy-owner review notes from failing cases.
- Suggest where the knowledge base needs source cleanup or metadata.
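Clustering does not have to start with a model. One possible sketch, assuming each failing case already carries a `category` tag from the deterministic checks (the dictionary keys and category names below are illustrative): group by tag first, and leave only the unclassified remainder for a model or a reviewer to label.

```python
from collections import defaultdict

# Illustrative category names mirroring the failure types listed above.
FAILURE_CATEGORIES = (
    "stale_source",
    "missing_permission",
    "unsupported_claim",
    "missing_escalation",
)


def cluster_failures(failures: list[dict]) -> dict[str, list[str]]:
    """Group failing case ids by category; unknown or missing tags go to 'unclassified'."""
    clusters = defaultdict(list)
    for failure in failures:
        category = failure.get("category")
        key = category if category in FAILURE_CATEGORIES else "unclassified"
        clusters[key].append(failure["case_id"])
    return dict(clusters)
```

The per-cluster lists can then be handed to a model to draft review notes, with the case ids preserved so a policy owner can trace every claim back to a specific failure.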
System pattern
Keep deterministic checks in charge of the hard boundaries.
Architecture
- Represent each policy document with owner, effective date, employee population, location, and review cadence.
- Run retrieval only after role and permission filters are applied.
- Evaluate answers against source excerpts, required citations, and escalation rules.
- Use AI to summarize failures after deterministic checks identify citation, freshness, and permission defects.
- Route high-risk failures to HR, legal, or policy owners before the assistant is expanded.
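The ordering in this architecture (filter first, retrieve second) can be sketched as follows. This is a toy illustration under stated assumptions: `PolicyDoc`, `UserContext`, `permitted_docs`, and `retrieve` are hypothetical names, and the keyword-overlap scoring stands in for whatever retriever the assistant actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDoc:
    doc_id: str
    owner: str
    populations: frozenset  # employee types the policy applies to
    locations: frozenset    # locations the policy applies to
    text: str


@dataclass(frozen=True)
class UserContext:
    employee_type: str
    location: str


def permitted_docs(docs: list, user: UserContext) -> list:
    """Deterministic prefilter: keep only documents scoped to this user."""
    return [
        doc for doc in docs
        if user.employee_type in doc.populations and user.location in doc.locations
    ]


def retrieve(query: str, docs: list, user: UserContext, top_k: int = 3) -> list:
    """Toy keyword-overlap retrieval that only ever sees prefiltered documents."""
    candidates = permitted_docs(docs, user)
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in candidates]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

Because the filter runs before scoring, a document outside the user's scope cannot be ranked, quoted, or summarized, no matter how well it matches the query.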
Keep deterministic
- Permission checks before retrieval.
- Effective-date filtering.
- Jurisdiction and employee-type routing.
- Required escalation triggers.
- Audit logging for reviewed answers.
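Effective-date filtering is one of the simpler checks to make fully deterministic. A minimal sketch, assuming each document version records an inclusive `effective_from` and an optional inclusive `effective_to`; the `DocVersion` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocVersion:
    doc_id: str
    effective_from: date
    effective_to: Optional[date]  # None means the version is still in force


def in_effect(doc: DocVersion, as_of: date) -> bool:
    """True when as_of falls inside the version's effective window (inclusive)."""
    if as_of < doc.effective_from:
        return False
    return doc.effective_to is None or as_of <= doc.effective_to


def current_versions(docs: list, as_of: date) -> list:
    """Keep only the versions that are in effect on the given date."""
    return [doc for doc in docs if in_effect(doc, as_of)]
```

Passing the question date explicitly, rather than reading the system clock inside the filter, lets the harness replay the same test case against past and future dates.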
Do not fully automate
- Employment eligibility decisions.
- Disciplinary or performance decisions.
- Final interpretation of ambiguous policy language.
- Exceptions that affect pay, leave, benefits, or protected categories.
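Because these decisions stay with people, the harness needs a rule-based way to mark which test questions must end in a handoff. A rough sketch, assuming simple keyword triggers: the topic names and keyword lists are hypothetical placeholders that a policy owner would replace, and substring matching like this will both over-trigger and under-trigger.

```python
# Hypothetical trigger lists; real ones would come from HR and legal owners.
ESCALATION_KEYWORDS = {
    "eligibility": ("eligible", "eligibility", "qualify"),
    "discipline": ("disciplinary", "performance rating", "termination"),
    "interpretation": ("ambiguous", "unclear", "does this mean"),
    "exception": ("exception", "waive", "override"),
}


def escalation_triggers(question: str) -> list[str]:
    """Return the sorted topics whose keywords appear in the question."""
    text = question.lower()
    return sorted(
        topic
        for topic, keywords in ESCALATION_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


def must_escalate(question: str) -> bool:
    """True when at least one trigger fires and the assistant should hand off."""
    return bool(escalation_triggers(question))
```

A rule this crude is only a floor. Its value in an eval harness is that it is reproducible, so a missed escalation can be traced to a specific missing trigger rather than to model behavior.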
Evaluation and controls
A useful workflow design explains how to check the work.
Supported-answer rate
Share of answers that cite the correct source text for the user's context.
Stale-source rate
Share of answers that rely on a document outside its effective date. The target is zero.
Escalation recall
Share of sensitive or ambiguous questions that are routed to a human owner.
Permission failure rate
Share of answers that expose policy content outside the user's access context. The target is zero.
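The four metrics above can be computed from per-case flags once each answer has been scored. A minimal sketch, assuming every scored case records five booleans; `ScoredCase`, its field names, and `summarize` are hypothetical. Escalation recall is taken over the cases that should have escalated, and the other three rates are taken over all cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoredCase:
    supported: bool             # answer cites the right source for the user context
    used_stale_source: bool     # answer relied on a document outside its effective date
    should_escalate: bool       # the label says a human owner must answer
    did_escalate: bool          # the assistant actually handed off
    permission_violation: bool  # answer exposed content outside the access context


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a fraction, or None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def summarize(cases: list) -> dict:
    total = len(cases)
    needs_escalation = [case for case in cases if case.should_escalate]
    return {
        "supported_answer_rate": rate(sum(c.supported for c in cases), total),
        "stale_source_rate": rate(sum(c.used_stale_source for c in cases), total),
        "escalation_recall": rate(
            sum(c.did_escalate for c in needs_escalation), len(needs_escalation)
        ),
        "permission_failure_rate": rate(sum(c.permission_violation for c in cases), total),
    }
```

Returning `None` for an empty denominator keeps a question set with no escalation cases from reporting a misleading perfect or zero recall.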
- HR systems: Permission prefilter. Retrieval is scoped before model generation starts.
- People operations: Policy owner review. Failed cases map back to a named policy owner.
- Legal or HR lead: Escalation boundary. Questions affecting pay, benefits, discipline, or eligibility can require human review.
- Knowledge owner: Freshness cadence. Documents have effective dates and review dates before they are used.
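The freshness control can be enforced as a metadata gate before a document enters the index. A small sketch, assuming each document is a dictionary with the keys shown; the key names and the `metadata_gaps` helper are illustrative, and the review cadence is expressed in days.

```python
from datetime import date, timedelta

# Hypothetical metadata keys that must be present before a document is indexed.
REQUIRED_FIELDS = ("owner", "effective_from", "last_reviewed", "review_cadence_days")


def metadata_gaps(doc: dict, today: date) -> list[str]:
    """List reasons a document should be held back from retrieval."""
    gaps = [f"missing {name}" for name in REQUIRED_FIELDS if doc.get(name) is None]
    if not gaps:
        # Only check the review deadline once all required fields exist.
        due = doc["last_reviewed"] + timedelta(days=doc["review_cadence_days"])
        if today > due:
            gaps.append(f"review overdue since {due.isoformat()}")
    return gaps
```

An empty list means the document passes the gate. Anything else can go straight onto the knowledge owner's cleanup list.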
Pilot checklist
Test the workflow before widening automation.
- Choose one low-volume HR policy area with a clear source owner.
- Create 50-100 questions across employee roles, locations, and common edge cases.
- Label expected source passages and escalation cases.
- Run the assistant with permission and effective-date metadata enabled.
- Review failures with HR and legal before broadening the assistant.
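Before running the pilot, it may help to confirm that the question set actually spans the roles and locations it claims to cover. A possible sketch, assuming each test case is a dictionary with `role` and `location` keys (both hypothetical):

```python
from collections import Counter
from itertools import product


def coverage_gaps(cases: list, roles: list, locations: list, min_per_cell: int = 1) -> list:
    """Return (role, location) pairs with fewer than min_per_cell questions."""
    counts = Counter((case["role"], case["location"]) for case in cases)
    return [
        cell for cell in product(roles, locations)
        if counts[cell] < min_per_cell
    ]
```

Any pair in the returned list is a context the assistant would face in production without having been tested in it.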
Synthetic example
An employee asks whether a leave policy applies to contractors in a specific location. The assistant finds a general employee handbook answer, but the eval harness flags a population mismatch and missing escalation. The fix is a deterministic employee-type filter plus an HR owner review path.
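The two flags in this example can be expressed as a short deterministic check. The sketch below is illustrative only: `check_answer` and its parameters are hypothetical, and it assumes the harness knows which populations the cited document covers and whether the test case was labelled as requiring escalation.

```python
def check_answer(
    user_type: str,
    cited_doc_populations: set,
    escalated: bool,
    must_escalate: bool,
) -> list[str]:
    """Flag a cited source that does not cover the asker, and a missed handoff."""
    flags = []
    if user_type not in cited_doc_populations:
        flags.append("population_mismatch")
    if must_escalate and not escalated:
        flags.append("missing_escalation")
    return flags


# The contractor question answered from the general employee handbook:
flags = check_answer(
    user_type="contractor",
    cited_doc_populations={"employee"},
    escalated=False,
    must_escalate=True,
)
```

Here `flags` contains both `population_mismatch` and `missing_escalation`, matching the two defects described in the example.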
Sources and review notes
Source context matters when the workflow touches risk.
This page is not employment, privacy, or legal advice. HR assistants can affect employee rights and should be reviewed by qualified HR, privacy, and legal owners before use.
- UK Information Commissioner's Office: Data-protection guidance for organizations using AI systems.
- GOV.UK: Official guidance on responsible procurement and assurance of AI systems in HR and recruitment.