Benchmark design
Provenance-Bounded Evidence Packets
Research on evidence graph packets that help small models make more stable, better-grounded decisions under the same token budget.
Thesis
For sub-10B-parameter models, the decisive factor is often not more context but better evidence discipline: source-bounded packets, valid provenance, and stable decision transfer.
Why it matters now
Organizations want smaller, cheaper, local, or private models to support decisions, but weak grounding can produce unsupported claims and brittle outputs.
Evidence surface
- The PB-EGP scope positions provenance-bounded evidence graph packets (PB-EGP) against four alternatives: top-k text, summaries, graph pruning, and context compression.
- Support-Transfer Validation work has advanced through adjudication, fallback packaging, release manifests, and semantic validation.
- The pilot design uses public SciFact and FEVEROUS records, scored on four metrics: unsupported claims, provenance validity, correctness, and decision stability.
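To make "provenance-bounded" concrete, here is a minimal sketch of what a packet and a provenance check could look like. It is not the project schema, which stays in the execution repo. Every name here (EvidenceNode, Packet, node_is_valid, provenance_validity) and the character-offset span convention are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EvidenceNode:
    """One evidence span plus the provenance needed to re-check it (hypothetical fields)."""
    node_id: str
    source_id: str   # identifier of the source record
    start: int       # character offset where the span begins
    end: int         # character offset one past the end of the span
    text: str        # the span as quoted inside the packet


@dataclass(frozen=True)
class Packet:
    """A claim with its evidence nodes and typed edges (hypothetical layout)."""
    claim: str
    nodes: Tuple[EvidenceNode, ...]
    edges: Tuple[Tuple[str, str, str], ...] = ()  # (from_node_id, relation, to_node_id)


def node_is_valid(node: EvidenceNode, sources: Dict[str, str]) -> bool:
    """A node is valid only if its quoted text matches the cited source span exactly."""
    source_text = sources.get(node.source_id)
    if source_text is None:
        return False  # the cited source is not available
    in_bounds = 0 <= node.start < node.end <= len(source_text)
    return in_bounds and source_text[node.start:node.end] == node.text


def provenance_validity(packet: Packet, sources: Dict[str, str]) -> float:
    """Fraction of nodes whose provenance checks out; an empty packet scores 0.0."""
    if not packet.nodes:
        return 0.0
    valid = sum(node_is_valid(n, sources) for n in packet.nodes)
    return valid / len(packet.nodes)


# Toy data, invented for this sketch (not drawn from SciFact or FEVEROUS).
sources = {"doc1": "Aspirin reduces fever in adults."}
grounded = EvidenceNode("n1", "doc1", 0, 21, "Aspirin reduces fever")
orphaned = EvidenceNode("n2", "doc2", 0, 5, "Fever")  # cites a source that is missing
packet = Packet("Aspirin lowers fever.", (grounded, orphaned))
print(provenance_validity(packet, sources))  # 0.5
```

Exact span matching is the strictest possible check. A real validator might also normalize whitespace or accept table-cell references, which FEVEROUS evidence can include.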
Validation path
- Public records
- Packet schema
- Audit rubric
- Decision stability
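One way to quantify decision stability is to run the same record several times, for example with reordered or re-serialized packets, and measure agreement with the most frequent decision. The sketch below assumes that definition. The function name and the label strings are illustrative, not the project's audit rubric.

```python
from collections import Counter
from typing import Sequence, Tuple


def decision_stability(decisions: Sequence[str]) -> Tuple[str, float]:
    """Return the modal decision and the fraction of runs that agree with it."""
    if not decisions:
        raise ValueError("need at least one decision")
    counts = Counter(decisions)
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(decisions)


# Four hypothetical runs over the same claim with perturbed packet order.
runs = ["SUPPORTS", "SUPPORTS", "REFUTES", "SUPPORTS"]
print(decision_stability(runs))  # ('SUPPORTS', 0.75)
```

A score of 1.0 means the decision never changed across runs. The score says nothing about whether the modal decision is correct, so it needs to be reported alongside correctness.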
Current outputs
Research charter, schemas, pilot fixtures, validation gates, and manuscript strategy.
Scope control
Benchmark rows, gold files, prompts, packet internals, and raw source text stay in the execution repo until a deliberate public release channel is chosen.
Questions
- What evidence structure helps a small model stay grounded under a fixed token budget?
- How should provenance validity be measured separately from answer correctness?
- When is a fallback-scope research package ready for public artifact release?
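As a starting point for the second question, the sketch below scores correctness and grounding on separate axes and cross-tabulates them, so that a correct answer citing invalid evidence is counted apart from a correct, grounded one. The record fields and the grounding rule (at least one citation, and every cited node valid) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class RecordResult:
    """Model output for one record (hypothetical fields)."""
    predicted: str
    gold: str
    cited_ids: FrozenSet[str]   # evidence nodes the model cited
    valid_ids: FrozenSet[str]   # nodes that passed a provenance check


def cell(result: RecordResult) -> Tuple[str, str]:
    """Place a result on two independent axes: correctness and grounding."""
    correct = result.predicted == result.gold
    # Assumed rule: grounded means at least one citation and no invalid citation.
    grounded = bool(result.cited_ids) and result.cited_ids <= result.valid_ids
    return ("correct" if correct else "incorrect",
            "grounded" if grounded else "ungrounded")


def cross_tab(results: Iterable[RecordResult]) -> Counter:
    """Count results in each (correctness, grounding) cell."""
    return Counter(cell(r) for r in results)


# Toy results, invented for this sketch.
results = [
    RecordResult("SUPPORTS", "SUPPORTS", frozenset({"n1"}), frozenset({"n1", "n2"})),
    RecordResult("SUPPORTS", "SUPPORTS", frozenset({"n9"}), frozenset({"n1"})),
    RecordResult("REFUTES", "SUPPORTS", frozenset({"n1"}), frozenset({"n1"})),
]
table = cross_tab(results)
print(table[("correct", "ungrounded")])  # 1
```

The ("correct", "ungrounded") cell is the one a correctness-only metric hides: the answer matches gold, but the cited evidence does not hold up.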