Benchmark design

Provenance-Bounded Evidence Packets

Research on evidence graph packets that help small models make more stable, better-grounded decisions under the same token budget.

Thesis

For models under 10B parameters, the decisive factor is often not more context but better evidence discipline: source-bounded packets, valid provenance, and stable decision transfer.

Why it matters now

Organizations want smaller, cheaper, local, or private models to support decisions, but weak grounding can create unsupported claims and brittle outputs.

Evidence surface

  • The PB-EGP scope defines provenance-bounded evidence graph packets and compares them against top-k text, summaries, graph pruning, and context compression.
  • Support-Transfer Validation work advanced through adjudication, fallback packaging, release manifests, and semantic validation.
  • The pilot design uses public SciFact and FEVEROUS records, scored on unsupported-claim, provenance-validity, correctness, and stability metrics.
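
To make "provenance-bounded" concrete, here is a minimal Python sketch of what a packet and its validity check might look like. It is not the project's schema, which stays in the execution repo. Every name here (EvidenceNode, EvidencePacket, provenance_errors, the character-offset span fields) is a hypothetical choice made for illustration. The assumption is that a packet counts as source-bounded when every node quotes an exact span of a known source record and every edge stays inside the packet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceNode:
    """One piece of evidence, tied to an exact span of a source record (hypothetical layout)."""
    node_id: str
    source_id: str  # identifier of the public record the text came from
    start: int      # character offset where the quoted span begins
    end: int        # character offset where the quoted span ends (exclusive)
    text: str       # the quoted text itself


@dataclass
class EvidencePacket:
    """A claim plus the evidence graph offered to the model (hypothetical layout)."""
    claim: str
    nodes: list[EvidenceNode] = field(default_factory=list)
    # Each edge is (from_node_id, relation, to_node_id).
    edges: list[tuple[str, str, str]] = field(default_factory=list)


def provenance_errors(packet, sources):
    """Return a list of provenance problems; an empty list means the packet is valid.

    `sources` maps source_id -> full source text.
    """
    errors = []
    node_ids = {n.node_id for n in packet.nodes}
    for n in packet.nodes:
        src = sources.get(n.source_id)
        if src is None:
            errors.append(f"{n.node_id}: unknown source {n.source_id}")
        elif not (0 <= n.start < n.end <= len(src)) or src[n.start:n.end] != n.text:
            errors.append(f"{n.node_id}: span does not match source text")
    for a, rel, b in packet.edges:
        if a not in node_ids or b not in node_ids:
            errors.append(f"edge {a} -{rel}-> {b}: endpoint outside packet")
    return errors
```

A packet whose nodes all quote their sources verbatim and whose edges connect only its own nodes yields an empty error list. Returning a list rather than a boolean keeps each failure available for an audit pass.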

Validation path

  1. Public records
  2. Packet schema
  3. Audit rubric
  4. Decision stability
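
The last step, decision stability, can be quantified in several ways. The sketch below shows one simple option and is an assumption, not the project's metric: for each record, collect the model's decision over repeated runs (for example with reordered packets or different seeds) and report the share of runs that agree with the most common decision. The function names and label strings are hypothetical.

```python
from collections import Counter


def decision_stability(decisions):
    """Fraction of runs that agree with the most common decision for one record."""
    if not decisions:
        raise ValueError("need at least one decision")
    _, top_count = Counter(decisions).most_common(1)[0]
    return top_count / len(decisions)


def mean_stability(runs_by_record):
    """Average per-record stability over a mapping of record_id -> list of decisions."""
    if not runs_by_record:
        raise ValueError("need at least one record")
    scores = [decision_stability(runs) for runs in runs_by_record.values()]
    return sum(scores) / len(scores)


# Illustrative labels only; four runs on one record, three of which agree.
print(decision_stability(["SUPPORTS", "SUPPORTS", "REFUTES", "SUPPORTS"]))  # 0.75
```

A score of 1.0 means the model gave the same decision on every run. Modal agreement ignores whether the stable decision is correct, so it is best read alongside a correctness metric rather than in place of one.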

Current outputs

Research charter, schemas, pilot fixtures, validation gates, and manuscript strategy.

Scope control

Benchmark rows, gold files, prompts, packet internals, and raw source text stay in the execution repo until a deliberate public release channel is chosen.

Questions

  • What evidence structure helps a small model stay grounded under a fixed token budget?
  • How should provenance validity be measured separately from answer correctness?
  • When is a fallback-scope research package ready for public artifact release?
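
As one possible starting point for the second question, the Python sketch below keeps provenance validity and answer correctness as two independent flags and cross-tabulates them, so a correct answer with bad citations stays distinguishable from a wrong answer with good ones. The names (ScoredAnswer, provenance_valid, cross_tab) and the rule that a valid answer must cite at least one evidence id, all from inside the packet, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredAnswer:
    """One model answer with the facts needed to score it (hypothetical layout)."""
    correct: bool          # predicted label matches the gold label
    cited_ids: frozenset   # evidence ids the model cited
    packet_ids: frozenset  # evidence ids actually present in the packet


def provenance_valid(answer):
    """Valid only if something is cited and every citation is inside the packet."""
    return bool(answer.cited_ids) and answer.cited_ids <= answer.packet_ids


def cross_tab(answers):
    """Count answers in each (correct, provenance_valid) cell."""
    table = {(c, p): 0 for c in (True, False) for p in (True, False)}
    for a in answers:
        table[(a.correct, provenance_valid(a))] += 1
    return table
```

The cell keyed (True, False) counts answers that are correct but not traceable to the packet, which a single accuracy number would hide.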