ORCAI Public Notes

Short research notes we want to explore in public.

Context-only reasoning in smaller models

Added 2026-04-27

Key question

Can post-training (SFT/RL) teach smaller models a policy to use only the provided context, avoid unsupported conclusions, and answer “insufficient evidence” when the context does not support an answer?

Hypothesis

We may be able to distill “reasoning code” from larger models into smaller ones via supervision, while explicitly constraining factual grounding to context. Open question: how separable are reasoning skills from language and memory skills?

Next lead

We already have signals that post-training can induce “thinking” behavior in small language models (SLMs). The next step is extending that toward faithful, context-grounded reasoning with explicit objectives.

Build a harness for in-context reasoning with efficient SLMs

Added 2026-04-27

We want to start by building a practical harness that pushes small language models to reason from context, use tokens efficiently, and abstain when the context does not support a conclusion. We can prototype this openly at orcai.eu/chat.
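A harness like this needs a scoring rule that rewards both grounded answers and correct abstention. Below is a minimal sketch under assumptions: `model` is a stand-in callable (to be swapped for a real SLM endpoint), items with no gold answer are scored as correct only when the model abstains, and the toy model's lookup heuristic is purely illustrative.

```python
# Minimal harness sketch for context-grounded QA with abstention.
# `toy_model` and the item schema are assumptions for illustration.

ABSTAIN = "insufficient evidence"

def evaluate(model, items: list[dict]) -> float:
    """Accuracy = exact match on supported items + correct abstention on unsupported ones."""
    correct = 0
    for item in items:
        pred = model(item["context"], item["question"]).strip().lower()
        gold = item.get("answer")
        if gold is None:
            correct += pred == ABSTAIN
        else:
            correct += pred == gold.lower()
    return correct / len(items)

def toy_model(context: str, question: str) -> str:
    # Stub policy: answer with the first context word if the question's
    # final noun appears in the context; otherwise abstain.
    key = question.rstrip("?").split()[-1].lower()
    if key in context.lower():
        return context.split()[0]
    return ABSTAIN

items = [
    {"context": "Oslo is the capital of Norway.",
     "question": "What is the capital of Norway?", "answer": "Oslo"},
    {"context": "Oslo is the capital of Norway.",
     "question": "What is the capital of Sweden?", "answer": None},
]
```

Keeping abstention in the scoring loop from day one means token-efficiency and calibration can be measured against the same item set, rather than bolted on later.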

Reading links

Updated 2026-04-27