arxiv:2605.28969

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

Published on May 27

Submitted by Aarik Gulaya on Jun 1

Authors:

Aarik Gulaya

Abstract

Representational accuracy measures how faithfully an AI system captures a person's interpretation. Serving a Behavioral Specification as context improves predictive performance at lower context cost, with the largest gains on interpretation-required tasks and possible interference on recall-required tasks.

AI-generated summary

If an AI agent makes decisions on a person's behalf, those decisions must align with its user. We introduce representational accuracy to measure how faithfully a system captures a person's interpretation. An interpretive layer is operationalized as a Behavioral Specification. Our reference implementation aggressively compresses a person's data into interpretive patterns, served as context to a language model. We evaluate the Specification on a prototype benchmark of held-out behavioral predictions scored by a calibrated 5-judge LLM panel. We test it independently and in composition with a range of context conditions: full raw corpus, full extracted facts, and four commercial memory systems (Mem0, Letta, Supermemory, Zep). Across 14 public-domain autobiographical corpora, the Specification lifts representational accuracy in aggregate and nearly eliminates model hedging. It recovers most of what the raw corpus delivers, at ~25x less context cost. The Specification lifts subjects toward a common predictive level regardless of pretraining baseline; the lift in absolute points is therefore largest where the baseline is lowest, suggesting the population of relevance is anyone not adequately represented in pretraining. Lift is greatest on interpretation-required questions, where providing an interpretive layer enables model behavior that extracted facts or raw corpus do not. Conversely, on recall-required questions, this layer can interfere rather than help. We conclude that representational accuracy is distinct from recall and that human-AI alignment is dependent on how accurately the user is represented. Representational accuracy makes that alignment testable.
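The abstract reports lift in absolute points over a baseline, using predictions scored by a judge panel. As a rough illustration only, that arithmetic might be sketched as follows. The abstract does not state how the five judges' scores are combined or what scale they use, so the plain mean, the 1-5 scale, and every function name and number here are assumptions rather than the paper's method or data.

```python
from statistics import mean


def panel_score(judge_scores):
    """Combine one prediction's judge scores into a single value.

    Assumption: a plain mean. The paper's actual aggregation rule
    is not given in the abstract.
    """
    if not judge_scores:
        raise ValueError("need at least one judge score")
    return mean(judge_scores)


def representational_accuracy(predictions):
    """Mean panel score over all held-out predictions for one condition."""
    return mean(panel_score(scores) for scores in predictions)


def absolute_lift(baseline, with_spec):
    """Difference in accuracy between the two context conditions."""
    return representational_accuracy(with_spec) - representational_accuracy(baseline)


# Made-up scores on an assumed 1-5 scale: two held-out predictions,
# five judges each. These are NOT results from the paper.
baseline = [[2, 2, 3, 2, 1], [3, 3, 3, 3, 3]]
with_spec = [[4, 4, 4, 4, 4], [4, 5, 4, 4, 3]]

print(absolute_lift(baseline, with_spec))  # 1.5
```

Because lift is an absolute difference, a subject with a low baseline has more room to gain, which matches the abstract's observation that lift is largest where the baseline is lowest.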

Community

agulaya24

Paper author Paper submitter about 3 hours ago

We have been optimizing memory systems for recall and treating an accurate representation of the user as a separate alignment problem. What a system recalls is dictated by the reasoning frame it applies. Few approaches exist for measuring how accurately those reasoning frames represent the user on whose behalf an AI is acting. This paper proposes and tests a prototype benchmark to define and measure this representational accuracy dimension.

Get this paper in your agent:

hf papers read 2605.28969

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
