Porting agent memory means porting behavior

A memory port can pass the easy demo while changing the thing people actually rely on: what gets remembered, what gets merged, and what comes back later.

mem0-rs, my Rust port of mem0, starts from that constraint. The operational goal is straightforward: keep mem0’s behavior while removing the costs that make Python awkward in dense agent systems, namely interpreter overhead, high resident memory, a heavy dependency tree, and a deployment surface that includes a Python environment.

In the equal-workload benchmark from the repo, with the network removed on both sides, mem0-rs is roughly 3 to 3.5 times faster per operation than Python mem0 v2.0.4. On the 2000-add / 500-search run, add latency was 18.7 us/op in Rust versus 64.8 us/op in Python, search was 6.1 ms/op versus 18.4 ms/op, and peak RSS was 15.4 MB versus 100.9 MB. On the smaller run, peak RSS was 9.5 MB versus 96.9 MB.
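
The ratios behind those figures are easy to check by hand. A minimal sketch that recomputes them from the numbers quoted above; the function names are made up for illustration and are not part of the benchmark in the repo:

```rust
/// Ratio of the Python measurement to the Rust measurement
/// (higher means the Rust side is cheaper).
pub fn ratio(python: f64, rust: f64) -> f64 {
    python / rust
}

/// Figures quoted in the text, as (label, python, rust).
pub fn quoted_runs() -> Vec<(&'static str, f64, f64)> {
    vec![
        ("add latency (us/op)", 64.8, 18.7),
        ("search latency (ms/op)", 18.4, 6.1),
        ("peak RSS (MB)", 100.9, 15.4),
        ("peak RSS, smaller run (MB)", 96.9, 9.5),
    ]
}

/// One formatted line per measurement, e.g. "label: 1.23x".
pub fn report() -> Vec<String> {
    quoted_runs()
        .iter()
        .map(|(label, py, rs)| format!("{label}: {:.2}x", ratio(*py, *rs)))
        .collect()
}
```

That works out to about 3.47x for add, 3.02x for search, and 6.55x for peak RSS on the larger run, so the memory gap is wider than the latency gap.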

Those numbers matter because a memory service tends to sit on the hot path around agent turns. The LLM and embedding calls are still the shared unavoidable cost, but everything around them gets cheaper: message parsing, scope/filter construction, md5 hashing, BM25 lemmatization, SQLite history writes, payload construction, and local ranking.

The engineering constraint is less forgiving than a normal rewrite. Matching endpoint names does not make an agent memory port compatible. It is compatible only if the behavior stays the same.

The compatibility surface is not the API

For long-term memory, the public methods are the smallest part of the contract:

add
search
get
update
delete
history
reset

The meaningful surface sits underneath those calls:

extraction prompts
short-term message context
existing-memory retrieval before extraction
deduplication hashes
semantic search
BM25 scoring
entity boosts
metadata filters
history rows
error behavior

If any of those drift, two implementations can both expose add and search while behaving differently in production. One remembers a preference. The other drops it. One deduplicates a repeated fact. The other stores it twice. One returns the scoped memory. The other leaks a neighbor because a filter was interpreted differently.

So mem0-rs treats the behavioral pipeline as the port.

What gets ported

On an inferred add, mem0-rs follows the same broad sequence as Python mem0:

load recent scoped messages
retrieve related existing memories
build the additive extraction prompt
call the configured LLM
parse returned facts
deduplicate by md5
embed and persist new memories
write history rows
update the recent-message buffer
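
The sequence above can be sketched in a few dozen lines. This is an illustration under loud assumptions, not the mem0-rs code: the types and function names are hypothetical, the LLM and embedder are plain closures, facts are parsed one per line, "related memories" is a placeholder for a real vector lookup, and std's DefaultHasher stands in for md5 because the standard library has no md5.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// One persisted memory. Field names are illustrative, not the mem0-rs schema.
pub struct StoredMemory {
    pub hash: String,
    pub text: String,
    pub vector: Vec<f32>,
}

#[derive(Default)]
pub struct Store {
    pub memories: Vec<StoredMemory>,
    pub hashes: HashSet<String>,
    pub history: Vec<String>,
    pub recent: Vec<String>,
}

/// Stand-in for the md5 digest used for deduplication (assumption: not md5).
pub fn content_hash(text: &str) -> String {
    let mut h = DefaultHasher::new();
    text.trim().hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Runs one inferred add and returns how many new memories were stored.
pub fn add_inferred(
    store: &mut Store,
    llm: &mut dyn FnMut(&str) -> String,
    embed: &dyn Fn(&str) -> Vec<f32>,
    message: &str,
) -> usize {
    // 1. Load recent scoped messages (here: the last 10 in the buffer).
    let start = store.recent.len().saturating_sub(10);
    let recent = store.recent[start..].join("\n");
    // 2. Retrieve related existing memories (placeholder: the newest 5 texts).
    let related: Vec<&str> = store.memories.iter().rev().take(5).map(|m| m.text.as_str()).collect();
    // 3. Build the extraction prompt (hypothetical layout, not the upstream prompt).
    let prompt = format!("RECENT:\n{recent}\nEXISTING:\n{}\nNEW:\n{message}", related.join("\n"));
    // 4. Call the configured LLM.
    let reply = llm(&prompt);
    // 5. Parse returned facts: one per non-empty line (assumption).
    let mut added = 0;
    for fact in reply.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // 6. Deduplicate by content hash; skip facts already stored.
        let hash = content_hash(fact);
        if !store.hashes.insert(hash.clone()) {
            continue;
        }
        // 7. Embed and persist the new memory.
        store.memories.push(StoredMemory { hash: hash.clone(), text: fact.to_string(), vector: embed(fact) });
        // 8. Write a history row.
        store.history.push(format!("ADD {hash} {fact}"));
        added += 1;
    }
    // 9. Update the recent-message buffer.
    store.recent.push(message.to_string());
    added
}
```

Even in a toy version, the order matters: deduplication runs before embedding, so a repeated fact costs a hash lookup rather than an embedding call and a duplicate row.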

On search, it combines semantic retrieval, BM25 keyword scoring over a lemmatized text field, optional entity boosts, metadata filters, thresholding, and result truncation.
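
To make the moving parts concrete, here is a hedged sketch of that kind of hybrid ranking. Everything specific in it is an assumption for illustration: the struct names, the 0.7 / 0.3 blend, the flat 0.1 entity boost, the way BM25 is squashed into the 0 to 1 range, and the k1 and b values. The real weights and normalization are whatever mem0 uses, which this does not claim to reproduce.

```rust
use std::collections::HashMap;

pub struct Doc {
    pub id: u32,
    pub vector: Vec<f32>,
    pub lemmas: Vec<String>, // pre-lemmatized text field
    pub entities: Vec<String>,
    pub metadata: HashMap<String, String>,
}

pub struct Query<'a> {
    pub vector: &'a [f32],
    pub lemmas: &'a [String],
    pub entities: &'a [String],
    pub filters: &'a HashMap<String, String>,
    pub threshold: f32,
    pub limit: usize,
}

pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Okapi BM25 of one document against the query terms (assumed k1 = 1.2, b = 0.75).
pub fn bm25(query: &[String], doc: &Doc, corpus: &[Doc]) -> f32 {
    let (k1, b) = (1.2_f32, 0.75_f32);
    let n = corpus.len() as f32;
    let avg_len = corpus.iter().map(|d| d.lemmas.len()).sum::<usize>() as f32 / n.max(1.0);
    let len = doc.lemmas.len() as f32;
    query
        .iter()
        .map(|term| {
            let tf = doc.lemmas.iter().filter(|l| *l == term).count() as f32;
            let df = corpus.iter().filter(|d| d.lemmas.contains(term)).count() as f32;
            let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
            idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len.max(1.0)))
        })
        .sum()
}

/// Filter, score, threshold, sort descending, truncate.
pub fn search(q: &Query<'_>, corpus: &[Doc]) -> Vec<(u32, f32)> {
    let mut hits: Vec<(u32, f32)> = corpus
        .iter()
        // Metadata filters: every requested key must match exactly.
        .filter(|d| q.filters.iter().all(|(k, v)| d.metadata.get(k) == Some(v)))
        .map(|d| {
            let semantic = cosine(q.vector, &d.vector);
            let raw = bm25(q.lemmas, d, corpus);
            let keyword = raw / (raw + 1.0); // assumed squash into [0, 1)
            let boost = if d.entities.iter().any(|e| q.entities.contains(e)) { 0.1 } else { 0.0 };
            (d.id, 0.7 * semantic + 0.3 * keyword + boost)
        })
        .filter(|(_, score)| *score >= q.threshold)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(q.limit);
    hits
}
```

Note where the filter sits: before scoring. A port that filters after truncation can return fewer scoped results than the original for the same query, which is exactly the kind of drift a parity test has to catch.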

Those choices sound mundane until you change one. A different prompt changes what gets extracted. Different BM25 normalization changes result order. A different dedup rule changes memory growth and future context. A different history order changes auditability.

For an agent, wrong memory is worse than no memory in a surprisingly direct way. It becomes false context in the next turn.

Prompts are source code here

mem0-rs keeps the upstream prompt constants byte-identical. A small verifier compares the Rust constants against their Python mem0 counterparts:

ADDITIVE_EXTRACTION_PROMPT
DEFAULT_UPDATE_MEMORY_PROMPT
AGENT_CONTEXT_SUFFIX
PROCEDURAL_MEMORY_SYSTEM_PROMPT
MEMORY_ANSWER_PROMPT

That may look like a narrow detail, but in this system the extraction prompt is one of the main determinants of memory quality. If the Rust port rewrites it “a little better,” the port is no longer testing the same behavior.
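
A byte-level check of this kind is small. The sketch below is an assumption about the shape of such a verifier, not the one in the repo: it takes (name, Rust text, Python text) triples, however the Python side was exported, and reports the first byte offset where each pair diverges.

```rust
/// Returns None when both byte strings are identical,
/// otherwise the offset of the first differing byte.
pub fn first_divergence(rust: &[u8], python: &[u8]) -> Option<usize> {
    match rust.iter().zip(python).position(|(a, b)| a != b) {
        Some(i) => Some(i),
        // Same prefix but different lengths: they diverge where the shorter one ends.
        None if rust.len() != python.len() => Some(rust.len().min(python.len())),
        None => None,
    }
}

/// Checks (name, rust_text, python_text) triples and returns one line per mismatch.
pub fn verify_prompts(pairs: &[(&str, &str, &str)]) -> Vec<String> {
    pairs
        .iter()
        .filter_map(|(name, rust, python)| {
            first_divergence(rust.as_bytes(), python.as_bytes())
                .map(|at| format!("{name}: differs at byte {at}"))
        })
        .collect()
}
```

Comparing bytes rather than trimmed or normalized strings is deliberate: a trailing newline or a swapped quote character is still a different prompt as far as the model is concerned.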

That pattern carries into agent infrastructure more generally. Prompts that decide durable state are not copy. They are part of the program.

Deterministic parity beats vibes

The parity harness removes the model as a source of randomness. Both implementations are driven through the same scenario script using deterministic embeddings and scripted LLM output. That lets the comparison catch porting bugs rather than provider variation.
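
As a rough sketch of what "deterministic embeddings and scripted LLM output" can look like, assuming hypothetical names throughout: the embedding below hashes lowercase tokens into buckets with FNV-1a, which is simple enough to reproduce bit for bit on the Python side, and the fake LLM replays a fixed list of replies while recording the prompts it was given. The actual harness may do this differently.

```rust
use std::collections::VecDeque;

/// 64-bit FNV-1a, chosen here because it is trivial to reimplement in Python.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Deterministic bag-of-tokens embedding: each lowercase token increments
/// one bucket, then the vector is L2-normalized.
pub fn fake_embed(text: &str, dims: usize) -> Vec<f32> {
    assert!(dims > 0, "dims must be positive");
    let mut v = vec![0.0_f32; dims];
    let lower = text.to_lowercase();
    for token in lower.split_whitespace() {
        v[(fnv1a(token.as_bytes()) % dims as u64) as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}

/// Replays scripted replies in order and records every prompt it receives.
pub struct ScriptedLlm {
    replies: VecDeque<String>,
    pub prompts_seen: Vec<String>,
}

impl ScriptedLlm {
    pub fn new(replies: &[&str]) -> Self {
        Self {
            replies: replies.iter().map(|r| r.to_string()).collect(),
            prompts_seen: Vec::new(),
        }
    }

    pub fn complete(&mut self, prompt: &str) -> String {
        self.prompts_seen.push(prompt.to_string());
        self.replies
            .pop_front()
            .expect("scenario script ran out of LLM replies")
    }
}
```

Recording the prompts is as useful as scripting the replies: if the two implementations send different prompt text for the same scenario step, that shows up as a diff before any ranking is compared.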

The checks cover the pieces that matter for compatibility:

storage results
deduplication
ranked search output
metadata filters
update and history events
delete behavior
prompt fidelity
scoring math
text parsing fallbacks

The speed numbers only matter if the port preserves what memory means. A faster memory system that remembers different facts is not a faster mem0; it is a different product.

The one deliberate divergence

There is one important difference in mem0-rs: the entity layer is dependency-free. Python mem0 can use spaCy for richer entity extraction when spaCy is installed. mem0-rs implements an always-on proper-name and quoted-text subset.

The trade is intentional. The entity signal is best-effort and secondary; semantic and BM25 ranking still carry the main retrieval behavior. The Rust port avoids pulling a large NLP runtime into the default path, but the limitation is documented rather than hidden.
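
For a sense of how little machinery a proper-name and quoted-text subset needs, here is an illustrative heuristic. It is not the mem0-rs rule set; the function names and the exact rules (skip sentence-initial words, group consecutive capitalized words, take text between double quotes) are assumptions made for the example.

```rust
/// Text between balanced pairs of double quotes.
pub fn quoted_spans(text: &str) -> Vec<String> {
    let parts: Vec<&str> = text.split('"').collect();
    parts
        .iter()
        .enumerate()
        // Odd-indexed parts sit inside quotes; require a closing quote after them.
        .filter(|(i, p)| i % 2 == 1 && i + 1 < parts.len() && !p.trim().is_empty())
        .map(|(_, p)| p.trim().to_string())
        .collect()
}

/// Runs of capitalized words that do not start a sentence.
pub fn proper_names(text: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    let mut sentence_start = true;
    for raw in text.split_whitespace() {
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let capitalized = word.chars().next().map_or(false, char::is_uppercase);
        if capitalized && !sentence_start {
            run.push(word);
        } else if !run.is_empty() {
            names.push(run.join(" "));
            run.clear();
        }
        // Sentence-ending punctuation closes the current run.
        let ends_sentence = raw.ends_with(|c: char| matches!(c, '.' | '!' | '?'));
        if ends_sentence && !run.is_empty() {
            names.push(run.join(" "));
            run.clear();
        }
        sentence_start = ends_sentence;
    }
    if !run.is_empty() {
        names.push(run.join(" "));
    }
    names
}

/// Quoted spans first, then proper names not already present.
pub fn extract_entities(text: &str) -> Vec<String> {
    let mut out = quoted_spans(text);
    for name in proper_names(text) {
        if !out.contains(&name) {
            out.push(name);
        }
    }
    out
}
```

A heuristic like this misses names at the start of a sentence and anything a statistical tagger would infer from context, which is the price of staying dependency-free.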

That kind of divergence is acceptable in a port when it is narrow, named, and attached to the part of the ranking pipeline it can affect.

Memory is a write path

For agent systems, memory is not a cache. It is a write path into future model context.

That changes the standard for a port. Matching request shapes is not enough. Matching behavior means preserving the prompts, scoring, storage semantics, filters, and history that determine what the agent will believe later.

mem0-rs is built around that standard. The Rust implementation is interesting only if the memory remains recognizably the same memory.

Chris Chabot · June 2026
