Persistent Claude assistants with NanoClaw: how I set them up for individuals and small teams
A self-hosted Claude assistant on a Raspberry Pi, with compounding memory via a knowledge graph and synthesised wiki pages.
I keep running into the same problem when clients try to roll out a Claude-based assistant for their team: it forgets everything between sessions. Chat history is not memory. The default solution is RAG over a document store, but that retrieves text chunks, not facts. After the third client conversation that ended with “okay, but how do we make it actually remember things,” I started running a different setup on my own infrastructure.
This is what I run, why I think it’s the right shape for individuals and small teams, and how I’m now setting it up for clients.
The system is called NanoClaw. It’s open source, written by qwibitai, and the public design notes are in this gist. I am not the author. I run it, extend it for client setups, and write about it because almost nothing else in the self-hosted assistant space gets the memory model right.
The problem with stateless assistants
Three things are true about a default LLM assistant:
- It does not remember the last conversation unless you paste the transcript back in.
- Long context windows do not solve this. They tend to paper over it for a while, but the agent forgets the middle of the document anyway.
- Standard RAG retrieves chunks of raw text. If your document says “I prefer dbt over Airflow for transformations,” the retrieved chunk is the surrounding sentence, not the discrete fact.
For a personal assistant or a team workspace, that pattern fails the same way every time. The agent answers well in a single session, then a week later forgets that you decided not to use Snowflake stages. You burn the same conversation twice. After a month, you are not getting compounding value out of the assistant. Instead, you’re paying for a stateless chatbot with extra steps.
The fix is not “more context window.” The fix is structured memory.
The architecture
NanoClaw separates raw input, structured facts, and human-readable summaries into three layers:
Raw sources (transcripts, articles, web clips) → mnemon graph (structured facts, graph nodes, semantic retrieval) → wiki pages (narrative syntheses, human-readable, cross-referenced)
Layer 1: Raw sources. Speech transcripts in markdown, articles saved from URL ingest, mobile web clips through Obsidian Web Clipper. Append-only, never modified after storage.
Layer 2: mnemon, the knowledge graph. A SQLite-backed graph database where each entry is a self-contained fact: content, category, importance score, tags, timestamp, and graph edges to related entries. Queried semantically using local vector embeddings (Ollama with nomic-embed-text running on the Pi itself). Two stores: a global one shared across all groups, and a local one per group that only that group’s agent can write to.
Layer 3: Wiki pages. Synthesised markdown files compiled from mnemon facts. Not raw extracts, but full narrative pages organised into entities/, concepts/, and timelines/ subdirectories, with cross-references. The pattern follows Andrej Karpathy’s LLM Wiki idea: extract structured knowledge from raw sources rather than indexing them whole.
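To make the Layer 2 description concrete, here is a minimal sketch of what a SQLite-backed fact store with graph edges could look like. It is written in Python for brevity (NanoClaw itself is Node.js and TypeScript), and every table, column, and function name is my own illustration, not mnemon’s actual schema or CLI. Embeddings are left out to keep it short.

```python
import json
import sqlite3
import time

# Hypothetical schema: table and column names are illustrative only,
# not mnemon's real layout.
SCHEMA = """
CREATE TABLE entries (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,   -- one self-contained fact
    category   TEXT NOT NULL,   -- e.g. 'decision', 'preference'
    importance REAL NOT NULL,   -- 0.0 (trivia) to 1.0 (critical)
    tags       TEXT NOT NULL,   -- JSON-encoded list of strings
    created_at REAL NOT NULL    -- Unix timestamp
);
CREATE TABLE edges (
    src      INTEGER NOT NULL REFERENCES entries(id),
    dst      INTEGER NOT NULL REFERENCES entries(id),
    relation TEXT NOT NULL,
    PRIMARY KEY (src, dst, relation)
);
"""


def open_store(path=":memory:"):
    """Open a store and create the tables (in-memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_fact(conn, content, category, importance, tags):
    """Insert one fact and return its row id."""
    cur = conn.execute(
        "INSERT INTO entries (content, category, importance, tags, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (content, category, importance, json.dumps(tags), time.time()),
    )
    return cur.lastrowid


def link(conn, src, dst, relation):
    """Add a directed, labelled edge between two facts."""
    conn.execute(
        "INSERT INTO edges (src, dst, relation) VALUES (?, ?, ?)",
        (src, dst, relation),
    )


def neighbours(conn, entry_id):
    """Return (content, relation) for every fact linked from entry_id."""
    rows = conn.execute(
        "SELECT e.content, g.relation FROM edges g "
        "JOIN entries e ON e.id = g.dst WHERE g.src = ? ORDER BY e.id",
        (entry_id,),
    )
    return rows.fetchall()


conn = open_store()
a = add_fact(conn, "Prefers dbt over Airflow for transformations",
             "preference", 0.8, ["dbt", "airflow"])
b = add_fact(conn, "Decided not to use Snowflake stages",
             "decision", 0.9, ["snowflake"])
link(conn, a, b, "related_to")
print(neighbours(conn, a))
```

The point of the shape is that each row is a discrete fact with its own metadata, so retrieval returns the fact itself, not a chunk of surrounding text.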
Every agent invocation triggers a semantic recall against the graph using the user’s message as the query. Relevant facts surface automatically as a system reminder. The agent never has to decide to look something up, as the recall is free and reliable.
What’s running
The full stack on a Raspberry Pi 5 (aarch64):
| Component | Role |
|---|---|
| NanoClaw orchestrator (Node.js + TypeScript) | Message loop, container management, channel routing |
| Claude Agent SDK | Agent logic inside isolated Docker containers per group |
| Baileys | WhatsApp Web protocol, no business API needed |
| mnemon | Custom CLI knowledge graph tool |
| Ollama + nomic-embed-text | Local vector embeddings for semantic recall |
| whisper.cpp | Local voice transcription for voice notes |
| OneCLI | Credential proxy, containers never see raw API keys |
| SQLite | Message store, group registry, task scheduler |
| systemd | Process management |
The whole thing fits on a Raspberry Pi 5 or any similar device with 8GB RAM. No cloud-hosted services required beyond the Anthropic API itself. Voice notes never leave the device. Document content never leaves the device. The graph and the wiki are on local disk.
Why this works for you
The immediate value is the loop closing on inputs that already exist:
- A WhatsApp voice note becomes a transcript, becomes a set of mnemon entries, becomes part of the wiki.
- An article clipped from a phone via Obsidian Web Clipper triggers an inotifywait watcher, which kicks off the ingest pipeline, which extracts facts and updates the relevant wiki pages.
- A scheduled task (morning briefing) runs through a bash pre-check first and only wakes the agent if there is actually something to brief on. This keeps API costs low.
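The pre-check gate in the last bullet is simple enough to sketch. The real pre-checks are bash scripts; this is a Python rendering of the same idea, and the function names, the `.md` filter, and the inbox layout are all assumptions of mine, not NanoClaw’s actual interface.

```python
import tempfile
from pathlib import Path


def should_wake_agent(inbox: Path, min_items: int = 1) -> bool:
    """Cheap check: are there at least `min_items` markdown files waiting?"""
    if not inbox.is_dir():
        return False
    pending = [p for p in inbox.iterdir() if p.is_file() and p.suffix == ".md"]
    return len(pending) >= min_items


def run_scheduled_task(inbox: Path, wake_agent) -> bool:
    """Gate the expensive agent call behind the cheap check."""
    if not should_wake_agent(inbox):
        return False  # nothing to brief on, so no API call is made
    wake_agent()
    return True


# Demo: a temporary directory stands in for the (hypothetical) inbox.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    calls = []
    skipped = run_scheduled_task(inbox, lambda: calls.append("briefing"))
    (inbox / "clip.md").write_text("# Clipped article\n")
    ran = run_scheduled_task(inbox, lambda: calls.append("briefing"))
    print(skipped, ran, calls)
```

The check is deliberately dumb. Its only job is to decide whether the agent runs at all.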
The result over time is an assistant that knows what you have read, what you have said in voice notes, and what you have decided. You can ask it “what did I say about the Snowflake migration last week?” and it answers from the graph, not from chat history.
The wiki layer is human-editable on purpose, exactly because LLM-extracted facts need supervision.
Why this works for teams
The multi-group isolation is the thing that makes this team-ready:
- Each registered group (a WhatsApp chat, a Slack channel, a department) gets its own Docker container, filesystem, local mnemon store, and Claude session.
- Containers cannot read each other’s memory or messages. Source identity is verified by directory path, not by message content.
- A runaway agent in one group cannot affect others. Container lifetime is tied to conversation activity; they shut down after idle timeout.
- The credential proxy means the API key never sits inside a container, so a compromised group cannot exfiltrate it.
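One way to read “source identity is verified by directory path, not by message content” is sketched below. This is my own illustration in Python, not NanoClaw’s code: the root directory, the function names, and the message shape are hypothetical. The idea is that the orchestrator derives the group from where a file physically lives and ignores whatever the payload claims about itself.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location of per-group directories on the host.
GROUPS_ROOT = Path("/srv/nanoclaw/groups")


def group_for_path(candidate: str, root: Path = GROUPS_ROOT) -> Optional[str]:
    """Return the group that owns `candidate`, or None if it escapes the root."""
    root = root.resolve()
    resolved = (root / candidate).resolve()  # normalises any ".." segments
    try:
        relative = resolved.relative_to(root)
    except ValueError:
        return None  # path points outside the groups root
    if not relative.parts:
        return None  # the root itself belongs to no group
    return relative.parts[0]


def source_identity(message: dict, found_at: str) -> Optional[str]:
    """Identity comes from the path; any 'group' field in the message is ignored."""
    return group_for_path(found_at)


print(group_for_path("team-a/inbox/msg.json"))
print(group_for_path("../../etc/passwd"))
print(source_identity({"group": "leadership"}, "team-a/outbox/1.json"))
```

A container that writes a message claiming to be from another group is still attributed to its own directory, so the claim buys it nothing.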
For a team of five to twenty people, this is enough to set up:
- A daily team briefing channel that summarises what was decided yesterday.
- A research channel where articles get clipped and turned into a shared wiki of “what we read this quarter.”
- A specific project channel with its own memory of decisions, blockers, and open questions, isolated from everything else.
- A leadership channel with its own private memory that the rest of the team’s agent cannot read.
I have set this up for a small team in Munich and the most useful side effect was unexpected: the wiki became a canonical source for “what did we actually decide” that survived someone going on holiday. The agent stopped being the interesting part. The wiki became the interesting part.
How I set this up for clients
When I set NanoClaw up for a client, the work is roughly two phases:
Phase 1 (week 1): install and tune. Provision the Pi or a small VPS, install NanoClaw, configure channels, set up the credential proxy, integrate with the team’s existing messaging. Start with the global mnemon empty and the wiki empty. Add one or two seed groups. Tune the bash pre-check scripts to keep API costs under control.
Phase 2 (week 2): fit and handover. Configure scheduled tasks (briefings, ingestion watchers). Run the team through how to use it: what the wiki is for, how to correct facts, how to add new groups. Document the maintenance ritual of about an hour a week of wiki review by someone trusted. Hand it over.
This fits inside the production-readiness audit format I already offer (€4,500 fixed for one week), or as a co-build engagement at €900/day if the client wants me to run it longer and customise channels and tasks for their workflow. I am the integrator, not the author of NanoClaw. That credit goes to qwibitai. What I sell is the setup, the tuning, the team training, and the knowledge of how to keep it running cheaply.
A few opinionated bits
Why a Raspberry Pi. Because privacy-sensitive data should not leave the network. The whole point of running embeddings and transcription locally is that the inputs stay on the device. The Pi 5 has enough headroom to run nomic-embed-text and whisper.cpp base in the same process budget as the orchestrator. For a small team, this is plenty. For a larger team, the same code runs on a small VPS or a Mac Mini.
Why Docker per group, not one process. Isolation. Five years of teaching across cultures will eventually convince you that one badly-behaved actor is normal, not an edge case. Containers per group means a runaway agent in one chat cannot reach another chat’s data. The cost is some overhead per active group; the benefit is a security model that does not depend on the agent being well-behaved.
Why iCloud + rsync to sync wiki pages to Obsidian. Because iOS git clients have been unreliable for as long as I have used them. iCloud is native to iOS, zero-config, and free. rsync from a Mac Mini bridge to the Pi is directional and battle-tested. This is a boring-tools-that-work choice.
Why the wiki layer at all. Because LLM-extracted facts need supervision. The wiki gives a human a place to read, correct, and shape what the agent thinks it knows. A pure graph would be opaque; a pure wiki would not retrieve well. The split is doing real work.
What I’d do differently
Three things I have learned.
One: start the bash pre-check scripts on day one. It is tempting to defer cost optimisation, but a chatty agent with no pre-check will burn through API costs the first week. Spend an hour up front writing pre-checks for every scheduled task. The pre-check is allowed to be dumb (grep the inbox, count the file system entries). It just needs to gate whether the agent runs at all.
Two: version the wiki from the start. I did not put the wiki under git for the first two months. When I made a structural change to the synthesis prompt, I lost some context I could not easily reconstruct. The wiki is markdown files. Put it in git. Let the agent commit and review the diffs.
Three: set the recall budget low and let it grow. The first instinct is to inject a lot of context into every turn. The better default is to inject the top three to five facts, see if the agent’s answers improve, and only widen the recall budget if you observe gaps. Wide recall is expensive and reduces the precision of what surfaces.
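The recall budget is easier to reason about with a toy version in front of you. The sketch below is Python and entirely illustrative: `embed` is a bag-of-words stand-in for a real embedding model such as nomic-embed-text, and the `budget` and `min_score` parameters are names I made up for this example, not NanoClaw settings.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recall(query, facts, budget=3, min_score=0.1):
    """Return at most `budget` facts, best match first, dropping weak matches."""
    q = embed(query)
    scored = [(cosine(q, embed(f)), f) for f in facts]
    scored = [(s, f) for s, f in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:budget]]


def system_reminder(facts):
    """Format recalled facts as a block to prepend to the agent's turn."""
    return "Relevant facts:\n" + "\n".join(f"- {f}" for f in facts)


FACTS = [
    "decided not to use snowflake stages for the migration",
    "prefers dbt over airflow for transformations",
    "the snowflake migration is scheduled for next quarter",
    "team lunch is on thursday",
]

hits = recall("what did we decide about the snowflake migration", FACTS, budget=2)
print(system_reminder(hits))
```

Raising `budget` only helps if there are more facts above the score floor. Otherwise it just adds noise to the prompt, which is the precision cost described above.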
The source code lives in the gists and the repo at github.com/qwibitai/nanoclaw. If you want me to set this up for your team or your personal workflow, that is what the contact form is for.
Eddie Beloiu
Freelance Data Platform Engineer · Munich