Eddie Beloiu

Persistent Claude assistants with NanoClaw: how I set them up for individuals and small teams

Tue, 28 Apr 2026 00:00:00 GMT

I keep running into the same problem when clients try to roll out a Claude-based assistant for their team: it forgets everything between sessions. Chat history is not memory. The default solution is RAG over a document store, but that retrieves text chunks, not facts. After the third client conversation that ended with "okay, but how do we make it actually remember things," I started running a different setup on my own infrastructure.

This is what I run, why I think it's the right shape for individuals and small teams, and how I'm now setting it up for clients.

The system is called NanoClaw. It's open source, written by qwibitai, and the public design notes are in a gist. I am not the author. I run it, extend it for client setups, and write about it because almost nothing else in the self-hosted assistant space gets the memory model right.

The problem with stateless assistants

Three things are true about a default LLM assistant:

It does not remember the last conversation unless you paste the transcript back in.
Long context windows do not solve this. They tend to paper over it for a while, but the agent still loses track of the middle of a long document.
Standard RAG retrieves chunks of raw text. If your document says "I prefer dbt over Airflow for transformations," the retrieved chunk is the surrounding sentence, not the discrete fact.

For a personal assistant or a team workspace, that pattern fails the same way every time. The agent answers well in a single session, then a week later forgets that you decided not to use Snowflake stages. You burn the same conversation twice. After a month, you are not getting compounding value out of the assistant. Instead, you're paying for a stateless chatbot with extra steps.

The fix is not "more context window." The fix is structured memory.

The architecture

NanoClaw separates raw input, structured facts, and human-readable summaries into three layers:

Raw sources           →    mnemon graph          →    wiki pages
(transcripts,              (structured facts,         (narrative syntheses,
 articles,                  graph nodes,               human-readable,
 web clips)                 semantic retrieval)        cross-referenced)

Layer 1: Raw sources. Speech transcripts in markdown, articles saved through URL ingest, and mobile web clips from Obsidian Web Clipper. Append-only; never modified after storage.

Layer 2: mnemon, the knowledge graph. A SQLite-backed graph database where each entry is a self-contained fact: content, category, importance score, tags, timestamp, and graph edges to related entries. It is queried semantically using local vector embeddings (Ollama with nomic-embed-text running on the Pi itself). There are two stores: a global one shared across all groups, and a local one per group that only that group's agent can write to.

Layer 3: Wiki pages. Synthesised markdown files compiled from mnemon facts. These are not raw extracts but full narrative pages, organised into entities/, concepts/, and timelines/ subdirectories, with cross-references. The pattern follows Andrej Karpathy's LLM Wiki idea: extract structured knowledge from raw sources rather than indexing them whole.
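To make Layer 2 concrete, here is a minimal sketch of what a fact-per-row store with graph edges could look like in SQLite. The table and column names are my own illustration built from the fields listed above; this is not mnemon's actual schema or API.

```python
import sqlite3
import time

# Hypothetical schema built from the fields described above; NOT mnemon's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    category   TEXT NOT NULL,
    importance REAL NOT NULL DEFAULT 0.5,
    tags       TEXT NOT NULL DEFAULT '',  -- comma-separated, kept simple on purpose
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS edges (
    src      INTEGER NOT NULL REFERENCES entries(id),
    dst      INTEGER NOT NULL REFERENCES entries(id),
    relation TEXT NOT NULL,
    PRIMARY KEY (src, dst, relation)
);
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open or create one store. A global store and each per-group store would be separate files."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_fact(conn: sqlite3.Connection, content: str, category: str,
             importance: float = 0.5, tags=()) -> int:
    """Insert one self-contained fact and return its id."""
    cur = conn.execute(
        "INSERT INTO entries (content, category, importance, tags, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (content, category, importance, ",".join(tags), time.time()),
    )
    conn.commit()
    return cur.lastrowid


def link(conn: sqlite3.Connection, src: int, dst: int, relation: str) -> None:
    """Add a directed edge between two facts; adding the same edge twice is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO edges (src, dst, relation) VALUES (?, ?, ?)",
        (src, dst, relation),
    )
    conn.commit()


def neighbours(conn: sqlite3.Connection, entry_id: int) -> list:
    """Return (content, relation) for every fact linked from entry_id."""
    return conn.execute(
        "SELECT e.content, g.relation FROM edges g "
        "JOIN entries e ON e.id = g.dst WHERE g.src = ? ORDER BY e.id",
        (entry_id,),
    ).fetchall()
```

The point of the shape is that each row is one fact you can recall, correct, or delete on its own, which a chunk of raw text is not.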

Every agent invocation triggers a semantic recall against the graph, using the user's message as the query. Relevant facts surface automatically as a system reminder. The agent never has to decide to look something up, because recall runs locally on every turn without being asked.
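The recall step is essentially a nearest-neighbour search over fact embeddings. The sketch below shows the idea with cosine similarity and a top-k cut (k=5 by default). `embed` stands in for whatever produces vectors (in this stack, Ollama running nomic-embed-text); the toy bag-of-words embedder and the reminder format are my own illustrations so the example runs standalone, not NanoClaw's code.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query: str, facts: list, embed: Callable[[str], Vector], k: int = 5) -> list:
    """Return up to k facts most similar to the query, best first, dropping zero-similarity ones.

    A real store would embed facts once at write time instead of on every query.
    """
    q = embed(query)
    scored = sorted(((cosine(q, embed(f)), f) for f in facts), key=lambda s: s[0], reverse=True)
    return [f for score, f in scored[:k] if score > 0]


def as_system_reminder(hits: list) -> str:
    """Format recalled facts as a block injected into the agent's context (hypothetical format)."""
    return "<system-reminder>\n" + "\n".join(f"- {h}" for h in hits) + "\n</system-reminder>"


# Toy embedder for illustration only: word counts over a fixed vocabulary.
VOCAB = ["dbt", "airflow", "snowflake", "stages", "migration", "voice", "note"]


def toy_embed(text: str) -> list:
    words = Counter(text.lower().replace(",", " ").split())
    return [float(words[w]) for w in VOCAB]
```

Swapping `toy_embed` for a call to a real embedding model changes nothing else in the flow.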

What's running

The full stack on a Raspberry Pi 5 (aarch64):

Component	Role
NanoClaw orchestrator (Node.js + TypeScript)	Message loop, container management, channel routing
Claude Agent SDK	Agent logic inside isolated Docker containers per group
Baileys	WhatsApp Web protocol, no business API needed
mnemon	Custom CLI knowledge graph tool
Ollama + `nomic-embed-text`	Local vector embeddings for semantic recall
whisper.cpp	Local voice transcription for voice notes
OneCLI	Credential proxy; containers never see raw API keys
SQLite	Message store, group registry, task scheduler
systemd	Process management

The whole thing fits on a Raspberry Pi 5 or any similar device with 8GB RAM. No cloud-hosted services are required beyond the Anthropic API itself. Voice notes are transcribed on the device, and documents are embedded and stored locally; what leaves the device is the text sent to the Anthropic API on each agent turn, including any recalled facts. The graph and the wiki are on local disk.

Why this works for you

The immediate value is the loop closing on inputs that already exist:

A WhatsApp voice note becomes a transcript, becomes a set of mnemon entries, becomes part of the wiki.
An article clipped from a phone via Obsidian Web Clipper triggers an inotifywait watcher, which kicks off the ingest pipeline, which extracts facts and updates the relevant wiki pages.
A scheduled task (the morning briefing) runs a bash pre-check first and only wakes the agent if there is actually something to brief on. This keeps API costs low.

The result over time is an assistant that knows what you have read, what you have said in voice notes, and what you have decided. You can ask it "what did I say about the Snowflake migration last week?" and it answers from the graph, not from chat history.

The wiki layer is human-editable on purpose, exactly because LLM-extracted facts need supervision.

Why this works for teams

The multi-group isolation is the thing that makes this team-ready:

Each registered group (a WhatsApp chat, a Slack channel, a department) gets its own Docker container, filesystem, local mnemon store, and Claude session.
Containers cannot read each other's memory or messages. Source identity is verified by directory path, not by message content.
A runaway agent in one group cannot affect others. Container lifetime is tied to conversation activity; they shut down after idle timeout.
The credential proxy means the API key never sits inside a container, so a compromised group cannot exfiltrate it.
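Verifying identity by directory path can be sketched like this: the group is derived from where a request file lives under a per-group root, never from anything the message claims about itself. The `groups/<name>/...` layout and the function name are assumptions for illustration, not NanoClaw's actual code.

```python
from pathlib import Path


def group_for_path(groups_root: Path, request_file: Path) -> str:
    """Derive the owning group from a file's location under groups_root.

    Both paths are resolved first, so '..' segments and symlinks cannot be
    used to step out of one group's directory into another's.
    """
    root = groups_root.resolve()
    resolved = request_file.resolve()
    try:
        relative = resolved.relative_to(root)
    except ValueError:
        raise PermissionError(f"{request_file} is outside {groups_root}") from None
    if len(relative.parts) < 2:
        # A file directly under the root belongs to no group.
        raise PermissionError(f"{request_file} is not inside a group directory")
    return relative.parts[0]
```

A message that says "I am the leadership channel" changes nothing here; only its location on disk does.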

For a team of five to twenty people, this is enough to set up:

A daily team briefing channel that summarises what was decided yesterday.
A research channel where articles get clipped and turned into a shared wiki of "what we read this quarter."
A specific project channel with its own memory of decisions, blockers, and open questions, isolated from everything else.
A leadership channel with its own private memory that the rest of the team's agent cannot read.

I have set this up for a small team in Munich and the most useful side effect was unexpected: the wiki became a canonical source for "what did we actually decide" that survived someone going on holiday. The agent stopped being the interesting part. The wiki became the interesting part.

How I set this up for clients

When I set NanoClaw up for a client, the work is roughly two phases:

Phase 1 (week 1): install and tune. Provision the Pi or a small VPS, install NanoClaw, configure channels, set up the credential proxy, integrate with the team's existing messaging. Start with the global mnemon empty and the wiki empty. Add one or two seed groups. Tune the bash pre-check scripts to keep API costs under control.

Phase 2 (week 2): fit and handover. Configure scheduled tasks (briefings, ingestion watchers). Run the team through how to use it: what the wiki is for, how to correct facts, how to add new groups. Document the maintenance ritual of about an hour a week of wiki review by someone trusted. Hand it over.

This fits inside the production-readiness audit format I already offer (€4,500 fixed for one week), or as a co-build engagement at €900/day if the client wants me to run it longer and customise channels and tasks for their workflow. I am the integrator, not the author of NanoClaw. That credit goes to qwibitai. What I sell is the setup, the tuning, the team training, and the knowledge of how to keep it running cheaply.

A few opinionated bits

Why a Raspberry Pi. Because privacy-sensitive data should not leave the network. The whole point of running embeddings and transcription locally is that the inputs stay on the device. The Pi 5 has enough headroom to run nomic-embed-text and whisper.cpp (base model) alongside the orchestrator. For a small team, this is plenty. For a larger team, the same code runs on a small VPS or a Mac Mini.

Why Docker per group, not one process. Isolation. Five years of teaching across cultures will eventually convince you that one badly-behaved actor is normal, not an edge case. Containers per group means a runaway agent in one chat cannot reach another chat's data. The cost is some overhead per active group; the benefit is a security model that does not depend on the agent being well-behaved.

Why iCloud + rsync to sync wiki pages to Obsidian. Because iOS git clients have been unreliable for as long as I have used them. iCloud is native to iOS, zero-config, and free. rsync from a Mac Mini bridge to the Pi is one-way and battle-tested. This is a boring-tools-that-work choice.

Why the wiki layer at all. Because LLM-extracted facts need supervision. The wiki gives a human a place to read, correct, and shape what the agent thinks it knows. A pure graph would be opaque; a pure wiki would not retrieve well. The split is doing real work.

What I'd do differently

Three things I have learned.

One: write the bash pre-check scripts on day one. It is tempting to defer cost optimisation, but a chatty agent with no pre-check will burn through your API budget in the first week. Spend an hour up front writing pre-checks for every scheduled task. The pre-check is allowed to be dumb (grep the inbox, count the file system entries); it just needs to gate whether the agent runs at all.
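A pre-check in this spirit can be tiny. The setup described here uses bash; the sketch below shows the same gating logic in Python so it can be tested on its own. The inbox directory, the marker file and the exit-code convention (0 means wake the agent, 1 means skip) are assumptions for illustration.

```python
import sys
from pathlib import Path


def has_new_items(inbox: Path, marker: Path) -> bool:
    """True if any file in inbox was modified after the marker file (i.e. since the last run)."""
    if not inbox.is_dir():
        return False
    last_run = marker.stat().st_mtime if marker.exists() else 0.0
    return any(p.is_file() and p.stat().st_mtime > last_run for p in inbox.iterdir())


def main(inbox: str, marker: str) -> int:
    inbox_path, marker_path = Path(inbox), Path(marker)
    if not has_new_items(inbox_path, marker_path):
        return 1  # nothing to brief on: skip the API call entirely
    marker_path.touch()  # record this run so the same items are not counted again
    return 0  # wake the agent


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Touching the marker before the agent runs trades a missed item on agent failure for never double-briefing; flip the order if you prefer the opposite.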

Two: version the wiki from the start. I did not put the wiki under git for the first two months. When I made a structural change to the synthesis prompt, I lost some context I could not easily reconstruct. The wiki is markdown files. Put it in git. Let the agent commit, and have a human review the diffs.

Three: set the recall budget low and let it grow. The first instinct is to inject a lot of context into every turn. The better default is to inject the top three to five facts, see if the agent's answers improve, and only widen the recall budget if you observe gaps. Wide recall is expensive and reduces the precision of what surfaces.

The source code is in the repo at github.com/qwibitai/nanoclaw, and the design notes are in the gist. If you want me to set this up for your team or your personal workflow, that is what the contact form is for.

3 things that break when you move Databricks notebooks to dbt

Mon, 27 Apr 2026 00:00:00 GMT

I've helped multiple teams move their data transformations from Databricks notebooks to dbt. The case for doing it is usually clear: version control, proper testing, reproducible builds. But the migration itself keeps hitting the same three snags.

1. Spark SQL syntax differences

Databricks notebooks accumulate Spark-specific SQL over time. Some of it looks like standard SQL and isn't. When you paste it into a dbt model that targets a different warehouse, you get errors that aren't always obvious to diagnose.

The most common examples:

current_date() instead of CURRENT_DATE
Date arithmetic that uses INTERVAL 30 DAYS (Spark) vs INTERVAL '30' DAY (ANSI SQL)
Column names with special characters quoted differently: backticks in Spark, double quotes in ANSI SQL

The fix is a translation pass before migration. Make a reference sheet of your Databricks functions and their standard SQL equivalents, and go through the notebooks systematically:

-- Databricks
SELECT * FROM my_table WHERE date_col > current_date() - INTERVAL 30 DAYS

-- Standard SQL (dbt)
SELECT * FROM my_table WHERE date_col > CURRENT_DATE - INTERVAL '30' DAY
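A reference sheet can double as a first-pass rewriter. The sketch below covers only the two date patterns shown above, using regular expressions, and logs which rules fired so the output can feed the migration notes. The rule list is illustrative; regex rewriting will miss anything unusual, and every result still needs a human read and a test run against the target warehouse.

```python
import re

# Illustrative rules: (Spark SQL pattern, ANSI replacement). Extend as the notebook audit finds more.
RULES = [
    (re.compile(r"\bcurrent_date\(\)", re.IGNORECASE), "CURRENT_DATE"),
    (re.compile(r"\bINTERVAL\s+(\d+)\s+DAYS?\b", re.IGNORECASE), r"INTERVAL '\1' DAY"),
]


def translate(spark_sql: str) -> tuple:
    """Apply each rule in order; return the rewritten SQL and a log of the rules that fired."""
    log = []
    for pattern, replacement in RULES:
        spark_sql, count = pattern.subn(replacement, spark_sql)
        if count:
            log.append(f"{pattern.pattern} -> {replacement} ({count}x)")
    return spark_sql, log
```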

2. Missing schema references

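Databricks resolves unqualified table references against the cluster's default database. dbt runs against whatever database and schema the target connection is configured with, which is usually not the same, so the same query fails (or finds the wrong table) because there's no schema prefix. Declaring upstream tables as dbt sources and referencing them with {{ source() }}, and other models with {{ ref() }}, lets dbt write the fully qualified name for you.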
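Before converting a notebook, it helps to list the table names it uses without a schema prefix. The sketch below is a deliberately rough regex scan over `FROM` and `JOIN` clauses that skips names defined as CTEs; it does not understand quoted identifiers or every SQL construct. Anything it flags is a candidate for a `source()` or `ref()`.

```python
import re

# FROM/JOIN followed by an identifier, optionally dotted (catalog.schema.table).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Names defined as CTEs ("name AS (") are local to the query, not tables.
CTE_DEF = re.compile(r"\b([A-Za-z_]\w*)\s+AS\s*\(", re.IGNORECASE)


def unqualified_tables(sql: str) -> list:
    """Return table names referenced without a schema prefix, in first-seen order."""
    ctes = {name.lower() for name in CTE_DEF.findall(sql)}
    found = []
    for name in TABLE_REF.findall(sql):
        if "." not in name and name.lower() not in ctes and name not in found:
            found.append(name)
    return found
```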

3. State management and incremental logic

Databricks notebooks sometimes rely on Spark's caching between cells, or on checkpoint files on DBFS. Neither of these exists in dbt. Incremental patterns that look fine in a notebook need to be rethought.

Specific failure modes:

Assuming intermediate data is still cached when the next cell runs
Complex incremental logic built around Spark DataFrames rather than SQL
Late-arriving data that the original notebook handled manually

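dbt's incremental materialization covers most of these cases: the model's own table ({{ this }}) holds the state that used to live in a Spark cache or a DBFS checkpoint. Late-arriving data still needs a deliberate rule for how far back each run looks.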

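{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

SELECT
  order_id,
  customer_id,
  order_total,
  order_date
FROM {{ ref('raw_orders') }}

{% if is_incremental() %}
  -- ">=" re-reads the latest loaded day, so rows that land on that day after
  -- the previous run are not lost; unique_key makes the re-read a merge, not a duplicate.
  -- Rows arriving with older dates are still skipped: widen the window or run a full refresh.
  WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}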
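To see why the boundary condition and `unique_key` matter, here is a small in-memory simulation of an incremental run: a high-watermark filter followed by a merge keyed on `order_id`. It models the logic only, not how dbt actually executes the model, and the row shapes are hypothetical.

```python
from datetime import date


def incremental_run(target: dict, source: list, inclusive: bool = True) -> dict:
    """Simulate one incremental run.

    target: rows already in the model's table ({{ this }}), keyed by order_id.
    source: every row currently in the upstream table.
    inclusive=True mimics '>=' on the watermark; False mimics '>'.
    """
    if not target:  # first run behaves like a full build
        return {row["order_id"]: row for row in source}
    watermark = max(row["order_date"] for row in target.values())
    if inclusive:
        new_rows = [r for r in source if r["order_date"] >= watermark]
    else:
        new_rows = [r for r in source if r["order_date"] > watermark]
    merged = dict(target)
    for row in new_rows:  # merge on unique_key: overwrite existing keys, insert new ones
        merged[row["order_id"]] = row
    return merged
```

With `>`, an order stamped with the same date as the last load but arriving after it is silently skipped; with `>=` plus a unique key it is picked up without duplicating anything.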

Migration approach

Don't try to convert everything at once. I usually start with the notebooks that run most often; they have the most test coverage and the clearest business logic. Convert simple SELECT statements first, then work up to the incremental models once you have confidence in the simpler ones.

Keep a migration log. Which notebooks map to which dbt models, what was changed during translation, what tests were added. That document pays for itself the first time someone asks why a model looks different from what's in the old notebook.

What I'd do differently next time: set up the dbt schema configuration before touching any SQL. Most of the schema-related failures I've seen happened because people started converting models before the environments were properly configured.

Data warehouse modernization: from SQL Server to Snowflake for a German company

Mon, 27 Apr 2026 00:00:00 GMT

The situation

A mid-sized German company had 250 business intelligence users across claims, underwriting, and actuarial teams, all running off a SQL Server 2012 data warehouse. The platform had been stretched past its design limits. Reports ran overnight. The concurrent user cap was 50. Above that, performance fell off. A 50TB dataset was growing 20% per year. And the system had no meaningful data lineage, which was becoming a GDPR problem the compliance team kept flagging.

Specifically:

Underwriters were making decisions on yesterday's data because nothing ran in real time
No audit trail or lineage for personal data
Pipelines needed frequent manual intervention to stay running
Hard cap of 50 concurrent users before queries started queuing

What I did

Assessment (weeks 1–2)

Started by cataloguing 200+ tables and mapping PII fields to understand the GDPR exposure. Mapped 50+ existing reports and dashboards before touching anything. The decision to split the data into a hot tier (Snowflake) and a cold tier (object storage) came out of this work.

Foundation (weeks 3–5)

Configured Snowflake with multi-cluster virtual warehouses and auto-scaling. Used Data Vault 2.0 to model the claims and policy data, which gave the audit-friendly, history-preserving structure that GDPR lineage and accountability depend on. Row-level security and automated audit logging went in at this stage.
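For readers who haven't used Data Vault 2.0: hubs and links are typically keyed by a hash of the normalised business key, which makes loads deterministic and easy to trace back to the source. A minimal sketch of that common convention follows; the trim/upper-case normalisation, the `||` delimiter and MD5 are typical choices, not necessarily the ones used on this project.

```python
import hashlib


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Data Vault style hash key: normalise each part, join with a delimiter, MD5-hex the result."""
    normalised = [(part or "").strip().upper() for part in business_key_parts]
    return hashlib.md5(delimiter.join(normalised).encode("utf-8")).hexdigest()
```

The delimiter matters for composite keys: without it, ("A", "B") and ("AB",) would hash to the same value.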

Migration (weeks 6–11)

Used Snowflake's native SQL Server connectors for incremental loading. Built the transformation layer in dbt. Added data validation checks at each pipeline stage, not just at the end.

Handover (weeks 12–13)

Migrated SQL Server Reporting Services dashboards to Tableau. Two workshop sessions for BI users on the self-service features. A support channel during the transition period.

Results

Reporting latency dropped from 24 hours to 2 hours. Complex actuarial queries that previously ran overnight now complete in minutes. The concurrent user limit went from 50 to 250. The queuing problem is gone.

Infrastructure costs came down about 30% by moving from SQL Server's fixed licensing to Snowflake's usage-based pricing. Maintenance overhead dropped significantly once the manual data loading tasks were automated.

Full GDPR compliance: data lineage, audit trails, role-based access with encryption at rest and in transit. A DPA audit that had been a recurring concern passed without issues.

What I'd do differently

Data profiling at the start surfaced schema inconsistencies I underestimated. I'd build more time into the assessment phase for that. I'd also start user training earlier.

Client details anonymized. Metrics are from the actual project.

Lakehouse migration: from Databricks to Snowflake for a European media company

Mon, 27 Apr 2026 00:00:00 GMT

The situation

A European media company had built their analytics platform on Databricks. It had worked during a period of rapid growth, but by the time I came in the monthly cost was €50K+ and the platform was ungoverned. The cost was high because clusters ran around the clock instead of spinning up on demand. There were 50+ data scientists working off 500+ tables spread across Delta Lake, Parquet, and JSON formats, with no central catalog and no clear lineage.

The harder problem was trust. When quarterly reports took two hours and the numbers sometimes differed depending on who ran them, teams started routing around the platform instead of using it. That's usually where things stand when I get called in.

What I did

Assessment (weeks 1–4)

Profiled all 200+ notebooks to understand compute vs. storage usage patterns. Built a cost model comparing the two platforms under realistic usage assumptions. Designed a hybrid architecture: Snowflake as the compute layer over the existing S3 storage, which avoided duplicating data and reduced migration risk.

Foundation and tooling (weeks 5–10)

Set up multi-cluster Snowflake warehouses with separate resource pools for ETL, analytics, and reporting workloads. Built migration pipelines with dbt and Airflow. Added data validation and reconciliation at each step. Most migrations fail here because teams assume data arrived correctly without checking.
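Reconciliation at each step doesn't need heavy tooling. A sketch of the basic check, comparing row counts and a column sum between source and migrated table; two in-memory SQLite databases stand in for Databricks and Snowflake, and the table and column names are hypothetical.

```python
import sqlite3


def profile(conn: sqlite3.Connection, table: str, amount_col: str) -> tuple:
    """Row count and rounded sum of one numeric column: a cheap fingerprint of a table."""
    # table and amount_col come from the migration config (trusted), never from user input.
    count, total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    ).fetchone()
    return count, round(total, 2)


def reconcile(src: sqlite3.Connection, dst: sqlite3.Connection, table: str, amount_col: str) -> list:
    """Return a list of mismatches; an empty list means the table passed."""
    s, d = profile(src, table, amount_col), profile(dst, table, amount_col)
    problems = []
    if s[0] != d[0]:
        problems.append(f"{table}: row count {s[0]} != {d[0]}")
    if s[1] != d[1]:
        problems.append(f"{table}: sum({amount_col}) {s[1]} != {d[1]}")
    return problems
```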

Incremental migration (weeks 11–18)

Started with high-traffic datasets: user behavior and content metadata. Both platforms ran in parallel through the transition. I decommissioned Databricks notebooks gradually as Snowflake equivalents proved out, rather than doing a hard cutover.

Optimisation (weeks 19–22)

Right-sized compute based on actual usage patterns. Added materialized views for common reporting aggregations. Set up cost monitoring and alerts.

Results

Monthly infrastructure cost dropped from €50K to around €30K (a 40% reduction). Most of the savings came from moving away from always-on clusters to Snowflake's per-second billing.

Query performance roughly tripled on typical analytical workloads. Quarterly reports went from two hours to fifteen minutes. Concurrent analyst capacity doubled.

Data lineage is now fully tracked. There's a central catalog. The self-service analytics that had been the original promise of the Databricks setup started getting used once people could find data and trust it.

What I'd do differently

I spent more time on the cost model at the start than I needed to. The architecture decision was right, but I could have reached it faster and spent that time on better tooling for schema evolution edge cases. There were Delta Lake schema changes mid-migration that required manual intervention I hadn't fully planned for.

Client details anonymized. Metrics are from the actual project.

Three failure modes I've seen in enterprise lakehouses (and the cheap fixes)

Wed, 22 Apr 2026 00:00:00 GMT

I've reviewed enough enterprise lakehouse implementations to have opinions about where they tend to go wrong. Three patterns come up more than anything else. Each is fixable if you catch it early.

Failure mode 1: the swamp lakehouse

Symptom: everything goes into the lakehouse with no distinction between raw, cleaned, and curated data. Users can't trust anything because they don't know what state it's in.

Root cause: no zoning strategy. Teams treat the lakehouse as a dumping ground rather than implementing the medallion architecture properly.

The fix is clear zone boundaries with automated validation:

Bronze: raw ingestion (immutable, exactly as received)
Silver: cleaned and validated (business rules applied, basic quality checks)
Gold: business-ready (dimensionally modeled, performance optimized)

Add zone enforcement at the pipeline level:

-- Example: prevent silver-to-gold promotion without quality checks.
-- The orchestrator reads promotion_status before running the gold job.
CREATE OR REPLACE VIEW silver_to_gold_gate AS
SELECT
  table_name,
  CASE
    WHEN failed_checks = 0 THEN 'ALLOW'
    ELSE 'BLOCK'
  END AS promotion_status
FROM silver_quality_metrics;
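On the orchestration side, the gate is a check the promotion job runs before it writes anything to gold. A minimal sketch, using SQLite as a stand-in warehouse and a hypothetical `silver_quality_metrics` table with one row per silver table:

```python
import sqlite3


class PromotionBlocked(Exception):
    pass


def check_promotion(conn: sqlite3.Connection, table_name: str) -> None:
    """Raise unless the silver table has a quality record with zero failed checks."""
    row = conn.execute(
        "SELECT failed_checks FROM silver_quality_metrics WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        # No metrics means the checks never ran: treat that as a failure, not a pass.
        raise PromotionBlocked(f"{table_name}: no quality metrics recorded")
    if row[0] != 0:
        raise PromotionBlocked(f"{table_name}: {row[0]} failed checks")


def promote(conn: sqlite3.Connection, table_name: str, run_gold_job) -> None:
    """Run the gold job only if the gate passes."""
    check_promotion(conn, table_name)
    run_gold_job(table_name)
```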

Failure mode 2: the performance mirage

Symptom: works fine in development with 10GB datasets, crawls in production with 1TB. Costs explode because patterns that look reasonable at small scale don't hold up.

Root cause: development uses unrealistic data volumes. Teams optimise for developer convenience rather than production economics.

The fix: production-realistic testing from the start.

Clone the production schema with 1% of real data (a representative sample, so distributions and skew match production)
Automate performance regression testing in CI/CD
Set cost monitoring alerts at 80% of budget
Require performance justification for any new pipeline
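For the 1% sample, one approach that keeps related tables consistent is to sample by a hash of the business key rather than at random: every table keeps the same customers, so joins still line up and reruns select the same rows. A sketch with a hypothetical `customer_id` key; in practice you would do the same with the warehouse's own hash function in SQL.

```python
import hashlib


def in_sample(key: str, percent: float = 1.0) -> bool:
    """Deterministically keep roughly `percent`% of keys: the same ones on every run and in every table."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, i.e. 0.01% resolution
    return bucket < percent * 100


def sample_rows(rows, key_field: str = "customer_id", percent: float = 1.0) -> list:
    """Filter rows to those whose key falls in the sample."""
    return [r for r in rows if in_sample(str(r[key_field]), percent)]
```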

One thing worth remembering: a pipeline that's 2x slower but 10x cheaper to run is often the better business choice.

Failure mode 3: the metadata ghost town

Symptom: data exists but nobody knows what it means, where it came from, or how to use it correctly. This leads to misinterpretation and wrong business decisions.

Root cause: metadata management treated as an afterthought. No investment in documentation, lineage, or a semantic layer.

Lightweight but effective fixes:

Automatically capture technical metadata (schema, size, update frequency)
Require business owners to add semantic descriptions during data onboarding
Use open-source tools like Amundsen or DataHub for discovery
Implement simple data contract validation at pipeline boundaries
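A data contract at a pipeline boundary can start as little more than expected columns and types, checked on every batch. A minimal sketch in plain Python; the contract format is my own illustration, and dedicated tools do this more thoroughly.

```python
from typing import Any

# Hypothetical contract: column name -> (expected Python type, nullable?)
ORDERS_CONTRACT: dict = {
    "order_id": (int, False),
    "customer_id": (int, False),
    "order_total": (float, False),
    "coupon_code": (str, True),
}


def validate_batch(rows: list, contract: dict) -> list:
    """Return human-readable violations; an empty list means the batch honours the contract."""
    errors = []
    for i, row in enumerate(rows):
        missing = contract.keys() - row.keys()
        extra = row.keys() - contract.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, (expected, nullable) in contract.items():
            if col not in row:
                continue  # already reported as missing
            value: Any = row[col]
            if value is None:
                if not nullable:
                    errors.append(f"row {i}: {col} is null")
            elif not isinstance(value, expected):
                errors.append(f"row {i}: {col} is {type(value).__name__}, expected {expected.__name__}")
    return errors
```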

The most effective technique I've seen: a "data passport" that travels with each dataset, updated automatically by pipelines and manually enriched by data owners.
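The data passport idea maps naturally onto a small record with two kinds of fields: ones the pipeline overwrites on every run, and ones only a human owner edits. A sketch, with field names of my own choosing:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataPassport:
    dataset: str
    # Technical metadata: refreshed automatically by the pipeline on every run.
    schema: dict = field(default_factory=dict)
    row_count: int = 0
    last_updated: Optional[datetime] = None
    upstream: list = field(default_factory=list)
    # Business metadata: enriched by the data owner, never overwritten by pipelines.
    owner: str = ""
    description: str = ""

    def record_run(self, schema: dict, row_count: int, upstream: list) -> None:
        """Called by the pipeline after a successful load; leaves business fields untouched."""
        self.schema = dict(schema)
        self.row_count = row_count
        self.upstream = list(upstream)
        self.last_updated = datetime.now(timezone.utc)

    def missing_business_docs(self) -> list:
        """Fields a human still needs to fill in."""
        return [name for name in ("owner", "description") if not getattr(self, name)]
```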

Spotting them early

The implementation work is mostly straightforward once you know what you're looking for. What takes time to develop is pattern recognition, catching these before they cause damage rather than remediating after the fact. When I start a new engagement, I can usually identify the problem within the first week from three tells:

Missing zone enforcement in pipeline orchestration
Performance tests that only run on tiny datasets
Zero business documentation on core datasets

Catching these early typically saves three to six months of remediation time. The loss of executive trust that follows a failed data platform launch is much harder to recover from than fixing the architecture before anyone notices.

What I'd do differently next time: I would create a standardised lakehouse health check that runs in the first two weeks of any engagement, providing a clear, actionable scorecard before significant resources are committed.

GDPR-aligned RAG: a checklist that survived three audits

Mon, 20 Apr 2026 00:00:00 GMT

Most RAG tutorials stop at "make it work." Chunk some documents, embed them, store in a vector database, retrieve context for your LLM. That's fine for a prototype. In production, GDPR creates constraints that typical RAG architectures don't account for.

After helping three different DACH enterprises get their RAG implementations through GDPR audits, I've settled on a checklist that covers the gaps auditors actually flag.

Where GDPR and RAG conflict

The tension is specific. GDPR requires data minimization, purpose limitation, storage limitation, data subject rights (access, rectify, erase), and accountability. Standard RAG implementations violate at least three of these by default: the vector database usually contains more than you need, deletion is an afterthought, and there's no audit trail for what got retrieved.

My GDPR-aligned RAG checklist

1. Data inventory and classification

Before building anything, know what you're working with:

Document all data sources going into your RAG system
Classify data by sensitivity (PII, special categories, business confidential)
Identify which data requires GDPR protections
Document legal basis for processing each data type

2. Purpose-limited system design

Your RAG system should have a clearly defined, documented purpose:

Write a specific purpose statement for your RAG system
Ensure all data processing aligns with this purpose
Prevent function creep through technical controls
Regularly review purpose alignment

3. Data minimization in chunking

How you prepare data for embedding matters:

Chunk documents at semantic boundaries (not fixed sizes)
Remove or pseudonymize unnecessary PII during preprocessing
Consider summary-based approaches for sensitive documents
Log what data was excluded and why
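Pseudonymization during preprocessing can be as simple as replacing identifiers with a keyed hash, so the same person maps to the same token across chunks without the raw value being stored. A minimal sketch that handles only email addresses with one regular expression and logs what was replaced (without the original value). Real PII detection needs far more than one regex, and the key belongs in a secrets manager, not in code.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def pseudonymize(text: str, key: bytes) -> tuple:
    """Replace each email with a stable keyed token; return the new text and a log of replacements."""
    log = []

    def token(match: re.Match) -> str:
        # Lower-case first so the same address always maps to the same token.
        digest = hmac.new(key, match.group(0).lower().encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        log.append(f"email pseudonymized -> person_{digest}")
        return f"person_{digest}"

    return EMAIL.sub(token, text), log
```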

4. Secure embedding and storage

Where and how you store vectors has privacy implications:

Use encrypted vector databases (at rest and in transit)
Implement access controls and audit logging
Consider on-prem or private cloud for highly sensitive data
Regularly test encryption key rotation procedures

5. Retrieval privacy controls

What gets retrieved affects what the LLM sees:

Implement relevance thresholds to limit unnecessary data exposure
Consider hybrid search (keyword + vector) for precision
Log retrievals for audit purposes (anonymized)
Implement retrieval rate limiting to prevent scraping
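Two of these controls fit in a few lines. The sketch below drops retrieved chunks under a similarity threshold, so weakly related personal data never reaches the prompt, and applies a sliding-window rate limit per user. The threshold, the limits and the `(score, chunk_id)` input shape are assumptions to tune for your own system.

```python
import time
from collections import defaultdict, deque


def filter_by_relevance(results, min_score: float = 0.75, max_chunks: int = 5) -> list:
    """results: iterable of (score, chunk_id). Keep only confident matches, best first, capped."""
    kept = sorted((r for r in results if r[0] >= min_score), key=lambda r: r[0], reverse=True)
    return kept[:max_chunks]


class RateLimiter:
    """Allow at most `limit` retrievals per user within any `window` seconds."""

    def __init__(self, limit: int = 30, window: float = 60.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.calls = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        q = self.calls[user_id]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget calls that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```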

6. Generation guardrails

The LLM output needs protection too:

Implement output filtering for accidental PII disclosure
Use prompt engineering to discourage PII generation
Consider confidence scoring for responses
Provide clear attribution to source documents

7. Data subject rights implementation

Make it possible to honor GDPR rights:

Build deletion pipelines that remove data from all systems (including backups)
Implement data portability exports in standard formats
Create access request workflows that show what data is in your RAG
Design rectification processes for inaccurate data
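The core of a deletion pipeline is knowing every place a document's derived data lives and deleting by the same key everywhere. A sketch, assuming each store exposes a hypothetical `delete_by_source(doc_id)` method returning a count, and that chunks carry the `source_document_id` metadata set during preprocessing. Backups need their own process (expiry, or re-deletion on restore), which this does not cover.

```python
from typing import Protocol


class Store(Protocol):
    name: str

    def delete_by_source(self, doc_id: str) -> int: ...


class InMemoryStore:
    """Toy stand-in for a vector DB, keyword index or cache."""

    def __init__(self, name: str, chunks: list):
        self.name, self.chunks = name, chunks

    def delete_by_source(self, doc_id: str) -> int:
        before = len(self.chunks)
        self.chunks = [c for c in self.chunks if c["source_document_id"] != doc_id]
        return before - len(self.chunks)


def erase_document(doc_id: str, stores: list) -> dict:
    """Delete every chunk derived from doc_id in every store; return per-store counts for the audit record."""
    return {store.name: store.delete_by_source(doc_id) for store in stores}
```

Running it twice should report zeros; that second run is a cheap test for the quarterly deletion drill.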

8. Documentation and accountability

Prove your compliance:

Maintain a data flow diagram showing PII through your RAG system
Document all technical and organizational measures
Conduct regular DPIAs (Data Protection Impact Assessments)
Train your team on GDPR responsibilities for AI systems

Technical implementation notes

Vector database choices

pgvector with pgcrypto: good balance of features and security if you're already on PostgreSQL
Weaviate: strong security features including RBAC and encryption
Milvus: enterprise-grade with good security controls
Avoid public vector databases with unclear data handling practices

Preprocessing pipeline example

# Sketch of GDPR-aware preprocessing. classify_document_sensitivity,
# remove_pii, pseudonymize_identifiers and semantic_chunk stand in for your
# own implementations; doc is assumed to have .id, and each chunk .index and
# a .metadata dict.
from datetime import datetime, timezone


def prepare_document_for_rag(doc):
    # 1. Identify document type and sensitivity
    sensitivity = classify_document_sensitivity(doc)

    # 2. Apply appropriate transformations
    if sensitivity == "high":
        # Remove or pseudonymize direct identifiers
        doc = remove_pii(doc)
        doc = pseudonymize_identifiers(doc)

    # 3. Create semantic chunks
    chunks = semantic_chunk(doc, max_tokens=512)

    # 4. Add metadata for auditing (timezone-aware UTC timestamp)
    for chunk in chunks:
        chunk.metadata.update({
            "source_document_id": doc.id,
            "chunk_index": chunk.index,
            "processing_timestamp": datetime.now(timezone.utc).isoformat(),
            "sensitivity_level": sensitivity,
        })

    return chunks

Audit trail essentials

Every RAG interaction should leave a trace:

When was the data accessed?
What specific chunks were retrieved?
What was the user query?
What model was used for generation?
What was the final output (hashed for storage)?
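Those five questions map directly onto a log record. A sketch of one, hashing the output with SHA-256 as the list suggests; hashing the query as well is my own choice, for systems where queries may themselves contain personal data. Field names are illustrative.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RagAuditRecord:
    query_hash: str
    chunk_ids: tuple
    model: str
    output_hash: str
    accessed_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    @classmethod
    def create(cls, query: str, chunk_ids, model: str, output: str) -> "RagAuditRecord":
        """Build a record from raw values, storing only hashes of the query and output."""
        return cls(sha256_hex(query), tuple(chunk_ids), model, sha256_hex(output))

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```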

What auditors actually flagged

Things auditors liked: clear data flow diagrams showing PII handling, automated deletion pipelines tested quarterly, purpose statements reviewed and signed off, encryption key management with rotation procedures, staff training completion records.

Common findings: vague purpose statements like "to improve customer service," no documentation on what data goes into the vector database, no process for honoring deletion requests, insufficient access controls on vector databases, missing legal basis documentation.

Putting it together

The systems that pass audits are the ones where you can clearly explain what data goes in and why, how it's protected throughout its lifecycle, how you honor data subject rights, and how you demonstrate ongoing compliance. Start with your data inventory, build purpose-limited systems, implement audit trails. The technical implementation follows from there.

What I'd do differently next time: I would implement automated compliance testing in the CI/CD pipeline earlier, checking for common configuration mistakes before they reach production rather than finding them during an audit.

Why I built ECL: synthesis over retrieval for enterprise knowledge

Sat, 21 Mar 2026 00:00:00 GMT

I have shipped enough RAG systems to know what they cannot do. They retrieve the closest matching document. They do not tell you that two documents disagree, that one of them is wrong, or that you should not be answering the question in the first place.

That gap is what I built Enterprise Context Layer (ECL) for. It is open source under GPL-3.0 and I have been running it on a client platform for the past few months. The architecture work that used to take months took days, and the answers the system gives are ones the client's senior engineers actually trust.

This post is about why I built it, what it borrows from, and where the RAG pattern hits a wall.

Where RAG breaks

The RAG promise is straightforward. Chunk your documents, embed them, retrieve the top-k chunks for a query, hand them to an LLM. For "what does the policy say about X" it works fine. For most real semantic questions inside a company, it does not.

Take one I keep seeing in the wild: how long do we keep customer data after churn? The answer lives in four places. A legal policy document. An engineering deletion runbook. A Slack thread from eight months ago. The head of one senior engineer who has rebuilt the pipeline twice. They do not all agree.

A retrieval system returns the closest match. It does not flag the conflict, does not know which source has authority, and does not know that this question should probably be routed to the security team rather than answered to a customer at all. The retrieval layer treats every chunk as equally valid. That is the bug.

In my GDPR audit work I have watched this fail audits three times. Auditors do not want "the closest paragraph." They want a cited, reasoned answer with a defensible chain back to the primary source. RAG cannot give them that without bolting on machinery that ends up looking suspiciously like ECL.

What ECL actually is

ECL is a git repo of markdown files. LLM agents read the company's source systems (Slack, Jira, Confluence, code, calls) and write synthesised, cited knowledge into the repo. Every claim has an inline citation. Every conflict between sources is documented in place rather than silently picked. The git history is the audit trail.

The original idea is Andy Chen's, from his Substack piece The Enterprise Context Layer. He built the first production version at Abnormal Security. The line from his post that stuck with me: retrieval and synthesis are different problems. Glean finds the best matching document. ECL builds the reasoning framework an expert uses, and tells you which questions should never be answered at all.

The version I open-sourced takes Andy's pattern and combines it with two other ideas that solve problems his original write-up did not fully cover.

What I added

The C-Compiler pattern, for parallel agents. Nicholas Carlini's Building a C Compiler with Parallel Claudes at Anthropic showed how to coordinate multiple Claude agents without a message broker. Each task is a YAML file. An agent claims a task by writing a .LOCKED sidecar and pushing to git. If the push is rejected, another agent got there first. Git's push rejection is the mutex.

I wanted ECL to scale to ten or twenty workers running on whatever compute is cheap that day, without a central coordinator. The C-Compiler pattern gave me that for free. No Redis, no RabbitMQ, no Kubernetes operator. Just git.
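To show why push rejection works as a lock, here is a toy simulation of the claim protocol. It does not call git. The `Remote` class is a hypothetical in-memory stand-in that, like a git remote, accepts a push only when it builds on the current head (a fast-forward). The task path `tasks/0042.yaml` is made up.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

Snapshot = Tuple[int, FrozenSet[str]]  # (head index, files present at that head)


@dataclass
class Remote:
    """Toy stand-in for a git remote: a linear history of file-set snapshots."""
    history: List[FrozenSet[str]] = field(default_factory=lambda: [frozenset()])

    def fetch(self) -> Snapshot:
        return len(self.history) - 1, self.history[-1]

    def push(self, parent: int, files: FrozenSet[str]) -> bool:
        if parent != len(self.history) - 1:
            return False  # non-fast-forward: another agent pushed first
        self.history.append(files)
        return True


def try_claim(remote: Remote, base: Snapshot, task: str) -> bool:
    """Claim `task` by pushing a <task>.LOCKED sidecar on top of `base`."""
    parent, files = base
    lock = f"{task}.LOCKED"
    if lock in files:
        return False  # already claimed in the snapshot this agent saw
    return remote.push(parent, files | {lock})


remote = Remote()
snapshot = remote.fetch()  # two agents fetch the same head
first = try_claim(remote, snapshot, "tasks/0042.yaml")   # True: push accepted
second = try_claim(remote, snapshot, "tasks/0042.yaml")  # False: push rejected
retry = try_claim(remote, remote.fetch(), "tasks/0042.yaml")  # False: lock now visible
```

The losing agent needs no coordinator: after a rejected push it fetches again, sees the lock, and moves on to a different task.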

Superpowers, for process discipline. Jesse Vincent's Superpowers makes the case that agent quality is more about process than prompt. Before any non-trivial task, an agent loads a SKILL.md file that defines the mandatory workflow for that task type.

I store team workflows as skill files inside the ECL itself. The "how we close a deal" workflow lives in domains/skills/closing-a-deal/SKILL.md. It carries citations and a last_verified date, and it gets re-flagged when the underlying process changes. Process becomes a first-class, versioned, citable artifact. That is the part I think most enterprise teams will not see coming, and the part that has paid off most for the client I built this for.
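One way to implement the re-flagging, sketched under assumptions. The skill file has flat `key: value` frontmatter with `last_verified`. It cites sources inline with a hypothetical `[src: <system>/<id>]` syntax. A last-updated date per source is available from somewhere (the `source_updated` mapping). Under those assumptions, a skill needs re-verification when any source it cites changed after it was last verified. None of these names come from the repo.

```python
import re
from datetime import date
from typing import Dict, List

CITED = re.compile(r"\[src:\s*([^\]]+)\]")  # assumed citation syntax


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between leading '---' fences (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def needs_reverify(skill_md: str, source_updated: Dict[str, date]) -> List[str]:
    """Return cited sources that changed after the skill's last_verified date.
    A skill with no last_verified date is treated as never verified."""
    cited = [s.strip() for s in CITED.findall(skill_md)]
    raw = parse_frontmatter(skill_md).get("last_verified")
    if raw is None:
        return cited
    verified = date.fromisoformat(raw)
    return [s for s in cited if s in source_updated and source_updated[s] > verified]


# Hypothetical skill file and source timestamps, for illustration only.
skill = """---
name: closing-a-deal
last_verified: 2026-01-05
---
1. Get pricing approval from the deal desk [src: jira/SALES-7]
2. Confirm the signed order form is in the CRM [src: confluence/DEALS-3]
"""
updated = {"jira/SALES-7": date(2026, 3, 1), "confluence/DEALS-3": date(2025, 11, 20)}
print(needs_reverify(skill, updated))  # ['jira/SALES-7']
```

Tying the flag to changes in cited sources, rather than to a fixed age, matches "re-flagged when the underlying process changes" more closely. It only works if the source systems expose a usable last-modified date.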

Why the result holds up where RAG does not

A few properties fall out of the design without my having to build them in.

Conflicts are visible. When two sources disagree, the topic file documents both, cites both, and either resolves the conflict with a note or routes the question. Nothing is silently picked. This is what auditors want. It is also what senior engineers want, because it preserves the disagreement instead of pretending it did not exist.
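Here is a minimal sketch of "documents both, cites both" when rendering a topic section. It assumes claims arrive as hypothetical `(question, answer, source)` tuples and compares answers by exact string. That is a big simplification, because real agents have to judge whether two phrasings actually disagree. What matters is the shape: agreeing sources are merged, and disagreeing ones are all kept under an explicit marker instead of one being picked.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Claim = Tuple[str, str, str]  # (question, answer, source)


def render_topic(claims: List[Claim]) -> str:
    """Group claims by question; keep every distinct answer with all its citations."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for question, answer, source in claims:
        grouped[question][answer].append(source)
    out: List[str] = []
    for question, answers in grouped.items():
        out.append(f"## {question}")
        if len(answers) > 1:
            # Never silently pick a winner: flag for a resolution note or routing.
            out.append("CONFLICT: sources disagree; add a resolution note or route.")
        for answer, sources in answers.items():
            cites = " ".join(f"[src: {s}]" for s in sources)
            out.append(f"- {answer} {cites}")
    return "\n".join(out)


# Hypothetical claims extracted from three source systems.
claims = [
    ("Retention after churn", "Delete at contract end", "confluence/LEGAL-12"),
    ("Retention after churn", "Delete after backup rotation", "github/runbooks/deletion.md"),
    ("Retention after churn", "Delete at contract end", "slack/C0123"),
]
print(render_topic(claims))
```

The two sources that agree end up on one line with both citations. The dissenting runbook stays visible underneath the CONFLICT marker.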

Sensitive questions get routed, not answered. Each topic file can carry a routing note. "Do not answer data deletion timeline questions directly in customer-facing contexts. Route to the security team." That instruction sits next to the cited answer, so any agent or human reading the file sees both. RAG has no equivalent. RAG always answers.
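The routing note can also be made machine-checkable. Here is a minimal sketch, assuming a hypothetical `Topic` record parsed from the topic file. The record holds the cited answer, an owner (`route_to`), and a set of contexts where the answer must not be given directly. The context labels are made up. Because the answer and the routing rule travel together, any reader of the topic sees both.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Topic:
    answer: str                                     # the cited, synthesised answer
    route_to: Optional[str] = None                  # hypothetical routing owner
    blocked_contexts: FrozenSet[str] = frozenset()  # where it must not be answered


def respond(topic: Topic, context: str) -> str:
    """Return the cited answer unless the routing note blocks this context."""
    if topic.route_to is not None and context in topic.blocked_contexts:
        return f"Not answered here. Route to {topic.route_to}."
    return topic.answer


deletion = Topic(
    answer="Retention follows the legal policy [src: confluence/LEGAL-12]",
    route_to="the security team",
    blocked_contexts=frozenset({"customer-facing"}),
)
print(respond(deletion, "customer-facing"))  # Not answered here. Route to the security team.
print(respond(deletion, "internal"))
```

Here the internal context still gets the cited answer. Only the blocked context is redirected.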

The system gets smarter from use. There is a file called meta/how-to-get-accurate-information.md. It starts empty. Agents add to it as they discover stale sources, unreliable APIs, and questions that should always be escalated. After a few hundred runs it becomes a dense, experience-grounded guide to the company's specific information landscape. You do not pre-fill it. Invented wisdom is worse than none.
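Mechanically, "agents add to it" can be as simple as the append-only helper below. It starts from a missing or empty file and skips lessons already recorded. The entry format and `record_learning` are hypothetical, and deduplication by exact substring is crude. The lesson text in the usage is invented for illustration. Note that nothing is pre-filled.

```python
import tempfile
from datetime import date
from pathlib import Path


def record_learning(meta_file: Path, lesson: str, source: str, on: date) -> bool:
    """Append one observed lesson to the meta file, skipping exact repeats.
    Returns True if a new entry was written."""
    existing = meta_file.read_text(encoding="utf-8") if meta_file.exists() else ""
    if lesson in existing:
        return False
    meta_file.parent.mkdir(parents=True, exist_ok=True)
    with meta_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {on.isoformat()}: {lesson} (observed via {source})\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "meta" / "how-to-get-accurate-information.md"  # starts absent
    lesson = "Wiki retention page lags the runbook; check the runbook first"  # invented
    first = record_learning(meta, lesson, "jira/SEC-9", date(2026, 4, 1))
    repeat = record_learning(meta, lesson, "slack/C0123", date(2026, 4, 2))
    entries = meta.read_text(encoding="utf-8").splitlines()
print(first, repeat, len(entries))  # True False 1
```

In the real repo each append would also be a git commit, so the growth of the file is itself part of the audit trail.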

The audit trail is git. Every synthesis is a commit. Every conflict resolution is a commit. Every error is committed before the agent process can crash. I have watched this pay off in two audits already. Showing an auditor the git log for a topic file is a different conversation from showing them a vector database.

What the client deployment looked like

Here's what I can say:

We replaced an early-stage RAG prototype with an ECL repo seeded over a 90-minute interview.
Worker agents populated the first three domains in two days.
Architecture decisions that previously needed three or four meetings to reach consensus on were resolved by reading the relevant topic file, because the conflict between two source systems was already documented and routed.
The architecture phase finished in roughly half the time I had budgeted for it.

The headline result was that the team stopped re-litigating the same questions. The ECL made the company's existing disagreements legible. That alone changed how the client's engineers worked.

Is it production-ready

I would call it solid beta, edging into production-ready. Andy ran the original at scale at Abnormal. The C-Compiler coordination pattern has been tested at Anthropic. The Superpowers skill model has thousands of users. My contribution is the integration and the implementation template. That part has been running on a real client platform without me having to babysit it.

The repo is at github.com/TMFNK/Enterprise-Context-Layer. It is licensed under GPL-3.0. There are three README files: one for humans, one for agents, and one for engineers who want the design rationale. If you give the agent README to Claude Code or Codex and tell it to follow the Quick-Start Checklist, it will interview you and scaffold the repo for your company.

What I'd do differently next time

Three things.

One: write the routing rules before the agents run. I let the first client deployment populate domain files before we had a finalised list of sensitive topics. The agents wrote good content, but some of it should have been a routing note, not an answer. We had to go back and edit those files. Define what must not be answered before you let agents answer anything.
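That ordering can be enforced in code rather than left to discipline. Here is a minimal sketch with a hypothetical `plan_write` gate. Workers refuse to run until a finalised routing map exists, and for any topic on that map they may only write a routing note. The topic slugs and team name are placeholders.

```python
from typing import Dict, Optional


def plan_write(topic: str, routing_rules: Optional[Dict[str, str]]) -> str:
    """Decide what a worker may write for `topic`.
    routing_rules maps sensitive topic slug -> owning team; None means not finalised."""
    if routing_rules is None:
        raise RuntimeError("Routing rules not finalised; refusing to populate domain files.")
    if topic in routing_rules:
        return f"routing-note: route to {routing_rules[topic]}"
    return "answer"


rules = {"data-deletion-timeline": "security team"}
print(plan_write("data-deletion-timeline", rules))  # routing-note: route to security team
print(plan_write("deal-desk-hours", rules))  # answer
```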

Two: start how-to-get-accurate-information.md empty. The first time, I was tempted to seed it with my guesses about which sources were reliable. I was wrong about most of them. The file is more useful when it grows from real agent experience than from my upfront assumptions.

Three: budget for the interview. The 90-minute interview that bootstraps the ECL is essential. The quality of every later answer depends on the domain mapping and source authority hierarchy that come out of that conversation. Treat it like the discovery phase of an audit, not like onboarding paperwork.