
Why Your RAG's Source of Truth Shouldn't Be a Vector Database

Most RAG architectures treat the vector database as the knowledge base. I did the opposite: plain markdown files are the source of truth, and the vector index is a disposable cache that can be rebuilt at any time. Here's why, and how it plays out in practice.

The standard RAG write path is a trap

The typical RAG setup looks like this: content goes into a chunking pipeline, gets embedded, and lands in a vector store. The vector store is the knowledge base. If you want to inspect what the system "knows," you query the vector DB. If you want to edit knowledge, you re-embed and upsert.

This works until it doesn't. Switching embedding models means re-embedding everything. Debugging bad retrieval means reverse-engineering what chunks exist and how they were split. Migrating between vector databases means exporting opaque binary blobs. The vector store becomes a roach motel: data goes in, but getting it out in a useful form is painful.

In homeclaw (an open-source AI household assistant), I inverted this. The write path is a plain file append. The vector index is derived.

Filesystem writes, vector reads

When the agent saves a memory, it appends a timestamped line to a markdown file. That's it. No embeddings, no async pipeline, no failure modes beyond disk I/O.

The files live at predictable paths:

workspaces/
  household/
    memory/
      routines.md        # shared household knowledge
      food.md
    channels/
      group-family/
        2026-03-25.md    # group chat logs, auto-indexed
  alice/
    memory/
      health.md          # alice's private knowledge
      work.md
    notes/
      2026-03-25.md      # daily notes
  bob/
    memory/
      hobbies.md

Each memory file is append-only with timestamps:

# Food

- [2026-03-10 08:15] Alice is vegetarian, Bob eats everything
- [2026-03-12 19:30] The family likes Thai food on Fridays
- [2026-03-25 12:00] Bob is trying a low-carb diet this month

memsearch (a lightweight wrapper around Milvus Lite) watches the filesystem and indexes changes automatically. The agent writes files; memsearch notices and embeds them in the background. The two systems never talk to each other directly.

Agent writes        Filesystem          memsearch watches
  memory_save ──> append to .md ──> watch() detects change
                       │                     │
                       │              embed & index
                       │                     │
                       ▼                     ▼
                  Source of truth      Derived index
                  (inspectable,       (disposable,
                   git-trackable)      rebuildable)

How initialization works

On startup, SemanticMemory collects every workspace subdirectory and hands them to memsearch for indexing. Then it starts a file watcher so new content becomes searchable without a restart.

class SemanticMemory:
    async def initialize(self) -> None:
        from memsearch import MemSearch

        paths = self._collect_paths()  # every workspace subdir

        self._mem = MemSearch(
            paths=paths,
            milvus_uri=f"{self._workspaces_path}/semantic_index.db",
            embedding_provider=self._embedding_provider,
        )

        n = await self._mem.index()         # build initial index
        self._watcher = self._mem.watch()  # watch for file changes
homeclaw/memory/semantic.py

Because the files are the source of truth, the vector index is freely rebuildable. This matters when you change embedding providers. If you switch from a local model to OpenAI embeddings (or vice versa), the vector dimensions won't match. Rather than maintaining a migration system, I just drop and rebuild:

try:
    self._mem = MemSearch(**kwargs)
except ValueError as ve:
    if "dimension mismatch" in str(ve).lower():
        # Embedding provider changed — nuke the index and rebuild
        client = MilvusClient(uri=milvus_uri)
        client.drop_collection("memsearch_chunks")
        client.close()
        self._mem = MemSearch(**kwargs)  # re-indexes from files
    else:
        raise  # not a dimension mismatch: surface it
homeclaw/memory/semantic.py

No data is lost. No migration needed. The files haven't changed; only the derived index was stale.

Privacy through path prefixes

In a multi-user household, Alice shouldn't see Bob's private memories during semantic search. The conventional approach would be metadata filters or separate collections per user. I use filesystem paths instead.

The recall method fetches 3x the requested results from the vector index, then filters by checking whether each result's source file path is visible to the requesting person:

async def recall(
    self, query: str, top_k: int = 3,
    person: str | None = None,
) -> list[dict]:
    # Over-fetch to compensate for post-hoc filtering
    results = await self._mem.search(query, top_k=top_k * 3)

    household_prefix = f"{self._workspaces_path}/household"
    person_prefix = f"{self._workspaces_path}/{person}" if person else None

    filtered = []
    for r in results:
        source = r.get("source", "")
        if source.startswith(household_prefix):
            filtered.append(r)       # household = always visible
        elif person_prefix and source.startswith(person_prefix):
            filtered.append(r)       # own workspace = visible
        # other members' paths silently excluded

    return filtered[:top_k]
homeclaw/memory/semantic.py
Why this works

The filesystem path is the access control model. workspaces/household/ is shared. workspaces/alice/ is private to Alice. There's no separate ACL to keep in sync, no metadata tags to attach at indexing time, no risk of a tagging bug leaking private data into shared search results. The path is intrinsic to the data.

The 3x over-fetch means that even if two-thirds of results belong to other users, we still fill the requested top_k. In practice, most queries hit a mix of household and personal content, so the over-fetch is rarely needed in full.

Group chats become searchable knowledge

When someone talks to homeclaw in a family group chat, the exchange is logged as a daily markdown file:

def _append_chat_log(
    workspaces: Path, channel: str,
    user_text: str, assistant_text: str,
) -> None:
    channel_dir = workspaces / "household" / "channels" / channel
    channel_dir.mkdir(parents=True, exist_ok=True)

    today = datetime.now(UTC).strftime("%Y-%m-%d")
    log_path = channel_dir / f"{today}.md"

    timestamp = datetime.now(UTC).strftime("%H:%M")
    entry = f"- [{timestamp}] {user_text}\n- [{timestamp}] homeclaw: {assistant_text}\n"

    with open(log_path, "a") as f:
        f.write(entry)
homeclaw/agent/loop.py

These files land under workspaces/household/channels/, which means memsearch indexes them automatically (it watches the whole workspace tree), and they pass the household_prefix check in recall, so they're visible to every household member.

The practical effect: Alice can DM homeclaw and say "what did we decide about dinner in the family chat?" and get a relevant answer, because the group chat is searchable household knowledge. No explicit "save to memory" step needed. The conversation is the knowledge, filed in a place the search layer can find it.

Two-layer context: deterministic + semantic

Semantic search alone isn't reliable enough for critical information. If Alice has a medication reminder due today, that can't depend on whether the embedding model ranks it highly enough against her query. So I built a two-layer context system with explicit priority ordering:

  1. Priority 1 (never dropped): current time, person, household profile, today's reminders. Source: deterministic file reads.
  2. Priority 2: recent notes, memory topics, skill catalog, routines, decisions. Source: deterministic file reads.
  3. Priority 3 (dropped first): semantically recalled chunks. Source: vector search on the user's message.
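The "dropped first" behavior amounts to trimming a priority-ordered list against a budget. A hypothetical sketch (homeclaw's actual trimming logic may differ; the function name and character budget are illustrative):

```python
def fit_to_budget(parts: list[tuple[int, str]], max_chars: int) -> str:
    """Drop lowest-priority parts until the context fits the budget.

    parts are (priority, text) pairs, 1 = highest priority. Python's
    sort is stable, so insertion order survives within each level.
    """
    kept = sorted(parts, key=lambda pt: pt[0])
    while kept and sum(len(text) for _, text in kept) > max_chars:
        kept.pop()  # removes the lowest-priority (last) part
    return "\n".join(text for _, text in kept)


context = fit_to_budget(
    [
        (1, "Current time: 2026-03-25 12:00"),
        (2, "Recent notes: dentist on Thursday"),
        (3, "Recalled: the family likes Thai food on Fridays"),
    ],
    max_chars=70,
)
# When the budget is tight, the priority-3 recall is dropped first;
# priority-1 facts survive longest.
```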

The context builder always injects structured data first, then appends semantic recall results at the end:

async def build_context(
    message: str, person: str, workspaces: Path,
    semantic_memory: SemanticMemory | None = None,
) -> str:
    parts: list[str] = []
    now = datetime.now(UTC).strftime("%Y-%m-%d %H:%M")

    # Priority 1: always present
    parts.append(f"You are talking to: {person}")
    parts.append(f"Current time: {now}")
    # ... household profile, today's reminders ...

    # Priority 2: structured personal data
    # ... recent notes, memory topics, routines, decisions ...

    # Priority 3: semantic recall (additive, dropped first)
    if semantic_memory and semantic_memory.enabled:
        recalled = await semantic_memory.recall(
            message, top_k=cfg.max_semantic_chunks, person=person,
        )
        if recalled:
            parts.append("Relevant context from memory:")
            for item in recalled:
                parts.append(f"  {item['text']}")

    return "\n".join(parts)
homeclaw/agent/context.py

The key insight: semantic search is a supplement, not the primary retrieval mechanism. It adds relevant background that structured lookups might miss. But the system works without it. The LLM still knows who it's talking to, what's on the schedule, and what reminders are due. Semantic recall adds color; it doesn't carry load.

Optional by design

The entire semantic layer is behind a conditional import:

try:
    from memsearch import MemSearch
    # ... initialize, index, watch ...
    self._enabled = True
except ImportError:
    self._enabled = False  # everything still works

Every call to recall() starts with if not self._enabled: return []. The context builder checks semantic_memory.enabled before attempting recall. There's no crash path, no degraded mode flag, no "semantic memory unavailable" warning polluting the UX. The feature simply isn't there, and the system is complete without it.
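Stripped of the real class, the gate looks like this. A runnable sketch assuming the shape implied by the excerpts above (the real class carries more state):

```python
import asyncio


class SemanticMemory:
    """Minimal sketch of the optional-dependency gate."""

    def __init__(self) -> None:
        try:
            import memsearch  # noqa: F401  # optional dependency
            self._enabled = True
        except ImportError:
            self._enabled = False  # everything still works

    @property
    def enabled(self) -> bool:
        return self._enabled

    async def recall(self, query: str, top_k: int = 3) -> list[dict]:
        if not self._enabled:
            return []  # graceful no-op: callers never see a crash
        # ... real vector search would run here; stubbed in this sketch ...
        return []


memory = SemanticMemory()
hits = asyncio.run(memory.recall("dinner plans"))
# If memsearch is absent, hits is simply [] and nothing downstream notices.
```

Callers treat "no semantic layer" and "semantic layer found nothing" identically, which is exactly what keeps the context builder free of special cases.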

This matters for deployment flexibility. homeclaw runs on home servers (Unraid, Raspberry Pi, NAS boxes). Not everyone wants to run an embedding model. The semantic layer adds genuine value when present, but the system can't require it without shrinking the set of machines it runs on.


What I gave up

This isn't free. Some tradeoffs are worth naming:

- Freshness lag: a memory is searchable only after the background watcher picks it up and embeds it, so a query fired immediately after a write can miss the new line.
- Post-hoc filtering: the 3x over-fetch is a heuristic; a query dominated by other members' content can still return fewer than top_k results.
- Full rebuilds: changing embedding providers means re-embedding every file. Fine for a household-sized corpus, but the cost grows linearly with content.
- Append-only growth: memory files accumulate lines indefinitely, with no compaction, so stale facts sit next to current ones.

For the household use case, these are acceptable. The simplicity of "write a line to a file, everything else follows" is worth more than the flexibility of a custom pipeline.

The pattern

If you're building a RAG system and your data is human-authored or human-readable, consider this architecture:

  1. Write to files (or any inspectable, versionable store). The write path should be boring — no embeddings, no async pipeline, no failure modes beyond storage I/O.
  2. Watch and index with a separate process. The indexer reads from the files and builds a vector index. It can crash, restart, or be replaced without data loss.
  3. Filter after retrieval using the data's natural structure (file paths, directories, naming conventions) rather than bolt-on metadata.
  4. Layer deterministic retrieval first, semantic search second. If information is critical (reminders, contacts, scheduled events), don't rely on embedding similarity to surface it.
  5. Make the semantic layer optional. If your system doesn't work without vector search, you've made vector search load-bearing. That's a choice, not a necessity.
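Step 2 doesn't need heavy machinery; even a stdlib mtime poll captures the shape. A sketch (memsearch ships its own watcher; this is illustrative only, and the function names are mine):

```python
from pathlib import Path


def snapshot(root: Path) -> dict[Path, float]:
    """Record the mtime of every markdown file under root."""
    return {p: p.stat().st_mtime for p in root.rglob("*.md")}


def changed_files(
    before: dict[Path, float], after: dict[Path, float]
) -> list[Path]:
    """Files that are new, or whose mtime advanced, since the last snapshot.

    A real indexer would re-chunk and re-embed exactly these files; the
    source of truth on disk is never touched.
    """
    return [p for p, mtime in after.items() if before.get(p) != mtime]
```

A poll loop calling `snapshot` plus `changed_files` every few seconds is crude, but it has the crucial property the article argues for: the watcher can crash and restart at any point, and the worst outcome is re-indexing a file.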

The boring version of your architecture is usually the right one.

homeclaw is open source: github.com/Jayphen/homeclaw. memsearch is at github.com/zilliztech/memsearch.