Most RAG architectures treat the vector database as the knowledge base. I did the opposite: plain markdown files are the source of truth, and the vector index is a disposable cache that can be rebuilt at any time. Here's why, and how it plays out in practice.
The typical RAG setup looks like this: content goes into a chunking pipeline, gets embedded, and lands in a vector store. The vector store is the knowledge base. If you want to inspect what the system "knows," you query the vector DB. If you want to edit knowledge, you re-embed and upsert.
This works until it doesn't. Switching embedding models means re-embedding everything. Debugging bad retrieval means reverse-engineering what chunks exist and how they were split. Migrating between vector databases means exporting opaque binary blobs. The vector store becomes a roach motel: data goes in, but getting it out in a useful form is painful.
In homeclaw (an open-source AI household assistant), I inverted this. The write path is a plain file append. The vector index is derived.
When the agent saves a memory, it appends a timestamped line to a markdown file. That's it. No embeddings, no async pipeline, no failure modes beyond disk I/O.
The files live at predictable paths:
```
workspaces/
  household/
    memory/
      routines.md        # shared household knowledge
      food.md
    channels/
      group-family/
        2026-03-25.md    # group chat logs, auto-indexed
  alice/
    memory/
      health.md          # alice's private knowledge
      work.md
    notes/
      2026-03-25.md      # daily notes
  bob/
    memory/
      hobbies.md
```
Each memory file is append-only with timestamps:
```markdown
# Food
- [2026-03-10 08:15] Alice is vegetarian, Bob eats everything
- [2026-03-12 19:30] The family likes Thai food on Fridays
- [2026-03-25 12:00] Bob is trying a low-carb diet this month
```
memsearch (a lightweight wrapper around Milvus Lite) watches the filesystem and indexes changes automatically. The agent writes files; memsearch notices and embeds them in the background. The two systems never talk to each other directly.
```
  Agent writes           Filesystem            memsearch watches

  memory_save ──> append to .md ──────> watch() detects change
                        │                        │
                        │                  embed & index
                        │                        │
                        ▼                        ▼
                 Source of truth           Derived index
                 (inspectable,             (disposable,
                  git-trackable)            rebuildable)
```
On startup, SemanticMemory collects every workspace subdirectory and hands them to memsearch for indexing. Then it starts a file watcher so new content becomes searchable without a restart.
```python
# homeclaw/memory/semantic.py
class SemanticMemory:
    async def initialize(self) -> None:
        from memsearch import MemSearch

        paths = self._collect_paths()  # every workspace subdir
        self._mem = MemSearch(
            paths=paths,
            milvus_uri=f"{self._workspaces_path}/semantic_index.db",
            embedding_provider=self._embedding_provider,
        )
        n = await self._mem.index()        # build initial index
        self._watcher = self._mem.watch()  # watch for file changes
```
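`_collect_paths` is referenced but not shown. A plausible version, assuming every top-level workspace directory should be indexed:

```python
from pathlib import Path


def collect_paths(workspaces: Path) -> list[str]:
    """Plausible sketch of _collect_paths (the real helper isn't shown):
    every top-level workspace directory (household/, alice/, bob/, ...)
    is handed to the indexer, so anything written beneath them becomes
    searchable."""
    return sorted(
        str(p) for p in workspaces.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
```

Note that collecting only directories conveniently skips the `semantic_index.db` file sitting at the workspace root, so the index never tries to index itself.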
Because the files are the source of truth, the vector index is freely rebuildable. This matters when you change embedding providers. If you switch from a local model to OpenAI embeddings (or vice versa), the vector dimensions won't match. Rather than maintaining a migration system, I just drop and rebuild:
```python
# homeclaw/memory/semantic.py
try:
    self._mem = MemSearch(**kwargs)
except ValueError as ve:
    if "dimension mismatch" in str(ve).lower():
        # Embedding provider changed — nuke the index and rebuild
        client = MilvusClient(uri=milvus_uri)
        client.drop_collection("memsearch_chunks")
        client.close()
        self._mem = MemSearch(**kwargs)  # re-indexes from files
```
No data is lost. No migration needed. The files haven't changed; only the derived index was stale.
In a multi-user household, Alice shouldn't see Bob's private memories during semantic search. The conventional approach would be metadata filters or separate collections per user. I use filesystem paths instead.
The recall method fetches 3x the requested results from the vector index, then filters by checking whether each result's source file path is visible to the requesting person:
```python
# homeclaw/memory/semantic.py
async def recall(
    self, query: str, top_k: int = 3,
    person: str | None = None,
) -> list[dict]:
    # Over-fetch to compensate for post-hoc filtering
    results = await self._mem.search(query, top_k=top_k * 3)
    household_prefix = f"{self._workspaces_path}/household"
    person_prefix = f"{self._workspaces_path}/{person}" if person else None
    filtered = []
    for r in results:
        source = r.get("source", "")
        if source.startswith(household_prefix):
            filtered.append(r)  # household = always visible
        elif person_prefix and source.startswith(person_prefix):
            filtered.append(r)  # own workspace = visible
        # other members' paths are silently excluded
    return filtered[:top_k]
```
The filesystem path is the access control model. workspaces/household/ is shared. workspaces/alice/ is private to Alice. There's no separate ACL to keep in sync, no metadata tags to attach at indexing time, no risk of a tagging bug leaking private data into shared search results. The path is intrinsic to the data.
The 3x over-fetch means that even if two-thirds of results belong to other users, we still fill the requested top_k. In practice, most queries hit a mix of household and personal content, so the over-fetch is rarely needed in full.
When someone talks to homeclaw in a family group chat, the exchange is logged as a daily markdown file:
```python
# homeclaw/agent/loop.py
def _append_chat_log(
    workspaces: Path, channel: str,
    user_text: str, assistant_text: str,
) -> None:
    channel_dir = workspaces / "household" / "channels" / channel
    channel_dir.mkdir(parents=True, exist_ok=True)
    today = datetime.now(UTC).strftime("%Y-%m-%d")
    log_path = channel_dir / f"{today}.md"
    timestamp = datetime.now(UTC).strftime("%H:%M")
    entry = f"- [{timestamp}] {user_text}\n- [{timestamp}] homeclaw: {assistant_text}\n"
    with open(log_path, "a") as f:
        f.write(entry)
```
These files land under workspaces/household/channels/, which means memsearch indexes them automatically (it watches the whole workspace tree), and they pass the household_prefix check in recall, so they're visible to every household member.
The practical effect: Alice can DM homeclaw and say "what did we decide about dinner in the family chat?" and get a relevant answer, because the group chat is searchable household knowledge. No explicit "save to memory" step needed. The conversation is the knowledge, filed in a place the search layer can find it.
Semantic search alone isn't reliable enough for critical information. If Alice has a medication reminder due today, that can't depend on whether the embedding model ranks it highly enough against her query. So I built a two-layer context system with explicit priority ordering:
| Priority | Content | Source |
|---|---|---|
| 1 (never dropped) | Current time, person, household profile, today's reminders | Deterministic file reads |
| 2 | Recent notes, memory topics, skill catalog, routines, decisions | Deterministic file reads |
| 3 (dropped first) | Semantically recalled chunks | Vector search on the user's message |
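The "dropped first" rule can be made concrete. A hypothetical budget helper, not from homeclaw, using character counts as a stand-in for tokens:

```python
def fit_to_budget(parts: list[tuple[int, str]], max_chars: int) -> str:
    """Drop the lowest-priority context tiers first when over budget.

    parts is a list of (priority, text). Hypothetical sketch: priority 3
    (semantic recall) is dropped first, then priority 2; priority 1 is
    emitted even if it alone overflows the budget.
    """
    for max_priority in (3, 2, 1):  # full context, then drop tier 3, then tier 2
        kept = [text for prio, text in parts if prio <= max_priority]
        context = "\n".join(kept)
        if len(context) <= max_chars or max_priority == 1:
            return context
    return ""  # unreachable; keeps type checkers happy
```

Under pressure, the deterministic facts survive and the semantic extras go first, which matches the table's ordering.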
The context builder always injects structured data first, then appends semantic recall results at the end:
```python
# homeclaw/agent/context.py
async def build_context(
    message: str, person: str, workspaces: Path,
    semantic_memory: SemanticMemory | None = None,
) -> str:
    parts: list[str] = []

    # Priority 1: always present
    parts.append(f"You are talking to: {person}")
    parts.append(f"Current time: {now}")
    # ... household profile, today's reminders ...

    # Priority 2: structured personal data
    # ... recent notes, memory topics, routines, decisions ...

    # Priority 3: semantic recall (additive, dropped first)
    if semantic_memory and semantic_memory.enabled:
        recalled = await semantic_memory.recall(
            message, top_k=cfg.max_semantic_chunks, person=person,
        )
        if recalled:
            parts.append("Relevant context from memory:")
            for item in recalled:
                parts.append(f"  {item['text']}")
```
The key insight: semantic search is a supplement, not the primary retrieval mechanism. It adds relevant background that structured lookups might miss. But the system works without it. The LLM still knows who it's talking to, what's on the schedule, and what reminders are due. Semantic recall adds color; it doesn't carry load.
The entire semantic layer is behind a conditional import:
```python
try:
    from memsearch import MemSearch
    # ... initialize, index, watch ...
    self._enabled = True
except ImportError:
    self._enabled = False  # everything still works
```
Every call to `recall()` starts with `if not self._enabled: return []`. The context builder checks `semantic_memory.enabled` before attempting recall. There's no crash path, no degraded mode flag, no "semantic memory unavailable" warning polluting the UX. The feature simply isn't there, and the system is complete without it.
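The same degradation pattern, condensed into a self-contained sketch that feature-detects the dependency with `importlib.util.find_spec` rather than the try/except import (the module parameter exists only to make the sketch testable):

```python
import importlib.util


class SemanticMemory:
    """Degrades to a silent no-op when the optional dependency is absent."""

    def __init__(self, module: str = "memsearch") -> None:
        # Feature-detect without importing: None means the module isn't installed.
        self._enabled = importlib.util.find_spec(module) is not None

    @property
    def enabled(self) -> bool:
        return self._enabled

    async def recall(self, query: str, top_k: int = 3) -> list[dict]:
        if not self._enabled:
            return []  # feature absent: empty result, no error path
        raise NotImplementedError("real vector search elided in this sketch")
```

Callers never branch on an error; an empty list from `recall()` looks identical to "nothing relevant found."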
This matters for deployment flexibility. homeclaw runs on home servers (Unraid, Raspberry Pi, NAS boxes). Not everyone wants to run an embedding model. The semantic layer adds genuine value when present, but the system can't require it without shrinking the set of machines it runs on.
This isn't free. Some tradeoffs are worth naming: indexing happens in the background, so newly written content is searchable after a short delay rather than instantly; the 3x over-fetch is a heuristic, so a query whose candidates are dominated by other members' content can still return fewer than top_k results; and append-only files grow indefinitely without some form of compaction.
For the household use case, these are acceptable. The simplicity of "write a line to a file, everything else follows" is worth more than the flexibility of a custom pipeline.
If you're building a RAG system and your data is human-authored or human-readable, consider this architecture: plain files as the source of truth, with the vector index as a disposable, rebuildable cache.
The boring version of your architecture is usually the right one.