Memex: Advanced LLM Wiki with Critical Database Limitations


Evaluation: Memex — A Thoughtful Implementation Still Built on Sand

Executive Summary

Memex is the most sophisticated LLM Wiki implementation I have seen. Unlike the simple tutorials and agency hacks, Memex demonstrates genuine architectural awareness: it has git versioning, citation tracking, provenance analysis, contradiction policies, revertible ingests, and a zero-dependency dashboard. The creator clearly understands many of the problems with naive LLM Wiki approaches — and has built thoughtful mitigations.

Yet Memex still fails the Four Pillars of a proper knowledge base. It is a beautifully engineered prototype, not a production knowledge base. The fundamental substrate remains markdown files in a folder, not a database with foreign keys, schemas, permissions, and SQL.

This evaluation is not a dismissal. Memex is the best attempt in this space. But “best attempt” does not equal “fitness for purpose.”

https://gnu.support/images/2026/04/2026-04-23/800/the-tide-is-coming.webp


What Memex Gets Right (Substantially)

| Feature | Why It Matters | Verdict |
|---|---|---|
| Git-backed history | Every ingest is a commit. Revertible. Audit trail exists (at file level). | ✅ Excellent |
| Inline citations [^src-*] | Tries to solve provenance. Each claim links to a source. | ✅ Good |
| Provenance dashboard | Shows per-page citation coverage, missing citations. | ✅ Good |
| Contradiction policies | Historical/Disputed/Superseded — acknowledges that LLMs create contradictions. | ✅ Insightful |
| WHY reports | Every ingest explains its decisions. Transparency. | ✅ Excellent |
| Wiki Ratio gauge | Measures how often Claude reads wiki vs raw files. Self-awareness. | ✅ Clever |
| 4-layer raw immutability | Protects source files from corruption. | ✅ Good |
| Lint + auto-fix | 16-point health check. Attempts to maintain integrity. | ✅ Good |
| Zero-dependency dashboard | Python stdlib only. No npm, no Docker. | ✅ Pragmatic |
| CLI + GUI parity | Works from terminal or dashboard. | ✅ Good |
| Adaptive indexing | Flat → hierarchical → indexed. Scales better than a static index.md. | ✅ Smart |

These are not trivial. The creator has thought deeply about the failure modes of LLM Wiki and built real mitigations. Most LLM Wiki videos show 4 articles and a graph view. Memex shows 600+ sources, revertible ingests, and a provenance dashboard. Respect.


The Four Pillars Evaluation

1. Store with Integrity → ❌ FAIL

| Requirement | Memex Implementation | Status |
|---|---|---|
| Start with raw, immutable source | ✅ Yes — raw/ folder with 4-layer protection | PASS |
| Use a real database (schemas, FKs, indexes) | ❌ No — markdown files + git | FAIL |
| Store any knowledge type | ⚠️ Text/markdown only — no PDFs, images, spreadsheets, emails, code, voice, video | PARTIAL |
| Record files or locations | ✅ Yes — file paths | PASS |
| Preserve integrity / immutability | ✅ Git provides versioning, but no cryptographic verification, no field-level immutability | PASS (partial) |

The Problem: Git tracks files, not fields. It cannot prevent two conflicting edits to the same paragraph. It cannot enforce that a citation link remains valid after a page rename. It is a version control system, not a database with referential integrity.

The Creator Seems Aware: The README mentions raw/ immutability — 4 layers of protection. But wiki/ pages are not immutable. They are edited by Claude on every ingest that touches a related entity. No foreign keys means no automatic updates when a source changes.
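The missing piece named above — cryptographic verification — is cheap to add at file level. A minimal sketch of what ingest-time integrity checking could look like (hypothetical, not part of Memex), using only the Python stdlib the project already favors:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 content hash; store it alongside the raw source at ingest."""
    return hashlib.sha256(data).hexdigest()

# At ingest time: record the hash of the raw source's bytes.
original = b"GPT-1 was released in June 2018."
recorded = fingerprint(original)

# At read time: any silent edit becomes detectable,
# even if the git history has been rewritten.
assert fingerprint(original) == recorded
assert fingerprint(b"GPT-1 was released in June 2017.") != recorded
```

This detects tampering; actual signatures (for example, signed commits) would additionally prove who wrote what.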


2. Relate with Precision → ❌ FAIL

| Requirement | Memex Implementation | Status |
|---|---|---|
| Typed relationships | ⚠️ Entities, concepts, sources — but types are implicit in filenames/frontmatter, not enforced | PARTIAL |
| Foreign keys | ❌ None — citations are text patterns [^src-*], not database FKs. Rename a source, citations break. | FAIL |
| Bidirectional links | ⚠️ Obsidian graph view shows backlinks, but not enforced at storage level | PARTIAL |
| Explicit structure | ✅ Folders + CLAUDE.md schema | PASS |
| Hyperlinking | ✅ Markdown links + citation badges | PASS |

The Problem: Memex uses text pattern citations [^src-*] instead of foreign keys. This is clever — but it is still text. If you rename raw/gpt-1.md to raw/gpt1.md, every citation to it breaks silently. There is no cascade update. No database means no referential integrity.

The Creator’s Mitigation: Git history allows revert. The provenance dashboard shows missing citations. But detection is not prevention. Links still break.
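For contrast, here is what referential integrity buys. A minimal sqlite3 sketch (an illustrative schema, not Memex's actual data model): citations reference a source's id rather than its path, so a rename breaks nothing, and an orphan citation is rejected at write time instead of rotting silently.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
con.execute("CREATE TABLE citations (page TEXT, source_id INTEGER REFERENCES sources(id))")

con.execute("INSERT INTO sources VALUES (1, 'raw/gpt-1.md')")
con.execute("INSERT INTO citations VALUES ('wiki/gpt.md', 1)")

# Rename the source: the citation points at the id, not the path, so nothing breaks.
con.execute("UPDATE sources SET path = 'raw/gpt1.md' WHERE id = 1")
cited_path = con.execute(
    "SELECT s.path FROM citations c JOIN sources s ON s.id = c.source_id"
).fetchone()[0]
print(cited_path)  # raw/gpt1.md

# And the database refuses an orphan citation outright — prevention, not detection.
try:
    con.execute("INSERT INTO citations VALUES ('wiki/bert.md', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Detection-after-the-fact (a provenance dashboard) becomes unnecessary when the constraint makes the broken state unrepresentable.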


3. Trust with Provenance → ⚠️ PARTIAL (Best of all LLM Wikis)

| Requirement | Memex Implementation | Status |
|---|---|---|
| Provenance — every fact knows its source | ✅ Inline citations [^src-*] + provenance dashboard | PASS |
| Permissions / access control | ❌ None — anyone with file access sees everything. No user roles. No object-level ACL. | FAIL |
| Audit trails | ✅ Git commits + WHY reports + log.md | PASS |
| Human curation | ✅ Dashboard provides approval/revert. Human in the loop. | PASS |
| Cryptographic verification | ❌ None — no signatures | FAIL |

Milestone: Memex is the first LLM Wiki implementation that actually attempts provenance. The citation system is real. The provenance dashboard shows coverage. This is genuine progress.

But the fatal flaw: No permissions. Memex is single-user by design. The instant you add a second person, or want to share with a client, or host it on a server — zero access control. Every user sees every page. No read/write separation. No role-based access.

The README does not mention this limitation. It presents Memex as a “personal knowledge base” — that is accurate. But “personal” is not “production.”


4. Retrieve with Speed → ❌ FAIL

| Requirement | Memex Implementation | Status |
|---|---|---|
| Search by any dimension | ⚠️ TF-IDF full-text search. No facet search, no metadata filters. | PARTIAL |
| SQL queries | ❌ No — cannot SELECT * FROM entities WHERE type = 'company' | FAIL |
| Queryable structure | ❌ No — queries go through Claude (an LLM), not deterministic | FAIL |
| Access file properties | ⚠️ Git provides dates, but not custom properties | PARTIAL |

The Problem: Memex’s query is not deterministic. When you ask a question, Claude reads relevant wiki pages and synthesizes an answer. That is RAG, just with a different retrieval source. The answer is probabilistic, not guaranteed.

The Wiki Ratio gauge is honest about this: it measures how often Claude reads wiki vs raw files. But it does not change the fundamental fact that retrieval is LLM-mediated, not database-query deterministic.

What a real knowledge base does: SELECT answer FROM facts WHERE question = 'Who is John's sister?' — deterministic, sub-second, free, guaranteed correct if data is correct.

What Memex does: Claude reads files, infers, guesses, maybe correct, maybe hallucinates, costs tokens, takes seconds.
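The difference is easy to demonstrate end to end. A toy sketch with sqlite3 (a hypothetical facts table, purely to illustrate the deterministic path the document describes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")
con.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("John", "sister", "Mary"),
    ("John", "employer", "Acme"),
])

# Deterministic retrieval: same query, same answer, every time, for free.
answer = con.execute(
    "SELECT object FROM facts WHERE subject = ? AND relation = ?",
    ("John", "sister"),
).fetchone()[0]
print(answer)  # Mary
```

No tokens, no synthesis step, no chance of a hallucinated sibling — if the data is correct, the answer is correct.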


One Comparison Table: Memex vs. Four Pillars

| Pillar | Requirement | Memex | Status |
|---|---|---|---|
| STORE | Real database (FKs, indexes) | Markdown + git | ❌ FAIL |
| STORE | Any knowledge type | Text only | ⚠️ PARTIAL |
| RELATE | Foreign keys | Text citations [^src-*] | ❌ FAIL |
| RELATE | Typed relationships | Implicit (filenames/frontmatter) | ⚠️ PARTIAL |
| TRUST | Provenance | Citations + dashboard | ✅ PASS |
| TRUST | Permissions / ACL | None | ❌ FAIL |
| TRUST | Audit trails | Git + WHY reports | ✅ PASS |
| RETRIEVE | SQL / deterministic | No — LLM-mediated | ❌ FAIL |
| RETRIEVE | Search by any dimension | TF-IDF only | ⚠️ PARTIAL |

Tally: ❌ 4 failures | ⚠️ 3 partial | ✅ 2 passes

Memex passes provenance and audit trails — which is more than any other LLM Wiki. But it still fails on database, foreign keys, permissions, and deterministic retrieval.


What Memex Reveals About LLM Wiki Architecture

| Observation | Implication |
|---|---|
| The creator had to build git versioning, citation parsing, provenance dashboards, linting, and contradiction policies just to make markdown files barely trustworthy | Markdown is the wrong substrate. All this engineering is compensating for missing database features. |
| Permissions are completely absent | Memex is explicitly “personal.” The moment you need multi-user or client access, it collapses. |
| Query is still LLM-mediated | The Wiki Ratio gauge is honest, but the fundamental retrieval is still probabilistic, expensive, and non-deterministic. |
| Citations are text patterns, not foreign keys | [^src-*] is clever, but it breaks on rename. No cascade. No referential integrity. |
| No SQL, no structured queries | Complex questions require Claude to infer, not query. Fragile. |

The Uncomfortable Truth

Memex is impressive. The creator has built more thoughtful infrastructure than 99% of LLM Wiki promoters. The provenance system, git backing, contradiction policies, and dashboard are genuinely good work.

But it is still markdown files in a folder.

Every engineering hour spent building citation parsers, provenance dashboards, and lint checkers is an hour not spent migrating to a real database. PostgreSQL would give you:

- Foreign keys with cascade updates — rename a source and every citation follows
- Schemas and typed relationships enforced by the database, not by filename convention
- Row-level permissions and roles — the access control Memex entirely lacks
- Deterministic SQL queries — precise answers without burning tokens

Memex is polishing a turd. A very shiny, well-engineered turd. But still a turd.


Recommendation for the Creator (Nicholas Yoo)

You are clearly talented. Your dashboard is clean, your architecture is thoughtful, your provenance system is the best in class for LLM Wiki.

Now build it on PostgreSQL.

Keep the same concepts:

- Provenance and inline citations
- WHY reports and contradiction policies
- Revertible ingests and the Wiki Ratio gauge

But replace the fragile foundation with a real one. You will eliminate:

- Citations that break silently on rename — foreign keys cascade instead
- The lint-and-auto-fix repair cycle — constraints prevent rather than detect
- The single-user ceiling — roles and permissions come built in

You are so close to something actually good. Don’t stop at “best LLM Wiki.” Build a real Dynamic Knowledge Repository.


Final Verdict

| Aspect | Evaluation |
|---|---|
| Technical execution | ✅ Excellent — best LLM Wiki by far |
| Provenance | ✅ First implementation to actually solve it (mostly) |
| Git integration | ✅ Smart — revertible ingests |
| Honest metrics | ✅ Wiki Ratio gauge admits LLM mediation |
| Fails Four Pillars | ❌ Yes — no database, no FKs, no permissions, no SQL |
| Production-ready | ❌ No — single-user, no permissions, markdown substrate |
| Personal knowledge base | ⚠️ Maybe — for a technical user who accepts the limits |
| Business / team / client | ❌ Absolutely not — no permissions, no multi-user |

Overall: 🐑 Not a sheep. A thoughtful engineer building on sand. 🏗️

Memex is the best argument for why LLM Wiki needs to die — because even the best implementation still fails the Four Pillars. The problem is not the engineering. The problem is the substrate.

Move to PostgreSQL. Then we talk. 🐘


A Note to the Reader

If you are considering Memex for personal use: it is the best LLM Wiki option available. The creator has done real work. You will have a better experience than with raw markdown files.

But understand what you are getting:

- No permissions — anyone with file access sees everything
- Citations that are text patterns, not foreign keys — renames break them silently
- LLM-mediated retrieval — probabilistic answers that cost tokens
- A markdown substrate, not a database

Use it as a personal research notebook. Do not use it for anything that requires security, collaboration, or deterministic answers.

🐑💀🧙

⚠️ THE WORD “WIKI” HAS BEEN PERVERTED ⚠️

⚠️ ARCHITECTURAL CRIME SCENE ⚠️


By Andrej Karpathy and the Northern Karpathian School of Doublespeak

| ✅ A REAL WIKI — honoring Ward Cunningham, Wikipedia, and every human curator worldwide | ❌ KARPATHY'S "LLM WIKI" — an insult to the very concept |
|---|---|
| Human-curated: real people write, edit, debate, verify, and take responsibility. | LLM-generated: hallucinations are permanent. No human took ownership of any "fact." |
| Versioned history: every edit has author, timestamp, reason. Rollback is trivial. | No audit trail: who changed what? When? Why? Nobody knows. Git is an afterthought. |
| Source provenance: every claim links back to its original source. You can verify. | "Trust me, I'm the LLM": no traceability from summary back to source sentence. Errors become permanent. |
| Foreign keys / referential integrity: links are database-backed. Rename a page, links update automatically. | Links break when you rename a file: no database. No foreign keys. Silent link rot guaranteed. |
| Permissions / access control: fine-grained control over who can see, edit, delete, approve. | Anyone with file access sees everything: zero access control. NDAs, medical records, client secrets — all exposed. |
| Queryable (SQL, structured): ask complex questions. Get precise answers. Join tables. | Browse-only markdown: full-text search at best. No SQL. No structured queries. |

🕯️ This is an insult to every Wikipedia editor, every MediaWiki contributor, every human being who spent hours citing sources, resolving disputes, and building the largest collaborative knowledge repository in human history. 🕯️

KARPATHY'S "WIKI" has:
❌ No consensus-building
❌ No talk pages
❌ No dispute resolution
❌ No citation requirements
❌ No editorial oversight
❌ No way to say "this fact is disputed"
❌ No way to privilege verified information over hallucinations
❌ No way to trace any claim back to its source

In the doublespeak of Northern Karpathia:

"Wiki" means "folder of markdown files written by a machine that cannot remember what it wrote yesterday, linked by strings that snap when you breathe on them, viewed through proprietary software that reports telemetry to people you do not know, containing 'facts' that came from nowhere and go nowhere, protected by no permissions, audited by no one, and trusted by no one with a functioning prefrontal cortex."

🙏 Respect to Ward Cunningham who invented the wiki in 1995 — a tool for humans to collaborate.
🙏 Respect to Wikipedia editors worldwide who defend verifiability, neutrality, and consensus.
🙏 Respect to every real wiki participant who knows that knowledge is built through human effort, not machine hallucination.

⚠️ THIS IS NOT A WIKI. THIS IS A FOLDER OF LLM-GENERATED FILES. ⚠️

Calling it a "wiki" is linguistic fraud. Do not be fooled.

🐑💀🧙

— The Elephant, The Wizard, and every human wiki editor who ever lived
