Why Graphify Fails as a Robust LLM Knowledge Base


Another Tutorial, Same Broken Record: A Technical Rebuttal

This video presents Graphify as an implementation of Karpathy’s LLM-Wiki pattern. Let me point out what the video gets wrong — or conveniently leaves out.

https://gnu.support/images/2026/04/2026-04-23/800/graphify.webp


1. “The LLM remembers everything you did years ago”

Claim: “Years later, you still can query everything. The system kept everything that you did and the LLM kept building the information in a very structured way.”

Reality: The LLM has no persistent memory across sessions. It forgets everything between chats. The only “memory” is the markdown files and JSON graphs it wrote last time. If those files contain errors, contradictions, or hallucinations, the LLM will confidently repeat them. The system does not “remember.” It stores. And storage without integrity is just a landfill.
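A minimal sketch of what that "memory" actually is: files on disk, re-read at the start of every session. The function and directory names below are illustrative, not Graphify's API; the point is that whatever was written last time, errors included, is fed back in verbatim.

```python
from pathlib import Path
import tempfile

def build_context(notes_dir: Path) -> str:
    """Concatenate stored markdown notes into one prompt context.

    This is the entirety of the system's "memory": whatever was
    written to disk last session, errors included, is fed back in.
    """
    parts = [f"## {md.name}\n{md.read_text()}"
             for md in sorted(notes_dir.glob("*.md"))]
    return "\n\n".join(parts)

# Demo: a stored note containing an error is re-ingested uncorrected.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp)
    (notes / "2024-meeting.md").write_text("Acme deal closed at $5M (wrong).")
    context = build_context(notes)
    print("(wrong)" in context)  # True: the error is "remembered"
```

Nothing in this loop corrects, disputes, or even flags the stored error. That is storage, not memory.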


2. “Graphify implements everything Karpathy talked about”

Claim: “It implemented everything that Karpathy talked about.”

Reality: Graphify adds a knowledge graph layer with entity extraction, relationship detection, and community clustering. That’s not “everything Karpathy talked about.” That’s a significant extension. Karpathy’s original pattern is just markdown files with wikilinks. Graphify is doing graph database work — which is good — but the video presents it as a simple implementation of the original pattern. It’s not. It’s a different architecture.
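For contrast, here is roughly the entire machinery of Karpathy's original pattern: markdown files linked by [[wikilinks]]. The regex and names are illustrative, but this is the full scope of what "the pattern" requires; everything Graphify adds on top is a different architecture.

```python
import re

# Karpathy's original pattern: markdown files connected by [[wikilinks]].
WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

def extract_links(markdown: str) -> list[str]:
    """Return the pages a note links to."""
    return WIKILINK.findall(markdown)

note = "Met [[Alice]] about the [[Acme Project]]; see [[2024 Roadmap]]."
print(extract_links(note))  # ['Alice', 'Acme Project', '2024 Roadmap']
```

Entity extraction, relationship detection, and community clustering do not appear anywhere in this; they are graph-database work layered on top.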


3. “Very simple, very effective, not heavy”

Claim: “Very simple, very effective, it’s not heavy.”

Reality: Graphify uses entity extraction, relationship detection, community clustering, and graph generation. That’s computationally heavy. The demo works with 4 small text files. At scale, with hundreds of PDFs, images, and videos, these operations are not “not heavy.” They are expensive in both compute and API tokens. The video never shows the cost or the time required for large-scale ingestion.
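A back-of-envelope sketch of why "not heavy" stops being true at scale. Every number below is an assumption (per-token price, tokens per document, number of extraction passes); plug in your own. The shape of the result is what matters: cost grows linearly with corpus size times passes, which a 4-file demo never surfaces.

```python
def ingestion_cost(n_docs: int, tokens_per_doc: int,
                   passes: int, usd_per_1k_tokens: float) -> float:
    """Rough API cost of LLM-based ingestion.

    passes: separate LLM runs per document, e.g. entity extraction,
    relationship detection, community summarization.
    """
    return n_docs * tokens_per_doc * passes * usd_per_1k_tokens / 1000

# 4 small text files, 3 passes each: negligible.
print(round(ingestion_cost(4, 1_000, 3, 0.01), 2))     # 0.12
# 500 PDFs at ~20k tokens each, same 3 passes: no longer "not heavy".
print(round(ingestion_cost(500, 20_000, 3, 0.01), 2))  # 300.0
```

And that is ingestion only, once; re-ingestion after every change to the raw folder multiplies it again.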


4. “The graph HTML gives you connections you probably didn’t know”

Claim: “This is something that you probably didn’t know but just looking at the files. It understands it through the Graphify process.”

Reality: The LLM extracts entities and relationships based on pattern matching and its training data. These “surprising connections” might be genuine insights, or they might be hallucinations. The video presents them as discoveries without any validation step. There is no mechanism to verify that a detected relationship actually exists in the source documents. The user is left trusting the LLM’s output.
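One possible validation step the video never shows: before trusting an extracted relationship, check that both entities at least co-occur in some source document. This is a crude check (it cannot confirm the relationship's meaning), but it filters out relationships the LLM invented out of thin air. All names below are illustrative.

```python
def grounded(relationship: tuple[str, str, str],
             sources: dict[str, str]) -> list[str]:
    """Return the source docs in which both entities appear.

    Empty result: the "surprising connection" has no textual support
    anywhere in the corpus and should not be trusted.
    """
    subject, _rel, obj = relationship
    return [name for name, text in sources.items()
            if subject.lower() in text.lower() and obj.lower() in text.lower()]

sources = {"notes.txt": "Alice leads the Acme migration with Bob."}
print(grounded(("Alice", "works_with", "Bob"), sources))    # ['notes.txt']
print(grounded(("Alice", "reports_to", "Carol"), sources))  # []
```

Even this trivial filter is more verification than the workflow in the video performs.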


5. “Keep on compounding. It will keep on evolving over time.”

Claim: “This is keep on compounding. After you put more files in this raw folder, it’s going to keep on involving over time.”

Reality: Compounding requires integrity. Without foreign keys, without version control, without contradiction resolution, without permission management, the system does not compound — it accumulates. Accumulation without integrity is noise. The video shows a graph of 4 communities from 4 small text files. It does not show what happens after 100 files, when entities multiply, relationships overlap, contradictions emerge, and the graph becomes uninterpretable.
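A sketch of accumulation without integrity. Without canonical entity IDs, each ingestion pass can mint a new node for the same real-world entity; the naive merge below is illustrative, but the failure mode is generic.

```python
def add_entities(graph: dict[str, set[str]], doc: str,
                 extracted: list[str]) -> None:
    """Naive accumulation: every surface form becomes its own node."""
    for name in extracted:
        graph.setdefault(name, set()).add(doc)

graph: dict[str, set[str]] = {}
add_entities(graph, "doc1.md", ["ACME Corp"])
add_entities(graph, "doc2.md", ["Acme Corporation"])
add_entities(graph, "doc3.md", ["acme corp."])

# Three nodes for one company. A query against any single node misses
# two-thirds of the evidence, and nothing flags the split.
print(len(graph))  # 3
```

Multiply this by hundreds of files and the graph does not compound; it fragments.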


6. “You can run queries against the graph JSON”

Claim: “You can run query against it. This is the part you use the query for.”

Reality: The video shows a graph JSON file. It does not show what query language you use. It does not show the query syntax. It does not show how to ask “show me all documents related to entity X” or “find contradictions between source A and source B.” The viewer is left to assume that querying is easy. It is not specified.
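Absent a specified query language, "querying" the graph JSON means walking it yourself. The JSON shape below (nodes/edges with these key names) is an assumption for illustration, not Graphify's actual schema; the point is that even the simplest question requires hand-written traversal code.

```python
import json

# Assumed shape, for illustration only; not Graphify's real schema.
graph_json = """{
  "nodes": [{"id": "Alice"}, {"id": "Acme"}, {"id": "Bob"}],
  "edges": [{"source": "Alice", "target": "Acme", "doc": "notes.md"},
            {"source": "Bob",   "target": "Acme", "doc": "plan.md"}]
}"""

def related_docs(graph: dict, entity: str) -> set[str]:
    """'Show me all documents related to entity X', done by hand."""
    return {e["doc"] for e in graph["edges"]
            if entity in (e["source"], e["target"])}

graph = json.loads(graph_json)
print(sorted(related_docs(graph, "Acme")))  # ['notes.md', 'plan.md']
```

Contradiction-finding between sources, the harder question, has no obvious traversal at all; the video simply never gets there.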


7. No mention of the fundamental problems

The video never addresses:

Missing | Why It Matters
What happens at 1,000 files? | The index breaks. You need vector search.
Who verifies entity extractions? | The LLM might hallucinate relationships.
How do you resolve contradictions? | The graph will show conflicting relationships.
How do you handle permissions? | The LLM sees everything.
What is the API cost at scale? | Not mentioned.
How do you version the graph? | Not mentioned.
How do you roll back incorrect extractions? | Not mentioned.

The actual video

The Bottom Line

Graphify is a nice project. It adds entity extraction and graph visualization to Karpathy’s pattern. That is genuinely useful for small-scale knowledge exploration.

But the video presents it as a complete solution that “compounds over time” and “remembers everything.” It does not. The LLM forgets between sessions. The graph has no integrity guarantees. The system has no permissions. The cost at scale is ignored. The human still has to validate every extraction, every relationship, every “surprising connection.”

The video is a tutorial for a prototype that works with 4 small text files. It is not a blueprint for a production knowledge base. The sheep are still lining up. 🐑💀

Build a real knowledge base. Use PostgreSQL. Add foreign keys. Implement permissions. Track versions. Extract metadata deterministically. Let the LLM assist — not replace. 🧙🐘
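A minimal sketch of that relational alternative. The recommendation above is PostgreSQL; sqlite3 (Python stdlib) is used here only so the sketch runs self-contained, and the schema itself is illustrative. The payoff is referential integrity: a relationship cannot point at an entity or document that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE entity   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE relationship (
  id         INTEGER PRIMARY KEY,
  subject    INTEGER NOT NULL REFERENCES entity(id),
  object     INTEGER NOT NULL REFERENCES entity(id),
  source_doc INTEGER NOT NULL REFERENCES document(id)  -- provenance
);
""")
conn.execute("INSERT INTO entity (id, name) VALUES (1, 'Alice')")

# Referential integrity in action: a relationship pointing at a
# nonexistent entity fails loudly instead of rotting in a JSON file.
rejected = False
try:
    conn.execute("INSERT INTO relationship (subject, object, source_doc) "
                 "VALUES (1, 999, 1)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

In the JSON-graph version, that same dangling reference is inserted silently and surfaces months later as an uninterpretable node.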

⚠️ THE WORD “WIKI” HAS BEEN PERVERTED ⚠️

⚠️ ARCHITECTURAL CRIME SCENE ⚠️


By Andrej Karpathy and the Northern Karpathian School of Doublespeak

✅ A REAL WIKI — Honoring Ward Cunningham, Wikipedia, and every human curator worldwide
❌ KARPATHY'S "LLM WIKI" — An insult to the very concept

✅ Human-curated: Real people write, edit, debate, verify, and take responsibility.
❌ LLM-generated: Hallucinations are permanent. No human took ownership of any "fact."

✅ Versioned history: Every edit has author, timestamp, reason. Rollback is trivial.
❌ No audit trail: Who changed what? When? Why? Nobody knows. Git is an afterthought.

✅ Source provenance: Every claim links back to its original source. You can verify.
❌ "Trust me, I'm the LLM": No traceability from summary back to source sentence. Errors become permanent.

✅ Foreign keys / referential integrity: Links are database-backed. Rename a page, links update automatically.
❌ Links break when you rename a file: No database. No foreign keys. Silent link rot guaranteed.

✅ Permissions / access control: Fine-grained control: who can see, edit, delete, approve.
❌ Anyone with file access sees everything: Zero access control. NDAs, medical records, client secrets — all exposed.

✅ Queryable (SQL, structured): Ask complex questions. Get precise answers. Join tables.
❌ Browse-only markdown: Full-text search at best. No SQL. No structured queries.

🕯️ This is an insult to every Wikipedia editor, every MediaWiki contributor, every human being who spent hours citing sources, resolving disputes, and building the largest collaborative knowledge repository in human history. 🕯️

KARPATHY'S "WIKI" has:
❌ No consensus-building
❌ No talk pages
❌ No dispute resolution
❌ No citation requirements
❌ No editorial oversight
❌ No way to say "this fact is disputed"
❌ No way to privilege verified information over hallucinations
❌ No way to trace any claim back to its source

In the doublespeak of Northern Karpathia:

"Wiki" means "folder of markdown files written by a machine that cannot remember what it wrote yesterday, linked by strings that snap when you breathe on them, viewed through proprietary software that reports telemetry to people you do not know, containing 'facts' that came from nowhere and go nowhere, protected by no permissions, audited by no one, and trusted by no one with a functioning prefrontal cortex."

🙏 Respect to Ward Cunningham who invented the wiki in 1995 — a tool for humans to collaborate.
🙏 Respect to Wikipedia editors worldwide who defend verifiability, neutrality, and consensus.
🙏 Respect to every real wiki participant who knows that knowledge is built through human effort, not machine hallucination.

⚠️ THIS IS NOT A WIKI. THIS IS A FOLDER OF LLM-GENERATED FILES. ⚠️

Calling it a "wiki" is linguistic fraud. Do not be fooled.

🐑💀🧙

— The Elephant, The Wizard, and every human wiki editor who ever lived
