Why Graphify Fails as a Robust LLM Knowledge Base


Another Tutorial, Same Broken Record: A Technical Rebuttal

This video presents Graphify as an implementation of Karpathy’s LLM-Wiki pattern. Let me point out what the video gets wrong — or conveniently leaves out.

https://gnu.support/images/2026/04/2026-04-23/800/graphify.webp


1. “The LLM remembers everything you did years ago”

Claim: “Years later, you still can query everything. The system kept everything that you did and the LLM kept building the information in a very structured way.”

Reality: The LLM has no persistent memory across sessions. It forgets everything between chats. The only “memory” is the markdown files and JSON graphs it wrote last time. If those files contain errors, contradictions, or hallucinations, the LLM will confidently repeat them. The system does not “remember.” It stores. And storage without integrity is just a landfill.
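To make that concrete, here is a minimal sketch of what the "memory" actually is. The directory layout and the `graph.json` filename are my assumptions for illustration, not Graphify's real layout — the point is only that every session starts by re-reading last session's files, errors included.

```python
import json
from pathlib import Path

def load_session_context(kb_dir: str) -> str:
    """Rebuild the LLM's 'memory' from scratch at session start.
    It is nothing but the files written last time, fed back in verbatim.
    (kb_dir layout and 'graph.json' are hypothetical, for illustration.)"""
    context = []
    for md in sorted(Path(kb_dir).glob("*.md")):
        context.append(md.read_text())           # notes, verbatim
    graph_file = Path(kb_dir) / "graph.json"
    if graph_file.exists():
        context.append(graph_file.read_text())   # graph, verbatim
    # Whatever is in these files -- including last session's
    # hallucinations and contradictions -- goes straight back
    # into the prompt. Nothing checks them on the way in.
    return "\n\n".join(context)
```

If a hallucinated fact was written to disk last week, this loop injects it into the next prompt with the same authority as everything else. That is storage, not memory.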


2. “Graphify implements everything Karpathy talked about”

Claim: “It implemented everything that Karpathy talked about.”

Reality: Graphify adds a knowledge graph layer with entity extraction, relationship detection, and community clustering. That’s not “everything Karpathy talked about.” That’s a significant extension. Karpathy’s original pattern is just markdown files with wikilinks. Graphify is doing graph database work — which is good — but the video presents it as a simple implementation of the original pattern. It’s not. It’s a different architecture.


3. “Very simple, very effective, not heavy”

Claim: “Very simple, very effective, it’s not heavy.”

Reality: Graphify uses entity extraction, relationship detection, community clustering, and graph generation. That’s computationally heavy. The demo works with 4 small text files. At scale, with hundreds of PDFs, images, and videos, these operations are not “not heavy.” They are expensive in both compute and API tokens. The video never shows the cost or the time required for large-scale ingestion.
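The video never does the arithmetic, so here is a back-of-envelope sketch. Every number in it — tokens per file, number of LLM passes, blended price per million tokens — is an assumption I am supplying, not a figure from the video or from Graphify:

```python
def ingestion_cost_estimate(n_files: int,
                            avg_tokens_per_file: int = 5_000,
                            passes: int = 3,
                            usd_per_million_tokens: float = 5.0) -> float:
    """Rough API cost of graph ingestion. All defaults are hypothetical:
    assume each file is processed ~3 times (entity extraction,
    relationship detection, community summarisation) at a flat
    blended token price."""
    total_tokens = n_files * avg_tokens_per_file * passes
    return total_tokens / 1_000_000 * usd_per_million_tokens

# 4 demo text files: cents. 500 PDFs: tens of dollars -- per re-ingestion.
```

Under these assumptions, the 4-file demo costs about $0.30 and a 500-document corpus about $37.50 — and that is per ingestion run, before you re-process anything.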


4. “The graph HTML gives you connections you probably didn’t know”

Claim: “This is something that you probably didn’t know but just looking at the files. It understands it through the Graphify process.”

Reality: The LLM extracts entities and relationships based on pattern matching and its training data. These “surprising connections” might be genuine insights, or they might be hallucinations. The video presents them as discoveries without any validation step. There is no mechanism to verify that a detected relationship actually exists in the source documents. The user is left trusting the LLM’s output.
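The missing validation step is not hard to sketch. Here is the weakest possible sanity check — do the two endpoints of an extracted relationship even co-occur in any source document? The edge shape `{"source", "target"}` is my guess, not Graphify's real schema, and co-occurrence is only necessary evidence, not sufficient. But even this minimal check is more than the video shows:

```python
def verify_relationship(rel: dict, documents: dict[str, str]) -> list[str]:
    """Return the documents in which both endpoints of an extracted
    relationship appear together. An empty list means nothing in the
    sources even mentions the two entities side by side -- treat the
    edge as a possible hallucination.
    (The rel schema {"source": ..., "target": ...} is assumed.)"""
    a, b = rel["source"].lower(), rel["target"].lower()
    return [name for name, text in documents.items()
            if a in text.lower() and b in text.lower()]
```

Co-occurrence does not prove the relationship is real — only a human reading the passage can do that — but an edge that fails even this test should never reach the graph.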


5. “Keep on compounding. It will keep on evolving over time.”

Claim: “This is keep on compounding. After you put more files in this raw folder, it’s going to keep on evolving over time.”

Reality: Compounding requires integrity. Without foreign keys, without version control, without contradiction resolution, without permission management, the system does not compound — it accumulates. Accumulation without integrity is noise. The video shows a graph of 4 communities from 4 small text files. It does not show what happens after 100 files, when entities multiply, relationships overlap, contradictions emerge, and the graph becomes uninterpretable.
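What "accumulation without integrity" looks like is easy to demonstrate. A sketch, assuming an edge shape of `{"source", "target", "relation"}` (my guess, not Graphify's schema): group edges by entity pair and flag pairs that have silently collected more than one relation label across ingestion runs.

```python
from collections import defaultdict

def find_conflicts(edges: list[dict]) -> dict:
    """Flag entity pairs that carry more than one relation label --
    the contradictions that pile up when extractions are appended
    but never reconciled. (Edge schema is assumed.)"""
    by_pair = defaultdict(set)
    for e in edges:
        pair = frozenset((e["source"].lower(), e["target"].lower()))
        by_pair[pair].add(e["relation"])
    return {tuple(sorted(p)): sorted(rels)
            for p, rels in by_pair.items() if len(rels) > 1}
```

Nothing in the pipeline shown in the video runs anything like this. Every ingestion appends; nothing reconciles; the conflicts just sit there, rendered as pretty nodes.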


6. “You can run queries against the graph JSON”

Claim: “You can run query against it. This is the part you use the query for.”

Reality: The video shows a graph JSON file. It does not show what query language you use. It does not show the query syntax. It does not show how to ask “show me all documents related to entity X” or “find contradictions between source A and source B.” The viewer is left to assume that querying is easy. It is not specified.
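Since no query language is given, "querying" in practice means writing the traversal yourself. A sketch of the simplest question — "show me all documents related to entity X" — against a guessed JSON schema (`nodes` with `name`/`documents`, `edges` with `source`/`target`; Graphify's real field names may differ):

```python
import json

def documents_for_entity(graph_path: str, entity: str) -> set[str]:
    """One-hop 'query' over a graph JSON file: the entity's own
    documents plus those of its direct neighbours.
    (Node/edge field names are assumptions, not Graphify's schema.)"""
    g = json.loads(open(graph_path).read())
    target = entity.lower()
    related = {target}                      # the entity itself...
    for e in g["edges"]:
        s, t = e["source"].lower(), e["target"].lower()
        if s == target:
            related.add(t)                  # ...plus direct neighbours
        elif t == target:
            related.add(s)
    return {d for n in g["nodes"] if n["name"].lower() in related
              for d in n.get("documents", [])}
```

Note what this is: hand-rolled Python over a JSON blob. No indices, no query planner, no "find contradictions between source A and source B". That is the actual state of "you can run queries against it".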


7. No mention of the fundamental problems

The video never addresses:

| Missing | Why It Matters |
| --- | --- |
| What happens at 1,000 files? | The index breaks. You need vector search. |
| Who verifies entity extractions? | The LLM might hallucinate relationships. |
| How do you resolve contradictions? | The graph will show conflicting relationships. |
| How do you handle permissions? | The LLM sees everything. |
| What is the API cost at scale? | Not mentioned. |
| How do you version the graph? | Not mentioned. |
| How do you roll back incorrect extractions? | Not mentioned. |


The actual video

The Bottom Line

Graphify is a nice project. It adds entity extraction and graph visualization to Karpathy’s pattern. That is genuinely useful for small-scale knowledge exploration.

But the video presents it as a complete solution that “compounds over time” and “remembers everything.” It does not. The LLM forgets between sessions. The graph has no integrity guarantees. The system has no permissions. The cost at scale is ignored. The human still has to validate every extraction, every relationship, every “surprising connection.”

The video is a tutorial for a prototype that works with 4 small text files. It is not a blueprint for a production knowledge base. The sheep are still lining up. 🐑💀

Build a real knowledge base. Use PostgreSQL. Add foreign keys. Implement permissions. Track versions. Extract metadata deterministically. Let the LLM assist — not replace. 🧙🐘
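A minimal sketch of that alternative, with real foreign keys, deterministic metadata, and versioned extractions. The table names and columns are mine, and `sqlite3` stands in for PostgreSQL only so the example is self-contained — the same DDL applies to Postgres:

```python
import sqlite3

# Relational sketch (hypothetical schema): deterministic metadata,
# enforced references, versioned and human-verified extractions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this; Postgres enforces FKs by default
conn.executescript("""
CREATE TABLE document (
    id     INTEGER PRIMARY KEY,
    path   TEXT NOT NULL UNIQUE,
    sha256 TEXT NOT NULL                -- deterministic, not LLM-derived
);
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE extraction (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    entity_id   INTEGER NOT NULL REFERENCES entity(id),
    version     INTEGER NOT NULL,       -- roll back bad extractions
    verified    INTEGER NOT NULL DEFAULT 0  -- a human signed off
);
""")
# A dangling reference is now a hard error, not silent graph noise:
try:
    conn.execute("INSERT INTO extraction (document_id, entity_id, version) "
                 "VALUES (999, 999, 1)")
except sqlite3.IntegrityError:
    print("rejected: no such document/entity")
```

The difference from an append-only JSON graph: an extraction cannot point at a document that does not exist, a bad version can be rolled back, and `verified = 0` rows can be excluded from every query until a human has looked at them.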

⚠️ THE WORD “WIKI” HAS BEEN PERVERTED ⚠️
