LLM Knowledge Bases

Architecture based on Andrej Karpathy's workflow
Components:
  wiki/: .md directory structure
  Obsidian Web Clipper: articles → .md + local images
  Papers & Repos: arXiv, GitHub, datasets
  raw/ directory: source documents staging
  LLM Compiler: raw/ → structured wiki (compiles)
  Index & Summaries: auto-maintained, always consulted first
  Concept Articles (*.md): ~100 articles, ~400K words, backlinked
  Derived Outputs: slides (Marp), charts, filed-back answers
  Backlinks & Cross-links: auto-generated link graph
  Obsidian IDE: view wiki + visualizations
  Q&A Agent: complex queries → research
  Search Engine: web UI + CLI tool for LLM
  LLM Linting: health checks & data integrity

Arrow labels: always, if relevant, indexed, scan all, file back into wiki, enhance

LLM Compilation Pipeline: incremental, each step enhances the wiki
  Phase 1: Ingest raw data
  Phase 2: Compile wiki
  Phase 3: Query & enhance
  Phase 4: Lint & maintain
  Cycle: always adding up

Future: Synthetic Data Generation → Fine-tuning
  Have the LLM "know" the data in its weights instead of just context windows

Legend:
  Direct flow
  Feedback loop: outputs enhance the wiki
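
The compile, index, backlink, and lint steps named above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: the function names, the `index.md` filename, the mtime-based incremental check, and the `summarize` callable (a stand-in for whatever LLM call does the compilation) are all assumptions made for the example. It also assumes articles cross-reference each other with Obsidian-style `[[wikilinks]]`.

```python
import re
from pathlib import Path
from typing import Callable

INDEX_NAME = "index.md"  # hypothetical name for the auto-maintained index
WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures Target in [[Target]] or [[Target|alias]]


def compile_wiki(raw_dir: Path, wiki_dir: Path,
                 summarize: Callable[[str], str]) -> list[str]:
    """Phase 2: compile each new or changed raw/*.md file into a wiki article."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(raw_dir.glob("*.md")):
        dst = wiki_dir / src.name
        # Incremental: skip sources whose article is at least as new as the source.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        # `summarize` is a placeholder for the LLM compilation step.
        dst.write_text(summarize(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(src.stem)
    return written


def build_backlinks(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each link target to the sorted list of articles that link to it."""
    backlinks: dict[str, set[str]] = {}
    for page in wiki_dir.glob("*.md"):
        if page.name == INDEX_NAME:
            continue
        for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
            backlinks.setdefault(target.strip(), set()).add(page.stem)
    return {target: sorted(sources) for target, sources in backlinks.items()}


def write_index(wiki_dir: Path) -> Path:
    """Regenerate the index: one line per article, using its first line as summary."""
    lines = []
    for page in sorted(wiki_dir.glob("*.md")):
        if page.name == INDEX_NAME:
            continue
        text = page.read_text(encoding="utf-8").strip()
        first_line = text.splitlines()[0] if text else ""
        lines.append(f"- [[{page.stem}]]: {first_line}")
    index = wiki_dir / INDEX_NAME
    index.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index


def lint(wiki_dir: Path) -> list[str]:
    """Phase 4: report link targets that have no matching article."""
    existing = {page.stem for page in wiki_dir.glob("*.md")}
    return sorted(t for t in build_backlinks(wiki_dir) if t not in existing)
```

Running `compile_wiki` twice in a row on unchanged sources should write nothing the second time, which is one simple way to get the "incremental" behaviour the pipeline describes. A content hash would be a sturdier change check than file modification times if the raw/ files are synced or copied between machines.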