Running Local Large Language Models Should Be a Must for Every Free Software Supporter
Hey, Jean Louis here from gnu.support. You know Richard Stallman and the GNU Project have been screaming about the four software freedoms for decades—freedom 0 to run the program as you wish, freedom 1 to study and change it, freedom 2 to redistribute copies, freedom 3 to distribute modified versions. Well, the world of Large Language Models is no different. In fact, with LLMs, it’s even more critical.
Let me show you why I run everything locally on my RTX 3090, and why you should too.
What I’m Running Right Now
Just look at what `pgrep -fa llama-serv` shows me:

```
3076 /usr/local/bin/llama-server --rerank -m /mnt/nvme0n1/LLM/quantized/bge-reranker-v2-m3-q8_0.gguf -v -c 8192 -ub 2048 --log-timestamps --host 192.168.1.68 --port 7676
144022 /usr/local/bin/llama-server -ngl 999 -v -c 8192 -ub 1024 --embedding --log-timestamps --host 192.168.1.68 --port 9999 -m /mnt/nvme0n1/LLM/nomic-ai/quantized/nomic-embed-text-v1.5-Q8_0.gguf
144235 /usr/local/bin/llama-server --jinja -fa on -c 32768 -ngl 64 -v --log-timestamps --host 192.168.1.68 -ub 512 --threads 16 --embeddings --webui-mcp-proxy --pooling mean --reasoning off -m /mnt/nvme0n1/LLM/quantized/Qwen3.6-35B-A3B-UD-Q3_K_XL.gguf --mmproj /mnt/nvme0n1/LLM/quantized/Qwen3.6-35B-A3B-mmproj-F16.gguf
```
And then there’s my vision stuff:
```
pgrep -fa image
2963 /bin/bash -x /home/data1/protected/bin/rcd/rcd-llm-start-image-embeddings-model.sh
pgrep -fa vision
3066 python /mnt/data/LLM/nomic-ai/nomic-embed-vision-v1.5-api.py
```
Why Local Large Language Models? Let Me Count the Ways
First, speed. That 35B Qwen 3.6 on my RTX 3090? I get about 145 tokens per second. It does vision, image understanding, OCR—everything. No waiting for some corporate server to maybe respond. No rate limiting. No “we’re experiencing high demand.”
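Don’t take my word on speed—measure it. Here is a minimal Emacs Lisp sketch that times a chat completion against that Qwen instance. The name `my/llm-benchmark` is hypothetical, and I’m assuming llama-server’s default port 8080, since the command line above doesn’t pass `--port`:

```elisp
(require 'url)
(require 'json)

(defun my/llm-benchmark (prompt)
  "Rough tokens-per-second measurement against the local llama-server.
Hypothetical sketch: assumes the Qwen instance above listens on
llama-server's default port 8080, since no --port was given."
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((model . "local")  ; llama-server serves whatever model it loaded
              (messages . [((role . "user") (content . ,prompt))])))
           'utf-8))
         (start (float-time))
         (buffer (url-retrieve-synchronously
                  "http://192.168.1.68:8080/v1/chat/completions")))
    (with-current-buffer buffer
      (goto-char url-http-end-of-headers)
      (let* ((response (json-read))
             ;; OpenAI-style usage counters in the response
             (tokens (alist-get 'completion_tokens (alist-get 'usage response)))
             (elapsed (- (float-time) start)))
        (message "%s tokens in %.1f s = %.1f tok/s"
                 tokens elapsed (/ tokens elapsed))))))
```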
Second, embeddings that actually work together. I run nomic-embed-vision to generate embeddings for images and find similar images by text or by image. And because I also run nomic-embed-text-v1.5 (same embedding space!), I can do cross-modal search. Find me a photo of a sunset using a text description. Find me all images similar to this one. All local. All private.
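Here is what that building block looks like from Emacs. A minimal sketch, assuming llama-server’s OpenAI-compatible `/v1/embeddings` endpoint on the nomic-embed-text instance above; the `my/` helper names are mine:

```elisp
(require 'url)
(require 'json)

(defun my/local-embedding (text)
  "Return TEXT's embedding vector from the local nomic-embed-text server.
Hypothetical sketch against llama-server's /v1/embeddings endpoint."
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string (json-encode `((input . ,text))) 'utf-8)))
    (with-current-buffer
        (url-retrieve-synchronously "http://192.168.1.68:9999/v1/embeddings")
      (goto-char url-http-end-of-headers)
      ;; OpenAI-style response: {"data": [{"embedding": [...]}], ...}
      (alist-get 'embedding (aref (alist-get 'data (json-read)) 0)))))

(defun my/cosine-similarity (a b)
  "Cosine similarity between embedding vectors A and B."
  (let ((dot 0.0) (na 0.0) (nb 0.0))
    (dotimes (i (length a))
      (setq dot (+ dot (* (aref a i) (aref b i)))
            na  (+ na  (* (aref a i) (aref a i)))
            nb  (+ nb  (* (aref b i) (aref b i)))))
    (/ dot (* (sqrt na) (sqrt nb)))))

;; Texts close in meaning score close to 1.0 in the shared space.
(my/cosine-similarity (my/local-embedding "a photo of a sunset")
                      (my/local-embedding "evening sky over the sea"))
```

The same cosine comparison works against vectors coming out of nomic-embed-vision, which is the whole point of the shared embedding space.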
Third, proper RAG. That bge-reranker isn’t just sitting there idle. It helps sort my retrieval-augmented generation results so the most relevant stuff comes first. When you’re searching through your own documents, you want quality, not just “well, here’s something.”
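Talking to the reranker is just as direct. A hedged sketch, assuming the Jina-style `/v1/rerank` endpoint that llama-server exposes when started with `--rerank`; the field names (`results`, `index`, `relevance_score`) follow that API, and `my/rerank` is a hypothetical name:

```elisp
(require 'url)
(require 'json)

(defun my/rerank (query documents)
  "Score DOCUMENTS against QUERY with the local bge-reranker.
Returns (SCORE . DOCUMENT) pairs, most relevant first.  Hypothetical
sketch assuming llama-server's Jina-style /v1/rerank endpoint."
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode `((query . ,query)
                          (documents . ,(vconcat documents))))
           'utf-8)))
    (with-current-buffer
        (url-retrieve-synchronously "http://192.168.1.68:7676/v1/rerank")
      (goto-char url-http-end-of-headers)
      (let ((results (append (alist-get 'results (json-read)) nil)))
        (sort (mapcar (lambda (r)
                        (cons (alist-get 'relevance_score r)
                              (nth (alist-get 'index r) documents)))
                      results)
              (lambda (a b) (> (car a) (car b))))))))

;; The chunk about the four freedoms should come out on top.
(my/rerank "What are the four freedoms?"
           '("GNU grep searches files for patterns."
             "The four freedoms: run, study, share, improve."))
```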
Emacs Lisp: My Secret Weapon
Most of the stuff I program myself in Emacs Lisp. Yeah, I know, people look at me funny. But hear me out—Emacs is the ultimate LLM hacking environment. Let me show you my `rcd-llm` function. This thing is my daily driver:
```elisp
(defun rcd-llm (&optional prompt)
  "Send PROMPT to your locally running LLM and handle the response intelligently.

This is your Swiss Army knife for LLM interaction in Emacs.  It
respects the four freedoms by working exclusively with your local
models.

How to use it:
- Just call it with no prefix: replaces the selected region with
  the LLM's response
- `C-u' (one prefix): inserts the response AFTER the current
  region or cursor
- `C-u C-u' (two prefixes): copies the response to kill ring
  (memory) without inserting
- `C-u C-u C-u' (three prefixes): pops up a new buffer showing
  both prompt and response

What it does automatically:
1. Grabs any selected text as context (using `rcd-region-string')
2. Adds your user memory if you've enabled it (so the LLM
   remembers your preferences)
3. Asks you for a prompt if you didn't provide one
4. Intelligently combines your prompt with any selected region
5. If you've enabled function calling, it handles tool calls in a
   loop (up to 5 rounds)
6. Logs everything so you can audit what the LLM did
7. Adds both your prompt and the response to the LLM's memory for
   conversation continuity
8. Can speak \"Finished\" when done if you like audio feedback

The real magic is in the context handling around line 80 or
so—when you hit `C-u C-u' with a region selected, it actually
grabs text before and after your selection, building a full
context window.  The LLM sees what came before, what comes after,
and ONLY modifies the middle part.  Perfect for editing inside
existing text.

It uses whatever LLM you've configured in
`rcd-llm-users-llm-function', so you can swap models freely.  No
vendor lock-in.  No surveillance.

The prefix argument conditions (around lines 100-150) give you
precise control:
- Region + C-u → insert response after region
- Region + C-u C-u → copy response to kill ring
- Region + three prefixes → pop up report buffer
- Otherwise → replace region or just insert

And if your buffer is read-only?  It gracefully falls back to
showing the response in a separate window.

This is what software freedom looks like—I can modify this
function, share it with friends, run it on any computer I own,
and nobody can take it away from me."
  (interactive)
  (let* ((region (rcd-region-string))
         (memory (when rcd-llm-use-users-llm-memory (rcd-llm-user-memory)))
         (original-prompt (or prompt (rcd-ask rcd-llm-prompt)))
         (output-prompt (cond (rcd-llm-show-prompt original-prompt)
                              (t nil)))
         ;; Process prompt with region (without adding function prompts)
         (user-prompt (cond ((and region (not (string-empty-p original-prompt)))
                             (concat (string-add original-prompt ":\n\n") region))
                            ((not (string-empty-p original-prompt)) original-prompt)
                            ((and region (string-empty-p original-prompt)) region)
                            (t nil)))
         ;; Get tools JSON if functions are enabled
         (tools-json (when rcd-llm-use-functions (rcd-llm-tools-json)))
         (rcd-message-date nil)
         ;; Store the LLM function to use
         (llm-func (cond ((and (boundp 'rcd-llm-users-llm-function)
                               (functionp rcd-llm-users-llm-function))
                          rcd-llm-users-llm-function)
                         (t (error "Could not find a model.  Missing model setup?"))))
         ;; Get system message for follow-up calls
         (system-message (when rcd-llm-use-functions
                           (or (rcd-db-get-entry "llmmodels" "llmmodels_systemmessage"
                                                 (rcd-db-users-defaults "llmmodels")
                                                 rcd-db)
                               "You are a helpful assistant."))))
    (cond
     (user-prompt
      (rcd-message "Requesting LLM...")
      (let* ((initial-response (funcall llm-func user-prompt memory
                                        nil nil nil nil nil nil nil tools-json))
             (response initial-response)
             (max-tool-calls 5)         ; Prevent infinite loops
             (tool-call-count 0))
        ;; Handle tool calls in a loop (multiple rounds if needed)
        (while (and rcd-llm-use-functions
                    response
                    (rcd-llm-has-tool-calls-p response)
                    (< tool-call-count max-tool-calls))
          (setq tool-call-count (1+ tool-call-count))
          ;; Execute tool calls and get results
          (let ((tool-results (rcd-llm-execute-tool-calls response)))
            (when tool-results
              ;; Build messages with system message and tool results
              (let* ((tool-messages (rcd-llm-build-messages-with-tools
                                     user-prompt tool-results))
                     (full-messages (vconcat `[((role . "system")
                                                (content . ,system-message))]
                                             tool-messages))
                     ;; Prepare arguments for follow-up call
                     (args (list nil memory nil nil nil nil nil nil nil ; prompt through stream
                                 nil                    ; tools-json (none)
                                 (json-encode full-messages)))) ; messages-json
                ;; Call with apply to ensure correct number of arguments
                (setq response (apply llm-func args))))))
        (cond
         (response
          (rcd-log-llm user-prompt response)
          (cond
           ;; Edit the region while seeing the surrounding context
           ((and (equal (prefix-numeric-value current-prefix-arg) 5)
                 (called-interactively-p 'any))
            (let* ((context (rcd-region-with-context))
                   (before (nth 0 context))
                   (region (nth 1 context))
                   (after (nth 2 context))
                   (full-prompt
                    (format "## Context before (text before):\n\n%s\n\n## === MAIN CONTEXT BEGIN (this is text in middle)===\n%s\n=== END MAIN CONTEXT ===\n\nContext after (this is text that is after main):\n\n%s\n\n## INSTRUCTION\n\nYou must act on the main context, while knowing what was before, and what comes after in the text.\nYour instruction: %s"
                            before region after user-prompt))
                   (middle-response (funcall llm-func full-prompt memory
                                             nil nil nil nil nil nil nil tools-json)))
              (when middle-response
                (rcd-log-llm full-prompt middle-response)
                (insert middle-response))))
           ;; When there is region and one C-u
           ((and region (called-interactively-p 'any)
                 (eql (car current-prefix-arg) 4))
            (goto-char (cdar (region-bounds)))
            (setq deactivate-mark t)
            (cond (buffer-read-only
                   (rcd-message "Buffer is read-only, showing response in separate window")
                   (rcd-llm-pop-report llm-func output-prompt response))
                  (t (insert "\n\n" response "\n\n"))))
           ;; When region and two C-u
           ((and region (called-interactively-p 'any)
                 (eql (car current-prefix-arg) 16))
            (setq deactivate-mark t)
            (rcd-kill-new response))
           ;; With 3 x C-u open buffer
           ((or (and (eql (car current-prefix-arg) 64)
                     (called-interactively-p 'any))
                rcd-llm-pop-to-window)
            (rcd-llm-pop-report llm-func user-prompt response))
           ;; Otherwise if region
           ((and region (called-interactively-p 'any))
            (cond ((not buffer-read-only) (rcd-region-string-replace response))
                  (t (rcd-llm-pop-report llm-func output-prompt response))))
           ;; If called interactively, insert into buffer
           ((called-interactively-p 'any)
            (cond ((not buffer-read-only) (insert response))
                  (t (rcd-llm-pop-report llm-func output-prompt response))))
           ;; Otherwise just return response
           (t response))
          ;; Handle memory
          (when (and rcd-llm-add-to-memory rcd-llm-use-users-llm-memory)
            (hyperscope-add-to-column (rcd-db-current-user-llm-memory)
                                      "hyobjects_text" user-prompt)
            (hyperscope-add-to-column (rcd-db-current-user-llm-memory)
                                      "hyobjects_text" response))
          (setq rcd-llm-last-response response)
          (when rcd-llm-speak
            (when rcd-speech
              (rcd-tts-and-speak "Finished.")))
          ;; Return or kill response
          (cond ((called-interactively-p 'any) (rcd-kill-new response))
                (t response)))
         (t (prog2 (rcd-warning-message "Could not reach LLM server") nil)))))
     (t (rcd-warning-message "LLM: Empty Prompt")))))
```
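To make it a daily driver, bind it to a key. The binding below is just an example; pick whatever is free in your setup:

```elisp
;; Hypothetical keybinding -- choose any free key you like.
(global-set-key (kbd "C-c q") #'rcd-llm)
```

Select a region, press the key, type your instruction, and the region gets replaced in place; add `C-u` prefixes for the other behaviors described in the docstring.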
But Wait, There’s More: Cherry Studio
Now, I love Emacs. But I also run Cherry Studio—a free software, cross-platform LLM productivity desktop client. This thing is amazing if you’re not as deep into Emacs as I am.
Cherry Studio unifies access to over 300 leading LLM models and 50+ providers. But here’s the kicker—it works perfectly with local models via Ollama (or any OpenAI-compatible endpoint). So you get all the fancy features:
- Smart chat with document and image support
- Autonomous LLM agents that can run tasks
- LLM drawing and translation
- Customizable knowledge base for your personal data
- Built-in assistants
- Multi-model comparison tools
- MCP server extensibility
And because it stores everything locally, your privacy is respected. No corporate spyware.
The best part? Cherry Studio can use everything I just mentioned—my bge-reranker for RAG, my embedding models, my Qwen vision model. It’s all there, working together, fully local, fully free.
Shell Scripts, Emacs, and a Complete Ecosystem
I also have shell scripts. Lots of them. You know that `rcd-llm-start-image-embeddings-model.sh` you saw? That’s a shell script. The combination means I can plug LLMs into almost any program on my system.
Emacs Lisp is ideal for fiddling with LLMs because you’re already inside a text editor that’s also an operating system. Want to process the current paragraph? One function. Want to refactor a whole file based on a prompt? Easy. Want to build a RAG system that searches your org-mode notes? Done.
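For instance, a paragraph processor is a few lines on top of `rcd-llm`. A sketch, assuming `rcd-region-string` returns the active region as the function above suggests; the name and prompt are just examples:

```elisp
(defun my/llm-rewrite-paragraph ()
  "Rewrite the paragraph at point using the local LLM via `rcd-llm'.
Hypothetical sketch built on the rcd-llm function shown earlier."
  (interactive)
  (save-excursion
    ;; Mark the paragraph so rcd-llm picks it up as the region.
    (mark-paragraph)
    (let ((response (rcd-llm "Proofread and tighten the following paragraph")))
      (when response
        (delete-region (region-beginning) (region-end))
        (insert response)))))
```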
And Cherry Studio gives me a professional, polished interface for when I want that. The point is: both are free software. Both respect the four freedoms. Both run locally.
The Bottom Line
Running a local LLM should be a must for every free software supporter. Not “nice to have.” Not “when I get around to it.” Right now.
Because the alternative is sending your data to Microsoft, Google, OpenAI, or some other corporation that will absolutely use it to profile you, train their models, and lock you into their ecosystem. That’s not freedom. That’s the opposite of freedom.
Stallman warned us about this with SaaSS, Service as a Software Substitute. LLMs are just SaaSS on steroids unless you run them yourself.
So go ahead. Grab an RTX 3090 if you can afford it (used ones are fine). Run llama-server. Set up Cherry Studio. Hack your Emacs config. But most importantly: keep your LLM local, keep your data yours, and keep your freedoms intact.
The models are out there. The software is free. There’s no excuse anymore.
— Jean Louis, gnu.support
Four freedoms or bust