Self-Hosting an LLM on a Mac Mini

Contents
  1. First lesson: right-size your models
  2. What's running
  3. How a request flows
  4. What starts at boot
  5. Two tunnels, two purposes
  6. Locking it down with Cloudflare Access
  7. Is it actually useful?
  8. How to set this up yourself
  9. One more thing: how this post was written

I've always wanted to run a local LLM. Not for any grand reason; just to see what it's like to have one sitting on my own hardware, accessible from anywhere. The kind of thing that's always worth trying once.

Turns out it's surprisingly straightforward. Here's what I set up, how the pieces connect, and what I learnt along the way.

# First lesson: right-size your models

My first instinct was to pull the biggest models Ollama offered. I ended up with 78GB of models on disk, including a 120B and a 20B parameter model. Neither could actually run on 8GB of unified memory. They'd either crash or swap so aggressively that every response took minutes.

The fix was obvious in hindsight: ollama rm the oversized ones, pull llama3.2:3b (about 2GB on disk), and move on. It runs comfortably within the Mini's memory and responds in seconds. Match the model to your hardware, not your ambitions.
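A rough way to sanity-check this before pulling: a Q4-quantized model takes roughly 0.6GB per billion parameters (the figure implied by llama3.2:3b weighing about 2GB), and needs to land well under total RAM to leave room for the OS and KV cache. This is a back-of-envelope heuristic, not Ollama's actual accounting:

```shell
# Back-of-envelope fit check for Q4-quantized models.
# Heuristic: ~0.6GB on disk per billion parameters, and the model
# should stay under ~70% of total RAM to leave headroom.
fits() {
  local params_b=$1 ram_gb=$2 size
  size=$(awk -v p="$params_b" 'BEGIN { printf "%.1f", p * 0.6 }')
  if awk -v s="$size" -v r="$ram_gb" 'BEGIN { exit !(s < r * 0.7) }'; then
    echo "${params_b}B (~${size}GB) fits in ${ram_gb}GB"
  else
    echo "${params_b}B (~${size}GB) does not fit in ${ram_gb}GB"
  fi
}

fits 3 8     # the model this setup settled on
fits 20 8    # one of the oversized pulls
fits 120 8   # the other
```

Running it against the models above makes the mistake obvious before the download, not after.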

# What's running

Three things on a base-model Mac Mini (M2, 8GB RAM):

  1. Ollama (port 11434): the inference engine running llama3.2:3b (~2GB). This is the backend that actually runs the model on the Mac Mini's chip.
  2. Open WebUI (port 3000, via Docker): a web app that gives you a ChatGPT-style chat interface. It talks to Ollama's API to send prompts and stream responses.
  3. Cloudflare Tunnel: creates an encrypted outbound connection from the Mini to Cloudflare's edge. When someone visits the URL, Cloudflare routes the request through the tunnel to localhost:3000.

Open WebUI running llama3.2:3b on the Mac Mini, accessed via Cloudflare Tunnel

# How a request flows

Browser (phone/laptop)
  → https://yash.voltade.com
  → Cloudflare CDN (DNS resolves here)
  → Cloudflare routes via tunnel to Mac Mini
  → cloudflared forwards to localhost:3000 (Open WebUI)
  → Open WebUI sends prompt to localhost:11434 (Ollama)
  → Ollama runs inference on llama3.2:3b
  → Response flows back the same path

The key insight: no port forwarding required. Cloudflared makes an outbound connection to Cloudflare (your Mini connects out, not in). Cloudflare acts as a reverse proxy. Your router doesn't need to expose any ports.
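For the locally-managed tunnel, that routing lives in ~/.cloudflared/config.yml. A minimal sketch of what mine looks like (the credentials filename is a placeholder for the UUID cloudflared generates when you create the tunnel):

```yaml
tunnel: mac-mini-llm
credentials-file: /Users/<you>/.cloudflared/<tunnel-uuid>.json

ingress:
  # Route the public hostname to Open WebUI on the Mini
  - hostname: yash.voltade.com
    service: http://localhost:3000
  # Required catch-all: anything that doesn't match gets a 404
  - service: http_status:404
```

The catch-all rule is mandatory; cloudflared refuses to start without one, which is a nice guard against accidentally exposing more than you meant to.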

# What starts at boot

Everything is configured to survive a restart:

  • Ollama: macOS app, set to launch at login.
  • Open WebUI: Docker container with --restart always.
  • Cloudflared tunnel: LaunchAgent plist at ~/Library/LaunchAgents/.

So if the Mini loses power and comes back, the whole stack comes up on its own.
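For reference, the tunnel's LaunchAgent looks roughly like this. This is a sketch of what `cloudflared service install` generates, not a file you need to write by hand; the binary path assumes Homebrew on Apple Silicon (Intel Macs use /usr/local/bin):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.cloudflare.cloudflared</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/cloudflared</string>
    <string>tunnel</string>
    <string>run</string>
  </array>
  <!-- Start at login, and restart the tunnel if it dies -->
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>
```

KeepAlive is what makes this more than a start-at-login item: launchd relaunches cloudflared whenever the process exits.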

# Two tunnels, two purposes

I already had a Cloudflare tunnel for SSH access to the Mini. The LLM tunnel is separate:

| Tunnel | Purpose | Config |
| --- | --- | --- |
| mac-mini-yash | SSH access | Dashboard-managed (token) |
| mac-mini-llm | Open WebUI | Locally-managed (config file) |

Keeping them separate means I can tear down the LLM tunnel without losing SSH access.

# Locking it down with Cloudflare Access

Without any auth, anyone with the URL could use your LLM. Not ideal.

Cloudflare Access fixes this in about two minutes. I added an Application in the Cloudflare Zero Trust dashboard, pointed it at the tunnel's hostname, and set up an email-based allow policy. Now visiting the URL prompts for an email code before you can reach Open WebUI.

It's the right default for anything self-hosted. Even if the model is small and the stakes are low, there's no reason to leave it open.

# Is it actually useful?

Honestly, llama3.2:3b on 8GB RAM is more proof of concept than daily driver. It's responsive enough for casual queries, but it's no Claude or GPT-4. The point was never to replace those. It was to understand the stack: what Ollama does, how inference works locally, how tunnelling connects it all.

And now I have a setup I can swap models on. Bigger Mini, bigger model, same architecture.

# How to set this up yourself

If you've got a Mac Mini (or any always-on machine) and a Cloudflare account with a domain, the whole thing takes under an hour:

  1. Install Ollama โ€” download from ollama.com, pull a model that fits your RAM (ollama pull llama3.2:3b for 8GB machines). Test with ollama run llama3.2:3b "hello".

  2. Run Open WebUI โ€” one Docker command:

    docker run -d --name open-webui --restart always \
      -p 3000:8080 \
      -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
      -v open-webui:/app/backend/data \
      ghcr.io/open-webui/open-webui:main
    
  3. Create a Cloudflare Tunnel โ€” brew install cloudflared, then cloudflared login and cloudflared tunnel create mac-mini-llm. Point it at localhost:3000 in your ~/.cloudflared/config.yml.

  4. Add DNS โ€” cloudflared tunnel route dns mac-mini-llm your-subdomain.yourdomain.com.

  5. Make it persist โ€” Ollama launches at login, Docker's --restart always handles Open WebUI, and cloudflared service install creates a LaunchAgent for the tunnel.

  6. Lock it down โ€” in the Cloudflare Zero Trust dashboard, add an Access Application for your subdomain with an email-based allow policy. Takes two minutes, and now only whitelisted emails can reach your LLM.

That's the whole stack. No Kubernetes. No cloud VMs. No port forwarding. Just a Mac Mini under your desk, serving an LLM to your phone.

# One more thing: how this post was written

This blog post was drafted by Claude Code during the same session where we set everything up. My site has a hook โ€” a persistent instruction in its config โ€” that tells Claude Code to suggest a blog post whenever something interesting is learnt during a session. Set something up, write about it, publish. All in one flow.

If you're curious about that workflow, I wrote about it in How This Site Was Built With Claude Code.