Run a model locally

Run an AI model on your own machine.

You don’t always need a cloud account to use AI. With Ollama or LM Studio you can download an open-weight model and run it entirely on your own PC — private, offline, and free per request. It won’t match the biggest cloud models, but it’s real, it’s yours, and it’s a great way to actually understand how these things work. Here is the honest, hands-on version.

Learn how AI works Compare with cloud compute

Why run a model locally

Three real reasons — and they all matter more from Nigeria, where data and uptime cost you.

Privacy by default

The model runs on your machine, so your prompts and files never leave it. For client work under NDA, sensitive notes, or anything you don't want sent to a company's servers, local is the safe default.

Works offline

Once a model is downloaded it runs with no internet at all. No data plan burning while you experiment, and no dead tool when NEPA takes the light and your router with it.

No per-token cost

After the one-time download, you pay nothing per request. You can run a small model in a loop all day to learn, test prompts, or build a feature, without watching a Naira meter tick.

Two ways to do it

Pick the one that fits how you like to work. You can use both.

Ollama — the command-line way

A single tool that downloads and runs open-weight models from your terminal. It also exposes a local API on http://localhost:11434, so your own code and editors can talk to the model. This is the simplest path and what most builders start with.

LM Studio — the app way

A desktop app (Windows, macOS, Linux) with a chat window and a model browser you click through — no terminal needed. It can also run a local server that mimics the OpenAI API, so it drops into tools that expect that. Good if you prefer buttons over commands.

What hardware you need

Small models run on almost anything. Big ones need a real GPU — be honest about which you want.

Small models — a normal laptop

8–16GB RAM · any recent CPU · a few GB of free disk per model

Models in the 1B–8B range (like Llama 3.2 1B/3B or a 7B–8B model) run on an everyday laptop with no graphics card. They're slower than the cloud and less sharp, but they genuinely work for learning, drafting, and simple tasks.

Bigger models — you need a GPU

NVIDIA with lots of VRAM, or Apple Silicon with 32GB+ unified memory

Larger models (30B and up) need real GPU memory to run at a usable speed. Without it they either refuse to load or crawl. If you don't own that hardware, renting cloud GPU time is almost always the smarter spend than buying.

Not sure your PC is up to it? Start with the smallest model — if it runs, you’re fine. For the full breakdown of RAM, VRAM, and when a GPU is actually worth buying, read the hardware guide.

See the hardware guide

Hands-on

Run your first model with Ollama

The fastest path. These are the real commands — run them in your terminal after installing Ollama.

1. Install Ollama

On macOS or Linux, the official installer script sets it up. On Windows, download the installer from ollama.com instead and run it.

curl -fsSL https://ollama.com/install.sh | sh

2. Check it’s installed

Confirm the command is available and see the version.

ollama --version

3. Pull and run a small model

This downloads a small model (about 2GB) the first time, then drops you into a chat prompt. Type a question and press Enter. Use /bye to exit when you’re done.

ollama run llama3.2

4. Try an even smaller one on a weak PC

If llama3.2 is slow, pull a 1B model — it’s lighter and faster on a laptop with no GPU.

ollama run llama3.2:1b

5. See and manage your models

List what you’ve downloaded, and remove ones you don’t need to free up disk space.

ollama list
ollama rm llama3.2:1b

6. Use it from your own code (optional)

Ollama runs a local API while it’s open. Your apps and editors can call it at http://localhost:11434 — no internet, no API key.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain vibe coding in one sentence."
}'

Prefer clicking? Use LM Studio

No terminal required — a desktop app with a chat window.

The short version: download LM Studio from lmstudio.ai for your OS, open it, search the built-in model browser for a small model (anything labelled around 1B–3B is a safe start on a normal laptop), click download, then load it and chat. When you’re ready to connect it to your own code, LM Studio can start a local server that speaks the same API shape as the big cloud providers, so existing tools just work against it.

The honest limits versus the cloud

Local is powerful, but it is not magic. Know the trade-offs before you commit.

Smaller local models are less capable than the big cloud ones — expect weaker reasoning, more mistakes, and shorter useful context.
Speed depends on your machine. On a CPU-only laptop, replies come word-by-word; the cloud feels instant by comparison.
The first download is large (often several GB) and uses your data — grab models once on good internet, then work offline.
You manage updates and disk yourself. Models pile up fast; delete the ones you've stopped using.

When the cloud still wins: for your hardest reasoning, biggest models, or fastest responses, a hosted model usually beats what your laptop can run. The smart move is to use local for private, offline, and cheap-repeat work, and reach for the cloud when the task truly needs the muscle.

See cloud compute options

Bottom line: install Ollama, run ollama run llama3.2, and you have a real AI model living on your own machine in minutes — private, offline, and free per request. Start small, learn how it behaves, and lean on the cloud only when a task outgrows what your PC can do.

Keep learning how AI works