How to Install Ollama and Run Llama 3 Locally on Mac, Windows, and Linux: Step-by-Step Guide

Step-by-step instructions to install Ollama and run Llama 3 on your machine. Download, configure, and run open models locally in under 10 minutes.

8 min read · Published May 25, 2026
TL;DR

You'll download Ollama, install it on your Mac, Windows, or Linux machine, and run a local Llama 3 model using the command line. The entire process takes about 10 minutes and costs nothing.

Running large language models on your own hardware means your prompts stay private, you avoid API rate limits, and you can work offline. This guide walks you through installing Ollama, a free, open-source tool that manages local models, and running Meta's Llama 3 model on your Mac, Windows, or Linux machine. You'll use the command line to download the model and send your first prompt. By the end, you'll have a working local AI setup you can query anytime.

What you'll need

Time + cost
Roughly 10 minutes total. Ollama is free to download, and local usage has no caps. No subscription required.

Step 1 — Download Ollama for your operating system

Visit https://www.ollama.com in your browser. On the homepage, you'll see a 'Download' button. Click it—Ollama detects your OS automatically and serves the correct installer. For Mac, you'll get a .dmg file. For Windows, you'll get an .exe installer. For Linux, the site provides a one-line install script you'll paste into your terminal. Save the file to your Downloads folder (or note the install command if you're on Linux).

Step 2 — Install Ollama

On Mac: Open the .dmg file from your Downloads folder. Drag the Ollama icon into your Applications folder, then eject the disk image. Launch Ollama from Applications once; on first launch it typically offers to install the 'ollama' command-line tool, so approve that prompt. Open Terminal (press Cmd+Space, type 'Terminal', hit Enter). Type 'ollama' and press Enter. If you see a usage message listing commands like 'run', 'pull', and 'list', the installation succeeded.

On Windows: Double-click the .exe installer and follow the prompts, accepting the default install location (in recent versions this is usually under your user profile, e.g. %LOCALAPPDATA%\Programs\Ollama, rather than C:\Program Files). When the installer finishes, open a new Command Prompt (press Win+R, type 'cmd', hit Enter) so the updated PATH is picked up. Type 'ollama' and press Enter. You should see a help message with available commands.

On Linux: Open your terminal. Paste the install command shown on the Ollama download page (it looks like 'curl -fsSL https://ollama.com/install.sh | sh'). Press Enter and type your sudo password when prompted. Once the script finishes, type 'ollama' and press Enter. You'll see the same usage message as on Mac and Windows.
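The "type 'ollama' and look for the usage message" check above can also be scripted. The following is a minimal sketch, assuming Python 3 is available; the function names are made up for this example and are not part of Ollama.

```python
import shutil
import subprocess


def find_ollama(executable="ollama"):
    """Return the full path of the Ollama CLI if it is on PATH, else None."""
    return shutil.which(executable)


def ollama_version(path):
    """Run `<path> --version` and return whatever it prints, stripped."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    cli = find_ollama()
    if cli is None:
        print("ollama not found on PATH; open a new terminal or reinstall")
    else:
        print(f"found {cli}: {ollama_version(cli)}")
```

If the script reports that the CLI is missing right after installation, opening a fresh terminal window is usually the first thing to try, since PATH changes do not reach shells that were already open.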

Step 3 — Download the Llama 3 model

In your terminal or command prompt, type 'ollama pull llama3' and press Enter. Ollama connects to its model library and begins downloading the Llama 3 8B model weights (roughly 4.7 GB for the default tag). You'll see a progress bar showing the download. On a fast broadband connection, this takes 2-5 minutes; slower links take proportionally longer. Once the download completes, you'll see a 'success' message. The model is now cached locally in Ollama's model directory (usually ~/.ollama/models on Mac/Linux or %USERPROFILE%\.ollama\models on Windows; Linux installs that run Ollama as a system service may store models under the service account's home instead).
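To see where the weights landed and how much disk they use, something like the sketch below may help. It assumes the default per-user location described above and honors the OLLAMA_MODELS environment variable, which Ollama reads to relocate the model store. The helper names are hypothetical.

```python
import os
from pathlib import Path


def default_models_dir(env=None, home=None):
    """Guess the model directory: OLLAMA_MODELS if set, else ~/.ollama/models."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override)
    home = Path.home() if home is None else Path(home)
    return home / ".ollama" / "models"


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    models = default_models_dir()
    print(f"{models}: {dir_size_bytes(models) / 1e9:.1f} GB")
```

On a system-service install the directory may differ from this guess, in which case the script simply reports 0.0 GB.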

Step 4 — Run your first prompt

Type 'ollama run llama3' and press Enter. Ollama loads the model into memory, which takes a few seconds on the first run. You'll see a prompt that says 'Send a message (/? for help)'. Type any question or instruction, for example: 'Explain how photosynthesis works in two sentences.' Press Enter. The model generates a response in your terminal window. You're now running an 8-billion-parameter language model entirely on your machine. Type '/bye' to exit the interactive session.
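The interactive session is convenient for exploring, but 'ollama run' also accepts the prompt as a trailing argument and exits after answering, which suits scripts. A minimal sketch, with hypothetical helper names, assuming 'ollama' is on PATH and the llama3 model is already pulled:

```python
import subprocess


def build_run_command(model, prompt):
    """Build the argv for a one-shot, non-interactive `ollama run`."""
    if not model or not prompt:
        raise ValueError("model and prompt must be non-empty")
    return ["ollama", "run", model, prompt]


def ask_once(model, prompt, timeout=300.0):
    """Send one prompt through the CLI and return the reply as text."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(ask_once("llama3", "Explain how photosynthesis works in two sentences."))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the prompt itself contains quotes.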

Pro tip
To avoid startup delays on subsequent runs, keep the Ollama server running in the background. On Mac and Windows, the desktop app normally starts the server for you, and on Linux the install script usually registers it as a systemd service, so you can run 'ollama run llama3' directly. If the CLI reports that it cannot connect to the server, start it manually with 'ollama serve' in one terminal tab and use 'ollama run llama3' in another.
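A quick way to tell whether the background server is already up is to request its root URL, which answers with a short status message when Ollama is listening. The sketch below assumes the default address http://localhost:11434; the function name is invented for illustration.

```python
import http.client
import urllib.request


def server_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP 200 comes back from base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Ollama server is reachable")
    else:
        print("No server found; try running 'ollama serve'")
```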

Step 5 — List installed models

Type 'ollama list' in your terminal and press Enter. You'll see a table showing every model you've downloaded, along with its ID, its size, and when it was last modified. This is useful if you install multiple models later (like codellama or mistral) and want to confirm what's available locally.
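The same inventory is available over HTTP from the /api/tags endpoint, which can be easier to consume from a script than the CLI table. This sketch assumes a server on the default port and relies only on the "name" and "size" fields of each entry; the function names are hypothetical.

```python
import json
import urllib.request


def summarize_models(payload):
    """Turn an /api/tags payload into (name, size in GB) pairs, largest first."""
    rows = [
        (m["name"], round(m["size"] / 1e9, 1))
        for m in payload.get("models", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models it has stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))


if __name__ == "__main__":
    for name, size_gb in fetch_models():
        print(f"{name:30s} {size_gb:6.1f} GB")
```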

Step 6 — Use the API for programmatic access

Ollama runs a local HTTP server on port 11434 by default. You can send POST requests to http://localhost:11434/api/generate to integrate Llama 3 into scripts or apps. Open a second terminal window (keep 'ollama serve' running in the first if needed). Type the following curl command and press Enter:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'

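You'll see a stream of JSON objects, one per line, each containing a fragment of the generated text in its "response" field; the final object has "done" set to true. To receive a single JSON object instead, add "stream": false to the request body. In Windows Command Prompt, single quotes do not delimit arguments, so wrap the JSON in double quotes and escape the inner ones with backslashes. This endpoint is how desktop apps, browser extensions, and automation scripts talk to your local Ollama instance. Full API documentation is at https://github.com/ollama/ollama/blob/main/docs/api.md.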
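For scripts, the streamed reply has to be stitched back together from its fragments. The sketch below is one way to do that with only the Python standard library, assuming the default port and the llama3 model from the steps above. The function names are illustrative and not part of any Ollama client library.

```python
import json
import urllib.request


def iter_fragments(lines):
    """Yield the text fragment from each streamed JSON line until done."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def generate(prompt, model="llama3", base_url="http://localhost:11434"):
    """POST a prompt to /api/generate and return the full reply as one string."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields one bytes line per JSON object.
        return "".join(iter_fragments(resp))


if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```

Keeping the parsing in iter_fragments separate from the network call makes it possible to print fragments as they arrive, or to test the parsing without a running server.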

If something breaks

What to do next

Now that you have Llama 3 running locally, explore Ollama's model library at https://ollama.com/library to try specialized models like codellama for coding tasks or mistral for faster responses. You can also integrate Ollama with desktop apps—40,000+ community integrations exist, including plugins for VS Code, Obsidian, and Raycast. If you want cloud models for heavier workloads, Ollama offers a Pro plan ($20/mo) that lets you run larger cloud models alongside your local setup, though local usage remains unlimited and free.

★★★★★ 5.0/5

Ollama

Free (local usage unlimited); Pro $20/mo or $200/yr for cloud models; Max $100/mo for higher cloud concurrency
Pros
  • Unlimited local model usage at no cost: run Llama 3, Mistral, CodeLlama, and 100+ models on your hardware
  • Simple CLI and API: download a model with one command, integrate with scripts or apps via HTTP
  • Cross-platform: Mac (Apple Silicon + Intel), Windows, and Linux are all supported
  • Privacy by default: prompts never leave your machine unless you opt into cloud models
  • 40,000+ community integrations for VS Code, Obsidian, Raycast, and more
Cons
  • Requires local compute: 8 GB RAM minimum; larger models need 16 GB+ and benefit from GPU acceleration
  • Download sizes: models like Llama 3 70B exceed 40 GB, which takes a long time on slow connections
  • CLI-first: no built-in GUI (though desktop apps exist in the community ecosystem)
Affiliate disclosure: Some links above are affiliate links. If you sign up through them, we earn a commission at no extra cost to you. We only recommend tools we'd use ourselves.