Once you download an AI model to your machine, it runs without internet. No API calls, no cloud servers, no data leaving your device. Here's how to set up a fully offline AI system.
## How offline AI works
AI models are just files: large files (2-200GB), but files nonetheless. Once downloaded, all computation happens locally on your CPU or GPU. The model doesn't phone home, doesn't need authentication, and doesn't require any network connection.
This means you can:
- Use AI on a plane with no WiFi
- Run AI in air-gapped secure environments
- Work in areas with no internet
- Guarantee zero data leakage
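You can check the "nothing leaves your device" claim yourself. Ollama serves its API on your own machine (port 11434 by default), so once a model is on disk, a request like the one below never touches an external host. A minimal sketch, assuming the default port and a model you have already pulled:

```bash
# Query the local Ollama API; the only host involved is localhost
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:9b",
  "prompt": "Say hello in five words.",
  "stream": false
}'
```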
## Setup: download everything while online
You need internet once, to download the tools and models. After that, everything runs offline.
```bash
# Step 1: Install Ollama (requires internet)
curl -fsSL https://ollama.com/install.sh | sh

# Step 2: Download models (requires internet)
ollama pull qwen3.5:9b          # General purpose (5.5GB)
ollama pull qwen2.5-coder:32b   # Coding (18GB)
ollama pull codestral           # Autocomplete (12GB)

# Step 3: Disconnect from the internet. Everything below works offline.

# Step 4: Run any downloaded model
ollama run qwen3.5:9b
```
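Before pulling the plug, confirm the models actually landed on disk. `ollama list` shows everything stored locally; if a model appears there, `ollama run` needs no connection. The network interface name below is an assumption for illustration; substitute your own:

```bash
# See which models are available offline
ollama list

# Optional: disable networking first to prove the point
# (Linux example; assumes your WiFi interface is called wlan0)
# sudo ip link set wlan0 down

# One-shot prompt against a local model
ollama run qwen3.5:9b "Explain in one sentence why this works offline."
```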
## Pre-download for air-gapped systems
If your target machine has no internet at all, download on another machine and transfer:
```bash
# On the machine WITH internet:
ollama pull qwen3.5:9b

# Find the model files
ls ~/.ollama/models/

# Copy the entire .ollama directory to a USB drive
cp -r ~/.ollama /media/usb/ollama-backup

# On the air-gapped machine:
cp -r /media/usb/ollama-backup ~/.ollama
ollama run qwen3.5:9b   # Works without internet
```
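USB transfers of tens of gigabytes can corrupt silently, so it's worth verifying the copy. A sketch using standard checksum tools, with the same paths as the example above:

```bash
# On the source machine: record a checksum for every model file
cd ~/.ollama && find . -type f -exec sha256sum {} + > /media/usb/ollama.sha256

# On the air-gapped machine, after copying: confirm every file matches
cd ~/.ollama && sha256sum -c /media/usb/ollama.sha256
```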
## Best models for offline use
Pick models based on your storage and hardware:
| Model | Download size | RAM needed | Best for |
|---|---|---|---|
| Qwen3.5-0.8B | ~0.5GB | 2GB | Minimal storage, basic tasks |
| Qwen3.5-9B | ~5.5GB | 8GB | Best quality for the size |
| Qwen 2.5 Coder 32B | ~18GB | 24GB | Offline coding assistant |
| Codestral | ~12GB | 16GB | Offline autocomplete |
| DeepSeek R1 7B | ~4GB | 6GB | Offline reasoning |
Download multiple models while you have internet. They sit on disk and don't use memory or CPU until you run them.
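If you want several models from the table in one sitting, a small shell loop saves babysitting each download. The tags match the examples in this guide (the DeepSeek tag is an assumption; check the Ollama library for the exact name):

```bash
# Pull several models back to back while the connection is up
for model in qwen3.5:9b qwen2.5-coder:32b codestral deepseek-r1:7b; do
  ollama pull "$model"
done
```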
## Use cases
**Travel.** Long flights, remote locations, unreliable hotel WiFi. A laptop with Ollama and a few models is a portable AI assistant that works anywhere.

**Secure environments.** Government, military, healthcare, and financial institutions often have air-gapped networks. Self-hosted AI lets these organizations use AI without connecting to external servers.

**Privacy.** Even if you have internet, running offline guarantees your data never leaves your machine. No accidental data leaks, no third-party data retention policies.

**Developing countries.** Unreliable or expensive internet makes cloud AI impractical. Offline models work regardless of connectivity.
## Offline AI in your IDE
Set up Continue + Ollama before going offline:
- Install the Continue extension in VS Code (requires internet)
- Configure it to use Ollama on localhost (a sample config is sketched after this list)
- Download your models
- Go offline; everything continues to work
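For reference, here is one shape that configuration can take. This is a sketch assuming Continue's older JSON config format (newer releases use a config.yaml instead, so adapt it to your version), reusing the model tags from earlier in this guide:

```bash
# Write a minimal Continue config pointing chat and autocomplete at local Ollama
# (assumes the older config.json format; adjust for your Continue version)
mkdir -p ~/.continue
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    { "title": "Qwen Coder (local)", "provider": "ollama", "model": "qwen2.5-coder:32b" }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral (local)", "provider": "ollama", "model": "codestral"
  }
}
EOF
```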
Your coding assistant works on a plane, in a bunker, or in the middle of nowhere.
## Limitations
- **No model updates.** You're stuck with whatever version you downloaded.
- **No web search.** Models can't look up current information.
- **Storage.** Good models are 5-20GB each. Plan your disk space (a quick check is sketched below).
- **Initial download.** The first download needs a solid connection; an 18GB 32B model takes a long time on slow links.
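To see what your model collection actually costs in disk space, check Ollama's data directory (default location shown; yours may differ), and prune what you no longer use:

```bash
# Total disk used by downloaded models
du -sh ~/.ollama/models

# Delete a model you no longer need
ollama rm codestral
```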
## Related
- Best Self-Hosted AI Models in 2026
- How to Run AI Without a GPU
- How to Replace GitHub Copilot for Free
- Self-Hosted AI vs API: When to Pay and When to Run Locally
- Best Cloud GPU Providers in 2026