Apr 11, 2026 · 5 min read

Last updated on Jun 12, 2026

Where Does Your Code Go? Data Privacy for AI Coding Tools

Every time you press Tab in Cursor or ask Claude Code to fix a bug, your code travels somewhere. Here’s where it goes for each major tool.

The data flow

Your code → Internet → Provider's servers → Model inference → Response → Your IDE
                              ↓
                    Logged? Stored? Used for training?

What each tool sends to the cloud

Not all tools send the same data. Understanding what leaves your machine is the first step to managing risk.

Tool	What’s sent	Context window
Cursor	Current file + open tabs + repo structure	Up to 100K tokens
GitHub Copilot	Current file + neighboring tabs + imports	~8K tokens
Continue.dev (cloud)	Current file + selected context	Configurable
Claude Code	Files you reference + conversation history	Up to 200K tokens
Codex CLI	Files in working directory + git diff	Up to 200K tokens
Aider (cloud model)	Git-tracked files you add to chat	Configurable
Aider (local model)	Same — but stays on your machine	Configurable

Key insight: the larger a tool’s context window, the more of your code a single request can carry. Claude Code and Codex CLI can include entire repository structures in one prompt.
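To get a feel for how much of a repository could fit in one request, here is a rough sketch. It assumes the common rule of thumb of about 4 bytes per token (real tokenizers vary by model and language), counts every regular file outside `.git` (binaries included), and compares the estimate with a 200K-token window. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough sketch: estimate how many tokens a directory's files would occupy.
# ASSUMPTION: ~4 bytes per token, a rule of thumb; real tokenizers differ.
estimate_tokens() {
  local dir="$1" bytes
  # Sum the size of every regular file, skipping the .git directory.
  bytes=$(find "$dir" -type f -not -path '*/.git/*' -exec cat {} + | wc -c)
  echo $(( bytes / 4 ))
}

# Report whether the estimate fits a context window (default: 200K tokens).
fits_in_window() {
  local tokens window="${2:-200000}"
  tokens=$(estimate_tokens "$1")
  if (( tokens <= window )); then
    echo "fits: ~${tokens} of ${window} tokens"
  else
    echo "too large: ~${tokens} tokens > ${window}"
  fi
}
```

Run it as `fits_in_window ~/src/my-repo` (hypothetical path). Tools normally send selected context rather than a whole directory, so this is an upper bound on what one request could carry.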

What’s NOT sent (usually)

.env files (most tools exclude these by default)
.gitignore’d files (Cursor, Copilot respect this)
Binary files
Files you haven’t opened or referenced

But verify this for your specific tool — defaults can change between versions.
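A practical first step in that verification is to inventory the files that commonly hold secrets, so you know which paths to check against your tool’s ignore rules. The sketch below is a hypothetical helper, and its name patterns are illustrative assumptions rather than an exhaustive list; extend them for your stack.

```shell
#!/usr/bin/env bash
# Sketch: list files whose names commonly indicate secrets.
# The patterns are illustrative assumptions, not an exhaustive list.
find_sensitive_files() {
  local dir="${1:-.}"
  find "$dir" -type f -not -path '*/.git/*' \
    \( -name '.env*' -o -name '*.pem' -o -name '*.key' \
       -o -name 'id_rsa*' -o -name 'credentials*' \) \
    -print | sort
}
```

Running `find_sensitive_files .` from a project root gives you a checklist to compare against the tool’s documented exclusions.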

Provider-by-provider breakdown

Anthropic (Claude Code, Claude API)

Where: US servers (AWS)
Retention: 30 days (API), longer for consumer
Training: ❌ Not on API data. ⚠️ Consumer data may be used
DPA: Available for Team/Enterprise plans

OpenAI (Codex CLI, ChatGPT, API)

Where: US servers (Azure)
Retention: 30 days (API), longer for consumer
Training: ❌ Not on API data (since March 2023). ⚠️ ChatGPT data may be used unless opted out
DPA: Available for business plans

Google (Gemini CLI, Vertex AI)

Where: Configurable (US, EU, Asia)
Retention: Configurable
Training: ❌ Not on Vertex AI data. ⚠️ Free Gemini may be used
DPA: Available for Cloud customers

Mistral (Vibe CLI, La Plateforme)

Where: EU servers (France)
Retention: Per DPA terms
Training: ❌ Not on API data
DPA: Available, EU-native

Self-hosted (Ollama, vLLM)

Where: Your machine/server
Retention: You control it
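Training: ❌ Not possible for a provider; the model runs locally
DPA: Not needed with a model provider (if the server is rented from a cloud host, that host’s DPA still applies)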

Data retention by provider

Understanding data retention policies is critical for compliance. Here’s what each provider keeps and for how long:

Provider	API data retention	Logs retained	Can you delete?
Anthropic	30 days	30 days	Yes (API request)
OpenAI	30 days	30 days	Yes (API request)
Google (Vertex)	0 days (configurable)	Configurable	Yes
Mistral	Per contract	Per DPA	Yes
DeepSeek	Unclear	Unclear	Unclear
Local/Self-hosted	You decide	You decide	Instant

⚠️ “30 days retention” means your code can sit on their servers for up to a month. For most companies this is acceptable; for regulated industries, it may not be.

How to audit data flows

You can’t protect what you can’t see. Here’s how to audit what your AI tools are sending:

1. Network monitoring

Use a proxy to inspect outbound requests:

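# Start mitmproxy (it only sees traffic from processes configured to use it)
mitmproxy --mode regular --listen-port 8080

# In the shell that launches your tool, route its traffic through the proxy
export HTTPS_PROXY=http://localhost:8080

# HTTPS inspection only works if the tool trusts mitmproxy's CA certificate,
# generated in ~/.mitmproxy on first run. Node.js-based tools read:
export NODE_EXTRA_CA_CERTS="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"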

2. Tool-specific logging

Several tools have verbose or debug modes that show what they send:

# Aider: verbose output, plus a log of the full conversation sent to the model
aider --verbose --llm-history-file .aider.llm.history

# Continue.dev: check ~/.continue/logs/
cat ~/.continue/logs/core.log

# Claude Code: use --verbose flag
claude --verbose
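Once you have a record of what was sent (a tool’s debug log or text saved from the proxy), you can scan it for credential-shaped strings. The sketch below is a hypothetical helper that checks only two well-known formats: AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits) and PEM private-key headers. Treat it as a spot check, not a full secret scanner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag credential-shaped strings in a captured log.
# Patterns cover only AWS access key IDs and PEM private-key headers.
scan_log_for_secrets() {
  local log="$1"
  # -n: line numbers; -E: extended regex; -o: print only the matched text
  if grep -nEo 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----' "$log"; then
    echo "Possible secrets found in $log" >&2
    return 1
  fi
  return 0
}
```

For example, `scan_log_for_secrets llm-requests.log` (hypothetical file name) prints each match with its line number and returns a non-zero status if anything is found.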

3. Git pre-commit hooks

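Keep secrets out of the files you commit, since tracked files are what tools like Aider work with. A hook only checks commits; AI tools can still read uncommitted files in your working tree.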

#!/bin/bash
# .git/hooks/pre-commit (make it executable: chmod +x .git/hooks/pre-commit)
# Searches the index (tracked + staged content) for likely secret markers.
if git grep --cached -n -E 'API_KEY|SECRET|PASSWORD' -- '*.py' '*.ts'; then
  echo "WARNING: Possible secrets detected in tracked files"
  echo "These files may be sent to AI coding tools"
  exit 1
fi
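The hook’s pattern also fires on harmless references such as `os.environ["API_KEY"]`. A stricter variant, sketched below under the assumption that hard-coded secrets look like `NAME = "literal"`, flags only quoted literals of 8 or more characters assigned to uppercase secret-looking names. The function name, regex, and length threshold are illustrative; expect both misses (e.g. lowercase names) and false positives.

```shell
#!/usr/bin/env bash
# Sketch: flag uppercase secret-looking names assigned a quoted literal
# (8+ chars), e.g. API_KEY = "sk-live-123456", but not os.environ["API_KEY"].
# ASSUMPTION: this regex is illustrative, not a complete secret detector.
find_hardcoded_secrets() {
  # [[:space:]] is used instead of \s for portability across grep versions
  grep -nE "(API_KEY|SECRET|PASSWORD)[A-Z_]*[[:space:]]*[:=][[:space:]]*[\"'][^\"']{8,}[\"']" "$@"
}
```

You could call `find_hardcoded_secrets` on the files you want to check in place of the plain pattern; it exits non-zero when nothing matches, like `grep`.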

4. .cursorignore / .aiignore files

Several tools support ignore files, typically in .gitignore syntax, to exclude sensitive directories:

# .cursorignore / .continueignore
secrets/
.env*
**/credentials*
internal-docs/
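To check what an ignore file actually excludes before relying on it, you can borrow git’s own pattern matcher. This sketch assumes the file uses .gitignore syntax (Cursor’s docs describe `.cursorignore` this way; verify for other tools) and passes it to `git check-ignore` as an extra exclude file via `core.excludesFile`. The repository’s regular `.gitignore` rules are applied too, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preview which paths an ignore file (e.g. .cursorignore) would exclude,
# ASSUMING it uses .gitignore syntax. Run from the root of a git repository.
# Note: the repo's regular .gitignore rules are also applied.
preview_ignored() {
  local ignore_file="$PWD/$1"   # $1 is relative, e.g. .cursorignore
  shift
  # --no-index: also check paths git already tracks
  # -v + --non-matching: print the matching rule, or empty fields if none
  git -c core.excludesFile="$ignore_file" \
    check-ignore --no-index -v --non-matching "$@"
}
```

For example, `preview_ignored .cursorignore .env.local src/main.ts` (hypothetical paths) prints the rule that matched each excluded path, and `::` followed by the path for anything that would still be visible.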

The risk matrix

Scenario	Risk level	Why
Personal project with Cursor	🟢 Low	No sensitive data
Startup using Claude API	🟡 Medium	Need DPA, review terms
Enterprise with customer PII in code	🔴 High	Need DPA + audit + possibly EU hosting
Healthcare/finance codebase	🔴 High	Regulatory requirements beyond GDPR
Using free ChatGPT for work code	🔴 High	No DPA, data may be used for training

Local alternatives

For maximum privacy, run everything locally. The quality gap has narrowed significantly. See our guide on best AI coding agents for privacy for detailed comparisons.

Cloud tool	Local alternative	Quality vs cloud (rough estimate)
GitHub Copilot	Ollama + Codestral 22B + Continue.dev	85-90%
Claude Code	Aider + Qwen 3.5 27B (local)	75-80%
Cursor	Continue.dev + Devstral Small 24B	80-85%
ChatGPT	Ollama + Qwen 3.5 72B	85%

The tradeoff is hardware cost. You need at minimum a GPU with 16GB of VRAM (e.g., RTX 4060 Ti 16GB or RTX 4070 Ti Super) for usable local autocomplete, or 24GB (RTX 4090) for a good chat model. See our self-hosted AI for GDPR guide for compliance-focused setups.
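A rough way to sanity-check VRAM figures like these: model weights alone need about parameters × bits per weight ÷ 8 bytes, so 24B parameters at 4-bit quantization come to about 12 GB. The sketch below assumes that formula and deliberately ignores the KV cache and runtime overhead, which add more on top, so read its output as a lower bound. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: lower-bound VRAM (decimal GB) for model weights only.
# Usage: weights_gb <params_in_billions> <bits_per_weight>
# Ignores KV cache, activations and runtime overhead (ASSUMPTION: lower bound).
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

`weights_gb 24 4` gives 12.0, which leaves some room for context on a 16GB card, while `weights_gb 72 4` gives 36.0, more than a single 24GB card can hold without more aggressive quantization or offloading.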

What to do

For personal projects: Use whatever you want. The risk is minimal.

For company code:

Use API access (not consumer subscriptions)
Get a DPA from your provider
Consider Mistral for EU data residency
Or self-host for maximum control
Audit what’s being sent with network monitoring or tool logs
Use .cursorignore / .continueignore to exclude sensitive directories

For regulated industries: Self-host with Ollama + Devstral Small or Qwen 3.5. No data leaves your network. Document your setup for auditors.
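Part of documenting a self-hosted setup is confirming the model server isn’t reachable from other machines. Ollama listens on 127.0.0.1:11434 unless the `OLLAMA_HOST` environment variable says otherwise. The hypothetical check below parses that variable (assuming a `host[:port]` value, optionally prefixed with a scheme such as `http://`) and warns when the host isn’t a loopback address. It inspects configuration only and does not probe the network.

```shell
#!/usr/bin/env bash
# Hypothetical check: warn if Ollama is configured to listen beyond localhost.
# Ollama's default bind address is 127.0.0.1:11434; OLLAMA_HOST overrides it.
# ASSUMPTION: OLLAMA_HOST looks like host[:port], optionally with a scheme;
# IPv6 values are not handled by this sketch.
check_ollama_bind() {
  local value="${OLLAMA_HOST:-127.0.0.1:11434}" host
  host="${value#*://}"   # drop an optional scheme such as http://
  host="${host%%:*}"     # drop the port
  case "$host" in
    127.*|localhost)
      echo "ok: Ollama bound to loopback ($host)" ;;
    *)
      echo "warning: Ollama may be reachable from the network ($host)"
      return 1 ;;
  esac
}
```

Run `check_ollama_bind` in the environment that starts the server and keep its output with the rest of your audit notes.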

Your personal data is already out there too. Even if you lock down your code’s data flows, your name, email, and address are likely on data broker sites — making you a target for social engineering. Incogni automates removal requests from hundreds of brokers.