Where Does Your Code Go? Data Privacy for AI Coding Tools
Every time you press Tab in Cursor or ask Claude Code to fix a bug, your code travels somewhere. Here's exactly where it goes for each tool.
The data flow
```
Your code → Internet → Provider's servers → Model inference → Response → Your IDE
                               ↓
                Logged? Stored? Used for training?
```
What each tool sends to the cloud
Not all tools send the same data. Understanding what leaves your machine is the first step to managing risk.
| Tool | What's sent | Context window |
|---|---|---|
| Cursor | Current file + open tabs + repo structure | Up to 100K tokens |
| GitHub Copilot | Current file + neighboring tabs + imports | ~8K tokens |
| Continue.dev (cloud) | Current file + selected context | Configurable |
| Claude Code | Files you reference + conversation history | Up to 200K tokens |
| Codex CLI | Files in working directory + git diff | Up to 200K tokens |
| Aider (cloud model) | Git-tracked files you add to chat | Configurable |
| Aider (local model) | Same, but stays on your machine | Configurable |
Key insight: tools with larger context windows send more of your code per request. Claude Code and Codex CLI can send entire repository structures in a single prompt.
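To get a feel for how much code that is, you can estimate your repository's size in tokens. The sketch below assumes the common rule of thumb of roughly 4 bytes of source text per token. Real tokenizers vary by model and language, so treat the result as an order-of-magnitude figure. `estimate_repo_tokens` is a hypothetical helper name, not part of any tool.

```bash
#!/usr/bin/env bash
# Rough token estimate for every git-tracked file in the current repo.
# Assumption: ~4 bytes per token (a heuristic, not a real tokenizer).
estimate_repo_tokens() {
  local bytes
  # -z / -0 keep file names with spaces intact; 2>/dev/null hides errors
  # for tracked files that were deleted from the working tree.
  bytes=$(git ls-files -z | xargs -0 cat -- 2>/dev/null | wc -c)
  echo $(( bytes / 4 ))
}

# Usage (inside a git repository):
#   estimate_repo_tokens
```

At roughly 4 bytes per token, a 200K-token window corresponds to something like 800 KB of source. That is enough to cover many small and medium repositories in a single request.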
What's NOT sent (usually)
- `.env` files (most tools exclude these by default)
- `.gitignore`'d files (Cursor, Copilot respect this)
- Binary files
- Files you haven't opened or referenced

But verify this for your specific tool: defaults can change between versions.
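For the `.gitignore` part, you can ask Git which files it considers visible. The sketch below lists tracked files plus untracked files that no ignore rule excludes. That list approximates what a tool that respects `.gitignore` could index. It does not account for tool-specific exclusions or settings, and `visible_files` / `is_visible` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Approximate the files a .gitignore-respecting AI tool could read:
# tracked files + untracked files that no .gitignore rule excludes.
visible_files() {
  git ls-files --cached --others --exclude-standard
}

# Exit status 0 if the given path (relative to the repo root) is in that list.
is_visible() {
  visible_files | grep -qxF -- "$1"
}

# Usage: warn if a local .env would be visible to such tools
#   if is_visible .env; then echo "WARNING: .env is not gitignored"; fi
```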
Provider-by-provider breakdown
Anthropic (Claude Code, Claude API)
- Where: US servers (AWS)
- Retention: 30 days (API), longer for consumer products
- Training: ✅ Not on API data. ⚠️ Consumer data may be used
- DPA: Available for Team/Enterprise plans
OpenAI (Codex CLI, ChatGPT, API)
- Where: US servers (Azure)
- Retention: 30 days (API), longer for consumer products
- Training: ✅ Not on API data (since March 2023). ⚠️ ChatGPT data may be used unless you opt out
- DPA: Available for business plans
Google (Gemini CLI, Vertex AI)
- Where: Configurable (US, EU, Asia)
- Retention: Configurable
- Training: ✅ Not on Vertex AI data. ⚠️ Free Gemini data may be used
- DPA: Available for Cloud customers
Mistral (Vibe CLI, La Plateforme)
- Where: EU servers (France)
- Retention: Per DPA terms
- Training: ✅ Not on API data
- DPA: Available, EU-native
Self-hosted (Ollama, vLLM)
- Where: Your machine/server
- Retention: You control it
- Training: ✅ Impossible: the model runs locally
- DPA: Not needed
Data retention by provider
Understanding data retention policies is critical for compliance. Here's what each provider keeps and for how long:
| Provider | API data retention | Logs retained | Can you delete? |
|---|---|---|---|
| Anthropic | 30 days | 30 days | Yes (API request) |
| OpenAI | 30 days | 30 days | Yes (API request) |
| Google (Vertex) | 0 days (configurable) | Configurable | Yes |
| Mistral | Per contract | Per DPA | Yes |
| DeepSeek | Unclear | Unclear | Unclear |
| Local/Self-hosted | You decide | You decide | Instant |
⚠️ "30 days retention" means your code sits on their servers for a month. For most companies this is acceptable. For regulated industries, it may not be.
How to audit data flows
You can't protect what you can't see. Here's how to audit what your AI tools are sending:
1. Network monitoring
Use a proxy to inspect outbound requests:
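```bash
# Start mitmproxy to see all HTTPS traffic from your IDE or CLI tool
mitmproxy --mode regular --listen-port 8080
# In a second terminal: CLI tools usually honor HTTPS_PROXY (IDEs have their own proxy setting)
export HTTPS_PROXY=http://localhost:8080
# HTTPS inspection also requires trusting mitmproxy's CA (created on first run);
# Node.js-based CLIs pick it up from NODE_EXTRA_CA_CERTS
export NODE_EXTRA_CA_CERTS=~/.mitmproxy/mitmproxy-ca-cert.pem
```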
2. Tool-specific logging
Most tools have debug modes that log what they send:
```bash
# Aider: see exactly what's sent to the model
aider --verbose
# Continue.dev: check ~/.continue/logs/
cat ~/.continue/logs/core.log
# Claude Code: use --verbose flag
claude --verbose
```
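Once you have a debug log, a quick grep can show whether anything secret-looking went out. The patterns below are illustrative assumptions, not a complete rule set:

- the `AKIA`-prefixed AWS access key ID format
- a PEM private-key header
- a few generic keywords

`scan_for_secrets` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print lines (with line numbers) in a log file that look like secrets.
# Patterns are illustrative assumptions:
#   AKIA[0-9A-Z]{16}               AWS access key ID format
#   -----BEGIN [A-Z ]*PRIVATE KEY  PEM private key header
#   API_KEY|SECRET|PASSWORD        generic keywords
# Exit status is 0 if anything matched, 1 if the log looks clean.
scan_for_secrets() {
  local log_file=$1
  grep -nE -e 'AKIA[0-9A-Z]{16}' \
           -e '-----BEGIN [A-Z ]*PRIVATE KEY' \
           -e 'API_KEY|SECRET|PASSWORD' -- "$log_file"
}

# Usage with the Continue.dev log from the example above:
#   scan_for_secrets ~/.continue/logs/core.log && echo "Review the lines above"
```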
3. Git pre-commit hooks
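Block commits that add likely secrets to tracked files, which many AI tools pull into context. A hook cannot protect files you never commit.
```bash
#!/bin/bash
# .git/hooks/pre-commit: check for secrets before commit (make it executable with chmod +x)
# git grep --cached searches the staged version of tracked files only,
# so .git/, untracked files and ignored directories are skipped
if git grep --cached -nE 'API_KEY|SECRET|PASSWORD' -- '*.py' '*.ts'; then
  echo "WARNING: Possible secrets detected in tracked files"
  echo "These files may be sent to AI coding tools"
  exit 1
fi
```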
4. .cursorignore / .aiignore files
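Many tools support ignore files to exclude sensitive directories. The file name depends on the tool, e.g. `.cursorignore` for Cursor, `.continueignore` for Continue.dev, `.aiderignore` for Aider: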
```gitignore
# .cursorignore / .continueignore
secrets/
.env*
**/credentials*
internal-docs/
```
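Before relying on an ignore file, it is worth confirming that its patterns match what you expect. Cursor documents `.cursorignore` as using `.gitignore` syntax. Assuming your tool's file does the same, Git's own matcher can test it: point `core.excludesFile` at the file and run `git check-ignore`. The repository's regular `.gitignore` rules are applied too, so a match may come from either file. `is_excluded` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Return 0 if a path would be excluded by an ignore file written in
# .gitignore syntax (assumption), using Git's matcher via core.excludesFile.
# Run from the repository root; regular .gitignore rules also apply.
is_excluded() {
  local ignore_file=$1 path=$2
  # --no-index: also test paths that are already tracked
  git -c core.excludesFile="$PWD/$ignore_file" \
      check-ignore -q --no-index -- "$path"
}

# Usage:
#   for p in secrets/api.pem .env.local src/main.ts; do
#     is_excluded .cursorignore "$p" && echo "excluded: $p" || echo "VISIBLE: $p"
#   done
```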
The risk matrix
| Scenario | Risk level | Why |
|---|---|---|
| Personal project with Cursor | 🟢 Low | No sensitive data |
| Startup using Claude API | 🟡 Medium | Need DPA, review terms |
| Enterprise with customer PII in code | 🔴 High | Need DPA + audit + possibly EU hosting |
| Healthcare/finance codebase | 🔴 High | Regulatory requirements beyond GDPR |
| Using free ChatGPT for work code | 🔴 High | No DPA, data may be used for training |
Local alternatives
For maximum privacy, run everything locally. The quality gap has narrowed significantly. See our guide on best AI coding agents for privacy for detailed comparisons.
| Cloud tool | Local alternative | Quality vs cloud |
|---|---|---|
| GitHub Copilot | Ollama + Codestral 22B + Continue.dev | 85-90% |
| Claude Code | Aider + Qwen 3.5 27B (local) | 75-80% |
| Cursor | Continue.dev + Devstral Small 24B | 80-85% |
| ChatGPT | Ollama + Qwen 3.5 72B | 85% |
The tradeoff is hardware cost. You need at minimum a 16GB GPU (e.g. RTX 4070 Ti Super or RTX 4080; the base RTX 4070 has only 12GB) for usable local autocomplete, or 24GB (RTX 4090) for a good chat model. See our self-hosted AI for GDPR guide for compliance-focused setups.
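Where do those GPU numbers come from? A common back-of-the-envelope rule is that model weights need about parameters × bits per weight ÷ 8 bytes. The sketch assumes 4-bit quantization, a typical choice for local inference. It ignores the KV cache and runtime overhead, which come on top. `weights_gb` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Approximate memory for model weights alone, in GB (10^9 bytes):
#   params (billions) * bits per weight / 8
# Excludes KV cache, activations and runtime overhead.
weights_gb() {
  local params_billion=$1 bits=$2
  awk -v p="$params_billion" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Models from the table above, assuming 4-bit quantization ("name:params")
for model in "Codestral 22B:22" "Devstral Small 24B:24" "Qwen 3.5 27B:27" "Qwen 3.5 72B:72"; do
  name=${model%%:*}; params=${model##*:}
  echo "$name: ~$(weights_gb "$params" 4) GB"
done
```

By this estimate the 22B to 24B models land around 11 to 12 GB at 4-bit, which is why 16GB is a practical floor. A 72B model needs roughly 36 GB for its weights alone, more than a single 24GB card holds. Running it locally therefore means multiple GPUs or offloading layers to system RAM, at a large speed cost.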
What to do
For personal projects: Use whatever you want. The risk is minimal.
For company code:
- Use API access (not consumer subscriptions)
- Get a DPA from your provider
- Consider Mistral for EU data residency
- Or self-host for maximum control
- Audit what's being sent with network monitoring or tool logs
- Use `.cursorignore` / `.continueignore` to exclude sensitive directories
For regulated industries: Self-host with Ollama + Devstral Small or Qwen 3.5. No data leaves your network. Document your setup for auditors.
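"No data leaves your network" also depends on how the model server is exposed. Ollama binds to `127.0.0.1:11434` by default and reads its listen address from the `OLLAMA_HOST` environment variable. Setting it to something like `0.0.0.0` makes the API reachable from other machines. The sketch below is a simple pre-flight check under those assumptions. It only inspects the variable, not the actual listening socket. `ollama_is_loopback_only` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Return 0 if OLLAMA_HOST (default 127.0.0.1:11434) points at a loopback
# address, i.e. the Ollama API is not exposed to other machines.
ollama_is_loopback_only() {
  local host=${OLLAMA_HOST:-127.0.0.1:11434}
  host=${host#http://}; host=${host#https://}   # drop an optional scheme
  host=${host%%/*}                              # drop an optional path
  case "$host" in
    127.*|localhost|localhost:*|"[::1]"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage before starting the server:
#   ollama_is_loopback_only || echo "WARNING: OLLAMA_HOST=$OLLAMA_HOST exposes the API"
```

With the server confined to localhost, Aider can be pointed at it as its documentation describes, e.g. `export OLLAMA_API_BASE=http://127.0.0.1:11434` followed by `aider --model ollama_chat/<model>`.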
Related: AI Data Retention Policies · Best AI Coding Agents for Privacy · Self-Hosted AI for GDPR · Ollama Complete Guide 2026 · AI and GDPR for Developers · Best VPNs for Developers