Where Does Your Code Go? Data Privacy for AI Coding Tools


Every time you press Tab in Cursor or ask Claude Code to fix a bug, your code travels somewhere. Here's exactly where it goes for each tool.

The data flow

Your code → Internet → Provider's servers → Model inference → Response → Your IDE
                              ↓
                    Logged? Stored? Used for training?

What each tool sends to the cloud

Not all tools send the same data. Understanding what leaves your machine is the first step to managing risk.

| Tool | What's sent | Context window |
|---|---|---|
| Cursor | Current file + open tabs + repo structure | Up to 100K tokens |
| GitHub Copilot | Current file + neighboring tabs + imports | ~8K tokens |
| Continue.dev (cloud) | Current file + selected context | Configurable |
| Claude Code | Files you reference + conversation history | Up to 200K tokens |
| Codex CLI | Files in working directory + git diff | Up to 200K tokens |
| Aider (cloud model) | Git-tracked files you add to chat | Configurable |
| Aider (local model) | Same, but stays on your machine | Configurable |

Key insight: tools with larger context windows can send more of your code per request. Claude Code and Codex CLI can send entire repository structures in a single prompt.
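To get a feel for how much code a single request could carry, the sketch below estimates a token count from file sizes. It assumes roughly 4 bytes per token, which is a common rule of thumb and not any provider's real tokenizer, and `estimate_tokens` is a hypothetical helper name.

```shell
# Rough token estimate for a list of files.
# Assumption: ~4 bytes per token (rule of thumb, not a real tokenizer).
estimate_tokens() {
  local total size f
  total=0
  for f in "$@"; do
    # Skip anything that is not a regular file (directories, missing paths)
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo $((total / 4))
}
```

For example, `estimate_tokens src/*.py` gives a ballpark figure to compare against the context windows listed in the table above.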

What's NOT sent (usually)

  • .env files (most tools exclude these by default)
  • .gitignore'd files (Cursor, Copilot respect this)
  • Binary files
  • Files you haven't opened or referenced

But verify this for your specific tool; defaults can change between versions.
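One way to start that verification is to list the files in a project whose names look sensitive, then check them against what your tool reports sending. The sketch below is a starting point only: the patterns are illustrative assumptions, and `find_sensitive_paths` is a hypothetical helper name.

```shell
# Print paths from stdin whose file name looks sensitive.
# The patterns are illustrative assumptions; adjust them for your project.
find_sensitive_paths() {
  local path name
  while IFS= read -r path; do
    name=${path##*/}   # strip the directory part, keep the file name
    case "$name" in
      .env|.env.*|*.pem|*.key|*credentials*|id_rsa*)
        printf '%s\n' "$path" ;;
    esac
  done
}
```

Running `git ls-files | find_sensitive_paths` shows which Git-tracked files match; anything it prints is worth excluding explicitly rather than relying on defaults.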

Provider-by-provider breakdown

Anthropic (Claude Code, Claude API)

  • Where: US servers (AWS)
  • Retention: 30 days (API), longer for consumer
  • Training: ❌ Not on API data. ⚠️ Consumer data may be used
  • DPA: Available for Team/Enterprise plans

OpenAI (Codex CLI, ChatGPT, API)

  • Where: US servers (Azure)
  • Retention: 30 days (API), longer for consumer
  • Training: ❌ Not on API data (since March 2023). ⚠️ ChatGPT data may be used unless opted out
  • DPA: Available for business plans

Google (Gemini CLI, Vertex AI)

  • Where: Configurable (US, EU, Asia)
  • Retention: Configurable
  • Training: ❌ Not on Vertex AI data. ⚠️ Free Gemini may be used
  • DPA: Available for Cloud customers

Mistral (Vibe CLI, La Plateforme)

  • Where: EU servers (France)
  • Retention: Per DPA terms
  • Training: ❌ Not on API data
  • DPA: Available, EU-native

Self-hosted (Ollama, vLLM)

  • Where: Your machine/server
  • Retention: You control it
  • Training: ❌ Not possible; the model runs locally
  • DPA: Not needed

Data retention by provider

Understanding data retention policies is critical for compliance. Here's what each provider keeps and for how long:

| Provider | API data retention | Logs retained | Can you delete? |
|---|---|---|---|
| Anthropic | 30 days | 30 days | Yes (API request) |
| OpenAI | 30 days | 30 days | Yes (API request) |
| Google (Vertex) | 0 days (configurable) | Configurable | Yes |
| Mistral | Per contract | Per DPA | Yes |
| DeepSeek | Unclear | Unclear | Unclear |
| Local/Self-hosted | You decide | You decide | Instant |

⚠️ "30 days retention" means your code can sit on their servers for up to a month. For most companies this is acceptable. For regulated industries, it may not be.

How to audit data flows

You can't protect what you can't see. Here's how to audit what your AI tools are sending:

1. Network monitoring

Use a proxy to inspect outbound requests:

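# Run mitmproxy to inspect HTTPS traffic from your IDE
mitmproxy --mode regular --listen-port 8080

# Point your IDE/tool at the proxy (in the shell that launches it)
export HTTPS_PROXY=http://localhost:8080

# The tool must also trust mitmproxy's CA certificate, or TLS connections will fail.
# For Node-based tools, for example:
export NODE_EXTRA_CA_CERTS="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"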
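Once traffic flows through the proxy, it helps to separate AI provider requests from everything else. The sketch below does that for a plain list of hostnames, one per line, for example copied out of the proxy's flow list. The host list is an illustrative assumption and certainly incomplete, and both function names are hypothetical.

```shell
# Succeed if a hostname belongs to a known AI provider API.
# The host list is illustrative and incomplete; extend it for your tools.
is_ai_endpoint() {
  case "$1" in
    api.anthropic.com|api.openai.com|api.mistral.ai|generativelanguage.googleapis.com)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Read hostnames (one per line) on stdin and print a request count
# per AI endpoint, most frequent first.
count_ai_requests() {
  local host
  while IFS= read -r host; do
    if is_ai_endpoint "$host"; then
      printf '%s\n' "$host"
    fi
  done | sort | uniq -c | sort -rn
}
```

A host that shows up here but is missing from your list of approved tools is a good first thing to investigate.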

2. Tool-specific logging

Most tools have debug modes that log what they send:

# Aider: see exactly what's sent to the model
aider --verbose

# Continue.dev: check ~/.continue/logs/
cat ~/.continue/logs/core.log

# Claude Code: use --verbose flag
claude --verbose

3. Git pre-commit hooks

Keep secrets out of tracked files, since AI tools can read them:

#!/bin/bash
# .git/hooks/pre-commit - block commits whose staged files contain likely secrets
# git grep --cached searches the index, so only tracked and staged content is checked
if git grep --cached -nE "API_KEY|SECRET|PASSWORD" -- '*.py' '*.ts'; then
  echo "WARNING: Possible secrets detected in tracked files"
  echo "These files may be sent to AI coding tools"
  exit 1
fi

4. .cursorignore / .aiignore files

Most tools support ignore files to exclude sensitive directories:

# .cursorignore / .continueignore
secrets/
.env*
**/credentials*
internal-docs/
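If you use more than one tool, keeping their ignore files in sync by hand is error-prone. Below is a minimal sketch that writes the same patterns to both files. It assumes both tools accept the same gitignore-style patterns, so check each tool's documentation; `write_ignore_files` is a hypothetical helper.

```shell
# Write the same ignore patterns to each tool's ignore file in a directory.
# Assumption: both tools accept gitignore-style patterns.
# Note: existing ignore files in that directory are overwritten.
write_ignore_files() {
  local dir name
  dir=$1
  shift
  for name in .cursorignore .continueignore; do
    printf '%s\n' "$@" > "$dir/$name"
  done
}
```

For example, `write_ignore_files . 'secrets/' '.env*' '**/credentials*' 'internal-docs/'` reproduces the patterns above in both files.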

The risk matrix

| Scenario | Risk level | Why |
|---|---|---|
| Personal project with Cursor | 🟢 Low | No sensitive data |
| Startup using Claude API | 🟡 Medium | Need DPA, review terms |
| Enterprise with customer PII in code | 🔴 High | Need DPA + audit + possibly EU hosting |
| Healthcare/finance codebase | 🔴 High | Regulatory requirements beyond GDPR |
| Using free ChatGPT for work code | 🔴 High | No DPA, data may be used for training |

Local alternatives

For maximum privacy, run everything locally. The quality gap has narrowed significantly. See our guide on best AI coding agents for privacy for detailed comparisons.

| Cloud tool | Local alternative | Quality vs cloud |
|---|---|---|
| GitHub Copilot | Ollama + Codestral 22B + Continue.dev | 85-90% |
| Claude Code | Aider + Qwen 3.5 27B (local) | 75-80% |
| Cursor | Continue.dev + Devstral Small 24B | 80-85% |
| ChatGPT | Ollama + Qwen 3.5 72B | 85% |

The tradeoff is hardware cost. You need at minimum a GPU with 16 GB of VRAM (for example, an RTX 4070 Ti Super or RTX 4080; the base RTX 4070 has only 12 GB) for usable local autocomplete, or 24 GB (RTX 4090) for a good chat model. See our self-hosted AI for GDPR guide for compliance-focused setups.
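These GPU sizes follow from simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below uses 4-bit quantization in its examples, which is an assumption about how you run the model, and it ignores the KV cache and runtime overhead, which need extra memory on top. `weights_gb` is a hypothetical helper.

```shell
# Approximate memory needed for model weights, in GB.
#   $1 = parameters in billions, $2 = bits per weight
# Formula: params * bits / 8. Excludes KV cache and runtime overhead.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# weights_gb 22 4   -> 11.0  (a 22B model at 4-bit)
# weights_gb 24 4   -> 12.0  (a 24B model at 4-bit)
```

This is why a 22B to 24B model is tight on 16 GB once context is added, and more comfortable on 24 GB.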

What to do

For personal projects: Use whatever you want. The risk is minimal.

For company code:

  1. Use API access (not consumer subscriptions)
  2. Get a DPA from your provider
  3. Consider Mistral for EU data residency
  4. Or self-host for maximum control
  5. Audit what's being sent with network monitoring or tool logs
  6. Use .cursorignore / .continueignore to exclude sensitive directories

For regulated industries: Self-host with Ollama + Devstral Small or Qwen 3.5. No data leaves your network. Document your setup for auditors.

Related: AI Data Retention Policies · Best AI Coding Agents for Privacy · Self-Hosted AI for GDPR · Ollama Complete Guide 2026 · AI and GDPR for Developers · Best VPNs for Developers