How Data Brokers Exploit Your GitHub, npm, and Stack Overflow Profiles


Here's something that might ruin your morning coffee: right now, there are companies making money by selling your personal information, scraped directly from your GitHub profile, your npm packages, and your Stack Overflow answers. And in most jurisdictions, they're not breaking any laws doing it.

Data brokers are in the business of aggregation. They crawl public sources, combine fragments of information into comprehensive profiles, and sell those profiles to recruiters, marketers, background check companies, and sometimes much shadier buyers. For developers, the problem is acute because our professional lives are inherently public. You can't be a successful open-source contributor while being invisible.

Let me walk you through exactly how this works, what data they're collecting, and what you can actually do about it.

How Data Brokers Find Developer Profiles

Data brokers don't hack anything. They don't need to. They simply harvest what's publicly available, at scale, using automated scrapers. Here's where they look:

GitHub: The Gold Mine

Your GitHub profile is a treasure trove for data aggregators:

  • Profile information: Name, email, bio, location, company, website URL
  • Commit history: Email addresses in git commits (even old ones you forgot about)
  • Contribution graph: Activity patterns that reveal your schedule and timezone
  • Repository metadata: Technologies you use, projects you're involved in
  • Organization memberships: Your employer, past and present
  • Social connections: Who you follow, who follows you, collaborators

The big one people miss: git commit emails. Even if you've set your GitHub profile email to private, every commit you've ever made contains an email address in the git log. Unless you've used GitHub's no-reply email address from day one, your real email is embedded in repository history forever.

npm and PyPI: Package Registry Exposure

Publishing a package means publishing your identity:

  • package.json often contains the author's name and email
  • PyPI setup.py / pyproject.toml includes author and author_email fields
  • Registry profiles show all packages you maintain
  • Download statistics indicate your influence and reach
  • Changelog/commit links tie back to your GitHub identity

Even if you've since removed your email from package.json, older versions on npm still contain it. Package registries maintain historical versions indefinitely.

Stack Overflow: Professional Profile Building

Stack Overflow profiles provide brokers with:

  • Your real name (most developers use their real names for reputation building)
  • Technical expertise (inferred from tags you answer in)
  • Experience level (inferred from reputation score and answer quality)
  • Location and timezone (from profile settings)
  • Links to GitHub, personal sites, and other profiles
  • Employment history (many devs list current employer)

Other Developer-Specific Sources

  • Conference talk listings: Speaker bios, employer, photo, social links
  • Domain WHOIS records: Home address if you didn't use privacy protection
  • LinkedIn: The most-scraped professional platform, period
  • Blog posts: Real name, opinions, employer mentions
  • Open-source contributor lists: CONTRIBUTORS.md, AUTHORS files
  • Mailing list archives: Email addresses in public list archives

What Data Brokers Actually Collect

Let me paint a picture of what a typical "developer profile" looks like in a broker's database:

Name: John Developer
Emails: john@personal.com, john.dev@company.com, jdev@oldstartup.io
Phone: (555) 123-4567
Address: 123 Code Street, San Francisco, CA 94102
Employer: TechCorp (current), StartupX (2022-2024), BigCo (2019-2022)
Skills: TypeScript, React, Node.js, AWS, Python, Kubernetes
GitHub: github.com/johndev (2,400 followers, 89 repos)
Influence Score: High (top 5% on Stack Overflow)
Salary Estimate: $180,000-$220,000
Education: CS degree, State University
Age: 32
Related people: [family members]
Property records: [home ownership data]

That profile was built entirely from public data sources combined with property records and phone databases. No hacking required.
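To make the aggregation step concrete, here is a minimal Python sketch of how scraped fragments could be linked into one profile whenever they share an email or handle. It is an illustration under assumptions, not any broker's actual pipeline: the record layout, the `link_records` function, and the sample data are all hypothetical.

```python
from collections import defaultdict


def link_records(records):
    """Group records that share any identifier (email or handle).

    Hypothetical record shape: a dict with a 'source' key plus optional
    'emails' and 'handles' lists. Returns a list of merged profiles.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    # Remember the first record seen for each identifier; any later record
    # carrying the same identifier is merged into that record's group.
    first_seen = {}
    for idx, rec in enumerate(records):
        keys = [("email", e.lower()) for e in rec.get("emails", [])]
        keys += [("handle", h.lower()) for h in rec.get("handles", [])]
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    merged = defaultdict(
        lambda: {"sources": set(), "emails": set(), "handles": set()}
    )
    for idx, rec in enumerate(records):
        profile = merged[find(idx)]
        profile["sources"].add(rec["source"])
        profile["emails"].update(e.lower() for e in rec.get("emails", []))
        profile["handles"].update(h.lower() for h in rec.get("handles", []))
    return list(merged.values())


# Made-up fragments from three public sources plus one unrelated record.
records = [
    {"source": "git-commit", "emails": ["john@personal.example"], "handles": ["johndev"]},
    {"source": "npm", "emails": ["john@personal.example", "jdev@oldstartup.example"]},
    {"source": "forum", "handles": ["JohnDev"]},
    {"source": "unrelated", "emails": ["someone@else.example"]},
]
profiles = link_records(records)
```

One shared email is enough to pull the old startup address into the same profile, and one reused handle is enough to attach the forum account. That is why a single leaked identifier tends to connect everything else.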

Who Buys Developer Data and What They Pay

The buyers of developer data include:

  1. Recruiting firms: Pay $0.10-$2.00 per profile for bulk developer data. They use it for cold outreach campaigns.

  2. Marketing companies: Target developers with tool and service ads. Your Stack Overflow activity tells them exactly which tools you use.

  3. Background check companies: Compile reports for employers. Your entire git history becomes part of pre-employment screening.

  4. Scammers and social engineers: Use detailed profiles to craft targeted phishing attacks. Knowing your tech stack makes phishing emails much more convincing.

  5. Data enrichment companies: Buy from one broker, add their own data, sell to another broker at a markup. Your data gets recycled through dozens of companies.

Developer profiles command premium prices because they represent high-income individuals with specific, targetable interests. A generic consumer profile might sell for pennies; a senior developer profile with verified contact info can go for $5-$20 each in bulk.

How to Check If You're Exposed

Before panicking, let's assess the damage. Here's how to check what's already out there:

Step 1: Google Yourself

Start simple. Search for:

  • "Your Name" + developer
  • "Your Name" + GitHub
  • Your email addresses (in quotes)
  • Your phone number

Look beyond page 1. Broker sites often rank on pages 2-5.

Step 2: Check Common Broker Sites

Visit these sites and search for yourself:

  • Spokeo
  • BeenVerified
  • Whitepages
  • PeopleFinders
  • Intelius
  • ZoomInfo (professional/developer focused)
  • Clearbit (tech company focused)

Step 3: Check Your Git History

Run this in any public repository you've contributed to. It lists every author and committer email across all branches:

git log --all --format='%ae%n%ce' | sort -u

Every unique email there is potentially in a broker's database.
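If you want to separate the addresses that actually expose you from GitHub no-reply addresses, a small Python wrapper around the same `git log` call can help. This is a rough sketch: the function names are made up for illustration, and it assumes `git` is on your PATH when you call `emails_in_repo`.

```python
import subprocess

NOREPLY_SUFFIX = "@users.noreply.github.com"


def split_commit_emails(log_output):
    """Split `git log` email output into (exposed, noreply) sorted lists."""
    emails = {
        line.strip().lower()
        for line in log_output.splitlines()
        if line.strip()
    }
    noreply = sorted(e for e in emails if e.endswith(NOREPLY_SUFFIX))
    exposed = sorted(e for e in emails if not e.endswith(NOREPLY_SUFFIX))
    return exposed, noreply


def emails_in_repo(path="."):
    """Collect author and committer emails from every ref in a local clone."""
    result = subprocess.run(
        ["git", "-C", path, "log", "--all", "--format=%ae%n%ce"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_commit_emails(result.stdout)


# Offline example with made-up addresses; duplicates differing only in
# letter case collapse into one entry.
sample = (
    "john@personal.example\n"
    "12345+johndev@users.noreply.github.com\n"
    "John@Personal.example\n"
)
exposed, noreply = split_commit_emails(sample)
```

Calling `emails_in_repo(".")` inside a clone returns the same two lists for the real history. Commits made through GitHub's web interface may show `noreply@github.com` as the committer, which this sketch lists as exposed even though it is not a personal address.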

Step 4: Check npm/PyPI

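Look at your published packages, then check the older versions too. The first command inspects the latest release; the second inspects a specific older one (replace 1.0.0 with a version listed by npm info your-package-name versions):

npm info your-package-name --json | grep -i "email\|author"
npm info your-package-name@1.0.0 --json | grep -i "email\|author"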
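Checking versions one at a time gets tedious, so here is a hedged Python sketch that scans a package's full registry metadata for author emails in every published version. It assumes you have already downloaded the metadata JSON (for npm this is typically served at `https://registry.npmjs.org/<package-name>`) and that it has a `versions` object keyed by version number. The function names and sample data are illustrative only.

```python
import re

# Matches the email in an author string such as "Name <email> (url)".
EMAIL_IN_STRING = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def author_email(author):
    """Return the email from an npm `author` field, or None.

    The field may be an object like {"name": ..., "email": ...} or a
    single string like "Name <email> (url)".
    """
    if isinstance(author, dict):
        return author.get("email")
    if isinstance(author, str):
        match = EMAIL_IN_STRING.search(author)
        return match.group(1) if match else None
    return None


def versions_exposing_email(packument):
    """Map each published version to the author email it carries, if any."""
    exposed = {}
    for version, manifest in packument.get("versions", {}).items():
        email = author_email(manifest.get("author"))
        if email:
            exposed[version] = email
    return exposed


# Made-up metadata: the email was dropped in 2.0.0 but older versions keep it.
sample_packument = {
    "name": "your-package-name",
    "versions": {
        "1.0.0": {"author": "John Developer <john@personal.example>"},
        "1.1.0": {"author": {"name": "John Developer", "email": "john@personal.example"}},
        "2.0.0": {"author": {"name": "John Developer"}},
    },
}
exposed_versions = versions_exposing_email(sample_packument)
```

In the sample, the two 1.x releases still carry the address even though the latest release does not, which is the situation described earlier for packages whose author field was cleaned up later.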

Step 5: WHOIS Lookup

If you own domains, check if WHOIS privacy is active:

whois yourdomain.com | grep -i "registrant"
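WHOIS output varies a lot between registrars, so reading it by eye is error-prone. The Python sketch below applies a simple heuristic to saved `whois` output: it collects the lines whose label starts with "Registrant" and flags any whose value does not look redacted. The marker words are an assumption for illustration, not a standard, so treat a clean result as a hint and not as proof.

```python
# Assumed marker words that suggest a privacy service or redaction.
REDACTION_MARKERS = ("redacted", "privacy", "proxy", "withheld")


def registrant_fields(whois_text):
    """Collect 'Registrant ...: value' lines from raw whois output."""
    fields = {}
    for line in whois_text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        if key.lower().startswith("registrant"):
            fields[key] = value.strip()
    return fields


def exposed_registrant_fields(whois_text):
    """Return registrant fields whose values do not look redacted."""
    exposed = {}
    for key, value in registrant_fields(whois_text).items():
        lowered = value.lower()
        if value and not any(marker in lowered for marker in REDACTION_MARKERS):
            exposed[key] = value
    return exposed


# Made-up output: name and organization are masked, the street is not.
sample = """Domain Name: yourdomain.example
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Privacy Service Example
Registrant Street: 123 Code Street
"""
leaks = exposed_registrant_fields(sample)
```

Here only the street address comes back as exposed. Some registries always publish fields such as country or state, so expect a few harmless hits on real output.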

How to Remove Your Data

You have two paths: manual and automated.

Each data broker has an opt-out process. Some are straightforward web forms. Others require:

  • Mailed physical letters
  • Faxed documents
  • Notarized identity verification
  • Phone calls during business hours
  • Repeated submissions when they "lose" your request

There are hundreds of brokers. Even if each opt-out takes just 10 minutes, you're looking at dozens of hours, which stretches into weeks when done in your spare time. And brokers re-acquire data constantly, so you'd need to repeat this every few months.
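A quick back-of-the-envelope calculation shows the scale. The figures below are assumptions chosen to match the rough numbers above: 200 brokers stands in for "hundreds", and three rounds per year stands in for "every few months".

```python
def opt_out_hours(brokers, minutes_each, rounds_per_year=1):
    """Total hours per year spent on manual opt-out requests."""
    return brokers * minutes_each * rounds_per_year / 60


# Assumed inputs: 200 brokers, 10 minutes per opt-out.
one_pass = opt_out_hours(brokers=200, minutes_each=10)

# Same inputs, repeated three times a year to catch re-listings.
yearly = opt_out_hours(brokers=200, minutes_each=10, rounds_per_year=3)
```

Under these assumptions a single pass is about 33 hours and a year of upkeep is 100 hours. Adjust the inputs to your own situation; brokers that require letters or notarized forms will push the per-request time well past 10 minutes.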

This is where data removal services earn their keep. Incogni handles the entire process automatically: contacting hundreds of brokers, submitting opt-out requests, following up on ignored requests, and fighting rejected claims. It covers the US, UK, EU, Switzerland, and Canada, which matters because developer data crosses borders constantly.

At ~$6.49/month on the annual plan, it costs less than the hourly value of doing one manual opt-out. And it runs continuously, catching re-listings and new brokers.

Preventing Future Exposure

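Removing existing data is half the battle. Here's how to minimize future exposure:

GitHub Settings

  1. Use GitHub's no-reply email for commits: ID+username@users.noreply.github.com (accounts created before July 18, 2017 use username@users.noreply.github.com). Set it locally with git config --global user.email
  2. Turn on "Keep my email addresses private" in GitHub's email settings, and consider "Block command line pushes that expose my email"
  3. Consider whether you need your real name, location, and employer in your bio
  4. Review organization memberships visibility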

Package Registries

  1. Use an email alias for package author fields (see our password manager guide for tools that offer email aliases)
  2. Consider using an org/team account for package publishing
  3. Review and update author fields in existing packages

General Practices

  • Use a VPN to prevent IP-based location tracking
  • Enable WHOIS privacy on all domains
  • Use email aliases for conference registrations and mailing lists
  • Store sensitive documents in encrypted cloud storage
  • Audit your public profiles quarterly

For a comprehensive approach to locking down your online presence, check out our developer privacy checklist.

Data brokers operate legally in most jurisdictions, but regulations are tightening:

  • GDPR (EU): Gives people in the EU the right to request deletion from companies holding their data
  • CCPA/CPRA (California): Similar deletion rights for California residents
  • Various state laws: Colorado, Virginia, Connecticut, and others have passed privacy laws

Understanding privacy laws by region helps you know your rights. The challenge isn't legal; it's practical. You have the right to request deletion, but exercising that right across hundreds of brokers manually is impractical.

This is also relevant if you're building AI tools that handle user data. Our guides on AI code and data privacy and the GDPR guide for developers cover the compliance side.

The Real Risk: Social Engineering

The scariest use of broker data isn't spam emails. It's targeted social engineering. With a detailed developer profile, an attacker can:

  • Craft phishing emails referencing your actual projects ("I found a bug in your-package-name...")
  • Impersonate colleagues they know you work with
  • Reference real conferences you spoke at
  • Target you through family members listed in the same broker databases
  • Bypass security questions using publicly available personal details

This ties directly into securing your API keys and following security checklists, because the weakest link is often not your code, but the human data available to attackers.

Frequently Asked Questions

Can I just make my GitHub profile private?

You can, but it defeats the purpose of having a GitHub profile in the professional sense. A better approach is to sanitize what's there: use no-reply email for commits, remove sensitive details from your bio, and let a data removal service handle what's already been scraped. The historical data in broker databases won't disappear just because you make your profile private today.

Do data brokers comply with GDPR deletion requests?

They're legally required to under GDPR if they hold data on EU residents. In practice, compliance varies wildly. Some process requests within days. Others ignore them, stall, or "lose" the request. This is exactly why services like Incogni that fight rejected claims are valuable: they don't let brokers off the hook.

I only publish under a pseudonym. Am I safe?

Safer, but not immune. If your pseudonym has ever been linked to your real identity anywhere (a conference registration, a payment processor, a domain registration), brokers can and do make that connection. They specialize in linking disparate identities into unified profiles.

How often do brokers re-acquire data after removal?

Frequently. Data broker databases are rebuilt from continuous scraping. A one-time removal might last 3-6 months before your data reappears. This is why continuous monitoring services are important: they catch and re-remove data as it resurfaces.

What about the data in old git commits? Can that be removed?

Unfortunately, rewriting git history on public repositories is impractical for most projects: it changes commit hashes and breaks everyone's clones, and existing forks and mirrors keep the old history anyway. The best approach is: (1) use no-reply emails going forward, (2) accept that historical commit emails are public, and (3) use a data removal service to handle the downstream broker listings that result from that exposure.

Is developer data really worth more than regular consumer data?

Yes, significantly. Developer profiles indicate high income, specific technical interests (useful for targeted marketing), and professional networks. ZoomInfo, for example, charges enterprise clients substantial sums for access to verified tech professional profiles. Your data is literally more valuable because you code for a living.
