Nobody asked “can we actually ship this?” until the lawyers showed up.
For two years, developers have been tab-completing their way through entire features with AI coding tools. Startups brag about AI writing 60% of their codebase. Then a due diligence call happens, an acquirer’s legal team asks “where did this code come from?”, and suddenly everyone needs answers.
This guide covers the three real risks of shipping AI-generated code in production — and exactly what to do about each one.
## The three risks
Every line of AI-generated code in your production system carries three distinct legal risks:
- Copyright gap — your AI-generated code may not be copyrightable, meaning anyone can copy it
- License contamination — the AI may have reproduced GPL-licensed code, infecting your proprietary codebase
- Provenance uncertainty — you can’t prove where the code came from, which kills acquisitions and funding rounds
Let’s break each one down with what you actually need to do.
## Risk 1: Your code might not be copyrightable
The US Copyright Office has been consistent: works generated by AI without meaningful human creative input don’t qualify for copyright protection. That means if a competitor copies your AI-generated function verbatim, you may have no legal recourse.
Why this matters practically:
- Your proprietary business logic might not be protectable
- Trade secret claims become your fallback — but only if you treat the code as secret
- In litigation, opposing counsel will ask which parts were AI-generated
What to do:
- Always modify AI output. Don’t ship raw suggestions. Refactor, rename, restructure. The more human creative input you add, the stronger your copyright claim.
- Document your contributions. Git commits that show iterative human refinement help establish authorship.
- Treat critical business logic as trade secrets. Use access controls, NDAs, and internal classification — don’t rely on copyright alone.
For the full copyright analysis, see our deep dive on who owns AI-generated code.
## Risk 2: Open-source license contamination
This is the risk that keeps legal teams up at night. AI coding assistants were trained on billions of lines of open-source code, including GPL, AGPL, and LGPL repositories. When the model suggests code that closely matches a GPL-licensed snippet, you may have inadvertently introduced copyleft obligations into your proprietary codebase.
The GPL infection scenario:
- Copilot suggests a sorting utility that’s nearly identical to a GPL-licensed implementation
- You ship it in your proprietary SaaS product
- A compliance audit (or a lawsuit) reveals the match
- Under GPL, your entire linked codebase may need to be open-sourced — or you face infringement claims
What helps: GitHub Copilot ships a duplication detection setting (“suggestions matching public code”) that can block suggestions matching public repositories, plus code referencing that surfaces the licenses attached to any match. Turn blocking on.
What to do:
- Turn on public-code match blocking in your AI coding tool’s settings (Copilot, Codeium, etc.)
- Run license scans on every PR — treat AI suggestions like third-party dependencies
- Flag any AI-generated code that touches core product logic for manual review
For more on open-source compliance with AI tools, see our legal compliance guide.
## Risk 3: Provenance and due diligence
When investors or acquirers evaluate your company, they ask: “Can you prove you own your code?”
If the answer is “an AI wrote most of it and we didn’t track which parts,” that’s a material risk finding. It can reduce your valuation, delay a deal, or kill it entirely.
What acquirers and investors ask:
- What percentage of the codebase is AI-generated?
- Which AI tools were used, and what are their terms of service?
- Do you have license scan results for AI-generated code?
- Are there any pending or potential IP claims related to AI training data?
What to do:
- Tag AI-generated code at commit time. Use commit message conventions or git trailers (note the blank line between the subject and the trailer block, which git needs to recognize trailers):

```bash
git commit -m "Add rate limiter middleware

AI-assisted: yes
Tool: GitHub Copilot
Human-review: refactored logic, added edge cases"
```
- Maintain an AI usage log. Track which tools, which features, and rough percentage of AI contribution per module.
- Run provenance scans before any fundraise or acquisition process.
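If you adopt the trailer convention above, the usage log can be assembled mechanically from git history. A minimal sketch in Python; the trailer keys match the commit convention shown earlier, the sample messages are made up, and real use would feed in messages from `git log`:

```python
# Sketch: build an AI usage figure from commit-message trailers.
# Trailer keys (AI-assisted, Tool, Human-review) follow the convention
# shown above; sample commit messages are illustrative only.

def parse_trailers(message: str) -> dict:
    """Collect known 'Key: value' trailer lines from a commit message."""
    trailers = {}
    for line in message.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            if key in {"AI-assisted", "Tool", "Human-review"}:
                trailers[key] = value.strip()
    return trailers

def ai_assisted_share(messages: list[str]) -> float:
    """Fraction of commits whose trailers mark them as AI-assisted."""
    flagged = sum(
        1 for m in messages
        if parse_trailers(m).get("AI-assisted", "").lower() == "yes"
    )
    return flagged / len(messages) if messages else 0.0

commits = [
    "Add rate limiter middleware\n\nAI-assisted: yes\nTool: GitHub Copilot",
    "Fix typo in README",
]
print(ai_assisted_share(commits))  # 0.5
```

Run per module (e.g. per top-level directory) and you have the rough per-module percentage that due diligence questionnaires ask for.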
## How to ship AI code safely: the checklist
Use this before every release that includes AI-generated code:
- AI coding tool’s duplication filter is enabled (blocks suggestions matching known public code, including GPL/AGPL/LGPL)
- All AI-generated code has been human-reviewed and modified
- License scan has been run on the full codebase
- AI-generated commits are tagged with provenance metadata
- Critical business logic has been reviewed for copyright strength
- Data privacy implications have been assessed (no secrets in prompts)
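None of this has to stay manual. The checklist can be encoded as a simple release gate in CI; the item names below are invented for illustration, and how each boolean gets populated (scan results, PR labels, etc.) is up to your pipeline:

```python
# Sketch: the release checklist as a CI gate. Check names are
# illustrative; in practice each value would come from a scan result
# or a required PR label rather than being hard-coded.

RELEASE_CHECKS = {
    "duplication_filter_enabled": True,
    "ai_code_human_reviewed": True,
    "license_scan_passed": True,
    "commits_tagged_with_provenance": False,  # example failing item
    "business_logic_reviewed": True,
    "no_secrets_in_prompts": True,
}

def release_gate(checks: dict[str, bool]) -> list[str]:
    """Return the checklist items that still block the release."""
    return [name for name, ok in checks.items() if not ok]

blockers = release_gate(RELEASE_CHECKS)
if blockers:
    print("Release blocked:", ", ".join(blockers))
```

Exit nonzero when `blockers` is non-empty and the pipeline enforces the checklist on every release.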
## Tools for scanning and compliance
### OSS Review Toolkit (ORT)
A widely used open-source toolkit for license compliance scanning. Run it in CI:
```bash
# Install ORT
git clone https://github.com/oss-review-toolkit/ort.git
cd ort && ./gradlew installDist

# Analyze your project
ort analyze -i /path/to/project -o /path/to/output

# Scan for license violations
ort scan -i /path/to/output/analyzer-result.yml -o /path/to/output
```
### GitHub code scanning
If you’re on GitHub, enable code scanning with the default CodeQL setup. It won’t catch license issues directly, but it flags known vulnerable patterns that often originate from copied code.
### copilot-scanner
A community tool that checks your codebase against known open-source snippets:
```bash
npm install -g copilot-scanner

# Scan your source directory
copilot-scanner scan ./src --threshold 0.9 --output report.json
```
The `--threshold` flag sets the similarity cutoff — 0.9 means flag anything that’s 90%+ similar to known open-source code.
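A similarity threshold like this is typically computed as set overlap between token shingles of the two snippets. The scanner’s actual algorithm isn’t documented here, so the following is only a conceptual sketch (Jaccard similarity over 3-token windows) of what a 0.9 cutoff means:

```python
# Conceptual sketch of a similarity threshold: Jaccard similarity over
# token shingles. Not the scanner's real algorithm, just the idea behind
# "90%+ similar to known open-source code".

def shingles(code: str, k: int = 3) -> set[tuple[str, ...]]:
    """Sliding windows of k consecutive whitespace-split tokens."""
    tokens = code.split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two snippets' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

known_snippet = "def sort(xs): return sorted(xs, key=lambda x: x)"
candidate = "def sort(xs): return sorted(xs, key=lambda x: x)"
print(similarity(known_snippet, candidate) >= 0.9)  # True for a verbatim match
```

Renaming identifiers and restructuring logic, as recommended under Risk 1, is exactly what pushes a snippet below this kind of cutoff.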
### Putting it in CI
```yaml
# .github/workflows/license-check.yml
name: License Compliance
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run license scan
        run: |
          npm install -g copilot-scanner
          copilot-scanner scan ./src --threshold 0.9 --fail-on-match
```
## When NOT to use AI-generated code
Some contexts demand you avoid AI code generation entirely:
- Government contracts — many federal contracts now prohibit or restrict AI-assisted development. Check your contract terms before using any AI tool.
- Safety-critical systems — medical devices, aviation software, autonomous vehicles. These require full audit trails and deterministic code provenance. AI-generated code can’t meet that bar today.
- AGPL/copyleft projects you maintain — ironic, but using AI to contribute to copyleft projects creates circular licensing questions that nobody has resolved yet.
- EU-regulated products — the EU AI Act is introducing transparency requirements for AI-generated artifacts. If your product ships into the EU, track your AI usage now before the requirements are finalized.
## Contract templates: sample clauses
If you’re hiring contractors, working with agencies, or negotiating employment agreements, add AI-assisted work product clauses. Here are starting points (have your lawyer adapt them):
For employment/contractor agreements:
“Work Product” includes any code, documentation, or materials created with the assistance of AI code generation tools. Contractor shall disclose all AI tools used and maintain provenance records for AI-assisted output. Contractor represents that all AI-generated code has been reviewed for license compliance and modified to reflect original creative contribution.
For SaaS/product contracts:
Vendor represents that any AI-generated components of the Software have been scanned for open-source license compliance and do not incorporate copyleft-licensed code without prior written disclosure to Customer.
For acquisition/investment due diligence:
Company shall provide a complete inventory of AI code generation tools used in development, the approximate percentage of AI-assisted code by module, and results of the most recent license compliance scan.