Open-Sourcing AI Code Detection: How We Built Rules to Detect Claude Code, GitHub Copilot, and Cursor

We built YAML-based detection rules for identifying AI-generated pull requests and open-sourced them. Here's how different AI tools leave fingerprints, which ones were easy (and which were hard) to detect, and why it matters for engineering teams.

Coderbuds Team

About 46% of code on GitHub is now AI-assisted (GitHub Octoverse 2024). That's nearly half.

For engineering teams, this raises an important question: which pull requests were written with AI tools, and which weren't?

We've been tracking AI adoption across development teams (here's why we chose the PR-based approach), and we built a detection system based on the explicit markers AI tools leave behind. Today we're open-sourcing those detection rules so everyone can benefit.

#Try It: Paste Your PR URL

Before we dive into how it works, test it yourself: paste any public GitHub pull request URL into the demo on this page. It uses the open-source detection package in real time, and your PR data isn't stored.

#Why Detecting AI PRs Matters

Understanding which PRs used AI tools helps teams:

  • Track adoption patterns - Is your team actually using the AI tools you paid for?
  • Understand velocity changes - Are AI tools helping or creating noise?
  • Review differently - AI-generated code often needs a different review focus (fewer syntax errors, more logic validation)
  • Identify training opportunities - Which team members aren't using available tools?
  • Measure ROI - Are expensive AI subscriptions worth it?

We wrote more about why we track AI at the PR level instead of API integrations if you're curious about the approach.

#How We Built the Detection Rules

AI coding tools leave fingerprints. Some are obvious (GitHub Copilot literally commits as github-copilot[bot]); others took serious detective work.

#The Easy Ones: Bot Commits and Explicit Footers

GitHub Copilot was the easiest. When you use Copilot's workspace feature or commit suggestions, it shows up as a bot author:

# github-copilot.yml
bot_authors:
  - pattern: 'github-copilot[bot]'
    location: commit_author
    confidence: 100
    description: "GitHub Copilot bot author"

100% confidence. Zero false positives. Beautiful.

Claude Code also makes it easy. Anthropic automatically adds a footer to PRs created with Claude Code:

# claude-code.yml
commit_footers:
  - pattern: '\[Claude Code\]\(https://claude\.com/claude-code\)'
    regex: true
    confidence: 100
    description: "Claude Code footer in PR description"

Every PR created with Claude Code includes:

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Again, definitive. No ambiguity.
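If you want to check these two markers by hand, the logic is only a few lines. A minimal sketch in Python (the patterns are copied from the rules above; the function names are ours, not part of the package):

import re

CLAUDE_FOOTER = r'\[Claude Code\]\(https://claude\.com/claude-code\)'  # regex from claude-code.yml
COPILOT_BOT = 'github-copilot[bot]'  # plain substring from github-copilot.yml

def is_claude_code_pr(pr_description: str) -> bool:
    # The Claude Code footer is a regex match against the PR description
    return re.search(CLAUDE_FOOTER, pr_description) is not None

def is_copilot_bot_commit(commit_author: str) -> bool:
    # The Copilot marker is an exact substring of the commit author
    return COPILOT_BOT in commit_author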

#The Harder Ones: Cursor and Tool Rebranding

Cursor was trickier. It doesn't commit as a bot, and early versions didn't add any markers at all.

But we noticed patterns:

  • Some developers added cursor- prefixes to branch names
  • PRs occasionally had <!-- Cursor --> HTML comments
  • The Cursor team eventually added an optional footer
# cursor.yml
commit_footers:
  - pattern: 'cursor\.com'
    regex: false
    confidence: 90
    description: "Cursor footer or reference"

branch_patterns:
  - pattern: '^cursor[-/]'
    case_insensitive: true
    confidence: 70
    description: "Branch name starts with cursor-"

Lower confidence, but still useful. We're conservative—if we're not sure, we don't flag it.
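As an illustration of that policy, here is one way the cursor.yml markers could be evaluated, surfacing only the strongest signal. This is a sketch of the idea, not the package's exact aggregation logic:

import re

def check_cursor(pr_description: str, branch_name: str):
    """Evaluate the cursor.yml markers and keep only the strongest match."""
    matches = []
    # Footer/reference pattern (confidence 90, plain substring)
    if 'cursor.com' in pr_description:
        matches.append({'signal': 'footer', 'confidence': 90})
    # Branch prefix pattern (confidence 70, case-insensitive regex)
    if re.search(r'^cursor[-/]', branch_name, re.IGNORECASE):
        matches.append({'signal': 'branch', 'confidence': 70})
    # Conservative: surface the strongest signal, or nothing at all
    return max(matches, key=lambda m: m['confidence'], default=None)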

#The Confusing One: Codex Everywhere

Here's where it gets interesting. OpenAI's Codex powers multiple tools:

  • GitHub Copilot uses Codex (though now also GPT-4)
  • Cursor can use Codex
  • Custom integrations use Codex directly

So when we see codex in branch names or commit messages, which tool was it?

# openai-codex.yml
branch_patterns:
  - pattern: '^codex[-/]'
    case_insensitive: true
    confidence: 60
    description: "Branch name contains codex prefix"

text_patterns:
  - pattern: 'codex-'
    location: description
    case_insensitive: true
    confidence: 60
    description: "Description mentions codex-"

We categorize it as "OpenAI Codex" but acknowledge uncertainty. The same model showing up in different contexts makes precise attribution impossible.
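One way to handle that ambiguity is to stop forcing a single answer: collect every tool whose branch pattern matches and let the caller see that the same branch is consistent with several tools. A rough sketch, assuming a rules dict loaded the same way as in the usage example later in this post:

import re

def branch_candidates(branch_name: str, rules: dict) -> list:
    """Return every tool whose branch pattern matches, instead of guessing one."""
    candidates = []
    for tool_id, rule in rules.items():
        for marker in rule.get('explicit_markers', {}).get('branch_patterns', []):
            flags = re.IGNORECASE if marker.get('case_insensitive') else 0
            if re.search(marker['pattern'], branch_name, flags):
                candidates.append({'tool': rule['tool']['name'],
                                   'confidence': marker['confidence']})
    return candidates  # a 'codex/refactor-auth' branch may legitimately return several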

#The Newer Ones: Community Patterns

Tools like WindSurf (Codeium), Aider, v0.dev (Vercel), and Replit AI are newer to our detection system. We've created rules based on:

  • Expected patterns (similar to other tools)
  • Developer conventions (branch naming, labels)
  • Common markers these tools could leave
# windsurf.yml
commit_footers:
  - pattern: 'WindSurf'
    regex: false
    confidence: 95
    description: "WindSurf mention in PR description"

These haven't been validated against large datasets yet, but the patterns are conservative: they won't produce false positives. This is where community contributions matter. If you use these tools and see different patterns, submit a PR!

#The Impossible Ones: When Detection Breaks Down

Here's the uncomfortable truth: some AI usage is nearly impossible to detect with explicit markers alone.

Take OpenAI Codex. It powers GitHub Copilot, appears in Cursor, and runs in custom integrations. When we see a branch named codex/refactor-auth, which tool created it?

Our current rules for Codex detection are weak by design:

# openai-codex.yml
branch_patterns:
  - pattern: '^codex[-/]'
    confidence: 60
    description: "Branch name contains codex prefix"

60% confidence. That's barely better than guessing.

Why keep these rules at all? Because transparency beats pretending we can detect everything. When we flag something as "OpenAI Codex - 60% confidence," teams know it's an educated guess, not a certainty.

The bigger challenge: Many developers use AI tools without ANY explicit markers:

  • Cursor with default settings (no footer added)
  • GitHub Copilot inline suggestions (no commit metadata)
  • ChatGPT copy-paste workflows (completely invisible)

For these cases, we need behavioral analysis - looking at commit patterns, code style shifts, PR description quality, and temporal patterns. That's why our platform combines:

  1. YAML rules - Free, fast, definitive (when markers exist)
  2. AI behavioral analysis - Paid, slower, probabilistic (when markers don't)
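Conceptually, the two tiers compose like this. The sketch below is illustrative only: detect_ai is the rule-based check shown in the usage example later in this post, and analyze_behavior is a hypothetical stand-in for the proprietary behavioral tier.

def analyze_behavior(pr_description, commits, branch_name):
    # Placeholder for the paid, probabilistic behavioral analysis
    return {'tool': 'unknown', 'confidence': 'low'}

def classify_pr(pr_description, commits, branch_name):
    # Tier 1: explicit markers from the YAML rules (fast, free, definitive)
    result = detect_ai(pr_description, commits)
    if result is not None:
        return result
    # Tier 2: no markers found, fall back to behavioral analysis
    return analyze_behavior(pr_description, commits, branch_name)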

How we're improving detection:

  • Community patterns - The more developers add detection rules, the better coverage gets
  • Tool partnerships - Working with AI tool makers to standardize attribution (like Claude Code's footer)
  • Better heuristics - Researching commit patterns that indicate AI usage without explicit markers
  • Validation datasets - Building open datasets of confirmed AI/human PRs to test accuracy

The goal isn't perfect detection today—it's building a system that gets better as AI tools evolve and the community contributes.

#Enabling Explicit Attribution: The Best Path to Detection

The most reliable way to detect AI-assisted PRs is when the tools leave explicit markers. Here's how to enable attribution in popular AI coding tools:

#Claude Code (Easiest)

Claude Code has built-in support for co-author attribution. Enable it globally:

# Enable co-author attribution globally
claude config set --global commitCoAuthor true

Or add to your project's CLAUDE.md:

# AI Attribution
Always include co-author attribution in commits.

This automatically adds to every commit:

Co-Authored-By: Claude <noreply@anthropic.com>

Detection coverage: 100% - Every PR will be detected with definitive confidence.

#Cursor

Cursor doesn't have a built-in attribution setting, but you can encourage it via .cursor/rules (or .cursorrules):

## Git Commit Guidelines

When creating or suggesting commit messages, always include AI attribution:

Co-Authored-By: Cursor <noreply@cursor.com>

This attribution should be added to all commit messages to maintain transparency
about AI-assisted development.

Note: Cursor's rules primarily affect code generation. Commit message attribution may not be consistently applied. Consider also enabling Cursor's optional PR footer in settings.

Detection coverage: ~70% - Depends on whether Cursor follows the instruction.

#GitHub Copilot

GitHub Copilot's inline suggestions don't leave markers, but Copilot Workspace commits as github-copilot[bot], which is detected automatically.

For inline suggestions, you can add instructions via .github/copilot-instructions.md:

## Commit Attribution

When suggesting commit messages, include:

Co-Authored-By: GitHub Copilot <noreply@github.com>

Detection coverage: Variable - Workspace features are 100% detected; inline suggestions depend on manual attribution.

#OpenAI Codex CLI

OpenAI Codex CLI doesn't have native co-author attribution yet (it's a requested feature), but you can use a workaround.

Manual workaround - modify the git author name when committing:

# Add (codex) suffix to author name for this commit
GIT_AUTHOR_NAME="$(git config user.name) (codex)" git commit -m "Your message"

Or create a git alias in ~/.gitconfig:

[alias]
    codex-commit = "!f() { GIT_AUTHOR_NAME=\"$(git config user.name) (codex)\" git commit \"$@\"; }; f"

Then use: git codex-commit -m "Your message"

Detection coverage: ~60% - Until native attribution is added, detection relies on manual practices or branch naming conventions.

#Aider

Aider automatically adds co-author attribution by default. Ensure it's enabled in your .aider.conf.yml:

# Enable commit attribution (default: true)
attribute-author: true
attribute-committer: true

Detection coverage: 100% - When attribution is enabled.

#Team-Wide Adoption

For engineering teams wanting consistent AI detection:

  1. Add tool configs to your repo - Include .cursorrules, CLAUDE.md, etc. in your repository
  2. Document the expectation - Add to your CONTRIBUTING.md that AI attribution is required
  3. Use pre-commit hooks - Validate that AI-assisted commits include attribution
  4. Monitor adoption - Track which PRs are detected vs. undetected to identify gaps

Example pre-commit hook (.git/hooks/prepare-commit-msg):

#!/bin/bash
# Remind developers to add AI attribution if they used AI tools

if ! grep -q "Co-Authored-By:" "$1"; then
  echo ""
  echo "💡 If you used AI tools for this commit, please add attribution:"
  echo "   Co-Authored-By: Claude <noreply@anthropic.com>"
  echo "   Co-Authored-By: Cursor <noreply@cursor.com>"
  echo "   Co-Authored-By: GitHub Copilot <noreply@github.com>"
  echo ""
fi
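For step 4 above (monitoring adoption), a small script can report how many recent commits actually carry attribution. A rough sketch using git log; the 200-commit window and the function name are arbitrary choices of ours:

import subprocess

def attribution_coverage(n: int = 200) -> float:
    """Fraction of the last n commits whose message contains a Co-Authored-By trailer."""
    # One NUL-separated record per commit so multi-line bodies stay grouped
    out = subprocess.run(
        ['git', 'log', f'-{n}', '--pretty=format:%B%x00'],
        capture_output=True, text=True, check=True,
    ).stdout
    messages = [m for m in out.split('\x00') if m.strip()]
    # Case-insensitive match, since trailers appear as both Co-Authored-By and Co-authored-by
    attributed = sum('co-authored-by:' in m.lower() for m in messages)
    return attributed / len(messages) if messages else 0.0

# Example: print(f"{attribution_coverage():.0%} of recent commits carry a co-author trailer")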

#Why Attribution Matters

Beyond detection, explicit AI attribution provides:

  • Transparency - Reviewers know which code may need different scrutiny
  • Audit trails - Track AI usage for compliance and security reviews
  • Learning opportunities - Identify which tasks benefit most from AI assistance
  • Accurate metrics - Measure true AI adoption across your organization

The bottom line: If your team uses AI tools, enabling explicit attribution is the single most effective way to improve detection accuracy from ~60% to nearly 100%.

#The YAML Approach: Why Not Just Code?

We started with hardcoded PHP for detection. 247 lines of if/else chains. Every new tool meant editing code, testing, deploying.

YAML rules changed everything:

tool:
  id: devin
  name: Devin
  provider: Cognition AI
  website: https://devin.ai

explicit_markers:
  bot_authors:
    - pattern: 'devin-ai[bot]'
      location: commit_author
      confidence: 100
      description: "Devin bot author"

  commit_footers:
    - pattern: 'Generated by Devin'
      regex: false
      confidence: 100
      description: "Devin footer in PR description"

Benefits:

  • Non-developers can contribute - No PHP/Python/Node knowledge needed
  • Update without deploying - Pull latest rules, done
  • Framework-agnostic - Works in any language that can parse YAML
  • Easy to review - PRs are readable by everyone

Want to add a new AI tool? Create a YAML file, submit a PR. That's it.
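If you're adding a rule file, a few lines of Python can sanity-check it before you open the PR. This is a rough validator sketch of our own against the schema shown in the contributing section below, not the package's official tooling:

import yaml

REQUIRED_TOOL_FIELDS = {'id', 'name', 'provider', 'website'}

def validate_rule_file(path: str) -> list:
    """Return a list of problems in a detection rule file (empty list = looks valid)."""
    problems = []
    with open(path) as f:
        data = yaml.safe_load(f)
    missing = REQUIRED_TOOL_FIELDS - set(data.get('tool', {}))
    if missing:
        problems.append(f"tool section missing fields: {sorted(missing)}")
    markers = data.get('explicit_markers', {})
    if not markers:
        problems.append("no explicit_markers defined")
    for category, entries in markers.items():
        for entry in entries:
            if 'pattern' not in entry or 'confidence' not in entry:
                problems.append(f"{category}: entry needs both pattern and confidence")
            elif not (0 < entry['confidence'] <= 100):
                problems.append(f"{category}: confidence must be between 1 and 100")
    return problems

# Example: print(validate_rule_file('rules/your-tool.yml') or 'OK')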

#What We're Open-Sourcing

The coderbuds/ai-detector package includes:

  • Detection rules for 9+ AI coding tools
  • Pattern matching categories:
    • Commit footers (PR description signatures)
    • Co-author attributions (commit metadata)
    • Bot authors (bot email addresses and usernames)
    • Branch patterns (naming conventions)
    • PR labels
    • HTML comments
  • MIT License - Use freely, commercially or personally

We're keeping our AI behavioral analysis prompts proprietary (the system that detects AI usage when there are NO explicit markers), but the rules that catch ~75% of AI-assisted PRs are now public.

#Why Open Source?

Simple: AI tools evolve faster than any one company can track.

New tools launch monthly. Existing tools change their signatures. Cursor switched from no markers to optional markers. Claude Code updated their footer format. GitHub Copilot added workspace features.

With an open-source package:

  • Developers using new tools can add detection rules
  • The community keeps rules updated as tools change
  • Everyone benefits from better detection

We'd rather have 100 contributors keeping the rules current than try to monitor every AI tool ourselves.

#How Detection Actually Works

Two-tier system:

Tier 1: YAML Rules (Fast & Free)

Check for explicit markers:

  • Scan PR description for tool footers
  • Check commit authors for bot emails
  • Analyze branch names for patterns
  • Look for co-author attributions

Result: ~75% of AI-assisted PRs detected instantly with 100% confidence.

Tier 2: Behavioral Analysis (When Needed)

For PRs without explicit markers, we use OpenAI to analyze:

  • Commit message patterns (verbose, perfect grammar, consistent structure)
  • PR description quality (structured, detailed, formatted)
  • Temporal patterns (burst commits, rapid development)
  • Code style consistency

Result: High/Medium/Low confidence estimates for ambiguous cases.
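The prompts behind that analysis are proprietary, but to give a feel for the kinds of signals involved, here is a deliberately naive illustration. None of these thresholds come from the real system:

def behavioral_hint(pr_description: str, commit_messages: list) -> str:
    """Toy scorer mapping a couple of weak signals to High/Medium/Low buckets."""
    score = 0
    # Heavily structured, formatted PR descriptions are one weak signal
    if pr_description.count('\n- ') >= 3 or '## ' in pr_description:
        score += 1
    # Unusually long, uniform commit messages are another
    if commit_messages:
        avg_len = sum(len(m) for m in commit_messages) / len(commit_messages)
        if avg_len > 200:
            score += 1
    return {0: 'Low', 1: 'Medium'}.get(score, 'High')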

This two-tier approach keeps costs low (most detection is free) while catching edge cases.

#Current Detection Coverage

As of December 2025, we detect:

| Tool           | Provider     | Detection Method              | Accuracy |
|----------------|--------------|-------------------------------|----------|
| Claude Code    | Anthropic    | Footer, co-author, bot email  | 100%     |
| GitHub Copilot | Microsoft    | Bot commits, co-author        | 100%     |
| Cursor         | Anysphere    | Footer, branch patterns       | 90%      |
| Devin          | Cognition AI | Bot author, footer            | 100%*    |
| WindSurf       | Codeium      | Footer, attribution           | 95%*     |
| OpenAI Codex   | OpenAI       | Branch patterns, markers      | 60%†     |
| Aider          | Open Source  | Commit patterns               | 85%*     |
| v0.dev         | Vercel       | Markers, comments             | 90%*     |
| Replit AI      | Replit       | Bot author, markers           | 95%*     |

*Estimated accuracy based on pattern strength. Not yet validated against real-world PRs. †Lower because Codex appears in multiple tools, making precise attribution difficult.

Missing a tool? Submit a PR - we accept all valid detection patterns.

#Real-World Accuracy

We tested the core YAML rules (Claude Code, GitHub Copilot, Cursor) against 1,000+ pull requests created by our own team during platform development:

  • 100% agreement with our previous detection system
  • 0% false positives on PRs marked as AI-assisted
  • Perfect tool identification for PRs with explicit markers
  • ~2% false negatives (AI PRs without markers, caught by behavioral analysis)

For PRs with explicit attribution, detection is instant and certain. We've now rolled this detection system out across the CoderBuds platform for all teams.

Note: Newer tools (Devin, WindSurf, Aider, v0.dev, Replit AI) have pattern-based rules but haven't been validated against large datasets yet. Accuracy estimates are conservative based on pattern specificity.

#Using the Package

The rules are framework-agnostic. Here's Python:

import yaml
import re
from pathlib import Path

# Load all rules
rules = {}
for rule_file in Path('rules').glob('*.yml'):
    with open(rule_file) as f:
        data = yaml.safe_load(f)
        rules[data['tool']['id']] = data

def matches(marker, text):
    """Apply a marker to text, honoring the optional regex flag."""
    if marker.get('regex'):
        return re.search(marker['pattern'], text) is not None
    return marker['pattern'] in text

def detect_ai(pr_description, commits):
    """Detect AI tool usage in a pull request.

    commits: list of commit author strings, e.g. 'github-copilot[bot]'
    """
    for tool_id, rule in rules.items():
        markers = rule.get('explicit_markers', {})
        # Check tool footers in the PR description
        for marker in markers.get('commit_footers', []):
            if matches(marker, pr_description):
                return {'tool': rule['tool']['name'], 'confidence': marker['confidence']}
        # Check bot authors in commit metadata
        for marker in markers.get('bot_authors', []):
            if any(marker['pattern'] in author for author in commits):
                return {'tool': rule['tool']['name'], 'confidence': marker['confidence']}
    return None
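For example, a PR whose description carries the Claude Code footer resolves immediately (assuming the package's rules/ directory is on disk):

description = "Add retry logic\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)"
print(detect_ai(description, commits=[]))
# {'tool': 'Claude Code', 'confidence': 100}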

Same logic works in PHP, JavaScript, Ruby, Go—anything that can parse YAML and match strings.

Full examples for all languages in the GitHub repository.

#Contributing Detection Rules

Found an AI tool we're missing? Here's how to add it:

  1. Create a YAML file - rules/your-tool.yml
  2. Follow the schema:
tool:
  id: your-tool-slug
  name: Your Tool Name
  provider: Company Name
  website: https://tool-website.com

explicit_markers:
  commit_footers:
    - pattern: 'Generated with Your Tool'
      regex: false
      confidence: 100
      description: "Tool footer in PR description"

  bot_authors:
    - pattern: 'your-tool[bot]'
      location: commit_author
      confidence: 100
      description: "Your Tool bot author"
  3. Include examples - Link to 3+ real PRs showing the pattern
  4. Test it - Verify accuracy against real-world PRs
  5. Submit a PR - We review and merge quickly

Contribution guidelines:

  • Confidence levels: 100 = definitive, 80+ = high, 60+ = medium
  • Prefer specific patterns over generic ones
  • Document where the pattern appears (commit message, PR description, etc.)
  • Include the tool's website for reference
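If you want to mirror those confidence levels in code, a tiny helper does it (the bucket names come from the guidelines above; the catch-all 'low' bucket is our own addition, not a level the guidelines define):

def confidence_bucket(confidence: int) -> str:
    # Buckets from the contribution guidelines above
    if confidence >= 100:
        return 'definitive'
    if confidence >= 80:
        return 'high'
    if confidence >= 60:
        return 'medium'
    return 'low'  # not defined in the guidelines; anything weaker than 60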

#How CoderBuds Uses This

At CoderBuds, we combine this open-source AI detection with DORA metrics, cycle time analysis, and team velocity tracking to help engineering leaders understand how their teams actually work.

The AI detection layer answers: "Which PRs used AI tools?"

Combined with other metrics, you get insights like:

  • Are AI-assisted PRs deployed faster or slower?
  • Do AI-generated changes have higher or lower failure rates?
  • Which developers are leveraging AI effectively?
  • Is your AI tooling investment paying off?

The detection rules are free and open. The team analytics, historical trends, and correlation insights are what CoderBuds provides.


Questions? Open a GitHub Discussion. Want to track your team's AI adoption? Start with CoderBuds (30-day free trial).
