Quick answer: To block AI crawlers like GPTBot, ChatGPT-User, ClaudeBot, and others from scraping your website, add specific rules to your robots.txt file and use HTTP headers for fine-grained control. Here's the complete list of AI user agents and how to block (or allow) each one.
Why Block (or Allow) AI Crawlers?
AI companies like OpenAI, Anthropic, Google, and others send crawlers to scrape website content for training their language models. Unlike search engine bots (which index your pages for search results), AI crawlers use your content to build commercial products — often without compensation or attribution.
| Reason to block | Reason to allow |
|---|---|
| Protect proprietary content from being used in AI training | Get cited in AI answers (ChatGPT, Perplexity, etc.) |
| Reduce server load from aggressive crawling | Increase brand visibility through AI-generated recommendations |
| Copyright and licensing concerns | Drive referral traffic from AI tools that link to sources |
| Competitive advantage — don't feed competitor AI models | Participate in AI Search (Google AI Overview, Bing Chat) |
Complete List of AI Crawlers (2026)
| User Agent | Company | Purpose | Respects robots.txt |
|---|---|---|---|
| GPTBot | OpenAI | Training data for GPT models | Yes |
| ChatGPT-User | OpenAI | Real-time browsing (ChatGPT with browsing) | Yes |
| OAI-SearchBot | OpenAI | SearchGPT / ChatGPT Search | Yes |
| ClaudeBot | Anthropic | Training data for Claude | Yes |
| anthropic-ai | Anthropic | Legacy crawler token (superseded by ClaudeBot) | Yes |
| Google-Extended | Google | Training Gemini / Bard (robots.txt control token) | Yes |
| Googlebot | Google | Search indexing + AI Overviews | Yes (don't block) |
| PerplexityBot | Perplexity AI | AI search engine | Yes |
| Applebot-Extended | Apple | Apple Intelligence / Siri training | Yes |
| Bytespider | ByteDance | TikTok AI training | Partially |
| CCBot | Common Crawl | Open dataset (used by many AI labs) | Yes |
| FacebookBot | Meta | AI training for Llama | Yes |
| meta-externalagent | Meta | Meta AI browsing | Yes |
| cohere-ai | Cohere | Enterprise AI training | Yes |
| Diffbot | Diffbot | Web data extraction for AI | Partially |
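If you handle requests in application code, the same tokens can be matched against the incoming User-Agent header. A minimal Python sketch (the token list mirrors the table above; matching is substring-based because real User-Agent headers embed these names in longer strings):

```python
import re

# Tokens from the table above. Google-Extended is omitted because it is a
# robots.txt control token only and never appears as a literal user agent.
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Applebot-Extended", "Bytespider", "CCBot",
    "FacebookBot", "meta-externalagent", "cohere-ai", "Diffbot",
]

_AI_BOT_RE = re.compile("|".join(re.escape(t) for t in AI_BOT_TOKENS), re.IGNORECASE)

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if a User-Agent header matches a known AI crawler token."""
    return bool(_AI_BOT_RE.search(user_agent))

print(is_ai_crawler("Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"))  # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))                       # False
```

Substring matching deliberately errs on the side of catching version suffixes like `GPTBot/1.2`; tighten the patterns if you see false positives.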
Option 1: Block All AI Crawlers via robots.txt
Add this to your robots.txt file (usually at yoursite.com/robots.txt):
```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /
```

Option 2: Block Training, Allow AI Search
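You can sanity-check a robots.txt rule set before deploying it with Python's standard-library parser. A minimal sketch using one stanza of the configuration above:

```python
from urllib.robotparser import RobotFileParser

# One stanza from the blocking configuration above.
rules = """\
User-agent: GPTBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# GPTBot is blocked everywhere; unlisted agents fall through to "allowed".
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

To test your live file instead, call `parser.set_url("https://yoursite.com/robots.txt")` followed by `parser.read()`.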
If you want to appear in AI search results (ChatGPT Search, Perplexity, Google AI Overview) but don't want your content used for training, use this selective configuration:
```
# Block AI TRAINING crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Applebot-Extended governs AI-training use, so it belongs here
User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

# ALLOW AI search/browsing bots (so you appear in AI answers)
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Option 3: HTTP Headers (More Control)
For page-level control, use the X-Robots-Tag HTTP header. This is useful when you want to block AI crawlers from specific pages (like premium content) while allowing them on others.
In your server config (Nginx example):
```nginx
# Ask AI crawlers not to use premium content for training.
# noai/noimageai are informal signals, not honored by all crawlers.
location /premium/ {
    add_header X-Robots-Tag "noai, noimageai" always;
}
```

Google also supports the nosnippet and max-snippet:0 rules to prevent content from appearing in AI Overviews:
```html
<meta name="robots" content="max-snippet:0">
```

How to Verify Your Blocks Are Working
- Test robots.txt: Visit yoursite.com/robots.txt and verify the rules are present
- Use Google Search Console: the robots.txt report shows whether Google can fetch and parse your file
- Check server logs: search for AI bot user agents in your access logs to see if they're still crawling
- Use PrivacyChecker: our scanner checks your robots.txt configuration and flags AI crawlers that aren't blocked (or that are allowed)
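The log check above is easy to script. A minimal Python sketch using hypothetical combined-format log lines (in practice you would read your real access log, e.g. a file under /var/log/nginx/):

```python
from collections import Counter

# User-agent tokens worth watching for (a subset of the earlier table).
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider"]

# Hypothetical combined-format access log lines, for illustration only.
sample_log = [
    '203.0.113.7 - - [10/Jan/2026:12:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; GPTBot/1.2"',
    '198.51.100.2 - - [10/Jan/2026:12:00:02 +0000] "GET /a HTTP/1.1" 200 743 "-" "Mozilla/5.0 Chrome/120"',
    '203.0.113.9 - - [10/Jan/2026:12:00:03 +0000] "GET /b HTTP/1.1" 200 101 "-" "CCBot/2.0"',
]

# Count how many requests each AI bot made.
hits = Counter()
for line in sample_log:
    for bot in AI_BOTS:
        if bot in line:
            hits[bot] += 1

print(dict(hits))  # {'GPTBot': 1, 'CCBot': 1}
```

Hits that persist after you publish a robots.txt block indicate a bot that is ignoring it, which is your cue to escalate to server-level blocking.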
The Copyright Angle
As of 2026, several legal developments affect AI crawling:
- EU AI Act (2024): Requires AI providers to document training data sources and respect copyright opt-outs
- EU Copyright Directive (Article 4): Text and data mining for commercial AI requires an opt-out mechanism — robots.txt is the de facto standard
- NYT v. OpenAI (US, filed 2023): A closely watched test of whether large-scale scraping for AI training constitutes copyright infringement
- TDM Reservation Protocol: Some publishers use the tdm-reservation: 1 header to explicitly reserve text/data mining rights
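In Nginx, the TDM headers can be emitted on every response. A minimal sketch (the tdm-policy header is optional, and the policy URL shown is a placeholder):

```nginx
# Advertise a TDM opt-out (TDM Reservation Protocol) on all responses.
add_header tdm-reservation "1" always;

# Optional: point to a machine-readable licensing policy (placeholder URL).
add_header tdm-policy "https://yoursite.com/tdm-policy.json" always;
```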
Frequently Asked Questions
Does blocking GPTBot prevent my site from appearing in ChatGPT?
Not exactly. Blocking GPTBot prevents OpenAI from using your content for training future models. But ChatGPT-User is a separate bot used for real-time browsing — if you allow ChatGPT-User, your content can still appear when users ask ChatGPT to browse the web.
Will blocking AI crawlers hurt my Google SEO ranking?
No. Blocking Google-Extended only prevents Google from using your content for Gemini/AI training. It does not affect Googlebot (the search index crawler), so your search rankings are unaffected. However, blocking Googlebot will remove you from search results entirely; never block Googlebot.
Is robots.txt legally binding?
Not directly, but it's increasingly recognized in court. The EU Copyright Directive recognizes robots.txt as a valid machine-readable opt-out. OpenAI, Anthropic, and Google have all publicly committed to respecting robots.txt. Ignoring a robots.txt block could strengthen a copyright infringement claim.
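Because compliance with robots.txt is ultimately voluntary, some sites also enforce blocks at the web-server level. A minimal Nginx sketch (the bot list is illustrative; extend it to match the table above):

```nginx
# map blocks must be declared in the http {} context.
map $http_user_agent $is_ai_bot {
    default                                               0;
    "~*(GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot)" 1;
}

server {
    # ... existing listen / server_name / root directives ...

    # Refuse matched AI crawlers regardless of robots.txt.
    if ($is_ai_bot) {
        return 403;
    }
}
```

Returning 403 (rather than silently serving content) also makes ignored robots.txt rules visible in your access logs.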