
Blocking AI Crawlers Backfires: What Traffic Data Shows

Publishers blocking AI crawlers lose up to 23.1% of total traffic and miss AI referral traffic that converts at 3x search rates. Here's the data.

10 min read

By Jack Gardner · Founder, EdgeBlog

[Hero image: cracked digital shield with traffic data flowing through to an analytics dashboard, illustrating how blocking AI crawlers backfires]
#AI crawlers #robots.txt #AI referral traffic #SEO #GEO #AI search

The logic seems sound. AI companies scrape your content, train their models on it, and serve your answers to users without sending visitors your way. So you update your robots.txt, block the bots, and protect your investment.

Many publishers have followed exactly this playbook, and many are paying a price they haven't measured yet.

The Instinct to Block AI Crawlers (And Why It's Wrong)

Blocking AI crawlers reduces total traffic by up to 23.1% and cuts publishers off from the fastest-growing referral channel on the web.

A December 2025 study by researchers at Rutgers and Wharton analyzed what actually happens when publishers block AI crawlers. The result wasn't the content protection publishers expected. It was a broad, sustained traffic decline that hit AI and human visitors alike.

The researchers measured a 23.1% reduction in total traffic and a 13.9% drop in human-only traffic for publishers that implemented AI bot blocking. This wasn't a temporary dip following a configuration change. It was a persistent decline that showed no signs of recovery during the observation period.

The implications challenge a widely held assumption in digital publishing. Blocking AI crawlers doesn't just reduce AI traffic in isolation. It triggers a cascade that reduces your visibility across channels, including traditional search. And the cost extends beyond lost pageviews: it includes forfeited access to a referral channel that converts at three times the rate of organic search.

Here's what the evidence shows, and what it means for your crawler strategy.

What Blocking AI Crawlers Actually Does to Your Traffic

The Zhao-Berman study is the most rigorous causal analysis available on this question. Using a difference-in-differences methodology (the gold standard for measuring policy effects), the researchers compared traffic patterns for publishers that blocked AI bots against those that didn't. They drew on two independent data sources, SimilarWeb and Comscore, to cross-validate their findings.

The results were consistent across both datasets:

  • 23.1% total traffic reduction (SimilarWeb) for publishers that blocked AI crawlers
  • 13.9% human traffic reduction (Comscore, which filters out all bot traffic)
  • No recovery over time: the traffic loss was sustained throughout the study period
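
The difference-in-differences logic is worth making concrete. Here's a toy calculation with made-up traffic numbers (the real study used proprietary SimilarWeb and Comscore panels; everything below is illustrative):

    # Toy difference-in-differences (DiD) estimate with hypothetical numbers.
    # "Treated" = publishers that blocked AI crawlers; "control" = those that didn't.
    treated_pre, treated_post = 1_000_000, 740_000   # monthly visits, hypothetical
    control_pre, control_post = 1_000_000, 960_000   # control absorbs market-wide trends

    treated_change = (treated_post - treated_pre) / treated_pre   # -26.0%
    control_change = (control_post - control_pre) / control_pre   #  -4.0%

    # DiD attributes the gap between the two trends to the blocking decision,
    # netting out whatever was happening to the industry as a whole.
    did_effect = treated_change - control_change
    print(f"Estimated effect of blocking: {did_effect:.1%}")      # -22.0%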

Why would blocking AI bots reduce human traffic? The researchers identified two primary mechanisms. First, AI systems like ChatGPT and Perplexity cannot cite or recommend content they cannot access. When a user asks a question and your content is blocked from retrieval, another source gets the citation and the referral click. Over time, this compounds: AI systems learn to rely on accessible sources, and your content gets bypassed entirely.

Second, Google's Googlebot now powers both traditional search results and AI Overviews. You cannot block Googlebot for AI purposes without also blocking it for search. This creates a structural problem that no robots.txt configuration can solve.

There is an important caveat. The study focused on large publishers: major news outlets and digital media companies with high domain authority. Raptive, which manages over 6,000 smaller creator and publisher sites, reported no statistically significant traffic change from blocking (as noted in Cloudflare's analysis of AI crawler behavior). The effect appears strongest for publishers producing high-authority, frequently referenced content: exactly the material AI systems are most likely to cite.

The takeaway is straightforward. If your content is the kind that AI systems would reference and quote, blocking crawlers is especially costly. If it rarely gets cited, blocking may have little measurable effect. But it also delivers little measurable benefit.

AI Referral Traffic: Small Channel, Outsized Impact

The blocking question becomes even more consequential when you examine where AI referral traffic is headed and how it converts.

The growth trajectory is steep. AI referral traffic grew 357% year-over-year, reaching 1.13 billion visits to top websites by June 2025. ChatGPT alone processes approximately 2.5 billion daily queries with over 800 million weekly active users. Gartner projects that traditional search volume will decline 25% by 2026 as users shift to AI-powered alternatives. This is not a forecast about a distant future. It is an active, measurable migration happening right now.

But volume alone doesn't explain why this channel matters. The conversion data does.

Microsoft Clarity's analysis found that AI referral traffic converts at approximately 3x the rate of traditional search for subscriptions and sign-ups: a 1.34% conversion rate for AI referrals compared to 0.55% for organic search. Users arriving from AI recommendations have already been pre-qualified by the AI's response. They understand the context, they've seen your content endorsed by a system they trust, and they're further along in their decision-making process.

The common objection is that AI referral traffic is still a tiny channel. That's true today. TollBit's Q4 2024 data confirms that AI search sends 96% less referral traffic per query than Google. But that gap is narrowing fast, and when you factor in the conversion advantage, each AI-referred visitor is already worth roughly three times an organic search visitor. With the channel growing at 357% annually, dismissing it based on current volume is like dismissing organic search in 2005 because everyone still used the Yellow Pages.
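
The interaction between the volume gap and the conversion advantage is easy to work out. A back-of-envelope sketch using the rates cited above (visit counts are hypothetical):

    # Back-of-envelope comparison; conversion rates are the cited Clarity figures,
    # visit volumes are hypothetical, scaled to reflect the ~96% traffic gap.
    organic_visits, ai_visits = 10_000, 400
    organic_cr, ai_cr = 0.0055, 0.0134

    print(f"Organic: {organic_visits * organic_cr:.0f} conversions")  # 55
    print(f"AI:      {ai_visits * ai_cr:.1f} conversions")            # 5.4

    # These specific rates imply a ~2.4x per-visitor value multiple. The volume
    # side is what's moving: at 357% annual growth, the ai_visits input changes fast.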

This also reshapes how you should think about measuring content performance in the zero-click era. Traditional traffic metrics undercount the value of AI referrals because they optimize for volume rather than conversion quality. A channel sending fewer but higher-intent visitors deserves its own evaluation framework.

The shift connects to broader changes in search behavior, too. Many AI-driven queries target zero-volume keywords that traditional SEO tools don't register. These conversational, long-tail queries represent a massive and growing share of total search activity, and they're exactly the type of queries where AI citation drives referral traffic.

Blocking AI crawlers doesn't just sacrifice today's small channel. It forfeits your position in tomorrow's dominant one. For a broader look at how this shift reshapes search strategy, see our analysis of the future of SEO when AI answers first.

Training Bots vs. Retrieval Bots: The Distinction That Matters

Not all AI crawlers serve the same purpose. Treating them as a monolith is one of the most common and costly mistakes publishers make when configuring robots.txt for AI bots.

Here's how the major crawlers break down, by bot name, operator, purpose, and what blocking does:

  • GPTBot (OpenAI), model training. Blocking it excludes your content from future training data.
  • OAI-SearchBot (OpenAI), ChatGPT search results. Blocking it removes your content from ChatGPT search citations.
  • ChatGPT-User (OpenAI), real-time retrieval. Blocking it prevents ChatGPT from browsing your site when users request it.
  • Google-Extended (Google), Gemini training. Blocking it excludes content from Gemini training; search indexing is unaffected.
  • Googlebot (Google), search plus AI Overviews. It cannot be selectively blocked for AI versus search.
  • PerplexityBot (Perplexity), search and citation. Blocking it removes your content from Perplexity answers.
  • ClaudeBot (Anthropic), training and retrieval. Blocking it excludes your content from Claude's knowledge base.

The critical distinction: blocking training bots (like GPTBot or Google-Extended) prevents your content from being used to build future AI models, but it does not prevent your content from appearing in current AI search results. Blocking retrieval bots (like OAI-SearchBot or ChatGPT-User) directly removes your content from AI-generated answers. That's where the traffic impact hits hardest.

BuzzStream's analysis of top news websites quantifies this distinction neatly: 79% of top news sites block at least one AI training bot, but only 14% block all AI bots. The pattern suggests a deliberate, selective strategy, with sophisticated publishers drawing a clear line between "don't train on my content" and "don't cite my content."

The Googlebot problem deserves special attention. Unlike OpenAI or Anthropic, Google does not offer a separate crawler for AI Overviews. Googlebot handles both traditional search indexing and AI Overview content retrieval through the same user agent. If you block Googlebot, you disappear from Google search entirely. This makes Google's AI integration fundamentally different from standalone AI products, and it means there is no technical path to appearing in Google search results while opting out of AI Overviews.

Cloudflare's data highlights another consideration. OpenAI's crawlers access roughly 1,091 pages for every referral click they send back, compared to Google's approximately 14:1 crawl-to-click ratio. AI crawlers are far more resource-intensive per visitor generated. That's a legitimate operational concern. But the response should be rate limiting and selective access controls, not blanket blocking that eliminates the referral traffic entirely.
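
If crawl load is the real concern, rate limiting solves it without forfeiting citations. A minimal sketch for nginx (assuming nginx fronts your site; the bot list and the 1 request/second cap are illustrative, not recommendations):

    # Inside the http block: throttle AI crawlers instead of blocking them.
    # Requests with an empty key are exempt, so human visitors are never limited.
    map $http_user_agent $ai_crawler {
        default "";
        ~*(GPTBot|OAI-SearchBot|ChatGPT-User|PerplexityBot|ClaudeBot) $binary_remote_addr;
    }

    limit_req_zone $ai_crawler zone=ai_bots:10m rate=1r/s;

    server {
        location / {
            limit_req zone=ai_bots burst=10 nodelay;
            # ...rest of your existing configuration...
        }
    }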

What Smart Publishers Do Instead of Blocking AI Crawlers

The evidence doesn't support either extreme. Neither blanket blocking nor unrestricted access produces the best outcome. The data points to a selective, strategic approach.

Step 1: Audit your robots.txt. This takes 30 seconds. Visit yoursite.com/robots.txt and check for rules targeting "GPTBot," "OAI-SearchBot," "ChatGPT-User," "PerplexityBot," and "ClaudeBot." Many default configurations or overly broad wildcard rules block all AI crawlers when the intent was only to block training. A single misconfigured rule can cost you a significant share of your traffic.
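
The check is also easy to script. A minimal sketch using Python's standard library (swap example.com for your own domain):

    # Report which AI bots your robots.txt allows, using only the standard library.
    from urllib.robotparser import RobotFileParser

    BOTS = ["GPTBot", "Google-Extended", "OAI-SearchBot",
            "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # replace with your site
    rp.read()

    for bot in BOTS:
        verdict = "allowed" if rp.can_fetch(bot, "https://example.com/") else "BLOCKED"
        print(f"{bot:16} {verdict}")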

Step 2: Block training bots, allow retrieval bots. If you want to prevent your content from being used for model training while still appearing in AI search results, block GPTBot and Google-Extended while explicitly allowing OAI-SearchBot, ChatGPT-User, and PerplexityBot. This is the approach most sophisticated publishers have adopted, and it preserves both intellectual property protection and AI referral traffic.
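
In robots.txt terms, that policy looks something like this (a sketch; tune the bot list to your own stance on training):

    # Block training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Allow retrieval crawlers that produce citations and referrals
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: PerplexityBot
    Allow: /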

Step 3: Optimize for AI citation. Allowing crawlers is step one. Making your content the source AI systems choose to cite is step two. This means optimizing content for AI search engines: structured data, quotable passages with clear source attribution, answer-first formatting, and verifiable claims that AI systems can confidently extract and reference.
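
Structured data is the most mechanical piece of that list. A minimal Article schema sketch (all field values are placeholders to replace with your own):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Your headline here",
      "author": { "@type": "Person", "name": "Author Name" },
      "datePublished": "YYYY-MM-DD",
      "description": "One-sentence summary that AI systems can quote directly."
    }
    </script>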

Tools like EdgeBlog build GEO (Generative Engine Optimization) into the content pipeline automatically, handling structured metadata, quotable passage formatting, and AI-ready content structure so your content doesn't just get crawled. It gets cited.

Step 4: Monitor AI referral traffic as a distinct channel. Stop lumping AI referrals into "direct" or "other" traffic categories. Set up analytics segments for ChatGPT, Perplexity, and other AI referrers. Track conversion rates alongside volume. The publishers who measure this channel properly will be first to optimize for its compounding growth.
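
In GA4 or a similar tool, a custom channel group keyed on the referrer gets you there. A sample matching pattern (these hostnames are accurate as of writing, but AI products rebrand often, so treat the list as a starting point):

    chatgpt\.com|chat\.openai\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai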


The Bottom Line

The instinct to block AI crawlers is understandable. Content theft is a legitimate concern, and the asymmetry between how much AI crawlers consume and how much traffic they send back is stark. OpenAI's crawlers access over 1,000 pages for every referral click they return. That ratio is frustrating. But the research consistently shows that blanket blocking is counterproductive for publishers who want to maintain or grow their traffic.

A 23.1% traffic decline. Forfeited access to a referral channel converting at 3x traditional search. Lost positioning in a channel growing at 357% annually. That's the measured cost of blocking without distinction.

The smarter approach is surgical, not ideological. Block training bots to protect your intellectual property. Allow retrieval bots so your content gets cited. Then invest in making your content the source AI systems choose to reference, through clear data, quotable structure, and verifiable claims.

Allowing crawlers gets you into the game. Optimizing for AI citation is how you win it. EdgeBlog makes that optimization automatic, so your content earns the citations and high-converting traffic that come with them.

Related Articles


Zero-Volume Keywords Capture 70% of Search Traffic

Your keyword research tool says a query gets zero searches per month. The instinct is to skip it and chase something bigger. But the data tells a different story: 92% of all keywords have fewer than 10 monthly searches, yet they collectively drive roughly 70% of total search traffic and convert at rates that dwarf head terms.

9 min