Inside the robots.txt, llms.txt, and crawl-log reality of WordPress in 2026 - from 7 months of continuous server-side tracking.
For two decades, "search bot traffic" on a WordPress site meant Googlebot. Maybe Bingbot if you were looking, especially since November 2022. The hierarchy was simple: Google crawls you, you appear in search, traffic follows.
That hierarchy quietly broke in 2025. Between November 2025 and May 2026, EZY.ai tracked 12.8 million bot and human visits across 47 (opt-in) WordPress sites and found something unexpected: counting the full set of AI crawlers - GPTBot, ClaudeBot, Bytespider, Apple's crawler, Meta's crawler, PerplexityBot, Amazonbot and others combined - AI bots now generate about 108% of the crawl volume that Googlebot generates on the same sites.
In aggregate, AI crawlers have drawn level with and edged past Googlebot in raw volume on WordPress. And looked at site by site, AI already out-crawls Googlebot on the majority of them.
That shift changes how a serious WordPress site should think about discoverability. This piece walks through what we measured, how, and what to do about it.
Why you can't see this happening
Here is the strange part: almost no site owner has seen any of this, because the tools they rely on are structurally incapable of showing it.
Google Analytics and most hosting dashboards are client-side tools: they work by running a JavaScript snippet in a visitor's browser. AI crawlers don't run JavaScript. They request your raw HTML, extract the text, and leave. The tracking script never fires, so the visit is never recorded. One widely cited estimate is that GA4 misses on the order of 99% of AI-bot activity. Google Search Console only reports Google's own crawlers, so it can't help either.
The only place AI crawlers reliably show up is server-side - in raw access logs, or in a tool that reads requests at the server or CDN edge before any browser is involved. That's how the numbers in this article were captured (via the EZY WordPress plugin). It's also why this is a genuine blind spot rather than a niche concern: surveys suggest only around 14% of marketers actually track AI-search performance, even though far more say they're optimizing for it. Most teams' single largest new source of crawl attention is one their dashboard shows as zero.
(To be clear, EZY isn't the only way to see this - Cloudflare's own analytics and raw server-log analysis can too. The point is that the default WordPress analytics stack can't, and most owners don't realise it.)
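For readers who want to try the raw-log route, here is a minimal Python sketch. It assumes the common "combined" access-log format, where the User-Agent is the last double-quoted field on each line. The sample lines and the token list are made up for illustration and are not taken from the EZY dataset. For brevity it uses plain substring matching, which is looser than the exact-token matching described under Methodology.

```python
import re
from collections import Counter

# Illustrative subset of crawler tokens; extend it to match your own policy.
AI_TOKENS = ["GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot", "Amazonbot"]

# In the combined log format the User-Agent is the last quoted field on the line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_ai_hits(log_lines):
    """Count requests per AI crawler token, matching case-insensitively."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        user_agent = match.group(1).lower()
        for token in AI_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once
    return counts

# Made-up sample lines for demonstration only.
sample = [
    '203.0.113.5 - - [04/Nov/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [04/Nov/2025:10:00:05 +0000] "GET /blog HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [04/Nov/2025:10:00:09 +0000] "GET /blog HTTP/1.1" 200 900 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0"',
]
print(count_ai_hits(sample))
```

On a real server you would pass an open log file instead of the sample list. The browser request in the third line is ignored, which is the point: these crawler hits exist in the log even though a JavaScript tag would never have recorded them.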
Methodology
Tracking corpus: 47 opt-in WordPress sites with continuous server-side EZY tracking, server-side because AI bots don't execute JavaScript. 12,817,816 total visits, 2025-11-04 to 2026-05-28.
Audit corpus: 117 WordPress sites that ran an EZY robots.txt audit and 114 that ran an llms.txt audit in the same window - the same WordPress universe seen from the configuration side.
Classification: every request bucketed by User-Agent using case-insensitive, exact-token matching (so GPTBot matches OpenAI's crawler string but not the substring "GPT" elsewhere). EZY-internal health checks were excluded. Where a company operates several crawlers, this analysis groups them under that company's primary AI-associated agent; see the appendix for exact tokens, and the author's note on confirming variant-level labels.
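A minimal sketch of what case-insensitive, exact-token matching can look like in Python. This is an illustration of the idea, not EZY's actual classifier: the token lists are a small subset of the appendix, the bucket names are hypothetical, and the boundary rule (a token may not be glued to other letters or digits) is an assumption about what "exact-token" means in practice.

```python
import re

# Illustrative subset of the appendix tokens; not the full production list.
AI_TOKENS = ["GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot", "Amazonbot"]
GOOGLE_TOKENS = ["Googlebot"]

def _token_pattern(tokens):
    # A token must not touch other letters or digits on either side,
    # so "GPTBot/1.2" matches but "MyGPTBotClone" does not.
    alternatives = "|".join(re.escape(token) for token in tokens)
    return re.compile(
        rf"(?<![A-Za-z0-9])(?:{alternatives})(?![A-Za-z0-9])", re.IGNORECASE
    )

AI_RE = _token_pattern(AI_TOKENS)
GOOGLE_RE = _token_pattern(GOOGLE_TOKENS)

def classify(user_agent: str) -> str:
    """Return a coarse bucket name for one User-Agent string."""
    if AI_RE.search(user_agent):
        return "ai_bot"
    if GOOGLE_RE.search(user_agent):
        return "googlebot"
    return "unclassified"

# Simplified sample strings for demonstration only.
print(classify("Mozilla/5.0 (compatible; gptbot/1.2; +https://openai.com/gptbot)"))
print(classify("Mozilla/5.0 (compatible; Googlebot-Image/1.0)"))
print(classify("SomeTool GPT-Helper/1.0"))
print(classify("MyGPTBotClone/1.0"))
```

Because a hyphen counts as a boundary, a variant such as `Googlebot-Image` still falls into the Googlebot bucket, while a string that merely contains "GPT" stays unclassified.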
Every figure carries its sample size and period. None are projections. Two biases to keep in mind: this sample self-selects for owners who care about AI visibility (skewing toward small business, professional services and content sites - enterprise WordPress likely has stronger baselines), and a portion of the audited sites are EZY customers who deploy files like llms.txt through EZY, which inflates "file present" rates relative to WordPress at large.
Finding 1: AI bots have overtaken Googlebot - in aggregate
Across 47 WordPress sites (12.8M visits, Nov 2025 – May 2026):
| Source | Visits | Share of total |
|---|---|---|
| Humans (browser-class User-Agents) | 8,547,128 | 66.7% |
| AI bots (GPT, Claude, Perplexity, Apple, ByteDance, Meta, Amazon, Mistral combined) | 924,464 | 7.2% |
| Googlebot | 858,113 | 6.7% |
| Other bots (Ahrefs, Semrush, generic crawlers, etc.) | 2,488,111 | 19.4% |
The full set of AI crawlers generated 66,351 more visits than Googlebot over the window - about a 7.7% premium.
One essential caveat, stated plainly. That "overtaken" result depends on what counts as an AI bot. Counting only the dedicated answer engines - GPTBot, ClaudeBot and PerplexityBot - AI sits at roughly 59% of Googlebot's volume, not ahead of it. The overtaking is driven by the broader set of AI-associated crawlers, including training and scraping crawlers like Bytespider (ByteDance) and Apple's crawler. Both framings are true; we report both so the number can't be misread.
The cleaner, definition-independent statement is per-site: regardless of how you draw the AI bucket, AI crawlers fetch more pages than Googlebot on the majority of individual WordPress sites we track (56–67%, depending on definition). The era in which AI-bot accessibility was a secondary concern is over - not because Google stopped mattering (Googlebot is still 6.7% of traffic and Google search is still the dominant referral channel), but because a second crawl surface of comparable size now exists alongside it.
Finding 2: GPTBot is the dominant AI crawler - but TikTok's is the surprise #2
Within those 924,464 AI-bot visits, the split is not what public discourse suggests:
| AI crawler | Visits | Share of AI total |
|---|---|---|
| GPTBot (OpenAI) | 376,278 | 40.7% |
| Bytespider (ByteDance / TikTok) | 204,063 | 22.1% |
| Apple's crawler | 163,969 | 17.7% |
| ClaudeBot (Anthropic) | 126,778 | 13.7% |
| Meta's crawler | 37,857 | 4.1% |
| Amazonbot | 13,645 | 1.5% |
| PerplexityBot | 1,758 | 0.2% |
| MistralAI-User | 116 | <0.1% |
Three things worth mentioning:
- GPTBot is roughly 3× the volume of ClaudeBot, and on its own equals about 44% of Googlebot's entire crawl volume - a single bot at nearly half of Google. If your robots.txt addresses only one AI crawler, address GPTBot.
- Bytespider - ByteDance/TikTok's crawler - is the surprise #2, crawling at about 1.6× ClaudeBot's volume. Most WordPress owners don't know it exists.
- PerplexityBot is just 0.2% - far below its public profile. Perplexity leans on on-demand fetching via its in-product browser rather than a large standing crawl fleet, which is the likeliest explanation. Crawler footprint and brand prominence are not the same thing.
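The headline ratios can be re-derived from the two tables with a few lines of arithmetic. The Python sketch below only restates the published counts; the variable names and the `pct` helper are illustrative.

```python
# Visit counts copied from the Finding 1 and Finding 2 tables.
googlebot = 858_113
ai_visits = {
    "GPTBot": 376_278,
    "Bytespider": 204_063,
    "Apple's crawler": 163_969,
    "ClaudeBot": 126_778,
    "Meta's crawler": 37_857,
    "Amazonbot": 13_645,
    "PerplexityBot": 1_758,
    "MistralAI-User": 116,
}

ai_total = sum(ai_visits.values())
# The narrower "dedicated answer engines" definition from Finding 1.
answer_engines = sum(ai_visits[name] for name in ("GPTBot", "ClaudeBot", "PerplexityBot"))

def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

print("AI total visits:", ai_total)
print("All AI crawlers vs Googlebot (%):", pct(ai_total, googlebot))
print("Answer engines vs Googlebot (%):", pct(answer_engines, googlebot))
print("GPTBot vs Googlebot (%):", pct(ai_visits["GPTBot"], googlebot))
print("GPTBot / ClaudeBot:", round(ai_visits["GPTBot"] / ai_visits["ClaudeBot"], 2))
print("Bytespider / ClaudeBot:", round(ai_visits["Bytespider"] / ai_visits["ClaudeBot"], 2))
```

The per-bot counts sum to the 924,464 total in Finding 1, and the ratios round to the figures quoted in the text: roughly 108% and 59% of Googlebot under the two definitions, GPTBot at about 44% of Googlebot and about 3× ClaudeBot, and Bytespider at about 1.6× ClaudeBot.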
Finding 3: 65% of WordPress robots.txt files don't deliberately address AI bots
Across 117 audited WordPress sites:
Only 35.5% explicitly grant AI bots access via robots.txt. The other 64.5% leave AI access to defaults.
A robots.txt that doesn't mention any AI crawler effectively does allow them - robots.txt is a denylist, so anything not disallowed is allowed. But allowed by accident is not allowed by intent. It means no decision has been made - and with AI crawl volume now rivalling Googlebot, the absence of a decision is itself worth considering.
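The allowed-by-default behaviour is easy to see with Python's standard-library `urllib.robotparser`. This is a small sketch using two hypothetical robots.txt files and a placeholder URL. It shows how the standard library interprets the rules; individual crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical file that never mentions an AI crawler.
silent = """\
User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical file that makes an explicit decision about one crawler.
explicit = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

page = "https://example.com/blog/post/"
print(is_allowed(silent, "GPTBot", page))       # not mentioned, so allowed by default
print(is_allowed(explicit, "GPTBot", page))     # blocked by name
print(is_allowed(explicit, "ClaudeBot", page))  # not named, falls back to the * group
```

The first file allows GPTBot without ever naming it. Only the second file records an actual decision.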
Full breakdown:
- 6.0% have no robots.txt at all (site defaults apply - generally everything crawlable, no sitemap directive, no AI rules).
Of the 94% with a robots.txt:
- 94.5% include a Sitemap directive - sitemap signalling is broadly healthy.
- 67.3% have at least one issue flagged by the EZY engine, such as missing links or no update in the last 12 months.
- 35.5% explicitly allow at least one AI bot by name.
- 11.8% explicitly disallow at least one AI bot - about 1 in 8 sites is actively blocking a crawler, sometimes without realising it.
- 13% score a perfect 100; 45.5% score below 50/100.
45.5% score below 50 on EZY's audit - meaning they don't yet follow current AI-crawler best practices (mainly: no explicit AI-bot rules), not that they're broken.
Finding 4: llms.txt adoption is low - and the evidence it helps is thinner still
llms.txt is a proposed convention (at llmstxt.org) giving language models a curated, markdown overview of a site's most important content. It's the file most often pitched as "robots.txt for AI." We build it, our customers ask for it - and we'll be straight about what the data currently shows, because the honest read matters more than the convenient one.
Across 114 audited WordPress sites:
- 45.6% have no llms.txt at all.
- 54.4% have one deployed; of those, only 15.8% score a perfect 100, and 27.2% score below 50 - the file exists but doesn't follow the spec.
Read the "has a file" figure as an upper bound, not a typical rate: a portion of this sample are EZY customers who deployed the file through EZY.ai, so adoption is higher here than across WordPress at large.
But here's the part most "deploy llms.txt!" advice skips. As of 2026, there is no good evidence that llms.txt improves how AI engines find, rank, or cite a site. No major provider - OpenAI, Anthropic, Google, Meta, Mistral - has publicly committed to using it in production; Google has said on the record that it doesn't support it and isn't planning to, with one Google engineer likening it to the long-discredited keywords meta tag.
There's even a way to get it wrong that actively hurts: the popular "make a markdown copy of every page" approach to llms-full.txt, if those copies are left indexable, creates duplicate content at scale, which can dilute crawl budget and suppress the originals.
The honest position: llms.txt is cheap insurance against a convention that might gain traction, not a proven lever today. Deploy a minimal, correct one if you want to be early - but don't expect it to move citations yet, skip the per-page markdown dump, and spend your real effort on the things in Finding 3 that demonstrably affect crawling.
The case for deploying one anyway. None of this means llms.txt is worthless - it means it's unproven for AI citations today, which isn't the same thing. The search and answer crawlers ignore it, but agentic tools increasingly don't: IDE assistants like Cursor, Claude Code and Copilot, and a growing set of MCP-based agents, do fetch it. As more of the web is navigated by agents rather than classic crawlers, a clean, accurate llms.txt is cheap insurance that costs little to maintain and positions a site for that shift. That's why we build one for clients - not as a ranking trick, but as low-cost future-proofing, with honest expectations about what it does and doesn't do today.
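For a sense of what "minimal and correct" might mean, here is a rough structural check in Python. It is based on one reading of the llmstxt.org proposal: an H1 title first, an optional blockquote summary, then H2 sections holding lists of markdown links. It is not EZY's scoring engine, and both the rules it checks and the sample file are assumptions for illustration.

```python
import re

# A list item that begins with a markdown link, e.g. "- [Name](https://...)".
LINK_ITEM = re.compile(r"^-\s*\[[^\]]+\]\([^)\s]+\)")

def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line.strip()]

    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line should be an H1 title")
    if sum(1 for line in lines if line.startswith("# ")) > 1:
        problems.append("more than one H1 heading")

    in_section = False
    for line in lines:
        if line.startswith("## "):
            in_section = True
        elif in_section and line.startswith("-") and not LINK_ITEM.match(line):
            problems.append(f"list item without a markdown link: {line!r}")
    return problems

# Hypothetical minimal file for demonstration only.
good = """\
# Example Site

> Short summary of what the site covers.

## Docs

- [Getting started](https://example.com/start.md): setup guide
- [Pricing](https://example.com/pricing.md)
"""

bad = "Example Site\n\n## Docs\n- Getting started\n"

print(check_llms_txt(good))
print(check_llms_txt(bad))
```

A check like this only covers structure. It says nothing about whether any crawler or agent actually reads the file.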
Finding 5: WordPress sitemaps are stale, not missing
Sitemaps matter here because AI bots fetch them early to decide what to crawl - so an out-of-date sitemap means new content gets discovered late. Across 114 audited WordPress sites:
- 87% have a sitemap - absence isn't the problem.
- But 71% haven't updated it in over 30 days, and 38% not in over 90. The median time since the newest entry is 70 days.
- Coverage is decent but imperfect: sitemaps include a median of 94% of the pages we can find on the site, and two-thirds miss at least some pages.
In other words, the typical WordPress sitemap exists, is nearly complete, and is out of date - publishing new posts without the sitemap reflecting them for weeks. For AI engines that lean on the sitemap to prioritise fresh content, that's a discovery lag baked in before a single page is read.
(Two caveats: lastmod is self-reported and some plugins set it loosely, so read "70 days" as the newest declared update; and where a sitemap was EZY-generated, freshness reflects that, not the site's original.)
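A staleness figure of this kind can be approximated from any sitemap by taking the newest declared `lastmod` and counting days to a reference date. The Python sketch below assumes a standard sitemaps.org `urlset`. The sample XML, the dates and the function names are made up for illustration, and because `lastmod` is self-reported the same caveat applies.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_lastmod(value: str) -> datetime:
    """Parse a W3C-style date or datetime, treating a missing time zone as UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def days_since_newest_entry(sitemap_xml: str, now: datetime):
    """Return whole days since the newest declared <lastmod>, or None if absent."""
    root = ET.fromstring(sitemap_xml)
    stamps = [
        parse_lastmod(element.text)
        for element in root.findall(".//sm:lastmod", SITEMAP_NS)
        if element.text
    ]
    if not stamps:
        return None
    return (now - max(stamps)).days

# Made-up sitemap for demonstration only.
sample = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2026-03-01T08:30:00Z</lastmod></url>
  <url><loc>https://example.com/blog/</loc><lastmod>2026-04-18</lastmod></url>
</urlset>
"""

reference = datetime(2026, 5, 28, tzinfo=timezone.utc)
print(days_since_newest_entry(sample, reference))
```

Passing the reference date explicitly keeps the result reproducible. In day-to-day use you would pass `datetime.now(timezone.utc)` instead.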
Why this matters: the citation economy
A piece of external context worth holding alongside the crawl numbers. Semrush research published in June 2025, drawn from 500+ high-value digital-marketing and SEO topics, found that visitors referred from AI assistants convert at roughly 4.4× the rate of traditional organic search visitors - the theory being that the AI pre-qualifies the user before they ever click.
That figure is Semrush's, not EZY's, and it deserves two honest qualifiers. First, it's strongest for informational and research-type queries; a large e-commerce study found organic search actually out-converting ChatGPT referrals by ~13% for transactional purchases, and other analyses put the premium nearer 31%. Second, AI-referral volume is still small - about 1.84% of total traffic today. So the right read is "high-value, low-volume, growing fast," not "AI has replaced search." That 1.84% is up from 0.27% not long ago, so the question is: when will 1.84% become 10%?
Stacked with this article's crawl data, though, the direction is clear: AI bots now crawl WordPress at roughly Googlebot volume (Finding 1), and the visitors that flow from AI answers tend to be higher-intent. AI-bot accessibility has moved from a nice-to-have to a discoverability surface worth deliberate attention.
What WordPress site owners should do about it
- Audit your robots.txt for explicit AI-bot directives. By observed volume: GPTBot, Bytespider, Applebot-Extended, ClaudeBot, Meta-ExternalAgent, Amazonbot, Google-Extended, CCBot, PerplexityBot, MistralAI-User. Add the on-demand fetchers - ChatGPT-User, OAI-SearchBot, Claude-Web, anthropic-ai, Perplexity-User. Make a deliberate Allow/Disallow choice for each.
- Check your security plugin and WAF. Wordfence, Sucuri, iThemes and Cloudflare bot rules can block unfamiliar user-agents silently. Some of the 11.8% that disallow an AI bot in robots.txt are doing so without realising it (Finding 3), and firewall-level blocks don't show up in robots.txt at all - check and confirm.
- Treat llms.txt as optional, not urgent. If you want to be early to the convention, deploy a correct llms.txt (spec at llmstxt.org) - but don't expect it to move AI citations yet (see Finding 4), and don't auto-generate a markdown copy of every page, which can backfire as duplicate content. Higher-leverage than llms.txt today: clean, extractable HTML and schema markup (Organization, Article, FAQPage, Product/Service) on the pages you actually want cited - see schema.org for types.
- Keep your sitemap current and referenced in robots.txt. AI bots use it to prioritise crawl order.
- Measure server-side, and break bots out by name. Your default analytics can't see AI crawlers at all, and most dashboards that do see them lump them into one "bot" bucket. GPTBot at 376k, Bytespider at 204k and ClaudeBot at 127k are three different signals about three different product surfaces - aggregating them hides more than it shows.
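The first and fourth checks in the list above can be scripted. The Python sketch below scans a robots.txt for groups that name each AI agent and for a Sitemap line. The agent names come from the checklist. The function name, the result keys and the sample file are hypothetical, and the parser is deliberately simple: it records which agents are named and does not evaluate the Allow/Disallow rules themselves.

```python
# Agent names taken from the checklist above; adjust to your own policy.
AI_AGENTS = [
    "GPTBot", "Bytespider", "Applebot-Extended", "ClaudeBot", "Meta-ExternalAgent",
    "Amazonbot", "Google-Extended", "CCBot", "PerplexityBot", "MistralAI-User",
    "ChatGPT-User", "OAI-SearchBot", "Claude-Web", "anthropic-ai", "Perplexity-User",
]

def audit_robots_txt(text: str) -> dict:
    """Report which AI agents are named and whether a Sitemap line exists."""
    named = set()
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            named.add(value.lower())
        elif field.lower() == "sitemap" and value:
            has_sitemap = True
    return {
        "addressed": [agent for agent in AI_AGENTS if agent.lower() in named],
        "undecided": [agent for agent in AI_AGENTS if agent.lower() not in named],
        "has_sitemap": has_sitemap,
    }

# Hypothetical robots.txt for demonstration only.
sample = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
"""

report = audit_robots_txt(sample)
print(report["addressed"])
print(len(report["undecided"]), "agents left to defaults")
print(report["has_sitemap"])
```

Every agent in the "undecided" list is one the file leaves to the wildcard group or to defaults, which is where a deliberate Allow or Disallow choice is still missing.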
For owners who'd rather not do this by hand, EZY runs a free audit at EZY.ai that flags missing AI-bot directives, sitemap issues and llms.txt problems - and measures AI-bot visits server-side, where your standard analytics can't.
Frequently asked questions
Do AI bots crawl WordPress sites more than Googlebot in 2026?
Across 47 WordPress sites and 12.8 million server-side visits (November 2025 – May 2026), all AI-associated crawlers combined logged about 108% of Googlebot's crawl volume - slightly more than Google. Counting only the dedicated answer engines (GPTBot, ClaudeBot, PerplexityBot), AI sits at roughly 59% of Googlebot. On a per-site basis, AI crawlers out-crawl Googlebot on the majority of these sites. The figures are specific to this tracked sample, not all of WordPress.
Which AI crawler visits WordPress sites the most?
GPTBot (OpenAI) is the largest by a wide margin - about 41% of all AI-bot traffic in this dataset, and on its own roughly 44% of Googlebot's total volume. The surprise second is Bytespider (ByteDance/TikTok) at about 22%, followed by Apple's crawler at about 18% - both ahead of ClaudeBot (about 14%). PerplexityBot is small in raw crawl volume (about 0.2%) because it fetches on demand rather than running a large standing crawl.
Why don't AI bots show up in Google Analytics?
AI crawlers don't execute JavaScript. Google Analytics, Plausible and Fathom work by running a JavaScript tag in a visitor's browser, so a bot that requests your raw HTML and leaves never triggers the tag and is never recorded. One industry estimate puts the share of AI-bot activity missed by GA4 at around 99%. The only reliable way to measure AI crawlers is server-side - raw access logs or a tool that reads requests at the server or CDN edge.
Does llms.txt improve AI search visibility?
As of 2026, there is no strong evidence that it does. No major AI provider - OpenAI, Anthropic, Google, Meta or Mistral - has publicly committed to using llms.txt in production, and Google has said on the record that it doesn't support it. Independent studies find the search and answer crawlers rarely fetch the file, and a correlation study by SE Ranking (around 39,000 domains) found no clear link between having an llms.txt and how often AI engines cite a site. Treat it as low-cost, forward-looking insurance - agentic tools like Cursor, Claude Code and Copilot increasingly do fetch it - rather than a proven ranking lever today, and avoid auto-generating a markdown copy of every page, which can create duplicate-content problems.
How do I check whether my WordPress site is blocking AI crawlers?
Open your robots.txt and search for the AI user-agents - GPTBot, ClaudeBot, PerplexityBot, Bytespider and Google-Extended. In this analysis, about 12% of sites explicitly disallow at least one AI crawler, sometimes unintentionally, while around 65% had made no explicit decision either way. Also check your security plugin (Wordfence, Sucuri, iThemes) and any Cloudflare bot rules, which can block unfamiliar user-agents silently. Because AI bots don't appear in JavaScript analytics, confirm access in your server logs rather than your analytics dashboard.
Last updated: May 2026. Comparison data verified against publicly documented product features at the time of publication.
Data appendix
Tracking corpus: 47 WordPress sites, server-side, 2025-11-04 → 2026-05-28. Visits classified: 12,817,816.
robots.txt audit corpus: 117 WordPress sites, same period.
llms.txt audit corpus: 114 WordPress sites, same period.
AI bots counted (case-insensitive token match): GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, Applebot-Extended, Bytespider, CCBot, Meta-ExternalAgent, MistralAI-User, Amazonbot.
Googlebot: all Googlebot variants; Google-Extended counted separately as an AI bot.
Humans: browser-class User-Agent strings without bot tokens.
Not measured here, and therefore not claimed: HTTP status / 403 rates by bot or plugin, and schema coverage.
External figures attributed in-text: Semrush (June 2025) AI-referral conversion; Ahrefs/Visibility Labs conversion comparisons; GA4 client-side limitation (industry-documented).
All EZY figures are direct measurements, not projections. Raw aggregate data available on request for journalists and researchers.
EZY.ai is an AEO automation platform for WordPress and Cloudflare-fronted sites, helping owners measure and improve their accessibility to AI search engines - robots.txt, llms.txt, schema, sitemap and the full AEO file set. Free audit at ezy.ai.
