llms.txt Is Not a Citation Strategy. The Architecture Around It Is.

OtterlyAI spent 90 days measuring AI bot traffic after a correct llms.txt deployment. Out of 62,100 bot requests, just 84 went to the file: roughly 0.1 percent. Yet 844,000 sites have shipped an llms.txt anyway, many of them convinced they just solved AI discoverability.

The file is not the problem. The gap is treating a routing table as a citation strategy.

The file indexes what exists — it doesn't fix what's broken

llms.txt tells a crawler which pages matter. It doesn't change what those pages say or whether they contain anything citable.
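In the proposed format, the file is nothing more than a markdown index: a title, a one-line summary, and sections of annotated links. A minimal sketch (the company name, URLs, and page list are placeholders):

```markdown
# Example Corp

> B2B invoicing platform. Key documentation for LLM consumption below.

## Products

- [Invoicing API](https://example.com/docs/api): REST reference with auth examples

## Company

- [About](https://example.com/about): founding date, leadership, customers
```

Notice what the format cannot express: nothing in this file changes whether /docs/api actually answers a question directly.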

If your site has duplicate URLs, legacy redirects, and design-first pages with no direct answers, llms.txt hands the crawler a clean map of all of that. ChatGPT and Perplexity follow it — and still skip you, because the indexed pages don't match the query.

The trigger most teams recognize: a competitor's llms.txt surfaces in a ChatGPT browse session or Perplexity citation, and your team wants parity. Parity isn't a file. It's the entity structure the file points to.

JSON-LD does the citation work llms.txt sets up

Google's Gary Illyes said in mid-2025 that Google doesn't support llms.txt and isn't planning to. Yet Anthropic, Cloudflare, and Stripe all ship it. The contradiction resolves cleanly: llms.txt is layer one of a machine-readable stack, not the stack itself.

What actually moves citations is JSON-LD. Organization, Product, Person, FAQ, HowTo — structured markup that gives a language model a factual anchor it can quote without paraphrasing your prose. A side-by-side study published in early 2026 found JSON-LD provided superior factual grounding for entity recognition, while llms.txt contributed token efficiency. The combination outperformed either alone.
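As a sketch, Organization markup on a homepage looks like the following (all values are placeholders; the vocabulary is defined by schema.org):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-corp",
    "https://github.com/example-corp"
  ]
}
</script>
```

The sameAs links tie the entity to profiles a model may already know about; that is the kind of factual anchor the grounding study credits JSON-LD with providing.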

Deploy one without the other and you get half the architecture. Most teams deploy just the file.

Canonical URLs are a prerequisite, not an assumption

Before llms.txt can index anything useful, the site needs clean canonical URLs. Duplicate pages and redirect chains force the crawler to choose — and it often picks the legacy version, not the canonical one you want cited.
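The signal itself is a single line of markup per page (URLs here are placeholders); the hard part is deciding which URL every duplicate should point at:

```html
<!-- On /pricing, /pricing/, and a legacy alias like /plans,
     all variants declare the same authoritative URL -->
<link rel="canonical" href="https://example.com/pricing">
```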

This is the part that takes time to audit: which URLs are authoritative, which are duplicates, and which carry the structured data the citation requires. Done wrong, llms.txt accelerates the problem by making the wrong pages easier to find.

The honest deployment sequence is: audit canonical URL structure, add JSON-LD to the pages worth indexing, then write the llms.txt that points to them. Reversing the order is fast and largely useless.

What a minimum viable schema layer looks like

For a B2B SaaS team that wants to ship before committing to a full content rebuild, the viable first release is narrow: Organization schema on the homepage, Product schema on product pages, FAQ schema on any page with a Q&A section, and an llms.txt pointing to the pages that now carry structured data.
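A sketch of the FAQ piece, assuming a page with a single Q&A pair (question and answer text are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does the API support webhooks?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes. Webhooks are available on all plans and can be configured per endpoint."
    }
  }]
}
</script>
```

The acceptedAnswer text is exactly the kind of direct, quotable statement a model can lift without paraphrasing your prose.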

It won't produce a citation overnight — AI models re-crawl on their own schedules — but it gives crawlers something to latch onto beyond plain prose. A scheduled probe run weeks after deployment tells you whether the index was adopted and which pages surfaced in live answers.
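One way to check adoption is to count AI crawler hits in your own access logs, including hits on /llms.txt specifically. A minimal sketch, assuming combined-format logs and a hand-maintained bot list; both the user-agent substrings and the log format are assumptions to verify against your stack:

```python
import re
from collections import Counter

# Hypothetical list of AI crawler user-agent substrings; real bot names
# change, so check current vendor documentation before relying on this.
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def summarize_bot_traffic(log_lines):
    """Count AI-bot requests overall and to /llms.txt in combined-format logs."""
    total = Counter()
    llms_hits = Counter()
    # Combined log format: request line in quotes, user agent in the final quotes.
    pattern = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*".*"([^"]*)"$')
    for line in log_lines:
        match = pattern.search(line)
        if not match:
            continue
        path, user_agent = match.groups()
        for bot in AI_BOTS:
            if bot in user_agent:
                total[bot] += 1
                if path.split("?")[0] == "/llms.txt":
                    llms_hits[bot] += 1
    return total, llms_hits
```

Run weekly over the same log window, this gives the before/after comparison the OtterlyAI measurement describes: total AI-bot requests versus requests that actually touched the file.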

The longer content rebuild still matters. But the schema layer is what makes the rebuild's output citable when it ships.