SEO in the Age of AI: Implementing llms.txt and Tracking LLM-Driven Traffic

As large language models (LLMs) like ChatGPT and Bing’s AI Chat become gateways for information, SEO is expanding beyond traditional search engines. Companies now face a new challenge: ensuring their content is visible and attributable in AI-generated answers. This post explores how to implement the emerging llms.txt standard (a sort of “robots.txt for AI”), why declaring AI permissions matters, and how to track traffic and engagement from LLMs. We’ll also discuss how SEO is evolving from blue links to AI recommendations, the limitations of current attribution, and the new opportunities this shift creates for content and branding.

What is llms.txt (The “Robots.txt” for AI)?

Figure: The llms.txt logo represents a new standard to help AI agents understand website content at a glance. Similar to how robots.txt guides search engine crawlers, llms.txt is a proposed text file that guides AI systems (LLMs) in using a site’s content. However, it serves a different purpose than robots.txt. Instead of listing which URLs to avoid, llms.txt is designed to expose your important content in a structured, LLM-friendly way. In other words, it’s about making content easier for AI to consume, rather than restricting it.

Robots.txt vs llms.txt: Robots.txt is a decades-old standard that tells search engine bots which pages not to crawl or index. By contrast, llms.txt is meant to tell AI models how to use your content, not just what to skip. It’s more akin to a mini sitemap or content guide specifically formatted for AI understanding. For example, robots.txt might prevent a bot from crawling your /admin folder, whereas llms.txt might highlight your key docs, FAQs, or product info that an AI should focus on. Unlike a sitemap.xml (which is just a list of URLs), llms.txt provides context and structure for each link. It’s written in plain Markdown, a format both humans and AI can easily read, stripped of the clutter of HTML or scripts.

How Does llms.txt Work?

The llms.txt file lives in your website’s root (e.g. yourwebsite.com/llms.txt). When an AI agent or crawler accesses this file, it finds a curated outline of your site’s most important information. The standard format includes:

H1 Title: The name of your site or project (this is the only required element).
Summary Block: A short description of the site enclosed in a blockquote, highlighting key information or purpose.
Main Sections (H2 headings): Each H2 introduces a category (for example “Guides”, “Products”, “Support”, etc.), under which you list important pages.
Bullet Lists of Links: Under each H2, a list of links to critical pages or documents, formatted as [Page Title](URL): optional brief notes. This gives the AI both the link and a hint of what’s there.
Optional Section: Optionally, a section (often titled “Optional”) can list lower-priority pages that can be skipped if the AI has limited context window.

In essence, llms.txt provides a condensed knowledge base for your site. Instead of crawling dozens of pages and parsing complex layouts, an AI can retrieve this single file to quickly learn the structure of your content. This is crucial because LLMs have limited context windows and can’t ingest your whole site at once. By surfacing the most important content in a clean format, you reduce the chance of the AI missing or misinterpreting your site’s key information.

Why Should Companies Implement llms.txt?

As AI-generated answers become more common, making your content accessible to LLMs is no longer optional. If an AI assistant is answering questions about your industry or products, you want it to draw from your content – and to do so accurately. Parsing raw HTML is often slow and error-prone for LLMs, especially when pages are filled with navigation menus, ads, or interactive elements. By providing a ready-made “cheat sheet” in llms.txt, you:

Surface important content quickly: The AI sees your top guides, product pages, or FAQs immediately, rather than wading through irrelevant pages.
Reduce errors and omissions: Structured links and summaries help the model retrieve exactly what it needs, reducing the risk of it hallucinating or using outdated info.
Improve brand presence in answers: When someone asks an AI, “How do I do X with [Your Product]?”, a well-curated llms.txt increases the likelihood that the AI’s answer will come from your official content.

Early adopters are already seeing the benefits. The llms.txt proposal was introduced in late 2024 by Jeremy Howard of Fast.ai, and gained traction when documentation platforms like Mintlify enabled automatic llms.txt generation in November 2024. Overnight, thousands of developer doc sites (including ones from Anthropic and Cursor) had llms.txt files in place. This rapid uptake in the tech community highlights a broader trend: companies that depend on accurate, up-to-date information (dev tools, knowledge bases, etc.) are leading the way, ensuring AI assistants get the best version of their content.

Implementing llms.txt on Your Website

Implementing llms.txt is straightforward. You can manually create the file, or use tools if your CMS or docs platform supports it. Here’s how to get started:

Create a plain Markdown file named llms.txt in your website’s root directory. (If your site is static or you have access to hosting, it’s just like creating a robots.txt or sitemap.xml, but in Markdown format.)
Add a top-level heading (H1) with your project or site name. For example: # MySite Documentation.
Write a one-paragraph summary of your site or product as a blockquote. In Markdown, a blockquote is indicated with a > at the start. This should be a concise description that gives context. Example: > Official docs and API reference for MySite, including integration guides and troubleshooting tips.

List your key content sections using H2 subheadings. Under each, list important pages with bullet points. Each bullet should be a Markdown link followed by a colon and a short note. For instance:

## Guides  
- [Getting Started](https://mysite.com/docs/intro): Step-by-step setup guide  
- [Integration](https://mysite.com/docs/integrations): How to connect our API with your app  

## Reference  
- [API Reference](https://mysite.com/docs/api): All API endpoints and parameters  
- [CLI Commands](https://mysite.com/docs/cli)

This structure tells an AI, “These are the main areas of my content, and here’s what you’ll find in each page.”

(Optional) Include an “Optional” or less important section: If there are lengthy pages or ancillary resources that might not fit in an LLM’s context window, you can mark them under a clearly labeled section (like ## Optional) at the end. The AI can skip these if it’s tight on space, focusing first on the core content.
Provide Markdown versions of content pages (if possible): The llms.txt proposal also suggests offering your content pages in Markdown format (e.g. accessible by adding .md to the URL). This is an advanced step, but it means an AI could retrieve a clean text version of any page directly. Some sites even provide an aggregated llms-full.txt containing all their content in one file. This isn’t mandatory, but it further streamlines AI access to your information.

Once your llms.txt is live, test it by visiting yourwebsite.com/llms.txt in a browser – you should see the raw Markdown. From here, any AI agents that know about this standard can fetch it. The community is actively discussing integrations (there’s a public GitHub and Discord where llms.txt is being evolved), and already directories like llmstxt.site and directory.llmstxt.cloud list hundreds of sites adopting it. By implementing it early, you signal that your site is “AI-ready,” which could give you an edge as more AI tools start looking for this file.

Declaring AI Permissions and Visibility: Why It Matters

Beyond just helping AI find your content, companies need to consider permissions – essentially, what usage of your content by AI is allowed – and how visible your content is within AI platforms. In the past year, we’ve seen web publishers awaken to the reality that AI models are crawling and training on their content. Some responded by blocking AI crawlers (for example, OpenAI’s GPTBot can be disallowed via robots.txt directives), or by using meta tags like “noai” to opt out of AI training. But blocking outright may mean forfeiting a new traffic stream. This is where a balanced approach like llms.txt comes in: it’s a way of explicitly allowing and guiding AI usage on your terms.

Think of llms.txt as a permission and guidance document for AI. By creating one, you’re effectively saying: “Here is the content we’re okay with AI using, and here’s the best way to use it.” It’s an opportunity to declare what information can be used by AI search engines or assistants. Instead of letting a crawler loose on your site (which might result in high server load or it grabbing irrelevant data), you pre-package the important stuff. This not only helps the AI give better answers but also protects you from misuse – you’re not exposing every page, only what you choose to highlight.

From a strategic standpoint, declaring permissions and being transparent about AI access can enhance your brand’s credibility. Users and regulators are increasingly concerned about how AI gets its data. A company that openly provides an llms.txt is effectively opting in to the AI ecosystem in a controlled way. It shows you’re willing to share knowledge (for instance, a software company sharing its documentation), but you’re doing so with clear boundaries. In contrast, a company that tries to hide from AI might miss out on being featured in AI-driven recommendations or answers.

Moreover, visibility in AI models matters. If your competitors’ content is being indexed or summarized by AI and yours isn’t, you could be invisible in a whole class of user queries. Imagine a user asks an AI assistant for “the best CRM software for small business” – if that AI has ingested and understood a competitor’s product info (perhaps via their llms.txt or other means) but not yours, the answer may exclude you. Ensuring you’re present in the model’s knowledge means declaring yourself open for AI business. It’s similar to the early days of SEO: companies that allowed search engines to crawl their sites (and optimized for it) gained an advantage over those who were invisible to Google. Today, being visible to AI models and agents could determine whether your brand is mentioned at all in AI-driven conversations.

In short, explicitly managing your AI visibility is now as important as managing your search engine visibility. Through measures like llms.txt (for guidance) and proper crawler directives or meta tags (for permission), you can strike the right balance between protecting your content and amplifying it via AI.

How LLMs Use Your Site’s Content (ChatGPT, Claude, Bing, Perplexity, etc.)

Not all AI systems gather and use content in the same way. It’s important for SEO professionals to understand how different LLM-powered services access web content – and thereby how your site can appear (or not appear) in their outputs.

LLMs via Training Data (ChatGPT, Claude): Models like OpenAI’s ChatGPT (particularly the older versions) and Anthropic’s Claude are trained on vast datasets that include a snapshot of the web. If your site was part of that training data (e.g. scraped in 2021 for GPT-4’s cut-off, or in 2023 for newer models), the model might have knowledge of your content. However, responses from these models usually don’t cite sources, and the content is baked into the model’s memory. This means your site’s info could be used without attribution, and possibly out-of-date if you’ve changed since the training snapshot. You have limited control here beyond opting out of training crawlers or providing updates via something like plugins. Notably, OpenAI has offered a way to opt-out of training (via robots.txt disallow for GPTBot), but if you opt out, you also opt out of being part of the model’s answers. It’s a trade-off between presence vs. privacy.
Retrieval-based AI (Bing Chat, Google’s Bard/SGE, Perplexity, etc.): These systems fetch information from the web at query time, similar to a search engine, and then have the LLM formulate an answer. For example, Bing Chat uses Bing’s search index to retrieve content and always cites its sources in the answer. If a user asks something, Bing Chat will show snippets from web pages (including yours) and list the source domains or titles as footnotes. Perplexity.ai works in a similar fashion: it performs a web search for you and generates an answer with footnoted citations linking to the sources. In fact, Perplexity is considered a gold standard for citation transparency – every answer clearly links to where it got the info. For your site, this means if you rank well for a given query (SEO still matters!), an AI like Bing or Perplexity might pick up your page and directly quote or summarize it, then drive the user to your site via a citation link.
Hybrid approaches (ChatGPT with browsing, plugins, others): Some LLMs combine both. ChatGPT, for instance, introduced a browsing mode that uses Bing’s API to fetch current information when a user explicitly asks for it. If a user with ChatGPT enabled browsing or a specific plugin asks a question, ChatGPT might pull your site’s content in real time (respecting robots.txt rules). This is more like a mini-search engine within ChatGPT. Other tools, like certain browser extensions or assistant apps, will do behind-the-scenes retrieval from sites (they might use APIs like Google’s or Bing’s to find relevant URLs and then scrape content). These will parse your page, often stripping HTML tags, and feed chunks of text to an LLM to generate an answer. Here, having clean, structured content (with semantic HTML or an available llms.txt) can influence whether the AI accurately understands your page.

Why does visibility in these models matter? Because increasingly, users are bypassing traditional search results and going straight to these AI agents. Millions of users now rely on ChatGPT-style Q&A or multi-modal search engines to get answers. If your site isn’t part of that ecosystem, you lose mindshare. It’s analogous to not being listed on Google a decade ago. We’re already seeing companies get traffic from these AI sources: early data shows anywhere from 0.5% to 3% of website traffic now coming from LLM-based sources, even in 2024. That number might seem small, but it has been climbing steadily – and it’s expected to grow exponentially as AI tools go mainstream on every device. In fact, projections suggest LLM-driven search could jump from a fraction of a percent of queries in 2024 to as much as 10% by the end of 2025. This is a tectonic shift in how people discover content.

Being visible in LLM answers means a few things for your business: your information is reaching users even when they don’t visit your site directly, your brand can be mentioned as an authority (if the AI cites or describes the source), and you have an opportunity to capture clicks from those answers (when users want to “learn more”). But it also means you need to take care that the AI is getting correct information. If your site is out-of-date or the AI picks content from someone else describing your brand, the answer might be wrong. By using tools like llms.txt and keeping your content optimized for AI retrieval, you improve the odds that your voice is the one the model uses.

From Search Engine Results to AI Recommendations: The Evolution of SEO

The rise of LLMs is transforming classic SEO into something new. We’re moving from a world where success is being Rank #1 in a list of links, to a world where success is being the source an AI cites or the recommendation an AI assistant gives. In other words, we are shifting from search engines to “answer engines”. As one AI SEO expert put it, the future of discovery isn’t a blue link — it’s a bot. Instead of a user scanning a page of results, an AI agent will synthesize the information and present a single (or a few) answers. So what does that mean for those of us in SEO and content strategy?

Generative Answers Reduce Clicks (The Zero-Click Phenomenon): AI answers often provide so much information that the user may not feel the need to click through to a website at all. This is an extension of the “zero-click search” trend we saw with featured snippets on Google. Now an AI might answer, say, “What’s the best running shoe under $100?” by directly naming a product and summarizing reviews, all without the user visiting a single website. For businesses, this means your content could be driving the answer but not the click. It raises the question: how do we measure success when impressions (being mentioned) become as important as clicks? We must broaden our KPIs to include metrics like brand mentions or AI citations, not just site visits.
“Answer Engine Optimization” (AEO) is emerging: Ensuring that an AI cites you or uses your data is the new game. Some are dubbing this Generative Engine Optimization, because it’s about optimizing for generative AI results. This might involve new tactics: structuring your content semantically, using schema markup to reinforce your authority (e.g., marking your content with proper schema.org types, author info, etc.), and focusing on entity presence (making sure your brand or key topics are well-defined across the web). For instance, if you’re a notable entity on Wikipedia or have lots of authoritative backlinks, LLMs are more likely to “know” about you and include you in answers. LLMs don’t use traditional keyword-based ranking, but they do rely on the semantic understanding of content and the signals of authority available (such as being mentioned in trusted sources).
The role of citations and trust: Unlike regular search where the user decides which link to click, in AI answers the AI is deciding which sources to trust and present. LLM-based systems use retrievers and scoring algorithms to pick what content to draw from. They tend to favor sources that the model deems authoritative or that score high for relevance and credibility. This means that brand authority and content quality become even more critical. If your site has a history of expertise (e.g., being cited elsewhere, or containing unique insights), it’s more likely to be chosen by the AI’s retrieval algorithm. We’re basically seeing a convergence of SEO with PR and thought leadership: getting cited in high-authority places (like academic papers, respected news sites, or well-known industry resources) can influence AI outputs. In other words, citations beget citations – much like in human SEO where backlinks beget higher ranking.
Continuous content updates and accuracy: Because AI models might rely on a mix of training data and live data, it’s crucial to keep your information up-to-date. For example, if an AI’s training data knows about your company up to 2022, but you’ve since launched new products or changed pricing, an AI might give outdated info unless it fetches updates from your site. This is driving an evolution in content strategy: content needs to be written not just for human readers and search bots, but also for AI comprehension. Clarity, factual correctness, and having a concise summary (like in llms.txt or on-page) help ensure the AI doesn’t get things wrong.

Ultimately, traditional SEO isn’t going away – people still use search engines, and ranking there remains important. But AI SEO is quickly becoming a parallel field. It requires a blend of technical SEO (to ensure crawlability by AI agents), content strategy (to ensure you’re the source of truth the AI finds), and analytics (to measure impact in a world of fewer clicks). Companies and agencies that recognize this shift early are positioning themselves as leaders in a new era of search. They’re treating AI responses as a new distribution channel, one that can be optimized and won like any search results page – albeit through new techniques.

Tracking Traffic and Citations from LLMs

One big question for marketers is: How do we know if we’re getting traffic or visibility from AI models? Traditional web analytics and SEO tools have focused on search engine referrals, but now we need to catch referrals from AI chatbots and assistants. The good news is that LLM-driven referral traffic is measurable – and it’s already showing up in analytics reports. The bad news is there’s no simple “AI Console” (yet) that aggregates all this, so you have to get a bit creative.

1. Use Web Analytics (GA4, etc.) to identify AI referrals: Modern analytics platforms like Google Analytics 4 can be configured to filter out sessions coming from known AI sources. Typically, when an AI agent provides a link and a user clicks it, the referral might show up as coming from a domain related to that AI. For instance, traffic from Bing’s AI chat may appear with a referrer containing bing.com (often with clues like /new or edgeservices in the URL), and Perplexity.ai referrals will show perplexity.ai as the referrer. If ChatGPT (with browsing) opens a page for the user, you might see chat.openai.com or an OpenAI domain. SEO analysts have compiled lists of these referrer patterns – for example, one GA4 setup uses a filter for any referrer containing keywords like “openai”, “chatgpt”, “perplexity”, “bard”, “bing” (and related terms) to capture AI-originating traffic. By setting up a custom report or segment, you can monitor how many sessions and what engagement comes from these sources. In practice, companies are already observing that roughly 0.5% to 3% of their traffic is arriving via LLM sources like ChatGPT, Bing Chat, Perplexity, GitHub Copilot, etc.. That share is expected to rise, so having a dashboard for it now is wise.

2. Monitor AI-specific analytics (if available): Some AI platforms are beginning to offer analytics or at least clues for content creators. For instance, Bing Webmaster Tools now shows if your content was seen in Bing’s chat feature. It may not explicitly break it out (Bing might just count it as an impression or click from Bing), but keep an eye on any unusual spikes in Bing referrals that don’t correspond to regular web search traffic. Similarly, the team at Perplexity has hinted at working with publishers – and since Perplexity is very transparent with sources, one can imagine future “Perplexity Publisher” reports showing how often your site was cited. While not publicly available yet, these kinds of features are likely on the horizon as AI search grows.

In absence of official tools, use the AI tools themselves for insight. For example, you can manually query Bing Chat or Perplexity with prompts to see if your site comes up. Ask something like “What does [Your Company] do?” or “What are the best resources on [your topic]?” and see if the answer cites you. Perplexity in particular is useful for this reconnaissance, because it shows footnotes with URLs. If you find that a competitor is consistently cited where you are not, that’s a signal to beef up your content or authority in that topic.

Figure: AI-generated answer citing sources. In this example from Bing’s AI chat, the assistant provides a “Learn more” section with numbered references to websites (popsugar.com, masterclass.com, wikihow.com, etc.), indicating where it got its information. This kind of citation list is what you want to see your site appear in. Bing’s design shows source domains at the end of answers, and users can click those links for full details. Similarly, Perplexity will include inline footnote numbers that link to sources. By tracking whether your domain shows up in these citations, you can gauge your AI visibility even when the user doesn’t immediately click through.

3. Conduct “prompt audits” for your content: Since there’s no Google Search Console equivalent for LLMs, you have to audit manually. Think of the questions your audience might ask an AI. Then try them yourself in various AI tools (ChatGPT, Bing Chat, Bard, Claude, Perplexity, etc.). Are you in the answer? Are you referenced or quoted? For example, if you run a travel blog, ask “According to [YourBlogName], what’s the best time to visit Paris?” or simply “best time to visit Paris” and see if the AI mentions your blog. This tactic was described as thinking like a spy, not a marketer – you have to sleuth out where you stand. Some experts suggest systematic checks: perhaps quarterly, run a set of key queries through popular AI systems and log the results. If you’re never showing up (especially for queries you should be relevant for), that’s a red flag that you need to boost your content’s authority or relevance (or that the AI hasn’t been permitted to access your site, in which case llms.txt or allowing its crawler could help).

4. Leverage logs and monitoring tools: If you have the capability, analyze your server logs for known AI user agents or patterns. OpenAI’s GPTBot has a user agent string which might appear if someone using ChatGPT’s browsing fetched your page. Other AI services might have distinctive fetch patterns (for instance, Perplexity might use the same user agent as a browser but could be identified by IP ranges or timing of bursts of fetches). This is more technical, but worth it for large sites: by identifying AI-driven access in logs, you can quantify interest in your content from these models even if it didn’t result in a traditional page view (some AI might scrape content without triggering analytics scripts).

5. Third-party “AI visibility” tools: A nascent industry of tools is emerging to help track AI mentions. Some SEO tool providers and startups (e.g., the GrowthMarshal blog we cited earlier) are building solutions to monitor where your brand/content appears in AI outputs. These tools use a combination of prompt simulation, AI analysis, and perhaps partnerships with AI platforms. Keep an eye on services from major SEO platforms too – it wouldn’t be surprising if in the near future something like an “AI section” appears in Google Search Console or Bing Webmaster. Majestic, for instance, noted that tracking LLM traffic is becoming essential as LLMs transform SEO by emphasizing fresh, authoritative content.

Engagement and conversion tracking: One insight from early studies is that LLM-driven traffic behaves differently. Users coming from an AI answer might be less inclined to browse around once they land on your site – they came for a specific snippet, after all. A study cited on SearchEngineLand noted that in most sectors, traditional organic search traffic still outperforms LLM referral traffic in terms of engagement and conversions (using metrics like time on site or conversion events). This is logical: a user from Google Search often comes in “research mode” and might click multiple results, whereas an AI referral might indicate the user already got a condensed answer and only clicked through for detail on a very particular point. Understanding this behavior is key – if AI traffic is less engaged, you may need to adjust your content to capture those users quickly (e.g., ensure the landing page immediately addresses what the AI was summarizing, perhaps even greeting them with the info they likely came for).

In summary, tracking AI-driven traffic and citations requires a mix of traditional analytics, creative querying, and new tools. It’s a bit of a cat-and-mouse game right now, but by being proactive you can get a reasonable picture of how AI is contributing to your web presence. And by measuring it, you can start optimizing for it.

Limitations and the Future of Attribution in LLM Discovery

While LLMs open exciting new channels, they also pose challenges for attribution and content creators’ visibility. Let’s address some limitations of the current landscape and where things might head:

Limited Attribution and “Invisible” Impressions: Unlike search engines which at least show a URL or brand name in the results, many AI interactions give zero visibility to content sources unless the model is explicitly designed to cite (like Bing or Perplexity). If someone asks ChatGPT a question and it produces an answer derived from your content, the user might never know – your brand is essentially hidden behind the AI. This is problematic for content producers. It’s one reason publishers are urging AI developers to build in attribution. We do see progress in citation-friendly AI (again, Bing Chat’s approach is a positive example), but it’s not universal. Until there’s broader adoption of citation standards, some AI-driven “referrals” will leave no trace – no referral string, no click, just your information delivered by the AI. This is a major limitation if you’re trying to quantify the AI impact or get credit for your content.

Inconsistent Obedience to Standards: Another issue is that not all AI crawlers or tools will respect the same rules. For instance, you might put up an llms.txt or a noai meta tag and find that some obscure AI tool ignored it. Unlike search (dominated by a few players who usually adhere to robots.txt and standard protocols), the AI field has many new entrants, not all of whom play nice. Over time, we might see consolidation and the emergence of a de facto “AI robots policy” – perhaps an extension to robots.txt or adoption of llms.txt for not just content guidance but also permission signaling. In fact, it wouldn’t be surprising if the industry or regulators push for a standardized “do not train/use” flag that all responsible AI actors would honor.

Context and Misquoting Issues: Even when attribution exists, it’s only useful if the context is correct. LLMs can sometimes misquote or misattribute information – e.g. citing the right source for the wrong fact. This could mean your brand gets mentioned in an odd or incorrect context. Such errors are currently hard to prevent because they result from the model’s internal workings. Ensuring your content is clearly written and factual can help (since the AI is less likely to mix things up if the source text is unambiguous). Also, having multiple authoritative sources about you (official site, Wikipedia, news articles) that corroborate can reduce weird misattributions.

No Unified Dashboard: As mentioned, we lack a “Search Console for AI.” Webmasters are used to getting reports of search queries, impressions, and clicks from Google Search Console or Bing Webmaster. There’s nothing equivalent for ChatGPT or Alexa’s voice assistant or any number of AI systems. This leaves a data gap. The future direction likely includes some unified reporting – perhaps third-party aggregators or even collaborations where AI platforms voluntarily share data with publishers (for example, maybe OpenAI could one day provide sites a report of popular Q&A where their content was used, if privacy and IP issues can be sorted). Bing’s early experiment in potentially sharing ad revenue with cited publishers hints that big players are thinking about publisher relations. If AI answers become monetized (with ads or subscriptions), pressure will grow to give credit and even compensation to content sources. This could lead to formal attribution frameworks.

Emerging Solutions for Attribution: On the horizon, we see ideas like watermarking and embedding tracking. Watermarking involves subtly altering the generated text in a way that can be detected later (OpenAI has researched this for identifying AI-generated content). For publishers, a form of watermarking could be embedding hidden signals in your content that an AI might carry into its answer (this is theoretical and tricky). More concrete is embedding monitoring: since LLMs convert text into vectors (embeddings) to understand it, some tools might allow checking if your content’s vector is present in a model. This is advanced, but if you had access to the model’s vector space (say via an API), you could query if a chunk of your text is known to the model by seeing if it produces a similar embedding. Companies could use this to at least confirm “Yes, my data is in there.” That said, most AI companies don’t expose their model internals to that degree.

A simpler approach is what we discussed: regularly query the AI with things only your content says, to see if it regurgitates them. If it does, you know it has your data. Going forward, the AI industry might implement content provenance features. Imagine an AI answer where every sentence is backed by a source you can click – that would be ideal for attribution. Projects in the works (by academic and open-source communities) aim to make AI output more traceable to training data or sources, but it’s a hard technical challenge.

In summary, we’re in a transitional period. Attribution in AI-driven discovery is currently limited and inconsistent, but awareness is growing. Publishers want it, users arguably benefit from knowing sources, and AI developers want to build trust (citing sources improves trustworthiness of the answer). We can expect future AI systems to increase transparency. In the meantime, companies should do what they can: implement llms.txt and similar standards to explicitly mark their content and preferences, lobby (individually or through industry groups) for responsible AI usage of content, and closely watch this space. Those who experiment early with these attribution solutions will be ready to adapt as standards emerge.

New Opportunities for Content, Branding, and Organic Reach

It’s not all challenges – the rise of LLMs and AI assistants also creates new strategic opportunities for companies that are savvy about content. Here’s how embracing AI-driven discovery can amplify your brand and reach:

Become the Go-To Authority in AI Answers: In the traditional web, one goal of SEO was to become a featured snippet on Google – a quick answer box that often cemented you as an authority. The analogue in AI is to be the source the AI trusts. If you consistently appear as a cited source for answers on, say, “best project management tools” or “how to fix a leaky faucet” (whatever domain your content covers), you gain a reputation among users who see those answers. Even if they don’t click immediately, your name exposure is valuable. It’s a bit like being quoted in an article – it builds credibility. Brand mentions in AI outputs can drive awareness. Later, that user might specifically seek out your site or product because they recall it being recommended by an AI. Smart content strategy can leverage this: for example, a cooking site might ensure its recipe data is structured and accessible so that voice assistants (which use LLMs) often quote its recipes, making the site a household name via Alexa or Google Assistant.
New content formats and channels: Optimizing for AI may lead you to create content in formats you hadn’t before. Perhaps you’ll maintain a high-quality Q&A page (knowing that Q&A pairs are gold for LLM training and retrieval). Or you might produce succinct “AI digest” versions of articles (maybe as part of llms.txt or on a dedicated page) that summarize your long content – essentially feeding the AI the CliffsNotes. This can actually improve your human UX too (think of executive summaries). Additionally, we may see companies produce content specifically to appeal to AI as a distribution channel: e.g., data sets, FAQs, or tutorials released under open licenses that AI companies can ingest freely. This is analogous to how some companies open-sourced certain code or data for the goodwill and visibility it generated. If having your content widely used by AI is beneficial, some may choose to donate knowledge in exchange for attribution.
Long-tail reach and “answer partnerships”: AI assistants don’t have a single “first page” of results – they can draw from a mix of sources for different aspects of an answer. This means niche, long-tail content can find its way into answers even if it never ranked #1 on Google. For instance, a forum post or a small blog might hold the exact insight an LLM needs for a very specific user query. If your company has a deep knowledge base or a community forum, making that accessible to AI could let you capture long-tail queries that previously would never reach you because you weren’t on page one of Google. AI might aggregate multiple niche answers to provide a comprehensive response, thereby giving each niche source a slice of exposure. Think of it as democratization of reach – quality info, even from a lesser-known source, can be elevated by an AI if it’s the best answer to the question.
Enhanced user experiences with your own AI integrations: While not the main focus of this post, it’s worth noting: as you structure your content for external AI consumption, you’re also priming it for use in your own AI and chatbot interfaces. Many companies are now deploying chatbots on their sites or integrating with voice assistants. Having content in clean text formats, using llms.txt, and adding semantic markup means you can more easily plug your content into an in-house chatbot that, for example, answers customer support questions or guides users interactively. So there’s a dual benefit: you make it easier for external AI to feature you, and you can repurpose the same optimizations to power any conversational experiences you build for your audience.
Organic reach without traditional SEO competition: In some ways, AI answers level the playing field. A great piece of content can shine even if you don’t have the highest domain authority, because the AI is looking for the content that best satisfies the query, not just the most SEO-optimized page. If you focus on quality and depth, you might win in AI where you couldn’t in Google rank. This opens a strategic avenue: invest in extremely high-quality content and expertise, knowing that even if it doesn’t outrank big competitors on Google, it could be picked up by an AI seeking the best answer. Essentially, content becomes modular – an AI might take a paragraph from your site as the perfect explanation for something, even if your site as a whole isn’t top-ranked. That means every section of your content should be written to stand on its own merits.

All these opportunities point to one thing: the emergence of an “AI visibility stack” as part of marketing strategy. Just as companies have an SEO toolkit (keyword research, on-page optimization, link building, etc.), they will develop an AI visibility toolkit: structuring data for LLMs, monitoring AI mentions, optimizing content for natural language answers, and more.

Conclusion: Embrace the AI Visibility Stack – A Call to Action

The shift to AI-driven search and recommendations is not a future scenario – it’s here now, and it’s accelerating. For companies and marketers, this means expanding your playbook. In addition to traditional SEO, you need to proactively manage your AI visibility. This includes implementing technical standards like llms.txt to make your site AI-friendly, optimizing content so that AI agents recognize its value, and tracking the traffic and engagement coming from AI referrals.

For AI-native agencies and forward-thinking SEO teams, this is a golden opportunity. Just as the rise of Google created an industry for search engine optimization, the rise of ChatGPT, Bing Chat, and others creates a need for AI search optimization expertise. Agencies can develop services to help companies build out their AI visibility stack. This might include:

Auditing a company’s content and structure for LLM compatibility (and setting up files like llms.txt, schema markup, etc.).
Crafting strategies to increase a brand’s presence in AI training data and retrieval indices (through content partnerships, PR for authoritative citations, or ensuring inclusion in key data sources like Wikidata).
Setting up analytics solutions to track AI-driven traffic and mentions, and integrating those insights into overall marketing KPIs.
Continuously monitoring AI platforms for how their clients’ content appears, and adjusting tactics accordingly (much like monitoring search rankings and tweaking SEO).

The companies that move now will establish themselves in the “answer engines” of tomorrow. We’re looking at a new kind of organic reach – one where your content might reach users via an AI intermediary. To capitalize on it, marketing and SEO professionals must collaborate with developers and AI experts, breaking silos between SEO, data, and engineering teams. It’s time to treat AI not as a threat, but as the next frontier for growth.

Call to action: If you’re a business unsure how to navigate this shift, consider partnering with an agency or consultants who are fluent in both SEO and AI. AI-native agencies can help audit your current visibility, implement the llms.txt standard and other AI-friendly practices, and set up a monitoring system for AI referrals. They can train your team on prompt-based audits, and ensure your brand voice and content are accurately represented in AI outputs. In short, they’ll help you structure and track your AI visibility stack – turning what could be a disruptive change into a strategic advantage.

The age of AI-driven discovery is here. Just as we optimized for search engines, we must now optimize for answer engines. By doing so, you’ll not only protect your hard-earned content investments but also unlock new pathways for users to discover and trust your brand. Embrace the change, equip yourself with the right tools (and partners), and you’ll thrive in this new landscape where bots are the new browsers and every answer box is a chance to win hearts and minds.

Your next customer might not come through a search results page, but through an AI recommendation – let’s make sure you’re ready for them.

SEO in the Age of AI: Implementing llms.txt and Tracking LLM-Driven Traffic

As large language models (LLMs) like ChatGPT and Bing’s AI Chat become gateways for information, SEO is expanding beyond traditional search engines. Companies now face a new challenge: ensuring their content is visible and attributable in AI-generated answers. This post explores how to implement the emerging llms.txt standard (a sort of “robots.txt for AI”), why declaring AI permissions matters, and how to track traffic and engagement from LLMs. We’ll also discuss how SEO is evolving from blue links to AI recommendations, the limitations of current attribution, and the new opportunities this shift creates for content and branding.

What is llms.txt (The “Robots.txt” for AI)?

Figure: The llms.txt logo represents a new standard to help AI agents understand website content at a glance. Similar to how robots.txt guides search engine crawlers, llms.txt is a proposed text file that guides AI systems (LLMs) in using a site’s content. However, it serves a different purpose than robots.txt. Instead of listing which URLs to avoid, llms.txt is designed to expose your important content in a structured, LLM-friendly way. In other words, it’s about making content easier for AI to consume, rather than restricting it.

Robots.txt vs llms.txt: Robots.txt is a decades-old standard that tells search engine bots which pages not to crawl or index. By contrast, llms.txt is meant to tell AI models how to use your content, not just what to skip. It’s more akin to a mini sitemap or content guide specifically formatted for AI understanding. For example, robots.txt might prevent a bot from crawling your /admin folder, whereas llms.txt might highlight your key docs, FAQs, or product info that an AI should focus on. Unlike a sitemap.xml (which is just a list of URLs), llms.txt provides context and structure for each link. It’s written in plain Markdown, a format both humans and AI can easily read, stripped of the clutter of HTML or scripts.

How Does llms.txt Work?

The llms.txt file lives in your website’s root (e.g. yourwebsite.com/llms.txt). When an AI agent or crawler accesses this file, it finds a curated outline of your site’s most important information. The standard format includes:

H1 Title: The name of your site or project (this is the only required element).
Summary Block: A short description of the site enclosed in a blockquote, highlighting key information or purpose.
Main Sections (H2 headings): Each H2 introduces a category (for example “Guides”, “Products”, “Support”, etc.), under which you list important pages.
Bullet Lists of Links: Under each H2, a list of links to critical pages or documents, formatted as [Page Title](URL): optional brief notes. This gives the AI both the link and a hint of what’s there.
Optional Section: Optionally, a section (often titled “Optional”) can list lower-priority pages that can be skipped if the AI has limited context window.

In essence, llms.txt provides a condensed knowledge base for your site. Instead of crawling dozens of pages and parsing complex layouts, an AI can retrieve this single file to quickly learn the structure of your content. This is crucial because LLMs have limited context windows and can’t ingest your whole site at once. By surfacing the most important content in a clean format, you reduce the chance of the AI missing or misinterpreting your site’s key information.

Why Should Companies Implement llms.txt?

As AI-generated answers become more common, making your content accessible to LLMs is no longer optional. If an AI assistant is answering questions about your industry or products, you want it to draw from your content – and to do so accurately. Parsing raw HTML is often slow and error-prone for LLMs, especially when pages are filled with navigation menus, ads, or interactive elements. By providing a ready-made “cheat sheet” in llms.txt, you:

Surface important content quickly: The AI sees your top guides, product pages, or FAQs immediately, rather than wading through irrelevant pages.
Reduce errors and omissions: Structured links and summaries help the model retrieve exactly what it needs, reducing the risk of it hallucinating or using outdated info.
Improve brand presence in answers: When someone asks an AI, “How do I do X with [Your Product]?”, a well-curated llms.txt increases the likelihood that the AI’s answer will come from your official content.

Early adopters are already seeing the benefits. The llms.txt proposal was introduced in late 2024 by Jeremy Howard of Fast.ai, and gained traction when documentation platforms like Mintlify enabled automatic llms.txt generation in November 2024. Overnight, thousands of developer doc sites (including ones from Anthropic and Cursor) had llms.txt files in place. This rapid uptake in the tech community highlights a broader trend: companies that depend on accurate, up-to-date information (dev tools, knowledge bases, etc.) are leading the way, ensuring AI assistants get the best version of their content.

Implementing llms.txt on Your Website

Implementing llms.txt is straightforward. You can manually create the file, or use tools if your CMS or docs platform supports it. Here’s how to get started:

Create a plain Markdown file named llms.txt in your website’s root directory. (If your site is static or you have access to hosting, it’s just like creating a robots.txt or sitemap.xml, but in Markdown format.)
Add a top-level heading (H1) with your project or site name. For example: # MySite Documentation.
Write a one-paragraph summary of your site or product as a blockquote. In Markdown, a blockquote is indicated with a > at the start. This should be a concise description that gives context. Example: > Official docs and API reference for MySite, including integration guides and troubleshooting tips.

List your key content sections using H2 subheadings. Under each, list important pages with bullet points. Each bullet should be a Markdown link followed by a colon and a short note. For instance:

## Guides  
- [Getting Started](https://mysite.com/docs/intro): Step-by-step setup guide  
- [Integration](https://mysite.com/docs/integrations): How to connect our API with your app  

## Reference  
- [API Reference](https://mysite.com/docs/api): All API endpoints and parameters  
- [CLI Commands](https://mysite.com/docs/cli)

This structure tells an AI, “These are the main areas of my content, and here’s what you’ll find in each page.”

(Optional) Include an “Optional” or less important section: If there are lengthy pages or ancillary resources that might not fit in an LLM’s context window, you can mark them under a clearly labeled section (like ## Optional) at the end. The AI can skip these if it’s tight on space, focusing first on the core content.
Provide Markdown versions of content pages (if possible): The llms.txt proposal also suggests offering your content pages in Markdown format (e.g. accessible by adding .md to the URL). This is an advanced step, but it means an AI could retrieve a clean text version of any page directly. Some sites even provide an aggregated llms-full.txt containing all their content in one file. This isn’t mandatory, but it further streamlines AI access to your information.

Once your llms.txt is live, test it by visiting yourwebsite.com/llms.txt in a browser – you should see the raw Markdown. From here, any AI agents that know about this standard can fetch it. The community is actively discussing integrations (there’s a public GitHub and Discord where llms.txt is being evolved), and already directories like llmstxt.site and directory.llmstxt.cloud list hundreds of sites adopting it. By implementing it early, you signal that your site is “AI-ready,” which could give you an edge as more AI tools start looking for this file.

Declaring AI Permissions and Visibility: Why It Matters

Beyond just helping AI find your content, companies need to consider permissions – essentially, what usage of your content by AI is allowed – and how visible your content is within AI platforms. In the past year, we’ve seen web publishers awaken to the reality that AI models are crawling and training on their content. Some responded by blocking AI crawlers (for example, OpenAI’s GPTBot can be disallowed via robots.txt directives), or by using meta tags like “noai” to opt out of AI training. But blocking outright may mean forfeiting a new traffic stream. This is where a balanced approach like llms.txt comes in: it’s a way of explicitly allowing and guiding AI usage on your terms.

Think of llms.txt as a permission and guidance document for AI. By creating one, you’re effectively saying: “Here is the content we’re okay with AI using, and here’s the best way to use it.” It’s an opportunity to declare what information can be used by AI search engines or assistants. Instead of letting a crawler loose on your site (which might result in high server load or it grabbing irrelevant data), you pre-package the important stuff. This not only helps the AI give better answers but also protects you from misuse – you’re not exposing every page, only what you choose to highlight.

From a strategic standpoint, declaring permissions and being transparent about AI access can enhance your brand’s credibility. Users and regulators are increasingly concerned about how AI gets its data. A company that openly provides an llms.txt is effectively opting in to the AI ecosystem in a controlled way. It shows you’re willing to share knowledge (for instance, a software company sharing its documentation), but you’re doing so with clear boundaries. In contrast, a company that tries to hide from AI might miss out on being featured in AI-driven recommendations or answers.

Moreover, visibility in AI models matters. If your competitors’ content is being indexed or summarized by AI and yours isn’t, you could be invisible in a whole class of user queries. Imagine a user asks an AI assistant for “the best CRM software for small business” – if that AI has ingested and understood a competitor’s product info (perhaps via their llms.txt or other means) but not yours, the answer may exclude you. Ensuring you’re present in the model’s knowledge means declaring yourself open for AI business. It’s similar to the early days of SEO: companies that allowed search engines to crawl their sites (and optimized for it) gained an advantage over those who were invisible to Google. Today, being visible to AI models and agents could determine whether your brand is mentioned at all in AI-driven conversations.

In short, explicitly managing your AI visibility is now as important as managing your search engine visibility. Through measures like llms.txt (for guidance) and proper crawler directives or meta tags (for permission), you can strike the right balance between protecting your content and amplifying it via AI.

How LLMs Use Your Site’s Content (ChatGPT, Claude, Bing, Perplexity, etc.)

Not all AI systems gather and use content in the same way. It’s important for SEO professionals to understand how different LLM-powered services access web content – and thereby how your site can appear (or not appear) in their outputs.

LLMs via Training Data (ChatGPT, Claude): Models like OpenAI’s ChatGPT (particularly the older versions) and Anthropic’s Claude are trained on vast datasets that include a snapshot of the web. If your site was part of that training data (e.g. scraped in 2021 for GPT-4’s cut-off, or in 2023 for newer models), the model might have knowledge of your content. However, responses from these models usually don’t cite sources, and the content is baked into the model’s memory. This means your site’s info could be used without attribution, and possibly out-of-date if you’ve changed since the training snapshot. You have limited control here beyond opting out of training crawlers or providing updates via something like plugins. Notably, OpenAI has offered a way to opt-out of training (via robots.txt disallow for GPTBot), but if you opt out, you also opt out of being part of the model’s answers. It’s a trade-off between presence vs. privacy.
Retrieval-based AI (Bing Chat, Google’s Bard/SGE, Perplexity, etc.): These systems fetch information from the web at query time, similar to a search engine, and then have the LLM formulate an answer. For example, Bing Chat uses Bing’s search index to retrieve content and always cites its sources in the answer. If a user asks something, Bing Chat will show snippets from web pages (including yours) and list the source domains or titles as footnotes. Perplexity.ai works in a similar fashion: it performs a web search for you and generates an answer with footnoted citations linking to the sources. In fact, Perplexity is considered a gold standard for citation transparency – every answer clearly links to where it got the info. For your site, this means if you rank well for a given query (SEO still matters!), an AI like Bing or Perplexity might pick up your page and directly quote or summarize it, then drive the user to your site via a citation link.
Hybrid approaches (ChatGPT with browsing, plugins, others): Some LLMs combine both. ChatGPT, for instance, introduced a browsing mode that uses Bing’s API to fetch current information when a user explicitly asks for it. If a user with ChatGPT enabled browsing or a specific plugin asks a question, ChatGPT might pull your site’s content in real time (respecting robots.txt rules). This is more like a mini-search engine within ChatGPT. Other tools, like certain browser extensions or assistant apps, will do behind-the-scenes retrieval from sites (they might use APIs like Google’s or Bing’s to find relevant URLs and then scrape content). These will parse your page, often stripping HTML tags, and feed chunks of text to an LLM to generate an answer. Here, having clean, structured content (with semantic HTML or an available llms.txt) can influence whether the AI accurately understands your page.

Why does visibility in these models matter? Because increasingly, users are bypassing traditional search results and going straight to these AI agents. Millions of users now rely on ChatGPT-style Q&A or multi-modal search engines to get answers. If your site isn’t part of that ecosystem, you lose mindshare. It’s analogous to not being listed on Google a decade ago. We’re already seeing companies get traffic from these AI sources: early data shows anywhere from 0.5% to 3% of website traffic now coming from LLM-based sources, even in 2024. That number might seem small, but it has been climbing steadily – and it’s expected to grow exponentially as AI tools go mainstream on every device. In fact, projections suggest LLM-driven search could jump from a fraction of a percent of queries in 2024 to as much as 10% by the end of 2025. This is a tectonic shift in how people discover content.

Being visible in LLM answers means a few things for your business: your information is reaching users even when they don’t visit your site directly, your brand can be mentioned as an authority (if the AI cites or describes the source), and you have an opportunity to capture clicks from those answers (when users want to “learn more”). But it also means you need to take care that the AI is getting correct information. If your site is out-of-date or the AI picks content from someone else describing your brand, the answer might be wrong. By using tools like llms.txt and keeping your content optimized for AI retrieval, you improve the odds that your voice is the one the model uses.

From Search Engine Results to AI Recommendations: The Evolution of SEO

The rise of LLMs is transforming classic SEO into something new. We’re moving from a world where success is being Rank #1 in a list of links, to a world where success is being the source an AI cites or the recommendation an AI assistant gives. In other words, we are shifting from search engines to “answer engines”. As one AI SEO expert put it, the future of discovery isn’t a blue link — it’s a bot. Instead of a user scanning a page of results, an AI agent will synthesize the information and present a single (or a few) answers. So what does that mean for those of us in SEO and content strategy?

Generative Answers Reduce Clicks (The Zero-Click Phenomenon): AI answers often provide so much information that the user may not feel the need to click through to a website at all. This is an extension of the “zero-click search” trend we saw with featured snippets on Google. Now an AI might answer, say, “What’s the best running shoe under $100?” by directly naming a product and summarizing reviews, all without the user visiting a single website. For businesses, this means your content could be driving the answer but not the click. It raises the question: how do we measure success when impressions (being mentioned) become as important as clicks? We must broaden our KPIs to include metrics like brand mentions or AI citations, not just site visits.
“Answer Engine Optimization” (AEO) is emerging: Ensuring that an AI cites you or uses your data is the new game. Some are dubbing this Generative Engine Optimization, because it’s about optimizing for generative AI results. This might involve new tactics: structuring your content semantically, using schema markup to reinforce your authority (e.g., marking your content with proper schema.org types, author info, etc.), and focusing on entity presence (making sure your brand or key topics are well-defined across the web). For instance, if you’re a notable entity on Wikipedia or have lots of authoritative backlinks, LLMs are more likely to “know” about you and include you in answers. LLMs don’t use traditional keyword-based ranking, but they do rely on the semantic understanding of content and the signals of authority available (such as being mentioned in trusted sources).
The role of citations and trust: Unlike regular search where the user decides which link to click, in AI answers the AI is deciding which sources to trust and present. LLM-based systems use retrievers and scoring algorithms to pick what content to draw from. They tend to favor sources that the model deems authoritative or that score high for relevance and credibility. This means that brand authority and content quality become even more critical. If your site has a history of expertise (e.g., being cited elsewhere, or containing unique insights), it’s more likely to be chosen by the AI’s retrieval algorithm. We’re basically seeing a convergence of SEO with PR and thought leadership: getting cited in high-authority places (like academic papers, respected news sites, or well-known industry resources) can influence AI outputs. In other words, citations beget citations – much like in human SEO where backlinks beget higher ranking.
Continuous content updates and accuracy: Because AI models might rely on a mix of training data and live data, it’s crucial to keep your information up-to-date. For example, if an AI’s training data knows about your company up to 2022, but you’ve since launched new products or changed pricing, an AI might give outdated info unless it fetches updates from your site. This is driving an evolution in content strategy: content needs to be written not just for human readers and search bots, but also for AI comprehension. Clarity, factual correctness, and having a concise summary (like in llms.txt or on-page) help ensure the AI doesn’t get things wrong.