Last updated: March 30, 2026 | By Janis Grinhofs, Founder & CGO, Bavaria AI (formerly yoummday)
Technical GEO (Generative Engine Optimization) is the practice of configuring a website’s infrastructure so that AI systems — ChatGPT, Perplexity, Google AI Overviews, Claude, and Gemini — can crawl, parse, and confidently cite its content. The three pillars are crawler access and guidance (robots.txt and llms.txt), machine-readable meaning (Schema Markup), and fast discovery (IndexNow). A 730-site citation study by Growth Marshal found that pages with attribute-rich schema markup earned a 61.7% citation rate — compared to just 41.6% for pages using generic schema and 59.8% for pages with no schema at all. Implementing all three pillars together is not optional for any site that wants to compete for AI-generated traffic in 2026.
This guide walks through every technical step developers need to take, with copy-paste code examples for each one.
Why Does Technical GEO Matter for AI Visibility?
Technical GEO matters because AI systems do not rank content the way classic search engines do. Instead of PageRank-style link analysis, they rely on a crawl-parse-cite pipeline: a crawler collects your HTML, a parser extracts structured signals, and a language model decides whether your page is a credible, citable source for a given query. If any step in that pipeline breaks — because a bot is blocked, a schema type is absent, or the HTML is semantically opaque — the page is simply not in the running.
The business case is sharp. Research cited by Averi shows that brands cited inside Google AI Overviews earn 35% more clicks than those appearing only in the traditional organic results below. And AI Overviews now appear on 50–60% of U.S. searches — up from just 6.49% in January 2025. Structured data alone increases AI visibility by up to 30% according to the same source. For developers, the implication is clear: technical implementation is no longer a backend hygiene task. It is a direct revenue lever.
If you want a baseline measurement before diving into implementation, the Bavaria AI GEO Score Check gives you a 0–100 rating across six AI platforms in minutes. Our own website moved from a GEO Score of 22 to 88 after implementing the measures described in this article — a real-world proof point covered in detail in our GEO Score case study.
What Is llms.txt and How Do You Implement It?
llms.txt is a proposed open standard — published at llmstxt.org by Jeremy Howard at Answer.AI in September 2024 — that places a Markdown-formatted file at the root of a website to give large language models a structured overview of the site’s content at inference time. Unlike robots.txt, which is about access permissions, llms.txt is about understanding: it tells an LLM what the site is, what content categories exist, which pages are most important, and how to use the site’s resources responsibly.
The file lives at yourdomain.com/llms.txt. Only one section is technically required — an H1 with the project or site name — but a complete implementation includes a blockquote description, additional detail sections, and file lists linking to the most important pages (ideally in Markdown format). Platforms including Mintlify and GitBook auto-generate llms.txt; WordPress users can add it manually or via a dedicated plugin.
llms.txt File Structure
The specification defines the following order of sections:
- H1 heading — name of the project or site (required)
- Blockquote — a short summary containing the key information an LLM needs to understand the site (recommended)
- Additional detail sections — paragraphs or lists with no subheadings, providing context such as audience, purpose, and unique value proposition
- File lists (H2 sections) — labelled collections of URLs with optional notes, using the format
[Page Title](url): brief description
Complete llms.txt Example
# Bavaria AI
> Bavaria AI (Bavarian Crypto Labs GmbH) is a Munich-based GEO agency
> helping DACH e-commerce companies become visible in AI-generated search
> results on ChatGPT, Perplexity, Google AI Overviews, Claude, and Gemini.
Bavaria AI offers Generative Engine Optimization (GEO), AI consulting,
process optimization, and sales automation. The team consists of
Lion Harisch, Thomas Wallner, and Janis Grinhofs (alumni of yoummday,
a €250M tech scale-up). All content is published in German and English.
## Core Services
- [GEO Agency](https://bavaria-ai.com/geo-agentur/): Overview of our GEO services
- [AI Consulting Munich](https://bavaria-ai.com/ki-beratung-muenchen/): AI advisory for DACH businesses
- [GEO Score Check](https://bavaria-ai.com/geo-score-check/): Free AI visibility assessment
## Key Guides
- [What Is GEO?](https://bavaria-ai.com/what-is-geo-agency/): Definition and introduction to Generative Engine Optimization
- [ChatGPT SEO Guide](https://bavaria-ai.com/chatgpt-seo/): How to optimize for ChatGPT citations
- [Perplexity SEO Guide](https://bavaria-ai.com/perplexity-seo/): How to appear in Perplexity answers
- [AI Visibility Guide 2026](https://bavaria-ai.com/ai-seo-guide-2026/): Comprehensive GEO strategy
## Optional
- [llms-full.txt](https://bavaria-ai.com/llms-full.txt): Extended version with full page content
llms-full.txt: The Extended Version
The specification also defines a companion file, llms-full.txt, which contains the full text of all pages in a single Markdown document. This is particularly valuable for LLMs operating in retrieval mode. Mintlify co-developed this format with Anthropic, and it is now part of the official proposal. For smaller sites, generating llms-full.txt manually is feasible; for larger sites, an automated build step that concatenates page Markdown exports is more practical. The file should also respect authentication — pages behind login should not appear in the public llms-full.txt.
Finally, the spec recommends making individual pages available in clean Markdown at yourdomain.com/page-slug.md. This gives AI systems a zero-noise version of each page without sidebar, navigation, or footer content cluttering the extraction.
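Because the format is plain Markdown with a fixed section order, llms.txt is easy to generate at build time instead of maintaining by hand. The sketch below is a minimal generator following the structure described above; the site name, summary, and page data are illustrative placeholders you would pull from your own CMS or sitemap.

```python
# Build a minimal llms.txt per the llmstxt.org structure:
# H1, blockquote summary, then H2 file lists of "[title](url): description".

def build_llms_txt(site_name, summary, sections):
    lines = [f"# {site_name}", ""]
    # Blockquote summary: every line prefixed with "> "
    lines += [f"> {line}" for line in summary.splitlines()]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}"]
        for title, url, desc in pages:
            lines.append(f"- [{title}]({url}): {desc}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Illustrative data only -- replace with your real pages.
    doc = build_llms_txt(
        "Example Co",
        "Example Co is a sample site used to illustrate the format.",
        {"Key Guides": [
            ("What Is GEO?", "https://example.com/what-is-geo/",
             "Introduction to Generative Engine Optimization"),
        ]},
    )
    print(doc)
```

Wiring this into a static-site build step keeps the file in sync with the sitemap, so new cornerstone pages appear in llms.txt automatically.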
How Should You Configure robots.txt for AI Crawlers?
robots.txt for AI visibility is not simply about allowing or blocking bots — it is about distinguishing between AI training crawlers (which collect content to retrain models) and AI search crawlers (which collect content to generate live answers for users). For most publishers, the goal is to allow AI search crawlers unconditionally and to make a deliberate decision about training crawlers. The critical mistake is relying on the wildcard: a crawler whose user-agent is not explicitly listed falls back to the User-agent: * group, so a restrictive fallback silently blocks every AI crawler you have not named. Research by Fuel Online identifies at least 14 AI user-agents that require explicit Allow rules.
Two agents require special attention for OpenAI coverage. As noted by Fuel Online, both GPTBot (training) and OAI-SearchBot (live search citations in ChatGPT) need separate rules — blocking one does not affect the other. Similarly, Anthropic operates three agents: anthropic-ai for bulk training, ClaudeBot for chat citation fetches, and Claude-User for real-time browsing. Missing any one of these leaves a gap in your AI visibility.
Complete AI-Ready robots.txt Template
# ============================================================
# robots.txt — AI Crawler Configuration
# Last updated: 2026-03-30
# ============================================================
# --- OPENAI ---
# OAI-SearchBot: indexes pages for ChatGPT Search (citations, NOT training)
User-agent: OAI-SearchBot
Allow: /
# GPTBot: training crawler for GPT models
# Allow if you want your content in future model training
User-agent: GPTBot
Allow: /
Disallow: /wp-admin/
Disallow: /private/
# ChatGPT-User: user-triggered browsing from ChatGPT and Custom GPTs
User-agent: ChatGPT-User
User-agent: ChatGPT-User/2.0
Allow: /
# --- ANTHROPIC (Claude) ---
User-agent: anthropic-ai
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: claude-web
Allow: /
# --- PERPLEXITY ---
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
# --- GOOGLE (Gemini / AI Overviews) ---
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: /
# --- MICROSOFT (Bing / Copilot) ---
User-agent: Bingbot
Allow: /
User-agent: msnbot
Allow: /
# --- AMAZON (Alexa) ---
User-agent: Amazonbot
Allow: /
# --- APPLE ---
User-agent: Applebot
Allow: /
User-agent: Applebot-Extended
Allow: /
# --- META ---
User-agent: FacebookBot
Allow: /
User-agent: meta-externalagent
Allow: /
# --- BYTEDANCE ---
User-agent: Bytespider
Allow: /
# --- DUCKDUCKGO ---
User-agent: DuckAssistBot
Allow: /
# --- COHERE ---
User-agent: cohere-ai
Allow: /
# --- RESEARCH / COMMON CRAWL ---
User-agent: AI2Bot
Allow: /
User-agent: CCBot
Allow: /
User-agent: Diffbot
Allow: /
# --- EMERGING AI SEARCH ---
User-agent: YouBot
Allow: /
User-agent: TimpiBot
Allow: /
# --- FALLBACK ---
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php
# --- SITEMAP ---
Sitemap: https://yourdomain.com/sitemap.xml
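Before deploying a file like this, it is worth verifying it programmatically. Python’s standard-library urllib.robotparser can answer “would this user-agent be allowed to fetch this URL?” for each agent you care about. The snippet below uses a shortened stand-in for the full template; one caveat is that Python’s parser applies rules in file order (first match wins) rather than by longest match, so for testing purposes the more specific Disallow line is placed before the blanket Allow.

```python
from urllib import robotparser

# Shortened stand-in for the full template above.
# Note: robotparser evaluates rules in file order, so the specific
# Disallow precedes the blanket Allow here.
ROBOTS = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /private/
Allow: /

User-agent: *
Disallow: /wp-admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Agents not listed (e.g. ClaudeBot) fall back to the "*" group.
for agent in ("OAI-SearchBot", "GPTBot", "ClaudeBot"):
    ok = rp.can_fetch(agent, "https://example.com/blog/post/")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Running this against your production robots.txt (fetched with `rp.set_url(...)` and `rp.read()`) in CI catches accidental blocks before an AI crawler ever sees them.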
Training vs. Search Crawlers: Decision Guide
| User-Agent | Platform | Purpose | Recommended Rule |
|---|---|---|---|
| OAI-SearchBot | OpenAI / ChatGPT | Live search citations | Allow: / |
| GPTBot | OpenAI | Model training | Allow: / (or Disallow if you object to training use) |
| ClaudeBot | Anthropic / Claude | Chat citation fetch | Allow: / |
| anthropic-ai | Anthropic | Bulk model training | Allow: / (or Disallow if you object) |
| PerplexityBot | Perplexity | Index builder | Allow: / |
| Google-Extended | Google | Gemini / AI Overviews training | Allow: / (blocks Google AI features if Disallowed) |
| Bingbot | Microsoft | Bing index (feeds Copilot) | Allow: / |
| Amazonbot | Amazon | Alexa / Rufus AI | Allow: / |
| Applebot-Extended | Apple | Apple Intelligence training | Allow: / |
For deeper background on how each AI platform processes crawled content differently, see our guide to GEO vs. AIO vs. LLMO — what the differences mean for your strategy.
Which Schema Markup Types Drive AI Citations?
Schema Markup is JSON-LD code added to a page’s <head> (or inline in the body) that translates human-readable content into a structured format AI systems can parse without ambiguity. The decisive evidence comes from a 730-site citation study by Growth Marshal: pages with attribute-rich schema achieved a 61.7% citation rate, while pages with generic (incomplete) schema lagged at 41.6% — a 20-percentage-point gap that represents the difference between being cited and being invisible. Zero schema produced a 59.8% rate, which suggests that absent schema is less harmful than poorly implemented schema that sends wrong signals.
For a full deep dive into schema strategy specifically for AI search engines, see our dedicated guide: Schema Markup for AI Search Engines.
1. Article Schema — The Foundation
Every blog post and guide should carry Article or BlogPosting schema. The critical attributes that push citation rates up are author (with Person sub-type including name, url, and jobTitle), datePublished, dateModified, publisher, and mainEntityOfPage. Omitting any of these reduces the schema to "generic" territory — the 41.6% citation bucket.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "llms.txt, robots.txt & Schema Markup: The Technical GEO Checklist",
"description": "A complete technical implementation guide for developers who want their content cited by ChatGPT, Perplexity, and Google AI Overviews.",
"author": {
"@type": "Person",
"name": "Janis Grinhofs",
"jobTitle": "Founder & CGO",
"url": "https://bavaria-ai.com/team/janis-grinhofs/"
},
"publisher": {
"@type": "Organization",
"name": "Bavaria AI",
"url": "https://bavaria-ai.com",
"logo": {
"@type": "ImageObject",
"url": "https://bavaria-ai.com/logo.png"
}
},
"datePublished": "2026-03-30",
"dateModified": "2026-03-30",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://bavaria-ai.com/technical-geo-checklist/"
},
"image": "https://bavaria-ai.com/images/technical-geo-checklist.jpg",
"inLanguage": "en"
}
</script>
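Given the citation penalty for incomplete schema, a small completeness check in CI can catch “generic” Article markup before it ships. The sketch below is a hypothetical helper, not an official validator; the field list is taken directly from the critical attributes named above.

```python
import json

# Attributes this guide names as the difference between
# "attribute-rich" and "generic" Article schema.
REQUIRED = ["headline", "author", "datePublished", "dateModified",
            "publisher", "mainEntityOfPage"]

def missing_article_fields(jsonld: str) -> list:
    """Return the required Article attributes absent from a JSON-LD string."""
    data = json.loads(jsonld)
    return [field for field in REQUIRED if field not in data]

sample = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example post",
    "datePublished": "2026-03-30",
})
print(missing_article_fields(sample))
# → ['author', 'dateModified', 'publisher', 'mainEntityOfPage']
```

Failing the build when the returned list is non-empty keeps every published post out of the 41.6% “generic schema” bucket.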
2. FAQPage Schema — The Highest-Impact Schema Type
FAQPage schema is the most powerful schema type for AI citations, according to WPRiders research. It matches exactly how AI systems deliver answers: a question-answer pair that can be extracted, summarized, and served directly. Stackmatix recommends keeping individual answers between 40 and 60 words for optimal extraction. Every answer should be self-contained — no pronouns referencing earlier paragraphs.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is llms.txt and why do I need it?",
"acceptedAnswer": {
"@type": "Answer",
"text": "llms.txt is a Markdown file at the root of your website that gives large language models a structured overview of your content at inference time. It helps AI systems understand your site's purpose, key pages, and content categories — improving the likelihood that your content is cited in AI-generated answers."
}
},
{
"@type": "Question",
"name": "Which robots.txt user-agents do I need for AI crawlers?",
"acceptedAnswer": {
"@type": "Answer",
"text": "You need explicit rules for at least 14 AI user-agents, including GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, Bingbot, Amazonbot, Applebot, Applebot-Extended, meta-externalagent, DuckAssistBot, and cohere-ai."
}
}
]
}
</script>
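The 40–60-word guideline is easy to enforce automatically. This sketch (a hypothetical helper, assuming your FAQPage JSON-LD follows the shape above) walks the mainEntity array and reports each answer’s word count against the recommended window:

```python
import json

def faq_answer_lengths(jsonld: str, lo: int = 40, hi: int = 60) -> dict:
    """Map each FAQ question to (word_count, within_recommended_range)."""
    data = json.loads(jsonld)
    report = {}
    for item in data.get("mainEntity", []):
        words = len(item["acceptedAnswer"]["text"].split())
        report[item["name"]] = (words, lo <= words <= hi)
    return report

sample = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is llms.txt?",
        "acceptedAnswer": {"@type": "Answer",
                           "text": "A Markdown file for LLMs."},
    }],
})
print(faq_answer_lengths(sample))  # flags the 5-word answer as too short
```

Answers flagged as out of range are candidates for rewriting before the page ships, since answers that are too short to be informative are exactly the pattern the citation study associates with “generic” schema.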
3. Organization Schema — Brand Entity Establishment
Organization schema on your homepage or About page establishes your brand as a named entity in AI knowledge graphs. AI systems use entity recognition to match queries to sources; without explicit Organization schema, your brand may not be reliably identified. Include name, url, logo, sameAs (links to LinkedIn, Crunchbase, Wikidata), address, and contactPoint.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Bavaria AI",
"alternateName": "Bavarian Crypto Labs GmbH",
"url": "https://bavaria-ai.com",
"logo": "https://bavaria-ai.com/logo.png",
"description": "Munich-based Generative Engine Optimization agency helping DACH e-commerce companies achieve AI visibility.",
"address": {
"@type": "PostalAddress",
"addressLocality": "Munich",
"addressCountry": "DE"
},
"sameAs": [
"https://www.linkedin.com/company/bavaria-ai",
"https://twitter.com/bavariaai"
]
}
</script>
4. HowTo Schema — For Step-by-Step Guides
HowTo schema structures procedural content in a way that AI systems can extract and present as step-by-step instructions. Stackmatix’s 2026 structured data guide recommends numbering steps explicitly and keeping each step to one or two sentences. Google AI Overviews and Bing Copilot both surface HowTo content prominently for instructional queries.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "How to implement llms.txt on your website",
"description": "Step-by-step guide to creating and deploying an llms.txt file for AI visibility.",
"step": [
{
"@type": "HowToStep",
"position": 1,
"name": "Create the file",
"text": "Create a plain text file named llms.txt in Markdown format. Add an H1 heading with your site name as the only required element."
},
{
"@type": "HowToStep",
"position": 2,
"name": "Add a blockquote description",
"text": "Below the H1, add a Markdown blockquote (starting with >) with a 2-3 sentence summary of your site's purpose and key information."
},
{
"@type": "HowToStep",
"position": 3,
"name": "Add file lists",
"text": "Create H2 sections grouping your most important pages by category, with each link in the format [Page Name](URL): brief description."
},
{
"@type": "HowToStep",
"position": 4,
"name": "Deploy at root",
"text": "Upload the file to your web server root so it is accessible at yourdomain.com/llms.txt. Verify access in a browser before submitting."
}
]
}
</script>
5. Speakable Schema — Marking AI Extraction Targets
Speakable schema marks specific sections of a page as the best candidates for extraction by AI and voice engines. According to Fuel Online, Speakable is designed precisely to tell AI systems: "this passage is a clean, standalone answer — extract it." With 62%+ of searches now involving voice or conversational AI interfaces, Speakable is increasingly relevant beyond classic voice assistants. Google's Speakable documentation notes it is currently in beta but supported for Google Assistant-enabled content.
Speakable works by referencing CSS selectors or XPath expressions that point to the citable passages:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"name": "Technical GEO Checklist for Developers",
"speakable": {
"@type": "SpeakableSpecification",
"cssSelector": [
".answer-capsule",
".speakable-intro",
"h2 + p"
]
},
"url": "https://bavaria-ai.com/technical-geo-checklist/"
}
</script>
In practice, apply a class like answer-capsule to every opening paragraph directly below an H2 heading — those are the passages most likely to match user queries and should be explicitly surfaced for AI extraction.
Schema Type Priority Matrix
| Schema Type | AI Citation Impact | Priority | Where to Place |
|---|---|---|---|
| FAQPage | Highest | 1 — Essential | Every page with Q&A content |
| Article / BlogPosting | High | 1 — Essential | All blog posts, guides, news |
| HowTo | High | 1 — Essential for tutorials | Step-by-step content |
| Organization | Medium-High | 2 — Implement early | Homepage, About page |
| Speakable | Medium | 2 — Important for voice/AI | Articles, landing pages |
| BreadcrumbList | Medium | 3 — Contextual signals | All pages with breadcrumbs |
| Product / Offer | High (e-commerce) | 1 (for product pages) | Product pages |
| LocalBusiness | High (local) | 1 (for local services) | Homepage, Contact page |
How Does Semantic HTML Affect AI Citation Rates?
Semantic HTML is the use of HTML elements for their intended meaning rather than visual effect: <article> wraps the main content, <section> divides thematic blocks, <h1>–<h6> establish a clear heading hierarchy, <p> marks paragraphs, and <nav> labels navigation. For AI systems, semantic HTML reduces ambiguity during parsing: a page with dozens of nested <div> elements requires significantly more inference to identify the main content than a page that wraps it in <article> and structures it with labelled headings.
As Barry Adams writes in SEO for Google News: „It’s much simpler for ChatGPT to parse a few dozen semantic HTML tags rather than several hundred nested <div> tags to find a webpage’s main content.“ This directly affects citation eligibility — if an LLM cannot reliably locate the core content of a page, it will not cite it.
Definition Lists: An Underused AI Optimization
One of the highest-leverage semantic HTML optimizations is the definition list. Research by Yotpo found that HTML definition lists (<dl>, <dt>, <dd>) are 30–40% more likely to be cited than standard paragraphs. This makes sense: the <dt>/<dd> pattern maps directly onto the question-answer structure AI systems use to generate responses.
<dl>
<dt>llms.txt</dt>
<dd>A Markdown file at the root of a website that provides large language models with a structured overview of the site's content, purpose, and key pages at inference time.</dd>
<dt>GEO (Generative Engine Optimization)</dt>
<dd>The practice of optimizing website content and technical infrastructure to earn citations in AI-generated answers from systems such as ChatGPT, Perplexity, Google AI Overviews, Claude, and Gemini.</dd>
<dt>OAI-SearchBot</dt>
<dd>OpenAI's crawler dedicated to indexing web content for live ChatGPT Search citations — distinct from GPTBot, which is used for model training.</dd>
</dl>
Core Semantic HTML Rules for GEO
- One <h1> per page. It should match the headline in your Article schema. Multiple H1s confuse AI parsers about the primary topic.
- Wrap all main content in <article>. This explicitly excludes navigation, sidebars, and footer content from the parseable body.
- Use <section> with aria-labelledby pointing to the section's H2 heading. This creates named content blocks that AI can address individually.
- Never use <div> or <span> as paragraph replacements. Use <p> for all prose.
- Use <time datetime="YYYY-MM-DD"> for all publication and modification dates. AI systems use machine-readable dates to evaluate content freshness.
- Use <blockquote> for sourced quotes with a cite attribute or an explicit citation link. This marks quotations as distinct from the author's own claims.
- Use <code> and <pre> for all code examples. AI systems recognize these as technical content and cite them more readily in developer-audience answers.
- Use <figure> and <figcaption> for all images and diagrams. The figcaption becomes indexable descriptive text.
- Ensure server-side or static rendering of core content. AI crawlers often do not execute JavaScript, so content loaded via client-side JS may be invisible to them.
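Several of these rules are mechanically checkable. The sketch below uses Python’s built-in html.parser to count tags, then asserts the one-H1 rule, the presence of an <article> wrapper, and a rough div-to-semantic-tag ratio — a hypothetical audit heuristic, not a standard metric:

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count opening tags so structural rules can be asserted."""
    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

def audit(html: str) -> dict:
    counter = TagCounter()
    counter.feed(html)
    semantic = sum(counter.tags[t] for t in ("article", "section", "h1", "h2", "p"))
    return {
        "one_h1": counter.tags["h1"] == 1,
        "has_article": counter.tags["article"] >= 1,
        # High values suggest the div-heavy markup that is costly to parse.
        "div_to_semantic": counter.tags["div"] / max(semantic, 1),
    }

page = "<article><h1>T</h1><section><h2>S</h2><p>Body</p></section></article>"
print(audit(page))
# → {'one_h1': True, 'has_article': True, 'div_to_semantic': 0.0}
```

Running this over rendered templates in CI catches regressions — for example, a redesign that silently introduces a second H1 or drops the <article> wrapper.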
For a broader view of how AI systems evaluate content structure, see our AI SEO guide and our detailed article on how to measure and improve AI visibility.
What Is the IndexNow Protocol and Should You Use It?
IndexNow is an open push protocol, created by Microsoft and Yandex, that lets publishers instantly notify participating search engines when content is published or updated. Instead of waiting for a crawler to rediscover a changed URL on its next scheduled visit — which can take days or weeks — IndexNow sends a single API call that reaches all participating engines simultaneously. As of early 2026, Pressonify reports that the protocol handles 5+ billion URL submissions daily across 80+ million websites, with 22% of clicked URLs in Bing results now coming via IndexNow.
The GEO relevance is direct: ChatGPT’s web browsing relies on Bing’s indexing infrastructure. Strategic Nerds notes that faster Bing indexing translates to faster ChatGPT citation potential. Perplexity is extremely recency-sensitive — its citation decay for trending topics can begin within 48–72 hours — making rapid indexing a direct competitive advantage. Note that Google does not currently support IndexNow; its indexing pathway remains Googlebot feeding the Caffeine indexing system, plus the (limited) Google Indexing API.
IndexNow Implementation: Three Steps
1. Generate an API key. The key is a random alphanumeric string (minimum 8 characters, maximum 128). You can generate one at indexnow.org. Save it as a text file at yourdomain.com/[your-key].txt — this lets search engines verify ownership.
2. Submit URLs via API. On each publish or update event, send a POST request to the IndexNow endpoint.
3. Automate via your CMS. WordPress users with Yoast SEO, Rank Math, or Microsoft’s official IndexNow plugin get automatic submissions out of the box. The WordPress IndexNow ecosystem surpassed 10 million active installations in July 2026.
IndexNow API: Minimal Implementation
POST https://api.indexnow.org/indexnow
Content-Type: application/json; charset=utf-8
{
"host": "bavaria-ai.com",
"key": "your-api-key-here",
"keyLocation": "https://bavaria-ai.com/your-api-key-here.txt",
"urlList": [
"https://bavaria-ai.com/technical-geo-checklist/",
"https://bavaria-ai.com/schema-markup-ki-suchmaschinen/"
]
}
A single request can include up to 10,000 URLs. The response is a 200 OK on success; a 202 Accepted means the request was received but the key has not yet been validated. Participating engines that receive the notification include Bing, Yandex, Naver, and Seznam — one submission notifies all of them simultaneously.
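The POST above maps directly to a few lines of standard-library Python. In the sketch below the host, key, and URLs are placeholders; the payload construction is separated from the network call so it can be tested offline and hooked into a publish webhook.

```python
import json
from urllib import request

ENDPOINT = "https://api.indexnow.org/indexnow"

def build_payload(host: str, key: str, urls: list) -> dict:
    """Assemble the IndexNow JSON body for a batch of URLs."""
    if len(urls) > 10_000:
        raise ValueError("IndexNow accepts at most 10,000 URLs per request")
    return {
        "host": host,
        "key": key,
        # Key file at the root proves ownership of the domain.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

def submit(payload: dict) -> int:
    """POST the payload to the shared IndexNow endpoint; returns HTTP status."""
    req = request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # network call -- run from your server
        return resp.status

# Placeholder values -- substitute your real domain, key, and changed URLs.
payload = build_payload("example.com", "your-api-key-here",
                        ["https://example.com/new-post/"])
```

Calling `submit(payload)` from your CMS’s publish/update hook gives every content change the real-time push described above, with the XML sitemap remaining the fallback inventory.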
IndexNow vs. Google Search Console
| Protocol | Engines Covered | Google Support | ChatGPT Benefit | Setup Complexity |
|---|---|---|---|---|
| IndexNow | Bing, Yandex, Naver, Seznam | No | Indirect (via Bing → ChatGPT) | Low |
| Google Search Console (manual) | Google only | Yes | Via Google AI Overviews | Manual only |
| Google Indexing API | Google only | Yes (limited) | Via Google AI Overviews | Medium (OAuth) |
| XML Sitemap | All engines | Yes | Baseline (slow) | Low |
The recommendation is to implement IndexNow and maintain a fresh XML sitemap. They are complementary: IndexNow handles real-time push notifications; the sitemap provides a complete inventory for full re-crawl cycles.
The Developer’s Priority Checklist
The following checklist is organized in three tiers. Tier 1 items have the highest impact on AI citation rates and should be completed before any content optimization work begins. Tier 2 provides material improvements to citation quality and breadth. Tier 3 items are refinements that compound over time.
Tier 1: Access & Discovery (Complete First)
- robots.txt includes explicit Allow rules for all 14+ AI user-agents (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, Bingbot, Amazonbot, Applebot, Applebot-Extended, meta-externalagent, DuckAssistBot, cohere-ai)
- Training crawlers (GPTBot, anthropic-ai, Google-Extended) have a deliberate Allow or Disallow decision — no ambiguous omissions
- /llms.txt file deployed at root with H1, blockquote description, and file lists for top pages
- XML sitemap is current, submitted to Google Search Console and Bing Webmaster Tools
- IndexNow implemented and triggering on every publish/update event
- Core page content is in server-rendered HTML (not client-side JS only)
Tier 2: Schema Markup (Highest ROI Implementation)
- Article / BlogPosting schema on every post with author Person, publisher Organization, datePublished, dateModified
- FAQPage schema on every page with Q&A content (answers 40–60 words, self-contained)
- Organization schema on homepage with name, url, logo, sameAs (LinkedIn, Crunchbase, Wikidata)
- HowTo schema on all step-by-step guides
- Speakable schema referencing CSS selectors for key answer passages
- BreadcrumbList schema on all deep-linked pages
- All schema validated via validator.schema.org and Google Rich Results Test
Tier 3: Semantic HTML & Structural Refinements
- One <h1> per page, matching the Article schema headline
- All main content wrapped in <article>
- Sections use <section aria-labelledby="..."> referencing H2 IDs
- Definition lists (<dl>/<dt>/<dd>) used for all term definitions
- Publication and modification dates use <time datetime="YYYY-MM-DD">
- Answer capsule paragraphs tagged with a class for Speakable CSS selector targeting
- /llms-full.txt generated with full page content
- Individual pages available as clean Markdown at /page-slug.md
- All external citations link to primary sources (not secondary aggregators)
- Author bios include name, title, credentials, and a link to an author page
For a structured approach to measuring your current state against these criteria, the Bavaria AI GEO Score Check audits your site across all six major AI platforms and produces a prioritized action list. Our AI visibility guide for companies covers the content strategy layer that sits on top of this technical foundation.
Key insight: The 730-site citation study found the worst-performing cohort was not sites with no schema — it was sites with generic schema. Implementing FAQPage with incomplete answers, or Article schema without a named author, actively signals low quality to AI ranking systems. Partial implementation can be worse than no implementation. Audit before you deploy.
Frequently Asked Questions
Is llms.txt an official standard or just a proposal?
llms.txt is a proposed standard, not a ratified specification. It was published by Jeremy Howard at Answer.AI in September 2024 at llmstxt.org and has since been adopted by major documentation platforms including Mintlify and GitBook. Major AI companies have not formally committed to honouring the file, but the format is structured, widely supported, and represents the leading community approach to giving LLMs structured site overviews. Implementing it carries zero downside risk and material potential upside.
What happens if I block GPTBot but allow OAI-SearchBot?
Blocking GPTBot prevents OpenAI from using your content for future model training, but does not prevent your pages from appearing as cited sources in ChatGPT Search results. OAI-SearchBot handles live citation indexing independently of GPTBot. The two user-agents serve entirely different pipelines. You can disallow GPTBot on ethical or business grounds while still appearing in ChatGPT answers by allowing OAI-SearchBot — but both rules must be explicitly present in robots.txt.
Does Google support the IndexNow protocol?
No. As of March 2026, Google has not adopted IndexNow despite testing the protocol since October 2021. Google continues to use its Caffeine crawler infrastructure and the Google Indexing API (restricted to job postings and livestream events). For Google AI Overviews, the fastest indexing pathway remains Google Search Console’s URL Inspection tool for individual pages, combined with a well-maintained XML sitemap for broader crawl coverage.
Why does generic schema perform worse than no schema at all?
The Growth Marshal 730-site citation study showed that pages with attribute-rich schema earned a 61.7% citation rate, pages with no schema earned 59.8%, but pages with generic (incomplete) schema earned only 41.6%. The likely explanation is that incomplete schema creates contradictions: a page may claim to be an Article but have no author, or include FAQPage markup with answers that are too short to be useful. AI systems may interpret these inconsistencies as low-quality signals — worse than a page that makes no structured claims at all. The lesson is to implement schema completely or not at all for a given type.
Which schema type has the highest AI citation impact?
FAQPage schema consistently produces the highest citation rates among all schema types, according to research from WPRiders and Stackmatix. This is because the question-answer format of FAQPage maps directly onto how AI systems generate responses. The key implementation requirement is that each answer must be self-contained (understandable without surrounding context) and between 40 and 60 words — long enough to be informative, short enough to be extracted cleanly.
How should I handle protected or private content in llms.txt?
Private or authentication-gated content should not appear in the public llms.txt or llms-full.txt files. If your site has member-only sections, create the llms.txt to list only publicly accessible pages. For llms-full.txt, apply the same logic: only concatenate content that is freely accessible without login. Mintlify’s implementation automatically respects authentication and excludes protected docs from the generated file — a useful model to follow for custom implementations.
How quickly will implementing these changes affect my AI visibility?
Timelines vary by platform. Perplexity can reflect structural changes within days, as its index refreshes frequently. Bing (which feeds ChatGPT Search) typically reflects IndexNow-submitted content within minutes of submission, but citation testing may take a few days. Google AI Overviews follows traditional SEO indexing timelines — changes may take weeks to a month to show measurable improvement. Schema markup changes can have faster impact because they affect how already-indexed pages are parsed, not just whether they are crawled. For a baseline to measure improvement against, run the GEO Score Check before and after implementation.
About the Author
Janis Grinhofs is Founder & CGO of Bavaria AI (Bavarian Crypto Labs GmbH), a Munich-based Generative Engine Optimization agency. Previously, Janis served in a technical leadership role at yoummday, a German SaaS tech scale-up valued at €250M (2022) with over €100M in annual revenue. At Bavaria AI, he leads technical GEO strategy, the proprietary GEO Score Framework, and AI consulting for DACH e-commerce companies. You can find more of his work in the AI SEO Guide 2026 and the guide to appearing in ChatGPT answers.
Ready to Run Your Technical GEO Audit?
This checklist covers every technical layer that determines whether AI systems can crawl, parse, and cite your content. But knowing what to implement is only half the equation — you also need to know where your site stands right now.
The Bavaria AI GEO Score Check audits your domain across ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini, and Bing Copilot and delivers a prioritized action plan in minutes. It is free, takes less than two minutes to complete, and gives you a concrete 0–100 baseline to measure implementation progress against.