The 30-minute GEO foundation: llms.txt, schema, Cloudflare AI crawler config
Josh Higgins, Co-Founder & Digital Growth Specialist
22 May 2026
What this guide covers
There are three technical foundations every site needs before any GEO content work pays off. None of them are expensive, all of them are one-time setups that compound for the life of the site, and skipping them is the most common reason businesses pay for GEO content and see nothing happen.
The three:
llms.txt and llms-full.txt: the AI-readable summary of your site.
Schema markup: FAQ, Organization, Article and LocalBusiness, properly linked.
Cloudflare AI crawler config: so the AI crawlers that obey robots.txt are explicitly allowed in and the ones that do not are blocked.
You can do all three in an afternoon if your stack is modern. The lift is permanent.
llms.txt and llms-full.txt
llms.txt aims to be to AI crawlers what robots.txt is to search crawlers. It is a proposed convention rather than a ratified standard: a file at the root of your domain (yoursite.com/llms.txt) that gives generative engines a structured, citation-ready summary of who you are, what you sell, where you operate, and what content matters most. The proposal is described at llmstxt.org.
Two files matter:
llms.txt: a short, structured markdown summary. Brand name, one-line description, locations served, key services, primary contact, links to your most important pages. Think of it as the elevator pitch the AI reads first.
llms-full.txt: a longer document with the full content of your top 10-20 pages concatenated. An AI engine that wants depth can grab it in one request instead of crawling each page individually.
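As a rough illustration of the llms-full.txt idea, the sketch below concatenates a handful of pages into one markdown document. The `Page` class, the `build_llms_full` helper, the "Source:" line and the example URLs are all hypothetical choices made for this example; the llmstxt.org proposal does not mandate this exact layout, and it assumes you have already exported each page as plain text or markdown.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    body: str  # page content already converted to plain text or markdown


def build_llms_full(site_name: str, pages: list[Page]) -> str:
    """Concatenate pages into one markdown document, one section per page."""
    parts = [f"# {site_name}", ""]
    for page in pages:
        parts.append(f"## {page.title}")
        parts.append(f"Source: {page.url}")  # assumed convention, not part of any spec
        parts.append("")
        parts.append(page.body.strip())
        parts.append("")
    # Collapse trailing blank lines to a single newline at end of file.
    return "\n".join(parts).rstrip() + "\n"


# Placeholder pages; swap in your own top 10-20 pages.
pages = [
    Page("SEO services", "https://yoursite.com/seo", "What we do for SEO clients.\n"),
    Page("Contact", "https://yoursite.com/contact", "Email and phone details."),
]
llms_full = build_llms_full("Example Agency", pages)
print(llms_full)
```

Writing the result to disk is then a one-liner, for example `Path("llms-full.txt").write_text(llms_full)`, and rerunning the script whenever a page changes keeps the file current.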
The structure that works best:
```
# Create & Grow Media
> Brisbane digital marketing agency specialising in SEO, GEO,
> Google Ads, social media, web design and branding for
```
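Host both files at the root of your domain. Update them whenever your services or pricing change. ChatGPT, Perplexity and Claude can all fetch llms.txt when they look at a domain, but none of the major engines has publicly documented how much weight it gives the file, so treat it as a cheap signal rather than a guarantee.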
Schema: what to add and how to link it
Schema markup tells engines what your content actually is. Four types matter for most service businesses:
Organization schema with sameAs links pointing to your LinkedIn, Facebook, Instagram and any other authoritative profile. This is how engines verify your entity identity across the web. One implementation, applied site-wide via your layout.
LocalBusiness schema if you serve a geographic area. Address, postcode, opening hours, area served, contact details. This gives local AI Overviews the structured facts they need to consider citing you.
FAQPage schema on any page that has FAQ content. This is often the highest-leverage individual schema because it tells engines exactly which lines are questions and which are answers, which makes the content easy for AI Overviews and ChatGPT to extract.
Article schema on every blog post and resource, with a named author. An author bio with credentials and a link to a real LinkedIn profile helps separate content the AI trusts from content it ignores.
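One way to read "properly linked" is that the Article points at the Organization by `@id` instead of repeating its details. The sketch below builds such a JSON-LD graph in Python and prints the script tag you would place in the page head. Every name, URL and `@id` value here is a placeholder invented for illustration, and the FAQ text is deliberately left as "...".

```python
import json

SITE = "https://yoursite.com"  # placeholder domain
ORG_ID = f"{SITE}/#organization"  # stable identifier other nodes can reference

organization = {
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Example Agency",
    "url": SITE,
    "sameAs": [
        "https://www.linkedin.com/company/example-agency",
        "https://www.facebook.com/exampleagency",
    ],
}

article = {
    "@type": "Article",
    "@id": f"{SITE}/blog/example-post#article",
    "headline": "Example post",
    "author": {
        "@type": "Person",
        "name": "Jane Author",
        "url": "https://www.linkedin.com/in/jane-author",
    },
    # Link to the Organization node by @id rather than duplicating it.
    "publisher": {"@id": ORG_ID},
}

faq = {
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "...",
            "acceptedAnswer": {"@type": "Answer", "text": "..."},
        }
    ],
}

graph = {"@context": "https://schema.org", "@graph": [organization, article, faq]}
json_ld = json.dumps(graph, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

In practice the Organization node would be emitted once from the site layout, with Article and FAQPage nodes added per page.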
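How to verify it is working: use Google's Rich Results Test on your top 20 pages. If it does not detect FAQ, Article or Organization markup where it should, you have a fix to make. The Rich Results Test only reports types Google can turn into rich results, so use the Schema Markup Validator at validator.schema.org to check everything else.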
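Before pasting 20 URLs into a web tool, a quick local check can flag pages with missing markup. The sketch below uses only the Python standard library to pull JSON-LD blocks out of an HTML string and list the `@type` values it finds. `JsonLdExtractor`, `schema_types` and the sample HTML are hypothetical names and data for this example, and it is no substitute for Google's own validation.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def schema_types(html: str) -> set[str]:
    """Return every @type found in the page's JSON-LD, including @graph nodes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    for block in parser.blocks:
        items = block if isinstance(block, list) else block.get("@graph", [block])
        for item in items:
            value = item.get("@type")
            found.update(value if isinstance(value, list) else [value] if value else [])
    return found


sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "Example Agency"},
  {"@type": "FAQPage", "mainEntity": []}
]}
</script>
</head><body></body></html>
"""
found = schema_types(sample_html)
missing = {"Organization", "FAQPage", "Article"} - found
print("found:", sorted(found), "missing:", sorted(missing))
```

For the sample page this reports Article as missing, which is the kind of gap worth fixing before moving on.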
Cloudflare AI crawler config
This is the one most businesses miss. AI companies run their own crawlers (GPTBot, ClaudeBot, PerplexityBot, CCBot and others) and publish robots.txt tokens such as Google-Extended and Applebot-Extended that control how already-crawled content is used. Most of them state that they respect robots.txt, though compliance is voluntary. If your robots.txt is silent on them, most will crawl. If you are running behind Cloudflare with Bot Fight Mode or similar bot settings enabled, some of them may be blocked unintentionally.
Two steps:
One: explicitly allow AI crawlers in robots.txt. Add user-agent blocks naming GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Applebot-Extended, OAI-SearchBot, ChatGPT-User, anthropic-ai and a handful of others, with Allow: / under each. This makes your consent explicit and stops a broader Disallow rule from catching them by accident. It does not by itself raise crawl priority.
Two: configure Cloudflare to let AI crawlers through its bot controls. In the Cloudflare dashboard, the Bots section has explicit settings for AI training crawlers and AI search crawlers. At the time of writing the default is to block AI training and allow AI search, but defaults have changed over time and can differ by plan, so check your own zone rather than assume. For most marketing sites, you want both allowed: training crawlers are how your content gets baked into the model's knowledge, search crawlers are how it gets cited in real-time answers.
If you do not want your content used for AI training but do want to be cited in answers, allow only the search-time and user-triggered agents (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User) and block the training ones (GPTBot, ClaudeBot, the legacy anthropic-ai token, and CCBot if you are concerned).
The pragmatic recommendation for almost every Australian SMB: allow everything. The volume of traffic these crawlers represent is negligible, and the upside (being cited inside ChatGPT and Perplexity) is significant.
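The two policies described above (allow everything, or allow search-time agents only) can be generated and sanity-checked with Python's built-in `urllib.robotparser`. This is a sketch, not a definitive list: the split between training and search tokens below is an assumption based on how the vendors describe their agents (in particular, that ClaudeBot is Anthropic's training crawler and Claude-SearchBot and Claude-User are the search-time ones), and `build_robots` and `allowed` are hypothetical helper names. Check each vendor's current documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Assumed grouping of robots.txt tokens; verify against vendor docs.
TRAINING = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
            "Google-Extended", "Applebot-Extended"]
SEARCH = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
          "Claude-SearchBot", "Claude-User"]


def build_robots(block_training: bool = False) -> str:
    """Build a robots.txt with one group per AI token plus a catch-all."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in SEARCH]
    rule = "Disallow: /" if block_training else "Allow: /"
    groups += [f"User-agent: {agent}\n{rule}" for agent in TRAINING]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def allowed(robots_txt: str, agent: str, url: str = "https://yoursite.com/") -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allow_all = build_robots()
search_only = build_robots(block_training=True)
print(search_only)
for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-SearchBot"):
    print(agent, "allow-all:", allowed(allow_all, agent),
          "search-only:", allowed(search_only, agent))
```

Keep in mind that robots.txt only expresses a preference. Cloudflare's bot settings decide whether a request actually gets through, so the two need to agree.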
Verifying it works
After all three are in place, three checks:
Curl your llms.txt and llms-full.txt to confirm they return 200 and contain the content you expect.
Run Google's Rich Results Test on your top 5 pages to confirm FAQ and Article schema are picked up.
Check your Cloudflare Bots dashboard to confirm AI crawler traffic is appearing (it usually does within 48 hours of allowing it).
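The first check can be scripted so it is easy to rerun after every deploy. The sketch below is a Python equivalent of the curl step, using only the standard library. `files_to_check`, `status_of`, `report` and the `geo-foundation-check/0.1` User-Agent string are hypothetical names for this example, and `yoursite.com` is a placeholder. A custom User-Agent is sent on the assumption that some bot protection rejects the default Python one.

```python
import urllib.error
import urllib.request


def files_to_check(domain: str) -> list[str]:
    """URLs that should return 200 once the foundation is in place."""
    names = ("llms.txt", "llms-full.txt", "robots.txt")
    return [f"https://{domain}/{name}" for name in names]


def status_of(url: str, user_agent: str = "geo-foundation-check/0.1") -> int:
    """Return the HTTP status code for `url`.

    HTTP error responses (403, 404, ...) are returned as codes;
    network failures such as DNS errors are left to raise.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def report(domain: str) -> None:
    """Print one line per file with its status code."""
    for url in files_to_check(domain):
        print(url, status_of(url))
```

Calling `report("yoursite.com")` prints one status per file. Anything other than 200 on llms.txt or llms-full.txt means the file is missing or is being blocked before it reaches the crawler.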
A week or so later, query ChatGPT and Perplexity for your business name and your top services. You may start to see your brand mentioned with a link, where previously the answer either said "I cannot find specific information" or pointed to competitors.
What this does not do
To be honest about the limits: doing all three of the above does not guarantee citation. It removes the technical barriers to citation. The actual citation work still needs answer-first content, named author bios, original statistics, and a steady cadence of off-domain mentions. The technical foundation just makes sure that work is not wasted.
But if you do GEO content work without the technical foundation in place, you are paying for content that the engines cannot read, cannot trust, and cannot cite. The technical setup is the entry ticket.
If you want this done as a productised one-off, our GEO Foundation is exactly this: 14-day delivery, llms.txt + schema + Cloudflare config + citable passage rewrites on your top 10 pages, $1,500. Or grab our GEO Audit first if you want to see what is broken before you fix it.