The AEO Checklist: 47 Points to Audit Before Launching Any Page
Every page you publish is either ready for AI search engines or invisible to them. There is no middle ground. ChatGPT, Perplexity, Google AI Overviews, and Claude all decide in milliseconds whether your page is worth citing — and that decision is based on technical signals you can check before you hit publish.
This is the complete AEO checklist. 47 items across the 9 categories that AEOprobe scores in every audit. Use it as a pre-launch gate for every page, a monthly maintenance audit, or a diagnostic tool when your AI search visibility drops. Each item includes what to check, why it matters for AI, and how to fix it.
How to Use This Checklist
Work through each category in order. The categories are arranged by priority — AI Bot Access comes first because if crawlers cannot reach your page, nothing else matters. Each item is written as a pass/fail check you can verify in under a minute.
If you want to automate the entire checklist, run a free AEOprobe audit — it checks all 47 points in about 60 seconds and gives you a score from A+ to F with specific fix recommendations.
1. AI Bot Access (6 Items)
This is the foundation of AEO. If AI crawlers are blocked by your robots.txt, they literally cannot read your content — no matter how well-optimized everything else is. These 6 checks ensure AI engines can reach your pages.
1. GPTBot is allowed in robots.txt
Check that your robots.txt does not contain User-agent: GPTBot / Disallow: /. GPTBot powers ChatGPT search — blocking it means your content never appears in ChatGPT answers. If you do not mention GPTBot at all, it defaults to allowed, which is fine.
2. ClaudeBot is allowed in robots.txt
Anthropic's ClaudeBot fetches content for Claude's web search. Blocking ClaudeBot removes your site from Claude's citation pool. Check for both ClaudeBot and the legacy anthropic-ai user agent.
3. PerplexityBot is allowed in robots.txt
Perplexity processes over 150 million queries per day and always cites its sources with direct links. Blocking PerplexityBot is blocking one of the highest-referral AI search engines. Verify there is no Disallow directive for PerplexityBot.
4. Google-Extended is allowed in robots.txt
Google-Extended controls whether Google uses your content for Gemini model training and AI Overviews. Blocking it may reduce your visibility in Google AI Overviews, which now appear on over 47% of US search queries. Note that blocking Google-Extended does not affect standard Googlebot indexing.
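Taken together, items 1 through 4 amount to a short robots.txt allow-list. Explicit Allow groups are optional when nothing else restricts these bots, but they make your intent unambiguous:

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```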
5. No blanket wildcard blocking AI crawlers
Check that a User-agent: * / Disallow: / rule is not accidentally blocking all bots including AI crawlers. Some sites intend to block only scrapers but end up blocking everything. If you use a wildcard disallow, add explicit Allow rules for the AI bots you want to support.
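The override pattern looks like this. Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys the group that most specifically matches its user agent and ignores the * group, so each named bot crawls freely while everything else stays blocked:

```text
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```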
6. robots.txt is accessible and returns 200
Your robots.txt must be served at the root of your domain (/robots.txt) and return a 200 status code. A 404 tells most crawlers there are no restrictions, so they crawl everything; persistent 500-level errors are worse, because some crawlers (including Googlebot) treat an unreachable robots.txt as a complete disallow and pause crawling the site entirely.
2. Structured Data (7 Items)
Structured data gives AI engines machine-readable context about your content. JSON-LD markup using schema.org vocabulary is the format every AI engine supports. These 7 checks ensure your structured data is present, valid, and useful.
7. Page has at least one JSON-LD block
Every page should contain at least one <script type="application/ld+json"> block. Without structured data, AI engines must infer your content type from raw HTML — which is less reliable and less likely to result in citations.
8. JSON-LD uses valid JSON syntax
Malformed JSON is silently discarded by every consumer. A missing comma, unclosed bracket, or trailing comma means your structured data does not exist as far as AI engines are concerned. Use the JSON-LD Schema Validator to catch syntax errors before you publish.
9. @context points to schema.org
Every JSON-LD block must include "@context": "https://schema.org". Without this, parsers do not know what vocabulary you are using and will ignore the data entirely.
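Both failure modes, invalid JSON (item 8) and a missing @context (item 9), can be caught in a pre-publish script. Below is an illustrative sketch using only Python's standard library; the regex-based extraction and the function name are assumptions for this example, and a real pipeline would use a proper HTML parser:

```python
import json
import re

# Simplified extraction: a production pipeline should use an HTML parser instead.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def validate_jsonld(html: str) -> list[str]:
    """Return one error string per broken JSON-LD block; an empty list means all pass."""
    errors = []
    for i, block in enumerate(JSONLD_RE.findall(html)):
        try:
            data = json.loads(block)
        except json.JSONDecodeError as e:
            errors.append(f"block {i}: invalid JSON ({e.msg} at line {e.lineno})")
            continue
        # Simplification: top-level arrays are also flagged here, though they can be valid JSON-LD.
        if not isinstance(data, dict) or data.get("@context") != "https://schema.org":
            errors.append(f"block {i}: missing or wrong @context")
    return errors

page = """<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Test"}
</script>"""
print(validate_jsonld(page))  # prints []
```

Run against each rendered page in CI so a template change that breaks your markup fails the build instead of silently erasing your structured data.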
10. @type is present and specific
Use the most specific @type that matches your content: Article for blog posts, Product for product pages, FAQPage for FAQ content, HowTo for tutorials. Generic types like WebPage provide minimal value to AI engines.
11. Article schema includes author and datePublished
For any content page, the Article schema should include headline, datePublished, author (as a Person with a name), and publisher (as an Organization). These fields directly feed the E-E-A-T signals that AI engines use to assess content authority and freshness.
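A minimal Article block carrying those fields might look like the following sketch (all names, dates, and URLs are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The AEO Checklist: 47 Points to Audit Before Launching Any Page",
  "datePublished": "2026-01-15",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "logo": { "@type": "ImageObject", "url": "https://example.com/logo.png" }
  }
}
</script>
```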
12. FAQ schema covers common questions
If your page answers common questions, wrap them in FAQPage schema with mainEntity containing Question and acceptedAnswer pairs. FAQ schema is one of the most directly consumed structured data types — AI engines use it to generate Q&A snippets verbatim.
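A single question-and-answer pair in FAQPage form looks like this (question and answer text are placeholders; add one Question object per FAQ entry to the mainEntity array):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How often should I run an AEO audit?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Before every page launch, and weekly or monthly for ongoing maintenance."
    }
  }]
}
</script>
```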
13. Organization schema is present site-wide
Include Organization schema with your name, url, logo, and sameAs (linking to your social profiles). This establishes your brand as a known entity that AI engines can confidently attribute content to.
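A site-wide Organization block, typically placed in your base template, might look like this (all values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://twitter.com/example",
    "https://www.linkedin.com/company/example"
  ]
}
</script>
```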
3. Content Quality (6 Items)
AI engines favor content that is well-structured, answer-first, and easy to extract. These checks ensure your content is formatted for both human readability and machine parsing.
14. First paragraph contains the direct answer
The opening paragraph should state the core answer or value proposition immediately. AI engines disproportionately weight the first 150 words when deciding what to cite. If your answer is buried below the fold, it is much less likely to be extracted.
15. Heading hierarchy is clean (H1 → H2 → H3)
Use exactly one H1 per page, followed by H2 sections with H3 subsections. Skipping levels (H1 directly to H3) or using multiple H1s confuses AI parsers about your content structure. Clean hierarchy helps AI engines build an accurate outline of your page.
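As a quick illustration, a clean outline for a page like this one would be (titles illustrative):

```html
<h1>The AEO Checklist</h1>
<h2>AI Bot Access</h2>
<h3>GPTBot</h3>
<h3>ClaudeBot</h3>
<h2>Structured Data</h2>
<!-- Never jump from <h1> straight to <h3>, and never add a second <h1> -->
```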
16. Content is at least 300 words
Thin pages rarely get cited. A minimum of 300 words gives AI engines enough material to understand your topic and extract meaningful quotes. For competitive topics, 1,000+ words with clear sections perform significantly better.
17. Paragraphs are concise (under 150 words each)
Long, unbroken paragraphs are harder for AI engines to parse and extract from. Keep paragraphs focused on a single point. AI systems prefer content they can quote in self-contained chunks — a 40-word paragraph that answers a question directly is more citable than a 300-word paragraph that covers three topics.
18. Lists and tables are used for structured information
When presenting comparisons, steps, features, or specifications, use HTML lists (<ul>, <ol>) and tables (<table>). AI engines can parse these structures directly, while the same information embedded in prose is harder to extract accurately.
19. No duplicate or near-duplicate content
Duplicate content across your site splits AI attribution. If three pages contain the same answer, AI engines may cite none of them (due to confidence uncertainty) or cite the wrong one. Use canonical tags and consolidate duplicate content into a single authoritative page.
4. Meta Tags (5 Items)
Meta tags provide authoritative signals that AI engines use to understand page identity, authorship, and canonical status. Missing or misconfigured meta tags create ambiguity that reduces citation confidence.
20. Title tag is present and under 60 characters
Every page needs a unique <title> tag that accurately describes the content. Keep it under 60 characters to avoid truncation. AI engines use the title as a primary signal for what the page is about — a missing or generic title reduces citation likelihood.
21. Meta description is present and under 160 characters
The meta description serves as your content's elevator pitch to AI engines. Write a concise summary of what the page covers. AI systems often use this as a confidence check: if the description matches the content, the page is more likely to be cited.
22. Canonical URL is set and correct
Every page must have a <link rel="canonical"> pointing to its authoritative URL. Without it, AI engines may encounter your content via multiple URLs (www vs non-www, HTTP vs HTTPS, with or without trailing slashes) and split attribution across duplicates.
23. Language attribute is set on the HTML tag
The <html lang="en"> attribute tells AI engines what language your content is in. This is critical for multilingual sites and helps AI engines serve the right version of your content in language-appropriate answers.
24. Robots meta tag allows indexing
Check that your pages do not have <meta name="robots" content="noindex"> unless intentional. A noindex directive tells all crawlers, including AI bots, to skip the page entirely. This control is separate from robots.txt: a crawler must be allowed to fetch the page before it can even see the noindex tag, so use each mechanism deliberately rather than stacking both.
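Items 20 through 24 come together in a small amount of markup. A sketch with placeholder copy and URLs:

```html
<!doctype html>
<html lang="en">
<head>
  <title>The AEO Checklist: 47 Points to Audit</title>
  <meta name="description" content="A 47-point pre-launch checklist covering bot access, structured data, content, and performance.">
  <link rel="canonical" href="https://example.com/aeo-checklist">
  <!-- No robots meta tag needed: pages are indexable by default -->
</head>
```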
5. Open Graph (4 Items)
Open Graph tags provide structured preview data that AI engines use to evaluate source quality and generate rich citations. Incomplete OG data makes your content look less authoritative.
25. og:title is present
Set <meta property="og:title"> to your page's title. AI engines use OG tags as a cross-reference with your title tag — consistency between the two signals a well-maintained page.
26. og:description is present
Include <meta property="og:description"> with a summary of your content. This field is used by AI engines when generating preview cards and citation context.
27. og:image is present with valid URL
Set <meta property="og:image"> to an absolute URL of a representative image. AI engines that generate visual citations (like Perplexity) use this image. A missing og:image means your citation has no visual, which reduces click-through.
28. og:url matches canonical URL
The og:url must point to the same URL as your canonical tag. Mismatches between og:url and canonical create conflicting signals about which URL is authoritative, potentially splitting AI citation credit.
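All four Open Graph checks fit in four tags (values are placeholders; note that og:url repeats the canonical URL exactly):

```html
<meta property="og:title" content="The AEO Checklist: 47 Points to Audit">
<meta property="og:description" content="A 47-point pre-launch checklist for AI search readiness.">
<meta property="og:image" content="https://example.com/og/aeo-checklist.png">
<meta property="og:url" content="https://example.com/aeo-checklist">
```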
6. Sitemaps (5 Items)
Your XML sitemap is the roadmap AI crawlers use to discover and prioritize your content. A broken or incomplete sitemap means pages get missed or deprioritized.
29. XML sitemap exists at /sitemap.xml
Your site must serve a valid XML sitemap at the standard location. AI crawlers check this URL by convention. If it returns a 404, crawlers fall back to link-following, which is slower and less reliable for discovering all your pages.
30. Sitemap is referenced in robots.txt
Include Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt. This ensures every crawler — including AI bots — knows where to find your sitemap without guessing.
31. All important pages are in the sitemap
Every page you want AI engines to discover and cite should be in your sitemap. If a page is missing from the sitemap and has no inbound links, crawlers may never find it. Audit your sitemap against your actual page inventory regularly.
32. Sitemap URLs return 200 status codes
Every URL listed in your sitemap must return a 200 status. 404s, 301s, and 500s in your sitemap waste crawl budget and signal poor site maintenance. AI crawlers with limited time budgets may abandon your sitemap entirely if too many URLs fail.
33. lastmod dates are accurate and recent
The <lastmod> element tells crawlers when content was last updated. Accurate dates help AI engines prioritize fresh content. Never set all pages to today's date — this destroys the signal. Only update lastmod when you genuinely modify the page content.
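A minimal sitemap satisfying items 29 through 33 looks like this (URLs and dates are placeholders; each lastmod should reflect a genuine content change):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/aeo-checklist</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/aeo-score</loc>
    <lastmod>2025-12-02</lastmod>
  </url>
</urlset>
```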
7. Performance (5 Items)
AI crawlers have time budgets just like Googlebot. Slow sites get fewer pages crawled and may be deprioritized as unreliable sources. These checks ensure your site is fast enough for AI crawlers to process efficiently.
34. Server responds in under 500ms
Measure your Time to First Byte (TTFB). AI crawlers that wait more than 500ms may time out or deprioritize your domain. For static sites, aim for under 200ms. For dynamic sites, ensure your caching layer handles crawler requests efficiently.
35. Page fully loads in under 3 seconds
Total page load time affects whether crawlers can process your full content. JavaScript-heavy pages that take 5+ seconds to render may only be partially parsed. Server-side rendering or static generation ensures crawlers see your complete content immediately.
36. No render-blocking resources preventing content access
Critical CSS should be inlined, and JavaScript should not block the main content from rendering. AI crawlers that process your HTML directly need your content visible in the initial HTML response — not injected by JavaScript after load.
37. Images are optimized with proper formats
Use modern formats (WebP, AVIF) with appropriate compression. Oversized images slow down page load and waste crawler bandwidth. Include width and height attributes to prevent layout shift, and use lazy loading for below-the-fold images.
38. Gzip or Brotli compression is enabled
Your server should compress HTML, CSS, and JavaScript responses. Check that your Content-Encoding header shows gzip or br. Uncompressed responses are 5-10x larger, which directly slows crawler processing and wastes bandwidth.
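On nginx, for example, gzip is built in while Brotli comes from the ngx_brotli module. A sketch (directive names differ on other servers, and text/html is compressed automatically once gzip is on):

```nginx
gzip on;
gzip_comp_level 5;
gzip_min_length 1024;
gzip_types text/css application/javascript application/json application/ld+json image/svg+xml;

# Brotli requires the third-party ngx_brotli module
brotli on;
brotli_types text/css application/javascript application/json;
```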
8. Security (5 Items)
Security signals establish trust. AI engines are less likely to cite content from sites with security issues — mixed content, missing HTTPS, or weak headers all reduce source credibility.
39. HTTPS is enforced (HTTP redirects to HTTPS)
Every HTTP request must 301 redirect to HTTPS. An accessible HTTP version of your site creates duplicate content issues and signals poor security hygiene. AI engines strongly prefer HTTPS sources.
40. HSTS header is present
Send the Strict-Transport-Security header to tell browsers (and crawlers) to always use HTTPS. A typical value is max-age=31536000; includeSubDomains. This prevents downgrade attacks and signals that your site takes security seriously.
41. No mixed content (HTTP resources on HTTPS pages)
Check that all images, scripts, stylesheets, and fonts are loaded over HTTPS. Mixed content triggers browser warnings and signals to AI engines that your site has unresolved security issues. A single HTTP image on an HTTPS page can flag the entire page.
42. Security headers are properly configured
Set X-Content-Type-Options: nosniff, X-Frame-Options: DENY (or SAMEORIGIN), and Referrer-Policy: strict-origin-when-cross-origin. These headers indicate a professionally maintained site. While AI engines may not directly check each header, the aggregate security posture affects trust scoring.
43. Content-Security-Policy header is present
A Content-Security-Policy (CSP) header controls which resources can be loaded on your pages. Even a basic CSP demonstrates security awareness. Sites without any CSP are more vulnerable to XSS attacks, which can compromise content integrity — a trust signal AI engines consider.
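On nginx, the headers from items 40 through 43 can be set in one block. The CSP shown is deliberately minimal and illustrative; tailor it to the resources your pages actually load before deploying:

```nginx
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'; img-src 'self' https: data:" always;
```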
9. Accessibility (4 Items)
Accessibility and AI readability are deeply connected. Semantic HTML, proper ARIA attributes, and structural cues that help screen readers also help AI engines parse your content structure accurately.
44. Semantic HTML elements are used
Use <article>, <nav>, <main>, <section>, <header>, and <footer> instead of generic <div> elements. Semantic elements give AI engines explicit signals about content hierarchy and purpose. A page built entirely with <div>s provides no structural information.
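A sketch of the landmark structure AI parsers can work with (comments mark where your content goes):

```html
<body>
  <header><!-- site banner, logo --></header>
  <nav><!-- primary navigation --></nav>
  <main>
    <article>
      <h1>Page title</h1>
      <section><!-- one major topic per section --></section>
    </article>
  </main>
  <footer><!-- site footer --></footer>
</body>
```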
45. Images have descriptive alt text
Every meaningful image needs an alt attribute that describes its content. AI engines that process images use alt text as a caption and context signal. Missing alt text means AI systems cannot understand what your images show or how they relate to surrounding content.
46. ARIA landmarks are present
Use ARIA landmarks (role="main", role="navigation", role="banner") to complement semantic HTML. These roles provide redundant structural signals that help AI engines identify the primary content area versus navigation, sidebars, and footers.
47. Skip navigation and focus management work
A skip-to-content link and logical tab order indicate a well-structured page. While AI engines do not "tab" through your page, the presence of accessibility infrastructure correlates with clean, well-organized HTML that is easier for AI to parse. Sites that invest in accessibility tend to have cleaner DOM structures overall.
The Pre-Launch Gate
This checklist works best as a pre-launch gate — a final review before any page goes live. Here is how to integrate it into your workflow:
- Draft your content as usual, focusing on quality and completeness.
- Run through the checklist starting with AI Bot Access, then Structured Data, then Content Quality. Fix issues as you find them.
- Run an automated audit using AEOprobe to catch anything you missed. The automated scan checks all 47 points in about 60 seconds.
- Fix flagged items based on the audit results, prioritizing high-impact categories.
- Publish with confidence that your page is AI-search ready.
For ongoing maintenance, run AEOprobe audits weekly or monthly to catch regressions — especially after CMS updates, theme changes, or plugin installations that can break robots.txt rules, remove structured data, or introduce performance issues.
Automate the Entire Checklist
Manually checking 47 items for every page is thorough but time-consuming. AEOprobe automates the entire process: paste your URL, wait 60 seconds, and get a score from A+ to F with a detailed breakdown of every category. Each failing item includes a specific explanation and fix recommendation.
Use AEOprobe to automate this checklist — get your score in 60 seconds
The free audit covers all 9 categories, checks all 47 points, and requires no signup. For teams that need recurring monitoring, historical tracking, and exportable reports, premium plans are coming soon.
Frequently Asked Questions
How often should I run through this AEO checklist?
Run the full checklist before every page launch or major content update. For ongoing maintenance, run a complete site audit weekly or monthly using AEOprobe to catch regressions automatically. Categories like AI Bot Access and Security rarely change, but Content Quality and Structured Data need rechecking whenever you publish new pages.
Which checklist items have the biggest impact on AI citations?
AI Bot Access items have the highest impact — if crawlers are blocked, nothing else matters. After that, Structured Data (especially FAQ and Article schemas) and Content Quality (answer-first paragraphs, clear headings) are the strongest signals. These three categories alone account for roughly 60% of your AEO score.
Can I automate this checklist instead of checking manually?
Yes. AEOprobe automates all 47 checks across the 9 categories in a single scan that takes about 60 seconds. It scores each category, flags failures, and tells you exactly what to fix. Free audits are available with no signup required.
Do I need to pass all 47 points to rank in AI search?
No. Perfection is not required — but the more items you pass, the higher your AEO score and the more likely AI engines are to cite your content. Focus on the high-impact categories first: AI Bot Access, Structured Data, and Content Quality. A site scoring B or above across all categories is well-positioned for AI citations.
Related Posts
How to Improve Your AEO Score: A Step-by-Step Guide
Walk through a complete AEO audit and fix every category — robots.txt, structured data, meta tags, content, and more — to raise your score from F to A.
Read more
AEO Score Methodology: How AEOprobe Grades Your Website
AEOprobe publishes its full scoring methodology — 9 audit categories, category weighting, severity penalties, and A+ to F grading. Learn exactly how your AEO score is calculated and why transparency matters.
Read more
What Is Answer Engine Optimization (AEO)? The Complete Guide for 2026
Answer Engine Optimization (AEO) is the practice of optimizing content so AI search engines can crawl, parse, and cite your website. Learn the 9 audit categories and how to check your score.
Read more
robots.txt for AI Bots: Should You Block or Allow GPTBot, ClaudeBot, and PerplexityBot?
Learn which AI crawlers to allow or block in your robots.txt file. Covers GPTBot, ClaudeBot, PerplexityBot, and 11 more bots with 2026 best practices.
Read more