
AEO for Gatsby

More than 1% market share among React sites

Gatsby is a React-based static site generator that, by default, pre-builds every page as static HTML at build time, so crawlers receive complete markup quickly without executing JavaScript. Its GraphQL data layer and plugin ecosystem make it highly configurable for AEO, but nothing is set up out of the box: structured data, robots.txt, and sitemaps all need dedicated plugins or manual implementation.

Default robots.txt Behavior

Gatsby does not generate a robots.txt by default. To create one, either install gatsby-plugin-robots-txt or add a robots.txt file to the static/ directory at the project root, which Gatsby copies into the build output unchanged. Without one of these, the site ships no robots rules at all: no AI bot directives and no sitemap reference.

Default Structured Data

Gatsby provides no built-in structured data. JSON-LD must be added manually, either through the Gatsby Head API (a Head export from a page or template file) or through React Helmet. The GraphQL data layer can source content for schema generation, and the gatsby-plugin-schema-org community plugin offers some automation, but most sites require hand-written JSON-LD in page templates.
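
A minimal sketch of hand-written JSON-LD follows. The helper names (buildArticleJsonLd, toJsonLdString) and the fields on post and site are hypothetical; they stand in for whatever a page query returns. The Head export is shown only as a comment so the file stays plain JavaScript.

```javascript
// Builds an Article JSON-LD object from a post record. The field names on
// `post` and `site` are assumptions; map them to your own GraphQL query.
function buildArticleJsonLd(post, site) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    description: post.description,
    datePublished: post.date,
    // Fall back to the publish date when no modified date is recorded.
    dateModified: post.modified || post.date,
    author: { "@type": "Person", name: post.author },
    mainEntityOfPage: new URL(post.slug, site.siteUrl).href,
  };
}

// Serializes a schema object for a <script type="application/ld+json"> tag.
// Escaping "<" keeps a "</script>" sequence in content from closing the tag.
function toJsonLdString(schema) {
  return JSON.stringify(schema).replace(/</g, "\\u003c");
}

// In a page template, a Head export could emit the result (JSX):
//
//   export const Head = ({ data }) => (
//     <script
//       type="application/ld+json"
//       dangerouslySetInnerHTML={{
//         __html: toJsonLdString(
//           buildArticleJsonLd(data.post, data.site.siteMetadata)
//         ),
//       }}
//     />
//   );

module.exports = { buildArticleJsonLd, toJsonLdString };
```

Keeping the builder free of React makes it reusable across templates and easy to unit test without a Gatsby build.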

Common AEO Issues

Issue: No robots.txt or sitemap by default
Impact: AI crawlers have no guidance on what to crawl and no sitemap to discover content efficiently.
Fix: Install gatsby-plugin-robots-txt and gatsby-plugin-sitemap. Configure AI bot rules in gatsby-config.js and ensure the sitemap references all important pages.

Issue: Manual JSON-LD implementation required
Impact: Without explicit schema markup in page templates, AI engines cannot reliably identify content type, authorship, or relationships.
Fix: Add JSON-LD via the Gatsby Head API or React Helmet in page templates. Create reusable schema components for Article, FAQ, Organization, and BreadcrumbList that pull data from the GraphQL layer.

Issue: Build-time data staleness
Impact: Static content is only as fresh as the last build, so frequently updated content may be stale when AI crawlers visit between builds.
Fix: Trigger rebuilds from content webhooks through your hosting platform or a CI/CD pipeline (Gatsby Cloud, which formerly filled this role, has been retired). Use incremental builds or Deferred Static Generation for large sites with frequent content changes.

Issue: Client-side hydration can delay interactivity
Impact: The initial HTML is pre-built and crawlable, but React hydration can pull in large JavaScript bundles that delay interactivity and hurt page performance metrics.
Fix: Audit bundle size with gatsby-plugin-webpack-bundle-analyzer. Use dynamic imports for heavy components, remove unused dependencies, and use gatsby-plugin-image for optimized media loading.

Recommended Plugins

  • gatsby-plugin-sitemap

    Generates an XML sitemap at build time from all Gatsby pages. Supports custom queries, exclude patterns, and serialization functions for full sitemap control.

  • gatsby-plugin-robots-txt

    Generates a robots.txt file at build time with configurable User-agent rules, sitemap references, and environment-specific policies (e.g., block staging, allow production).

Step-by-Step Guide

  1. Run an AEO audit

    Use AEOprobe to scan your Gatsby site. Static HTML output means content accessibility should score well, but robots.txt and structured data will likely need attention.

  2. Install and configure robots.txt plugin

    Add gatsby-plugin-robots-txt to your project. In gatsby-config.js, define User-agent rules for GPTBot, ClaudeBot, PerplexityBot, and Amazonbot with appropriate Allow/Disallow directives and sitemap reference.

  3. Install and configure sitemap plugin

    Add gatsby-plugin-sitemap with a custom query that includes all important pages. Configure lastmod dates and exclude utility pages like 404 or search results.

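  4. Add JSON-LD structured data

    Create reusable JSON-LD components using the Gatsby Head API. Because a Head export only takes effect in page and template files, put Organization and WebSite schema on the home page or in a shared SEO component that each page's Head renders, rather than in a layout component. Add Article schema on blog posts, FAQ schema on FAQ pages, and BreadcrumbList on all pages. Source data from the GraphQL layer.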

  5. Optimize build and performance

    Audit JavaScript bundle size and eliminate unused code. Use gatsby-plugin-image for all media, use the Gatsby Link component so internal links are prefetched, and configure incremental builds to keep content fresh for AI crawlers.

  6. Re-audit and automate

    Run AEOprobe again to verify improvements. Set up CI/CD hooks to validate structured data on each build and trigger rebuilds on content changes to keep AI crawlers seeing fresh content.
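
Steps 4 and 6 can be sketched together: a builder for BreadcrumbList JSON-LD and a smoke check that a CI job could run over the built HTML. Both function names are hypothetical, breadcrumb names are derived from URL segments purely for illustration, and the regex-based extraction is a simplification rather than a full HTML parser.

```javascript
// Step 4 helper: BreadcrumbList JSON-LD derived from a page path.
// Real titles would normally come from the GraphQL layer instead.
function buildBreadcrumbJsonLd(pagePath, siteUrl) {
  const segments = pagePath.split("/").filter(Boolean);
  const crumbs = [{ name: "Home", path: "/" }].concat(
    segments.map((segment, i) => ({
      name: segment.replace(/-/g, " "),
      path: "/" + segments.slice(0, i + 1).join("/") + "/",
    }))
  );
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: crumb.name,
      item: new URL(crumb.path, siteUrl).href,
    })),
  };
}

// Step 6 helper: find JSON-LD blocks in built HTML and list any problems.
function findJsonLdProblems(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const problems = [];
  let count = 0;
  for (const match of html.matchAll(pattern)) {
    count += 1;
    try {
      const data = JSON.parse(match[1]);
      const items = Array.isArray(data) ? data : [data];
      for (const item of items) {
        if (!item["@context"]) {
          problems.push(`block ${count}: missing @context`);
        }
        if (!item["@type"] && !item["@graph"]) {
          problems.push(`block ${count}: missing @type`);
        }
      }
    } catch (err) {
      problems.push(`block ${count}: invalid JSON (${err.message})`);
    }
  }
  if (count === 0) problems.push("no JSON-LD found");
  return problems;
}

module.exports = { buildBreadcrumbJsonLd, findJsonLdProblems };
```

A CI script could read each HTML file under the public/ build directory, pass its contents to findJsonLdProblems, and exit with a non-zero status when any problems are reported, so a broken schema fails the build instead of reaching crawlers.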

Frequently Asked Questions

Is Gatsby good for AEO?

Gatsby is a strong fit for AEO fundamentals. Pre-built static HTML means AI crawlers receive complete, fast-loading content with no JavaScript rendering dependency. The trade-off is that robots.txt, sitemaps, and structured data must all be explicitly configured via plugins or custom code.

How do I add structured data to a Gatsby site?

Use the Gatsby Head API (recommended) or React Helmet to add JSON-LD script tags in your page components. You can source schema data from the GraphQL data layer and create reusable schema components for Article, FAQ, Organization, and other types.

Does Gatsby need a robots.txt?

Yes. Gatsby does not create a robots.txt by default. Install gatsby-plugin-robots-txt and configure it in gatsby-config.js with rules for both traditional and AI crawlers, plus a sitemap reference.

How does Gatsby compare to Next.js for AEO?

Both require explicit AEO configuration. Gatsby excels at static content, since pre-built HTML is fast and crawlable, and Gatsby 4 added Deferred Static Generation and server-side rendering as optional modes. Next.js offers more flexibility with SSR, ISR, and dynamic rendering. For content-heavy sites with infrequent updates, Gatsby is a good fit. For dynamic applications, Next.js is more versatile.
