BYTETOOLS

How to Write a robots.txt File and Block AI Bots

To create a robots.txt file, use the ByteTools Robots.txt Generator: pick a preset or build rule groups with a user-agent and allow/disallow paths, add your sitemap URL, and download a valid robots.txt. A one-click preset blocks the major AI training crawlers. The whole thing runs in your browser, and the file downloads ready to upload to your site root.

robots.txt is the small text file at the root of your domain that tells crawlers where they may and may not go. It governs crawl behaviour for search engines and, increasingly, for AI bots harvesting content to train models. Writing it by hand means memorising the Robots Exclusion Protocol (standardised as RFC 9309); this generator lets you compose a correct file from simple choices instead.

What robots.txt controls and who needs it

Every site benefits from a deliberate robots.txt. It can keep crawlers out of admin areas, search-result pages and staging paths, point them to your sitemap, and manage crawl load. Lately the biggest reason people revisit their robots.txt is AI: many site owners want normal search engines to keep indexing them while stopping bots that scrape content for model training. This tool serves site owners who want that control without learning the syntax from scratch.

How to build a robots.txt in your browser

  1. Start from a preset — Allow all, Block all, or Block AI bots — or build your own rules from scratch.
  2. For each rule group, set the user-agent and list the paths to allow or disallow.
  3. Optionally add a crawl-delay and your sitemap URL.
  4. Copy the generated file or download robots.txt.
  5. Upload it to the root of your domain, e.g. https://example.com/robots.txt.
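
For readers who want to see what those steps produce, here is a minimal Python sketch of composing a file from rule groups. It is not the generator's actual code: the RuleGroup class and the build_robots_txt function are hypothetical names made up for illustration, and the paths and sitemap URL are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleGroup:
    """Hypothetical model of one rule group: user-agents plus their rules."""
    user_agents: List[str]
    disallow: List[str] = field(default_factory=list)
    allow: List[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: List[RuleGroup], sitemap: Optional[str] = None) -> str:
    """Render rule groups as robots.txt text, with a blank line between groups."""
    blocks = []
    for group in groups:
        # Every group starts with one or more User-agent lines.
        lines = [f"User-agent: {agent}" for agent in group.user_agents]
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        blocks.append("\n".join(lines))
    if sitemap:
        # Sitemap is not tied to any user-agent group, so it goes on its own.
        blocks.append(f"Sitemap: {sitemap}")
    return "\n\n".join(blocks) + "\n"


# Placeholder rules: keep all crawlers out of admin and search-result pages.
robots_txt = build_robots_txt(
    [RuleGroup(user_agents=["*"], disallow=["/admin/", "/search"])],
    sitemap="https://example.com/sitemap.xml",
)
print(robots_txt)
```

The printed text is what you would save as robots.txt and upload to the site root.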

Which AI crawlers can you block?

The Block AI bots preset adds disallow rules for the main training and scraping crawlers in one click. Here's who they belong to.

User-agent                 | Operated by            | Purpose
---------------------------|------------------------|---------------------------------------
GPTBot                     | OpenAI                 | Training data collection
ClaudeBot / anthropic-ai   | Anthropic              | Training and crawling
Google-Extended            | Google                 | Gemini training (separate from Search)
CCBot                      | Common Crawl           | Open web dataset used by many models
PerplexityBot / Bytespider | Perplexity / ByteDance | Answer engines and scraping

Compliance is voluntary, but the reputable operators honour these rules — and blocking training crawlers via robots.txt does not affect how normal search engines index you.
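
One way to check a file like this before publishing is to run it through Python's standard-library robots.txt parser. The sketch below assumes a file in the style of the Block AI bots preset; the exact text the generator emits may differ, and the article URL is a placeholder. It only shows what the rules say, not whether a given crawler chooses to obey them.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: one group that blocks the AI crawlers,
# and a catch-all group that leaves the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str = "https://example.com/articles/hello") -> bool:
    """Return True if the parsed rules let this user-agent fetch the URL."""
    return parser.can_fetch(agent, url)


results = {agent: allowed(agent) for agent in ("GPTBot", "CCBot", "Googlebot", "Bingbot")}
print(results)
```

The AI crawlers fall into the first group and are refused, while Googlebot and Bingbot match only the catch-all group and stay allowed.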

Key features and benefits

  • Multiple user-agent rule groups with allow/disallow paths.
  • Presets for Allow all, Block all and Block AI bots.
  • Crawl-delay and Sitemap directives supported.
  • Live preview with copy and download.
  • Covers GPTBot, CCBot, anthropic-ai, Google-Extended and more.
  • Runs fully client-side.

Try the Robots.txt Generator now — it's free and runs entirely in your browser.

Frequently asked questions

Does robots.txt keep a page out of Google?

Not reliably. It blocks crawling, not indexing — a disallowed URL can still appear in results without a snippet if other sites link to it. To truly keep a page out of search, allow crawling and add a noindex meta tag, or protect it behind authentication.

How do I block AI bots from training on my content?

Add disallow rules for their user-agents: GPTBot, CCBot, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot and Bytespider. The Block AI bots preset adds these in one click. Compliance is voluntary, but reputable companies honour it.

Where does the file need to go?

At the root of the host: https://example.com/robots.txt, exactly that path and filename. A robots.txt in a subdirectory is ignored, and each subdomain like blog.example.com needs its own file.
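
To make the "one file per host" rule concrete, here is a small Python sketch that works out which robots.txt governs a given page. The function name robots_txt_url is a hypothetical helper, and the URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


main_site = robots_txt_url("https://example.com/blog/post?id=1")
subdomain = robots_txt_url("https://blog.example.com/2024/post")
print(main_site)
print(subdomain)
```

The two pages resolve to different files because the hosts differ, which is why a subdomain needs its own robots.txt.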

Does Google respect crawl-delay?

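No. Googlebot ignores crawl-delay and sets its own crawl rate automatically; the crawl-rate limiter that used to live in Search Console was retired in early 2024. Bing honours the directive, while Yandex has said it no longer uses it and offers a crawl-rate setting in Yandex Webmaster instead. It is harmless to include for the crawlers that read it, just don't expect it to slow Google down.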
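
Crawl-delay belongs to a user-agent group, so it only applies to the crawlers named in that group. The Python sketch below uses the standard-library parser on an assumed file with a 5-second delay for bingbot; the delay value and paths are illustrative. It reports what the file declares, and each crawler still decides for itself whether to honour it.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: a delay for bingbot only, plus a catch-all group without one.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 5
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the declared delay in seconds, or None if none applies.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("Googlebot")
print(bing_delay, other_delay)
```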

What does an empty Disallow value mean?

Disallow: with nothing after it means "nothing is disallowed" — allow everything for that user-agent. By contrast, Disallow: / blocks the whole site. That single slash is the difference between fully open and fully closed, so double-check it before publishing.
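
The difference is easy to confirm with Python's standard-library parser. This is a minimal sketch: can_crawl is a hypothetical helper, and ExampleBot and the example.com path are placeholder values.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str = "/any/page") -> bool:
    """Parse the given robots.txt text and test one path for a generic bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", "https://example.com" + path)


# Empty Disallow value: nothing is blocked.
open_site = can_crawl("User-agent: *\nDisallow:\n")
# A single slash: everything is blocked.
closed_site = can_crawl("User-agent: *\nDisallow: /\n")
print(open_site, closed_site)
```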

Built by ByteVancer

ByteTools is a free product of ByteVancer, a software and web development studio that builds web apps, SaaS platforms and custom software for businesses. Need help with technical SEO or a custom build? Explore ByteVancer's services and get in touch about your project.