BYTETOOLS

Robots.txt Best Practices and Costly Mistakes

The most expensive robots.txt mistake is a single stray character: Disallow: / blocks your entire site from search engines, while an empty Disallow: allows everything, and the difference is one slash. Robots.txt is small but unforgiving, so this guide focuses on the habits and checks that keep it from quietly wrecking your visibility.
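
The one-slash difference can be checked locally before anything is uploaded. The sketch below uses Python's standard urllib.robotparser, which approximates how crawlers read the file but is not identical to any search engine's parser; the helper name can_crawl and the example.com URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two files differ by a single slash.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

URL = "https://example.com/products/widget"  # hypothetical page
print(can_crawl(BLOCK_ALL, URL))  # blocked by "Disallow: /"
print(can_crawl(ALLOW_ALL, URL))  # allowed by the empty "Disallow:"
```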

Understand what robots.txt actually controls

The number-one conceptual error is treating robots.txt as a way to hide pages from search results. It controls crawling, not indexing. A URL you disallow can still appear in Google, without a snippet, if other sites link to it, because Google never needed to crawl the page to know it exists. If your goal is to keep a page out of results, do the opposite of blocking it: allow crawling and add a noindex meta tag, or put the page behind authentication. Reserve robots.txt for managing crawl budget and steering bots away from low-value paths, not for privacy.
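
To illustrate why blocking and noindex work against each other, here is a minimal sketch that treats a noindex tag as effective only when the crawler is allowed to fetch the page and therefore gets to read the tag. The class and function names, and the /private/report.html path, are hypothetical, and the check is a simplification of what a real search engine does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def noindex_can_take_effect(robots_txt, url, html, agent="Googlebot"):
    """True only if the crawler may fetch the page AND the page says noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return parser.can_fetch(agent, url) and "noindex" in finder.directives


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "https://example.com/private/report.html"  # hypothetical page

CRAWLABLE = "User-agent: *\nDisallow:\n"
BLOCKED = "User-agent: *\nDisallow: /private/\n"
```

With CRAWLABLE the function returns True, because the tag can be read; with BLOCKED it returns False, because the crawler never fetches the page and so never sees the noindex.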

Avoid the classic syntax and placement traps

Mistake                          | Effect                     | Fix
Disallow: / left in by accident  | Blocks the whole site      | Use empty Disallow: to allow all
File in a subfolder              | Ignored entirely           | Place at the domain root
Same file for every subdomain    | Subdomains uncovered       | Give each subdomain its own file
Blocking CSS/JS                  | Broken rendering for bots  | Allow assets Google needs

The file must live at exactly https://example.com/robots.txt; a copy in a subdirectory is invisible to crawlers. Each subdomain like blog.example.com needs its own file, since one does not cover the others. And blocking your CSS or JavaScript can stop Google from rendering pages correctly, which hurts more than it helps.
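
The placement rule can be expressed as a small helper: for any page URL, the only robots.txt that applies sits at the root of that exact scheme and host. This is a sketch using Python's urllib.parse; the function name robots_txt_url and the page paths are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the one robots.txt location that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, discard the page's own path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/shop/item?id=3"))
print(robots_txt_url("https://blog.example.com/2024/post"))
```

The two calls resolve to different files, one on example.com and one on blog.example.com, which is why a single file cannot cover both hosts.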

Set crawl-delay and AI-bot rules with realistic expectations

Crawl-delay is widely misunderstood. Googlebot ignores it outright and adjusts its crawl rate on its own, though some other crawlers, such as Bingbot, do honour it, so it is safe to include for them without expecting it to slow Google. For AI crawlers, disallow rules targeting user-agents like GPTBot, CCBot, anthropic-ai, ClaudeBot, Google-Extended, PerplexityBot and Bytespider signal that you do not want your content used for training. Compliance is voluntary, but reputable operators respect it. Remember that Google-Extended governs AI training specifically and does not affect your normal Google Search ranking, so blocking it will not hurt regular SEO.
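
As a rough sketch of how such rules might be assembled and sanity-checked, the snippet below builds a file that blocks the AI user-agents named above, sets a Crawl-delay for bingbot only, and leaves everything else open. The helper build_robots_txt and the 5-second delay are assumptions for illustration. The checks describe what the file says, not how any given crawler will actually behave.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "CCBot", "anthropic-ai", "ClaudeBot",
           "Google-Extended", "PerplexityBot", "Bytespider"]


def build_robots_txt(ai_bots, bing_delay_seconds):
    """Return robots.txt text: AI bots blocked, bingbot delayed, rest open."""
    # One group: several User-agent lines sharing a single Disallow rule.
    lines = [f"User-agent: {bot}" for bot in ai_bots]
    lines += ["Disallow: /", ""]
    # Crawl-delay is placed only where a crawler is expected to read it.
    lines += ["User-agent: bingbot", f"Crawl-delay: {bing_delay_seconds}",
              "Disallow:", ""]
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_BOTS, bing_delay_seconds=5)
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

home = "https://example.com/"
print(parser.can_fetch("GPTBot", home))     # AI crawler group applies
print(parser.can_fetch("Googlebot", home))  # falls through to the * group
print(parser.crawl_delay("bingbot"))        # delay declared for bingbot only
```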

Test before you publish, and always list your sitemap

Because one wrong line can deindex a site, treat every robots.txt change as production-critical: review the generated file line by line, confirm the paths are what you intend, and validate it in a robots testing tool before uploading. One easy win to include every time is the Sitemap: directive with an absolute URL. It helps all search engines discover your sitemap without manual submission, and you can list multiple sitemaps. Keeping the file minimal and intentional beats a sprawling set of rules nobody remembers the reason for.
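
A pre-publish check along these lines could be automated. The following is a minimal sketch, not a substitute for a proper robots testing tool: it flags URLs that must stay crawlable but are blocked, a missing Sitemap: directive, and sitemap URLs that are not absolute. The function name lint_robots and the sample paths are hypothetical, and site_maps() requires Python 3.8 or later.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def lint_robots(robots_txt, must_allow):
    """Return a list of human-readable problems found in `robots_txt`."""
    problems = []
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Pages that must never be blocked for a generic crawler.
    for url in must_allow:
        if not parser.can_fetch("*", url):
            problems.append(f"blocked for all crawlers: {url}")

    # site_maps() returns None when no Sitemap: line is present.
    sitemaps = parser.site_maps() or []
    if not sitemaps:
        problems.append("no Sitemap: directive found")
    for sitemap in sitemaps:
        parts = urlsplit(sitemap)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"Sitemap URL is not absolute: {sitemap}")
    return problems


KEY_PAGES = ["https://example.com/", "https://example.com/products/"]
GOOD = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n"
BAD = "User-agent: *\nDisallow: /\nSitemap: /sitemap.xml\n"

for problem in lint_robots(BAD, KEY_PAGES):
    print(problem)
```

Running a check like this in a deploy script means a stray Disallow: / fails the build instead of reaching production.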

Try the Robots.txt Generator: free and 100% in your browser.

FAQ

Will blocking a page in robots.txt remove it from Google?

Not reliably. Robots.txt stops crawling, not indexing, so a blocked URL can still show in results without a snippet if others link to it. Use a noindex tag or authentication to truly keep a page out.

What is the single most dangerous robots.txt line?

Disallow: / under a broad user-agent blocks your entire site from being crawled. Left in by mistake, it can wipe out search visibility. An empty Disallow: is the opposite and allows everything.

Does blocking Google-Extended hurt my search rankings?

No. Google-Extended only governs whether your content is used for AI training; it is separate from Search crawling. Blocking it keeps you out of training data without affecting normal ranking.

Why is my robots.txt being ignored?

The most common cause is placement: it must sit at the domain root, not in a subfolder, and each subdomain needs its own file. Check that the exact path is /robots.txt on the correct host.

Built by ByteVancer

ByteTools is a free product of ByteVancer, a software and web development studio building web apps, SaaS and custom software. If your site needs technical SEO done right or a product built from scratch, explore how ByteVancer can help.