BYTETOOLS

XML Sitemap Best Practices and Mistakes to Avoid

The golden rule of XML sitemaps is to list only the canonical, indexable URLs you actually want in search (nothing redirected, blocked, noindexed or duplicated), with honest lastmod dates. A tidy sitemap earns crawler trust; a bloated one teaches Google to ignore it. This is a best-practices guide to getting the details right, not a walkthrough of the tool itself.

What belongs in a sitemap (and what does not)

A sitemap is a curated list of your best URLs, not a dump of every address your site can serve. Keep it clean:

Include                             | Exclude
Canonical, 200-status pages         | Redirecting (301/302) URLs
Indexable content you want ranked   | Pages with a noindex tag
One version per page                | Duplicates and parameter variants
Absolute https URLs                 | Pages blocked in robots.txt
Live, reachable pages               | 404s, staging and thank-you pages

Mixing non-canonical or blocked URLs into the file sends crawlers conflicting signals: the sitemap says "index me" while a tag or robots rule says the opposite. That contradiction is one of the most common reasons Search Console flags coverage warnings.
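The include/exclude rules above can be sketched as a simple filter. This is a minimal illustration, assuming you already have crawl data for each page; the Page record and its field names are hypothetical, not part of any particular crawler or of the ByteTools generator.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record; the field names are assumptions."""
    url: str
    status: int            # HTTP status observed when the page was fetched
    canonical: str         # rel="canonical" target, or the URL itself
    noindex: bool          # True if a noindex directive is present
    robots_blocked: bool   # True if robots.txt disallows the URL


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only canonical, indexable, reachable, absolute https URLs."""
    parts = urlsplit(page.url)
    return (
        page.status == 200
        and page.canonical == page.url
        and not page.noindex
        and not page.robots_blocked
        and parts.scheme == "https"
        and bool(parts.netloc)
    )


pages = [
    Page("https://example.com/", 200, "https://example.com/", False, False),
    Page("https://example.com/old", 301, "https://example.com/", False, False),
    Page("https://example.com/shoes?sort=price", 200,
         "https://example.com/shoes", False, False),
    Page("https://example.com/thank-you", 200,
         "https://example.com/thank-you", True, False),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
```

Only the home page survives here: the redirect, the parameter variant and the noindexed thank-you page are each dropped by one of the rules.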

Get lastmod right, ignore changefreq and priority

Google uses lastmod as a crawl hint, but only when it is consistently accurate. The most damaging habit is stamping every URL with today's date on every regeneration: it looks like your whole site changes daily, so Google stops trusting the field entirely. Set lastmod to when the content genuinely changed, and leave older pages with their real, older dates.
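As a rough sketch of what honest lastmod values look like in practice, the function below writes each URL with its own stored modification date instead of the build date. It assumes you keep a real "content last changed" date per page (for example in a CMS); the build_sitemap name and the sample URLs are illustrative only.

```python
from datetime import date
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: dict[str, date]) -> str:
    """Build a sitemap where lastmod is each page's own change date."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, modified in entries.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Use the stored date, never date.today(), so unchanged pages
        # keep their older, truthful lastmod.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return header + ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap({
    "https://example.com/": date(2024, 5, 2),
    "https://example.com/about": date(2021, 11, 30),
})
print(xml_text)
```

The about page keeps its 2021 date across rebuilds, which is exactly the signal that keeps the field believable.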

Changefreq and priority are effectively ignored by Google. They do no harm, and some other engines may read them, but do not spend time fine-tuning them or expect them to move rankings. If you set them once and move on, you are doing it right.

Size, structure and submission pitfalls

  • Respect the limits. One file holds up to 50,000 URLs and 50 MB uncompressed. Beyond that, split into multiple sitemaps and reference them from a sitemap index.
  • Escape special characters. Ampersands and angle brackets in URLs must be encoded or the file fails validation. The ByteTools generator handles this automatically, but hand-edited files often break here.
  • Point robots.txt at it. Add a Sitemap: line so any crawler discovers the file even before you submit it.
  • Submit once, then leave it. Register the URL in Google Search Console and Bing Webmaster Tools; resubmitting daily does nothing.
  • Keep it fresh, not padded. Remove URLs when pages are deleted rather than letting the sitemap fill with 404s.
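The splitting and escaping points above can be sketched together. This is a simplified illustration, not how the ByteTools generator works internally: the file names and the example.com base are assumptions, and it only enforces the 50,000-URL limit, so a real build would also need to check the 50 MB size cap.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def sitemap_xml(urls):
    # escape() encodes &, < and > so the file stays valid XML.
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{HEADER}<urlset xmlns="{NS}">\n{body}</urlset>\n'


def index_xml(sitemap_locations):
    body = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n"
        for u in sitemap_locations
    )
    return f'{HEADER}<sitemapindex xmlns="{NS}">\n{body}</sitemapindex>\n'


def build_all(urls, base="https://example.com"):
    """Return {filename: xml} for each child sitemap plus the index."""
    files = {}
    for n, part in enumerate(chunk(urls), start=1):
        files[f"sitemap-{n}.xml"] = sitemap_xml(part)
    # The index lists only the child sitemaps created above.
    files["sitemap-index.xml"] = index_xml(
        [f"{base}/{name}" for name in files]
    )
    return files


urls = [f"https://example.com/p?id={i}&ref=nav" for i in range(120_001)]
files = build_all(urls)
print(sorted(files))
```

With files named this way, the matching robots.txt entry would be a single line such as "Sitemap: https://example.com/sitemap-index.xml".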

A quick troubleshooting checklist

If Search Console reports "Sitemap could not be read," check that the file is served with an XML content type, uses absolute URLs, and has no stray characters before the XML declaration. If pages are "Discovered - currently not indexed," the sitemap is working but the pages need better internal links and content quality; a sitemap invites crawling, it never forces indexing.
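The three "could not be read" checks can be approximated with a small script. This is a sketch under assumptions: check_sitemap is a hypothetical helper, it expects you to pass in the raw response bytes and Content-Type header yourself, and it treats only application/xml and text/xml as acceptable types. It does not reproduce Search Console's own validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
BOM = b"\xef\xbb\xbf"


def check_sitemap(raw: bytes, content_type: str) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []

    # 1. Served with an XML content type?
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("application/xml", "text/xml"):
        problems.append(f"unexpected content type: {content_type}")

    # 2. Nothing before the XML declaration (a UTF-8 BOM is tolerated)?
    body = raw[len(BOM):] if raw.startswith(BOM) else raw
    if not body.startswith(b"<?xml"):
        problems.append("bytes found before the XML declaration")

    # 3. Well-formed, with absolute URLs in every <loc>?
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    for loc in root.iter(f"{NS}loc"):
        parts = urlsplit((loc.text or "").strip())
        if not (parts.scheme and parts.netloc):
            problems.append(f"relative URL: {loc.text}")
    return problems


good = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url></urlset>"
)
print(check_sitemap(good, "application/xml; charset=utf-8"))
```

An empty list means none of the three checks fired; anything else points at which item on the checklist to look at first.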

Try the XML Sitemap Generator: free and 100% in your browser.

FAQ

Should I put every page of my site in the sitemap?

No. Include only canonical, indexable pages you want to rank. Leaving out thin, duplicate or utility pages concentrates crawler attention on your best content and avoids the conflicting signals that trigger coverage warnings.

Will a fake or bulk lastmod date help me get crawled faster?

The opposite. Stamping every URL with the current date on each rebuild makes the field untrustworthy, and Google learns to disregard it. Accurate, genuine modification dates are what keep lastmod useful as a crawl hint.

Do I need a sitemap index for a small site?

Only if you exceed 50,000 URLs or 50 MB in a single file. Most small sites fit comfortably in one sitemap, so a single sitemap.xml is all you need until you grow well beyond that.

How often should I regenerate the sitemap?

Regenerate when you add, remove or meaningfully update pages, not on a fixed daily schedule for its own sake. The aim is an accurate snapshot, so update it when the underlying URLs actually change.

Built by ByteVancer

ByteTools is a free product of ByteVancer, a software and web development studio building web apps, SaaS and custom software. If your site needs bespoke SEO tooling or a custom crawl-and-audit workflow, explore what ByteVancer can build for you.