XML Sitemap Best Practices and Mistakes to Avoid
The golden rule of XML sitemaps is to list only the canonical, indexable URLs you actually want in search (nothing redirected, blocked, noindexed or duplicated), each with an honest lastmod date. A tidy sitemap earns crawler trust; a bloated one teaches Google to ignore it. This is a best-practices guide to getting the details right, not a walkthrough of the tool itself.
What belongs in a sitemap (and what does not)
A sitemap is a curated list of your best URLs, not a dump of every address your site can serve. Keep it clean:
| Include | Exclude |
|---|---|
| Canonical, 200-status pages | Redirecting (301/302) URLs |
| Indexable content you want ranked | Pages with a noindex tag |
| One version per page | Duplicates and parameter variants |
| Absolute https URLs | Pages blocked in robots.txt |
| Live, reachable pages | 404s, staging and thank-you pages |
Mixing non-canonical or blocked URLs into the file sends crawlers conflicting signals: the sitemap says "index me" while a tag or robots rule says the opposite. That contradiction is one of the most common reasons Search Console flags coverage warnings.
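The include/exclude rules above can be sketched as a simple filter. This is a minimal illustration, not the generator's actual logic: the `Page` record and its field names are hypothetical stand-ins for whatever your crawler or CMS export provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; the field names are assumptions."""
    url: str
    status: int
    canonical: Optional[str] = None  # canonical URL declared by the page, if any
    noindex: bool = False
    blocked_by_robots: bool = False


def belongs_in_sitemap(page: Page) -> bool:
    """Return True only for live, canonical, indexable https pages."""
    if page.status != 200:
        return False  # redirects, 404s and server errors stay out
    if page.noindex or page.blocked_by_robots:
        return False  # would contradict the sitemap's "index me" signal
    if page.canonical is not None and page.canonical != page.url:
        return False  # duplicate or parameter variant of another URL
    return page.url.startswith("https://")


pages = [
    Page("https://example.com/", 200, canonical="https://example.com/"),
    Page("https://example.com/old", 301),
    Page("https://example.com/?ref=ad", 200, canonical="https://example.com/"),
    Page("https://example.com/thanks", 200, noindex=True),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

With the sample data, only the homepage survives the filter; the redirect, the parameter variant and the noindexed thank-you page are dropped.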
Get lastmod right, ignore changefreq and priority
Google uses lastmod as a crawl hint, but only when it is consistently accurate. The most damaging habit is stamping every URL with today's date on every regeneration: it looks like your whole site changes daily, so Google stops trusting the field entirely. Set lastmod to when the content genuinely changed, and leave older pages with their real, older dates.
Changefreq and priority are effectively ignored by Google. They do no harm, and some other engines may read them, but do not spend time fine-tuning them or expect them to move rankings. If you set them once and move on, you are doing it right.
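As a rough sketch of the lastmod advice, the snippet below builds a urlset in which each URL carries its own stored modification date rather than the build date. The `build_urlset` helper, the URLs and the dates are made up for illustration; in practice the dates would come from your CMS or version control. It leaves out changefreq and priority entirely.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build sitemap XML from (url, last_modified_date) pairs.

    Each lastmod is the date the content really changed, never date.today().
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages with their genuine (and different) modification dates.
entries = [
    ("https://example.com/", date(2024, 5, 2)),
    ("https://example.com/about", date(2022, 11, 19)),
]

sitemap_xml = build_urlset(entries)
```

ElementTree also escapes characters such as `&` in the URL text, which is one reason to build the file with an XML library instead of string concatenation.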
Size, structure and submission pitfalls
- Respect the limits. One file holds up to 50,000 URLs and 50 MB uncompressed. Beyond that, split into multiple sitemaps and reference them from a sitemap index.
- Escape special characters. Ampersands, quotes and angle brackets in URLs must be entity-escaped (for example, & becomes &amp;) or the file fails validation. The ByteTools generator handles this automatically, but hand-edited files often break here.
- Point robots.txt at it. Add a Sitemap: line so any crawler discovers the file even before you submit it.
- Submit once, then leave it. Register the URL in Google Search Console and Bing Webmaster Tools; resubmitting daily does nothing.
- Keep it fresh, not padded. Remove URLs when pages are deleted rather than letting the sitemap fill with 404s.
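Several of the points above (the 50,000-URL limit, the sitemap index, escaping, and the robots.txt line) can be sketched together. This is an illustrative outline under assumed names: `chunk_urls`, `sitemap_index` and the `sitemap-N.xml` file naming are hypothetical, and it does not enforce the 50 MB size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(sitemap_urls):
    """Render a sitemap index that references each child sitemap file."""
    items = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{items}"
        "</sitemapindex>\n"
    )


# 120,001 hypothetical URLs need three files: 50,000 + 50,000 + 20,001.
urls = [f"https://example.com/item?id={n}&view=full" for n in range(120_001)]
chunks = chunk_urls(urls)

# Assumed naming scheme for the child files.
child_files = [
    f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = sitemap_index(child_files)

# escape() turns the raw ampersand into &amp; so the <loc> value stays valid XML.
escaped_example = escape(urls[0])

# The line to add to robots.txt, pointing at the index file.
robots_line = "Sitemap: https://example.com/sitemap-index.xml"
```

Pointing robots.txt at the index rather than at each child file keeps that file unchanged as the number of child sitemaps grows.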
A quick troubleshooting checklist
If Search Console reports "Sitemap could not be read," check that the file is served with an XML content type, uses absolute URLs, and has no stray characters before the XML declaration. If pages are "Discovered – currently not indexed," the sitemap is working but the pages need better internal links and content quality; a sitemap invites crawling, it never forces indexing.
Try the XML Sitemap Generator: free and 100% in your browser.
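The first three checks can be approximated locally before you submit. The sketch below is an assumption-laden helper, not a replacement for Search Console's own parser: `sitemap_problems` is a hypothetical name, and you supply the response body and Content-Type header however you fetch them.

```python
import codecs
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_problems(body: bytes, content_type: str) -> list:
    """Return a list of human-readable problems found in a sitemap response."""
    problems = []
    if "xml" not in content_type.lower():
        problems.append(f"unexpected content type: {content_type}")
    # A UTF-8 byte order mark is tolerated; anything else before '<' is not.
    if not body.removeprefix(codecs.BOM_UTF8).startswith(b"<"):
        problems.append("stray characters before the XML declaration")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    for loc in root.findall(".//sm:loc", NS):
        value = (loc.text or "").strip()
        parts = urlsplit(value)
        if not (parts.scheme and parts.netloc):
            problems.append(f"relative or malformed URL: {value}")
    return problems


good = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url></urlset>"
)
bad = good.replace(b"https://example.com/", b"/about")

good_report = sitemap_problems(good, "application/xml")
bad_report = sitemap_problems(bad, "text/html")
```

Here `good_report` comes back empty, while `bad_report` lists both the HTML content type and the relative URL.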
FAQ
Should I put every page of my site in the sitemap?
No. Include only canonical, indexable pages you want to rank. Leaving out thin, duplicate or utility pages concentrates crawler attention on your best content and avoids the conflicting signals that trigger coverage warnings.
Will a fake or bulk lastmod date help me get crawled faster?
The opposite. Stamping every URL with the current date on each rebuild makes the field untrustworthy, and Google learns to disregard it. Accurate, genuine modification dates are what keep lastmod useful as a crawl hint.
Do I need a sitemap index for a small site?
Only if you exceed 50,000 URLs or 50 MB in a single file. Most small sites fit comfortably in one sitemap, so a single sitemap.xml is all you need until you grow well beyond that.
How often should I regenerate the sitemap?
Regenerate when you add, remove or meaningfully update pages, not on a fixed daily schedule for its own sake. The aim is an accurate snapshot, so update it when the underlying URLs actually change.
Related free tools
- Robots.txt Generator: add the Sitemap line and crawl rules.
- Canonical Tag Generator: signal the one true URL per page.
- XML Validator: confirm the file is well-formed before submitting.
- Hreflang Tag Generator: manage multi-language URL variants.
Built by ByteVancer
ByteTools is a free product of ByteVancer, a software and web development studio building web apps, SaaS and custom software. If your site needs bespoke SEO tooling or a custom crawl-and-audit workflow, explore what ByteVancer can build for you.