/robots.txt • sitemap.xml.com
# Robots.txt for Sitemap.xml
# https://sitemap.xml.com/robots.txt
# Last Updated: 2025-01-15
# Purpose: Control crawler access & declare sitemap locations

User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Allow: /
# Crawl-delay is non-standard; Googlebot ignores it
Crawl-delay: 1

Sitemap: https://sitemap.xml.com/sitemap.xml
Sitemap: https://sitemap.xml.com/blog-sitemap.xml
Sitemap: https://sitemap.xml.com/images-sitemap.xml

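# Search Engine Specific Rules
# A crawler obeys only its most specific matching group, so the
# Disallow rules from the * group are repeated here.
User-agent: Googlebot
Disallow: /admin/
Disallow: /api/internal/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Allow: /

User-agent: Bingbot
Disallow: /admin/
Disallow: /api/internal/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Allow: /
Crawl-delay: 2

# Sitemap lines are not tied to a user-agent group; this applies to all crawlers
Sitemap: https://sitemap.xml.com/priority-sitemap.xml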

# Disallow aggressive/non-standard bots
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: DotBot
Disallow: /

Configuration Summary

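This robots.txt file sets crawler rules for Sitemap.xml. The default group (User-agent: *) allows public routes and disallows /admin/, /api/internal/, /cgi-bin/, and /wp-login.php, with a Crawl-delay of 1 (conventionally read as seconds); Bingbot is asked to wait 2. Four sitemaps are declared: three covering core, blog, and image content, plus a priority sitemap. Sitemap lines help crawlers discover URLs but do not guarantee indexing. AhrefsBot, SemrushBot, and DotBot are disallowed site-wide to preserve server resources. Compliance with robots.txt is voluntary, so these rules do not stop crawlers that choose to ignore them.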
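
One way to sanity-check rules like these is to load them into Python's standard-library `urllib.robotparser` and ask which paths a given crawler may fetch. The sketch below embeds a short excerpt of the file so it runs offline. `ExampleBot` is a hypothetical user agent standing in for any crawler without its own group, and the `/admin/users` and `/blog/post` paths are illustrative only. Note that this parser applies the first matching rule in file order, which may differ from the longest-match behavior of some search engines.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules above, embedded so the check runs without a network call.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Allow: /
Crawl-delay: 1

User-agent: AhrefsBot
Disallow: /

Sitemap: https://sitemap.xml.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
BASE = "https://sitemap.xml.com"

# "ExampleBot" is a made-up agent; it has no group of its own,
# so it falls back to the default (*) group.
print(rules.can_fetch("ExampleBot", BASE + "/admin/users"))  # restricted path
print(rules.can_fetch("ExampleBot", BASE + "/blog/post"))    # public path
print(rules.can_fetch("AhrefsBot", BASE + "/blog/post"))     # disallowed site-wide
print(rules.crawl_delay("ExampleBot"))                       # delay from the * group
print(rules.site_maps())                                     # declared sitemap URLs
```

Running the same checks with `Googlebot` or `Bingbot` against the full file is a quick way to confirm that a crawler with its own group still has the restrictions you expect, since a crawler-specific group replaces the default group rather than extending it.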