📄 robots.txt · Updated 2 hours ago · 4 directives · 3 user-agents
HTTPS RFC 9309 v2.2
/robots.txt
# ═══════════════════════════════════════════════════════
# Robots.txt - Robots.txt™ Platform
# Intelligent Content Curation Platform
# Last updated: 2025-01-15T10:30:00Z
# Protocol: RFC 9309 (Robots.txt) v2.2
# ═══════════════════════════════════════════════════════

# ── Default Rule: Allow All General Crawlers ──
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Disallow: /private/
Disallow: /staging/
Disallow: /*.json$
Disallow: /wp-admin/
Disallow: /wp-includes/

# ── Google Bot ──
User-agent: Googlebot
Allow: /
Max-Image-Preview: large
Max-Snippet: -1
Max-Video-Preview: -1

# ── Bing Bot ──
User-agent: bingbot
Allow: /
Disallow: /api/
Crawl-Delay: 2

# ── Ad Bots (Restricted) ──
User-agent: Mediapartners-Google
Allow: /

# ── Sitemap ──
Sitemap: https://robots.txt/sitemap.xml

Directive Summary

✅ Allow Rules

Public-facing content pages, documentation, API docs, blog posts, and marketing pages are allowed for all crawlers.

🚫 Disallow Rules

API endpoints, admin panels, internal tools, private resources, staging environments, and raw data files are disallowed.
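
How the Allow and Disallow lines interact is easy to misread, so here is a minimal Python sketch of the precedence rule described in RFC 9309: the longest matching pattern wins, and Allow wins a tie. The rule list is copied from the `User-agent: *` group in the file above; the helper names `pattern_to_regex` and `is_allowed` are hypothetical and exist only for illustration, not as part of any library or of this platform.

```python
import re

# Rules copied from the "User-agent: *" group shown above.
RULES = [
    ("allow", "/"),
    ("disallow", "/api/"),
    ("disallow", "/admin/"),
    ("disallow", "/internal/"),
    ("disallow", "/private/"),
    ("disallow", "/staging/"),
    ("disallow", "/*.json$"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


for path in ("/blog/post", "/api/v1/users", "/data/export.json"):
    print(path, is_allowed(path))
```

Under this rule, `Allow: /` matches every path but loses to any longer Disallow pattern, so `/api/v1/users` and `/data/export.json` are refused while `/blog/post` is permitted. Note that Python's standard `urllib.robotparser` applies rules in file order rather than by longest match, so it would not reproduce this result for a group that lists `Allow: /` first.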

🤖 Crawl Controls

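Crawl-Delay: 2 applies to bingbot, asking it to wait 2 seconds between requests. Googlebot has no delay configured, and it does not honor Crawl-Delay in any case. Rate limiting is enforced at the CDN edge.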
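
As a rough illustration of how a client might read the delay, the sketch below feeds an abbreviated copy of the file above to Python's standard `urllib.robotparser` and queries `crawl_delay()`. The excerpt in `ROBOTS_TXT` is shortened for the example; nothing here describes how the platform itself serves or enforces the value.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the file shown above (assumption: trimmed for brevity).
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /api/

User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /
Disallow: /api/
Crawl-Delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the group's delay in seconds, or None if none is set.
bing_delay = parser.crawl_delay("bingbot")
google_delay = parser.crawl_delay("Googlebot")

print("bingbot delay:", bing_delay)
print("Googlebot delay:", google_delay)
```

A polite crawler would sleep for the returned number of seconds between requests and fall back to its own default when the result is `None`.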

🗺️ Sitemap

https://robots.txt/sitemap.xml is referenced for all crawlers, keeping indexation current with published content.
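
The Sitemap line is not tied to any user-agent group, which is why it applies to every crawler. A small sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available), shows how a client could discover it from an abbreviated copy of the file above:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the file shown above (assumption: trimmed for brevity).
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /api/

Sitemap: https://robots.txt/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = parser.site_maps()
print(sitemaps)
```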

📊 Configuration Status

Status: Active
User-Agents: 3
Allow Rules: 6
Disallow Rules: 9
Last Changed: 2h ago
Protocol: RFC 9309
Validation: ✓ Pass

🤖 Major Bot Status

Bot                   Organization      Status   Access
Googlebot             Google LLC        Allowed  Full access with enhanced preview rules
bingbot               Microsoft         Allowed  Allowed with 2s crawl delay
Mediapartners-Google  Google AdSense    Limited  Ads only, no indexing
* (others)            General crawlers  Allowed  Public pages only
AhrefsBot             Ahrefs            Limited  Content indexed, no API
SemrushBot            Semrush           Limited  Rate-limited access
Unknown               Unidentified      Blocked  All non-whitelisted blocked
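
Which rules a given bot obeys depends on group selection: under RFC 9309 a crawler uses the group whose user-agent line matches its product token (case-insensitively) and falls back to the `*` group only when no named group matches. The sketch below models that with the four groups from the file as shown; `GROUPS`, `select_group`, and `disallowed_patterns` are hypothetical names for illustration, and any groups the platform defines beyond the displayed file are not represented.

```python
# Groups from the file as shown above, keyed by lowercased product token.
GROUPS = {
    "*": [
        ("allow", "/"),
        ("disallow", "/api/"),
        ("disallow", "/admin/"),
        ("disallow", "/internal/"),
        ("disallow", "/private/"),
        ("disallow", "/staging/"),
        ("disallow", "/*.json$"),
        ("disallow", "/wp-admin/"),
        ("disallow", "/wp-includes/"),
    ],
    "googlebot": [("allow", "/")],
    "bingbot": [("allow", "/"), ("disallow", "/api/")],
    "mediapartners-google": [("allow", "/")],
}


def select_group(product_token):
    """Return the key of the group a crawler with this token would obey."""
    token = product_token.lower()
    return token if token in GROUPS else "*"


def disallowed_patterns(product_token):
    """List the Disallow patterns in the group selected for this crawler."""
    group = GROUPS[select_group(product_token)]
    return [pattern for kind, pattern in group if kind == "disallow"]


for bot in ("Googlebot", "bingbot", "AhrefsBot"):
    print(bot, "->", select_group(bot), disallowed_patterns(bot))
```

One consequence worth noting: a named group does not inherit from `*`. With the file as displayed, the Googlebot group contains no Disallow lines, so paths such as `/api/` would need to be repeated in that group if they are meant to be off limits to Googlebot as well.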

❓ Frequently Asked Questions

robots.txt is a plain-text file, defined by the Robots Exclusion Protocol (RFC 9309), that tells web crawlers which pages or files they may or may not request from a website. It is usually the first thing a crawler fetches from your site, and it influences crawl budget and, indirectly, which content search engines can see and index.
Our robots.txt is dynamically generated and updated in real time as our platform evolves. Changes are deployed instantly via our edge network with zero downtime. You can also subscribe to our RSS feed for change notifications.
Yes! If you're a researcher, journalist, or legitimate service provider needing access to restricted content, contact us at bot@robots.txt. We review exception requests on a case-by-case basis.
Yes. Our robots.txt explicitly disallows known AI/LLM crawler user-agents, so compliant crawlers will not fetch our content for training. If you're running an AI service and need permission, reach out to our team directly.
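This file follows RFC 9309, the Robots Exclusion Protocol. It also uses directives that fall outside the RFC: Crawl-Delay and Sitemap are widely recognized extensions rather than part of the core standard, and Max-Image-Preview, Max-Snippet, and Max-Video-Preview are documented by Google as robots meta tag and X-Robots-Tag rules, so crawlers are not expected to read them from robots.txt.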
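
Google documents the preview limits (max-image-preview, max-snippet, max-video-preview) as rules for the robots meta tag and the `X-Robots-Tag` response header. As a hedged sketch of that alternative, the function below builds such a header from the Googlebot values in the file above; `x_robots_tag` and `PREVIEW_RULES` are hypothetical names, and how this platform actually delivers these settings is not stated in the source.

```python
from typing import Optional

# Values taken from the Googlebot group in the file shown above.
PREVIEW_RULES = {
    "max-image-preview": "large",
    "max-snippet": "-1",
    "max-video-preview": "-1",
}


def x_robots_tag(rules, user_agent: Optional[str] = None):
    """Build an (header name, header value) pair for an X-Robots-Tag header.

    If user_agent is given, the rules are scoped to that crawler.
    """
    value = ", ".join(f"{name}:{setting}" for name, setting in rules.items())
    if user_agent:
        value = f"{user_agent}: {value}"
    return ("X-Robots-Tag", value)


header = x_robots_tag(PREVIEW_RULES, user_agent="googlebot")
print(": ".join(header))
```

The resulting pair can be attached to HTML or non-HTML responses by whatever web framework or edge layer serves the pages.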