Live Configuration
Code robots.txt
Our crawl directives govern how automated bots and search engine crawlers interact with the Robots.txt™ platform and its indexed content.
# ═══════════════════════════════════════════════════════
# Robots.txt - Robots.txt™ Platform
# Intelligent Content Curation Platform
# Last updated: 2025-01-15T10:30:00Z
# Protocol: RFC 9309 (Robots Exclusion Protocol)
# ═══════════════════════════════════════════════════════
# ── Default Rule: Allow All General Crawlers ──
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Disallow: /private/
Disallow: /staging/
Disallow: /*.json$
Disallow: /wp-admin/
Disallow: /wp-includes/
# ── Google Bot ──
User-agent: Googlebot
Allow: /
Max-Image-Preview: large
Max-Snippet: -1
Max-Video-Preview: -1
# ── Bing Bot ──
User-agent: bingbot
Allow: /
Disallow: /api/
Crawl-Delay: 2
# ── Ad Bots (Restricted) ──
User-agent: Mediapartners-Google
Allow: /
# ── Sitemap ──
Sitemap: https://robots.txt/sitemap.xml
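How the rules above resolve for a given crawler and path can be sketched in Python. This is a minimal illustration, not the platform's own evaluator. It assumes RFC 9309 semantics: a crawler uses the group that names it and falls back to the `*` group otherwise; within a group the longest matching pattern wins, and Allow wins a tie. The helpers `parse_groups`, `pattern_to_regex`, and `is_allowed` are hypothetical names, the file is abridged, and user-agent matching is simplified to an exact, case-insensitive token comparison.

```python
import re

# Abridged copy of the configuration shown above.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /*.json$

User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /
Disallow: /api/
"""


def parse_groups(text):
    """Map each lowercased user-agent token to its list of (is_allow, pattern) rules.

    Simplification: if an agent appears in several groups, only the first is kept.
    """
    groups, agents, rules, in_rules = {}, [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, rules, in_rules = [], [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), rules)
        elif key in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty pattern matches nothing
                rules.append((key == "allow", value))
    return groups


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(groups, user_agent, path):
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and allow):
                best_len, allowed = length, allow
    return allowed


groups = parse_groups(ROBOTS_TXT)
print(is_allowed(groups, "SomeBot", "/blog/post"))        # True
print(is_allowed(groups, "SomeBot", "/api/v1/items"))     # False
print(is_allowed(groups, "SomeBot", "/data/feed.json"))   # False
print(is_allowed(groups, "bingbot", "/api/v1/items"))     # False
print(is_allowed(groups, "Googlebot", "/api/v1/items"))   # True
```

Under these assumptions the last line prints True: the Googlebot group contains only `Allow: /`, so the Disallow lines in the `*` group do not carry over to it. A parser that applies the first matching rule instead of the longest one would give different answers for the `*` group, because `Allow: /` is listed first.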
Directive Summary
Allow Rules
Public-facing content pages, documentation, API docs, blog posts, and marketing pages are allowed for all crawlers.
Disallow Rules
API endpoints, admin panels, internal tools, private resources, staging environments, and raw data files are disallowed.
Crawl Controls
A Crawl-Delay of 2 seconds applies to bingbot; Googlebot has no delay. Rate limiting is enforced at the CDN edge.
Sitemap
https://robots.txt/sitemap.xml is referenced for all crawlers, keeping indexation current with published content.
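A crawler could read the crawl delay and sitemap entries summarized above with Python's standard `urllib.robotparser` module. The sketch below parses an abridged copy of the file from a string instead of fetching it; `delay_for` and `polite_fetch` are hypothetical helpers, and `fetch` stands in for whatever download function a crawler would supply. `site_maps()` requires Python 3.8 or later.

```python
import time
from urllib.robotparser import RobotFileParser

# Abridged copy of the configuration shown above.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /api/

User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /
Disallow: /api/
Crawl-Delay: 2

Sitemap: https://robots.txt/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(user_agent, default=0.0):
    """Seconds to wait between requests; `default` when the group sets no Crawl-Delay."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def polite_fetch(user_agent, paths, fetch):
    """Call the caller-supplied `fetch` on each path, pausing between requests."""
    pause = delay_for(user_agent)
    results = []
    for index, path in enumerate(paths):
        if index and pause:  # no pause before the first request
            time.sleep(pause)
        results.append(fetch(path))
    return results


print(delay_for("bingbot"))    # 2.0
print(delay_for("Googlebot"))  # 0.0
print(parser.site_maps())      # ['https://robots.txt/sitemap.xml']
```

This block only reads the delay and sitemap values. `urllib.robotparser` applies the first matching rule and does not expand `*` or `$` in paths, so its `can_fetch` results may differ from the longest-match evaluation that RFC 9309 describes.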
Configuration Status
Status: Active
User-Agents: 3
Allow Rules: 6
Disallow Rules: 9
Last Changed: 2h ago
Protocol: RFC 9309
Validation: Pass
Major Bot Status
| Bot | Organization | Status | Access |
|---|---|---|---|
| Googlebot | Google LLC | Allowed | Full access with enhanced preview rules |
| bingbot | Microsoft | Allowed | Allowed with 2s crawl delay |
| Mediapartners-Google | Google AdSense | Limited | Ads only, no indexing |
| * (others) | General crawlers | Allowed | Public pages only |
| AhrefsBot | Ahrefs | Limited | Content indexed, no API |
| SemrushBot | Semrush | Limited | Rate-limited access |
| Unknown | Unidentified | Blocked | All non-whitelisted blocked |
Frequently Asked Questions
robots.txt is a standard protocol that tells web crawlers which pages or files they can or can't request from a website. It's the first point of communication between your site and search engines, influencing indexing, crawl budget, and content visibility.
Our robots.txt is dynamically generated and updated in real time as our platform evolves. Changes are deployed through our edge network with zero downtime. You can also subscribe to our RSS feed for change notifications.
Yes! If you're a researcher, journalist, or legitimate service provider needing access to restricted content, contact us at bot@robots.txt. We review exception requests on a case-by-case basis.
Yes. Our robots.txt explicitly blocks known AI/LLM crawling user-agents from training on our content. If you're running an AI service and need permission, reach out to our team directly.
This file follows RFC 9309 (Robots Exclusion Protocol) and supports the Extended Robots.txt specification. We also use directives that RFC 9309 itself does not define, such as Crawl-Delay, Max-Image-Preview, and Sitemap.