🤖 robots.txt

robots.txt · Updated 2 hours ago · 4 directives · 3 user-agents

HTTPS · RFC 9309

    # ═══════════════════════════════════════════════════════
    # Robots.txt — Robots.txt™ Platform
    # Intelligent Content Curation Platform
    # Last updated: 2025-01-15T10:30:00Z
    # Protocol: RFC 9309 (Robots Exclusion Protocol)
    # ═══════════════════════════════════════════════════════

    # ── Default Rule: Allow All General Crawlers ──
    User-agent: *
    Allow: /
    Disallow: /api/
    Disallow: /admin/
    Disallow: /internal/
    Disallow: /private/
    Disallow: /staging/
    Disallow: /*.json$
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    # ── Google Bot ──
    # This group replaces the default group for Googlebot, so the
    # Disallow rules above do not apply to it.
    User-agent: Googlebot
    Allow: /
    # Preview limits are robots meta tag / X-Robots-Tag settings, not
    # robots.txt rules, so they are kept here as comments only.
    # Max-Image-Preview: large
    # Max-Snippet: -1
    # Max-Video-Preview: -1

    # ── Bing Bot ──
    User-agent: bingbot
    Allow: /
    Disallow: /api/
    Crawl-Delay: 2

    # ── Ad Bots (Restricted) ──
    User-agent: Mediapartners-Google
    Allow: /

    # ── Sitemap ──
    Sitemap: https://robots.txt/sitemap.xml

Directive Summary

✅ Allow Rules

Public-facing content pages, documentation, API docs, blog posts, and marketing pages are allowed for all crawlers.

🚫 Disallow Rules

API endpoints, admin panels, internal tools, private resources, staging environments, and raw data files are disallowed.
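
The Allow and Disallow rules of the default group overlap (`Allow: /` matches every path), so which one wins depends on precedence. Under RFC 9309 the most specific match, meaning the longest matching pattern, decides, and Allow wins a tie. The following is a minimal sketch of that evaluation for the `User-agent: *` group, not the platform's actual implementation; the helper names `pattern_to_regex` and `is_allowed` and the sample paths are assumptions made for illustration.

```python
import re

# Rules of the default (User-agent: *) group, copied from the file above.
DEFAULT_RULES = [
    ("allow", "/"),
    ("disallow", "/api/"),
    ("disallow", "/admin/"),
    ("disallow", "/internal/"),
    ("disallow", "/private/"),
    ("disallow", "/staging/"),
    ("disallow", "/*.json$"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules` (longest match wins)."""
    best_length, allowed = -1, True  # no matching rule at all means allowed
    for kind, pattern in rules:
        # re.match anchors at the start of the path, as robots.txt patterns do.
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            longer = len(pattern) > best_length
            tie_for_allow = len(pattern) == best_length and is_allow
            if longer or tie_for_allow:
                best_length, allowed = len(pattern), is_allow
    return allowed


if __name__ == "__main__":
    for sample in ("/blog/post", "/api/v1/users", "/data/export.json"):
        print(sample, is_allowed(DEFAULT_RULES, sample))
```

A hand-written matcher is used here because Python's standard `urllib.robotparser` evaluates rules in file order and does not interpret `*` or `$`, so with `Allow: /` listed first it would report every path as allowed.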

🤖 Crawl Controls

A Crawl-Delay of 2 seconds applies to bingbot; Googlebot has no delay. Rate limiting is enforced at the CDN edge.
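
As a rough illustration of how a crawler might honor the delay, the sketch below reads the value with Python's standard `urllib.robotparser` and spaces out request times accordingly. The `ROBOTS_TXT` excerpt is abridged from the file above; `crawl_delay_for` and `fetch_schedule` are hypothetical helpers, and real crawlers may interpret the directive differently or ignore it.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the Googlebot and bingbot groups from the file above.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /
Disallow: /api/
Crawl-Delay: 2
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-Delay in seconds for `user_agent`, or 0.0 if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else 0.0


def fetch_schedule(n_requests, delay, start=0.0):
    """Earliest request times (seconds from `start`) that respect `delay`."""
    return [start + i * delay for i in range(n_requests)]


if __name__ == "__main__":
    for agent in ("bingbot", "Googlebot"):
        delay = crawl_delay_for(ROBOTS_TXT, agent)
        print(agent, delay, fetch_schedule(3, delay))
```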

🗺️ Sitemap

https://robots.txt/sitemap.xml is referenced for all crawlers, keeping indexation current with published content.
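
Sitemap lines sit outside any user-agent group, which is why the reference applies to all crawlers. A minimal sketch of how a client could collect them is shown below; `sitemap_urls` is a hypothetical helper, and the sample text is abridged from the file above.

```python
from typing import List

# Abridged sample: one group plus the Sitemap line from the file above.
SAMPLE = """\
User-agent: *
Allow: /

# ── Sitemap ──
Sitemap: https://robots.txt/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> List[str]:
    """Collect the values of all Sitemap lines, regardless of group."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        key, sep, value = line.partition(":")     # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    print(sitemap_urls(SAMPLE))
```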

📊 Configuration Status

Status: Active
User-Agents:
Allow Rules:
Disallow Rules:
Last Changed: 2h ago
Protocol: RFC 9309
Validation: ✓ Pass

🤖 Major Bot Status

Bot	Organization	Status	Access
Googlebot	Google LLC	Allowed	Full access with enhanced preview rules
bingbot	Microsoft	Allowed	Allowed with 2s crawl delay
Mediapartners-Google	Google AdSense	Limited	Ads only — no indexing
* (others)	General crawlers	Allowed	Public pages only
AhrefsBot	Ahrefs	Limited	Content indexed, no API
SemrushBot	Semrush	Limited	Rate-limited access
Unknown	Unidentified	Blocked	All non-whitelisted blocked
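
The table can be read against the file by asking which group each bot falls into. Under RFC 9309 a crawler follows the group whose user-agent matches its product token and falls back to `*` only when no group matches; rules from `*` are not merged into a named group. The sketch below illustrates that lookup under simplifying assumptions (exact, case-insensitive token match); `GROUPS` is transcribed from the file above and `select_group` is a hypothetical helper.

```python
# Groups transcribed from the file above, keyed by lowercase product token.
GROUPS = {
    "*": [
        ("allow", "/"),
        ("disallow", "/api/"),
        ("disallow", "/admin/"),
        ("disallow", "/internal/"),
        ("disallow", "/private/"),
        ("disallow", "/staging/"),
        ("disallow", "/*.json$"),
        ("disallow", "/wp-admin/"),
        ("disallow", "/wp-includes/"),
    ],
    "googlebot": [("allow", "/")],
    "bingbot": [("allow", "/"), ("disallow", "/api/")],
    "mediapartners-google": [("allow", "/")],
}


def select_group(groups, product_token):
    """Return the rules for `product_token`, falling back to the '*' group."""
    return groups.get(product_token.lower(), groups.get("*", []))


if __name__ == "__main__":
    for bot in ("Googlebot", "bingbot", "AhrefsBot", "SemrushBot"):
        rules = select_group(GROUPS, bot)
        disallowed = [pattern for kind, pattern in rules if kind == "disallow"]
        print(bot, "disallowed patterns:", disallowed)
```

In this reading AhrefsBot and SemrushBot have no group of their own and follow the default rules, so any rate limiting or blocking listed in the table would have to be enforced somewhere other than robots.txt, for example at the CDN edge.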

❓ Frequently Asked Questions

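robots.txt is a plain-text file, defined by the Robots Exclusion Protocol (RFC 9309), that tells web crawlers which URLs they may or may not request from a website. It is usually the first file a crawler fetches from your site, and it influences crawling, crawl budget, and content visibility. The rules are advisory: they depend on crawlers choosing to honor them and are not an access control mechanism.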

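Our robots.txt is dynamically generated and updated in real time as our platform evolves. Changes are deployed instantly via our edge network with zero downtime, although crawlers may keep using a cached copy (RFC 9309 recommends no longer than 24 hours) before they pick up a change. You can also subscribe to our RSS feed for change notifications.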

Exceptions are available. If you're a researcher, journalist, or legitimate service provider who needs access to restricted content, contact us at bot@robots.txt. We review exception requests on a case-by-case basis.

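AI crawlers are restricted. Our robots.txt explicitly disallows known AI/LLM crawler user-agents from collecting our content for training. Because robots.txt is advisory, this relies on those crawlers honoring the rules. If you're running an AI service and need permission, reach out to our team directly.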
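
The file shown on this page does not list any AI crawler groups, so the following is only a sketch of what such a rule could look like and how it can be checked with Python's standard `urllib.robotparser`. The tokens `GPTBot` and `CCBot` are used purely as examples of publicly documented crawler names; whether the platform targets these particular agents is an assumption, as is the helper `may_fetch`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group shared by two example AI crawler tokens.
EXAMPLE_RULES = """\
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""


def may_fetch(robots_txt, user_agent, path):
    """Return True if `user_agent` may request `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "bingbot"):
        print(agent, may_fetch(EXAMPLE_RULES, agent, "/blog/post"))
```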

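This file follows RFC 9309, the Robots Exclusion Protocol. It also uses extensions that RFC 9309 does not define, such as Crawl-Delay and Sitemap, which individual crawlers may or may not honor. Max-Image-Preview, Max-Snippet, and Max-Video-Preview are robots meta tag and X-Robots-Tag settings rather than robots.txt rules, so crawlers ignore them when they appear in this file.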
