Welcome to Robots.txt Documentation

Master intelligent content curation, crawler management, and SEO with our comprehensive platform. This guide covers everything from basic setup to advanced crawl orchestration.

Overview

Robots.txt is an enterprise-grade platform that automates and optimizes how web crawlers interact with your digital assets. Instead of manually maintaining static robots.txt files across dozens of environments, our AI-driven engine dynamically generates, validates, and deploys crawl directives while protecting sensitive routes and maximizing indexing efficiency.

ℹ️
Tip: If you are new to the platform, start with the Quick Start guide to deploy your first crawl policy in under 5 minutes.

Quick Start

Get up and running in three simple steps:

1. Install the CLI

bash
# Install via npm
npm install -g @robots-txt/cli

# Or via Homebrew (macOS/Linux)
brew tap robots-txt/cli
brew install robots-txt

2. Initialize your project

bash
robots init my-project
# Creates robots.config.js with best-practice defaults

3. Deploy and monitor

bash
robots deploy --env production
# Live dashboard: https://app.robots.txt/dashboard/my-project

✅
Success! Your crawl policy is now live and actively optimized by our AI engine.

Core Directives

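The platform supports the standard Robots Exclusion Protocol directives (RFC 9309), plus several non-standard extensions for advanced control:

  • User-agent - Target specific crawlers, or use * for all agents
  • Allow / Disallow - Path-based access control with pattern matching (the standard defines the * and $ wildcards, not full regular expressions)
  • Crawl-delay - Throttle request rates (non-standard; honored by some crawlers such as Bingbot and ignored by Googlebot)
  • Max-image-preview - Limit the size of image previews shown in search results (normally set through a robots meta tag or X-Robots-Tag header)
  • Sitemap - Absolute URLs of XML sitemaps for crawler discovery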
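As a rough illustration of how Allow and Disallow rules interact, the sketch below follows the precedence described in RFC 9309: the longest matching pattern wins, and Allow wins a tie. The helper names `patternToRegExp` and `isAllowed` are hypothetical and are not part of the platform's CLI or API; the platform's own matcher may behave differently.

```javascript
// Sketch of Allow/Disallow evaluation following RFC 9309 precedence.
// patternToRegExp and isAllowed are hypothetical helpers for illustration only.

// Convert a robots.txt path pattern into a regular expression anchored at the start.
// "*" matches any run of characters; a trailing "$" anchors the end of the path.
function patternToRegExp(pattern) {
  const anchoredEnd = pattern.endsWith("$");
  const body = anchoredEnd ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchoredEnd ? "$" : ""));
}

// rules: array of { type: "allow" | "disallow", path: string }
// Returns true when urlPath may be crawled under those rules.
function isAllowed(rules, urlPath) {
  let best = null;
  for (const rule of rules) {
    if (rule.path === "") continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.path).test(urlPath)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    const tieGoesToAllow =
      best !== null &&
      rule.path.length === best.path.length &&
      rule.type === "allow";
    if (longer || tieGoesToAllow) best = rule;
  }
  // No matching rule means the path is allowed by default.
  return best === null || best.type === "allow";
}

const rules = [
  { type: "disallow", path: "/private/" },
  { type: "allow", path: "/private/press/" },
  { type: "disallow", path: "/*.pdf$" },
];

console.log(isAllowed(rules, "/private/report"));       // false
console.log(isAllowed(rules, "/private/press/kit"));    // true
console.log(isAllowed(rules, "/docs/file.pdf"));        // false
console.log(isAllowed(rules, "/docs/file.pdf?page=2")); // true
```

Pattern length is used as the measure of specificity here, so a narrow Allow can carve an exception out of a broader Disallow.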

Configuration Schema

javascript
module.exports = {
  userAgent: ["Googlebot", "Bingbot", "*"],
  rules: {
    "/api/*": "deny",
    "/private/*": "deny",
    "/public/*": "allow",
    "/blog/*": { allow: true, maxAge: 86400 }
  },
  crawlDelay: 2,
  sitemaps: ["https://example.com/sitemap.xml"],
  aiOptimization: true // Auto-tunes rules weekly
};
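To make the schema more concrete, here is a minimal sketch of how a config like the one above could be rendered as a plain robots.txt file. This is an assumption about the mapping, not the platform's actual generator: `renderRobotsTxt` is a hypothetical helper, "deny" is assumed to map to Disallow, and `maxAge` and `aiOptimization` are skipped because they have no robots.txt equivalent.

```javascript
// Hypothetical renderer: turns a config object shaped like the schema above
// into robots.txt text. The real platform output may differ.
const exampleConfig = {
  userAgent: ["Googlebot", "Bingbot", "*"],
  rules: {
    "/api/*": "deny",
    "/private/*": "deny",
    "/public/*": "allow",
    "/blog/*": { allow: true, maxAge: 86400 },
  },
  crawlDelay: 2,
  sitemaps: ["https://example.com/sitemap.xml"],
  aiOptimization: true,
};

function renderRobotsTxt(config) {
  // One group that applies to every listed user agent.
  const lines = config.userAgent.map((agent) => `User-agent: ${agent}`);

  for (const [path, rule] of Object.entries(config.rules)) {
    // A rule is either the string "allow"/"deny" or an object with an `allow` flag.
    const allowed =
      typeof rule === "string" ? rule === "allow" : rule.allow === true;
    lines.push(`${allowed ? "Allow" : "Disallow"}: ${path}`);
  }

  if (typeof config.crawlDelay === "number") {
    lines.push(`Crawl-delay: ${config.crawlDelay}`);
  }

  // Sitemap lines are independent of any user-agent group.
  lines.push("");
  for (const url of config.sitemaps ?? []) {
    lines.push(`Sitemap: ${url}`);
  }
  return lines.join("\n") + "\n";
}

console.log(renderRobotsTxt(exampleConfig));
```

Grouping several User-agent lines above a shared rule set keeps the output short; emitting one group per agent would be an equally valid choice.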

API Reference

Manage crawl policies programmatically via our REST API. All endpoints require Bearer token authentication.

Method  Endpoint              Description                     Auth
GET     /v2/policies          List all active crawl policies  Required
POST    /v2/policies          Create or update a policy       Required
GET     /v2/analytics/crawls  Fetch real-time crawl metrics   Required
DELETE  /v2/policies/:id      Disable and archive a policy    Admin
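The following sketch shows one way to call the list endpoint with a Bearer token. The base URL `https://api.example.com`, the JSON response body, and the helper names `buildRequest` and `listPolicies` are all assumptions made for illustration; substitute the values from your own account.

```javascript
// Sketch of calling GET /v2/policies with Bearer token authentication.
// ASSUMPTIONS: the base URL and the JSON response shape are placeholders,
// and buildRequest / listPolicies are hypothetical helper names.

// Build the URL and fetch options for an authenticated request.
function buildRequest(baseUrl, token, method, path) {
  return {
    url: new URL(path, baseUrl).toString(),
    init: {
      method,
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: "application/json",
      },
    },
  };
}

// fetchImpl is injectable so the function can be exercised without a network.
async function listPolicies(baseUrl, token, fetchImpl = globalThis.fetch) {
  const { url, init } = buildRequest(baseUrl, token, "GET", "/v2/policies");
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`GET /v2/policies failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Example usage (requires a runtime with a global fetch, such as Node 18+):
// listPolicies("https://api.example.com", process.env.ROBOTS_API_TOKEN)
//   .then((policies) => console.log(policies))
//   .catch((err) => console.error(err.message));
```

Keeping request construction separate from the network call makes it easy to reuse the same headers for the other endpoints in the table.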
⚠️
Rate Limits: API requests are capped at 120 requests/minute per token. Exceeding this returns 429 Too Many Requests. Implement exponential backoff in production workflows.
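A minimal sketch of the exponential backoff recommended above might look like the following. The helper names, the 500 ms base delay, the 30 s cap, and the retry count are illustrative assumptions. Whether this API sends a `Retry-After` header is not documented here, so the code treats that header as optional.

```javascript
// Sketch of retrying on 429 Too Many Requests with exponential backoff.
// backoffDelayMs and fetchWithBackoff are hypothetical helpers; the delay
// values are illustrative defaults, not documented platform settings.

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function fetchWithBackoff(url, init, options = {}) {
  const {
    maxRetries = 5,
    fetchImpl = globalThis.fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, init);
    // Return anything that is not a 429, or the last 429 once retries run out.
    if (response.status !== 429 || attempt >= maxRetries) return response;

    // Use Retry-After (in seconds) when the server provides a numeric value;
    // otherwise fall back to the computed exponential delay.
    const retryAfter = Number(response.headers?.get?.("Retry-After"));
    const delay = retryAfter > 0 ? retryAfter * 1000 : backoffDelayMs(attempt);
    await sleep(delay);
  }
}

// Example usage (requires a runtime with a global fetch, such as Node 18+):
// fetchWithBackoff("https://api.example.com/v2/policies", {
//   headers: { Authorization: `Bearer ${process.env.ROBOTS_API_TOKEN}` },
// }).then((res) => console.log(res.status));
```

Production clients often add random jitter to each delay so that many clients do not retry at the same moment; it is left out here to keep the sketch deterministic.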
"}{