Welcome to the Robots.txt Documentation

Master intelligent content curation, crawler management, and SEO with our comprehensive platform. This guide covers everything from basic setup to advanced crawl orchestration.

Overview

Robots.txt is an enterprise-grade platform that automates and optimizes how web crawlers interact with your digital assets. Instead of manually maintaining static robots.txt files across dozens of environments, our AI-driven engine dynamically generates, validates, and deploys crawl directives while protecting sensitive routes and maximizing indexing efficiency.

ℹ️

Tip: If you are new to the platform, start with the Quick Start guide to deploy your first crawl policy in under 5 minutes.

Quick Start

Get up and running in three simple steps:

1. Install the CLI

bash

# Install via npm
npm install -g @robots-txt/cli

# Or via Homebrew (macOS/Linux)
brew tap robots-txt/cli
brew install robots-txt

2. Initialize your project

bash

robots init my-project
# Creates robots.config.js with best-practice defaults

3. Deploy and monitor

bash

robots deploy --env production
# Live dashboard: https://app.robots.txt/dashboard/my-project

✅

Success! Your crawl policy is now live and actively optimized by our AI engine.

Core Directives

The platform supports all standard Robots Exclusion Protocol directives, plus several proprietary extensions for advanced control:

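User-agent - Target specific crawlers, or use * to match all agents
Allow / Disallow - Path-based access control with pattern matching (standard crawlers understand only the * wildcard and the $ end-of-path anchor, not full regular expressions)
Crawl-delay - Throttle request rates (non-standard extension: not part of RFC 9309, and ignored by some crawlers, including Googlebot)
Max-image-preview - Limit the size of image previews shown in search results (conventionally set through a robots meta tag or X-Robots-Tag header rather than in robots.txt)
Sitemap - URLs of XML sitemaps for crawler discovery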
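
To make the Allow / Disallow semantics concrete, here is a minimal sketch of how a crawler that follows RFC 9309 might evaluate a path against a group of rules. The function names `patternToRegExp` and `isAllowed` are illustrative assumptions and are not part of the platform's CLI or API.

```javascript
// Illustrative sketch only: these helpers are hypothetical, not platform APIs.

// Convert a robots.txt path pattern into a RegExp.
// RFC 9309 defines two special characters:
//   *  matches any sequence of characters
//   $  anchors the pattern to the end of the path (when it is the last character)
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Decide whether a path may be crawled.
// rules: array of { type: "allow" | "disallow", pattern: string }
// The longest matching pattern wins; on a tie, "allow" wins.
// A path that matches no rule is allowed.
function isAllowed(path, rules) {
  let best = null;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty pattern restricts nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieAllow =
      best !== null &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieAllow) best = rule;
  }
  return best === null || best.type === "allow";
}

const exampleRules = [
  { type: "disallow", pattern: "/private/" },
  { type: "allow", pattern: "/private/press/" },
  { type: "disallow", pattern: "/*.pdf$" },
];
```

With these rules, `/private/press/` pages stay crawlable because the Allow pattern is longer than the Disallow pattern that also matches, while any path ending in `.pdf` is blocked.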

Configuration Schema

javascript

module.exports = {
  userAgent: ["Googlebot", "Bingbot", "*"],
  rules: {
    "/api/*": "deny",
    "/private/*": "deny",
    "/public/*": "allow",
    "/blog/*": { allow: true, maxAge: 86400 }
  },
  crawlDelay: 2,
  sitemaps: ["https://example.com/sitemap.xml"],
  aiOptimization: true // Auto-tunes rules weekly
};
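
As a rough illustration of what a configuration like the one above could translate to, the sketch below renders a config object into robots.txt text. The mapping is an assumption made for illustration ("deny" becomes Disallow, "allow" or `{ allow: true }` becomes Allow, and `maxAge` and `aiOptimization` are ignored); the platform's actual generator may behave differently, and `renderRobotsTxt` is a hypothetical name.

```javascript
// Hypothetical helper: one possible mapping from the config shape to robots.txt.
function renderRobotsTxt(config) {
  // All listed user agents share a single group of rules.
  const lines = config.userAgent.map((agent) => `User-agent: ${agent}`);

  for (const [path, rule] of Object.entries(config.rules)) {
    // Assumption: string rules are "allow" or "deny"; object rules carry an `allow` flag.
    const allowed = typeof rule === "string" ? rule === "allow" : rule.allow === true;
    lines.push(`${allowed ? "Allow" : "Disallow"}: ${path}`);
  }

  if (typeof config.crawlDelay === "number") {
    lines.push(`Crawl-delay: ${config.crawlDelay}`);
  }

  lines.push(""); // blank line separates the group from the sitemap records
  for (const url of config.sitemaps ?? []) {
    lines.push(`Sitemap: ${url}`);
  }
  return lines.join("\n") + "\n";
}

const sampleConfig = {
  userAgent: ["Googlebot", "*"],
  rules: { "/api/*": "deny", "/blog/*": { allow: true, maxAge: 86400 } },
  crawlDelay: 2,
  sitemaps: ["https://example.com/sitemap.xml"],
};
console.log(renderRobotsTxt(sampleConfig));
```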

API Reference

Manage crawl policies programmatically via our REST API. All endpoints require Bearer token authentication.

Method	Endpoint	Description	Auth
`GET`	`/v2/policies`	List all active crawl policies	Required
`POST`	`/v2/policies`	Create or update a policy	Required
`GET`	`/v2/analytics/crawls`	Fetch real-time crawl metrics	Required
`DELETE`	`/v2/policies/:id`	Disable and archive a policy	Admin
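
The following sketch shows how a call to `GET /v2/policies` with a Bearer token might look. The API's base URL and response shape are not given in this guide, so `baseUrl` is a caller-supplied assumption and the parsed JSON is returned untouched; `listPolicies` is an illustrative name, not an official client function.

```javascript
// Hypothetical client helper: the base URL and response format are assumptions.
async function listPolicies({ baseUrl, token, fetchImpl = globalThis.fetch }) {
  const response = await fetchImpl(`${baseUrl}/v2/policies`, {
    method: "GET",
    headers: {
      Authorization: `Bearer ${token}`, // all endpoints require a Bearer token
      Accept: "application/json",
    },
  });
  if (!response.ok) {
    throw new Error(`GET /v2/policies failed with status ${response.status}`);
  }
  return response.json();
}
```

Passing `fetchImpl` explicitly keeps the helper testable without network access; in Node 18+ and in browsers the global `fetch` is used by default.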

⚠️

Rate Limits: API requests are capped at 120 per minute per token. Requests beyond this limit receive 429 Too Many Requests. Implement exponential backoff in production workflows.
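
A minimal sketch of exponential backoff for 429 responses might look like the following. The retry count and base delay are illustrative defaults rather than values recommended by the platform, and `requestWithBackoff` is a hypothetical helper that wraps any function returning a response with a `status` field.

```javascript
// Hypothetical helper: retry counts and delays here are illustrative assumptions.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function requestWithBackoff(
  doRequest,
  { maxRetries = 5, baseDelayMs = 500, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    // Return anything that is not rate limited, or give up after maxRetries retries.
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Double the wait after each rate-limited attempt: 500 ms, 1000 ms, 2000 ms, ...
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

In production it is common to add random jitter to each delay so that many clients do not retry in lockstep; that is left out here to keep the sketch deterministic.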