API Documentation

Welcome to the Robots.txt API. Our RESTful API allows you to programmatically manage crawl rules, generate sitemaps, monitor crawler activity, and optimize your website's discoverability by search engines.

All API requests are made over HTTPS to https://api.robotstxt.io/v2. Data is sent and received as JSON. Every request must include an API key in the Authorization header.

ℹ️
API Versioning

The current stable version is v2. Each version is supported for at least 12 months after a new version is released. Include the version in your base URL.

Base URL: https://api.robotstxt.io/v2

Authentication

The Robots.txt API uses API keys for authentication. You can generate and manage your API keys from the Dashboard Settings.

Supported methods: API Key (Bearer Token) and OAuth 2.0 (Service Accounts).

Bearer Token Authentication

Include your API key in the Authorization header as a Bearer token:

curl https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json"
⚠️
Keep Your Keys Secure

Never expose your secret API keys in client-side code, public repositories, or browser-accessible environments. Use environment variables or a secrets manager.
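
As a minimal sketch of the advice above, the following Python snippet reads the key from an environment variable and attaches it as a Bearer token. The variable name ROBOTSTXT_API_KEY and the helper build_request are assumptions made for illustration, not part of the API. The request is only constructed here, not sent.

```python
import os
import urllib.request

BASE_URL = "https://api.robotstxt.io/v2"


def build_request(path: str, env_var: str = "ROBOTSTXT_API_KEY") -> urllib.request.Request:
    """Build an authenticated request object without sending it.

    The environment variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. Keeping the key out of the source file means it never lands in version control.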

Rate Limits

API requests are rate-limited based on your plan tier. When you exceed the limit, you'll receive a 429 Too Many Requests response.

Plan | Requests / Minute | Requests / Day
Free | 60 | 1,000
Pro | 300 | 50,000
Enterprise | 2,000 | Unlimited

Rate limit headers are included in every response:

X-RateLimit-Limit: 300
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1709251200
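
One way to use these headers is to compute how long to pause before the next request. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but the text does not state. The function name seconds_until_reset is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, based on rate limit headers.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    Header names are looked up exactly as written, so pass a mapping
    that uses the same capitalization.
    """
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0  # quota left in the current window, no need to wait
    current = time.time() if now is None else now
    reset_at = float(headers["X-RateLimit-Reset"])
    return max(0.0, reset_at - current)
```

A client that receives a 429 response could sleep for the returned number of seconds and then retry.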

Error Handling

The API uses standard HTTP status codes and returns detailed error objects with a machine-readable code and a human-readable message.

Example error response (400 Bad Request):
{
  "error": {
    "code": "invalid_request",
    "message": "The user_agent field is required and must be a valid bot identifier.",
    "type": "validation_error",
    "details": [
      {
        "field": "user_agent",
        "issue": "missing_required_field"
      }
    ],
    "request_id": "req_8f2d1a4c9b7e3f6a0d5c8e2b"
  }
}
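
A client can turn this error object into an exception so that callers see the machine-readable code and the request ID. The sketch below is one possible shape. ApiError and raise_for_error are hypothetical names, and the fallback for non-JSON bodies is an assumption about how to handle unexpected responses.

```python
import json
from typing import Any, Dict, List


class ApiError(Exception):
    """Hypothetical exception type wrapping the documented error object."""

    def __init__(self, status: int, error: Dict[str, Any]) -> None:
        self.status = status
        self.code = error.get("code", "unknown")
        self.type = error.get("type")
        self.details: List[Dict[str, str]] = error.get("details", [])
        self.request_id = error.get("request_id")
        super().__init__(f"{status} {self.code}: {error.get('message', '')}")


def raise_for_error(status: int, body: str) -> None:
    """Raise ApiError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        error = {"message": body}  # non-JSON or unexpected body
    if not isinstance(error, dict):
        error = {"message": str(error)}
    raise ApiError(status, error)
```

Logging request_id alongside the code makes it easier to reference a specific failed call when contacting support.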

Crawl Rules

Crawl rules define how different user-agents (search engine bots, crawlers, scrapers) should interact with your website. Each rule specifies which paths to allow or disallow for a given agent.

Rule Object

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    {
      "type": "allow",
      "path": "/blog/"
    },
    {
      "type": "disallow",
      "path": "/api/"
    },
    {
      "type": "disallow",
      "path": "/admin/"
    }
  ],
  "crawl_delay": 2,
  "sitemap": "https://example.com/sitemap.xml",
  "active": true,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-02-28T14:22:00Z"
}
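
To make the mapping from a rule object to robots.txt concrete, the sketch below renders one rule as a standard robots.txt group. It illustrates the general file format only. The exact output produced by the service is not documented here, and render_rule is a hypothetical helper.

```python
from typing import Any, Dict, List


def render_rule(rule: Dict[str, Any]) -> str:
    """Render one rule object as a robots.txt group (illustrative only)."""
    if not rule.get("active", True):
        return ""  # inactive rules are left out of the generated file
    lines: List[str] = [f"User-agent: {rule['user_agent']}"]
    for directive in rule["directives"]:
        keyword = "Allow" if directive["type"] == "allow" else "Disallow"
        lines.append(f"{keyword}: {directive['path']}")
    if "crawl_delay" in rule:
        lines.append(f"Crawl-delay: {rule['crawl_delay']}")
    if "sitemap" in rule:
        lines.append(f"Sitemap: {rule['sitemap']}")
    return "\n".join(lines) + "\n"
```

For the rule object above, this would produce a group that starts with "User-agent: *", followed by one Allow line, two Disallow lines, a Crawl-delay line, and a Sitemap line.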

GET /rules List all crawl rules

Returns a paginated list of all crawl rules configured for your account. By default, rules are sorted by creation date in descending order.

Query Parameters

Parameter | Required | Type | Location | Description
page | optional | integer | query | Page number for pagination. Defaults to 1.
per_page | optional | integer | query | Number of results per page. Max 100. Defaults to 20.
user_agent | optional | string | query | Filter rules by user-agent string.
active | optional | boolean | query | Filter by active/inactive status.
sort | optional | string | query | Sort field: created_at, updated_at, user_agent.
order | optional | string | query | Sort order: asc or desc. Defaults to desc.

Example Request

curl "https://api.robotstxt.io/v2/rules?page=1&per_page=10" \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 200

{
  "data": [
    {
      "id": "rule_3f7a2c1d9e6b",
      "user_agent": "*",
      "directives": [...],
      "active": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "rule_8e2b1a4f7c9d",
      "user_agent": "Googlebot",
      "directives": [...],
      "active": true,
      "created_at": "2024-01-10T08:15:00Z"
    }
  ],
  "pagination": {
    "current_page": 1,
    "per_page": 10,
    "total_pages": 3,
    "total_count": 28,
    "has_next": true,
    "has_prev": false
  }
}
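
The pagination object can drive a simple loop that walks every page until has_next is false. In the sketch below, fetch is a caller-supplied function (an assumption of this example) that performs the authenticated GET and returns the decoded JSON body. rules_url and iter_rules are hypothetical helper names.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

BASE_URL = "https://api.robotstxt.io/v2"


def rules_url(page: int, per_page: int = 20) -> str:
    """Build the list URL; urlencode joins and escapes the query parameters."""
    return f"{BASE_URL}/rules?{urlencode({'page': page, 'per_page': per_page})}"


def iter_rules(
    fetch: Callable[[str], Dict[str, Any]], per_page: int = 20
) -> Iterator[Dict[str, Any]]:
    """Yield every rule across all pages.

    `fetch` takes a URL and returns the parsed response body.
    """
    page = 1
    while True:
        body = fetch(rules_url(page, per_page))
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1
```

Injecting fetch keeps the paging logic independent of any particular HTTP client and makes it easy to exercise with canned responses.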

POST /rules Create a new crawl rule

Creates a new crawl rule for a specified user-agent. The rule is immediately applied to your generated robots.txt file and propagated to the edge network.

Request Body

Parameter | Required | Type | Location | Description
user_agent | required | string | body | Target user-agent. Use * for all bots, or specific agents like Googlebot, Bingbot, facebookexternalhit.
directives | required | array | body | Array of allow/disallow directive objects. Each object requires type ("allow" or "disallow") and path (string starting with /).
crawl_delay | optional | integer | body | Minimum seconds between requests. Range: 1–60. Note: Googlebot ignores this directive.
sitemap | optional | string | body | Absolute URL to your sitemap.xml. If not provided, the system uses your default sitemap.
active | optional | boolean | body | Whether the rule is active. Defaults to true. Inactive rules are excluded from the generated robots.txt.
comment | optional | string | body | Optional comment included in the generated file for documentation purposes.

Example Request

curl -X POST https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/blog/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" },
      { "type": "disallow", "path": "/private/" }
    ],
    "crawl_delay": 2,
    "comment": "Default rules for all crawlers"
  }'

Example Response 201

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    { "type": "allow", "path": "/blog/" },
    { "type": "disallow", "path": "/api/" },
    { "type": "disallow", "path": "/admin/" },
    { "type": "disallow", "path": "/private/" }
  ],
  "crawl_delay": 2,
  "comment": "Default rules for all crawlers",
  "active": true,
  "created_at": "2024-03-15T12:00:00Z",
  "updated_at": "2024-03-15T12:00:00Z"
}
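
Checking a request body against the documented constraints before sending it can avoid a round trip that ends in a 400 response. The sketch below covers only the rules stated in the Request Body table. The server may enforce more, and validate_rule is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_rule(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a POST /rules body (empty if none)."""
    problems: List[str] = []
    if not payload.get("user_agent"):
        problems.append("user_agent is required")
    directives = payload.get("directives")
    if not isinstance(directives, list):
        problems.append("directives is required and must be an array")
        directives = []
    for index, directive in enumerate(directives):
        if directive.get("type") not in ("allow", "disallow"):
            problems.append(f"directives[{index}].type must be allow or disallow")
        if not str(directive.get("path", "")).startswith("/"):
            problems.append(f"directives[{index}].path must start with /")
    delay = payload.get("crawl_delay")
    if delay is not None:
        is_int = isinstance(delay, int) and not isinstance(delay, bool)
        if not (is_int and 1 <= delay <= 60):
            problems.append("crawl_delay must be an integer from 1 to 60")
    return problems
```

An empty list means the body passes these local checks; the API response remains the authority on whether the rule is accepted.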

GET /rules/{rule_id} Get a single rule

Retrieves a specific crawl rule by its unique identifier.

Path Parameters

Parameter | Required | Type | Location | Description
rule_id | required | string | path | The unique identifier of the rule. Format: rule_ prefix.

Example Request

curl https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 200

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    { "type": "allow", "path": "/blog/" },
    { "type": "disallow", "path": "/api/" },
    { "type": "disallow", "path": "/admin/" }
  ],
  "crawl_delay": 2,
  "active": true,
  "created_at": "2024-03-15T12:00:00Z",
  "updated_at": "2024-03-15T12:00:00Z"
}

PUT /rules/{rule_id} Update a crawl rule

Replaces an existing crawl rule in full. Include every optional field you want to keep; any field you omit reverts to its default. For partial updates, use PATCH.

Path Parameters

Parameter | Required | Type | Location | Description
rule_id | required | string | path | The unique identifier of the rule to update.

Example Request

curl -X PUT https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" }
    ],
    "crawl_delay": 5
  }'
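
Because a PUT resets any field that is left out, a safe pattern is to fetch the current rule with GET /rules/{rule_id}, merge the changes into it, and send the result. The sketch below shows only the merge step. Treating id, created_at, and updated_at as read-only is an assumption, and build_put_body is a hypothetical helper.

```python
from typing import Any, Dict

# Server-managed fields that should not be sent back in a PUT body
# (treating them as read-only is an assumption of this sketch).
READ_ONLY_FIELDS = ("id", "created_at", "updated_at")


def build_put_body(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Merge changes into the current rule so no optional field is dropped."""
    body = {key: value for key, value in current.items() if key not in READ_ONLY_FIELDS}
    body.update(changes)
    return body
```

With this approach, changing crawl_delay alone still carries the existing sitemap, comment, and active values through to the updated rule.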

DELETE /rules/{rule_id} Delete a crawl rule

Permanently deletes a crawl rule. This action cannot be undone. The rule is immediately removed from the generated robots.txt.

Example Request

curl -X DELETE https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 204

// No content: successful deletion

Sitemap Management

Automatically generate and manage sitemaps for your website. Sitemaps tell search engines which pages are available for crawling and when they were last modified.

GET /sitemaps List all sitemaps

Returns all sitemaps associated with your account, including their status, last generation time, and URL.

Example Response 200

{
  "data": [
    {
      "id": "sm_7c9d3f2a1b4e",
      "name": "main-sitemap",
      "url": "https://example.com/sitemap.xml",
      "urls_count": 1243,
      "status": "active",
      "last_generated": "2024-03-15T06:00:00Z",
      "auto_generate": true,
      "created_at": "2024-01-01T00:00:00Z"
    }
  ]
}

POST /sitemaps Generate a new sitemap

Creates a new sitemap by crawling your website or importing a URL list. The system discovers all public URLs and builds a valid sitemap.xml with proper lastmod, changefreq, and priority values.

Request Body

Parameter | Required | Type | Location | Description
name | required | string | body | A descriptive name for the sitemap (e.g., main-sitemap).
source_url | required | string | body | The root URL to crawl. The system will discover all linked pages.
max_depth | optional | integer | body | Maximum link depth to follow. Defaults to 10.
auto_generate | optional | boolean | body | If true, the sitemap is regenerated daily. Defaults to true.
exclude_patterns | optional | array | body | Glob patterns to exclude from the sitemap (e.g., ["/admin/*", "*.pdf"]).
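
To preview which paths a set of exclude_patterns would drop, the patterns can be tried locally. The sketch below uses Python's fnmatch rules as a stand-in. The service's own glob semantics are not specified here and may differ, for example in whether * matches across /. filter_excluded is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_excluded(paths: Iterable[str], exclude_patterns: Iterable[str]) -> List[str]:
    """Return the paths that match none of the glob patterns."""
    patterns = list(exclude_patterns)
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

With the example patterns ["/admin/*", "*.pdf"], a path such as /admin/users or /files/report.pdf is filtered out, while /blog/post-1 is kept.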

POST /sitemaps/{sitemap_id}/submit Submit sitemap to search engines

Submits the specified sitemap to every search engine listed in the engines field, such as Google Search Console and Bing Webmaster Tools, in a single request.

Request Body

Parameter | Required | Type | Location | Description
engines | required | array | body | List of engines to submit to: ["google", "bing", "yandex", "baidu"].

Crawl Analytics

Access detailed analytics about how search engine bots and crawlers interact with your website. Monitor crawl frequency, identify problematic bots, and track rule compliance.

GET /analytics/overview Get crawl overview

Returns a high-level summary of crawl activity including total requests, unique bots, blocked requests, and violations over a time period.

Query Parameters

Parameter | Required | Type | Location | Description
from | required | string | query | Start date in ISO 8601 format (e.g., 2024-01-01T00:00:00Z).
to | optional | string | query | End date in ISO 8601 format. Defaults to now.
granularity | optional | string | query | Data granularity: hour, day, week, month. Defaults to day.

Example Response 200

{
  "period": {
    "from": "2024-03-01T00:00:00Z",
    "to": "2024-03-15T23:59:59Z"
  },
  "summary": {
    "total_requests": 48723,
    "unique_bots": 14,
    "blocked_requests": 12405,
    "violations": 37,
    "avg_response_time_ms": 23
  },
  "top_bots": [
    { "agent": "Googlebot", "requests": 23410 },
    { "agent": "bingbot", "requests": 8921 },
    { "agent": "YandexBot", "requests": 4532 }
  ]
}
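
The sketch below shows one way to build the overview query with correctly formatted timestamps and to derive a simple metric from the summary object. The helper names iso_utc, overview_url, and blocked_share are hypothetical, and the blocked share is an illustrative calculation rather than a value the API returns.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.robotstxt.io/v2"


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC, e.g. 2024-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def overview_url(
    start: datetime, end: Optional[datetime] = None, granularity: str = "day"
) -> str:
    """Build the GET /analytics/overview URL with encoded query parameters."""
    params = {"from": iso_utc(start), "granularity": granularity}
    if end is not None:
        params["to"] = iso_utc(end)
    return f"{BASE_URL}/analytics/overview?{urlencode(params)}"


def blocked_share(overview: Dict[str, Any]) -> float:
    """Fraction of requests that were blocked, from the summary object."""
    summary = overview["summary"]
    total = summary["total_requests"]
    return summary["blocked_requests"] / total if total else 0.0
```

For the example response above, 12405 blocked requests out of 48723 total works out to roughly a quarter of all crawler traffic.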

GET /analytics/crawlers Crawler activity log

Returns a detailed log of individual crawler requests, including the bot's user-agent, requested path, rule outcome (allowed, blocked, or violated), and response time.

Query Parameters

Parameter | Required | Type | Location | Description
from | required | string | query | Start timestamp in ISO 8601 format.
to | optional | string | query | End timestamp. Defaults to now.
agent | optional | string | query | Filter by user-agent string.
outcome | optional | string | query | Filter by outcome: allowed, blocked, violated.
limit | optional | integer | query | Max results to return. Defaults to 100, max 1000.

GET /analytics/violations Rule violations log

Returns a list of crawl violations: instances where a bot accessed a path that was explicitly disallowed by your rules.

⚠️
What counts as a violation?

A violation is recorded when a crawler accesses a disallowed path. Well-behaved bots (Googlebot, Bingbot) typically respect rules. Violations often indicate malicious scrapers or misconfigured bots.

πŸ””

Webhook Configuration

Configure webhooks to receive real-time notifications about crawl events, rule violations, sitemap updates, and analytics milestones.

GET /webhooks List all webhooks

Returns all webhook endpoints configured for your account.

POST /webhooks Create a webhook

Creates a new webhook endpoint that receives JSON payloads for specified events.

Request Body

Parameter | Required | Type | Location | Description
url | required | string | body | The HTTPS URL that will receive webhook payloads.
events | required | array | body | List of event types: violation.detected, sitemap.generated, rule.updated, crawl.milestone.
secret | optional | string | body | A secret string used to sign webhook payloads. Included in the X-Robots-Signature header.
active | optional | boolean | body | Whether the webhook is active. Defaults to true.

Example Webhook Payload

{
  "id": "evt_9f2a3b7c1d4e",
  "type": "violation.detected",
  "timestamp": "2024-03-15T14:32:11Z",
  "data": {
    "agent": "Scrapy/2.11",
    "path": "/api/internal/users",
    "rule_id": "rule_3f7a2c1d9e6b",
    "ip_address": "198.51.100.42"
  }
}
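
A receiver should check the X-Robots-Signature header before trusting a payload. The signing scheme is not specified in this section, so the sketch below assumes a hex-encoded HMAC-SHA256 of the raw request body keyed with the webhook secret. Confirm the actual algorithm and encoding before relying on it. sign_payload and is_valid_signature are hypothetical helpers.

```python
import hashlib
import hmac


def sign_payload(secret: str, raw_body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest (assumed scheme, not confirmed)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Compare the X-Robots-Signature header against the expected digest."""
    expected = sign_payload(secret, raw_body)
    # compare_digest runs in constant time, which avoids leaking timing information
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verification should run on the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.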

Status Codes Reference

All responses follow standard HTTP status codes. Here is a complete reference of codes used by the Robots.txt API:

200 OK: Request succeeded.
201 Created: A new resource was successfully created.
204 No Content: Request succeeded with no response body (e.g., deletion).
400 Bad Request: Invalid request body or parameters.
401 Unauthorized: Missing or invalid API key.
403 Forbidden: API key lacks permission for this action.
404 Not Found: The requested resource doesn't exist.
429 Too Many Requests: Rate limit exceeded. Retry after the reset time.
500 Internal Server Error: Something went wrong on our end.
503 Service Unavailable: Temporary maintenance or overload.

SDK Libraries

We provide official SDKs for popular languages and frameworks. All SDKs are open-source and available on GitHub.

Node.js: npm install robotstxt-sdk (GitHub)
Python: pip install robotstxt (GitHub)
Go: go get github.com/robotstxt/go-sdk (GitHub)
Ruby: gem install robotstxt-ruby (GitHub)
PHP: composer require robotstxt/sdk-php (GitHub)

✅
Need a different language?

Our REST API is language-agnostic. You can use any HTTP client or make direct API calls from any programming language. Check out our API client examples for more guidance.

Changelog

v2.4.1 (2024-03-10): Added webhook signature verification support, improved rate limit headers.
v2.4.0 (2024-02-20): New analytics endpoints, sitemap auto-generation, bulk rule operations.
v2.3.0 (2024-01-15): Webhook system, violation detection alerts, enhanced error messages.
v2.2.0 (2023-12-01): Sitemap API, search engine submission, exclude pattern support.
v2.1.0 (2023-10-15): Crawl delay directive, rule comments, sorting and filtering.
v2.0.0 (2023-09-01): API v2 launch with full CRUD for rules, pagination, and structured errors.

Robots.txt API Documentation, v2.4.1  |  Report an Issue  |  Contact Support