📖

API Documentation

Welcome to the Robots.txt API. Our RESTful API allows you to programmatically manage crawl rules, generate sitemaps, monitor crawler activity, and optimize your website's discoverability by search engines.

All API requests are made over HTTPS to https://api.robotstxt.io/v2. Data is sent and received as JSON. Every request must include an API key in the Authorization header.

ℹ️

API Versioning

The current stable version is v2. Each version remains supported for at least 12 months after its successor is released. Include the version in your base URL.

Base URL: https://api.robotstxt.io/v2

Authentication

The Robots.txt API uses API keys for authentication. You can generate and manage your API keys from the Dashboard Settings.

API Key (Bearer Token)

OAuth 2.0 (Service Accounts)

Bearer Token Authentication

Include your API key in the Authorization header as a Bearer token:

curl https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json"

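// Assumes the SDK exports a RobotsClient class
const { RobotsClient } = require('robotstxt-sdk');

const client = new RobotsClient('sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d');

// CommonJS modules have no top-level await, so wrap the call in an async function
(async () => {
  const rules = await client.rules.list();
})();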

import robotstxt

client = robotstxt.Client("sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d")

rules = client.rules.list()

⚠️

Keep Your Keys Secure

Never expose your secret API keys in client-side code, public repositories, or browser-accessible environments. Use environment variables or a secrets manager.
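
As a minimal sketch of the environment-variable approach, the helper below builds the request headers from a key held outside the source code. The variable name `ROBOTSTXT_API_KEY` and the function `auth_headers` are illustrative assumptions, not names defined by the API or its SDKs.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from an API key stored in the environment.

    ROBOTSTXT_API_KEY is a hypothetical variable name, not one the API defines.
    """
    api_key = env.get("ROBOTSTXT_API_KEY")
    if not api_key:
        raise RuntimeError("Set ROBOTSTXT_API_KEY before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.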

Rate Limits

API requests are rate-limited based on your plan tier. When you exceed the limit, you'll receive a 429 Too Many Requests response.

Plan	Requests / Minute	Requests / Day
Free	60	1,000
Pro	300	50,000
Enterprise	2,000	Unlimited

Rate limit headers are included in every response:

X-RateLimit-Limit: 300
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1709251200
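
A client can use these headers to decide how long to pause after a 429 response. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the documentation does not state; the function name `seconds_until_reset` is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending the next request.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds; the header
    names come from the example above, the interpretation is an assumption.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, reset_at - now)
```

Calling `time.sleep(seconds_until_reset(response_headers))` before a retry would be one way to apply it.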

Error Handling

The API uses standard HTTP status codes and returns detailed error objects with a machine-readable code and a human-readable message.

// Example error response (400 Bad Request)
{
  "error": {
    "code": "invalid_request",
    "message": "The user_agent field is required and must be a valid bot identifier.",
    "type": "validation_error",
    "details": [
      {
        "field": "user_agent",
        "issue": "missing_required_field"
      }
    ],
    "request_id": "req_8f2d1a4c9b7e3f6a0d5c8e2b"
  }
}
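
One way to consume this error object is to wrap it in an exception that exposes the machine-readable fields. The class `RobotsApiError` and the function `parse_error` below are hypothetical names for illustration; the official SDKs may surface errors differently.

```python
import json


class RobotsApiError(Exception):
    """Hypothetical exception type wrapping the documented error object."""

    def __init__(self, status, error):
        self.status = status
        self.code = error.get("code")
        self.type = error.get("type")
        self.request_id = error.get("request_id")
        # Collect the per-field issues so callers can highlight bad input.
        self.fields = {d["field"]: d["issue"] for d in error.get("details", [])}
        super().__init__(f"{status} {self.code}: {error.get('message')}")


def parse_error(status, body_text):
    """Turn an error response body into a RobotsApiError."""
    error = json.loads(body_text).get("error", {})
    return RobotsApiError(status, error)
```

Logging `request_id` alongside the message makes it easier to reference a specific failed call later.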

📜

Crawl Rules

Crawl rules define how different user-agents (search engine bots, crawlers, scrapers) should interact with your website. Each rule specifies which paths to allow or disallow for a given agent.

Rule Object

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    {
      "type": "allow",
      "path": "/blog/"
    },
    {
      "type": "disallow",
      "path": "/api/"
    },
    {
      "type": "disallow",
      "path": "/admin/"
    }
  ],
  "crawl_delay": 2,
  "sitemap": "https://example.com/sitemap.xml",
  "active": true,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-02-28T14:22:00Z"
}
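
To show how a rule object relates to robots.txt syntax, the sketch below renders one rule as a robots.txt group. The output layout follows the general robots.txt format and is an assumption; the file the service actually generates may be ordered or formatted differently. `render_rule` is a hypothetical helper.

```python
def render_rule(rule):
    """Render one rule object as a robots.txt group.

    The output layout is an assumption based on the robots.txt format,
    not the exact file the service generates.
    """
    if not rule.get("active", True):
        return ""  # inactive rules are excluded from the generated file
    lines = [f"User-agent: {rule['user_agent']}"]
    for directive in rule["directives"]:
        # "allow" -> "Allow", "disallow" -> "Disallow"
        lines.append(f"{directive['type'].capitalize()}: {directive['path']}")
    if rule.get("crawl_delay") is not None:
        lines.append(f"Crawl-delay: {rule['crawl_delay']}")
    if rule.get("sitemap"):
        lines.append(f"Sitemap: {rule['sitemap']}")
    return "\n".join(lines) + "\n"
```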

GET /rules List all crawl rules

Returns a paginated list of all crawl rules configured for your account. By default, rules are sorted by creation date in descending order.

Query Parameters

Parameter	Type	Location	Description
page (optional)	integer	query	Page number for pagination. Defaults to `1`.
per_page (optional)	integer	query	Number of results per page. Max `100`. Defaults to `20`.
user_agent (optional)	string	query	Filter rules by user-agent string.
active (optional)	boolean	query	Filter by active/inactive status.
sort (optional)	string	query	Sort field: `created_at`, `updated_at`, `user_agent`.
order (optional)	string	query	Sort order: `asc` or `desc`. Defaults to `desc`.

Example Request

curl "https://api.robotstxt.io/v2/rules?page=1&per_page=10" \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 200

{
  "data": [
    {
      "id": "rule_3f7a2c1d9e6b",
      "user_agent": "*",
      "directives": [...],
      "active": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "rule_8e2b1a4f7c9d",
      "user_agent": "Googlebot",
      "directives": [...],
      "active": true,
      "created_at": "2024-01-10T08:15:00Z"
    }
  ],
  "pagination": {
    "current_page": 1,
    "per_page": 10,
    "total_pages": 3,
    "total_count": 28,
    "has_next": true,
    "has_prev": false
  }
}
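
The `pagination` object can drive a loop that collects every rule. In the sketch below, `fetch_page` is a caller-supplied function (an assumption for illustration) that performs `GET /rules` with the given `page` and `per_page` values and returns the decoded JSON body; the generator itself makes no network calls.

```python
def iter_rules(fetch_page, per_page=100):
    """Yield every rule by following the pagination object.

    fetch_page(page, per_page) is a caller-supplied function (hypothetical)
    that performs GET /rules and returns the decoded JSON body.
    """
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1
```

Requesting the maximum `per_page` of 100 keeps the number of calls low, which helps on plans with tight rate limits.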

POST /rules Create a new crawl rule

Creates a new crawl rule for a specified user-agent. The rule is immediately applied to your generated robots.txt file and propagated to the edge network.

Request Body

Parameter	Type	Location	Description
user_agent (required)	string	body	Target user-agent. Use `*` for all bots, or specific agents like `Googlebot`, `Bingbot`, `facebookexternalhit`.
directives (required)	array	body	Array of allow/disallow directive objects. Each object requires `type` (`"allow"` or `"disallow"`) and `path` (string starting with `/`).
crawl_delay (optional)	integer	body	Minimum seconds between requests. Range: `1`–`60`. Note: Googlebot ignores this directive.
sitemap (optional)	string	body	Absolute URL to your sitemap.xml. If not provided, the system uses your default sitemap.
active (optional)	boolean	body	Whether the rule is active. Defaults to `true`. Inactive rules are excluded from the generated robots.txt.
comment (optional)	string	body	Optional comment included in the generated file for documentation purposes.
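
The constraints in this table can be checked on the client before sending a request, which avoids a round trip that would end in a 400 response. The function below is a rough sketch of such a check; `validate_rule` is a hypothetical helper and the server remains the authority on what is valid.

```python
def validate_rule(payload):
    """Return a list of problems found in a POST /rules request body."""
    problems = []
    if not payload.get("user_agent"):
        problems.append("user_agent is required")
    directives = payload.get("directives")
    if not isinstance(directives, list):
        problems.append("directives is required and must be an array")
        directives = []
    for i, d in enumerate(directives):
        if d.get("type") not in ("allow", "disallow"):
            problems.append(f"directives[{i}].type must be allow or disallow")
        if not str(d.get("path", "")).startswith("/"):
            problems.append(f"directives[{i}].path must start with /")
    delay = payload.get("crawl_delay")
    if delay is not None and not (isinstance(delay, int) and 1 <= delay <= 60):
        problems.append("crawl_delay must be an integer from 1 to 60")
    return problems
```

An empty list means the payload passed every check encoded here.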

Example Request

curl -X POST https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/blog/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" },
      { "type": "disallow", "path": "/private/" }
    ],
    "crawl_delay": 2,
    "comment": "Default rules for all crawlers"
  }'

const rule = await client.rules.create({
  user_agent: "*",
  directives: [
    { type: "allow", path: "/blog/" },
    { type: "disallow", path: "/api/" },
    { type: "disallow", path: "/admin/" },
    { type: "disallow", path: "/private/" }
  ],
  crawl_delay: 2,
  comment: "Default rules for all crawlers"
});

rule = client.rules.create(
    user_agent="*",
    directives=[
        {"type": "allow", "path": "/blog/"},
        {"type": "disallow", "path": "/api/"},
        {"type": "disallow", "path": "/admin/"},
        {"type": "disallow", "path": "/private/"}
    ],
    crawl_delay=2,
    comment="Default rules for all crawlers"
)

Example Response 201

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    { "type": "allow", "path": "/blog/" },
    { "type": "disallow", "path": "/api/" },
    { "type": "disallow", "path": "/admin/" },
    { "type": "disallow", "path": "/private/" }
  ],
  "crawl_delay": 2,
  "comment": "Default rules for all crawlers",
  "active": true,
  "created_at": "2024-03-15T12:00:00Z",
  "updated_at": "2024-03-15T12:00:00Z"
}

GET /rules/{rule_id} Get a single rule

Retrieves a specific crawl rule by its unique identifier.

Path Parameters

Parameter	Type	Location	Description
rule_id (required)	string	path	The unique identifier of the rule. Format: `rule_` prefix.

Example Request

curl https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 200

{
  "id": "rule_3f7a2c1d9e6b",
  "user_agent": "*",
  "directives": [
    { "type": "allow", "path": "/blog/" },
    { "type": "disallow", "path": "/api/" },
    { "type": "disallow", "path": "/admin/" }
  ],
  "crawl_delay": 2,
  "active": true,
  "created_at": "2024-03-15T12:00:00Z",
  "updated_at": "2024-03-15T12:00:00Z"
}

PUT /rules/{rule_id} Update a crawl rule

Replaces an existing crawl rule in full. Re-specify every optional field you want to keep; any optional field omitted from the request body reverts to its default. For partial updates, use PATCH.

Path Parameters

Parameter	Type	Location	Description
rule_id (required)	string	path	The unique identifier of the rule to update.

Example Request

curl -X PUT https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" }
    ],
    "crawl_delay": 5
  }'
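
Because PUT replaces the whole rule, a safe pattern is to fetch the current rule, merge your changes into it, and send the result. The sketch below shows only the merge step. It assumes `id`, `created_at`, and `updated_at` are server-managed and should not be sent back, which the documentation does not confirm; `build_put_body` is a hypothetical helper.

```python
READ_ONLY_FIELDS = {"id", "created_at", "updated_at"}  # assumed server-managed


def build_put_body(existing, changes):
    """Merge changes into an existing rule to form a complete PUT body."""
    body = {k: v for k, v in existing.items() if k not in READ_ONLY_FIELDS}
    body.update(changes)
    return body
```

With this approach, a field such as `comment` survives an update that only changes `crawl_delay`.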

DELETE /rules/{rule_id} Delete a crawl rule

Permanently deletes a crawl rule. This action cannot be undone. The rule is immediately removed from the generated robots.txt.

Example Request

curl -X DELETE https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"

Example Response 204

// No content — successful deletion

🗺️

Sitemap Management

Automatically generate and manage sitemaps for your website. Sitemaps tell search engines which pages are available for crawling and when they were last modified.

GET /sitemaps List all sitemaps

Returns all sitemaps associated with your account, including their status, last generation time, and URL.

Example Response 200

{
  "data": [
    {
      "id": "sm_7c9d3f2a1b4e",
      "name": "main-sitemap",
      "url": "https://example.com/sitemap.xml",
      "urls_count": 1243,
      "status": "active",
      "last_generated": "2024-03-15T06:00:00Z",
      "auto_generate": true,
      "created_at": "2024-01-01T00:00:00Z"
    }
  ]
}

POST /sitemaps Generate a new sitemap

Creates a new sitemap by crawling your website or importing a URL list. The system discovers all public URLs and builds a valid sitemap.xml with proper lastmod, changefreq, and priority values.

Request Body

Parameter	Type	Location	Description
name (required)	string	body	A descriptive name for the sitemap (e.g., `main-sitemap`).
source_url (required)	string	body	The root URL to crawl. The system will discover all linked pages.
max_depth (optional)	integer	body	Maximum link depth to follow. Defaults to `10`.
auto_generate (optional)	boolean	body	If `true`, the sitemap is regenerated daily. Defaults to `true`.
exclude_patterns (optional)	array	body	Glob patterns to exclude from the sitemap (e.g., `["/admin/", ".pdf"]`).

POST /sitemaps/{sitemap_id}/submit Submit sitemap to search engines

Submits the specified sitemap simultaneously to every search engine listed in `engines`, for example Google Search Console and Bing Webmaster Tools.

Request Body

Parameter	Type	Location	Description
engines (required)	array	body	List of engines to submit to: `["google", "bing", "yandex", "baidu"]`.

📊

Crawl Analytics

Access detailed analytics about how search engine bots and crawlers interact with your website. Monitor crawl frequency, identify problematic bots, and track rule compliance.

GET /analytics/overview Get crawl overview

Returns a high-level summary of crawl activity including total requests, unique bots, blocked requests, and violations over a time period.

Query Parameters

Parameter	Type	Location	Description
from (required)	string	query	Start date in ISO 8601 format (e.g., `2024-01-01T00:00:00Z`).
to (optional)	string	query	End date in ISO 8601 format. Defaults to now.
granularity (optional)	string	query	Data granularity: `hour`, `day`, `week`, `month`. Defaults to `day`.

Example Response 200

{
  "period": {
    "from": "2024-03-01T00:00:00Z",
    "to": "2024-03-15T23:59:59Z"
  },
  "summary": {
    "total_requests": 48723,
    "unique_bots": 14,
    "blocked_requests": 12405,
    "violations": 37,
    "avg_response_time_ms": 23
  },
  "top_bots": [
    { "agent": "Googlebot", "requests": 23410 },
    { "agent": "bingbot", "requests": 8921 },
    { "agent": "YandexBot", "requests": 4532 }
  ]
}
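
The timestamps in the query string need URL encoding, which is easy to get wrong by hand. As a sketch, the helper below builds the overview URL from timezone-aware `datetime` values using only the standard library. `overview_url` is a hypothetical name, and the function assumes its inputs carry timezone information.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.robotstxt.io/v2"


def overview_url(start, end=None, granularity="day"):
    """Build the GET /analytics/overview URL for a time window.

    start and end are expected to be timezone-aware datetimes.
    """
    def iso(dt):
        # Normalise to UTC and use the trailing "Z" form shown in the docs.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    params = {"from": iso(start), "granularity": granularity}
    if end is not None:
        params["to"] = iso(end)  # omitted "to" lets the API default to now
    return f"{BASE_URL}/analytics/overview?{urlencode(params)}"
```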

GET /analytics/crawlers Crawler activity log

Returns a detailed log of individual crawler requests, including the bot's user-agent, requested path, rule outcome (allowed, blocked, or violated), and response time.

Query Parameters

Parameter	Type	Location	Description
from (required)	string	query	Start timestamp in ISO 8601 format.
to (optional)	string	query	End timestamp. Defaults to now.
agent (optional)	string	query	Filter by user-agent string.
outcome (optional)	string	query	Filter by outcome: `allowed`, `blocked`, `violated`.
limit (optional)	integer	query	Max results to return. Defaults to `100`, max `1000`.

GET /analytics/violations Rule violations log

Returns a list of crawl violations — instances where a bot accessed a path that was explicitly disallowed by your rules.

⚠️

What counts as a violation?

A violation is recorded when a crawler accesses a disallowed path. Well-behaved bots (Googlebot, Bingbot) typically respect rules. Violations often indicate malicious scrapers or misconfigured bots.
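
To make "disallowed path" concrete, the sketch below decides whether a path is disallowed by a rule's directives. It uses plain prefix matching with longest-match precedence, where an allow wins a tie, as described in RFC 9309. This is an assumption for illustration: the service's own matching logic is not documented here, and wildcard patterns are not handled. `is_disallowed` is a hypothetical helper.

```python
def is_disallowed(directives, path):
    """Return True if the most specific matching directive is a disallow.

    Uses plain prefix matching with longest-match precedence (allow wins
    ties), as in RFC 9309; wildcards are ignored and the service's own
    matching rules may differ.
    """
    best_len, best_type = -1, "allow"  # no match means the path is allowed
    for d in directives:
        if path.startswith(d["path"]):
            length = len(d["path"])
            if length > best_len or (length == best_len and d["type"] == "allow"):
                best_len, best_type = length, d["type"]
    return best_type == "disallow"
```

Under this sketch, a request for `/api/internal/users` against a rule that disallows `/api/` would count as a violation.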

🔔

Webhook Configuration

Configure webhooks to receive real-time notifications about crawl events, rule violations, sitemap updates, and analytics milestones.

GET /webhooks List all webhooks

Returns all webhook endpoints configured for your account.

POST /webhooks Create a webhook

Creates a new webhook endpoint that receives JSON payloads for specified events.

Request Body

Parameter	Type	Location	Description
url (required)	string	body	The HTTPS URL that will receive webhook payloads.
events (required)	array	body	List of event types: `violation.detected`, `sitemap.generated`, `rule.updated`, `crawl.milestone`.
secret (optional)	string	body	A secret string used to sign webhook payloads. The resulting signature is sent in the `X-Robots-Signature` header.
active (optional)	boolean	body	Whether the webhook is active. Defaults to `true`.

Example Webhook Payload

{
  "id": "evt_9f2a3b7c1d4e",
  "type": "violation.detected",
  "timestamp": "2024-03-15T14:32:11Z",
  "data": {
    "agent": "Scrapy/2.11",
    "path": "/api/internal/users",
    "rule_id": "rule_3f7a2c1d9e6b",
    "ip_address": "198.51.100.42"
  }
}
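
A receiving endpoint should verify the signature before trusting a payload. The sketch below assumes the `X-Robots-Signature` value is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook `secret`. The algorithm and encoding are assumptions, since they are not specified above; check them against the signature verification documentation before relying on this. `verify_signature` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_signature(secret, raw_body, signature_header):
    """Check a webhook payload against its X-Robots-Signature header.

    Assumes the signature is a hex-encoded HMAC-SHA256 of the raw request
    body; the algorithm and encoding are not specified in the docs.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes as received, not a re-serialized copy of the parsed JSON, because re-serialization can change whitespace and key order.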

🔢

Status Codes Reference

All responses follow standard HTTP status codes. Here is a complete reference of codes used by the Robots.txt API:

200 OK — Request succeeded.

201 Created — A new resource was successfully created.

204 No Content — Request succeeded with no response body (e.g., deletion).

400 Bad Request — Invalid request body or parameters.

401 Unauthorized — Missing or invalid API key.

403 Forbidden — API key lacks permission for this action.

404 Not Found — The requested resource doesn't exist.

429 Too Many Requests — Rate limit exceeded. Retry after the reset time.

500 Internal Server Error — Something went wrong on our end.

503 Service Unavailable — Temporary maintenance or overload.
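
These codes suggest a simple client-side retry policy: retry the transient failures and give up on the rest. The sketch below treats 429, 500, and 503 as retryable and applies exponential backoff with full jitter. Both the choice of retryable codes and the timing values are assumptions, not behavior mandated by the API; `retry_delay` is a hypothetical helper.

```python
import random

RETRYABLE_STATUSES = {429, 500, 503}  # a client-side choice, not an API rule


def retry_delay(status, attempt, base=1.0, cap=60.0):
    """Return seconds to sleep before retry number `attempt`, or None.

    Exponential backoff with full jitter; 4xx errors other than 429 are
    treated as permanent because resending the same request cannot help.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

For 429 responses, waiting until the time given by the rate limit headers is usually a better choice than a computed backoff.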

📦

SDK Libraries

We provide official SDKs for popular languages and frameworks. All SDKs are open-source and available on GitHub.

Node.js: `npm install robotstxt-sdk`

Python: `pip install robotstxt`

Go: `go get github.com/robotstxt/go-sdk`

Ruby: `gem install robotstxt-ruby`

PHP: `composer require robotstxt/sdk-php`

✅

Need a different language?

Our REST API is language-agnostic. You can use any HTTP client or make direct API calls from any programming language. Check out our API client examples for more guidance.

📝

Changelog

v2.4.1 2024-03-10 — Added webhook signature verification support, improved rate limit headers.

v2.4.0 2024-02-20 — New analytics endpoints, sitemap auto-generation, bulk rule operations.

v2.3.0 2024-01-15 — Webhook system, violation detection alerts, enhanced error messages.

v2.2.0 2023-12-01 — Sitemap API, search engine submission, exclude pattern support.

v2.1.0 2023-10-15 — Crawl delay directive, rule comments, sorting and filtering.

v2.0.0 2023-09-01 — API v2 launch with full CRUD for rules, pagination, and structured errors.