API Documentation
Welcome to the Robots.txt API. Our RESTful API allows you to programmatically manage crawl rules, generate sitemaps, monitor crawler activity, and optimize your website's discoverability by search engines.
All API requests are made over HTTPS to https://api.robotstxt.io/v2. Data is sent and received as JSON. Every request must include an API key in the Authorization header.
The current stable version is v2. Each version is supported for at least 12 months after a new version is released. Include the version in your base URL.
Authentication
The Robots.txt API uses API keys for authentication. You can generate and manage your API keys from the Dashboard Settings.
Bearer Token Authentication
Include your API key in the Authorization header as a Bearer token:
curl https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json"
Never expose your secret API keys in client-side code, public repositories, or browser-accessible environments. Use environment variables or a secrets manager.
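As a minimal sketch of the advice above, the following Python helper reads the key from an environment variable and builds (without sending) an authenticated request. The variable name ROBOTSTXT_API_KEY and the helper build_request are illustrative assumptions, not names defined by the API.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.robotstxt.io/v2"  # base URL given on this page


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for an API path."""
    # ROBOTSTXT_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("ROBOTSTXT_API_KEY")
    if not key:
        raise RuntimeError("Set ROBOTSTXT_API_KEY or pass api_key explicitly")
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the resulting object to urllib.request.urlopen would perform the call; keeping construction separate makes the header logic easy to test without network access.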
Rate Limits
API requests are rate-limited based on your plan tier. When you exceed the limit, you'll receive a 429 Too Many Requests response.
| Plan | Requests / Minute | Requests / Day |
|---|---|---|
| Free | 60 | 1,000 |
| Pro | 300 | 50,000 |
| Enterprise | 2,000 | Unlimited |
Rate limit headers are included in every response:
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1709251200
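One way a client might use these headers is to compute how long to pause once the quota is exhausted. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this page does not state; the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A plain dict lookup is case-sensitive, so with a real HTTP client you would pass its case-insensitive header mapping instead.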
Error Handling
The API uses standard HTTP status codes and returns detailed error objects with a machine-readable code and a human-readable message.
// Example error response (400 Bad Request)
{
  "error": {
    "code": "invalid_request",
    "message": "The user_agent field is required and must be a valid bot identifier.",
    "type": "validation_error",
    "details": [
      { "field": "user_agent", "issue": "missing_required_field" }
    ],
    "request_id": "req_8f2d1a4c9b7e3f6a0d5c8e2b"
  }
}
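A client could turn this error object into an exception that keeps the machine-readable code and the request ID for support tickets. This is a sketch only: ApiError and parse_error are hypothetical names, and it assumes every error body follows the shape of the example above.

```python
import json
from typing import List, Optional


class ApiError(Exception):
    """Exception carrying the fields of the API's error object."""

    def __init__(self, status: int, code: str, message: str,
                 request_id: Optional[str], fields: List[str]):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.request_id = request_id
        self.fields = fields  # names of the fields flagged in "details"


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from an HTTP status and a JSON error body."""
    err = json.loads(body).get("error", {})
    fields = [d["field"] for d in err.get("details", []) if "field" in d]
    return ApiError(status, err.get("code", "unknown"), err.get("message", ""),
                    err.get("request_id"), fields)
```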
Crawl Rules
Crawl rules define how different user-agents (search engine bots, crawlers, scrapers) should interact with your website. Each rule specifies which paths to allow or disallow for a given agent.
Rule Object
{
"id": "rule_3f7a2c1d9e6b",
"user_agent": "*",
"directives": [
{
"type": "allow",
"path": "/blog/"
},
{
"type": "disallow",
"path": "/api/"
},
{
"type": "disallow",
"path": "/admin/"
}
],
"crawl_delay": 2,
"sitemap": "https://example.com/sitemap.xml",
"active": true,
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-02-28T14:22:00Z"
}
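To make the mapping between a rule object and robots.txt concrete, here is a rough sketch that renders one rule as standard robots.txt lines. The exact layout of the file the service generates is not documented on this page, so render_rule and its output format are assumptions for illustration.

```python
from typing import Any, Dict


def render_rule(rule: Dict[str, Any]) -> str:
    """Render one rule object as robots.txt text (illustrative format)."""
    if not rule.get("active", True):
        return ""  # inactive rules are left out of the generated file
    lines = []
    if rule.get("comment"):
        lines.append(f"# {rule['comment']}")
    lines.append(f"User-agent: {rule['user_agent']}")
    for directive in rule["directives"]:
        # "allow" becomes "Allow", "disallow" becomes "Disallow"
        lines.append(f"{directive['type'].capitalize()}: {directive['path']}")
    if rule.get("crawl_delay") is not None:
        lines.append(f"Crawl-delay: {rule['crawl_delay']}")
    if rule.get("sitemap"):
        lines.append(f"Sitemap: {rule['sitemap']}")
    return "\n".join(lines)
```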
Returns a paginated list of all crawl rules configured for your account. By default, rules are sorted by creation date in descending order.
Query Parameters
| Parameter | Type | Location | Description |
|---|---|---|---|
| page (optional) | integer | query | Page number for pagination. Defaults to 1. |
| per_page (optional) | integer | query | Number of results per page. Max 100. Defaults to 20. |
| user_agent (optional) | string | query | Filter rules by user-agent string. |
| active (optional) | boolean | query | Filter by active/inactive status. |
| sort (optional) | string | query | Sort field: created_at, updated_at, user_agent. |
| order (optional) | string | query | Sort order: asc or desc. Defaults to desc. |
Example Request
curl "https://api.robotstxt.io/v2/rules?page=1&per_page=10" \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"
Example Response 200
{
"data": [
{
"id": "rule_3f7a2c1d9e6b",
"user_agent": "*",
"directives": [...],
"active": true,
"created_at": "2024-01-15T10:30:00Z"
},
{
"id": "rule_8e2b1a4f7c9d",
"user_agent": "Googlebot",
"directives": [...],
"active": true,
"created_at": "2024-01-10T08:15:00Z"
}
],
"pagination": {
"current_page": 1,
"per_page": 10,
"total_pages": 3,
"total_count": 28,
"has_next": true,
"has_prev": false
}
}
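The pagination object above can drive a simple loop that walks every page until has_next is false. In this sketch, fetch_page is a hypothetical callable you supply; it takes a page number and returns the decoded JSON response for that page.

```python
from typing import Any, Callable, Dict, Iterator


def iter_rules(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every rule across all pages, following pagination.has_next."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1
```

Injecting the fetcher keeps the loop independent of any particular HTTP client and lets you exercise it with canned responses.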
Creates a new crawl rule for a specified user-agent. The rule is immediately applied to your generated robots.txt file and propagated to the edge network.
Request Body
| Parameter | Type | Location | Description |
|---|---|---|---|
| user_agent (required) | string | body | Target user-agent. Use * for all bots, or specific agents like Googlebot, Bingbot, facebookexternalhit. |
| directives (required) | array | body | Array of allow/disallow directive objects. Each object requires type ("allow" or "disallow") and path (string starting with /). |
| crawl_delay (optional) | integer | body | Minimum seconds between requests. Range: 1 to 60. Note: Googlebot ignores this directive. |
| sitemap (optional) | string | body | Absolute URL to your sitemap.xml. If not provided, the system uses your default sitemap. |
| active (optional) | boolean | body | Whether the rule is active. Defaults to true. Inactive rules are excluded from the generated robots.txt. |
| comment (optional) | string | body | Optional comment included in the generated file for documentation purposes. |
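Checking a payload against the constraints in this table before sending it can save a round trip. The sketch below covers only the rules stated above (required fields, directive shape, crawl_delay range); validate_rule is a hypothetical helper, and the server may enforce additional checks.

```python
from typing import Any, Dict, List


def validate_rule(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    user_agent = payload.get("user_agent")
    if not isinstance(user_agent, str) or not user_agent:
        problems.append("user_agent is required and must be a string")
    directives = payload.get("directives")
    if not isinstance(directives, list):
        problems.append("directives is required and must be an array")
        directives = []
    for i, d in enumerate(directives):
        if not isinstance(d, dict):
            problems.append(f"directives[{i}] must be an object")
            continue
        if d.get("type") not in ("allow", "disallow"):
            problems.append(f"directives[{i}].type must be 'allow' or 'disallow'")
        if not str(d.get("path", "")).startswith("/"):
            problems.append(f"directives[{i}].path must start with '/'")
    delay = payload.get("crawl_delay")
    if delay is not None and not (isinstance(delay, int) and 1 <= delay <= 60):
        problems.append("crawl_delay must be an integer from 1 to 60")
    return problems
```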
Example Request
curl -X POST https://api.robotstxt.io/v2/rules \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/blog/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" },
      { "type": "disallow", "path": "/private/" }
    ],
    "crawl_delay": 2,
    "comment": "Default rules for all crawlers"
  }'
Example Response 201
{
"id": "rule_3f7a2c1d9e6b",
"user_agent": "*",
"directives": [
{ "type": "allow", "path": "/blog/" },
{ "type": "disallow", "path": "/api/" },
{ "type": "disallow", "path": "/admin/" },
{ "type": "disallow", "path": "/private/" }
],
"crawl_delay": 2,
"comment": "Default rules for all crawlers",
"active": true,
"created_at": "2024-03-15T12:00:00Z",
"updated_at": "2024-03-15T12:00:00Z"
}
Retrieves a specific crawl rule by its unique identifier.
Path Parameters
| Parameter | Type | Location | Description |
|---|---|---|---|
| rule_id (required) | string | path | The unique identifier of the rule. Format: rule_ prefix. |
Example Request
curl https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"
Example Response 200
{
"id": "rule_3f7a2c1d9e6b",
"user_agent": "*",
"directives": [
{ "type": "allow", "path": "/blog/" },
{ "type": "disallow", "path": "/api/" },
{ "type": "disallow", "path": "/admin/" }
],
"crawl_delay": 2,
"active": true,
"created_at": "2024-03-15T12:00:00Z",
"updated_at": "2024-03-15T12:00:00Z"
}
Fully updates an existing crawl rule. All optional fields must be re-specified; unset fields revert to their defaults. For partial updates, use PATCH.
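Because a full update resets any field you omit, a safe pattern is to start from the current rule and overlay your changes. The sketch below assumes that id, created_at, and updated_at are server-managed and should not be sent back; that assumption and the helper name put_body are not confirmed by this page.

```python
from typing import Any, Dict

# Assumed to be server-managed fields; adjust if the API says otherwise.
READ_ONLY_FIELDS = {"id", "created_at", "updated_at"}


def put_body(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Build a full PUT body from an existing rule plus the fields to change."""
    body = {k: v for k, v in current.items() if k not in READ_ONLY_FIELDS}
    body.update(changes)
    return body
```

For example, put_body(rule, {"crawl_delay": 5}) keeps the existing directives and active flag while changing only the delay.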
Path Parameters
| Parameter | Type | Location | Description |
|---|---|---|---|
| rule_id (required) | string | path | The unique identifier of the rule to update. |
Example Request
curl -X PUT https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d" \
  -H "Content-Type: application/json" \
  -d '{
    "user_agent": "*",
    "directives": [
      { "type": "allow", "path": "/" },
      { "type": "disallow", "path": "/api/" },
      { "type": "disallow", "path": "/admin/" }
    ],
    "crawl_delay": 5
  }'
Permanently deletes a crawl rule. This action cannot be undone. The rule is immediately removed from the generated robots.txt.
Example Request
curl -X DELETE https://api.robotstxt.io/v2/rules/rule_3f7a2c1d9e6b \
  -H "Authorization: Bearer sk_live_4f8a2c1d9e6b3a7f0c5d8e2b1a4f7c9d"
Example Response 204
// No content: successful deletion
Sitemap Management
Automatically generate and manage sitemaps for your website. Sitemaps tell search engines which pages are available for crawling and when they were last modified.
Returns all sitemaps associated with your account, including their status, last generation time, and URL.
Example Response 200
{
"data": [
{
"id": "sm_7c9d3f2a1b4e",
"name": "main-sitemap",
"url": "https://example.com/sitemap.xml",
"urls_count": 1243,
"status": "active",
"last_generated": "2024-03-15T06:00:00Z",
"auto_generate": true,
"created_at": "2024-01-01T00:00:00Z"
}
]
}
Creates a new sitemap by crawling your website or importing a URL list. The system discovers all public URLs and builds a valid sitemap.xml with proper lastmod, changefreq, and priority values.
Request Body
| Parameter | Type | Location | Description |
|---|---|---|---|
| name (required) | string | body | A descriptive name for the sitemap (e.g., main-sitemap). |
| source_url (required) | string | body | The root URL to crawl. The system will discover all linked pages. |
| max_depth (optional) | integer | body | Maximum link depth to follow. Defaults to 10. |
| auto_generate (optional) | boolean | body | If true, the sitemap is regenerated daily. Defaults to true. |
| exclude_patterns (optional) | array | body | Glob patterns to exclude from the sitemap (e.g., ["/admin/*", "*.pdf"]). |
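To preview which URLs a set of exclude_patterns would drop, you can approximate the matching locally. This sketch uses Python's fnmatch-style globbing against URL paths; whether the service matches against paths or full URLs, and its exact glob dialect, are assumptions here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_paths(paths: Iterable[str], exclude_patterns: Iterable[str]) -> List[str]:
    """Return the paths that match none of the glob patterns."""
    patterns = list(exclude_patterns)
    return [
        path for path in paths
        # fnmatchcase is case-sensitive and lets "*" match "/" as well
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```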
Submits the specified sitemap to Google Search Console, Bing Webmaster Tools, and other configured search engines simultaneously.
Request Body
| Parameter | Type | Location | Description |
|---|---|---|---|
| engines (required) | array | body | List of engines to submit to: ["google", "bing", "yandex", "baidu"]. |
Crawl Analytics
Access detailed analytics about how search engine bots and crawlers interact with your website. Monitor crawl frequency, identify problematic bots, and track rule compliance.
Returns a high-level summary of crawl activity including total requests, unique bots, blocked requests, and violations over a time period.
Query Parameters
| Parameter | Type | Location | Description |
|---|---|---|---|
| from (required) | string | query | Start date in ISO 8601 format (e.g., 2024-01-01T00:00:00Z). |
| to (optional) | string | query | End date in ISO 8601 format. Defaults to now. |
| granularity (optional) | string | query | Data granularity: hour, day, week, month. Defaults to day. |
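Timestamps contain colons, so the query string needs URL encoding. The sketch below formats timezone-aware datetimes as ISO 8601 in UTC and encodes the three parameters from the table; analytics_query and iso_utc are hypothetical helper names, and the result is only the query string, to be appended to the endpoint URL.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

GRANULARITIES = {"hour", "day", "week", "month"}


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2024-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def analytics_query(start: datetime, end: Optional[datetime] = None,
                    granularity: str = "day") -> str:
    """Build the URL-encoded query string for an analytics request."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {sorted(GRANULARITIES)}")
    params = {"from": iso_utc(start)}
    if end is not None:
        params["to"] = iso_utc(end)  # omitted means "now" on the server side
    params["granularity"] = granularity
    return urlencode(params)
```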
Example Response 200
{
"period": {
"from": "2024-03-01T00:00:00Z",
"to": "2024-03-15T23:59:59Z"
},
"summary": {
"total_requests": 48723,
"unique_bots": 14,
"blocked_requests": 12405,
"violations": 37,
"avg_response_time_ms": 23
},
"top_bots": [
{ "agent": "Googlebot", "requests": 23410 },
{ "agent": "bingbot", "requests": 8921 },
{ "agent": "YandexBot", "requests": 4532 }
]
}
Returns a detailed log of individual crawler requests, including the bot's user-agent, requested path, rule outcome (allowed/blocked), and response time.
Query Parameters
| Parameter | Type | Location | Description |
|---|---|---|---|
| from (required) | string | query | Start timestamp in ISO 8601 format. |
| to (optional) | string | query | End timestamp. Defaults to now. |
| agent (optional) | string | query | Filter by user-agent string. |
| outcome (optional) | string | query | Filter by outcome: allowed, blocked, violated. |
| limit (optional) | integer | query | Max results to return. Defaults to 100, max 1000. |
Returns a list of crawl violations: instances where a bot accessed a path that was explicitly disallowed by your rules.
A violation is recorded when a crawler accesses a disallowed path. Well-behaved bots (Googlebot, Bingbot) typically respect rules. Violations often indicate malicious scrapers or misconfigured bots.
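To reason about whether a given request would count as accessing a disallowed path, it helps to evaluate a rule's directives locally. The sketch below uses the longest-matching-prefix convention from the Robots Exclusion Protocol, with allow winning ties and no wildcard support; whether the service evaluates rules exactly this way is an assumption.

```python
from typing import Dict, List


def is_allowed(path: str, directives: List[Dict[str, str]]) -> bool:
    """Return True if the path is allowed under the given directives."""
    best_length = -1
    allowed = True  # a path matched by no directive is allowed
    for directive in directives:
        prefix = directive["path"]
        if not path.startswith(prefix):
            continue
        is_allow = directive["type"] == "allow"
        # The most specific (longest) prefix wins; on a tie, allow wins.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed
```

With the directives from the rule object shown earlier, a request for /api/internal/users is disallowed, so a crawler fetching it would be a candidate violation.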
Webhook Configuration
Configure webhooks to receive real-time notifications about crawl events, rule violations, sitemap updates, and analytics milestones.
Returns all webhook endpoints configured for your account.
Creates a new webhook endpoint that receives JSON payloads for specified events.
Request Body
| Parameter | Type | Location | Description |
|---|---|---|---|
| url (required) | string | body | The HTTPS URL that will receive webhook payloads. |
| events (required) | array | body | List of event types: violation.detected, sitemap.generated, rule.updated, crawl.milestone. |
| secret (optional) | string | body | A secret string used to sign webhook payloads. The signature is included in the X-Robots-Signature header. |
| active (optional) | boolean | body | Whether the webhook is active. Defaults to true. |
Example Webhook Payload
{
"id": "evt_9f2a3b7c1d4e",
"type": "violation.detected",
"timestamp": "2024-03-15T14:32:11Z",
"data": {
"agent": "Scrapy/2.11",
"path": "/api/internal/users",
"rule_id": "rule_3f7a2c1d9e6b",
"ip_address": "198.51.100.42"
}
}
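If you configure a secret, your receiver should check the X-Robots-Signature header before trusting a payload. This page does not specify the signing scheme, so the sketch below assumes a hex-encoded HMAC-SHA256 of the raw request body; confirm the actual algorithm and encoding before relying on it. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check a webhook signature (assumed scheme: hex HMAC-SHA256 of the body)."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking timing information
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verify against the raw bytes as received, not a re-serialized copy of the JSON, since any change in whitespace or key order changes the digest.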
Status Codes Reference
All responses follow standard HTTP status codes. Here is a complete reference of codes used by the Robots.txt API:
SDK Libraries
We provide official SDKs for popular languages and frameworks. All SDKs are open-source and available on GitHub.
npm install robotstxt-sdk
pip install robotstxt
go get github.com/robotstxt/go-sdk
gem install robotstxt-ruby
composer require robotstxt/sdk-php
Our REST API is language-agnostic. You can use any HTTP client or make direct API calls from any programming language. Check out our API client examples for more guidance.
Changelog
Robots.txt API Documentation, v2.4.1