Sitemap Connector

The Sitemap connector allows you to extract and process pages from website sitemaps when creating a source.

Features

  • Processes pages listed in XML sitemaps
  • Supports regex patterns for URL filtering
  • Respects robots.txt directives
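The three features above can be sketched locally as follows. This is an illustrative approximation, not the connector's actual implementation: the function names (extract_urls, filter_urls, allowed_by_robots), the "example-bot" user agent, and the choice of search-style regex matching (a pattern may match anywhere in the URL unless anchored) are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/blog/launch</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>
""".strip()


def extract_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def filter_urls(
    urls: List[str],
    include_pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
) -> List[str]:
    """Keep URLs matching include_pattern and not matching exclude_pattern.

    Assumption: patterns are Python regexes applied with search semantics.
    A pattern of None disables that filter.
    """
    include = re.compile(include_pattern) if include_pattern else None
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    kept = []
    for url in urls:
        if include and not include.search(url):
            continue
        if exclude and exclude.search(url):
            continue
        kept.append(url)
    return kept


def allowed_by_robots(
    urls: List[str], robots_txt: str, user_agent: str = "example-bot"
) -> List[str]:
    """Drop URLs that the given robots.txt content disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    urls = extract_urls(SAMPLE_SITEMAP)
    docs_only = filter_urls(
        urls,
        include_pattern=r"^https://example\.com/docs/",
        exclude_pattern=r"^https://example\.com/docs/blog/",
    )
    print(docs_only)
```

Anchoring the patterns with ^ and escaping the dot in the hostname keeps the filter from matching unintended URLs; whether the connector itself anchors patterns is not stated here.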

Configuration

  • type (string, required): must be "sitemap"
  • sitemap_url (string, required): URL of the sitemap
  • include_pattern (string or null): regex pattern for URLs to include
  • exclude_pattern (string or null): regex pattern for URLs to exclude

{
  "type": "sitemap",
  "sitemap_url": "string",
  "include_pattern": "string",
  "exclude_pattern": "string"
}

Example Payload

{
  "type": "sitemap",
  "sitemap_url": "https://example.com/sitemap.xml",
  "include_pattern": "https://example.com/docs/*",
  "exclude_pattern": "https://example.com/docs/blog/*"
}
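
A payload like the one above can also be assembled and sanity-checked in code before it is submitted. The sketch below is a hypothetical helper (build_sitemap_source is not part of any documented client), and it assumes the patterns are regular expressions in a Python-compatible flavor, which this page does not confirm. Note that, read as a regex, a trailing /* means "zero or more slashes"; .* is the usual way to express "anything after this prefix", so the sketch uses that form.

```python
import json
import re
from typing import Optional


def build_sitemap_source(
    sitemap_url: str,
    include_pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
) -> dict:
    """Build a sitemap source payload (hypothetical helper).

    Raises ValueError if sitemap_url is empty and re.error if a pattern
    is not a valid regex under Python's re module (an assumed flavor).
    """
    if not sitemap_url:
        raise ValueError("sitemap_url is required")
    for pattern in (include_pattern, exclude_pattern):
        if pattern is not None:
            re.compile(pattern)  # validation only; the result is discarded
    return {
        "type": "sitemap",
        "sitemap_url": sitemap_url,
        "include_pattern": include_pattern,  # None serializes to JSON null
        "exclude_pattern": exclude_pattern,
    }


if __name__ == "__main__":
    payload = build_sitemap_source(
        "https://example.com/sitemap.xml",
        include_pattern=r"https://example\.com/docs/.*",
        exclude_pattern=r"https://example\.com/docs/blog/.*",
    )
    print(json.dumps(payload, indent=2))
```

Omitted patterns are sent as null, which the schema above allows for both include_pattern and exclude_pattern.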