Introduction
Static sitemaps work great for small websites, but modern e-commerce platforms and content management systems often generate tens of thousands of URLs dynamically. Manually maintaining a sitemap.xml file is impractical at this scale.
In this tutorial, we'll walk through building a robust, automated sitemap generation pipeline using Sitemap.xml's API and Node.js. You'll learn how to handle URL parameters, exclude thin content, prioritize high-value pages, and ensure search engines crawl your site efficiently.
Prerequisites
- Node.js 18+ installed
- Basic familiarity with REST APIs & XML structure
- An active Sitemap.xml account (free tier works fine)
- Access to your site's routing logic or CMS API
1. Understanding Dynamic URLs
Dynamic URLs typically follow patterns like /products/:id, /blog/:slug, or /search?q=keyword&category=electronics. Not all variations should be indexed. Search engines prefer canonical, high-value pages.
Warning: Parameter Bloat
Including every URL variant with sorting, pagination, and session parameters wastes crawl budget and creates duplicate content, which can lead search engines to choose an unintended canonical URL or crawl your important pages less often.
We'll filter URLs based on three criteria:
- Canonicality: Only index the preferred version of a page
- Value: Prioritize pages with substantial content or commercial intent
- Freshness: Update modification dates based on actual content changes
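The three criteria can be expressed as a single filter pass. The sketch below assumes a hypothetical page shape ({ url, canonicalUrl, wordCount, body, updatedAt }) and a hypothetical `previous` map holding the body and lastmod recorded on the last run; the 100-word default mirrors the thin-content guideline in Best Practices. Adapt the field names to your own CMS.

```javascript
// Sketch of the three selection criteria.
// Assumption: the page shape { url, canonicalUrl, wordCount, body, updatedAt }
// is hypothetical; map it to whatever your CMS or database returns.
// `previous` is a hypothetical Map: url -> { body, lastmod } from the last run.
function selectPages(pages, previous = new Map(), minWords = 100) {
  const selected = [];
  for (const page of pages) {
    // Canonicality: skip pages that declare a different canonical URL.
    if (page.canonicalUrl && page.canonicalUrl !== page.url) continue;

    // Value: skip thin pages below the word-count threshold.
    if (page.wordCount < minWords) continue;

    // Freshness: keep the previous lastmod unless the body actually changed,
    // so cosmetic re-saves do not bump the modification date.
    const prior = previous.get(page.url);
    const lastmod = prior && prior.body === page.body
      ? prior.lastmod
      : page.updatedAt.toISOString().split('T')[0];

    selected.push({ loc: page.url, lastmod });
  }
  return selected;
}
```

Comparing stored bodies is the simplest change check; storing a content hash per URL instead keeps the `previous` map small.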
2. Programmatic Generation
First, install the Sitemap.xml SDK:
npm install @sitemap/sdk
Next, create a generation script that queries your database or CMS, filters valid URLs, and formats them according to the Sitemap XML protocol:
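import { MongoClient } from 'mongodb';
import { SitemapClient } from '@sitemap/sdk';

// Assumption: pages are stored in MongoDB and MONGODB_URI is a hypothetical
// env var; swap in your own database or CMS client.
const mongo = new MongoClient(process.env.MONGODB_URI);
const db = mongo.db();

const client = new SitemapClient({
  apiKey: process.env.SITEMAP_API_KEY,
  projectId: process.env.SITEMAP_PROJECT_ID
});

async function generateSitemap() {
  // Only published pages that are not marked noindex
  const urls = await db.collection('pages').find({
    status: 'published',
    robots: 'index'
  }).toArray();

  const sitemapData = urls.map(doc => ({
    loc: `https://yoursite.com/${doc.slug}`,
    // W3C date format (YYYY-MM-DD) from the last content update
    lastmod: doc.updatedAt.toISOString().split('T')[0],
    priority: doc.type === 'product' ? 0.9 : 0.7,
    changefreq: doc.type === 'blog' ? 'weekly' : 'monthly'
  }));

  return client.submit(sitemapData);
}

generateSitemap()
  .catch(console.error)
  .finally(() => mongo.close());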
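If you want to render the XML yourself, for example to inspect the output or to self-host the file, entries shaped like the `sitemapData` objects above map directly onto the protocol's <url> elements. A minimal sketch; `escapeXml` and `toUrlsetXml` are hypothetical helper names, not part of @sitemap/sdk:

```javascript
// Escape the five characters the sitemap protocol requires to be entity-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;');
}

// Render { loc, lastmod, changefreq, priority } entries into a <urlset> document.
// Optional tags that are undefined on an entry are simply omitted.
function toUrlsetXml(entries) {
  const body = entries.map(e => {
    const tags = ['loc', 'lastmod', 'changefreq', 'priority']
      .filter(tag => e[tag] !== undefined)
      .map(tag => `    <${tag}>${escapeXml(e[tag])}</${tag}>`)
      .join('\n');
    return `  <url>\n${tags}\n  </url>`;
  }).join('\n');

  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    body + '\n</urlset>\n';
}
```

Escaping matters most for `loc`: a query string containing `&` produces invalid XML if written raw.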
Pro Tip: Priority & Change Frequency
Google has stated that it ignores the <priority> and <changefreq> values and uses <lastmod> only when it is consistently accurate; other crawlers may still read priority and changefreq as hints. If you set them, use 0.8-1.0 for core products/pages and 0.3-0.6 for archive or utility pages.
3. Handling Pagination & Filters
E-commerce sites often generate URLs like /category/electronics?page=2&sort=price_asc. Indexing every combination is counterproductive.
Instead, apply a rule-based filtering strategy:
- Index only page=1 for each category
- Exclude sorting, filtering, and session parameters
- Use <link rel="next"> and <link rel="prev"> in the HTML <head> for multi-page content (Google no longer uses these as an indexing signal, though other search engines may)
- Disallow filtered URLs in robots.txt if they lack substantial unique content
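function shouldIndex(url) {
  // url must be absolute, e.g. https://yoursite.com/category/electronics?page=2
  const params = new URL(url).searchParams;
  // Keep only the first page of paginated listings
  if (params.has('page') && params.get('page') !== '1') return false;
  // Drop sorted and filtered variants
  if (params.has('sort') || params.has('filter')) return false;
  // Drop tracking variants such as utm_source or utm_campaign (prefix match)
  if ([...params.keys()].some(key => key.startsWith('utm_'))) return false;
  return true;
}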
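Dropping variants is one option; another is to normalize each URL to its preferred form and deduplicate, so `?utm_source=...` and `?sort=...` variants collapse onto the clean category URL. The sketch below assumes your category route serves page 1 when `page` is absent and uses a hypothetical `STRIP_PARAMS` list; `normalizeUrl` and `dedupeUrls` are illustrative names.

```javascript
// Hypothetical list of parameters that never change the canonical page;
// adjust it to match your own routing.
const STRIP_PARAMS = ['sort', 'filter', 'sessionid', 'sid'];

// Normalize a URL to its preferred form, or return null if it should be left out.
function normalizeUrl(raw) {
  const url = new URL(raw);
  const page = url.searchParams.get('page');
  if (page !== null && page !== '1') return null; // deeper pages: leave out

  // Assumption: page=1 serves the same content as the bare category URL.
  url.searchParams.delete('page');

  // Copy the keys first, because deleting while iterating would skip entries.
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith('utm_') || STRIP_PARAMS.includes(key)) {
      url.searchParams.delete(key);
    }
  }

  // Re-serialize so an emptied query string leaves no trailing "?".
  url.search = url.searchParams.toString();
  url.hash = '';
  return url.toString();
}

// Normalize every URL and keep each preferred form once, in first-seen order.
function dedupeUrls(rawUrls) {
  const seen = new Set();
  for (const raw of rawUrls) {
    const normalized = normalizeUrl(raw);
    if (normalized !== null) seen.add(normalized);
  }
  return [...seen];
}
```

Normalizing pairs well with a rel="canonical" tag on the variant pages that points at the same preferred URL.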
4. Search Console Submission
Once generated, you need to notify search engines. Sitemap.xml handles this automatically via our indexing API, but you can also configure manual triggers:
# Trigger via CLI after deployment
sitemap-cli submit --project-id=proj_8x92 --notify=all
# Or use webhooks in your CI/CD pipeline
POST https://api.sitemap.xml/v1/notify
Headers: { "Authorization": "Bearer YOUR_KEY" }
Body: { "urls": ["https://yoursite.com/new-product"] }
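The same notify request can be sent from a Node.js deploy script with the built-in fetch (Node 18+). The endpoint, bearer header and { urls } body come from the example above; the Content-Type header, the error handling, and the injectable `fetchImpl` parameter (handy for testing) are assumptions, so check the API reference for the exact response format.

```javascript
// Sketch: POST new or updated URLs to the notify endpoint.
// Assumption: the API accepts a JSON body and signals failure with a non-2xx status.
async function notifySitemap(urls, { apiKey = process.env.SITEMAP_API_KEY, fetchImpl = fetch } = {}) {
  const res = await fetchImpl('https://api.sitemap.xml/v1/notify', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ urls }),
  });

  // Fail the pipeline step loudly instead of silently skipping the notification.
  if (!res.ok) {
    throw new Error(`Notify failed: HTTP ${res.status}`);
  }
  return res;
}
```

In CI/CD, call it after deployment with the URLs that changed in that release, e.g. `await notifySitemap(['https://yoursite.com/new-product'])`.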
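Once Google has fetched the sitemap, it appears in Search Console under Indexing > Sitemaps with its status and discovered URL count; processing can take anywhere from minutes to several days. Monitor it for crawl errors or exclusion reasons.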
Best Practices
- Split large sitemaps: Keep each file under 50MB uncompressed and no more than 50,000 URLs. Use a sitemap-index.xml to link them.
- Validate XML syntax: Malformed tags cause entire sitemaps to be ignored by parsers.
- Exclude thin content: Leave out tag pages with fewer than 100 words, empty categories, and URLs that return 404 or redirect.
- Automate regeneration: Regenerate when content is published or categories change, or on a nightly cron job.
- Monitor coverage: Use Search Console's Page indexing report (formerly Coverage) to spot indexing drops.
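Splitting can be done with a simple chunking pass plus an index file that lists each child sitemap. A minimal sketch: the child file names (sitemap-1.xml, sitemap-2.xml, ...) are a hypothetical convention, `baseUrl` is assumed not to need XML escaping, and only the 50,000-URL limit is enforced here, so check each rendered file's size (for example with Buffer.byteLength) against the 50MB limit separately.

```javascript
const MAX_URLS_PER_SITEMAP = 50000; // protocol limit per file

// Split a flat list of entries into chunks that respect the per-file URL limit.
function chunk(entries, size = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < entries.length; i += size) {
    chunks.push(entries.slice(i, i + size));
  }
  return chunks;
}

// Build a sitemap index that points at one child sitemap per chunk.
// Assumption: children are published as `${baseUrl}/sitemap-N.xml`.
function buildSitemapIndex(baseUrl, chunkCount, lastmod) {
  const items = [];
  for (let i = 1; i <= chunkCount; i++) {
    items.push(
      '  <sitemap>\n' +
      `    <loc>${baseUrl}/sitemap-${i}.xml</loc>\n` +
      `    <lastmod>${lastmod}</lastmod>\n` +
      '  </sitemap>'
    );
  }
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    items.join('\n') + '\n</sitemapindex>\n';
}
```

Submit only the index file to search engines; they discover the child sitemaps through it.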
Schema & Rich Results
While sitemaps handle discovery, pair them with structured data (JSON-LD) for product, article, and FAQ rich results to maximize SERP visibility.
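As an illustration, a product page's JSON-LD can be generated from the same records that feed the sitemap. This sketch uses real schema.org types (Product, Offer) but hypothetical record fields (name, description, sku, price, currency, inStock, slug); validate the output with Google's Rich Results Test before relying on it.

```javascript
// Build a schema.org Product object from a product record.
// Assumption: the record fields used here are hypothetical.
function productJsonLd(product) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
    description: product.description,
    sku: product.sku,
    url: `https://yoursite.com/${product.slug}`,
    offers: {
      '@type': 'Offer',
      price: product.price.toFixed(2),
      priceCurrency: product.currency,
      availability: product.inStock
        ? 'https://schema.org/InStock'
        : 'https://schema.org/OutOfStock',
    },
  };
}

// Serialize into a <script> tag for the page <head>. Escaping "<" as \u003c
// keeps a stray "</script>" inside product text from closing the tag early;
// JSON parsers decode it back to "<".
function jsonLdScriptTag(data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}
```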
Conclusion
Dynamic sitemap generation is essential for scalable web applications. By filtering low-value URLs, automating submissions, and maintaining clean XML structure, you ensure search engines efficiently discover your most important content.
Ready to implement? Use the Sitemap.xml API or drop us a line in support. In the next tutorial, we'll cover Image & Video Sitemap Extensions for Media-Heavy Sites.