hittp

HTTP library designed specifically for crawling the web, with built-in caching and per-domain queueing.
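
As a rough illustration of what per-domain queueing means for a crawler, here is a minimal sketch. It is not hittp's actual API: the `DomainQueue` class, its method names, and the `delayMs` parameter are all hypothetical, and caching is left out. The idea is that pending URLs are grouped by hostname, and each host is only handed out again after a minimum delay.

```javascript
// Hypothetical sketch of per-domain queueing; NOT hittp's real interface.
class DomainQueue {
  constructor(delayMs = 1000) {
    this.delayMs = delayMs;   // assumed minimum gap between requests to one host
    this.queues = new Map();  // hostname -> array of pending URLs
    this.readyAt = new Map(); // hostname -> earliest time the host may be hit again
  }

  // Add a URL to the queue belonging to its hostname.
  enqueue(url) {
    const host = new URL(url).hostname;
    if (!this.queues.has(host)) this.queues.set(host, []);
    this.queues.get(host).push(url);
  }

  // Return the next URL whose host is allowed at time `now` (in ms),
  // or null if every host with pending URLs is still cooling down.
  next(now = Date.now()) {
    for (const [host, pending] of this.queues) {
      if (pending.length === 0) continue;
      if ((this.readyAt.get(host) ?? 0) > now) continue;
      this.readyAt.set(host, now + this.delayMs);
      return pending.shift();
    }
    return null;
  }
}
```

Passing the clock in as `now` keeps the sketch deterministic; a real crawler would more likely schedule the next request with a timer.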

getsitemap

Node.js module that recursively crawls a website's sitemap and returns a stream of URLs.
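
The recursion comes from the sitemap format itself: a sitemap index file lists further sitemaps, while a regular sitemap lists page URLs. The sketch below shows one way this could work. It is not getsitemap's actual API: `extractLocs`, `crawlSitemap`, and the injected `fetchText` function are hypothetical names, and an async generator stands in for the stream. The regex-based parsing is a simplification that does not decode XML entities.

```javascript
// Hypothetical sketch of recursive sitemap crawling; NOT getsitemap's real interface.

// Pull every <loc> value out of a sitemap document and classify it:
// in a <sitemapindex> the entries are child sitemaps, otherwise page URLs.
function extractLocs(xml) {
  const locs = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
  const isIndex = /<sitemapindex[\s>]/.test(xml);
  return isIndex ? { sitemaps: locs, urls: [] } : { sitemaps: [], urls: locs };
}

// Yield page URLs one at a time, descending into child sitemaps.
// `fetchText` is an assumed async function: (url) => Promise<string>.
// `seen` guards against sitemaps that reference each other.
async function* crawlSitemap(sitemapUrl, fetchText, seen = new Set()) {
  if (seen.has(sitemapUrl)) return;
  seen.add(sitemapUrl);
  const { sitemaps, urls } = extractLocs(await fetchText(sitemapUrl));
  yield* urls;
  for (const child of sitemaps) {
    yield* crawlSitemap(child, fetchText, seen);
  }
}
```

The generator can be consumed with `for await (const url of crawlSitemap(...))`, or wrapped with `require("node:stream").Readable.from(...)` if an actual Node.js stream is wanted.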