simplecrawler

A very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.

spider-detector

A tiny Node module that quickly detects spiders/crawlers and comes with optional middleware for ExpressJS.