syphonx
SyphonX is a tool that extracts data from HTML and transforms it into JSON of any shape or size. It combines the power of CSS selectors, jQuery, regular expressions, and JavaScript into a declarative template format to elegantly solve the simplest
mrspider
Simple, polite crawling of the web.
roboto
A web crawler for Node.js.
spider2
A second-generation spider that crawls any article site, automatically reading the title and content.