ilanguage

A language-independent 'word finding' tool, useful for stemming, tokenizing, indexing, spell checking, and other common NLP tasks. It works on any human language and any Unicode character set, and learns from the data you give it. (Uses compression, maximum entro

html-stemmer

Extracts all [porter2] stemmed words from an HTML file, with the goal of aiding web-based NLP
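
As a rough illustration of the idea (not the project's actual code), the sketch below pulls the visible words out of an HTML string using only the Python standard library and passes each one through a pluggable stemmer. The names `stemmed_words` and `_TextExtractor` are hypothetical, and the default `stem` argument is an identity function used as a placeholder, not Porter2 itself.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def stemmed_words(html, stem=lambda word: word):
    """Return the sorted set of stemmed words found in an HTML string.

    `stem` is any callable mapping a word to its stem. The default is an
    identity placeholder (an assumption for this sketch, not Porter2).
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Runs of Unicode letters only: no digits, underscores, or punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted({stem(word) for word in words})
```

To get Porter2 stems, a real stemmer can be passed in. For example, if NLTK is installed, `SnowballStemmer("english").stem` from `nltk.stem.snowball` implements the Porter2 (Snowball English) algorithm and can be supplied as the `stem` argument.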