Text tokenization, transformation & analysis transducers, utilities, stop words, Porter stemming, vector encodings, similarities