Preprocess text in HTML files, replacing quotes, ellipses, and other characters with their encoded UTF equivalents.
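
As a rough illustration of the idea, here is a minimal sketch in Python. It is not the tool's actual implementation. It assumes "encoded UTF equivalent" means numeric character references such as `&#8220;`, and the names `encode_text` and `preprocess_html`, the character mapping, and the quote-direction heuristic are all hypothetical. Only text outside of tags is rewritten, so attribute values keep their straight quotes.

```python
import re

# Hypothetical mapping: the source does not say which characters are
# handled or which encoding is emitted, so these are assumptions.
ELLIPSIS = "&#8230;"
LDQUO, RDQUO = "&#8220;", "&#8221;"
LSQUO, RSQUO = "&#8216;", "&#8217;"

# Splitting with a capturing group keeps the tags in the result list,
# so text segments land on even indices and tags on odd indices.
TAG_SPLIT = re.compile(r"(<[^>]*>)")

# A quote counts as "opening" at the start of a segment or right after
# whitespace or an opening bracket (a simple heuristic, not a full parser).
OPEN_DOUBLE = re.compile(r'(?:^|(?<=[\s(\[{]))"')
OPEN_SINGLE = re.compile(r"(?:^|(?<=[\s(\[{]))'")


def encode_text(text: str) -> str:
    """Replace ellipses and straight quotes in a plain-text segment."""
    text = text.replace("...", ELLIPSIS)
    text = OPEN_DOUBLE.sub(LDQUO, text)
    text = text.replace('"', RDQUO)   # any remaining double quote closes
    text = OPEN_SINGLE.sub(LSQUO, text)
    text = text.replace("'", RSQUO)   # also covers apostrophes (don't)
    return text


def preprocess_html(html: str) -> str:
    """Apply encode_text to text between tags, leaving tags untouched."""
    parts = TAG_SPLIT.split(html)
    for i in range(0, len(parts), 2):  # even indices are text segments
        parts[i] = encode_text(parts[i])
    return "".join(parts)


if __name__ == "__main__":
    sample = '<p class="x">"Wait..." she said, \'no\'.</p>'
    print(preprocess_html(sample))
    # <p class="x">&#8220;Wait&#8230;&#8221; she said, &#8216;no&#8217;.</p>
```

This sketch has known gaps: it does not skip `<script>` or `<style>` contents, and a quote that sits alone between two tags is always treated as an opening quote. A real preprocessor would likely use an HTML parser rather than a regular expression to separate tags from text.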