textract
Extracts text from files of various types, including html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf, text/*, and various OpenOffice formats.
fast-xml-parser
Validates, parses, and builds XML without C/C++ based libraries.
jsdom
A JavaScript implementation of many web standards.