ppu-pdf
Easily extract text from digital PDF files, including coordinates and font sizes, and optionally group text by line or render scanned PDFs to canvas/PNG.
n8n-nodes-extract-pdf
n8n node to extract text, images, and tables from PDFs, with multilingual support, language detection, and a comprehensive test suite
node-easyocr
A Node.js wrapper for the Python EasyOCR library
markdown-crawler
A powerful web crawler that extracts content from web pages and converts it to clean Markdown, with support for code blocks and GitHub Flavored Markdown