| Type | Name | Comments |
|---|---|---|
| Web Tools | Stirling-PDF | - Robust - Locally hosted - Docker container based |
| | Parseur | - AI-based PDF parser |
| | DocuSign | - Great for converting PDF files to MS Office files, etc. - Non-free: 1 file per 30 minutes |
| | Free PDF Convert | - Great for converting PDF files to MS Office files, etc. - Non-free: 1 file per 30 minutes |
| | Adobe Rearrange PDF | - Sign-in needed - Paid service but free trial available |
| | I Love PDF | - No need to sign in - Paid service but free trial available |
| Linux Desktop | PDFArranger | - Open source and free - Easy to use |
| | Okular | - Supports annotating PDFs - Does NOT support removing/adding PDF pages |
| | Evince | - Most popular PDF viewer on Linux - Does NOT support editing PDF files in any way |
| | Master PDF Editor | - Free version available but with very limited features - Not recommended |
| macOS Desktop | Preview | - Default PDF viewer on macOS - Supports rotating, adding, and removing pages |
| Windows Desktop | Master PDF Editor | - Free version available but with very limited features - Not recommended |
| | Wondershare PDFelement | - Great one - Supports Chinese fonts when filling forms - Need to purchase a license |
| | PDFfiller | - Good one - Does NOT support Chinese fonts when filling forms |
| | Bluebeam Revu eXtreme | - Great one - Supports Chinese fonts when filling forms - Need to purchase a license, but a 30-day free trial is available |
| Python Libraries | PyPDF | A utility to read and write PDFs with Python. |
| | pdfplumber | Plumbs a PDF for detailed information about each char, rectangle, line, et cetera, and easily extracts text and tables. |
| | pdftotext | Great at parsing text from PDFs while keeping the original layout as much as possible. |
| | pdfminer.six | A Python library for parsing PDFs. It is good for manipulating PDF files but weak at parsing text from PDF files. |
| | camelot | A Python library for extracting data tables from PDF files. |
| | tabula-py | A Python binding for [tabulapdf/tabula](https://github.com/tabulapdf/tabula). |
| | tika-python | |
## Java Libraries
### tabulapdf/tabula
A Java library for liberating data tables trapped inside PDF files.
### apache/tika
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
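The "single interface" idea can be sketched from Python through tika-python, the binding listed in the table above. This is only a rough sketch, not Tika's Java API: `extract_text_and_metadata` and its `parse` parameter are hypothetical names introduced here, and only `tika.parser.from_file` comes from tika-python. It assumes `pip install tika` and a Java runtime on the machine.

```python
from typing import Callable, Optional, Tuple


def extract_text_and_metadata(
    path: str,
    parse: Optional[Callable[[str], dict]] = None,
) -> Tuple[str, dict]:
    """Return (text, metadata) for a file, whatever its type.

    `parse` is a hypothetical hook so the helper can be tried without Tika;
    by default it falls back to tika-python's parser.from_file.
    """
    if parse is None:
        # Lazy import: tika-python is third-party and starts a local
        # Tika server (Java required) on first use.
        from tika import parser
        parse = parser.from_file

    parsed = parse(path)
    # "content" can be None when no text could be extracted.
    text = (parsed.get("content") or "").strip()
    metadata = parsed.get("metadata") or {}
    return text, metadata
```

The same call is meant to work for a PDF, a PPT, or an XLS file; only the path changes. Passing a stub for `parse` lets the helper be exercised without a running Tika server.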
## Command-line Tools
### pdftk
A command-line tool for filling fields in PDF documents.
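Form filling with pdftk can be scripted. The sketch below builds a `pdftk <form> fill_form <data> output <result>` command from Python and runs it only if pdftk is installed. The function names and the file names in the usage note are hypothetical; the `fill_form`, `output`, and `flatten` keywords are pdftk's own.

```python
import shutil
import subprocess
from typing import List


def fill_form_command(form_pdf: str, fdf_data: str, output_pdf: str,
                      flatten: bool = False) -> List[str]:
    """Build the pdftk argument list that fills form_pdf with values from fdf_data."""
    command = ["pdftk", form_pdf, "fill_form", fdf_data, "output", output_pdf]
    if flatten:
        # "flatten" merges the field values into the pages so they are no longer editable.
        command.append("flatten")
    return command


def fill_form(form_pdf: str, fdf_data: str, output_pdf: str,
              flatten: bool = False) -> None:
    """Run pdftk; raises if pdftk is not installed or exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk is not on PATH")
    subprocess.run(
        fill_form_command(form_pdf, fdf_data, output_pdf, flatten),
        check=True,
    )
```

For example, `fill_form("form.pdf", "data.fdf", "filled.pdf", flatten=True)` would write a filled, non-editable copy, assuming `data.fdf` holds the field values in FDF format.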