Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Type | Name | Comments |
---|---|---|
Web Tools | Parseur | AI-based PDF parser |
Web Tools | DocuSign | Great for converting PDF files to MS Office files, etc.; free use limited to 1 file per 30 minutes |
Web Tools | Free PDF Convert | Great for converting PDF files to MS Office files, etc.; free use limited to 1 file per 30 minutes |
Web Tools | Adobe Rearrange PDF | Sign-in needed; paid service but free trial available |
Web Tools | I Love PDF | No sign-in needed; paid service but free trial available |
Linux Desktop | PDFArranger | Open source and free; easy to use |
Linux Desktop | Okular | Supports annotating PDFs; does NOT support removing/adding PDF pages |
Linux Desktop | Evince | Most popular PDF viewer on Linux; does NOT support editing PDF files in any way |
Linux Desktop | Master PDF Editor | Free version available but with very limited features; not recommended |
macOS Desktop | Preview | Default PDF viewer on macOS; supports rotating, adding and removing pages |
Windows Desktop | Master PDF Editor | Free version available but with very limited features; not recommended |
Windows Desktop | Wondershare PDFelement | Great one; supports Chinese fonts when filling forms; need to purchase a license |
Windows Desktop | PDFfiller | Good one; does NOT support Chinese fonts when filling forms |
Windows Desktop | Bluebeam Revu eXtreme | Great one; supports Chinese fonts when filling forms; need to purchase a license, but a 30-day free trial is available |
Python Libraries | PyPDF | A utility to read and write PDFs with Python. |
Python Libraries | pdfplumber | Plumbs a PDF for detailed information about each char, rectangle, line, et cetera, and easily extracts text and tables. |
Python Libraries | pdftotext | Great at parsing text from PDFs while keeping the original layout as much as possible. |
Python Libraries | pdfminer.six | A Python library for parsing PDFs. It is good for manipulating PDF files but weak at parsing text from PDF files. |
Python Libraries | camelot | A Python library for extracting data tables from PDF files. |
Python Libraries | tabula-py | A Python binding for [tabulapdf/tabula](https://github.com/tabulapdf/tabula). |
Python Libraries | tika-python | |
Java Libraries | tabulapdf/tabula | A Java library for liberating data tables trapped inside PDF files. |
Java Libraries | apache/tika | The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more. |
Command-line Tools | pdftk | A command-line tool for filling fields in PDF docs. |
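
Since pdftk is listed above for filling form fields, here is a minimal sketch of how it might be driven from Python. It assumes pdftk is installed and on `PATH`, and that the field values have already been saved in an FDF file. The helper names (`build_fill_form_command`, `fill_form`) and the file names are hypothetical, chosen only for illustration. The `fill_form`, `output` and `flatten` keywords are pdftk's own.

```python
import shutil
import subprocess
from typing import List


def build_fill_form_command(form_pdf: str, data_fdf: str, output_pdf: str,
                            flatten: bool = False) -> List[str]:
    """Build the argument list for: pdftk <form> fill_form <data> output <out> [flatten]."""
    command = ["pdftk", form_pdf, "fill_form", data_fdf, "output", output_pdf]
    if flatten:
        # "flatten" merges the field values into the pages so they are no longer editable.
        command.append("flatten")
    return command


def fill_form(form_pdf: str, data_fdf: str, output_pdf: str,
              flatten: bool = False) -> None:
    """Run pdftk to fill the form; raises if pdftk is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    subprocess.run(
        build_fill_form_command(form_pdf, data_fdf, output_pdf, flatten),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(" ".join(build_fill_form_command("form.pdf", "data.fdf", "filled.pdf", flatten=True)))
```

To find out which field names a form expects before preparing the FDF file, `pdftk form.pdf dump_data_fields` lists them.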