| Type | Name | Comments |
| --- | --- | --- |
| Web Tools | Stirling-PDF | Robust; locally hosted; Docker container based |
| | Parseur | AI-based PDF parser |
| | DocuSign | Great for converting PDF files to MS Office files, etc.; non-free: 1 file per 30 minutes |
| | Free PDF Convert | Great for converting PDF files to MS Office files, etc.; non-free: 1 file per 30 minutes |
| | Adobe Rearrange PDF | Sign-in required; paid service, but a free trial is available |
| | I Love PDF | No sign-in needed; paid service, but a free trial is available |
| Linux Desktop | PDFArranger | Open source and free; easy to use |
| | Okular | Supports annotating PDFs; does NOT support removing/adding PDF pages |
| | Evince | Most popular PDF viewer on Linux; does NOT support editing PDF files in any way |
| | Master PDF Editor | Free version available, but with very limited features; not recommended |
| macOS Desktop | Preview | Default PDF viewer on macOS; supports rotating, adding, and removing pages |
| Windows Desktop | Master PDF Editor | Free version available, but with very limited features; not recommended |
| | Wondershare PDFelement | Great; supports Chinese fonts when filling forms; requires purchasing a license |
| | PDFfiller | Good; does NOT support Chinese fonts when filling forms |
| | Bluebeam Revu eXtreme | Great; supports Chinese fonts when filling forms; requires purchasing a license, but a 30-day free trial is available |
| Python Libraries | PyPDF | A utility to read and write PDFs with Python. |
| | pdfplumber | Plumbs a PDF for detailed information about each char, rectangle, line, et cetera, and easily extracts text and tables. |
| | pdftotext | Great at parsing text from PDFs while keeping the original layout as much as possible. |
| | pdfminer.six | A Python library for parsing PDFs. It is good for manipulating PDF files but weak at parsing text from them. |
| | camelot | A Python library for extracting data tables from PDF files. |
| | tabula-py | A Python binding for [tabulapdf/tabula](https://github.com/tabulapdf/tabula). |
| | tika-python | |
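
To illustrate the kind of page-level editing PyPDF can do (which viewers such as Evince cannot), here is a minimal sketch that writes a copy of a PDF without selected pages. It assumes the current `pypdf` package is installed (`pip install pypdf`). The helper names `pages_to_keep` and `remove_pages` are hypothetical and chosen only for this example.

```python
from typing import Iterable, List


def pages_to_keep(page_count: int, drop: Iterable[int]) -> List[int]:
    """Return the 0-based page indices that remain after dropping `drop`."""
    dropped = set(drop)
    return [i for i in range(page_count) if i not in dropped]


def remove_pages(src: str, dst: str, drop: Iterable[int]) -> None:
    """Write a copy of `src` to `dst` without the 0-based pages in `drop`."""
    # Imported lazily so the pure helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    # Copy every page that is not marked for removal, in original order.
    for index in pages_to_keep(len(reader.pages), drop):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as out:
        writer.write(out)
```

For example, `remove_pages("input.pdf", "output.pdf", drop=[0, 2])` would drop the first and third pages; the file names here are placeholders. Reordering pages works the same way: pass the indices to `add_page` in whatever order is wanted.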

Java Libraries

tabulapdf/tabula

A Java library for liberating data tables trapped inside PDF files.

apache/tika

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Command-line Tools

pdftk

A command-line tool for filling fields in PDF documents.
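
Form filling with pdftk can be scripted. The sketch below builds and runs a `pdftk ... fill_form ... output ...` command from Python. It assumes the field values have already been saved to an FDF file and that the `pdftk` executable is on `PATH`. The function names `build_fill_form_command` and `fill_form` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_fill_form_command(
    form_pdf: str, fdf_file: str, output_pdf: str, flatten: bool = True
) -> List[str]:
    """Build the pdftk argument list for filling a PDF form from an FDF file."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_file, "output", output_pdf]
    if flatten:
        # "flatten" merges the filled values into the page so they are no longer editable.
        cmd.append("flatten")
    return cmd


def fill_form(
    form_pdf: str, fdf_file: str, output_pdf: str, flatten: bool = True
) -> None:
    """Run pdftk to fill `form_pdf` with the data in `fdf_file`."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on PATH")
    cmd = build_fill_form_command(form_pdf, fdf_file, output_pdf, flatten)
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the `subprocess` call makes the argument list easy to inspect or log before anything is executed. The field names needed to prepare the FDF file can be listed with `pdftk form.pdf dump_data_fields`.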

References