Ben Chuanlong Du's Blog


Editing PDF Files

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

| Type | Name | Comments |
| --- | --- | --- |
| Web Tools | Parseur | AI-based PDF parser |
| Web Tools | DocuSign | Great for converting PDF files to MS Office files, etc.; non-free: limited to 1 file per 30 minutes |
| Web Tools | Free PDF Convert | Great for converting PDF files to MS Office files, etc.; non-free: limited to 1 file per 30 minutes |
| Web Tools | Adobe Rearrange PDF | Sign-in needed; paid service but free trial available |
| Web Tools | I Love PDF | No sign-in needed; paid service but free trial available |
| Linux Desktop | PDFArranger | Open source and free; easy to use |
| Linux Desktop | Okular | Supports annotating PDFs; does NOT support removing/adding PDF pages |
| Linux Desktop | Evince | Most popular PDF viewer on Linux; does NOT support editing PDF files in any way |
| Linux Desktop | Master PDF Editor | Free version available but with very limited features; not recommended |
| macOS Desktop | Preview | Default PDF viewer on macOS; supports rotating, adding and removing pages |
| Windows Desktop | Master PDF Editor | Free version available but with very limited features; not recommended |
| Windows Desktop | Wondershare PDFelement | Great one; supports Chinese fonts when filling forms; requires a purchased license |
| Windows Desktop | PDFfiller | Good one; does NOT support Chinese fonts when filling forms |
| Windows Desktop | Bluebeam Revu eXtreme | Great one; supports Chinese fonts when filling forms; requires a license, but a 30-day free trial is available |
| Python Libraries | PyPDF | A utility to read and write PDFs with Python. |
| Python Libraries | pdfplumber | Plumbs a PDF for detailed information about each char, rectangle, line, et cetera, and easily extracts text and tables. |
| Python Libraries | pdftotext | Great at extracting text from PDFs while keeping the original layout as much as possible. |
| Python Libraries | pdfminer.six | A Python library for parsing PDFs. It is good for manipulating PDF files but weak at parsing text from PDF files. |
| Python Libraries | camelot | A Python library for extracting data tables from PDF files. |
| Python Libraries | tabula-py | A Python binding for [tabulapdf/tabula](https://github.com/tabulapdf/tabula). |
| Python Libraries | tika-python | |
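
To illustrate the kind of page-level editing the PyPDF entry above is useful for (the same rearranging that PDFArranger or Preview do interactively), here is a minimal sketch. It assumes the current `pypdf` package is installed (`pip install pypdf`). The function names `parse_page_spec` and `rearrange_pdf` and the file names in the usage note are hypothetical, chosen only for this example.

```python
from typing import List


def parse_page_spec(spec: str, num_pages: int) -> List[int]:
    """Turn a 1-based page spec such as "3,1-2" into 0-based page indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # An inclusive range such as "1-2".
            start, end = (int(x) for x in part.split("-", 1))
            indices.extend(range(start - 1, end))
        else:
            indices.append(int(part) - 1)
    for index in indices:
        if not 0 <= index < num_pages:
            raise ValueError(f"page {index + 1} is outside 1..{num_pages}")
    return indices


def rearrange_pdf(src: str, dst: str, spec: str) -> None:
    """Write the pages of src selected by spec, in that order, to dst."""
    # Third-party dependency (assumed installed), imported lazily so the
    # helper above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_spec(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as fh:
        writer.write(fh)
```

For example, `rearrange_pdf("input.pdf", "output.pdf", "3,1-2")` would put page 3 in front of pages 1 and 2 and drop every other page.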

Java Libraries

tabulapdf/tabula

A Java library for liberating data tables trapped inside PDF files.
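
From Python, Tabula is usually reached through the tabula-py binding listed above. The following is a rough sketch, assuming `tabula-py` and a Java runtime are installed and that `tabula.read_pdf` returns a list of pandas DataFrames (its default in recent versions). The helper names `largest_table` and `extract_main_table` are hypothetical.

```python
from typing import Any, Optional, Sequence


def largest_table(tables: Sequence[Any]) -> Optional[Any]:
    """Return the table with the most rows, or None if nothing was found."""
    return max(tables, key=len) if tables else None


def extract_main_table(pdf_path: str) -> Optional[Any]:
    """Extract all tables from a PDF and keep the one with the most rows."""
    # Third-party dependency (assumed installed; needs Java), imported lazily.
    import tabula

    # Assumption: with default settings this is a list of pandas DataFrames.
    tables = tabula.read_pdf(pdf_path, pages="all")
    return largest_table(tables)
```

Picking the table with the most rows is only a heuristic for PDFs where extraction also returns small fragments; inspect all returned tables when accuracy matters.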

apache/tika

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
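
The "single interface" idea can be sketched from Python with the tika-python package listed above. This assumes `tika` is installed and a Java runtime is available (the library starts a local Tika server on first use). The function names `normalize_text` and `extract_text` are hypothetical.

```python
import re
from typing import Optional


def normalize_text(content: Optional[str]) -> str:
    """Strip trailing spaces and collapse runs of blank lines in extracted text."""
    if not content:
        return ""
    lines = [line.rstrip() for line in content.splitlines()]
    text = "\n".join(lines)
    # Three or more consecutive newlines become a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def extract_text(path: str) -> str:
    """Extract text from a PDF (or any other format Tika understands)."""
    # Third-party dependency (assumed installed; needs Java), imported lazily.
    from tika import parser

    parsed = parser.from_file(path)
    # "content" may be None, e.g. for scanned PDFs without a text layer.
    return normalize_text(parsed.get("content"))
```

The same `extract_text` call should work unchanged for PPT or XLS files, which is the main appeal of Tika over PDF-only libraries.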

Command-line Tools

pdftk

A command-line tool for manipulating PDF documents (merging, splitting, rotating pages) and filling form fields.
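
As a hedged sketch of form filling with pdftk driven from Python: the snippet below builds a `fill_form` command and runs it. It assumes `pdftk` is on the `PATH` and that the field values are already available in an FDF file (one way to get a template is `pdftk form.pdf generate_fdf output data.fdf`). The function names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def fill_form_command(form_pdf: str, fdf_file: str, output_pdf: str,
                      flatten: bool = True) -> List[str]:
    """Build the pdftk argument list for filling a PDF form from an FDF file."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_file, "output", output_pdf]
    if flatten:
        # "flatten" merges the filled values into the page so that
        # the fields can no longer be edited.
        cmd.append("flatten")
    return cmd


def fill_form(form_pdf: str, fdf_file: str, output_pdf: str,
              flatten: bool = True) -> None:
    """Run pdftk to fill the form; raises if pdftk is missing or fails."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    subprocess.run(
        fill_form_command(form_pdf, fdf_file, output_pdf, flatten),
        check=True,
    )
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.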

