Editing PDF Files

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Type	Name	Comments
Web Tools	Stirling-PDF	- robust - locally hosted - Docker container based
	Parseur	- AI-based PDF parser
	DocuSign	- Great for converting PDF files to MS Office files, etc. - not fully free: limited to 1 file per 30 minutes
	Free PDF Convert	- Great for converting PDF files to MS Office files, etc. - not fully free: limited to 1 file per 30 minutes
	Adobe Rearrange PDF	- Sign-in needed - Paid service but free trial available
	iLovePDF	- No need to sign in - Paid service but free trial available
Linux Desktop	PDFArranger	- Open source and free - Easy to use
	Okular	- supports annotating PDFs - does NOT support removing/adding PDF pages
	Evince	- popular PDF viewer on Linux - does NOT support editing PDF files in any way
	Master PDF Editor	- Free version available but with very limited features. - Not recommended.
macOS Desktop	Preview	- Default PDF viewer on macOS - Supports rotating, adding and removing pages
Windows Desktop	Master PDF Editor	- Free version available but with very limited features. - Not recommended.
	Wondershare PDFelement	- Great one - supports Chinese fonts when filling forms - need to purchase a license
	PDFfiller	- good one - does NOT support Chinese fonts when filling forms
	Bluebeam Revu eXtreme	- Great one - supports Chinese fonts when filling forms - need to purchase a license but a 30-day free trial is available
Python Libraries	PyPDF	A utility to read and write PDFs with Python.
	pdfplumber	Plumbs a PDF for detailed information about each char, rectangle, line, et cetera, and easily extracts text and tables.
	pdftotext	Great at extracting text from PDFs while keeping the original layout as much as possible.
	pdfminer.six	A Python library for parsing PDF files. It is good for manipulating PDF files but weak at parsing text from PDF files.
	camelot	A Python library for extracting data tables in PDF files.
	tabula-py	A Python binding for [tabulapdf/tabula](https://github.com/tabulapdf/tabula).
	tika-python	A Python binding for Apache Tika.
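
Page-level edits such as removing or reordering pages are easy to script with the Python libraries above. The following is a minimal sketch, assuming the `pypdf` package (the current name of the PyPDF project) is installed. `parse_page_ranges` and `keep_pages` are hypothetical helper names made up for illustration, they are not part of any library.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into 0-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        indices.extend(range(start - 1, end))
    return indices


def keep_pages(src: str, dst: str, spec: str) -> None:
    """Write a copy of `src` that contains only the pages named in `spec`."""
    # Imported lazily so the range parser works without the dependency.
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as fh:
        writer.write(fh)
```

For example, `keep_pages("in.pdf", "out.pdf", "1-3,5")` would drop every page except pages 1, 2, 3 and 5 (the file names here are placeholders).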

Java Libraries

tabulapdf/tabula

A Java library for liberating data tables trapped inside PDF files.
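
From Python, tabula is usually driven through the tabula-py binding listed in the table above rather than through Java directly. A minimal sketch, assuming `tabula-py` and a Java runtime are installed; `csv_path` and `tables_to_csv` are hypothetical helper names used only for illustration.

```python
from pathlib import Path


def csv_path(pdf_path: str, index: int, out_dir: str = ".") -> Path:
    """Name the CSV file for the `index`-th table (0-based) of `pdf_path`."""
    return Path(out_dir) / f"{Path(pdf_path).stem}_table{index + 1}.csv"


def tables_to_csv(pdf_path: str, out_dir: str = ".") -> list[Path]:
    """Extract every table tabula detects and save each one as a CSV file."""
    # Imported lazily: tabula-py is third-party and needs Java at run time.
    import tabula  # pip install tabula-py

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    # read_pdf returns a list of pandas DataFrames, one per detected table.
    for index, frame in enumerate(tabula.read_pdf(pdf_path, pages="all")):
        target = csv_path(pdf_path, index, out_dir)
        frame.to_csv(target, index=False)
        written.append(target)
    return written
```

Table detection is heuristic, so it is worth checking the resulting CSV files by eye before relying on them.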

apache/tika

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
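
The "single interface" idea can be tried from Python via tika-python. This is a rough sketch, assuming the `tika` package and a Java runtime are available; `tidy_text` and `extract` are hypothetical names chosen for this example.

```python
def tidy_text(content):
    """Strip each line of the extracted text and drop the empty ones."""
    if not content:
        return ""
    lines = [line.strip() for line in content.splitlines()]
    return "\n".join(line for line in lines if line)


def extract(path: str) -> tuple[dict, str]:
    """Return (metadata, text) for a file that Tika is able to parse."""
    # Imported lazily: tika is third-party and talks to a Tika server.
    from tika import parser  # pip install tika

    parsed = parser.from_file(path)
    # "content" can be None, e.g. for a scanned PDF without a text layer.
    return parsed.get("metadata") or {}, tidy_text(parsed.get("content"))
```

The same `extract` call should work for a PPT or XLS file as well as a PDF, which is the main attraction of Tika over a PDF-only library.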

Command-line Tools

pdftk

A command-line tool for filling fields in PDF documents. It can also merge, split, and rotate pages.
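
Form filling with pdftk can be wrapped in a few lines of Python. A minimal sketch, assuming the `pdftk` binary is on `PATH` and the field values are already stored in an FDF or XFDF file; `fill_form_command` and `fill_form` are hypothetical helper names, and the file names in the usage note are placeholders.

```python
import shutil
import subprocess


def fill_form_command(form: str, data: str, output: str,
                      flatten: bool = True) -> list[str]:
    """Build the pdftk argument list for filling `form` from an FDF/XFDF file."""
    command = ["pdftk", form, "fill_form", data, "output", output]
    if flatten:
        # flatten merges the field values into the page so they are no longer editable
        command.append("flatten")
    return command


def fill_form(form: str, data: str, output: str, flatten: bool = True) -> None:
    """Run pdftk; raises if the binary is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk is not on PATH")
    subprocess.run(fill_form_command(form, data, output, flatten), check=True)
```

A call such as `fill_form("form.pdf", "data.fdf", "filled.pdf")` runs `pdftk form.pdf fill_form data.fdf output filled.pdf flatten`. The field names expected in the data file can be listed with `pdftk form.pdf dump_data_fields`.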

Ben Chuanlong Du's Blog
