Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

markitdown

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. In this respect it is most comparable to textract, but with a focus on preserving important document structure and content as Markdown (headings, lists, tables, links, etc.). While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools and may not be the best option for high-fidelity document conversions for human consumption.
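A minimal sketch of calling MarkItDown from Python, assuming the `markitdown` package is installed. The helper names `markdown_path_for` and `convert_to_markdown` are hypothetical, introduced only for illustration; the part that comes from the library is `MarkItDown().convert(...)` and the `text_content` attribute of its result.

```python
from pathlib import Path


def markdown_path_for(source: str) -> Path:
    """Hypothetical helper: place the .md output next to the source file."""
    return Path(source).with_suffix(".md")


def convert_to_markdown(source: str) -> Path:
    """Convert one file to Markdown and write the result beside the original."""
    # Imported lazily so the path helper works even without markitdown installed.
    from markitdown import MarkItDown

    result = MarkItDown().convert(source)
    target = markdown_path_for(source)
    target.write_text(result.text_content, encoding="utf-8")
    return target


if __name__ == "__main__":
    import sys

    # Usage (assumed): python to_markdown.py report.docx slides.pptx
    for name in sys.argv[1:]:
        print(convert_to_markdown(name))
```

Writing the output next to the input is only one possible layout; for a text analysis pipeline it may be simpler to keep `result.text_content` in memory.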

ImageMagick

ImageMagick provides a command, convert, which converts between image formats.
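As a rough sketch, the conversion can be scripted from Python by shelling out to ImageMagick. This assumes ImageMagick is installed and on the PATH; the function names are hypothetical. ImageMagick 7 ships a `magick` executable and treats `convert` as a legacy name, so the sketch prefers `magick` when it is found.

```python
import shutil
import subprocess


def build_convert_command(src: str, dst: str, executable: str = "convert") -> list[str]:
    """Build the argument list; ImageMagick infers the formats from the file extensions."""
    return [executable, src, dst]


def convert_image(src: str, dst: str) -> None:
    """Run the conversion, preferring `magick` (ImageMagick 7) over legacy `convert`."""
    executable = "magick" if shutil.which("magick") else "convert"
    # check=True raises CalledProcessError if ImageMagick reports a failure.
    subprocess.run(build_convert_command(src, dst, executable), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert_image("figure.png", "figure.jpg")
```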

HTML2Image

Sphinx

pandoc (converts between markup languages)

markdown (Markdown to HTML)

tex4ht (converts LaTeX code to HTML)

wkhtmltopdf (HTML to PDF, supports URLs, best)

rmarkdown (an R package)

CutyCapt

pdflatex

dos2unix, unix2dos
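For reference, dos2unix and unix2dos convert text files between Windows (CRLF) and Unix (LF) line endings. The core transformation can be sketched in a few lines of Python; the function names below are hypothetical, and the real tools handle more cases (binary files, encodings, in-place rewriting) than this sketch does.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Replace every LF with CRLF."""
    # Normalize first so existing CRLF pairs are not turned into CR CR LF.
    return dos_to_unix(data).replace(b"\n", b"\r\n")
```

Working on bytes rather than str avoids Python's own newline translation when reading and writing files.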