pdf.tocgen


                         in.pdf
                            │
     ┌──────────────────────┼────────────────────┐
     │                      │                    │
     ▽                      ▽                    ▽
┌──────────┐  recipe  ┌───────────┐   ToC   ┌──────────┐
│ pdfxmeta ├─────────▷│ pdftocgen ├────────▷│ pdftocio ├───▷ out.pdf
└──────────┘          └───────────┘         └──────────┘

pdf.tocgen is a set of command-line tools for automatically extracting and generating the table of contents (ToC) of a PDF file. It uses the embedded font attributes and position of headings to deduce the basic outline of a PDF file.
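To make the idea of deducing an outline from font attributes and heading positions more concrete, here is a minimal Python sketch. It does not read a real PDF and is not how pdf.tocgen itself is implemented: the Span and HeadingFilter classes, the build_outline function, the font names, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A run of text with its font attributes (hypothetical structure)."""
    text: str
    font: str
    size: float
    page: int
    y: float  # vertical position on the page; smaller means higher up


@dataclass
class HeadingFilter:
    """Describes what a heading of a given level looks like (hypothetical)."""
    level: int
    font: str
    size: float
    tolerance: float = 0.1

    def matches(self, span: Span) -> bool:
        # A span counts as a heading if font name and size both agree.
        return span.font == self.font and abs(span.size - self.size) <= self.tolerance


def build_outline(spans, filters):
    """Return (level, title, page) tuples in reading order."""
    outline = []
    # Position decides the order: first by page, then top to bottom.
    for span in sorted(spans, key=lambda s: (s.page, s.y)):
        for f in filters:
            if f.matches(span):
                outline.append((f.level, span.text, span.page))
                break  # the first matching filter wins
    return outline


# Made-up sample data, deliberately out of order.
spans = [
    Span("1.1 Section", "Times-Bold", 12.0, 11, 300.0),
    Span("Chapter 1", "Times-Bold", 18.0, 11, 80.0),
    Span("Some body text", "Times-Roman", 10.0, 11, 120.0),
    Span("Chapter 2", "Times-Bold", 18.0, 20, 80.0),
]
filters = [
    HeadingFilter(level=1, font="Times-Bold", size=18.0),
    HeadingFilter(level=2, font="Times-Bold", size=12.0),
]

outline = build_outline(spans, filters)
for level, title, page in outline:
    print("    " * (level - 1) + f'"{title}" {page}')
```

In this sketch the filters play the role of a recipe: each one pins a heading level to a font name and size, and the page and vertical position only determine the order of the entries.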

For example, Paul Graham’s book On Lisp is available for download on his website as a PDF, but it comes without a table of contents. We can use the pdfxmeta command to build a recipe file, which pdftocgen then uses to generate a ToC. The ToC format is intentionally designed to be easy to edit by hand (for example in Vim), since the output of pdftocgen is expected to be inaccurate in many cases and you are likely to tweak the table of contents before you import it into the original PDF file. If you are an Emacs user, you could install Daniel Nicolai’s toc-mode package as a GUI front end for pdf.tocgen, though it offers many more features, such as extracting a (printed) table of contents from a PDF file.
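Since the generated ToC is expected to need hand edits, a quick sanity check before importing it can catch slips such as a skipped nesting level or a page number that runs backwards. The Python sketch below assumes a simplified layout in which each line holds a quoted title followed by a page number and nesting is expressed by four spaces of indentation per level; the actual pdf.tocgen format may differ, and parse_toc and check_toc are hypothetical helpers, not part of the tool.

```python
import re

# Assumed line layout: optional indentation, a quoted title, then a page number.
LINE = re.compile(r'^(?P<indent> *)"(?P<title>.*)"\s+(?P<page>\d+)\s*$')


def parse_toc(text, indent_width=4):
    """Parse ToC text into (level, title, page) tuples."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        indent = len(m.group("indent"))
        if indent % indent_width:
            raise ValueError(f"line {lineno}: indentation is not a multiple of {indent_width}")
        level = indent // indent_width + 1
        entries.append((level, m.group("title"), int(m.group("page"))))
    return entries


def check_toc(entries):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    prev_level, prev_page = 0, 0
    for level, title, page in entries:
        if level > prev_level + 1:
            problems.append(f"{title!r}: level jumps from {prev_level} to {level}")
        if page < prev_page:
            problems.append(f"{title!r}: page {page} comes before page {prev_page}")
        prev_level, prev_page = level, page
    return problems


# Made-up sample input in the assumed layout.
sample = '"Chapter 1" 11\n    "1.1 Section" 12\n"Chapter 2" 20\n'
entries = parse_toc(sample)
problems = check_toc(entries)
print(entries)
print(problems or "no problems found")
```

The checks are deliberately conservative: they only flag structural slips and say nothing about whether a title or page number is actually right, which still needs a human eye.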
