[2009-09-23] PDF Manipulation Tools

I wanted to manipulate a few PDF files recently and was on the lookout for suitable tools. More specifically, I wanted to convert a few double-page PDF files (where each page holds two pages of text side by side) into single-page PDF files. I also wanted to drop some of the pages so that the files contained just the text I was interested in. Fortunately for me, there are several freely available tools that do the job well.

The Perl PDF::API2 CPAN module is fairly versatile and quite handy if you know a bit of programming. For example, this script by "iblis" converts a double-page PDF into a single-page PDF. (There is a slight bug in that script - you should remove the quotes on line #28 or your output files will always literally be named "$newfilename".) According to its web-site, PDF::API2 is unfortunately no longer being maintained, though that certainly does not diminish its utility.
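To make the double-page conversion a little more concrete, here is a minimal sketch in Python of the arithmetic that any such splitter has to perform. It is not the script by "iblis" and it does not use PDF::API2; the names Box and split_two_up are made up for illustration, and it assumes the two pages of text sit side by side and share the sheet equally.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A PDF-style rectangle: lower-left and upper-right corners, in points."""
    llx: float
    lly: float
    urx: float
    ury: float


def split_two_up(media_box: Box) -> Tuple[Box, Box]:
    """Return the (left, right) halves of a sheet holding two pages side by side.

    Assumption: the sheet is divided exactly down the middle, with no gutter.
    """
    mid = (media_box.llx + media_box.urx) / 2
    left = Box(media_box.llx, media_box.lly, mid, media_box.ury)
    right = Box(mid, media_box.lly, media_box.urx, media_box.ury)
    return left, right


if __name__ == "__main__":
    # An A4 sheet in landscape orientation is roughly 842 x 595 points.
    for half in split_two_up(Box(0, 0, 842, 595)):
        print(half)
```

A real tool would then emit each sheet twice, once cropped to each half, which is where a library such as PDF::API2 comes in.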

The pdftk command-line tool is quite useful for a number of tasks on PDF files. For example, to extract pages 2 to 10 and 15 to 23 from a PDF file named "foo.pdf" and create a PDF file named "bar.pdf", you can execute:

  pdftk A=foo.pdf cat A2-10 A15-23 output bar.pdf

See its web-site for a number of other examples that show its power. Its command-line syntax takes a little while to get used to, but that's worth the effort. Note that the author of the tool uses GCJ to create standalone executables, especially on Windows - it is gratifying to realise that yours truly had a part to play, however small, in making this happen.
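If you need to extract pages from many files, the page-range command shown above can be assembled by a small script. The following Python sketch only builds the argument list; pdftk_cat_args is a hypothetical helper name, and the helper deliberately accepts only ascending page ranges, which is a simplification of what pdftk itself allows.

```python
from typing import List, Sequence, Tuple


def pdftk_cat_args(
    src: str,
    dst: str,
    ranges: Sequence[Tuple[int, int]],
    handle: str = "A",
) -> List[str]:
    """Build a 'pdftk ... cat ... output ...' argument list.

    Each (first, last) pair is an inclusive, 1-based page range.
    Assumption: 'handle' is an upper-case letter used as the input handle.
    """
    if not ranges:
        raise ValueError("at least one page range is required")
    args = ["pdftk", f"{handle}={src}", "cat"]
    for first, last in ranges:
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {first}-{last}")
        args.append(f"{handle}{first}-{last}")
    args += ["output", dst]
    return args


if __name__ == "__main__":
    # Prints the same command line as the example in the text.
    print(" ".join(pdftk_cat_args("foo.pdf", "bar.pdf", [(2, 10), (15, 23)])))
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where pdftk is installed; passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.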

I also looked at some other tools, notably PDF Split and Merge (PDFsam) and PDFill. PDFsam is written in Java and looks promising; unfortunately for me, a warning tone was all that I could get out of it as I tried out its different plug-ins. I didn't get around to trying PDFill as pdftk and PDF::API2 were more than enough for my purpose.

(Originally posted on Blogspot.)
