As part of the oacdp I convert sections of manuals into PDFs. I’ve been using imagemagick for a while to batch convert. Taking any number of images and converting them to a single PDF is a simple one-line command:
convert *.png -density 100% some.pdf
An annoying feature of imagemagick is that it pulls all the images (in the above case the png files) into memory, converts them in memory to a 16-bit (per color channel) representation, and only writes the content out to disk once everything is assembled. Normally this isn’t too bad: I’ve got 6 gigs of memory and 12 gigs of swap, so if it takes 3 gigs to process one document I can deal with it.
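To get a feel for why this blows up, here is a back-of-the-envelope estimate. It is only a sketch: it assumes 4 channels at 16 bits each (8 bytes per pixel), which is how a 16-bit build commonly stores pixels, and the page dimensions are made up for illustration, not measured from my scans. `estimate_mib` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough estimate of the memory needed to hold every page in memory at once.
# Assumption: 4 channels x 16 bits = 8 bytes per pixel.
# Assumption: the shell does 64-bit arithmetic (true for common bash/dash builds).
# estimate_mib is a hypothetical helper; arguments are width, height, page count.
estimate_mib() {
    width=$1
    height=$2
    pages=$3
    bytes=$((width * height * 8 * pages))
    echo $((bytes / 1024 / 1024))
}

# Hypothetical example: 317 pages at 2550x3300 pixels each.
estimate_mib 2550 3300 317
```

With those made-up dimensions the estimate lands at roughly 20 GiB, which is more than my memory and swap combined.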
While working on a type1 manual I hit the 8th section, which is 317 images, and found it impossible to process. I enlarged the amount of swap space, but the load on my system would rise above 15 and eventually either the process or my machine would crash.
Andrew Whitlock, the previous maintainer of the oacdp, confirmed that this had happened to him in the past and that he had swapped over to a Windows machine to use some application with a reasonable algorithm for converting to PDF. I, on the other hand, don’t have a Windows install, so I began researching online.
First, in digging around, I found an imagemagick-based project called graphicsmagick that is written specifically for multi-core machines like mine. I tried it out, but it uses algorithms similar to imagemagick’s for processing PDFs, so it really wasn’t an answer. It did improve processing times… it basically made things crash faster.
A little more searching revealed this thread, which describes converting each image to its own PDF and then assembling them using pdfjoin or the like. pdfjoin is fast, taking only a handful of seconds to process all 317 pages using pdflatex. The results are what I’d expect imagemagick to be able to produce.
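The two-step approach can be sketched roughly like this. It is a minimal sketch, not the exact commands from the thread: it assumes imagemagick’s `convert` and `pdfjoin` (from the pdfjam package) are on the PATH, and `pages_to_pdf` and the file names are hypothetical. On newer pdfjam releases the pdfjoin wrapper may be packaged separately.

```shell
#!/bin/sh
# Sketch: convert each image to its own single-page PDF, then join them.
# Assumes `convert` (imagemagick) and `pdfjoin` (pdfjam) are installed.
# pages_to_pdf is a hypothetical helper: first argument is the output PDF,
# the remaining arguments are the input images in page order.
pages_to_pdf() {
    out=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    i=0
    for img in "$@"; do
        i=$((i + 1))
        # Zero-padded names keep the glob below in page order.
        page=$(printf '%s/page-%04d.pdf' "$tmpdir" "$i")
        # One image per convert call, so only one page is in memory at a time.
        convert "$img" "$page" || { rm -rf "$tmpdir"; return 1; }
    done
    # pdfjoin stitches the single-page PDFs together in argument order.
    pdfjoin --outfile "$out" "$tmpdir"/page-*.pdf
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Example (hypothetical file names):
#   pages_to_pdf section8.pdf *.png
```

The point of the loop is that each `convert` call only ever sees a single image, so memory use stays flat no matter how many pages the section has.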
It would be nice if imagemagick would get its PDF processing up to snuff, but at least for now I have a workaround. Hopefully this can help other people converting PDFs with command line tools like imagemagick/graphicsmagick/pdfjoin.