I wrote a few posts ago about the joys of converting a bunch of images into a pdf with a single command. Well, I’ve been working with a set of tiff files that Andrew Whitlock gave me for the oacdp project. They are composite tiffs, meaning that each file is itself made up of a collection of tiff images.
My goal is to extract the tiff files, convert them to png, and then create a pdf from those pngs. I chose imagemagick because it’s been a great tool and is straightforward to use. I found that wasn’t quite the case for these files.
Firstly the tiff file needs to be broken into its component images:
convert composite.tiff image-%03d.png
This produces a set of files that are named “image-000.png” and up.
Secondly the png files need to be assembled into a pdf file:
convert image-*.png final.pdf
As discussed in my previous post, it’s not that easy: imagemagick handles memory poorly when creating a pdf, so I used the pdfjoin technique from that post:
pdfjoin --outfile final.pdf image-*.png
All is good.
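As a rough shell sketch of one way to keep each call small, every png can be converted to its own single-page pdf and the pdfs joined afterwards, which is also what the Lisp code further down does. `pngs_to_pdf` and the `CONVERT`/`PDFJOIN` override variables are hypothetical names made up for this illustration; `--fitpaper false` and `--outfile` are the pdfjoin options that the Lisp code passes.

```shell
#!/bin/sh
# Sketch only: pngs_to_pdf is a hypothetical helper, not part of any tool.
# CONVERT and PDFJOIN may be set to override the binaries (an assumption
# made here for illustration); otherwise they are looked up on PATH.
pngs_to_pdf() {
    dir=$1      # directory holding the png files
    prefix=$2   # common file name prefix, e.g. "image-"
    out=$3      # name of the joined pdf, e.g. "final.pdf"

    # One small convert call per png keeps memory use low.
    for png in "$dir/$prefix"*.png; do
        [ -e "$png" ] || continue   # the glob matched nothing
        ${CONVERT:-convert} "$png" "${png%.png}.pdf" || return 1
    done

    # The shell expands the glob, so pdfjoin receives the real file names.
    ${PDFJOIN:-pdfjoin} --fitpaper false --outfile "$dir/$out" "$dir/$prefix"*.pdf
}
```

It would be called as `pngs_to_pdf /my/dir image- final.pdf`.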
Now enter an enormous tiff file with 318 sub-images, which brings its own set of issues with imagemagick. Attempting to extract the sub-images often caused my machine to freeze completely and kernel panic. That may be more of a kernel issue than imagemagick’s problem, but it was the huge memory usage that brought it there. So I took another lesson from my previous post and broke the job into many small calls to imagemagick rather than one large one.
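The "many small calls" idea can be sketched in a few lines of shell: count the sub-images with identify, then run one convert per sub-image. This is only an illustration under some assumptions: `extract_frames` and the `IDENTIFY`/`CONVERT` override variables are hypothetical names, and it relies on identify printing one line per sub-image.

```shell
#!/bin/sh
# Sketch only: extract_frames is a hypothetical helper.
# IDENTIFY and CONVERT may be set to override the binaries (an assumption
# made here for illustration); otherwise they are looked up on PATH.
extract_frames() {
    src=$1      # multi-image file, e.g. composite.tiff
    prefix=$2   # output prefix, e.g. "image-"

    # identify prints one line per sub-image, so the line count is the
    # number of sub-images; $(( )) strips any padding that wc adds.
    count=$(( $(${IDENTIFY:-identify} "$src" | wc -l) ))

    i=0
    while [ "$i" -lt "$count" ]; do
        # "file[N]" asks for sub-image N only (zero-based).
        ${CONVERT:-convert} "${src}[${i}]" \
            "$(printf '%s%03d.png' "$prefix" "$i")" || return 1
        i=$((i + 1))
    done
}
```

Calling `extract_frames composite.tiff image-` would produce image-000.png and up, one convert process at a time.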
Using my language of choice, I wrote a bit of lisp to do the decomposition of the tiff file, conversion to png, and assemblage into a pdf. Here it is:
;;;; Copyright 2010 Elliott Johnson
;;;; Distributed under the GPL - 3.0
;;;; http://www.gnu.org/licenses/gpl.html

(defpackage :net.elliottjohnson.lisp.convert
  (:use :cl :cl-user #+sbcl :sb-ext)
  (:nicknames :convert))

(in-package :net.elliottjohnson.lisp.convert)

(defvar *convert-binary* "/usr/bin/convert"
  "An acceptable name for the convert binary that's installed on your system.")
(defvar *identify-binary* "/usr/bin/identify"
  "An acceptable name for the identify binary that's installed on your system.")
(defvar *pdfjoin-binary* "/usr/bin/pdfjoin"
  "An acceptable name for the pdfjoin binary that's installed on your system.")

(defun acceptable-exit-code (process)
  (when (= 0 (process-exit-code process))
    t))

;;; Count the sub-images by counting the lines that identify prints.
;;; Returns the index of the last sub-image (the line count minus one).
#+sbcl
(defun image-count (multi-image-filename)
  (let ((process (run-program *identify-binary*
                              (list multi-image-filename)
                              :output :stream
                              :error nil)))
    (if (acceptable-exit-code process)
        (let ((file-count 0)
              (stream (process-output process)))
          (if (input-stream-p stream)
              (progn
                (loop for line = (read-line stream nil nil)
                      while line
                      do (incf file-count))
                (1- file-count))
              (error "Bad Outputstream: \"~S ~S\" return ~S~%"
                     *identify-binary* multi-image-filename process)))
        (error "Failed to execute: \"~S ~S\" return ~S~%"
               *identify-binary* multi-image-filename process))))

;;; Run convert on one source; ARGS may be a single string or a list.
#+sbcl
(defun simple-convert (source destination &optional args)
  (let ((process (run-program *convert-binary*
                              (if args
                                  (if (listp args)
                                      `(,source ,@args ,destination)
                                      (list source args destination))
                                  (list source destination)))))
    (unless (acceptable-exit-code process)
      (error "Failed to execute: \"~S ~S ~S\" returned ~S~%"
             *convert-binary* source destination process))))

#+sbcl
(defun join-pdf (pdf-file-pattern dest-pdf-file)
  (let ((process (run-program *pdfjoin-binary*
                              (list "--fitpaper" "false"
                                    "--outfile" dest-pdf-file
                                    pdf-file-pattern))))
    (unless (acceptable-exit-code process)
      (error "Failed to join pdf: \"~A --outfile ~A ~A\" returned ~S~%"
             *pdfjoin-binary* dest-pdf-file pdf-file-pattern process))))

;;; Build "file[N-N]", imagemagick's syntax for a single sub-image.
(defun name-source-file (multi-image-filename current-count)
  (format nil "~A\[~D-~D\]"
          multi-image-filename current-count current-count))

(defun name-dest-file (dest-dir dest-prefix current-count dest-suffix)
  (format nil "~A/~A~3,'0D.~A"
          dest-dir dest-prefix current-count dest-suffix))

(defun convert-suffix-to-pdf (source-file)
  (format nil "~A/~A.pdf"
          (directory-namestring source-file)
          (pathname-name source-file)))

(defun name-dest-files (dest-dir dest-prefix dest-suffix)
  (format nil "~A/~A*.~A" dest-dir dest-prefix dest-suffix))

(defun name-dest-pdffile (dest-dir dest-pdf-name)
  (format nil "~A/~A" dest-dir dest-pdf-name))

;;; Convert each image to its own pdf, then join the pdfs into one.
(defun create-pdf-from-images (dest-dir dest-prefix dest-suffix dest-pdf-name)
  (let ((source-files
          (directory (name-dest-files dest-dir dest-prefix dest-suffix))))
    (loop for file in source-files
          do (simple-convert (format nil "~A" file)
                             (convert-suffix-to-pdf file)
                             '("-density" "100%"))))
  (let ((pdf-file-pattern (name-dest-files dest-dir dest-prefix "pdf")))
    (join-pdf pdf-file-pattern
              (name-dest-pdffile dest-dir dest-pdf-name))))

;;; Entry point: one convert call per sub-image, then build the pdf.
(defun convert-multi-image-file (multi-image-file dest-dir dest-prefix
                                 dest-suffix dest-pdf-name)
  (let ((multi-image-count (image-count multi-image-file)))
    (loop for i from 0 to multi-image-count
          do (simple-convert (name-source-file multi-image-file i)
                             (name-dest-file dest-dir dest-prefix i dest-suffix))))
  (create-pdf-from-images dest-dir dest-prefix dest-suffix dest-pdf-name))
Currently it’s only written to run on sbcl, but it would be trivial to make it work elsewhere. The function to use is the last one, and it’s used something like this:
(convert-multi-image-file "/path/to/my/file.tiff" "/my/dir/" "image-" "png" "final.pdf")
The end result is a pdf file and the png files needed for the project. The code makes use of imagemagick’s square bracket syntax to specify a particular image in a multi-image file (such as a video file, a pdf, or in this case a tiff). The ImageMagick FAQ is what eventually led me to the technique.
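To make the square bracket syntax concrete, here is a tiny shell sketch that builds such a file argument. `frame_spec` is a hypothetical helper written for this illustration. It emits `file[N]` for a single sub-image and `file[M-N]` for a range; the Lisp code above uses the range form with both ends equal.

```shell
#!/bin/sh
# Sketch only: frame_spec is a hypothetical helper.
# Usage: frame_spec FILE FIRST [LAST]
# Prints FILE[FIRST] for one sub-image, or FILE[FIRST-LAST] for a range.
# Indexes are zero-based.
frame_spec() {
    if [ "$#" -ge 3 ] && [ "$3" != "$2" ]; then
        printf '%s[%d-%d]\n' "$1" "$2" "$3"
    else
        printf '%s[%d]\n' "$1" "$2"
    fi
}
```

When typing such a name directly at a shell prompt it is worth quoting it, as in `convert 'composite.tiff[3]' page.png`, because square brackets are glob characters to the shell.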
This method renders everything in less than one minute with little load. The single-command approach took over 45 minutes, pushed the load above 9, and rarely completed.