Converting composite tiff files to pdf

A few posts ago I wrote about the joys of converting a bunch of images into a pdf with a single command. Since then I’ve been working with a set of tiff files that Andrew Whitlock gave me for the oacdp project. They are composite tiffs, meaning that each file is itself made up of a collection of tiff images.

My goal is to extract the tiff files, convert them to png, and then create a pdf from those pngs. I chose imagemagick because it’s been a great tool and is straightforward to use. I found that wasn’t quite the case for these files.

Firstly, the tiff file needs to be broken into its component images:

convert composite.tiff image-%03d.png

This produces a set of files that are named “image-000.png” and up.

Secondly, the extracted png files need to be assembled into a pdf file:

convert image-*.png final.pdf

As discussed in my previous post, it’s not that easy: imagemagick handles memory poorly when creating a pdf, so I used the pdfjoin technique from that post instead:

pdfjoin --outfile final.pdf image-*.png

All is good.
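
For comparison, here is a rough shell sketch of the assembly step the way the Lisp code further down does it: each png is first converted to its own one-page pdf, and pdfjoin then joins the pdfs. The function names `pdf_name` and `pngs_to_pdf` are made up for this illustration, and the sketch assumes `convert` and `pdfjoin` are on the PATH and that the output file name does not itself start with the image prefix.

```shell
#!/bin/sh
# Sketch only: pdf_name and pngs_to_pdf are hypothetical helper names.
# Assumes ImageMagick's convert and pdfjoin are installed and on PATH.

# pdf_name FILE.png -> FILE.pdf
pdf_name() {
    printf '%s.pdf' "${1%.png}"
}

# pngs_to_pdf PREFIX OUTFILE
# Converts every PREFIX*.png to a one-page pdf, then joins the pdfs.
pngs_to_pdf() {
    prefix=$1
    out=$2
    for png in "$prefix"*.png; do
        [ -e "$png" ] || continue          # skip if the glob matched nothing
        convert "$png" "$(pdf_name "$png")" || return 1
    done
    # The zero-padded numbering keeps the glob in page order.
    pdfjoin --outfile "$out" "$prefix"*.pdf
}
```

Something like `pngs_to_pdf image- final.pdf` would then build final.pdf from image-000.png and up, one small convert call per page.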

Now enter an enormous tiff file with 318 sub-images, which brings its own set of issues with imagemagick. Attempting to extract the sub-images often caused my machine to freeze completely and kernel panic. That may be more of a kernel issue than imagemagick’s problem, but it was the immense memory usage that brought it there. So I decided to learn another lesson from my previous post and broke the job into many small calls to imagemagick rather than one large one.
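
The idea of many small calls can be sketched in a few lines of shell before getting to the Lisp version. This is only an illustration under some assumptions: `identify` prints one line per sub-image of a multi-image file, `convert` and `identify` are on the PATH, and the helper names `frame_spec`, `frame_name`, and `split_tiff` are invented here.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of imagemagick.

# frame_spec FILE INDEX -> "FILE[INDEX]", imagemagick's frame selector
frame_spec() {
    printf '%s[%d]' "$1" "$2"
}

# frame_name PREFIX INDEX -> PREFIX plus a zero-padded index plus ".png"
frame_name() {
    printf '%s%03d.png' "$1" "$2"
}

# split_tiff FILE PREFIX
# Runs one small convert call per sub-image instead of one huge call.
split_tiff() {
    src=$1
    prefix=$2
    # Assumption: identify prints one line per sub-image.
    count=$(identify "$src" | wc -l | tr -d ' ')
    i=0
    while [ "$i" -lt "$count" ]; do
        convert "$(frame_spec "$src" "$i")" "$(frame_name "$prefix" "$i")" || return 1
        i=$((i + 1))
    done
}
```

Calling `split_tiff composite.tiff image-` would write image-000.png and up, loading only one sub-image per convert process.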

Using my language of choice, I wrote a bit of lisp to do the decomposition of the tiff file, conversion to png, and assemblage into a pdf. Here it is:


;;;; Copyright 2010 Elliott Johnson
;;;; Distributed under the GPL - 3.0
;;;; http://www.gnu.org/licenses/gpl.html

(defpackage :net.elliottjohnson.lisp.convert
  (:use :cl
        :cl-user
        #+sbcl :sb-ext)
  (:nicknames :convert))
(in-package :net.elliottjohnson.lisp.convert)

(defvar *convert-binary*
  "/usr/bin/convert"
  "An acceptable name for the convert binary that's installed on your system.")
(defvar *identify-binary*
  "/usr/bin/identify"
  "An acceptable name for the identify binary that's installed on your system.")
(defvar *pdfjoin-binary*
  "/usr/bin/pdfjoin"
  "An acceptable name for the pdfjoin binary that's installed on your system.")

(defun acceptable-exit-code (process)
  (when (= 0 (process-exit-code process))
    t))

#+sbcl
(defun image-count (multi-image-filename)
  (let ((process (run-program
                  *identify-binary*
                  (list multi-image-filename)
                  :output :stream
                  :error nil)))
    (if (acceptable-exit-code process)
        (let ((file-count 0)
              (stream (process-output process)))
          (if (input-stream-p stream)
              (progn
                (loop for line = (read-line stream nil nil)
                   while line
                   do (incf file-count))
                (1- file-count))
              (error "Bad Outputstream: \"~S ~S\" return ~S~%"
                     *identify-binary*
                     multi-image-filename
                     process)))

        (error "Failed to execute: \"~S ~S\" return ~S~%"
               *identify-binary*
               multi-image-filename
               process))))

#+sbcl
(defun simple-convert (source destination &optional args)
  (let ((process (run-program *convert-binary*
                              (if args
                                  (if (listp args)
                                      `(,source ,@args ,destination)
                                      (list source args destination))
                                  (list source destination)))))
    (unless (acceptable-exit-code process)
      (error "Failed to execute: \"~S ~S ~S\" returned ~S~%"
             *convert-binary*
             source
             destination
             process))))

#+sbcl
(defun join-pdf (pdf-file-pattern dest-pdf-file)
  (let ((process (run-program *pdfjoin-binary*
                              (list "--fitpaper" "false"
                                    "--outfile" dest-pdf-file
                                    pdf-file-pattern))))
    (unless (acceptable-exit-code process)
      (error "Failed to join pdf: \"~A --outfile ~A ~A\" returned ~S~%"
             *pdfjoin-binary*
             dest-pdf-file
             pdf-file-pattern
             process))))

(defun name-source-file (multi-image-filename current-count)
  (format nil "~A\[~D-~D\]" multi-image-filename current-count current-count))

(defun name-dest-file (dest-dir dest-prefix current-count dest-suffix)
  (format nil
          "~A/~A~3,'0D.~A"
          dest-dir
          dest-prefix
          current-count
          dest-suffix))

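(defun convert-suffix-to-pdf (source-file)
  ;; Build the pdf name next to SOURCE-FILE, e.g. /dir/image-000.png -> /dir/image-000.pdf.
  ;; DIRECTORY-NAMESTRING already ends in "/", so no extra separator is needed.
  (format nil
          "~A~A.pdf"
          (directory-namestring source-file)
          (pathname-name source-file)))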

(defun name-dest-files (dest-dir dest-prefix dest-suffix)
  (format nil
          "~A/~A*.~A"
          dest-dir
          dest-prefix
          dest-suffix))

(defun name-dest-pdffile (dest-dir dest-pdf-name)
  (format nil
          "~A/~A"
          dest-dir
          dest-pdf-name))

(defun create-pdf-from-images (dest-dir
                               dest-prefix
                               dest-suffix
                               dest-pdf-name)
  (let ((source-files (directory (name-dest-files dest-dir
                                                  dest-prefix
                                                  dest-suffix))))
    (loop for file in source-files
         do (simple-convert (format nil "~A" file)
                            (convert-suffix-to-pdf file)
                            '("-density" "100%"))))
  (let ((pdf-file-pattern (name-dest-files dest-dir
                                           dest-prefix
                                           "pdf")))
    (join-pdf pdf-file-pattern
              (name-dest-pdffile dest-dir dest-pdf-name))))

(defun convert-multi-image-file (multi-image-file
                                 dest-dir
                                 dest-prefix
                                 dest-suffix
                                 dest-pdf-name)
  (let ((multi-image-count (image-count multi-image-file)))
    (loop for i from 0 to multi-image-count
         do (simple-convert (name-source-file multi-image-file i)
                            (name-dest-file dest-dir
                                            dest-prefix
                                            i
                                            dest-suffix))))
  (create-pdf-from-images dest-dir dest-prefix dest-suffix dest-pdf-name))

Currently it only runs on sbcl, but it would be trivial to make it work elsewhere. The function to call is the last one, and it’s used something like this:

(convert-multi-image-file "/path/to/my/file.tiff" 
                          "/my/dir/" 
                          "image-" 
                          "png" 
                          "final.pdf")

The end result is a pdf file and the png files needed for the project. The code makes use of imagemagick’s square bracket syntax to select a particular image within a multi-image file (such as a video file, a pdf, or in this case a tiff). The imagemagick FAQ is what eventually led me to the technique.
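
As a small illustration of the bracket syntax: a selector such as `file.tiff[4]` picks a single sub-image and `file.tiff[0-9]` picks a range, which is why the Lisp function name-source-file can ask for the range `[i-i]` to get exactly one image. The helpers `frame_range` and `extract_range` below are hypothetical names for this sketch, which assumes `convert` is on the PATH.

```shell
#!/bin/sh
# Sketch only: frame_range and extract_range are hypothetical helper names.

# frame_range FILE FIRST LAST -> "FILE[FIRST-LAST]"
frame_range() {
    printf '%s[%d-%d]' "$1" "$2" "$3"
}

# extract_range FILE FIRST LAST OUTPATTERN
# Converts only the selected sub-images of FILE.
extract_range() {
    convert "$(frame_range "$1" "$2" "$3")" "$4"
}
```

When typing a selector directly at a shell prompt, quote it (for example `convert 'composite.tiff[4]' page.png`) so that the shell does not treat the brackets as a glob pattern.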

This method renders everything in less than a minute with little load. The single-call approach rarely completed at all, even after more than 45 minutes under massive load (above 9).
