Converting composite tiff files to pdf

I wrote a few posts ago about the joys of converting a bunch of images into a pdf with a single command. Well, I’ve been working with a set of tiff files that Andrew Whitlock gave me for the oacdp project. They are composite tiffs, meaning that each file is itself made up of a collection of tiff images.

My goal is to extract the tiff files, convert them to png, and then create a pdf from those pngs. I chose imagemagick because it’s been a great tool and is straightforward to use. I found that wasn’t quite the case for these files.

First, the tiff file needs to be broken into its component images:

convert composite.tiff image-%03d.png

This produces a set of files that are named “image-000.png” and up.

Second, the png files need to be assembled into a pdf file:

convert image-*.png final.pdf

As discussed in my previous post, it’s not that easy because of imagemagick’s poor memory handling when creating a pdf, so I used the pdfjoin technique from that post:

pdfjoin --outfile final.pdf image-*.png

All is good.

Now enter an enormous tiff file with 318 sub-images, which brings its own set of issues with imagemagick. Attempting to extract the sub-images often caused my machine to freeze completely and kernel panic. That may be more of a kernel issue than an imagemagick problem, but it was the huge memory usage that brought it there. So I decided to learn another lesson from my previous post and broke the job into many small calls to imagemagick rather than one large one.

Using my language of choice, I wrote a bit of lisp to do the decomposition of the tiff file, conversion to png, and assemblage into a pdf. Here it is:


;;;; Copyright 2010 Elliott Johnson
;;;; Distributed under the GPL - 3.0
;;;; http://www.gnu.org/licenses/gpl.html

(defpackage :net.elliottjohnson.lisp.convert
  (:use :cl
        :cl-user
        #+sbcl :sb-ext)
  (:nicknames :convert))
(in-package :net.elliottjohnson.lisp.convert)

(defvar *convert-binary*
  "/usr/bin/convert"
  "An acceptable name for the convert binary that's installed on your system.")
(defvar *identify-binary*
  "/usr/bin/identify"
  "An acceptable name for the identify binary that's installed on your system.")
(defvar *pdfjoin-binary*
  "/usr/bin/pdfjoin"
  "An acceptable name for the pdfjoin binary that's installed on your system.")

(defun acceptable-exit-code (process)
  (when (= 0 (process-exit-code process))
    t))

#+sbcl
(defun image-count (multi-image-filename)
  (let ((process (run-program
                  *identify-binary*
                  (list multi-image-filename)
                  :output :stream
                  :error nil)))
    (if (acceptable-exit-code process)
        (let ((file-count 0)
              (stream (process-output process)))
          (if (input-stream-p stream)
              (progn
                (loop for line = (read-line stream nil nil)
                   while line
                   do (incf file-count))
                (1- file-count))
              (error "Bad Outputstream: \"~S ~S\" return ~S~%"
                     *identify-binary*
                     multi-image-filename
                     process)))

        (error "Failed to execute: \"~S ~S\" return ~S~%"
               *identify-binary*
               multi-image-filename
               process))))

#+sbcl
(defun simple-convert (source destination &optional args)
  (let ((process (run-program *convert-binary*
                              (if args
                                  (if (listp args)
                                      `(,source ,@args ,destination)
                                      (list source args destination))
                                  (list source destination)))))
    (unless (acceptable-exit-code process)
      (error "Failed to execute: \"~S ~S ~S\" returned ~S~%"
             *convert-binary*
             source
             destination
             process))))

#+sbcl
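(defun join-pdf (pdf-file-pattern dest-pdf-file)
  ;; run-program does not go through a shell, so the wildcard is expanded
  ;; here and the matching files are passed to pdfjoin in sorted order.
  (let* ((pdf-files (sort (mapcar #'namestring (directory pdf-file-pattern))
                          #'string<))
         (process (run-program *pdfjoin-binary*
                               (append (list "--fitpaper" "false"
                                             "--outfile" dest-pdf-file)
                                       pdf-files))))
    (unless (acceptable-exit-code process)
      (error "Failed to join pdf: \"~A --outfile ~A ~A\" returned ~S~%"
             *pdfjoin-binary*
             dest-pdf-file
             pdf-file-pattern
             process))))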

(defun name-source-file (multi-image-filename current-count)
  (format nil "~A\[~D-~D\]" multi-image-filename current-count current-count))

(defun name-dest-file (dest-dir dest-prefix current-count dest-suffix)
  (format nil
          "~A/~A~3,'0D.~A"
          dest-dir
          dest-prefix
          current-count
          dest-suffix))

(defun convert-suffix-to-pdf (source-file)
  ;; Same directory and base name as SOURCE-FILE, with a pdf suffix.
  (format nil
          "~A/~A.pdf"
          (directory-namestring source-file)
          (pathname-name source-file)))

(defun name-dest-files (dest-dir dest-prefix dest-suffix)
  (format nil
          "~A/~A*.~A"
          dest-dir
          dest-prefix
          dest-suffix))

(defun name-dest-pdffile (dest-dir dest-pdf-name)
  (format nil
          "~A/~A"
          dest-dir
          dest-pdf-name))

(defun create-pdf-from-images (dest-dir
                               dest-prefix
                               dest-suffix
                               dest-pdf-name)
  (let ((source-files (directory (name-dest-files dest-dir
                                                  dest-prefix
                                                  dest-suffix))))
    (loop for file in source-files
         do (simple-convert (format nil "~A" file)
                            (convert-suffix-to-pdf file)
                            '("-density" "100%"))))
  (let ((pdf-file-pattern (name-dest-files dest-dir
                                           dest-prefix
                                           "pdf")))
    (join-pdf pdf-file-pattern
              (name-dest-pdffile dest-dir dest-pdf-name))))

(defun convert-multi-image-file (multi-image-file
                                 dest-dir
                                 dest-prefix
                                 dest-suffix
                                 dest-pdf-name)
  (let ((multi-image-count (image-count multi-image-file)))
    (loop for i from 0 to multi-image-count
         do (simple-convert (name-source-file multi-image-file i)
                            (name-dest-file dest-dir
                                            dest-prefix
                                            i
                                            dest-suffix))))
  (create-pdf-from-images dest-dir dest-prefix dest-suffix dest-pdf-name))

Currently it’s only set up to run on sbcl, but it would be trivial to make it work elsewhere. The function to use is the last one, and it’s used something like this:

(convert-multi-image-file "/path/to/my/file.tiff" 
                          "/my/dir/" 
                          "image-" 
                          "png" 
                          "final.pdf")

The end result is a pdf file and the png files needed for the project. The code makes use of imagemagick’s square bracket syntax to select a particular image in a multi-image file (such as a video file, a pdf, or in this case a tiff). The ImageMagick FAQ is what eventually led me to the technique.

This method renders everything in less than one minute with little load. The single-call approach rarely completed at all, and when it did it took over 45 minutes with massive loads (above 9).
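For comparison, the same one-frame-per-call idea can be sketched in a few lines of shell. This is only an illustration, not the code I ran: the function names frame_spec, dest_name and split_tiff are made up for the example, and it assumes, like the lisp above, that identify prints one line per sub-image.

```shell
#!/bin/bash
# Sketch only: the function names are hypothetical, not part of imagemagick.

# Build imagemagick's "file[index]" selector for one frame of a multi-image file.
frame_spec() {
  printf '%s[%d]' "$1" "$2"
}

# Build a zero-padded output name such as /my/dir/image-007.png.
dest_name() {
  printf '%s/%s%03d.%s' "$1" "$2" "$3" "$4"
}

# Extract every frame with one small convert call per frame.
# Assumption: identify prints exactly one line per frame.
split_tiff() {
  local src=$1 dir=$2 count i
  count=$(identify "$src" | wc -l)
  i=0
  while [ "$i" -lt "$count" ]; do
    convert "$(frame_spec "$src" "$i")" \
            "$(dest_name "$dir" image- "$i" png)" || return 1
    i=$((i + 1))
  done
}

# Usage: split_tiff /path/to/my/file.tiff /my/dir
```

Each convert call only ever loads a single frame, which is what keeps the memory usage small.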

Updating the Canon 5d MII firmware under linux

I just grabbed the 2.0.4 version of the Canon 5d Mark II firmware and these are the steps it took to install it using GNU/Linux.

  1. Fetch the Mac OS X dmg version of the firmware
  2. Install dmg2img
  3. Convert the DMG to an IMG file:
    dmg2img eos5d2204.dmg
  4. Mount the resulting IMG file temporarily:
    mount -t hfs -o loop eos5d2204.img /mnt/tmp
  5. Insert a CF card and note where it’s mounted
  6. Copy the firmware to the card:
    cp /mnt/tmp/5d200204.fir /path/to/CF/card

See, Linux is easy ;)

Actually, would it be a pain for Canon to just release the firmware in a zip file? I’d think that would work for everybody, but I’m guessing that they are relying on checksumming that may be built into the DMG and their exe files. I’m not sure about the checksum, but that would make sense.

For those curious, here is the sha256 checksum of the 5d200204.fir file the above process created:

424b1990b52af12748f9675c2085e58949ac3fc682b1e61717c2218f89cdd149  5d200204.fir
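
If you want to check your own copy against that value, something along these lines should do. It is a minimal sketch: verify_sha256 is a made-up helper name, and it assumes the sha256sum tool from GNU coreutils is installed.

```shell
#!/bin/bash
# Sketch only: verify_sha256 is a hypothetical helper name.

# Succeeds when FILE's SHA-256 digest equals EXPECTED (a hex string).
verify_sha256() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage, with the digest listed above and a placeholder card path:
#   verify_sha256 /path/to/CF/card/5d200204.fir \
#     424b1990b52af12748f9675c2085e58949ac3fc682b1e61717c2218f89cdd149 \
#     && echo "firmware copied intact"
```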

The new firmware adds a bunch of features to the camera’s video feature set, including various frame rate options. For more info see Canon’s site.

Encoding 5d mII video

The 5d mark II shoots rather high quality videos. Using ffmpeg it’s possible to view the specifics of the original file during encoding. For my example file I’ve been seeing the following:

  Duration: 00:02:24.50, start: 0.000000, bitrate: 41087 kb/s
    Stream #0.0(eng): Video: h264, yuv420p, 1920x1088, 39674 kb/s, 30 fps, 30 tbr, 3k tbn, 6k tbc
    Stream #0.1(eng): Audio: pcm_s16le, 44100 Hz, 2 channels, s16, 1411 kb/s

This shows that the file has two streams: video (#0.0) and audio (#0.1). The video stream is encoded with the h264 codec, a nice high quality, highly compressed video codec, at a bit rate of 39,674 kb/s, so close to 40 Mb/s. The audio is also a fairly high quality capture: uncompressed, 16-bit, little-endian PCM at a fairly high rate of 1,411 kb/s.

Recently I wanted to post some videos I took of Niilo Smeds playing. The above clip is 2 minutes and 24 seconds of video, which consumes 708 MB (yes, about one CD-R worth of data). This makes uploading to a free Vimeo account rather difficult due to the 500 MB per week limit.
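
The numbers ffmpeg reported can be sanity checked with a little arithmetic. The sketch below is just that, a sketch: pcm_kbps and size_mb are made-up helper names, and size_mb assumes a megabyte of 2^20 bytes.

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical.

# Bit rate of uncompressed PCM in kb/s: sample rate x bit depth x channels.
pcm_kbps() {
  echo $(( $1 * $2 * $3 / 1000 ))
}

# Approximate file size in MB (2^20 bytes) from a bit rate in kb/s
# and a duration in seconds.
size_mb() {
  awk -v kbps="$1" -v secs="$2" \
      'BEGIN { printf "%.0f\n", kbps * 1000 * secs / 8 / 1048576 }'
}

pcm_kbps 44100 16 2    # audio stream: 44.1 kHz, 16-bit, stereo
size_mb 41087 144.5    # whole clip: overall bit rate and duration
```

This prints 1411 and 708, which lines up with the audio bit rate in the ffmpeg output and with the size of the file on disk.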

I’ve turned to recoding the videos to a lower bit rate so that I can post and have the knowledge that I still have the original 1080p version to play with more later.

I found that mp4 was one of the few container formats that I could easily convert to from videos shot with the older firmware (which I’m currently upgrading). The older firmware captured at exactly 30 frames per second (not 29.97), which seems to cause havoc for other container formats (like mpeg), but mp4 seems to be quite happy. I’m interested to see whether the newer firmware, which allows some choice in frame rate, will allow for converting to other containers, or whether the problem is actually localized to the open source libraries that are used for conversion.

For audio I would have liked to use the original audio (I am recording a guitar player after all) and there is the -acodec copy option to ffmpeg, but again the mp4 container format didn’t like the raw pcm audio. Instead I chose MP2, which didn’t have as much sound degradation as MP3.

After much playing with options my simple script to encode videos from the 5dMII for Vimeo is as follows:

#!/bin/bash

# Copyright Elliott Johnson 2010
# Distributed under the GPL-3.0
#    http://www.gnu.org/licenses/gpl.html

# For encoding video from my 5D into a high quality,
# small file format suitable for vimeo.

THREADS=7 # tune this to your number of execution units - 1

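function help() {
  echo "$0: infilename outfilename"
}

# Both file names are required.
if [ $# -ne 2 ]; then
  help
  exit 1
fi

INPUTFILE=$1
OUTPUTFILE=$2

# Quote the file names so that paths containing spaces survive.
ffmpeg -i "$INPUTFILE" \
       -b 14515.5kb \
       -s 960x544 \
       -r 30 \
       -threads $THREADS \
       -acodec mp2 \
       -ab 256kb \
       -ac 2 \
       "$OUTPUTFILE"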