Converting composite tiff files to pdf

I wrote a few posts ago about the joys of converting a bunch of images into a pdf with a single command. Well, I’ve been working with a set of tiff files that Andrew Whitlock gave me for the oacdp project. They are composite tiffs, meaning that each file is itself made up of a collection of tiff images.

My goal is to extract the tiff files, convert them to png, and then create a pdf from those pngs. I chose imagemagick because it’s been a great tool and is straightforward to use. I found that wasn’t quite the case for these files.

Firstly the tiff file needs to be broken into its component images:

convert composite.tiff image-%03d.png

This produces a set of files that are named “image-000.png” and up.

Secondly the png files need to be assembled into a pdf file:

convert image-*.png final.pdf

As discussed in my previous post, it’s not that easy because there is the issue of imagemagick’s poor memory handling when creating a pdf, so I used the pdfjoin technique from the previous post

pdfjoin --outfile final.pdf image-*.png

All is good.

Now enter an enormous tiff file with 318 sub images, which brings its own set of issues with imagemagick. Attempting to extract the sub images often caused my machine to freeze completely and kernel panic. That may be more of a kernel issue than imagemagick’s problem, but it was the immense memory usage that brought it there. So I decided to learn another lesson from my previous post and broke the job into many small calls to imagemagick rather than one large one.

Using my language of choice, I wrote a bit of lisp to do the decomposition of the tiff file, conversion to png, and assemblage into a pdf. Here it is:


;;;; Copyright 2010 Elliott Johnson
;;;; Distributed under the GPL - 3.0
;;;; http://www.gnu.org/licenses/gpl.html

(defpackage :net.elliottjohnson.lisp.convert
  (:use :cl
        :cl-user
        #+sbcl :sb-ext)
  (:nicknames :convert))
(in-package :net.elliottjohnson.lisp.convert)

(defvar *convert-binary*
  "/usr/bin/convert"
  "An acceptable name for the convert binary that's installed on your system.")
(defvar *identify-binary*
  "/usr/bin/identify"
  "An acceptable name for the identify binary that's installed on your system.")
(defvar *pdfjoin-binary*
  "/usr/bin/pdfjoin"
  "An acceptable name for the pdfjoin binary that's installed on your system.")

(defun acceptable-exit-code (process)
  (when (= 0 (process-exit-code process))
    t))

#+sbcl
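(defun image-count (multi-image-filename)
  "Return the zero-based index of the last image in MULTI-IMAGE-FILENAME,
i.e. one less than the number of lines that identify prints."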
  (let ((process (run-program
                  *identify-binary*
                  (list multi-image-filename)
                  :output :stream
                  :error nil)))
    (if (acceptable-exit-code process)
        (let ((file-count 0)
              (stream (process-output process)))
          (if (input-stream-p stream)
              (progn
                (loop for line = (read-line stream nil nil)
                   while line
                   do (incf file-count))
                (1- file-count))
              (error "Bad Outputstream: \"~S ~S\" return ~S~%"
                     *identify-binary*
                     multi-image-filename
                     process)))

        (error "Failed to execute: \"~S ~S\" return ~S~%"
               *identify-binary*
               multi-image-filename
               process))))

#+sbcl
(defun simple-convert (source destination &optional args)
  (let ((process (run-program *convert-binary*
                              (if args
                                  (if (listp args)
                                      `(,source ,@args ,destination)
                                      (list source args destination))
                                  (list source destination)))))
    (unless (acceptable-exit-code process)
      (error "Failed to execute: \"~S ~S ~S\" returned ~S~%"
             *convert-binary*
             source
             destination
             process))))

#+sbcl
(defun join-pdf (pdf-file-pattern dest-pdf-file)
  (let ((process (run-program *pdfjoin-binary*
                              (list "--fitpaper" "false"
                                    "--outfile" dest-pdf-file
                                    pdf-file-pattern))))
    (unless (acceptable-exit-code process)
      (error "Failed to join pdf: \"~A --outfile ~A ~A\" returned ~S~%"
             *pdfjoin-binary*
             dest-pdf-file
             pdf-file-pattern
             process))))

(defun name-source-file (multi-image-filename current-count)
  (format nil "~A\[~D-~D\]" multi-image-filename current-count current-count))

(defun name-dest-file (dest-dir dest-prefix current-count dest-suffix)
  (format nil
          "~A/~A~3,'0D.~A"
          dest-dir
          dest-prefix
          current-count
          dest-suffix))

(defun convert-suffix-to-pdf (source-file)
  ;; directory-namestring already ends in a slash, so none is added here.
  (format nil
          "~A~A.pdf"
          (directory-namestring source-file)
          (pathname-name source-file)))

(defun name-dest-files (dest-dir dest-prefix dest-suffix)
  (format nil
          "~A/~A*.~A"
          dest-dir
          dest-prefix
          dest-suffix))

(defun name-dest-pdffile (dest-dir dest-pdf-name)
  (format nil
          "~A/~A"
          dest-dir
          dest-pdf-name))

(defun create-pdf-from-images (dest-dir
                               dest-prefix
                               dest-suffix
                               dest-pdf-name)
  (let ((source-files (directory (name-dest-files dest-dir
                                                  dest-prefix
                                                  dest-suffix))))
    (loop for file in source-files
         do (simple-convert (format nil "~A" file)
                            (convert-suffix-to-pdf file)
                            '("-density" "100%"))))
  (let ((pdf-file-pattern (name-dest-files dest-dir
                                           dest-prefix
                                           "pdf")))
    (join-pdf pdf-file-pattern
              (name-dest-pdffile dest-dir dest-pdf-name))))

(defun convert-multi-image-file (multi-image-file
                                 dest-dir
                                 dest-prefix
                                 dest-suffix
                                 dest-pdf-name)
  (let ((multi-image-count (image-count multi-image-file)))
    (loop for i from 0 to multi-image-count
         do (simple-convert (name-source-file multi-image-file i)
                            (name-dest-file dest-dir
                                            dest-prefix
                                            i
                                            dest-suffix))))
  (create-pdf-from-images dest-dir dest-prefix dest-suffix dest-pdf-name))

Currently it only runs on sbcl, but it would be trivial to make it work elsewhere. The function to use is the last one, and it’s used something like this:

(convert-multi-image-file "/path/to/my/file.tiff" 
                          "/my/dir/" 
                          "image-" 
                          "png" 
                          "final.pdf")

The end result is a pdf file and the png files needed for the project. The code makes use of imagemagick’s square bracket syntax to select a particular image in a multi image file (such as a video file, a pdf, or in this case a tiff). The Imagemagick FAQ is what eventually led me to the technique.
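
For readers who don’t run lisp, here is a rough shell sketch of the same one-frame-per-call idea. It is not a port of the code above: the function names are made up for illustration, it uses the single index form file.tiff[3] rather than the [3-3] range form, and it assumes that identify prints one line per sub image.

```shell
# frame_spec FILE INDEX
# Print the bracket selector that picks one frame, e.g. composite.tiff[3].
frame_spec() {
  printf '%s[%d]' "$1" "$2"
}

# dest_name DIR PREFIX INDEX SUFFIX
# Print a zero padded output name, e.g. /my/dir/image-007.png.
dest_name() {
  printf '%s/%s%03d.%s' "$1" "$2" "$3" "$4"
}

# extract_frames FILE DIR
# Call convert once per frame so that only one frame is in memory at a time.
# Assumption: identify prints exactly one line per frame.
extract_frames() (
  src=$1
  dir=$2
  count=$(($(identify "$src" | wc -l)))
  i=0
  while [ "$i" -lt "$count" ]; do
    convert "$(frame_spec "$src" "$i")" "$(dest_name "$dir" image- "$i" png)" || exit 1
    i=$((i + 1))
  done
)
```

Something like extract_frames /path/to/my/file.tiff /my/dir would then write image-000.png and up into /my/dir.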

This method renders everything in less than one minute with little load. The single large call used to take over 45 minutes under massive load (above 9), and it rarely completed at all.

Updating the Canon 5d MII firmware under linux

I just grabbed the 2.0.4 version of the Canon 5d Mark II firmware and these are the steps it took to install it using GNU/Linux.

  1. Fetch the Mac OS X dmg version of the firmware
  2. Install dmg2img
  3. Convert the DMG to an IMG file
    dmg2img eos5d2204.dmg
  4. Mount the IMG file temporarily
    mount -t hfs -o loop eos5d2204.img /mnt/tmp
  5. Insert a CF card and note where it’s mounted
  6. Copy it to the card:
    cp /mnt/tmp/5d200204.fir /path/to/CF/card

See, linux is easy ;)

Actually, would it be a pain for Canon to just release the firmware in a zip file? I’d think that would work for everybody, but I’m guessing that they are relying on a possible built in checksumming in a DMG and their exe files? I’m not sure about the checksum, but that would make sense.

For those curious, here is the sha256 checksum of the 5d200204.fir file the above process created:

424b1990b52af12748f9675c2085e58949ac3fc682b1e61717c2218f89cdd149  5d200204.fir
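
If you want to compare your own copy against that checksum before copying it to the card, a small helper along these lines should do. It is only a sketch: verify_sha256 is a name I made up, and it assumes the sha256sum tool from GNU coreutils is installed.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeed only if the sha256 checksum of FILE equals EXPECTED_HEX.
# Assumption: sha256sum (GNU coreutils) is available on the PATH.
verify_sha256() (
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
)
```

For example, verify_sha256 /mnt/tmp/5d200204.fir followed by the checksum above, and only run the cp step if it succeeds.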

The new firmware adds a bunch more features to the video feature set of the camera, including various frame rate options. For more info see canon’s site.

Encoding 5d mII video

The 5d mark II shoots rather high quality videos. Using ffmpeg it’s possible to view the specifics of the original file during encoding. For my example file I’ve been seeing the following:

  Duration: 00:02:24.50, start: 0.000000, bitrate: 41087 kb/s
    Stream #0.0(eng): Video: h264, yuv420p, 1920x1088, 39674 kb/s, 30 fps, 30 tbr, 3k tbn, 6k tbc
    Stream #0.1(eng): Audio: pcm_s16le, 44100 Hz, 2 channels, s16, 1411 kb/s

This shows that the file holds two streams: video (#0.0) and audio (#0.1). The video is encoded with the h264 codec, a nice high quality, highly compressed video codec, at a bit rate of 39,674kb/s, so close to 40Mb/s. The audio is also a fairly high quality capture, using uncompressed, 16 bit (bit depth), little-endian PCM at a fairly high rate of 1,411kb/s.
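
The audio rate is easy to check by hand, since uncompressed PCM is just sample rate times bit depth times channels. A tiny sketch (pcm_kbps is a made up helper name, and it truncates to whole kb/s):

```shell
# pcm_kbps SAMPLE_RATE_HZ BIT_DEPTH CHANNELS
# Print the bit rate of uncompressed PCM audio in whole kb/s (1 kb = 1000 bits).
pcm_kbps() {
  echo $(( $1 * $2 * $3 / 1000 ))
}
```

Running pcm_kbps 44100 16 2 gives 1411, which matches what ffmpeg reports for the audio stream.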

Recently I wanted to post up some videos I took of Niilo Smeds playing. The above clip is 2 minutes and 24 seconds of video, which consumes 708MB (yes, ~one CD-R worth of data). This makes uploading to a free Vimeo account rather difficult due to the 500MB per week limit.
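
The file size follows directly from the overall bit rate and the duration that ffmpeg printed. Here is a small sketch of that arithmetic. The helper name is made up, the duration is passed in hundredths of a second to stay in integer math, and I am assuming the size is counted in units of 1,048,576 bytes.

```shell
# size_mib KBPS CENTISECONDS
# Estimate a file size from its overall bit rate (kb/s, 1 kb = 1000 bits)
# and its duration in hundredths of a second. The result is rounded to the
# nearest unit of 1,048,576 bytes.
size_mib() (
  bytes=$(( $1 * 1000 * $2 / 100 / 8 ))
  echo $(( (bytes + 524288) / 1048576 ))
)
```

With the numbers above, size_mib 41087 14450 prints 708.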

I’ve turned to recoding the videos to a lower bit rate so that I can post and have the knowledge that I still have the original 1080p version to play with more later.

I found that mp4 was one of the few container formats that I could easily convert to from videos using the older firmware (that I’m currently upgrading). The older firmware captured at exactly 30 frames per second (not 29.97/sec), which seems to cause havoc for other container formats (like mpeg), but mp4 seems to be quite happy. I’m interested to see if the newer firmware, which allows some choice in frame rate will allow for converting to other containers or if the problem is actually localized to the open source libraries that are used for conversion.

For audio I would have liked to use the original audio (I am recording a guitar player after all) and there is the -acodec copy option to ffmpeg, but again the mp4 container format didn’t like the raw pcm audio. Instead I chose MP2, which didn’t have as much sound degradation as MP3.

After much playing with options my simple script to encode videos from the 5dMII for Vimeo is as follows:

#!/bin/bash

# Copyright Elliott Johnson 2010
# Distributed under the GPL-3.0
#    http://www.gnu.org/licenses/gpl.html

# For encoding video from my 5D into a high quality,
# small file format suitable for vimeo.

THREADS=7 # tune this to your number of execution units - 1

function help() {
  echo "$0: infilename outfilename"
}

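# Both file names are required; print the usage line and stop otherwise.
if [ $# -ne 2 ]; then
  help
  exit 1
fi

INPUTFILE=$1
OUTPUTFILE=$2

# The file names are quoted so that paths containing spaces survive.
ffmpeg -i "$INPUTFILE" \
       -b 14515.5kb \
       -s 960x544 \
       -r 30 \
       -threads $THREADS \
       -acodec mp2 \
       -ab 256kb \
       -ac 2 \
       "$OUTPUTFILE"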

The joys of pdf conversion

As part of the oacdp I convert sections of manuals into pdfs.  I’ve been using imagemagick for a while to batch convert. To take any number of images and convert them to a single pdf it’s a simple one line command:

convert *.png -density 100% some.pdf

An annoying feature of imagemagick is that it pulls all the images (in the above case the png files) into memory, converts them in memory to a 16bit (per color channel) representation and then when it’s all assembled writes the content out to disk. Normally this isn’t too bad since I’ve got 6Gigs of memory and 12Gigs of swap and if it takes 3Gigs to process one document I can deal with it.

While working on a type1 manual, I found the 8th section, which is 317 images, impossible to deal with. I enlarged the amount of swap space, but the load on my system would rise above 15 and eventually the process or my machine would crash.

Andrew Whitlock, who was the previous maintainer of the oacdp, confirmed that this happened to him in the past and that he had swapped over to a windows machine to use some application with a reasonable algorithm to convert to PDF. I, on the other hand, don’t have a windows install, so I began researching online.

Firstly, in digging around I found an imagemagick based project called graphicsmagick that is written specifically for multi-core machines like mine. I tried it out, but it uses similar algorithms to imagemagick for processing PDFs, so it really wasn’t an answer. It was an improvement in processing times… it basically made things crash faster.

A little more searching revealed this thread, which describes converting each image to an individual pdf and then assembling them using pdfjoin or the like. pdfjoin is fast, only taking a handful of seconds to process all 317 pages using pdflatex. The results are what I’d expect imagemagick could do.
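
Roughly, the technique looks like the sketch below. The function names are made up for illustration, and it assumes zero padded file names so that the shell glob lists the pages in order. Write the joined pdf somewhere outside the page directory so that a later run doesn’t pick it up as a page.

```shell
# pdf_name_for PNG_PATH
# Print the matching pdf name, e.g. pages/image-001.png -> pages/image-001.pdf.
pdf_name_for() {
  printf '%s.pdf' "${1%.png}"
}

# pngs_to_pdf DIR OUTFILE
# Convert each png in DIR to its own one page pdf with a separate convert
# call, then hand all of the small pdfs to pdfjoin in one go.
pngs_to_pdf() (
  dir=$1
  out=$2
  for png in "$dir"/*.png; do
    convert "$png" "$(pdf_name_for "$png")" || exit 1
  done
  pdfjoin --outfile "$out" "$dir"/*.pdf
)
```

Each convert call only ever holds one image, so memory use stays flat no matter how many pages there are.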

It would be nice if imagemagick would get their pdf processing up to snuff, but at least for now I have a workaround. Hopefully this can help some other people converting pdfs using command line tools like imagemagick/graphicsmagick/pdfjoin.

Slowly building strength

My practice mute arrived today with a few other “ointments” for the horn.  I bought a nice Denis Wick practice mute for the sake of my roommates’ sanity, so I moistened the mute’s cork with my breath and put it in.  After a few notes it was obvious that I’m not even at the level to need the mute yet.  Slow and quiet long tones are still where I am and it’s really hard to focus on the quality of the sound when it’s muted.

The biggest thing that the long tones are helping with is relaxation.  I’ve noticed periodic muscle twitches in my right arm and face.  In the back of my mind I got the idea that it’s muscle memory trying to do what it was capable of 11+ years ago.  I had to just stop and relax a few times with deep breaths.  There is this tenseness I’ve acquired from the last few years of typing that I need to learn to get over.

I’m still working on buzzing with the mouthpiece on and away from my face.  I can do it only for a few moments before fatigue sets in.  Before, when the need to practice complex pieces outweighed the desire to retrain myself to not use pressure, I would just give up and go on.  It’s really nice to go back to the attitude I had when I first began playing (wow, 20 years ago) and just focus on playing without expectations other than my own.

practicing trombone

For the first time in about 11 years I decided to start practicing trombone again. At one point I was really serious about the trombone, practicing for a few hours a day… everyday. It’s really relaxing when done properly, almost a meditation that you can feel throughout the core of your body. I felt the incredible warmth that comes from the combination of deep breathing, vibration, and focus on sound. I’m really happy about adding this back into my life even just for daily long tones.

Eventually it would be nice to get to the point of playing duets again. I found out a few weeks ago that a friend of mine, Alexander, is a classically trained tuba player, so maybe that’s something to look forward to once I’m back in shape.

Speaking of getting into shape, I took it slowly today, buzzing with and without a mouthpiece to try and overcome the common crutch of too much lip pressure. I can feel how weak my lips are. They kind of waver periodically, and hopefully with time my abdomen and lip muscles will strengthen, relax, and create a steady tone.

2010

It’s pretty obvious that I’ve fallen behind in both progress on my bus and on this blog. It seems that in part I haven’t been progressing because I end up visiting with friends and family when I travel to work on it and also in part because it’s cold in that garage. I’ve thought about a space heater, but there are certain dangers that come with using one in a confined space with various chemicals.

So far since September I’ve:

  • pulled the engine and wrapped it up until I deal with the oil leaks
  • pulled the gas tank, drained it, and have it stored with a POR15 refurb kit for when the time comes
  • got a new wiring harness from Bob Novak at wiring works.. yet to install it
  • Pulled out the old hardlines and installed a whole new kit from wolfsburg west
  • Installed new rear axle seal kits on both rear axles
  • refilled the reduction gear boxes with 0.25L of 90w gear oil
  • Drained the transmission’s gear oil and cleaned out a medium amount of gunk (no metal chunks) from the drain plugs
  • POR15’d the drum brake backing plates
  • media blasted the engine compartment’s rusty areas (battery tray, above the driver side rear wheel well, and the slot for the engine seal) and painted with some silver Eastwood rust encapsulator spray paint. If I had a larger air compressor I’d have done the whole thing, but instead I focused on the really bad areas. Everything is solid so far.
  • got a set of notched rear cargo door rods from a 56 bus off of http://thesamba.com for cheap
  • degreased various areas under the car especially around the transmission and reduction gear boxes, which were totally caked with a thick gear oil and dirt mixture.

So looking back I have gotten quite a bit done, just not what I expected to do over the last 11 weekends.

Sept 4th 2009

Madalynn and I drove down to Fresno this time.  She dropped me off and I got some visiting time in before checking out what the mail had brought.

My main mission for this weekend was to drop the engine and prep for removing tar.  On the way Madalynn and I picked up some xylene to help remove it.  Quite a few boxes arrived.  One big one from Wolfsburg West with a complete stock exhaust setup.  Another flat box, which was the decklid I found.  It’s in pretty bad shape.  A couple of drill holes that should be easy to fill, but a medium sized dull dent, which is probably a bit harder to get out.  A tach/dwell from a store on Amazon and a small box from Wolfgang Int with slave cylinders and reduction box gaskets.

So first things first I needed to finish testing from two weeks before when a bad sound started coming from the engine.  I removed the engine tin and fired it up.  There still was the sound.  My dad and I tried a few different variables and found that it’s loudest at low rpms and sort of evens out as it revs up.

Since it isn’t a simple problem I’ll need to have it checked out and after a little food I decided to pull off the old muffler and prep for tomorrow.  The bus came with a peashooter bug muffler and I ordered the entire bus setup to replace it.  Taking off the peashooter pipes it was obvious that the engine was running rich by the thick layer of black soot inside.  The muffler itself was pretty rusty and I’d like to find a good paint to ensure the new parts I bought will hold up.  In unbolting the manifold’s passenger side top bolts one twisted apart like butter.  Luckily the bolt snapped off in the old muffler instead of the manifold, so it wasn’t too big of a deal.

Once the muffler was off it was cool to look at the push rod tubes more directly.  The rear most passenger side tube was actually patched by the previous owner.  The patch is a hunk of rubber that is held on by a hose clamp.  A bunch of oil had sprayed everywhere and it’s been slowly leaking since it’s been at my Dad’s house.  Pretty amazing that he ran it that way.

Akismet

After having this blog up for a month or two it’s been around long enough for bots to locate the comment sections.  When I get legitimate comments I get a notification on my phone, so it’s been pretty noisy with spam emails.  Funny thing that while watching the President on TV being spammed during a town hall by some wing nut I happened to get three in a row.  It’s interesting the similarity of the tactics of spammers and the ultra-right extremists.  BTW – Did you know that the President is a Kenyan and needs your help in depositing $1billion into your bank account?

I noticed that Akismet, one of WordPress’ default plugins was designed to control spam.  Fully setting it up involved signing up for a wordpress account and copy and pasting in the API identifier that is associated with my account.  Pretty easy.  Since last night it’s already classified about 10 bogus comments and saved me from being bothered.  Well worth the 20 seconds of setup time.

Thanks Akismet and WordPress devs!