{"id":100,"date":"2010-03-23T22:12:28","date_gmt":"2010-03-24T05:12:28","guid":{"rendered":"http:\/\/elliottjohnson.net\/blog\/?p=100"},"modified":"2010-04-16T00:00:50","modified_gmt":"2010-04-16T07:00:50","slug":"converting-composite-tiff-files-to-pdf","status":"publish","type":"post","link":"http:\/\/elliottjohnson.net\/blog\/converting-composite-tiff-files-to-pdf\/","title":{"rendered":"Converting composite tiff files to pdf"},"content":{"rendered":"<p>I wrote a few posts ago about the joys of converting a bunch of images into a pdf with a single command.  Well, I&#8217;ve been working with a set of tiff files that <a href=\"http:\/\/awhilock.net\">Andrew Whitlock<\/a> gave me for the <a href=\"http:\/\/oacdp.org\">oacdp project<\/a>.  These are composite tiffs, meaning that each file is itself made up of a collection of tiff images.<\/p>\n<p>My goal is to extract the component images, convert them to png, and then create a pdf from those pngs.  I chose imagemagick because it&#8217;s been a great tool and is usually straightforward to use, but that wasn&#8217;t quite the case for these files.<\/p>\n<p>First, the tiff file needs to be broken into its component images:<\/p>\n<pre>convert composite.tiff image-%03d.png<\/pre>\n<p>This produces a series of files named &#8220;image-000.png&#8221;, &#8220;image-001.png&#8221;, and so on.<\/p>\n<p>Second, the png files need to be assembled into a pdf file:<\/p>\n<pre>convert image-*.png final.pdf<\/pre>\n<p>As discussed in my previous <a href=\"http:\/\/elliottjohnson.net\/blog\/?p=82\">post<\/a>, it&#8217;s not that easy, because imagemagick handles memory poorly when creating a pdf.  So I used the pdfjoin technique from that post instead, converting each png to its own pdf and then joining the pdfs:<\/p>\n<pre>pdfjoin --outfile final.pdf image-*.pdf<\/pre>\n<p>All is good.<\/p>\n<p>Now enter an enormous tiff file with 318 sub images, which brings its own set of issues with imagemagick.  
Attempts to extract the sub images often froze my machine completely and ended in a kernel panic.  That may be more of a kernel issue than an imagemagick problem, but it was the enormous memory usage that got it there, so I applied another lesson from my previous post and broke the job into many small calls to imagemagick rather than one large one.<\/p>\n<p>Using my language of choice, I wrote a bit of lisp to decompose the tiff file, convert each sub image to png, and assemble the results into a pdf.  Here it is:<\/p>\n<pre>\r\n\r\n;;;; Copyright 2010 Elliott Johnson\r\n;;;; Distributed under the GPL - 3.0\r\n;;;; http:\/\/www.gnu.org\/licenses\/gpl.html\r\n\r\n(defpackage :net.elliottjohnson.lisp.convert\r\n  (:use :cl\r\n        :cl-user\r\n        #+sbcl :sb-ext)\r\n  (:nicknames :convert))\r\n(in-package :net.elliottjohnson.lisp.convert)\r\n\r\n(defvar *convert-binary*\r\n  \"\/usr\/bin\/convert\"\r\n  \"An acceptable name for the convert binary that's installed on your system.\")\r\n(defvar *identify-binary*\r\n  \"\/usr\/bin\/identify\"\r\n  \"An acceptable name for the identify binary that's installed on your system.\")\r\n(defvar *pdfjoin-binary*\r\n  \"\/usr\/bin\/pdfjoin\"\r\n  \"An acceptable name for the pdfjoin binary that's installed on your system.\")\r\n\r\n(defun acceptable-exit-code (process)\r\n  \"Return T if PROCESS exited with status 0.\"\r\n  (when (= 0 (process-exit-code process))\r\n    t))\r\n\r\n#+sbcl\r\n(defun image-count (multi-image-filename)\r\n  \"Return the index of the last sub image in MULTI-IMAGE-FILENAME.\r\nidentify prints one line per sub image, so this is the line count minus one.\"\r\n  (let ((process (run-program\r\n                  *identify-binary*\r\n                  (list multi-image-filename)\r\n                  :output :stream\r\n                  :error nil)))\r\n    (if (acceptable-exit-code process)\r\n        (let ((file-count 0)\r\n              (stream (process-output process)))\r\n          (if (input-stream-p stream)\r\n              (progn\r\n                (loop for line = (read-line stream nil nil)\r\n                   while line\r\n                   do (incf 
file-count))\r\n                (1- file-count))\r\n              (error \"Bad Outputstream: \\\"~S ~S\\\" return ~S~%\"\r\n                     *identify-binary*\r\n                     multi-image-filename\r\n                     process)))\r\n\r\n        (error \"Failed to execute: \\\"~S ~S\\\" return ~S~%\"\r\n               *identify-binary*\r\n               multi-image-filename\r\n               process))))\r\n\r\n#+sbcl\r\n(defun simple-convert (source destination &optional args)\r\n  (let ((process (run-program *convert-binary*\r\n                              (if args\r\n                                  (if (listp args)\r\n                                      `(,source ,@args ,destination)\r\n                                      (list source args destination))\r\n                                  (list source destination)))))\r\n    (unless (acceptable-exit-code process)\r\n      (error \"Failed to execute: \\\"~S ~S ~S\\\" returned ~S~%\"\r\n             *convert-binary*\r\n             source\r\n             destination\r\n             process))))\r\n\r\n#+sbcl\r\n(defun join-pdf (pdf-file-pattern dest-pdf-file)\r\n  (let ((process (run-program *pdfjoin-binary*\r\n                              (list \"--fitpaper\" \"false\"\r\n                                    \"--outfile\" dest-pdf-file\r\n                                    pdf-file-pattern))))\r\n    (unless (acceptable-exit-code process)\r\n      (error \"Failed to join pdf: \\\"~A --outfile ~A ~A\\\" returned ~S~%\"\r\n             *pdfjoin-binary*\r\n             dest-pdf-file\r\n             pdf-file-pattern\r\n             process))))\r\n\r\n(defun name-source-file (multi-image-filename current-count)\r\n  (format nil \"~A\\[~D-~D\\]\" multi-image-filename current-count current-count))\r\n\r\n(defun name-dest-file (dest-dir dest-prefix current-count dest-suffix)\r\n  (format nil\r\n          \"~A\/~A~3,'0D.~A\"\r\n          dest-dir\r\n          dest-prefix\r\n          current-count\r\n   
       dest-suffix))\r\n\r\n(defun convert-suffix-to-pdf (source-file)\r\n  \"Return SOURCE-FILE's directory and name with a .pdf extension.\"\r\n  (format nil\r\n          \"~A~A.pdf\"\r\n          (directory-namestring source-file)\r\n          (pathname-name source-file)))\r\n\r\n(defun name-dest-files (dest-dir dest-prefix dest-suffix)\r\n  (format nil\r\n          \"~A\/~A*.~A\"\r\n          dest-dir\r\n          dest-prefix\r\n          dest-suffix))\r\n\r\n(defun name-dest-pdffile (dest-dir dest-pdf-name)\r\n  (format nil\r\n          \"~A\/~A\"\r\n          dest-dir\r\n          dest-pdf-name))\r\n\r\n(defun create-pdf-from-images (dest-dir\r\n                               dest-prefix\r\n                               dest-suffix\r\n                               dest-pdf-name)\r\n  (let ((source-files (directory (name-dest-files dest-dir\r\n                                                  dest-prefix\r\n                                                  dest-suffix))))\r\n    (loop for file in source-files\r\n         do (simple-convert (format nil \"~A\" file)\r\n                            (convert-suffix-to-pdf file)\r\n                            '(\"-density\" \"100%\"))))\r\n  (let ((pdf-file-pattern (name-dest-files dest-dir\r\n                                           dest-prefix\r\n                                           \"pdf\")))\r\n    (join-pdf pdf-file-pattern\r\n              (name-dest-pdffile dest-dir dest-pdf-name))))\r\n\r\n(defun convert-multi-image-file (multi-image-file\r\n                                 dest-dir\r\n                                 dest-prefix\r\n                                 dest-suffix\r\n                                 dest-pdf-name)\r\n  (let ((multi-image-count (image-count multi-image-file)))\r\n    (loop for i from 0 to multi-image-count\r\n         do (simple-convert (name-source-file multi-image-file i)\r\n                            (name-dest-file dest-dir\r\n                                            dest-prefix\r\n            
                                 i\r\n                                            dest-suffix))))\r\n  (create-pdf-from-images dest-dir dest-prefix dest-suffix dest-pdf-name))\r\n<\/pre>\n<p>Currently it only runs on sbcl, but it should be easy to port to other implementations, since only the process handling (<code>run-program<\/code>, <code>process-exit-code<\/code> and <code>process-output<\/code> from <code>sb-ext<\/code>) is sbcl-specific.  The entry point is the last function, <code>convert-multi-image-file<\/code>, which is used something like this:<\/p>\n<pre>\r\n(convert-multi-image-file \"\/path\/to\/my\/file.tiff\"\r\n                          \"\/my\/dir\/\"\r\n                          \"image-\"\r\n                          \"png\"\r\n                          \"final.pdf\")\r\n<\/pre>\n<p>The end result is a pdf file plus the png files needed for the project.  The code relies on imagemagick&#8217;s square bracket syntax for selecting particular images within a multi-image file (such as a video file, a pdf, or in this case a tiff): <code>name-source-file<\/code> produces names like <code>file.tiff[5-5]<\/code>, so each call to convert handles only one sub image.  The imagemagick <a href=\"http:\/\/imagemagick.sourceforge.net\/http\/www\/FAQ.html#C32\">FAQ<\/a> is what eventually led me to this technique.<\/p>\n<p>This method renders everything in under a minute with little load on the machine, whereas the single large call rarely completed at all, running for over 45 minutes with load averages above 9.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I wrote a few posts ago about the joys of converting a bunch of images into a pdf with a single command. 
Well, I&#8217;ve been working with a set of tiff files that Andrew Whitlock gave me for the &hellip; <a href=\"http:\/\/elliottjohnson.net\/blog\/converting-composite-tiff-files-to-pdf\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-100","post","type-post","status-publish","format-standard","hentry","category-mybus"],"_links":{"self":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/100","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/comments?post=100"}],"version-history":[{"count":16,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/100\/revisions"}],"predecessor-version":[{"id":126,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/100\/revisions\/126"}],"wp:attachment":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/media?parent=100"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/categories?post=100"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/tags?post=100"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}