{"id":82,"date":"2010-02-08T20:05:46","date_gmt":"2010-02-09T04:05:46","guid":{"rendered":"http:\/\/elliottjohnson.net\/blog\/?p=82"},"modified":"2010-02-08T20:12:07","modified_gmt":"2010-02-09T04:12:07","slug":"the-joys-of-pdf-conversion","status":"publish","type":"post","link":"http:\/\/elliottjohnson.net\/blog\/the-joys-of-pdf-conversion\/","title":{"rendered":"The joys of pdf conversion"},"content":{"rendered":"<p>As part of the oacdp I convert sections of manuals into pdfs.\u00a0 I&#8217;ve been using <a href=\"http:\/\/www.imagemagick.org\/\">imagemagick<\/a> for a while to batch convert.  To take any number of images and convert them to a single pdf it&#8217;s a simple one-line command:<\/p>\n<p><code>convert *.png -density 100% some.pdf<\/code><\/p>\n<p>An annoying feature of imagemagick is that it pulls all the images (in the above case the png files) into memory, converts them in memory to a 16-bit (per color channel) representation and then, once it&#8217;s all assembled, writes the content out to disk.  Normally this isn&#8217;t too bad: I&#8217;ve got 6 GB of memory and 12 GB of swap, so if it takes 3 GB to process one document I can deal with it.<\/p>\n<p>While working on a <a href=\"http:\/\/oacdp.org\/type1part.html\">type1 manual<\/a>, I found the 8th section, which is 317 images, impossible to deal with.  I enlarged the swap space, but the load on my system would rise above 15 and eventually the process or my machine would crash.<\/p>\n<p><a href=\"http:\/\/awhitlock.net\/\">Andrew Whitlock<\/a>, the previous maintainer of the oacdp, confirmed that this had happened to him in the past and that he had switched over to a Windows machine to use some application with a reasonable algorithm to convert to PDF.  
I, on the other hand, don&#8217;t have a Windows install, so I began researching online.<\/p>\n<p>First, in digging around, I found an imagemagick-based project called <a href=\"http:\/\/www.graphicsmagick.org\">graphicsmagick<\/a> that is written specifically for multi-core machines like mine.  I tried it out, but it uses algorithms similar to imagemagick&#8217;s for processing PDFs, so it wasn&#8217;t really an answer.  It did improve processing times&#8230; it basically made things crash faster.<\/p>\n<p>A little more searching revealed <a href=\"http:\/\/www.imagemagick.org\/discourse-server\/viewtopic.php?t=13126\">this thread<\/a>, which describes converting each image to an individual pdf and then assembling them using <a href=\"http:\/\/www2.warwick.ac.uk\/fac\/sci\/statistics\/staff\/academic\/firth\/software\/pdfjam\">pdfjoin<\/a> or the like.  pdfjoin is fast, taking only a handful of seconds to process all 317 pages using pdflatex.  The results are what I&#8217;d expect imagemagick to be able to produce.<\/p>\n<p>It would be nice if imagemagick would get their pdf processing up to snuff, but at least for now I have a workaround.  Hopefully this can help other people converting pdfs using command line tools like imagemagick\/graphicsmagick\/pdfjoin.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As part of the oacdp I convert sections of manuals into pdfs.\u00a0 I&#8217;ve been using imagemagick for a while to batch convert. 
To take any number of images and convert them to a single pdf it&#8217;s a simple one &hellip; <a href=\"http:\/\/elliottjohnson.net\/blog\/the-joys-of-pdf-conversion\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"footnotes":""},"categories":[8],"tags":[],"class_list":["post-82","post","type-post","status-publish","format-standard","hentry","category-tech"],"_links":{"self":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/82","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/comments?post=82"}],"version-history":[{"count":4,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/82\/revisions"}],"predecessor-version":[{"id":86,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/posts\/82\/revisions\/86"}],"wp:attachment":[{"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/media?parent=82"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/categories?post=82"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/elliottjohnson.net\/blog\/wp-json\/wp\/v2\/tags?post=82"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}