Document scanning: From images to PDF
Here I will attempt to describe the method I used to convert high-quality magazine scans into a final PDF document. I did a lot of experimenting through December 2007 and into January 2008 with different file types such as PNG, GIF and BMP. Most of them made the PDF much larger than it should have been; only JPG and TIFF did not. I finally arrived at a method that may seem complicated but actually isn't, and that accomplishes the main goals of this project:
- The pages must remain highly readable, whatever the page type
- The final PDF must be as small as possible
- The PDF must print quickly
Software tools used:
- IrfanView for image deskewing
- Adobe Photoshop for image cropping, adjusting, resizing, mode conversions and final image saving
- Image to PDF 2.1 for the final PDF packaging
Scanning the Original Magazine
I've tried both the Windows Scanner Wizard and the HP Scanjet tools. They do the same job, but the Windows wizard is easier, so I've stuck with it. All pages were scanned at 300 DPI and saved as LZW-compressed TIFF files, with color pages scanned as color images and black and white (greyscale) pages scanned as greyscale images. At this stage an entire scanned magazine can take over 300 MB of disk space; this drops to about 200 MB once the images are cropped, cleaned up and resaved.
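To see why a magazine fills hundreds of megabytes, it helps to work out the raw pixel count of one page. The sketch below is a back-of-the-envelope calculation, assuming 24-bit RGB for color pages and 8-bit greyscale for the rest; the function names are hypothetical, and the figures are uncompressed sizes before LZW.

```python
# Rough storage estimate for one 8.5x11 inch page scanned at 300 DPI.
# Sizes are *uncompressed* pixel data; LZW shrinks them by an amount
# that depends on the page content.

BYTES_PER_PIXEL = {"color": 3, "greyscale": 1}  # assumed: 24-bit RGB vs 8-bit grey


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Uncompressed size in bytes of a scanned page in the given mode."""
    w, h = page_pixels(width_in, height_in, dpi)
    return w * h * BYTES_PER_PIXEL[mode]


if __name__ == "__main__":
    for mode in ("color", "greyscale"):
        size = raw_size_bytes(8.5, 11, 300, mode)
        print(f"{mode:9s}: {size / 1_000_000:.1f} MB uncompressed")
```

A 300 DPI page is 2550 x 3300 pixels, about 25.2 MB of raw color data or 8.4 MB of greyscale, so even with LZW compression a full issue adds up quickly.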
De-skew the Image
I expected Adobe Photoshop CS2 to have a good de-skewing tool, but it doesn't; instead you have to rotate the document manually to correct the skew. Worse, Photoshop's rotate tool doesn't work well: it seems to break the document up into regions, so a rotated image shows visible stair-stepping in straight lines and even in the text. The rotate tool in IrfanView works well, but it is still a manual process, and IrfanView can only work on one file at a time. Load each file into IrfanView and rotate it to correct the skew. Most Transactor pages have a square border, which gives you good reference points for checking that the skew is gone. Re-save each image as an LZW-compressed TIFF.
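Stair-stepping along straight edges is the classic symptom of rotating without interpolation (nearest-neighbour sampling). The sketch below shows the usual alternative, bilinear interpolation: each output pixel is mapped back into the source and blended from its four neighbours. It is not claimed to be what IrfanView does internally; it works on a plain list of greyscale rows so it runs without libraries, and `rotate_bilinear` is a hypothetical name.

```python
import math

Image = list[list[int]]  # rows of 8-bit greyscale values (0 = black, 255 = white)


def rotate_bilinear(img: Image, angle_deg: float, fill: int = 255) -> Image:
    """Rotate a greyscale image about its centre using bilinear interpolation.

    Positive angles rotate counter-clockwise as the image appears on screen
    (y axis pointing down). Pixels that map outside the source are filled
    with `fill` (white by default). The output has the same size as the input.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find where this output pixel came from.
            dx, dy = x - cx, y - cy
            sx = cx + dx * c - dy * s
            sy = cy + dx * s + dy * c
            if not (0 <= sx <= w - 1 and 0 <= sy <= h - 1):
                continue  # outside the source image: leave as fill
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Blend the four neighbouring source pixels by their distance.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[y][x] = round(top * (1 - fy) + bottom * fy)
    return out
```

Because every output pixel is a weighted average rather than a copy of one source pixel, a rotated edge comes out slightly soft instead of jagged, which is usually the better trade-off for text.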
Crop the Image
Re-load the image into Photoshop. For text or ad pages with white borders, crop off the white borders as close to the innermost margin as you can. For pages with full-page images, such as the covers or color ads, crop off as little as possible to retain as much of the original image as you can.
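Cropping here is done by hand, but finding the "innermost margin" of a white-bordered page is easy to automate: scan for the first and last rows and columns that contain anything darker than near-white. This is a minimal sketch assuming a greyscale page held as a list of rows; the threshold of 240 and the names `content_bbox` and `crop` are hypothetical.

```python
from typing import Optional

Image = list[list[int]]  # rows of 8-bit greyscale values (255 = white)
Box = tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def content_bbox(img: Image, white_threshold: int = 240) -> Optional[Box]:
    """Return the bounding box of the non-white content, or None if blank.

    A pixel counts as content if it is darker than `white_threshold`
    (an assumed cut-off that tolerates slightly grey scanner paper).
    """
    rows = [y for y, row in enumerate(img) if any(p < white_threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(img[0])) if any(row[x] < white_threshold for row in img)]
    return cols[0], rows[0], cols[-1], rows[-1]


def crop(img: Image, box: Box) -> Image:
    """Crop to an inclusive (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

On a full-page cover nearly every edge row contains dark pixels, so the box covers almost the whole scan, which matches the advice to crop covers as little as possible.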
Adjust Canvas or Image Size
For the images that were minimally cropped (the covers), adjust the image size (not the canvas size) back to 8.5x11 inches. This will likely scale the width and height by slightly different amounts, distorting the image a little, but that is hard to avoid. For black and white pages, adjust the canvas size back to 8.5x11 inches instead; this adds the white space back to the outer margins and makes the page look like the original again.
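The two operations differ in what they do to the pixels: Image Size resamples the whole picture, while Canvas Size leaves the pixels alone and adds white around them. The sketch below shows both under stated assumptions: the scale factors needed to stretch a cropped cover back to 2550 x 3300 pixels (8.5x11 at 300 DPI), and a canvas padding that centres the page. Photoshop's Canvas Size dialog lets you pick the anchor, so centring is only an assumption; all names are hypothetical.

```python
Image = list[list[int]]  # rows of 8-bit greyscale values (255 = white)

DPI = 300
TARGET_W, TARGET_H = round(8.5 * DPI), round(11 * DPI)  # 2550 x 3300 pixels


def resize_factors(w: int, h: int) -> tuple[float, float]:
    """Horizontal and vertical scale needed to stretch a cropped cover to 8.5x11.

    The two factors differ whenever the crop changed the page's aspect
    ratio, which is why the resized image is slightly distorted.
    """
    return TARGET_W / w, TARGET_H / h


def pad_canvas(img: Image, target_w: int, target_h: int, fill: int = 255) -> Image:
    """Centre img on a white canvas of the target size (like enlarging Canvas Size)."""
    h, w = len(img), len(img[0])
    if w > target_w or h > target_h:
        raise ValueError("image is larger than the target canvas")
    left = (target_w - w) // 2
    top = (target_h - h) // 2
    out = [[fill] * target_w for _ in range(target_h)]
    for y, row in enumerate(img):
        out[top + y][left:left + w] = row  # copy the original pixels unchanged
    return out
```

For example, a cover cropped to 2550 x 3000 pixels needs no horizontal change but a 1.1x vertical stretch, while a text page padded with `pad_canvas` keeps every original pixel exactly as scanned.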
Adjust Levels
I suspect this is a step most people won't need. My scanner doesn't produce a true black; black comes out dark grey, so on every page except the color images I adjust the midpoint of the levels to ~0.70 (down from 1.00). For pages with photographs, I select all the photos and then invert the selection so that the levels adjustment won't affect them.
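The levels midpoint is a gamma value. This sketch assumes the standard levels formula, out = 255 x (in/255)^(1/gamma), which approximates but may not exactly match Photoshop's midtone slider. With gamma 0.70, black (0) and white (255) stay put while a mid-grey of 128 drops to about 95. The `protect` rectangles stand in for the inverted photo selection; the names and rectangle convention are hypothetical.

```python
from typing import Sequence

Image = list[list[int]]  # rows of 8-bit greyscale values (0 = black, 255 = white)
Rect = tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def levels_midpoint(value: int, gamma: float) -> int:
    """Apply a levels midtone (gamma) adjustment to one 8-bit value.

    gamma < 1.0 darkens the midtones; 0 and 255 are left unchanged.
    """
    return round(255 * (value / 255) ** (1 / gamma))


def adjust_page(img: Image, gamma: float = 0.70, protect: Sequence[Rect] = ()) -> Image:
    """Darken a page's midtones, skipping pixels inside the protected rectangles."""
    # Precompute a lookup table: there are only 256 possible input values.
    lut = [levels_midpoint(v, gamma) for v in range(256)]

    def protected(x: int, y: int) -> bool:
        return any(l <= x < r and t <= y < b for l, t, r, b in protect)

    return [
        [p if protected(x, y) else lut[p] for x, p in enumerate(row)]
        for y, row in enumerate(img)
    ]
```

A lookup table is the natural design here because a levels adjustment maps each grey value independently of its neighbours.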
Change Image Mode
For all non-color pages, change the image mode to greyscale. Photoshop will ask whether to discard the color information; this is OK, as there isn't any color on the page.
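Before discarding color, you could confirm that a page really has none. This sketch checks that each pixel's red, green and blue values are within a small tolerance of each other, then converts using the ITU-R BT.601 luma weights. That formula is a common choice, not necessarily what Photoshop uses (its greyscale conversion depends on its color settings), and the tolerance of 8 is an assumed value to absorb scanner noise.

```python
Pixel = tuple[int, int, int]  # (r, g, b), each 0-255


def is_effectively_grey(pixels: list[Pixel], tolerance: int = 8) -> bool:
    """True if no pixel's channels differ by more than `tolerance`.

    A scanner rarely produces perfectly equal R, G and B on a black and
    white page, so a small tolerance absorbs scanner noise.
    """
    return all(max(p) - min(p) <= tolerance for p in pixels)


def to_greyscale(pixels: list[Pixel]) -> list[int]:
    """Convert RGB pixels to 8-bit grey using ITU-R BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]
```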
Save the Final Archive Image
Re-save the image as an LZW-compressed TIFF. This is your final image, to be stored and archived.
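LZW, the lossless compression TIFF offers, suits scans because long runs of identical pixels (white margins, solid borders) collapse into a few dictionary codes. This is a simplified sketch of the LZW algorithm itself: it emits a list of integer codes and omits TIFF's Clear and EndOfInformation codes and its variable-width bit packing, so its output is not a valid TIFF strip.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Compress bytes into a list of LZW codes (simplified, unbounded dictionary)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = next_code  # learn the new, longer sequence
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the original bytes from codes produced by lzw_encode."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[:1]  # the sequence being defined by this very code
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

A run of 1000 white bytes encodes to fewer than 100 codes, while a busy photograph gains far less, which is one reason compressed scan sizes vary so much from page to page.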
Change Archive Image to JPG or Bitmap TIFF
First, create a new folder to save the files produced in this section. Alternatively, you can re-save them with a slightly modified name to tell them apart; I add an underscore ("_") after the filename and before the extension. I save color images from within Photoshop as JPG using a compression setting of a high 1 or a low 2 out of 10, which gives a highly compressed file. If the chosen setting produces visible color distortion, raise it a little. For all text pages, I change the mode to bitmap and save them as LZW-compressed TIFF.
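Two parts of this step are easy to sketch: the underscore naming convention, and what a bitmap (1-bit) conversion does to a page. The conversion shown is a plain 50% threshold, which is one of Photoshop's Bitmap conversion methods; the text does not say which method is used, so treat that as an assumption. Packing eight pixels per byte is why bitmap pages are so small: a 2550 x 3300 page needs 319 bytes per row, about 1.05 MB in total, versus 8.4 MB in 8-bit greyscale. All function names are hypothetical.

```python
from pathlib import Path


def derived_name(path: str) -> str:
    """Add '_' between the file name and its extension: page01.tif -> page01_.tif."""
    p = Path(path)
    return str(p.with_name(p.stem + "_" + p.suffix))


def to_bitmap_row(row: list[int], threshold: int = 128) -> list[int]:
    """50% threshold: grey values below `threshold` become black (0), others white (1)."""
    return [0 if v < threshold else 1 for v in row]


def pack_bits(bits: list[int]) -> bytes:
    """Pack 1-bit pixels eight to a byte, most significant bit first.

    The last byte is padded with zero bits if the row length is not a
    multiple of 8.
    """
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```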
For images with greyscale components, I select all the greyscale components and cut them out, change the mode to bitmap, paste the cut parts back in and reposition them where they were, then save the image as an LZW-compressed TIFF. I cut the photos first because of the way Photoshop converts pages containing greyscale to bitmap: a straight mode conversion loses most of the greyscale detail, but cutting those parts out before the conversion and pasting them back afterwards makes Photoshop preserve the greyscale much better. Strange but true!
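Why pasting helps inside Photoshop isn't explained here, so the sketch below only models the visible effect with a swapped-in technique: text areas get a plain 50% threshold, while the photo rectangles get Floyd-Steinberg error diffusion, which keeps a photo's tonal impression in pure black and white. The photo rectangles are assumed to be supplied by hand, and all names are hypothetical.

```python
Image = list[list[int]]  # rows of 8-bit greyscale values (0 = black, 255 = white)
Rect = tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def threshold(img: Image, level: int = 128) -> Image:
    """Plain 50% threshold: fine for text, destroys photo tones."""
    return [[0 if p < level else 255 for p in row] for row in img]


def dither_region(img: Image, rect: Rect, level: int = 128) -> None:
    """Floyd-Steinberg error diffusion inside rect, modifying img in place."""
    left, top, right, bottom = rect
    # Work on a float copy of the region so diffused error is not rounded away.
    buf = [[float(img[y][x]) for x in range(left, right)] for y in range(top, bottom)]
    h, w = len(buf), len(buf[0])
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 0.0 if old < level else 255.0
            buf[y][x] = new
            err = old - new
            # Push the error onto unprocessed neighbours (7/16, 3/16, 5/16, 1/16).
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    for y in range(h):
        for x in range(w):
            img[top + y][left + x] = int(buf[y][x])


def to_bitmap_keeping_photos(img: Image, photos: list[Rect]) -> Image:
    """Threshold the page, but dither the photo rectangles instead.

    Loosely mirrors "cut the photos, convert, paste them back": the page is
    thresholded, then each photo's original greys are restored and dithered.
    """
    out = threshold(img)
    for rect in photos:
        left, top, right, bottom = rect
        for y in range(top, bottom):
            out[y][left:right] = img[y][left:right]  # restore original greys
        dither_region(out, rect)
    return out
```

A flat 50% grey photo area comes out as roughly half black and half white pixels instead of turning entirely white, which is the kind of difference a straight threshold conversion loses.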
Convert All Images to PDF
Start Image to PDF and drag and drop all the converted images onto it. Be sure to put them in the correct order (front cover, inside front cover, pages, inside rear cover, rear cover). Then click the Make PDF button to create the PDF, and view it to see how it looks. Try printing it as well, if possible.
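How Image to PDF works internally isn't described here, but two things around it are easy to sketch: sorting the files into the required order, and the arithmetic relating a 300 DPI image to a PDF page, which is measured in points (1/72 inch). The file naming scheme below is purely hypothetical; adapt the names to your own files.

```python
import re

# Hypothetical naming scheme: covers get fixed names, interior pages are
# numbered (p001.jpg, p002_.tif, ...). Adjust to match your own files.
FRONT = ["front_cover", "inside_front_cover"]
BACK = ["inside_rear_cover", "rear_cover"]


def page_order_key(filename: str) -> tuple[int, int]:
    """Sort key giving: front cover, inside front cover, pages, inside rear, rear."""
    stem = filename.rsplit(".", 1)[0].rstrip("_")  # drop extension and '_' marker
    if stem in FRONT:
        return (0, FRONT.index(stem))
    if stem in BACK:
        return (2, BACK.index(stem))
    match = re.fullmatch(r"p(\d+)", stem)
    if match is None:
        raise ValueError(f"unrecognised page file: {filename}")
    return (1, int(match.group(1)))


def page_size_points(width_px: int, height_px: int, dpi: int = 300) -> tuple[float, float]:
    """PDF page size in points (1/72 inch) for an image at the given DPI."""
    return width_px / dpi * 72, height_px / dpi * 72
```

Sorting numerically rather than alphabetically keeps p010 after p002 even without zero padding, and a 2550 x 3300 image at 300 DPI maps to a 612 x 792 point (US Letter) page.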
The above procedure may seem onerous and overly complicated, but the results speak for themselves. Once you streamline it by loading batches of files into Photoshop, things go much faster. By far the slowest part is the de-skewing in IrfanView. I wish I could find a utility that did this part automatically (batch processing would be wonderful!), as it would make the conversion go twice as fast.
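Automatic de-skewing usually starts by measuring the skew. One standard approach, sketched here as an assumption rather than a tested replacement for IrfanView, is a projection-profile search: try a range of angles, rotate the dark pixels' coordinates, and count them per row. At the correct angle, text lines and page borders collapse into a few rows, so the sum of squared row counts peaks. The search range of plus or minus 5 degrees in 0.5 degree steps is an assumed default.

```python
import math
from collections import Counter

Image = list[list[int]]  # rows of 8-bit greyscale values (0 = black, 255 = white)


def estimate_skew(img: Image, max_angle: float = 5.0, step: float = 0.5,
                  threshold: int = 128) -> float:
    """Estimate page skew in degrees with a projection-profile search.

    Positive results mean lines slope downward to the right (y axis down).
    Only the dark pixels' coordinates are rotated, not the whole image.
    """
    dark = [(x, y) for y, row in enumerate(img) for x, p in enumerate(row) if p < threshold]
    n = round(2 * max_angle / step)
    best_angle, best_score = 0.0, -1
    for i in range(n + 1):
        angle = -max_angle + i * step
        a = math.radians(angle)
        c, s = math.cos(a), math.sin(a)
        # Row each dark pixel would land in after removing this much skew.
        rows = Counter(round(y * c - x * s) for x, y in dark)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Run over every page in a folder, the estimated angle could then be fed to any rotation routine, which is the batch step wished for above; on pages with the square Transactor border, the border alone gives a strong signal.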
Email the author: Peter Schepers | Last updated: May 9, 2008