I have recently been working on a project that requires us to take a PDF file (which is basically an image) and convert it into a file that the Kindle can use, meaning you can search, highlight and underline the text. A MOBI file is ideal for a Kindle (a PRC, or Palm, file works as well). I will share the workflow we developed (there is no easy or single right way to convert these files) and the software we used.

1. OCR your PDF. You will need an Optical Character Recognition (OCR) tool. If you skip this step, you can still load your PDF onto your Kindle DX, but it will only be an image: you cannot resize the text, search it, highlight or underline it, or make notes. Adobe Acrobat has an OCR feature built in, but in our case it tripped on footnotes, page headers and poorly scanned documents. If you OCR your document in other software such as Readiris or ABBYY FineReader, make sure you crop out the page numbers, book headers and footers, and put all footnotes at the end of the file. If you are scanning an old text, scan it as straight and clean as possible. It will save you editing work in the long run. Readiris worked very well in our workflow (it did best on text-only pages without images).

2. Save the document as HTML or RTF. RTF will keep the images and the formatting of the PDF document. The advantage of HTML is that it is easier to format the file and to hyperlink footnotes in an HTML editor (like Dreamweaver) than with the bookmarks function in Word for an RTF. HTML also makes it easier to control the spacing between paragraphs using the <p> tag. When saving images for the Kindle, keep them under 600px and keep the file size very small.

3. Edit the document. Use a word processor (like Word) or an HTML editor (like Dreamweaver) to correct misspelled text, place images in the document, and hyperlink footnotes or links mentioned in the article. We used Dreamweaver and Microsoft Word for this step, but any editing program will do. Remember to put all footnotes at the end of your document; it makes the conversion to a MOBI file cleaner. This step will take the most time.

4. Convert the file into a MOBI file or a PRC file (use MOBI whenever possible). I used Mobipocket Creator (publisher edition) and Calibre to convert the RTF and HTML files. Calibre handled images better, but footnote links were handled better in Mobipocket Creator (or it could be that one file type, HTML or RTF, converted the footnotes better than the other). In the converter you can fill out metadata such as the title, author and book cover image. Both were very easy to use and free (Calibre is open source; Mobipocket Creator is not).

5. Upload the file to a web server and test it on your Kindle DX by downloading it from the web. Make sure you have a good wireless connection for this step, or else the Kindle will freeze up on you.
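
If you would rather script parts of steps 2 through 4, here is a rough Python sketch. It is not the process we actually used (we did this work by hand in Dreamweaver, Word and the converter GUIs), and it rests on a few assumptions: the function names are made up for illustration, the 600px limit is treated as a cap on the longest side of an image, in-text footnote markers are assumed to look like [1], and each footnote at the end of the file is assumed to be a paragraph starting with <p>[1]. The last function only builds a command line for Calibre's ebook-convert tool, using its --title, --authors and --cover options.

```python
import re
import subprocess

MAX_IMAGE_SIDE = 600  # assumption: the 600px limit applies to the longest side


def scaled_size(width, height, max_side=MAX_IMAGE_SIDE):
    """Return (width, height) shrunk proportionally to fit within max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Assumed convention: a footnote is a paragraph that starts with "[n]",
# and every other "[n]" in the file is an in-text reference to it.
NOTE_RE = re.compile(r"<p>\[(\d+)\]")
REF_RE = re.compile(r"\[(\d+)\]")


def link_footnotes(html):
    """Link in-text [n] markers to the footnote paragraphs and back."""
    # Tag the footnote paragraphs first, replacing "[n]" with "n." so the
    # second pass does not match them again.
    html = NOTE_RE.sub(r'<p id="fn\1"><a href="#ref\1">\1.</a>', html)
    # Whatever "[n]" remains is an in-text reference. A marker that is
    # repeated in the text would get a duplicate id and need manual cleanup.
    return REF_RE.sub(r'<a id="ref\1" href="#fn\1">[\1]</a>', html)


def build_convert_command(source, output, title, author, cover=None):
    """Build the argument list for Calibre's ebook-convert command."""
    cmd = ["ebook-convert", source, output, "--title", title, "--authors", author]
    if cover:
        cmd += ["--cover", cover]
    return cmd


def convert(source, output, title, author, cover=None):
    """Run the conversion; requires Calibre to be installed and on the PATH."""
    subprocess.run(build_convert_command(source, output, title, author, cover),
                   check=True)


# Example (file names are placeholders):
# convert("book.html", "book.mobi", "My Book", "Some Author", cover="cover.jpg")
```

ebook-convert picks the output format from the file extension, so naming the output book.mobi is what makes it a MOBI file. You will still want to proofread the result by hand, since OCR errors are not something a script like this can catch.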

The amount of time it takes to create a MOBI file depends on the type of PDF, how clean the text is, and whether it has images. The editing process is the most time-consuming part and can be painfully boring. Amazon has an option where you can email them the PDF and they will convert it for you (at a cost). I hope this helps Kindle users out there with their PDFs.

Links:

http://www.abbyy.com/ ABBYY FineReader
http://download.cnet.com/Readiris-Pro/3000-2079_4-10216918.html Readiris
http://www.mobipocket.com/en/downloadSoft/ProductDetailsCreator.asp Mobipocket Creator (publisher edition)
http://calibre.kovidgoyal.net/download Calibre
