PDF Tools Article (Overview of Tools)

Last Update: 22 May 2007

If you have used the Internet for any length of time, you are probably familiar with the Portable Document Format (PDF). Files encoded in PDF usually carry the file extension *.pdf, at least on Microsoft Windows-based platforms. PDF was invented by Adobe Systems over 15 years ago, and has captured a large share of the market as a way to share text, photos, and interactive content on the Internet.

PDFs are nice for sharing printed documents in an electronic format. However, editing or working with PDFs can get quite expensive if you want to use Adobe's solution: Adobe Acrobat. While the reader is free and is available for multiple platforms, Adobe Acrobat is expensive at $449 list.

Thankfully, there are many free and low cost tools that allow you to perform many of the functions that you might need.

There are two pages to this section. This page provides an overview of the main PDF Tool features and functions that you might need.

The PDF Tool List page is simply a long listing of the websites and program names of many of the PDF tools I found on the Internet.

Outline of this Page:

Part I - PDF Creation and Conversion
1. PDF Creation
2. PDF Conversion and Extraction
3. Image PDF to Searchable PDF
Part II - Selective PDF Tools
4. PDF Security
5. Split and Merge
6. Document Properties
7. PDF Reader Speedup
8. Page Numbering
9. PDF Viewer Startup Options
10. Compression
11. Form Fill
12. Watermarks
13. Thumbnail
14. Attachments
15. PDF Version

Part I - PDF Creation and Conversion

1. Create PDF

As the creator of the PDF format, Adobe has developed Adobe Acrobat as the premier tool for the creation of PDF files. It gives you full editing capabilities to incorporate text, images, and multimedia into a document complete with hyperlinks, thumbnails, table of contents listings, and more. Though full featured, Acrobat is also expensive, starting at $449. One of the main advantages of using the Adobe product is that you will be up to date with the latest Adobe PDF format version, which is currently PDF Specification 1.7.

Other editors exist, some offering simple text editing into PDF, and others providing more full featured tools. A few of these tools are even provided free, though their feature sets are much smaller than Acrobat's!

A more popular option is through the use of a specialized Microsoft Windows Printer Driver. This driver sits in your "Printers and Faxes" area of your start menu, and acts just like a printer. However, instead of printing to a physical printer, the printer driver saves the output in the format of a PDF file.

There are tons of PDF printer drivers out there (see the PDF Tools listing for a variety of them). They range from free products to versions over $100. Some of the free versions print footers at the bottom of every page advertising their product.

There are a lot of Creation Properties that can go into creating a PDF file. Some PDF creators are so simple they don't allow you to manipulate any of these properties. Others allow setting up all of these parameters. In general, most of the free programs are more limited in this area. I recommend experimenting with the programs that look interesting to you, to see how well they implement the following features.

Font Embedding: You may have written a document in your word processing program using the COMIC font, and then printed it to a PDF. When someone else who doesn't have COMIC on their system opens the PDF, however, the COMIC font appears as some other font, ruining your planned presentation. To solve this problem, the PDF specification allows for the embedding of fonts in the document. This guarantees the reader will see the fonts you intended. The downside to embedding fonts is that it makes the document larger.

To compensate for this, PDF creation tools usually allow you some flexibility on specifying what fonts (if any) you want to embed. A couple of choices include:

Common Fonts: If most of your PDF readers will be on Microsoft Windows, you can choose not to embed common fonts that are found on most systems such as "Arial" or "Times New Roman". I believe that these fonts are common on most other operating systems as well, such as Linux and MacOS.
Embed Font Subsets: Some programs allow you to embed just a subset of a given font. Most fonts include many characters that you never use, such as a smiley face or other extended characters. When you choose to embed font subsets, only the characters actually used in your document are embedded.
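
To make the subset idea concrete, here is a minimal Python sketch that lists the distinct characters a piece of text actually uses. The function name characters_to_embed is made up for this illustration, and real PDF tools work on the glyphs inside the font file rather than on plain text, so treat this only as a picture of why a subset can be much smaller than the full font.

```python
def characters_to_embed(text: str) -> list[str]:
    """Return the distinct printable, non-space characters used in text, sorted.

    Hypothetical helper: a subsetting tool would only need glyphs for these.
    """
    return sorted({ch for ch in text if ch.isprintable() and not ch.isspace()})


sample = "Hello, hello!"
subset = characters_to_embed(sample)
print(subset)
print(len(subset), "distinct characters need glyphs")
```

A font may carry hundreds or thousands of glyphs, while a short document typically uses well under a hundred distinct characters.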

Image Downsizing: Images can be very large and may greatly increase the size of your PDF file. Rather than storing the original image in the PDF file, some creation programs allow you to resample images so that a smaller version is used within the PDF. For example, an image resolution of 96 dots per inch (dpi) produces a PDF that will look good on the screen only, but the overall PDF has a much smaller size than if the original images were embedded. When this option is selected, images that have a resolution greater than 96dpi will be reduced to this resolution. A resolution of 300dpi is suitable for normal printing and is likely the best overall choice. If you plan on allowing high resolution printing, however, or if you want to ensure maximum quality if the PDF is converted to another format later on in its life, then you can turn off image compression and allow images up to 4000dpi.
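
The effect of resampling is easy to estimate with a little arithmetic. The sketch below is only an illustration: the function name downsampled_size and the 6 x 4 inch, 300dpi photo are assumptions, not taken from any particular tool.

```python
def downsampled_size(width_px: int, height_px: int,
                     source_dpi: float, target_dpi: float) -> tuple[int, int]:
    """Pixel dimensions after resampling to target_dpi.

    Images already at or below the target resolution are left unchanged.
    """
    if source_dpi <= target_dpi:
        return width_px, height_px
    scale = target_dpi / source_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Assumed example: a 6 x 4 inch photo scanned at 300dpi (1800 x 1200 pixels).
original = (1800, 1200)
screen = downsampled_size(*original, source_dpi=300, target_dpi=96)
ratio = (screen[0] * screen[1]) / (original[0] * original[1])
print(screen)
print(f"pixel count is {ratio:.1%} of the original")
```

Because the pixel count falls with the square of the resolution ratio, going from 300dpi to 96dpi leaves roughly a tenth of the pixels, which is why the file shrinks so much.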

Compression: You can also use a compression algorithm for reducing image size. ZIP is a lossless compression, and works best on images with repeating patterns. JPEG is a lossy compression method that achieves a greater compression rate than ZIP, but does reduce the quality of the images embedded in the PDF.
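
The difference between data with repeating patterns and noisy data can be seen with Python's standard zlib module, which implements the same Deflate algorithm family that ZIP uses. This is a small sketch, not what any PDF tool does internally; the two byte strings are made-up stand-ins for image data.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better compression)."""
    return len(zlib.compress(data, level=9)) / len(data)


repeating = b"\x00\xff" * 50_000   # stand-in for an image with a repeating pattern
noisy = os.urandom(100_000)        # stand-in for a noisy photograph

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(zlib.compress(repeating)) == repeating

print(f"repeating pattern: {compressed_ratio(repeating):.3f}")
print(f"random noise:      {compressed_ratio(noisy):.3f}")
```

The repeating data shrinks to a tiny fraction of its size, while the random data barely changes; photographs fall closer to the second case, which is where a lossy method such as JPEG earns its keep.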

Security: PDF allows you to protect your document with several options. You can set a "user" password that must be entered before the PDF file can be opened for viewing. There is also an "owner" password that provides some control over how the document can be handled once opened. For example, you can toggle whether or not the user can print the document, print only a low-resolution version, modify it (such as move or delete pages), extract images and text, or allow annotations/form filling to be performed. The security level can be set to 40 or 128 bits (128 bits is harder to "crack" and is compatible with Adobe Acrobat 5.0 or greater).

Digital Signature: This allows a PDF to be digitally signed. You need to create or have a Digital ID to use this function.

Watermarks: Add a text watermark to the page. For example, you might want to place the word DRAFT in the background of a document that is sent out for review but not yet finalized, or you might want your company name shown as transparent text on every page. The watermark tools usually allow you to select the font, font size, rotate the text, and so forth.

Stamp: Tools with this feature allow you to insert an image as a background either on the first page only, or on every page.

Document Properties or Metadata: Metadata is data about data. That is, it describes what is in a file. When you select Document Properties in Adobe Acrobat Reader, it will display the Title, Author, Subject and Keywords related to the document. Some PDF creation tools allow you to set this information; otherwise the tool will use default data or leave the fields blank.

Hyperlink Support: PDFs created from scratch or converted from HTML documents can support the included hyperlinks. Some PDF creator programs will only create hypertext from full "http://" or "www." links. Others will allow full hyperlinks when documents are created from within Microsoft Word.

Bookmarks: PDF files can display a sort of Table of Contents or Bookmark listing in a left panel to help you navigate more quickly through the document. Some PDF printer drivers also support this, though usually only from documents created in Microsoft Word.

Thumbnails: PDF documents can insert a "thumbnail" image of each page, which can also act as a table of contents listing. By viewing the thumbnails in a left column, you may be able to more quickly maneuver to the page you are looking for.

Merging: Some of the programs allow you to merge multiple PDF documents into an existing one. I often use this to merge multiple pages from web-research into a single PDF for later review.

Preview: Some programs simply write the file and then open it in the default reader. Others provide a preview window so you can see what the document looks like before printing, and allow you to make additional modifications to the document (such as setting the Document Properties).

Others: There are other properties not listed here that are available in some programs (such as the ability to delete or move pages before printing to the PDF), but this listing should give you a good starting point for making your own comparison. By trying the various programs, you will get a feel for what features are more important to you. Some people don't care at all about the security features of PDF, for example, and thus don't need the "deluxe" versions that add this capability.

2. Converting from PDF to other formats; Extracting text and images

One of the advantages of PDF over a format such as HTML is that you can package an entire "book" of information into one file. This is easy to store, e-mail, and move as needed. The disadvantage of PDF, however, is that it is difficult to edit without specialized software, difficult to use just a single image or text portion in another file, and difficult even to view in a web-browser without the use of a specialized plug-in.

There are many tools to convert PDF to other formats such as HTML or Microsoft Word. These programs will vary in their ability to preserve original layout, and even the text conversion will be dependent on the quality of the text in the original PDF. When you try to convert PDFs or extract text, you may be surprised in some cases to find that your resulting document has no readable text. Some PDF files that appear to be text are really just images of the text, and will not convert without a special Optical Character Recognition (OCR) application (these are covered in the next section).

Some PDF tools are limited to simple text and/or image extraction. They will lose the formatting of the PDF, but sometimes all you are looking for is a raw extract anyway. Even Adobe's Acrobat Reader offers the ability to save just the text from most PDFs.

Browse the PDF Tools List page for many tools that allow you to convert PDFs or extract text and images from them.

3. Image PDF to Searchable PDF

You've got that nice pristine PDF file and the text and the graphics look great. You search on a word in the file, however, and find that it doesn't find it. You then try to copy and paste a few words from the article, and when you click the mouse button you see all the text in a block become highlighted instead! The PDF isn't protected by security, so what is going on?

Some applications let you scan an article right into the PDF format. The problem with this method is that the text is usually stored as a graphic rather than as searchable text. That is, it is like you took a photograph of the text rather than retyping it into the PDF. Thus, the text exists only as an image, and not as individual letters and characters.

The solution for this is to use conversion software that includes Optical Character Recognition (OCR) capabilities. This software can read an image PDF, perform OCR on any included text, and save the output as either a new searchable PDF or perhaps an HTML or Microsoft Word document.

There are several OCR programs that let you work from an image file and these often come with Image Scanners. However, working right from the original PDF is more convenient. A few programs that let you perform this function right from an image PDF file include:

ABBYY Software from Russia (http://www.abbyy.com). Abbyy PDF Transformer Pro V2 provides the ability to convert image PDF files to searchable PDF, as well as converting PDF files to other formats such as Microsoft Word or HTML. I tried the 15-day/50 page demo, and the conversions were of excellent quality. The product costs around $100, and if you need to convert a lot of image PDF documents, this program seems to be an excellent choice.

Nuance Communications, Inc. (formerly ScanSoft, http://www.nuance.com). Nuance produces PDF Converter Professional. This program includes OCR capability to convert image PDFs to Word or other formats along with many other features. PDF Converter Professional has more features than Abbyy's PDF Transformer Pro, but the OCR capabilities didn't seem to work as well, at least in the few demo documents I tried. The rendering of the converted fonts was not as pleasing, and as is typical of many OCR programs, many characters are improperly converted (such as a close "al" combination being read as a "d", for example). I would recommend trying to get a demonstration package before purchasing, or see if you can get a money back guarantee so you can try it yourself before buying (about $100).

Investintech.com, Inc. (http://www.investintech.com). The only other programs I'm aware of with image PDF OCR capabilities are the Able2Doc and Able2Extract programs from Investintech. These don't convert directly back into searchable PDF, but rather convert into Word, HTML, and other formats. You would need another PDF conversion program to get this extracted (and OCR'd) information back into PDF format. I did not try either of these programs in the OCR version (about $60, or $120 with OCR capabilities).

Part II - Selective PDF Tools

Section 1 above lists many of the PDF properties that can be manipulated during creation. Many of these properties can also be manipulated after the fact, that is, with a PDF that you already own. The functions and tools listed in this section are designed to help you get more out of the PDF files in your collection. If you own a good creation program, such as Adobe Acrobat, many of the following tools will not be necessary as the functionality will already be in the PDF creator tool.

4. PDF Security

Adobe Acrobat provides some security features so that authors can protect their intellectual property. These features use passwords to protect access. One password is the "user" password. If the user password is set, then when you try to open the PDF in a reader, it will prompt you for the password and will not open the file until the correct password is entered. There is another password, known as the "owner" password that allows the author to restrict functions such as printing of the document, or the ability to copy and paste text out of the PDF file.

In general, if you have forgotten the user password, it is difficult to retrieve it without resorting to "brute force" techniques, which essentially amount to trying every possible password. For the owner password, however, utilities are available that can disable this password, and allow you access to features that were previously disabled. Even so, for most of these utilities to work you need to at least know the user password.
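
To get a feel for why "trying every possible password" is so slow, the following sketch counts the candidates for a given alphabet and maximum length. The guess rate of one million per second and the 62-character alphabet (letters and digits) are assumptions chosen only for illustration; real recovery tools and real hardware will differ.

```python
def password_count(alphabet_size: int, max_length: int) -> int:
    """Number of non-empty passwords of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


def worst_case_days(alphabet_size: int, max_length: int,
                    guesses_per_second: float) -> float:
    """Days needed to try every candidate at the given (assumed) guess rate."""
    return password_count(alphabet_size, max_length) / guesses_per_second / 86_400


GUESSES_PER_SECOND = 1_000_000   # assumed rate, illustrative only
ALPHABET = 62                    # a-z, A-Z, 0-9

for max_length in (4, 6, 8):
    total = password_count(ALPHABET, max_length)
    days = worst_case_days(ALPHABET, max_length, GUESSES_PER_SECOND)
    print(f"up to {max_length} characters: {total:,} candidates, "
          f"about {days:,.1f} days at worst")
```

Each extra character multiplies the work by the size of the alphabet, so a long user password quickly moves out of practical reach.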

Also, many of the other tools in this section will not work if the PDF is password protected. While Adobe Acrobat Reader is designed to allow you to enter the appropriate password upon opening it, many of the following tools were created with unprotected PDFs in mind and simply don't recognize encrypted PDFs even if you know the user password!

If you search on the word "password" on the PDF Tools List, you can locate some of the programs that provide encryption tools for PDF. Please note that I have not tried these programs and cannot make any claims as to functionality or reliability of these products.

5. Split and Merge PDF Files

I have some PDF files that are hundreds of megabytes in size and include over 1,000 pages. These documents can be a bit of a bear to open and navigate in Adobe Acrobat Reader, so the option to split them into more manageable sizes is very desirable. Also, you may want to pass on just part of an existing PDF file to a friend. Rather than sending the entire document, you can extract the selected pages and thus reduce the file size.

Also, you may have some already existing PDF files that would work better for you as a single file. The Merge tools will help you do this. They will allow you to merge several PDF documents that may have been created at different times.
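
Most split tools ask for page ranges, so it can help to plan them first. This is a minimal sketch of that planning step only; the function name split_ranges is hypothetical, and the actual cutting of the PDF is left to whichever split tool you choose.

```python
def split_ranges(total_pages: int, pages_per_part: int) -> list[tuple[int, int]]:
    """Inclusive, 1-based (first, last) page ranges that cover the whole document."""
    if total_pages < 0 or pages_per_part < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_part >= 1")
    return [
        (start, min(start + pages_per_part - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_part)
    ]


# Assumed example: a 1,000-page document cut into parts of at most 300 pages.
for first, last in split_ranges(1000, 300):
    print(f"pages {first}-{last}")
```

The last part simply holds whatever pages remain, so nothing is dropped when the total is not an exact multiple of the part size.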

Search on the PDF Tools List page for the terms "merge" or "split" to find a more complete listing of candidate tools. A few programs that I tried out and that seemed to work just fine were:

http://www.paologios.com: Paolo Gios
Gios PDF Splitter And Merger. Free & Open Source Split & Merge (requires Microsoft .Net to be installed).

http://www.pdfsam.org: PDF Split and Merge by Andrea
Open Source. Written in Java using the iText Library with a launcher front-end for Windows users.

http://www.plotsoft.com; http://www.pdfill.com PlotSoft LLC
PDFill PDFTools: Free set of tools includes Merge, Split/Reorder, Encrypt/Decrypt, Reformat Multiple Pages into One Page, Header/Footer, Watermark Text/Image, Convert Images to PDF/PDF to Images, PDF Form Field Ops, and PS to PDF

6. Document Properties / Metadata

As noted in the PDF creation section above, you can specify the Author, Title, Subject, and Keywords for a document upon creation. However, many authors fail to do this, or the documents were created using "defaults" that don't adequately describe the document. Using a document properties tool, you can go back to all those old PDF files on your hard drive and add or update the appropriate information.

Search on the PDF Tools List page for "properties" to locate programs that offer this feature.

7. PDF Speedup

One unique tool helps speed up the loading of Adobe Acrobat Reader by modifying all the plug-ins that are normally loaded at startup so that only the essential plug-ins are loaded. This greatly increases the speed of starting Acrobat Reader.

www.acropdf.com: AcroPDF Systems, France
PDF SpeedUp 1.42: Freeware. Tweaks Adobe Acrobat Reader so it doesn't load all the plug-ins on startup. Can adjust which plug-ins you want to load. Works through Acrobat 7.

8. PDF Page Numbering

You can add page numbers to your PDF files using these tools. A couple of stand-alone programs that allow this include:

http://www.a-pdf.com A-PDF.com
A-PDF Number: Freeware. Add page numbers to PDF files

http://www.bureausoft.com Bureausoft Corporation, France
PDF Page Number: Add page numbers to PDF $49

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Watermark Creator: Free. Add text watermark or number pages

9. Set PDF Viewer Startup Options

When you double click a PDF file to open in Acrobat Reader, the PDF file can be set to open in different views. These tools let you modify that default view.

http://www.bureausoft.com Bureausoft Corporation, France
PDF Layout: Set startup options  $49

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Tweak: Free. Tweak Reader View; Set version; Compress; Modify Info

10. Set PDF Compression

As discussed in the first section, PDF files can become quite large if you incorporate Fonts and if the images in the file are large. The PDF format specification supports several methods of compression to make the resulting PDF file smaller, thus making it take up less space on your hard drive and making it easier to e-mail.

One tool even lets you uncompress a PDF so that PDF conversions might work better.

http://www.bureausoft.com Bureausoft Corporation, France
PDF Compress: free, does not specify what it is doing; some tests resulted in larger files!

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Tweak: Free. Tweak Reader View; Set version; Compress; Modify Info

http://www.nicepdf.com NicePDF Software, Inc., Italy
Free PDF Compressor 1.12: removes duplicates, uses compression options of Acrobat 1.6. Can also decompress for better conversion of PDF to other formats.

11. Form Filling

PDF files can be created as "forms" that you fill out, such as for a job application. In Adobe Acrobat Reader, however, the filled out form can only be printed; it cannot be saved.

Some PDF tools help you to create, save, or otherwise work with Form Fill PDFs.

http://www.bureausoft.com Bureausoft Corporation, France
PDF Filler: fill in PDF forms without Acrobat $49

http://www.foxitsoftware.com : Foxit
Foxit PDF Reader: Free and Add-Ons. Free functions include View, Print, PDF Forms (w/save with watermark); View PDF as text (no save)

http://www.fytek.com FyTek PDF and XPS Software
PDF File Save: Save form filled PDFs for use later

http://www.pdfaction.com PDF Action (Australia)
PDF Action Reader V1.6: Free. Fill & Save forms, Delete Pages, Save PDFs.

http://www.plotsoft.com; http://www.pdfill.com PlotSoft LLC
PDFill PDF Editor 4.1 $19.99: PDF Form Filler Tool. Insert, fill, edit, save, etc.

12. Watermarks

Watermarks allow you to paste background (transparent) text on every page of your document. You might want to mark a document as draft, or place your company name or other ownership information on each page. Some programs provide more flexibility than others in this area.

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Watermark Creator: Free. Add text watermark or number pages

http://www.plotsoft.com; http://www.pdfill.com PlotSoft LLC
PDFill PDFTools: Free set of tools includes Merge, Split/Reorder, Encrypt/Decrypt, Reformat Multiple Pages into One Page, Header/Footer, Watermark Text/Image, Convert Images to PDF/PDF to Images, PDF Form Field Ops, and PS to PDF

13. Thumbnails

The PDF specification provides for a table of contents listing in the left column. One form provides thumbnail images of each page of your document, allowing the reader to visually select the appropriate page.

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Thumbnail Generator: Free. Set left column thumbnails

14. Attachments

A PDF file can also act as a "container" for other files, such as a movie. You can attach other files to the PDF so that your reader can double click these attachments and open them in the appropriate application.

http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Bundle: Free. Attach any file (Word, Excel, etc.)

15. Change Version

The PDF specification has been successively updated through versions 1.2, 1.3, 1.4, 1.5, 1.6 and so on. Some third-party programs are only compatible with certain versions of Adobe PDF, so if you want more universal access to your documents, you may want to save your document as an older version of PDF, especially if your document isn't using any of the "newer" features of later versions anyway.

Note that if you change the version number without actually converting the document to that version type, you might get unexpected results when trying to open the PDF in your selected application.
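
A PDF file states its version in a short "%PDF-M.m" marker at the very start of the file, and this is the number a simple "stamp" tool would change. The sketch below reads that marker from raw bytes; the function name header_version is hypothetical, and it deliberately ignores the fact that documents from PDF 1.4 onward may also carry a Version entry inside the file that overrides the header.

```python
import re
from typing import Optional, Tuple

# The header is expected near the start of the file; only the first
# kilobyte is searched here (an assumption made for this sketch).
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-M.m header, or None if it is missing."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Made-up sample bytes standing in for the first line of a real file.
print(header_version(b"%PDF-1.4\n..."))
print(header_version(b"this is not a PDF"))
```

To check a real file, read its first bytes with open(path, "rb").read(1024) and pass them to the function. Seeing an older number here says nothing about whether the contents were actually converted, which is exactly the risk described above.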

http://www.nicepdf.com NicePDF Software, Inc., Italy
Free PDF Version Converter V1.0: Spec ranges from 1.0 to 1.6. Does actual conversion, not just stamp

