1. Create PDF
As the creator of the PDF format, Adobe has developed Adobe Acrobat as the
premier tool for the creation of PDF files. It gives you full editing
capabilities to incorporate text, images, and multimedia into a
document complete with hyperlinks, thumbnails, table of contents
listings, and more. Though full-featured, Acrobat is also expensive,
starting at $449. One of the main advantages of using the Adobe product
is that you will be up to date with the latest Adobe PDF format
version, which is currently PDF Specification 1.7.
Other editors exist, some offering simple text editing into PDF, and others
providing more fully featured tools. A few of these tools are even
provided free, though their feature set is considerably smaller than that
of Acrobat!
A more popular option is to use a specialized Microsoft Windows
printer driver. This driver sits in the "Printers and Faxes" area of
your start menu, and acts just like a printer. However, instead of
printing to a physical printer, the printer driver saves the output
as a PDF file.
There are
tons of PDF printer drivers out there (see the PDF Tools listing for a
variety of them). They range from free products to versions over $100.
Some of the free versions print footers at the bottom of every page
advertising their product.
There are a lot of Creation Properties that can go into creating a PDF file. Some
PDF creators are so simple they don't allow you to manipulate any of
these properties. Others allow you to set all of these parameters. In
general, most of the free programs are more limited in this area. I
recommend experimenting with the programs that look interesting to you,
to see how well they implement the following features.
Font Embedding: You may have written
a document in your word processing program using the COMIC font, and
then printed it to a PDF. When someone else who doesn't have COMIC on
their system opens the PDF, however, the COMIC font appears as some
other font, ruining your planned presentation. To solve this problem,
the PDF specification allows for the embedding of fonts in the document.
This guarantees the reader will see the fonts you intended. The
downside to embedding fonts is that it makes the document larger.
To compensate for this, PDF creation tools usually allow you some
flexibility in specifying which fonts (if any) you want to embed. A
couple of choices include:
Common Fonts: If most of your PDF
readers will be on Microsoft Windows, you can choose not to embed common fonts
that are found on most systems such as "Arial" or "Times Roman". I
believe that these fonts are common on most other operating systems as
well, such as Linux and MacOS.
Embed Font Subsets:
Some programs allow you to embed just a subset of a given font. Most
fonts include many characters that you never use, such as a smiley face
or other extended characters. If you choose to embed font subsets,
only the characters actually used in your document are embedded.
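To get a feel for why subsetting helps, here is a minimal Python sketch that counts the distinct characters a piece of text actually uses. It is only an illustration of the idea: real PDF tools subset the font's glyph data, and the sample sentence and the function name characters_needed are made up for this example.

```python
def characters_needed(text):
    """Return the set of distinct non-whitespace characters used in the text."""
    return {ch for ch in text if not ch.isspace()}


# Hypothetical document text; a real tool would scan the whole document.
sample = "The quick brown fox jumps over the lazy dog."
needed = characters_needed(sample)

print(len(needed), "distinct characters would need to be embedded")
```

Even a sentence that uses every letter of the alphabet needs only a few dozen characters, while a typical font ships with hundreds, which is where the space saving comes from.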
Image Downsizing: Images can be very large and may greatly increase the size
of your PDF file. Rather than storing the original image in the PDF
file, some creation programs allow you to resample images so that a
smaller version is used within the PDF. For example, an image resolution
of 96 dots per inch (dpi) produces a PDF that will look good on the
screen only, but the overall PDF has a much smaller size than if the
original images were embedded. When this option is selected, images
that have a resolution greater than 96dpi will be reduced to this
resolution. A resolution of 300dpi is suitable for normal printing and is
likely the best overall choice. If you plan on allowing high resolution
printing, however, or if you want to ensure maximum quality if the PDF
is converted to another format later on in its life, then you can turn
off image downsizing and allow images up to 4000dpi.
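The effect of resampling can be estimated with a little arithmetic. The sketch below is an assumption-laden illustration, not how any particular tool works: it takes a hypothetical 8 x 10 inch scan at 300dpi and computes its pixel dimensions after being reduced to 96dpi. The function name downsampled_pixels is invented for the example, and it counts pixels only, before any compression is applied.

```python
def downsampled_pixels(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions after resampling so the image keeps its printed size."""
    if source_dpi <= target_dpi:
        # Images at or below the target resolution are left alone.
        return width_px, height_px
    return (
        round(width_px * target_dpi / source_dpi),
        round(height_px * target_dpi / source_dpi),
    )


# Hypothetical 8 x 10 inch page scanned at 300dpi.
w, h = 2400, 3000
new_w, new_h = downsampled_pixels(w, h, 300, 96)
ratio = (new_w * new_h) / (w * h)

print(f"{w}x{h} pixels becomes {new_w}x{new_h} pixels")
print(f"The resampled image keeps {ratio:.1%} of the original pixels")
```

Because both width and height shrink, the pixel count falls with the square of the resolution ratio, which is why downsizing has such a large effect on file size.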
Compression: You can also use a
compression algorithm to reduce image size. ZIP is a lossless
compression method, and works best on images with repeating patterns. JPEG is
a lossy compression method that achieves a greater compression rate
than ZIP, but does reduce the quality of the images embedded in the PDF.
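The point about repeating patterns is easy to try for yourself. The sketch below uses Python's standard zlib module, which implements the same family of algorithm (Deflate) as ZIP, on two made-up byte strings standing in for image data: one with a repeating pattern and one with no pattern at all. JPEG is not in the standard library, so only the lossless side is shown.

```python
import os
import zlib

# Two stand-ins for image data, 100,000 bytes each (sizes chosen arbitrarily).
repeating = b"ABCD" * 25_000      # a strongly repeating pattern
noisy = os.urandom(100_000)       # random bytes with no pattern

for label, data in (("repeating", repeating), ("noisy", noisy)):
    packed = zlib.compress(data)
    # Lossless: decompressing gives back exactly the original bytes.
    assert zlib.decompress(packed) == data
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The repeating data shrinks to a tiny fraction of its size, while the patternless data barely shrinks at all.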
Security: PDF allows you to
protect your document with several options. You can set a "user"
password that must be entered before the PDF file can be opened for
viewing. There is also an "owner" password that provides some control
over how the document can be handled once opened. For example, you can
toggle whether or not the user can print the document, print only a
low-resolution version, modify it (such as move or delete pages),
extract images and text, or allow annotations/form filling to be
performed. The security level can be set to 40 or 128 bits (128 bits is
harder to "crack" and is compatible with Adobe Acrobat 5.0 or
greater).
Digital Signature: This allows a PDF to be digitally signed. You need to
create or have a Digital ID to use this function.
Watermarks: Add a text watermark
to the page. For example, you might want to place the word DRAFT in the
background of a document that is sent out for review but not yet
finalized, or you might want your company name shown as transparent
text on every page. The watermark tools usually allow you to select the
font and font size, rotate the text, and so forth.
Stamp: Tools with this
feature allow you to insert an image as a background either on the
first page only, or on every page.
Document Properties or Metadata: Metadata is data about data. That
is, it describes what is in a file. When you select Document Properties
in Adobe Acrobat Reader, it will display the Title, Author, Subject and
Keywords related to the document. Some PDF creation tools allow you to
set this information; otherwise the tool will use default data or leave
the fields blank.
Hyperlink Support:
PDFs created from scratch or converted from HTML documents can support
the included hyperlinks. Some PDF creator programs will only create
hypertext from full "http://" or "www." links. Others will allow full
hyperlinks when documents are created from within Microsoft Word.
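As a rough illustration of the simpler behavior, the Python sketch below picks out only text that starts with "http://" or "www.", the way a basic PDF creator might. The pattern and the function name find_plain_links are my own guesses for the sake of example, not the rule any specific product uses.

```python
import re

# Runs of non-space characters that begin with "http://" or "www."
LINK_PATTERN = re.compile(r"\b(?:http://|www\.)\S+")


def find_plain_links(text):
    """Return link-like strings, with trailing punctuation trimmed off."""
    return [match.rstrip(".,;:!?)") for match in LINK_PATTERN.findall(text)]


sample = "See http://www.abbyy.com or www.acropdf.com, but not 'the ABBYY site'."
links = find_plain_links(sample)
print(links)
```

A link whose visible text is ordinary words, such as "the ABBYY site", is not detected by this approach, which is why hyperlinks made inside a word processor need a tool with fuller support.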
Bookmarks: PDF files can
display a sort of Table of Contents or Bookmark listing in a left panel
to help you navigate more quickly through the document. Some PDF
printer drivers also support this, though usually only from documents
created in Microsoft Word.
Thumbnails: PDF documents can
insert a "thumbnail" image of each page, which can also act as a table
of contents listing. By viewing the thumbnails in a left column, you
may be able to more quickly maneuver to the page you are looking for.
Merging: Some of the programs
allow you to merge multiple PDF documents into an existing one. I often
use this to merge multiple pages from web-research into a single PDF
for later review.
Preview: Some programs simply
write the file and then open it in the default reader. Others provide
you a preview window so you can see what the document looks like before
printing, and allow you to make additional modifications to the
document (such as setting the Document Properties).
Others: There are other
properties not listed here that are available in some programs (such as
the ability to delete or move pages before printing to the PDF), but
this listing should give you a good starting point for making your own
comparison. By trying the various programs, you will get a feel for
what features are more important to you. Some people don't care at all
about the security features of PDF, for example, and thus don't need
the "deluxe" versions that add this capability.
2. Converting from PDF to other formats; Extracting text and images
One of the advantages of PDF over a format such as HTML is that you can
package an entire "book" of information into one file. This is easy to
store, e-mail, and move as needed. The disadvantage of PDF, however, is
that it is difficult to edit without specialized software, difficult to
use just a single image or text portion in another file, and difficult
even to view in a web browser without the use of a specialized plug-in.
There are many tools to convert PDF to other formats such as HTML or
Microsoft Word. These programs will vary in their ability to preserve
original layout, and even the text conversion will be dependent on the
quality of the text in the original PDF. When you try to convert PDFs
or extract text, you may be surprised in some cases to find that your
resulting document has no readable text. Some PDF files that appear to
be text are really just images of the text, and will not convert
without a special Optical Character Recognition (OCR) application
(these are covered in the next section).
Some PDF tools are limited to simple text and/or image extraction. They
will lose the formatting of the PDF, but sometimes all you are looking
for is a raw extract anyway. Even Adobe's Acrobat Reader offers the
ability to save just the text from most PDFs.
Browse the PDF Tools List page for many
tools that allow you to convert PDFs or extract text and images from
them.
3. Image PDF to Searchable PDF
You've got
that nice pristine PDF file and the text and the graphics look great.
You search on a word in the file, however, and find that it doesn't
find it. You then try to copy and paste a few words from the article,
and when you click the mouse button you see all the text in a block
become highlighted instead! The PDF isn't protected by security, so
what is going on?
Some
applications let you scan an article right into the PDF format. The
problem with this method is that the text is usually stored as a
graphic rather than as searchable text. That is, it is like you took a
photograph of the text rather than retyping it into the PDF. Thus, the
text exists only as an image, and not as individual letters and
characters.
The solution for this is to use conversion software that includes Optical Character
Recognition (OCR) capabilities. This software can read an image PDF,
perform OCR on any included text, and save the output as either a new
searchable PDF or perhaps an HTML or Microsoft Word document.
There are
several OCR programs that let you work from an image file and these
often come with Image Scanners. However, working right from the
original PDF is more convenient. A few programs that let you perform
this function right from an image PDF file include:
ABBYY Software from Russia (http://www.abbyy.com). ABBYY PDF Transformer Pro V2 provides
the ability to convert image PDF files to searchable PDF, as well as
converting PDF files to other formats such as Microsoft Word or HTML. I
tried the 15-day/50-page demo, and the conversions were of excellent
quality. The product costs around $100, and if you need to
convert a lot of image PDF documents, this program seems to be an
excellent choice.
Nuance Communications, Inc.
(formerly ScanSoft, http://www.nuance.com). Nuance produces PDF Converter
Professional. This program includes OCR capability to convert image
PDFs to Word or other formats along with many other features. PDF
Converter Professional has more features than Abbyy's PDF Transformer
Pro, but the OCR capabilities didn't seem to work as well, at least in
the few demo documents I tried. The rendering of the converted fonts
was not as pleasing, and as is typical of many OCR programs, many
characters are improperly converted (such as a close "al" combination
being read as a "d", for example). I would recommend trying to get a
demonstration package before purchasing, or seeing if you can get a
money-back guarantee so you can try it yourself (about $100).
Investintech.com, Inc. (http://www.investintech.com). The only other programs I'm aware of
with image PDF OCR capabilities are the Able2Doc and Able2Extract programs
from Investintech. These don't convert directly back into searchable
PDF, but rather convert into Word, HTML, and other formats. You would
need another PDF creation program to get this extracted (and
OCR'd) information back into PDF format. I did not try either of these
programs in the OCR version (about $60, or $120 with OCR capabilities).
Part II Selective PDF Tools
Section 1 above lists many of the PDF properties that can be
manipulated during creation. Many of these properties can also be
manipulated after the fact, that is, with a PDF that you already own.
The functions and tools listed in this section are designed to help you
get more out of the PDF files in your collection. If you own a good
creation program, such as Adobe Acrobat, many of the following tools
will not be necessary as the functionality will already be in the PDF
creator tool.
4. PDF Security
Adobe Acrobat provides some security features so that authors can
protect their intellectual property. These features use passwords to
protect access. One password is the "user" password. If the user
password is set, then when you try to open the PDF in a reader, it will
prompt you for the password and will not open the file until the
correct password is entered. There is another password, known as the
"owner" password that allows the author to restrict functions such as
printing of the document, or the ability to copy and paste text out of
the PDF file.
In general, if you have forgotten the user password, it is difficult to
retrieve it without resorting to "brute force" techniques, which
essentially amount to trying every possible password. For the owner
password, however, utilities are available that can disable this
password and allow you access to features that were previously
disabled. Note that for most of these utilities to work you need to at
least know the user password.
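To see why brute force is a last resort, it helps to count how many passwords would have to be tried. The Python sketch below does only that arithmetic; it does not touch any PDF file. The alphabet sizes and maximum lengths are assumptions picked for illustration, and password_count is a made-up helper name.

```python
import string


def password_count(alphabet_size, max_length):
    """Number of candidate passwords of length 1 through max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


lowercase = len(string.ascii_lowercase)               # 26 characters
mixed = len(string.ascii_letters + string.digits)     # 62 characters

short_lower = password_count(lowercase, 4)
long_mixed = password_count(mixed, 8)

print(f"Up to 4 lowercase letters: {short_lower:,} candidates")
print(f"Up to 8 letters or digits: {long_mixed:,} candidates")
```

Each extra character multiplies the work by the size of the alphabet, so a short, simple password falls quickly while a longer mixed one can be out of practical reach.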
Also, many of the other tools in this section will not work if the PDF
is password protected. While Adobe Acrobat Reader is designed to allow
you to enter the appropriate password upon opening it, many of the
following tools were created with unprotected PDFs in mind and simply
don't recognize encrypted PDFs even if you know the user password!
If you search on the word "password" on the PDF Tools List, you can locate some of the programs
that provide encryption tools for PDF. Please note that I have not
tried these programs and cannot make any claims as to functionality or
reliability of these products.
5. Split and Merge PDF Files
I have some PDF files that are hundreds of megabytes in size and
include over 1,000 pages. These documents can be a bit of a bear to open
and navigate in Adobe Acrobat Reader, so the option to split
these documents into more manageable sizes is very desirable.
Also, you may want to pass on just part of an existing PDF file
to a friend. Rather than sending the entire document, you can extract
the selected pages and thus reduce the file size.
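Before splitting a big file, it can help to work out which pages will land in which piece. The Python sketch below is just that planning step, under the assumption that you want a fixed number of pages per output file; it does not read or write PDFs, and split_ranges is a name invented for this example.

```python
def split_ranges(total_pages, pages_per_file):
    """Return 1-based inclusive (first, last) page ranges, one per output file."""
    if total_pages < 1 or pages_per_file < 1:
        raise ValueError("page counts must be positive")
    return [
        (start, min(start + pages_per_file - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_file)
    ]


# Hypothetical 1,050-page document split into pieces of at most 250 pages.
ranges = split_ranges(1050, 250)
for number, (first, last) in enumerate(ranges, start=1):
    print(f"part {number}: pages {first}-{last}")
```

The resulting ranges are the kind of page selection you would then enter into whichever split tool you choose.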
Also, you may have some already existing PDF files that would work
better for you as a single file. The Merge tools will help you do this.
They will allow you to merge several PDF documents that may have been
created at different times.
Search on the PDF Tools List page for
the terms "merge" or "split" to find a more complete listing of
candidate tools. A few programs that I tried out and that seemed to work just
fine were:
http://www.paologios.com: Paolo Gios
Gios PDF Splitter And Merger.
Free & Open Source Split & Merge (requires Microsoft .Net to be
installed).
http://www.pdfsam.org: PDF Split and Merge by Andrea
Open Source. Written in Java
using the iText Library with a launcher front-end for Windows users.
http://www.plotsoft.com; http://www.pdfill.com PlotSoft LLC
PDFill PDFTools: Free set of
tools includes Merge, Split/Reorder, Encrypt/Decrypt, Reformat Multiple
Pages into One Page, Header/Footer, Watermark Text/Image, Convert
Images to PDF/PDF to Images, PDF Form Field Ops, and PS to PDF
6. Document Properties / Metadata
As noted in the PDF creation
section above, you can specify the Author, Title, Subject, and Keywords
for a document upon creation. However, many authors fail to do this, or
the documents were created using "defaults" that don't adequately
describe the document. Using a document properties tool, you can go
back to all those old PDF files on your hard drive and add or update
the appropriate information.
Search on the PDF Tools List page for "properties" to locate
programs that offer this feature.
7. PDF Speedup
One unique tool helps speed
up the loading of Adobe Acrobat Reader by modifying all the plug-ins
that are normally loaded at startup so that only the essential plug-ins
are loaded. This greatly increases the speed of starting Acrobat
Reader.
www.acropdf.com: AcroPDF Systems, France
PDF SpeedUp 1.42: Freeware that tweaks
Adobe Acrobat Reader so it doesn't load all the plug-ins on startup. You can
adjust which plug-ins you want to load. Works through Acrobat 7.
8. PDF Page Numbering
You can add page numbers to
your PDF files using these tools. A couple of stand-alone programs that
allow this include:
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Watermark Creator: Free.
Add text watermark or number pages
9. Set PDF Viewer Startup Options
When you double click a PDF
file to open in Acrobat Reader, the PDF file can be set to open in
different views. These tools let you modify that default view.
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Tweak: Free. Tweak Reader
View; Set version; Compress; Modify Info
10. PDF Compression
As discussed in the first section, PDF files can become quite large if
you embed fonts and if the images in the file are large. The PDF
format specification supports several methods of compression to make
the resulting PDF file smaller, so that it takes up less space on
your hard drive and is easier to e-mail.
One tool even lets you uncompress a PDF so that PDF conversions might
work better.
http://www.bureausoft.com Bureausoft Corporation: France
PDF Compress: free, does not
specify what it is doing; some tests resulted in larger files!
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Tweak: Free. Tweak Reader
View; Set version; Compress; Modify Info
http://www.nicepdf.com NicePDF Software, Inc., Italy
Free PDF Compressor 1.12: removes
duplicates, uses compression options of Acrobat 1.6. Can also
decompress for better conversion of PDF to other formats.
11. Form Filling
PDF files can be created as "forms" that you fill out, such as for a
job application. In Adobe Acrobat Reader, however, the filled-out form
can only be printed; it cannot be saved.
Some PDF tools help you to create, save, or otherwise work with Form
Fill PDFs.
http://www.foxitsoftware.com : Foxit
Foxit PDF Reader: Free and
Add-Ons. Free functions include View, Print, PDF Forms (w/save with
watermark); View PDF as text (no save)
http://www.pdfaction.com PDF Action (Australia)
PDF Action Reader V1.6: Free.
Fill & Save forms, Delete Pages, Save PDFs.
12. Watermarks
Watermarks allow you to paste background (transparent) text on every
page of your document. You might want to mark a document as draft, or
place your company name or other ownership information on each page.
Some programs provide more flexibility than others in this area.
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Watermark Creator: Free.
Add text watermark or number pages
http://www.plotsoft.com; http://www.pdfill.com PlotSoft LLC
PDFill PDFTools: Free set of
tools includes Merge, Split/Reorder, Encrypt/Decrypt, Reformat Multiple
Pages into One Page, Header/Footer, Watermark Text/Image, Convert
Images to PDF/PDF to Images, PDF Form Field Ops, and PS to PDF
13. Thumbnails
The PDF specification
provides for a table of contents listing in the left column. One form
provides thumbnail images of each page of your document, allowing the
reader to visually select the appropriate page.
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Thumbnail Generator:
Free. Set left column thumbnails
14. Attachments
A PDF file can also act as a "container" for other files, such as a
movie. You can attach other files to the PDF so that your reader can
double click these attachments and open them in the appropriate
application.
http://www.coolpdf.com CoolPDF Software, Incorporated, Spain
CoolPDF Bundle: Free. Attach any
file (Word, Excel, etc.)
15. Change Version
The PDF specification has
been successively updated through versions 1.2, 1.3, 1.4, 1.5, 1.6 and so on. Some
third-party programs are only compatible with certain versions of Adobe
PDF, so if you want more universal access to your documents, you may
want to save your document as an older version of PDF, especially if
your document isn't using any of the "newer" features of later versions
anyway.
Note that if you change the
version number without actually converting the document to that version
type, you might get unexpected results when trying to open the PDF in
your selected application.
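The version a PDF declares is written in a short header at the very start of the file, in the form "%PDF-1.4". The Python sketch below reads that declaration from the first bytes of a file. It is a minimal illustration: declared_version is a made-up helper, and the sample bytes stand in for the start of a real file.

```python
import re

# A PDF file starts with a header such as "%PDF-1.4".
HEADER_PATTERN = re.compile(rb"%PDF-(\d)\.(\d)")


def declared_version(first_bytes):
    """Return the (major, minor) version declared in a PDF header, or None."""
    match = HEADER_PATTERN.match(first_bytes)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Stand-in for the first bytes of a file, e.g. open(path, "rb").read(16).
version = declared_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
print(version)
```

Since the header is only a label, editing it changes what the file claims to be without changing the features used inside it, which is where the unexpected results come from.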
http://www.nicepdf.com NicePDF Software, Inc., Italy
Free PDF Version Converter V1.0:
Spec ranges from 1.0 to 1.6. Does actual conversion, not just stamp