Document Imaging and Processing Typically Go Together

Document image processing can serve different purposes.

For example, the processing might be nothing more than cleaning up the document. Typical documents often contain punch holes, black borders, unwanted lines, and so on. Document-cleaning tools can remove these from the document images after they are scanned. Document-cleaning software can also let users specify what to do about such elements in scanned images.

Other kinds of cleanup include:
- Straightening skewed images
- Removing borders that exceed given noise-tolerance specifications
- Smoothing nicks and bumps that distort scanned text characters
- Converting white text on black to black text on white

These and other cleaning tools can be automated by specifying minimum and/or maximum sizes of the elements to be removed.

Major Image Processing Tasks

In the case of text documents, document imaging produces images that humans can read but machines cannot. To make these documents searchable by the words they contain, the text characters in the images must be converted into a machine-readable format.

This conversion is done with technologies such as OCR (Optical Character Recognition) and ICR (Intelligent Character Recognition). These technologies can recognize even hand-printed characters to some extent.

This kind of conversion is also needed to make the document images editable.

Once the images of text documents have been made machine-readable, the next typical step is to index them. Indexing makes the documents searchable, and full-text indexing makes them searchable by any word in the document. Because full-text indexing takes up a lot of storage space, an alternative is to index by tags and meta descriptions. Tags are words that typify the document's content; descriptions give short summaries of the content.

The processing of the document images can go even further. Based on programmed specifications, the documents can be categorized and stored in appropriate repositories.

In short, document image processing can facilitate content management by converting paper documents into categorized content ready to be queried by users, all in a matter of minutes with minimal human intervention.

There are also mailroom processors that can extract documents from envelopes and then process them as described above. With this kind of facility, a single operator can manage mail volumes formerly handled by many clerks, and can also go significantly further in the content-management process than the clerks could.

Conclusion

Document imaging and processing typically go together. The processing ranges from simple tasks, such as removing undesired elements like distortions and black borders, to complex tasks, such as converting text images into machine-readable characters and indexing the content.
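
The difference between full-text indexing and tag-based indexing can be sketched with a small inverted index. This is a minimal Python illustration, not any product's implementation. It assumes that OCR/ICR has already produced plain text for each document, and the helper names (`build_full_text_index`, `build_tag_index`, `search`) and the sample documents and tags are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(docs):
    """Full-text index: map every word to the IDs of documents containing it.

    `docs` is a hypothetical dict of document ID -> OCR-extracted text.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def build_tag_index(tags):
    """Tag index: map each tag to the IDs of documents carrying that tag.

    `tags` is a hypothetical dict of document ID -> list of assigned tags.
    """
    index = defaultdict(set)
    for doc_id, doc_tags in tags.items():
        for tag in doc_tags:
            index[tag.lower()].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents matching every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: two scanned letters after OCR, plus hand-assigned tags
docs = {
    "doc1": "Invoice 1042 from Acme Corp, payment due in 30 days.",
    "doc2": "Letter from Acme Corp regarding a change of address.",
}
tags = {"doc1": ["invoice", "acme"], "doc2": ["correspondence", "acme"]}

full = build_full_text_index(docs)
by_tag = build_tag_index(tags)

print(sorted(search(full, "acme payment")))  # ['doc1']
print(sorted(search(by_tag, "acme")))        # ['doc1', 'doc2']
print(len(full), len(by_tag))                # 16 3
```

Even with two short letters, the full-text index holds 16 distinct terms, while the tag index holds only 3. That gap is the storage trade-off described above. In return, only the full-text index can find a document by a word nobody tagged, such as "payment".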