Document image processing can serve different purposes.

For example, the processing might be nothing more than cleaning up the document. Typical documents often contain punch holes, black borders, undesired lines, and so on. There are document-cleaning tools that can remove these from the document images after they are scanned. Document-cleaning software can also allow users to specify what to do about such elements in scanned images.

Other kinds of cleaning up include:
- Straightening skewed images
- Removing borders that exceed given noise-tolerance specifications
- Smoothing nicks and bumps distorting scanned text characters
- Converting white text on black to black text on white

These and other cleaning tools can be automated by specifying minimum and/or maximum sizes of the elements to be removed.

Major Image Processing Tasks

In the case of text documents, document imaging produces images that humans can read, but machines can't. To make these documents searchable by the words typed in them, the text characters on the images need to be converted into a machine-readable format.

This conversion is done using technologies such as OCR (Optical Character Recognition) and ICR (Intelligent Character Recognition). Even hand-printed characters can be recognized to some extent by these technologies.

This kind of conversion is also needed for the purpose of making the document images editable.

Once the images of text documents have been made machine-readable, the next typical document imaging step is to index them. Indexing makes the documents searchable. Full-text indexing makes them searchable by any word in the document.

Full-text indexing takes up a lot of storage space; an alternative is to index by tags and meta descriptions. Tags are words that typify the document's content. Descriptions give short summaries of the content.

The processing of the document images can go even further. Based on programmed specifications, the documents can be categorized and stored in appropriate repositories.

In short, document image processing can facilitate content management by converting paper documents into categorized content ready to be queried by users, all in a matter of minutes with minimal human intervention.

There are mailroom processors that can extract documents from envelopes and then go on to process the documents as above. With this kind of facility, a single operator can manage mail volumes formerly handled by many clerks, and also go significantly further in the content-management process than the clerks.

Conclusion

Document imaging and processing typically go together. The processing can be simple tasks like removing undesired elements such as distortions and black borders or complex tasks such as converting text images into machine-readable characters and indexing the content.
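
The trade-off between full-text indexing and indexing by tags, as described in this article, can be illustrated with a small sketch in Python. It is only a minimal illustration under stated assumptions, not how any particular document-imaging product works: the document IDs, the sample text (standing in for OCR output), the tags, and the helper names tokenize, build_full_text_index, build_tag_index, and search are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(docs):
    """Map every word found in any document to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def build_tag_index(tags_by_doc):
    """Map each manually assigned tag to the IDs of the documents carrying it."""
    index = defaultdict(set)
    for doc_id, tags in tags_by_doc.items():
        for tag in tags:
            index[tag.lower()].add(doc_id)
    return index


def search(index, word):
    """Return the sorted IDs of documents indexed under the given word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical machine-readable text, as it might come out of an OCR step.
docs = {
    "inv-001": "Invoice for scanner maintenance, due in thirty days.",
    "memo-07": "Memo: the mailroom scanner will be replaced next month.",
}

# Hypothetical tags chosen to typify each document's content.
tags_by_doc = {
    "inv-001": ["invoice", "maintenance"],
    "memo-07": ["memo", "mailroom"],
}

full_text_index = build_full_text_index(docs)
tag_index = build_tag_index(tags_by_doc)

# Any word in the text is searchable through the full-text index.
print(search(full_text_index, "scanner"))
# The tag index only answers for words that were chosen as tags.
print(search(tag_index, "scanner"))
# The full-text index holds more entries, which is where the extra storage goes.
print(len(full_text_index), len(tag_index))
```

In this sketch the full-text index finds both documents for the word "scanner", while the tag index finds none because "scanner" was never assigned as a tag. The tag index is much smaller, which reflects the storage argument made above, at the cost of only matching the words someone chose in advance.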