The Technology of Digital Ink and Recognition
page 7 of 8
by Arindam Ghosh

Initial Processing

The PARC Book Scanner illustrates the wide range of early-stage image-processing tools needed to support high-quality image capture. Note the importance of image calibration and restoration specialized to the scanner. Image processing should, ideally, occur quickly enough for the operator to check each page image visually for consistent quality. Tools are needed for orienting the page so text is right side up, deskewing the page, removing some of the pepper noise, and removing dark artifacts on or near the image edges. Software support for clerical functions such as page numbering and ordering, and for the collection of metadata, is also crucial to maintaining high throughput.
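Deskewing of the kind mentioned above is commonly done by searching for the rotation that best aligns text lines. Below is a minimal, pure-Python sketch of one standard approach (projection-profile search); the function names and synthetic data are illustrative, not a description of PARC's actual pipeline.

```python
import math

def estimate_skew(ink_pixels, candidate_degrees):
    """Estimate page skew by projection-profile search: unshear the ink
    pixels by each candidate angle and score the horizontal projection
    by its sum of squared row counts, which peaks when text lines are
    horizontal (ink concentrated into few rows)."""
    best_angle, best_score = 0.0, -1.0
    for deg in candidate_degrees:
        t = math.tan(math.radians(deg))
        counts = {}
        for x, y in ink_pixels:
            row = round(y - x * t)          # row index after unshearing
            counts[row] = counts.get(row, 0) + 1
        score = sum(v * v for v in counts.values())
        if score > best_score:
            best_score, best_angle = score, deg
    return best_angle

# Synthetic page: three "text lines" of ink skewed by 2 degrees.
true_t = math.tan(math.radians(2.0))
pixels = [(x, round(base + x * true_t))
          for base in (10, 30, 50) for x in range(200)]
angles = [a / 2 for a in range(-10, 11)]    # -5.0 to 5.0, 0.5-degree steps
print(estimate_skew(pixels, angles))        # 2.0
```

A production scanner would of course work on full raster images and finer angle grids, but the scoring idea is the same.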

In addition to these, it would be helpful to be able to check each page image for completeness and consistency. Has any text been unintentionally cropped? Are basic measures of image consistency — e.g. brightness, contrast, intensity histograms — stable from page to page, hour after hour? Are image properties consistent across the full page area for each image? Are the page numbers — located and read by OCR on the fly — in an unbroken ascending sequence, and do they correspond to the automatically generated metadata? Techniques likely to assist in these ways may require imaging models that are tuned to shapes or statistical properties of printed characters. Perhaps it will someday be possible to assess both human and machine legibility on the fly.
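The brightness-stability check suggested here can be approximated quite simply: summarize each page by its mean gray level and flag statistical outliers against the batch. A minimal sketch, in which the function name and the two-sigma tolerance are assumptions rather than an established tool:

```python
import statistics

def flag_inconsistent_pages(page_means, tolerance=2.0):
    """Return indices of pages whose mean brightness deviates from the
    batch mean by more than `tolerance` standard deviations -- a crude
    proxy for the page-to-page consistency checks discussed above."""
    mu = statistics.mean(page_means)
    sigma = statistics.pstdev(page_means)
    if sigma == 0:
        return []                       # perfectly uniform batch
    return [i for i, m in enumerate(page_means)
            if abs(m - mu) > tolerance * sigma]

# Nineteen consistent pages plus one scanned far too dark (index 19).
means = [200.0 + 0.1 * i for i in range(19)] + [120.0]
print(flag_inconsistent_pages(means))   # [19]
```

The same pattern extends directly to contrast (per-page standard deviation) or to histogram-distance measures between consecutive pages.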


The principal purposes of document image restoration are to assist:

Fast & painless reading

OCR for textual content

DIA for improved human reading (e.g. format preservation)

Characterization of the document (age, source, etc.)

To these ends, methods have been developed for contrast and sharpness enhancement, rectification (including skew and shear correction), superresolution, and shape reconstruction.


The DIA community has developed many algorithms for accurately correcting skew, shear, and other geometric deformations in document images. It is interesting how inconsistently these have been applied to document images provided by DLs. Although, when left uncorrected, these deformations are easily detectable by eye and cause some users to complain, they do not affect legibility and reading comfort except in extreme cases (for example, more than 3 degrees of skew). However, not all DIA toolkits that may later be run on these images will perform equally well, so it could be a significant contribution to rectify all document images before posting them on DLs. It is also possible — although seldom discussed in the DIA literature — to “recenter” text blocks automatically within a standard page area in a consistent manner. Again, it is not clear that this, although a clear improvement in aesthetics, matters much to either human or machine reading.
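The recentering operation is geometrically trivial once the text block's bounding box is known, which may explain why the literature rarely dwells on it. A sketch for a single block (a real system would shift the underlying pixels, and would handle multiple blocks per page):

```python
def recenter_offset(bbox, page_w, page_h):
    """Offset (dx, dy) that moves a text block's bounding box
    (x0, y0, x1, y1) to the center of a page_w x page_h standard page
    area -- the automatic 'recentering' mentioned above."""
    x0, y0, x1, y1 = bbox
    dx = (page_w - (x1 - x0)) // 2 - x0
    dy = (page_h - (y1 - y0)) // 2 - y0
    return dx, dy

# A 400x600 text block sitting near the top-left of a 1000x1400 page.
print(recenter_offset((50, 40, 450, 640), 1000, 1400))  # (250, 360)
```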

Analysis of Content

The analysis and recognition of the content of document images requires, of course, the full range of DIA R&D achievements: page layout analysis, text/non-text separation, printed/handwritten separation, text recognition, labeling of text blocks by function, automatic indexing and linking, table and graphics recognition, etc. Most of the DIA literature is devoted to these topics so I will not attempt a thorough survey in this short space.

However, it should be noted that images found in DLs, since they represent many nations, cultures, and historical periods, tend to pose particularly severe challenges to today’s DIA methods, which are not robust in the face of multilingual text and non-Western scripts, obsolete typefaces, old-fashioned page layouts, and low or variable image quality.

Accurate Transcriptions of Text

The central classical task of DIA research has been, for decades, to extract a full and perfect transcription of the textual content of document images. Although perfect transcriptions have been known to result, no existing OCR technology, whether experimental or commercially available, can guarantee high accuracy across the full range of document images of interest to users. Even worse, it is rarely possible to predict how badly an OCR system will fail on a given document.
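When ground truth is available, transcription accuracy is conventionally quantified as edit distance normalized by text length, the character error rate. A small sketch of that metric:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (minimum number of
    insertions, deletions, and substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_error_rate(truth, ocr_output):
    """Character error rate: edit distance normalized by truth length."""
    return edit_distance(truth, ocr_output) / len(truth)

# Two classic 'l' -> '1' confusions in a 15-character string.
print(char_error_rate("digital library", "digita1 1ibrary"))  # 2/15
```

Note that this measures errors after the fact; the harder problem raised above, predicting error rates before running OCR on a given document, remains unsolved.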

Determining Reading Order of Sections

Determining the reading order among blocks of text is, of course, a DIA capability critically important for DLs; it would allow more fully automatic navigation through images of text. This, however, remains an open problem in general, in that a significant residue of cases cannot be disambiguated through physical layout analysis alone but seems to require linguistic or even semantic analysis.

However, the number of ambiguous cases on one page is often small and might be made manageable in practice by a judiciously designed interactive GUI presenting ambiguities in a way that invites easy selection (or even correction). Such capabilities exist in specialized high-throughput scan-and-conversion service bureaus, but they are not yet available to the users of any DL, who could otherwise correct reading order and so improve their own navigation.
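For the unambiguous majority of blocks, a simple geometric heuristic often suffices; the residue is what an interactive GUI would present to the user. A sketch assuming single-page, column-oriented layouts, with an arbitrary column-gap threshold (blocks here are (x, y, label) triples):

```python
def reading_order(blocks, column_gap=50):
    """Order text blocks left column first, then top to bottom within
    each column -- a basic physical-layout heuristic for reading order.
    `column_gap` is an assumed horizontal threshold separating columns."""
    blocks = sorted(blocks, key=lambda b: b[0])     # left to right
    columns = []
    for b in blocks:
        if columns and b[0] - columns[-1][-1][0] <= column_gap:
            columns[-1].append(b)                   # same column
        else:
            columns.append([b])                     # start a new column
    ordered = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda b: b[1]))  # top to bottom
    return ordered

# A two-column page with two blocks per column.
page = [(400, 0, "C"), (30, 220, "B"), (30, 0, "A"), (400, 220, "D")]
print([b[2] for b in reading_order(page)])  # ['A', 'B', 'C', 'D']
```

Layouts with headlines spanning columns, sidebars, or continuations ("continued on p. 12") defeat this kind of rule, which is exactly the residue discussed above.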
