The Technology of Digital Ink and Recognition
 
Published: 15 Mar 2007
Abstract
In this article Arindam discusses the Tablet PC Platform's ink analysis technology and explains when and how to use it.
by Arindam Ghosh

Introduction

Free-form document annotation is a crucial part of every knowledge worker’s life. Despite the exponential improvement in computer performance, when it comes to reading and annotating documents people still turn to pen and paper. This is reasonable, as pen and paper offer many advantages. One key advantage is the ease with which the reader may sketch unstructured notes and drawings in response to document content.

There are definite advantages to emulating this annotation ability on a computer. While real ink annotations often end up in the recycle bin, digital annotations can persist throughout the lifetime of a document. They can be filtered and organized, and like digital documents, they can easily be shared.

Now that email and the World Wide Web are well established, the number of digital documents people interact with on a daily basis has increased dramatically. Unlike their paper counterparts, these documents are read in many different formats and they are displayed on diverse devices and in different-sized windows. They may also be edited, included in other documents, or they may even dynamically adapt their contents. All of this means that any given document may reflow to many different layouts throughout its lifetime.

The lack of a permanent layout poses a unique challenge in the adaptation of freeform pen-and-paper annotation to the digital domain. Each time the content of a digital document reflows to a new layout, any digital ink annotations must also reflow to keep up with it.

This represents a significant technological challenge. In order to meet it, we must follow three broad steps.

First, as a user is marking up a document, we must group and classify their ink strokes according to rough annotation categories (e.g. underline, circle, connector, margin comment, etc.).

Second, we must anchor each annotation to its surrounding context in the document. And third, when the layout of the underlying document changes, we must transform each annotation to agree with the new layout of its context.
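The three steps above can be sketched in code. The following is a minimal, hypothetical illustration (the class names, the bounding-box classifier, and the word-index anchoring scheme are all assumptions for the sake of the example, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Anchor:
    """Hypothetical anchor: the annotation is tied to a span of words
    in the document rather than to fixed page coordinates (step 2)."""
    start_word: int   # index of the first anchored word
    end_word: int     # index of the last anchored word
    dx: float         # ink offset relative to the anchor word's position
    dy: float

def classify_stroke(width, height):
    """Step 1, as a toy classifier: bucket an ink stroke into a rough
    annotation category from the shape of its bounding box."""
    if height < 0.2 * width:
        return "underline"
    if abs(width - height) < 0.3 * max(width, height):
        return "circle"
    return "margin comment"

def reflow(anchor, word_positions):
    """Step 3: when the document reflows, reposition the annotation
    relative to the new location of its anchor words."""
    x, y = word_positions[anchor.start_word]
    return (x + anchor.dx, y + anchor.dy)

# A long, flat 100x4 stroke classifies as an underline, and an
# anchored annotation follows its words when the layout changes.
print(classify_stroke(100, 4))
a = Anchor(start_word=7, end_word=9, dx=0.0, dy=12.0)
print(reflow(a, {7: (40.0, 200.0)}))
```

In a real system the classifier would use stroke features far richer than a bounding box, and anchoring would tolerate edits to the anchor words themselves; the sketch only shows how the three steps fit together.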

This third and final step is the primary focus of this paper. We have implemented an initial, straightforward approach to the problem of reflowing ink annotations and there is much work left to do in refining it and developing it into a working solution. Before we develop our approach further, however, there are significant empirical questions we must answer in order to guide our future research.

For instance, what do people expect to happen to their annotations when the underlying document reflows? Does our initial approach achieve the most basic requirement of reflowable annotations, to preserve each annotation’s contextual meaning? And do users prefer to see their own original ink, or are more formalized versions (which are technologically easier to reflow) acceptable? Most people are not familiar with the experience of having their ink reflow, and so their reactions are largely unknown.

Many groups have addressed handwriting and diagram recognition issues, some have looked at annotation anchoring and some have even looked at modifications to existing ink (for instance “prettying” handwriting), but none to our knowledge has addressed the issue of how users react when their free-form digital ink annotations are automatically reflowed.

Ink-on-Paper versus Digital Displays

The physical properties of high-quality paper make certain helpful functions possible or easier for users, including the following.

Lightweight so usually easy to carry, hold, and position

Thin so easy to grasp

Flexible, thus convenient to position, bend, and fold

Reflective, able to be illuminated for a wide range of brightness and contrast

Markable by a variety of means in a simple and uniform manner

Allowing detailed high-resolution markings

Opaque and two-sided so efficiently legible on both sides

Unpowered so portable and "always-on"

Stable, self-conserving and maintenance-free for many years

Cheap and movable, so many can be used, e.g. spread out side by side

Simple, easily learned, and widely understood methods of use

Digital display technologies used by today’s digital libraries (DL's) to deliver document images—a rapidly evolving ecology of desktop, laptop, and handheld computers, plus eBook readers, Tablet PCs, and the like—often offer contrasting affordances:

Automatically and rapidly rewritable

Interactive

Connected (e.g. wirelessly) to a network and so can deliver potentially unlimited data

Radiant/back-lit, and so legible in the dark, but often limited in range of brightness and contrast

Sensitive (to, e.g., touch, capacitance), and so markable

This catalogue is incomplete but long enough to suggest the multiplicity of ways in which information conveyed originally as ink-on-paper may, and may not, be better delivered by electronic means favored by DL's. One result is that, as some researchers report, “paper [remains at present] the medium of choice for reading, even when the most high-tech technologies are to hand.” They point to four principal reasons for this.

Paper allows “flexible [navigation] through documents”

Paper assists “cross-referencing” of several documents at one time

Paper invites annotation

Paper allows the “interweaving of reading and writing”

It is illuminating to bear these considerations in mind when identifying obstacles to the delivery of document images via DL's.

Of course, efforts are underway to commercialize electronic document displays offering even more of the affordances of paper including flexibility, low weight, low power, and low cost.

Capture of data

The capture of document images for use in DL's is often carried out in large-scale batch operations. The operations are almost always designed narrowly to meet the immediate purpose. For reasons of cost, only rarely will the documents ever be rescanned. In fact, documents can be damaged or destroyed in the process, sometimes deliberately, e.g. spines of books cut off to allow sheet-fed scanning. Even more drastically, many libraries have discarded large collections of books and documents after they have been scanned, triggering anguished charges of violations of the public trust.

Quality Control

Image quality is most often quantified through the technical specifications of the scanning equipment, e.g. depth/color, color gamut and calibration, lighting conditions, digitizing resolution, compression method, and image file format.

Scanner Specifications

In the recent past, most large-scale document scanning projects, constrained by the desirability of high throughput and low storage costs, produced only bilevel images; this is now yielding rapidly to multilevel and color scanning. Digitizing resolutions (spatial sampling frequency) for textual documents typically range today between 300 and 400 pixels/inch (ppi). 600 ppi is less common, but is rapidly taking hold as scanner speed and disk storage capacity increase.

Tests of commercial OCR machines in 1995 showed that accuracy on bilevel images fell for documents digitized at less than 300 ppi, but did not appreciably rise at 400 ppi. They also showed that some systems were able to exploit the extra information in grey-level images to cut error rates by 10%–40%. Researchers have suggested that grey-level processing will allow OCR to read documents whose image quality is now well below par.

Of course, many document images are printed in color, and the costs of color scanning and of file storage and transmission are falling rapidly: DIA research has, within the last five years, begun to take this challenge seriously – but, in my view, not as fast as it should.

Some attempts have been made to issue refined scanning standards. The Association for Information and Image Management (AIIM) publishes standards for the storage of textual images, including ANSI/AIIM MS-44-1988 “Recommended Practice for Quality Control of Image Scanners” which defines “procedures for the ongoing control of quality within a digital document image management system.” It is designed predominantly for use in bilevel imaging. MS-44 test targets include:
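The storage pressure behind the preference for bilevel scanning is easy to quantify: the uncompressed size of a page grows linearly with bit depth and with the square of the resolution. A quick back-of-the-envelope calculation for a letter-size page:

```python
def raw_page_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page:
    pixel count times bit depth, divided by 8 bits per byte."""
    pixels = int(width_in * ppi) * int(height_in * ppi)
    return pixels * bits_per_pixel // 8

# A letter-size page (8.5 x 11 inches):
bilevel_300 = raw_page_bytes(8.5, 11, 300, 1)   # ~1.05 MB
grey_300    = raw_page_bytes(8.5, 11, 300, 8)   # 8x the bilevel size
bilevel_600 = raw_page_bytes(8.5, 11, 600, 1)   # 4x the 300 ppi size
```

Moving from bilevel to 8-bit grey multiplies raw storage by eight, and doubling the resolution from 300 to 600 ppi quadruples it, which is why large-scale projects accepted these costs only as disk capacity grew. (Compression narrows the gap considerably in practice; the figures above are raw sizes.)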

IEEE Std 167A-1987, a facsimile machine test target that is produced by continuous-tone photography, with patterns and marks for a large range of measurements of moderate accuracy

AIIM Scanner Target, an ink-on-paper, halftone-printed target

RIT Process Ink Gamut Chart, a four-color (cyan, magenta, yellow, and black), halftone-printed chart for low accuracy color sensitivity determinations

 

Initial Processing

The PARC Book Scanner illustrates the wide range of early-stage image processing tools needed to support high quality image capture. Note the importance of image calibration and restoration specialized to the scanner. Image processing should, ideally, occur quickly enough for the operator to check each page image visually for consistent quality. Tools are needed for orienting the page so text is right side up, deskewing the page, removing some of the pepper noise, and removing dark artifacts on or near the image edges. Software support for clerical functions such as page numbering and ordering, and for the collection of metadata, is also crucial to maintaining high throughput.

In addition to these, it would be helpful to be able to check each page image for completeness and consistency. Has any text been unintentionally cropped? Are basic measures of image consistency — e.g. brightness, contrast, intensity histograms — stable from page to page, hour after hour? Are image properties consistent across the full page area for each image? Are the page numbers — located and read by OCR on the fly — in an unbroken ascending sequence, and do they correspond to the automatically generated metadata? Techniques likely to assist in these ways may require imaging models that are tuned to shapes or statistical properties of printed characters. Perhaps it will someday be possible to assess both human and machine legibility on the fly.
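Two of the consistency checks just described are simple enough to sketch. The following is a toy illustration (the function names, the mean-brightness statistic, and the tolerance value are assumptions, not part of any production QC pipeline):

```python
def check_page_numbers(ocr_numbers):
    """Flag breaks in the page-number sequence read by OCR on the fly.
    Returns the positions where the sequence fails to ascend by one."""
    breaks = []
    for i in range(1, len(ocr_numbers)):
        if ocr_numbers[i] != ocr_numbers[i - 1] + 1:
            breaks.append(i)
    return breaks

def brightness_drift(page_means, tolerance=15.0):
    """Flag pages whose mean brightness strays from the batch average
    by more than `tolerance` grey levels -- e.g. lamp drift during a
    long scanning run."""
    avg = sum(page_means) / len(page_means)
    return [i for i, m in enumerate(page_means) if abs(m - avg) > tolerance]

print(check_page_numbers([1, 2, 3, 5, 6]))        # position 3: page 4 missing
print(brightness_drift([200, 198, 201, 199, 150]))  # page 4 is much darker
```

A real system would compare full intensity histograms rather than a single mean, and would cross-check the OCR'd page numbers against the generated metadata, but the structure of the check is the same.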

Restoration

The principal purposes of document image restoration are to assist:

Fast & painless reading

OCR for textual content

DIA for improved human reading (e.g. format preservation)

Characterization of the document (age, source, etc)

To these ends, methods have been developed for contrast and sharpness enhancement, rectification (including skew and shear correction), superresolution, and shape reconstruction.

Rectification

The DIA community has developed many algorithms for accurately correcting skew, shear, and other geometric deformations in document images. It is interesting how inconsistently these have been applied to document images provided by DL's. Although, uncorrected, these deformations are easily detectable by eye and cause some users to complain, they do not affect legibility and reading comfort except in extreme cases (for example, more than 3 degrees of skew). However, not all DIA toolkits that may later be run on these images will perform equally well, so it could be a significant contribution to rectify all document images before posting them on DL's. It is also possible — although it is seldom discussed in the DIA literature — to “recenter” text blocks automatically within a standard page area in a consistent manner. Again, it is not clear that this, although a clear improvement in aesthetics, matters much to either human or machine reading.
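One common family of skew-estimation techniques fits a line to detected text baselines and reads the skew angle off the slope. A minimal sketch, assuming baseline points have already been extracted from the image (the extraction step itself is omitted):

```python
import math

def estimate_skew(baseline_points):
    """Least-squares fit of a line through (x, y) text-baseline points;
    the slope of the fitted line gives the page skew in degrees."""
    n = len(baseline_points)
    sx = sum(x for x, _ in baseline_points)
    sy = sum(y for _, y in baseline_points)
    sxx = sum(x * x for x, _ in baseline_points)
    sxy = sum(x * y for x, y in baseline_points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return math.degrees(math.atan(slope))

# A baseline rising 1 unit every 20 corresponds to roughly 2.9 degrees
# of skew -- close to the threshold where skew starts to hurt reading.
pts = [(0, 0), (20, 1), (40, 2), (60, 3)]
print(round(estimate_skew(pts), 1))
```

Production rectifiers typically use projection profiles or Hough-transform variants over the whole image rather than a single fitted line, but the output is the same: an angle by which to rotate the page before it is posted.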

Analysis of Content

The analysis and recognition of the content of document images requires, of course, the full range of DIA R&D achievements: page layout analysis, text/non-text separation, printed/handwritten separation, text recognition, labeling of text blocks by function, automatic indexing and linking, table and graphics recognition, etc. Most of the DIA literature is devoted to these topics so I will not attempt a thorough survey in this short space.

However, it should be noted that images found in DL's, since they represent many nations, cultures, and historical periods, tend to pose particularly severe challenges to today’s DIA methods, which are not robust in the face of multilingual text and non-Western scripts, obsolete typefaces, old-fashioned page layouts, and low or variable image quality.

Accurate Transcriptions of Text

The central classical task of DIA research has been, for decades, to extract a full and perfect transcription of the textual content of document images. Although perfect transcriptions have been known to result, no existing OCR technology, whether experimental or commercially available, can guarantee high accuracy across the full range of document images of interest to users. Even worse, it is rarely possible to predict how badly an OCR system will fail on a given document.

Determining Reading Order of Sections

Determining the reading order among blocks of text is, of course, a DIA capability critically important for DL's; it would allow more fully automatic navigation through images of text. This, however, remains an open problem in general, in that a significant residue of cases cannot be disambiguated through physical layout analysis alone, but seem to require linguistic or even semantic analysis.

However, the number of ambiguous cases on one page is often small and might be made manageable in practice by a judiciously designed interactive GUI presenting ambiguities in a way that invites easy selection (or even correction). Such capabilities exist in specialized high-throughput scan-and-conversion service bureaus, but are not now available to the users of any DL; providing them would allow users to correct reading order themselves and so improve their own navigation.
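The purely geometric part of the problem — the part that works without linguistic analysis — can be sketched simply. The following toy ordering groups blocks into columns by their left edge and reads each column top to bottom (the tuple layout and the `column_gap` threshold are assumptions for the example):

```python
def reading_order(blocks, column_gap=50):
    """Order text blocks column-by-column, then top-to-bottom.
    Each block is an (x, y, label) tuple giving the top-left corner
    of its bounding box. Purely geometric: layouts this heuristic
    gets wrong are exactly the ambiguous cases that call for the
    interactive correction discussed in the text."""
    # Group blocks into columns by proximity of their left edges.
    columns = {}
    for b in sorted(blocks, key=lambda b: b[0]):
        for cx in columns:
            if abs(b[0] - cx) < column_gap:
                columns[cx].append(b)
                break
        else:
            columns[b[0]] = [b]
    # Read columns left to right, and each column top to bottom.
    ordered = []
    for cx in sorted(columns):
        ordered.extend(sorted(columns[cx], key=lambda b: b[1]))
    return ordered

# A simple two-column page reads left column first, then right.
page = [(300, 0, "B1"), (0, 0, "A1"), (0, 200, "A2"), (300, 200, "B2")]
print([b[2] for b in reading_order(page)])
```

Inserts, sidebars, and pages where a figure spans both columns are precisely where a rule like this fails, which is why the residue of ambiguous cases is best presented to a human for quick confirmation.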

Conclusion

Digital ink annotations are an important advance over other digital annotation technologies. Though there is still much work to be done, we have described a flexible framework for handling reflowable ink annotations.

 


