Peripherals: Character-recognition devices


One type of character-recognition device enables the user to input text and numerical data by handwriting it in capital letters on a pressure-sensitive surface using a special pen. An invisible grid of fine wires below the surface detects the shape of the letters, converting them to electrical signals which the computer interprets using special software. Although this device is a genuine replacement for the keyboard, it has never really caught on, having been overtaken by other developments, in particular the advances in speech-recognition devices described later.

Much more useful are optical character readers (OCRs), which scan text that has been typed or printed on paper, converting the characters to the binary code that the computer understands. These provide a way of passing information between machines that cannot communicate electronically. For example, they enable output from a typewriter to be passed to a word processor for editing and final printing, a technique that has been used in some typing pools. They also enable a business to convert its input documents to electronic form without the need to key them in.
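As a rough modern illustration of the same idea, the sketch below reads a scanned page and converts the printed characters to ordinary text. It assumes the open-source Tesseract engine and its Python wrapper pytesseract, neither of which is mentioned in the original text, and the filename is invented.

```python
# A minimal OCR sketch, assuming Tesseract and the `pytesseract`
# wrapper are installed (both are assumptions, not from the text).
from PIL import Image
import pytesseract

# Open a scan of a typed page (hypothetical filename) and convert
# the printed characters to text the computer can process.
page = Image.open("typed_letter.png")
text = pytesseract.image_to_string(page)
print(text)
```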

Some modern image scanners, described later in this chapter, can also function as OCRs when used with special OCR software. These can recognize a reasonable range of typefaces, so enabling printed and typed text to be input to a computer. However, smeared characters and unusual typefaces may be beyond them. In place of a character that they can't recognize, they will substitute a special symbol. These symbols can be automatically picked out and replaced later on by spell-checking software.
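The clean-up step described above is easy to picture in code. The sketch below finds every word containing a reject symbol so it can be flagged for correction; the choice of '~' as the symbol is hypothetical, since real OCR software varies.

```python
# Flag words where the OCR substituted a reject symbol for a
# character it could not recognize. '~' is an assumed symbol.
REJECT = "~"

def flag_rejects(text: str) -> list[str]:
    """Return each word that contains an unrecognized character."""
    return [word for word in text.split() if REJECT in word]

ocr_output = "The quick bro~n fox jumps over the la~y dog"
for word in flag_rejects(ocr_output):
    print("needs correction:", word)
```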

The microphone

It is quite easy to convert the spoken word to a digital signal for computer input. The microphone converts sound waves to electrical signals, and these can be converted by electronic circuitry in the computer to digital form. What is difficult is the computer's recognition of the signal, so that it can be handled in the same way as if it had been typed. Highly sophisticated speech-recognition software is required, able to match the sound uttered by the user against a vocabulary of sound signals stored in the computer, and to display the words on the screen as though they had been entered at the keyboard.
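The "easy" half of the problem, turning sound into numbers, can be seen in the sketch below, which uses Python's standard wave module to read a recording already digitized by the sound hardware. The filename is invented for illustration.

```python
# Read digitized audio samples from a recording, assuming a WAV
# file named "dictation.wav" exists (a hypothetical example).
import wave

with wave.open("dictation.wav", "rb") as recording:
    rate = recording.getframerate()        # samples per second
    frames = recording.readframes(rate)    # one second of audio
# The waveform is now just numbers, which recognition software
# can compare against stored sound patterns.
print(f"{rate} samples/second, {len(frames)} bytes read")
```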

The development of viable speech-recognition systems for the English language has been a major goal of many researchers for a number of years. Recently, commercial systems have started to emerge. One major problem is the many inconsistencies between the written and spoken word in English. Japanese, in contrast, is phonetically very precise, and so speech-recognition systems for that language were relatively easy to develop and have been used for some time. English-language systems face the task of having to infer, from the context, what the words are likely to be.

A second problem is that there can be wide variations between the speech patterns of one individual and another. To cope with this, the system has to be 'trained' to recognize the user's particular speech. Most systems require the user to read a passage containing all the words stored in the computer's vocabulary on disk, so that the system can match what is spoken with what is stored. In this way it constructs speech 'templates' for the user, which it stores for use in all subsequent dictation sessions.
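A toy version of this template idea is sketched below. Each stored 'template' is just a short list of numbers standing in for a speech pattern, and an utterance is matched to the nearest one; real systems use far richer acoustic features, so everything here is illustrative.

```python
# Toy template matching: pick the vocabulary word whose stored
# pattern is closest to the incoming utterance. All numbers are
# invented stand-ins for real acoustic features.
import math

templates = {                      # built during the training session
    "yes": [0.9, 0.2, 0.1],
    "no":  [0.1, 0.3, 0.8],
}

def closest_word(utterance: list[float]) -> str:
    """Return the word whose template is nearest the utterance."""
    return min(
        templates,
        key=lambda w: math.dist(templates[w], utterance),
    )

print(closest_word([0.85, 0.25, 0.15]))   # -> "yes"
```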

Speech-recognition systems in the past have suffered from either having too limited a vocabulary to be of much use, or else, in the case of large vocabulary systems, taking far too long to match what was spoken with what was stored in the computer. Recent increases in computer power have greatly speeded things up, and voice systems on personal computers have now appeared. The system from Apricot is called 'Dictate', and it has a vocabulary of 30,000 words. IBM has also developed a system for PCs. At the heart of the IBM system is a digital signal processor (DSP) which uses parallel processing techniques and is able to perform 10 million instructions per second.

The system works by recognizing the 200 or more phonetic elements of words, rather than by attempting to recognize a vast vocabulary of whole words. This means that the computer has to produce only a relatively small number of speech templates, and so the initial training session can be quite brief. To match the spoken word with what's stored in its vocabulary, the computer uses a statistical approach based on an analysis by the IBM researchers of some 25 million words of office correspondence. This approach enables it to predict what words are likely to appear together, and so to select likely candidates from its vocabulary.
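The statistical idea is essentially word-pair counting. The sketch below builds pair counts from a tiny invented 'corpus' and uses them to rank candidate words; IBM's analysis used some 25 million words of office correspondence, so this is only a miniature of the technique.

```python
# Count which word pairs occur together in a body of text, then
# rank candidates by how often they follow the previous word.
# The corpus here is invented for illustration.
from collections import Counter

corpus = "please find the enclosed report please send the report".split()
pairs = Counter(zip(corpus, corpus[1:]))   # word-pair counts

def rank_candidates(previous: str, candidates: list[str]) -> list[str]:
    """Order candidate words by how often they follow `previous`."""
    return sorted(candidates, key=lambda w: -pairs[(previous, w)])

print(rank_candidates("the", ["report", "enclosed", "banana"]))
```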

When the first word is spoken, the computer makes a tentative assessment but does not display the candidate word on the screen. When the next word is spoken, the initial candidate is reassessed and either accepted or rejected in favour of another, and the result displayed. The process continues through the dictation session.
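This deferred decision can be sketched as a rescoring step: the first word's candidates are held back, then re-ranked once the second word arrives, using pair scores like those above. All the scores and the helper names below are invented for illustration.

```python
# Hold back the first word's candidates, then rescore them once
# the next word is heard. Scores are hypothetical.
def rescore(candidates: dict[str, float], next_word: str,
            pair_score) -> str:
    """Pick the candidate that best fits its own acoustic score
    plus how well it leads into the following word."""
    return max(
        candidates,
        key=lambda w: candidates[w] + pair_score(w, next_word),
    )

acoustic = {"there": 0.6, "their": 0.55}   # tentative first-word scores
follows = {("their", "report"): 0.5, ("there", "report"): 0.1}
best = rescore(acoustic, "report",
               lambda a, b: follows.get((a, b), 0.0))
print(best)   # -> "their", revised once the next word is known
```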
