Volume: 3 Issue: 6
Your ZX/TS computer can talk to you, with the help of a ‘Speech Synthesizer’ system. Speech or voice synthesis systems are combinations of hardware and software which, when tied in with your computer, can put electronically generated sounds and noises together into intelligible words and phrases. There are currently at least 16 semiconductor houses producing special LSI (Large Scale Integration) chips which can talk.
Voice Synthesis Techniques
These chips can all be computer controlled, and most use one of five principal synthesis techniques: Linear Predictive Coding (LPC), Allophone Synthesis, Pulse Code Modulation (PCM), Time Domain Synthesis, and PARCOR. The first two methods are the most popular, and perhaps the easiest to obtain for your ZX/TS machine, so they will be the focus of this article.
Early attempts at recreating speech centered on digitally encoding actual spoken words. The problem with such methods was that prodigious amounts of memory (as much as 1 Mbit per word) were required for a microprocessor to speak in real time.
The PCM technique digitizes and compresses speech to the point where perhaps only 20 to 70 thousand bits are required for one second of speech. This is still a rather large requirement for a microcomputer. In addition, the entire vocabulary, just as it will be spoken, must be stored in memory (usually ROM) somewhere.
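To put the PCM figures in perspective, here is the arithmetic as a short Python sketch (Python is simply our choice for illustration; nothing here runs on the ZX/TS itself). The rate of 8,000 samples per second and the 8-bit and 4-bit sample sizes are assumed, typical values, not the specifications of any particular chip; both results fall inside the 20 to 70 thousand bits per second quoted above.

```python
# Storage arithmetic for digitized (PCM) speech.
# Assumed values for illustration: 8,000 samples/s, and 8 or 4 bits per sample.

RAM_BYTES = 16 * 1024  # e.g. a ZX81/TS1000 fitted with a 16K RAM pack

def bits_per_second(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Data rate of PCM speech: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample

def seconds_in_memory(memory_bytes: int, rate_bps: int) -> float:
    """Seconds of speech that fit in the given memory at the given data rate."""
    return memory_bytes * 8 / rate_bps

for label, bits in [("8-bit PCM", 8), ("4-bit compressed", 4)]:
    rate = bits_per_second(8000, bits)
    print(f"{label}: {rate} bit/s, {seconds_in_memory(RAM_BYTES, rate):.1f} s in 16K")
```

Under these assumptions the loop prints 64000 bit/s and 2.0 seconds for 8-bit samples, and 32000 bit/s and 4.1 seconds for 4-bit samples: a full 16K holds only a few seconds of speech, with no room left for your program.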
LPC uses an electronic model of the human vocal tract to produce sounds. In LPC, just as with PCM, the words we want the computer to say must be stored in ROM. In LPC, however, instead of a compressed duplicate of actual human speech being stored in ROM, only the parameters for producing the sounds are kept.
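Here is a minimal sketch of the "electronic mouth", assuming a sample rate of 8 kHz and 25 ms frames. Each frame stores only parameters (a voiced/unvoiced flag, a pitch period, a gain, and the predictor coefficients of an all-pole filter). A buzz (impulse train) or a hiss (noise) drives the filter, which shapes it into a vowel-like or consonant-like sound. The two-pole `resonator` helper and the frame values are our own illustrative assumptions; real LPC chips use higher-order filters and their own parameter coding.

```python
import math
import random
from collections import deque

SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_LEN = 200     # assumed frame length: 25 ms at 8 kHz

def resonator(freq_hz, radius):
    """Predictor coefficients for one 2-pole resonance (a single 'formant')."""
    theta = 2 * math.pi * freq_hz / SAMPLE_RATE
    return [2 * radius * math.cos(theta), -radius * radius]

def synthesize(frames, seed=0):
    """All-pole (LPC) synthesis: y[n] = gain*e[n] + sum_k a_k*y[n-k].

    e[n] is an impulse train ("buzz") for voiced frames and white noise
    ("hiss") for unvoiced ones. Filter memory carries across frames.
    """
    rng = random.Random(seed)
    order = max(len(f["coeffs"]) for f in frames)
    history = deque([0.0] * order, maxlen=order)  # history[0] holds y[n-1]
    out = []
    for f in frames:
        for n in range(FRAME_LEN):
            if f["voiced"]:
                # One pulse every "pitch" samples (80 samples = 100 Hz here).
                e = 1.0 if n % f["pitch"] == 0 else 0.0
            else:
                e = rng.uniform(-1.0, 1.0)
            y = f["gain"] * e + sum(a * h for a, h in zip(f["coeffs"], history))
            history.appendleft(y)  # oldest sample drops off the far end
            out.append(y)
    return out

# Two frames of parameters: a vowel-like buzz, then an "s"-like hiss.
frames = [
    {"voiced": True, "pitch": 80, "gain": 1.0, "coeffs": resonator(500, 0.95)},
    {"voiced": False, "pitch": 0, "gain": 0.2, "coeffs": resonator(3000, 0.8)},
]
samples = synthesize(frames)
print(len(samples))  # 400 samples, i.e. 50 ms at 8 kHz
```

Note that the two frames amount to a handful of numbers, while the 400 samples they produce would need 3,200 bits even as 8-bit PCM.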
These parameters tell the “electronic mouth” when to perform the electronic analogue of exhaling fully, vibrating its vocal cords, placing the tip of its tongue against the back of its teeth, etc. Straight LPC requires that the desired word be spoken by a human into a special computer-controlled filtering system, which extracts the parameters that are then stored in ROM. Memory requirements are lower than for PCM, but so is speech quality. Straight LPC for your ZX/TS is perhaps best illustrated by the TI Speak and Spell interface article in Computers and Electronics, February 1983. TI’s TMS 5220 chip works well with Z80 processors and can be used, for example, with their VM 71003 ROM chip to create a “talking clock” (see Radio-Electronics, May and July 1983).
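The "special filtering system" of straight LPC comes down to analysis: for each short frame of recorded speech, find the coefficients that best predict each sample from the ones before it. One standard way to do this is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below under our own assumptions (no windowing, a single frame, and a synthetic test signal standing in for a human voice). It is not necessarily the method behind TI's analysis equipment. The intermediate reflection coefficients `k` are the PARCOR coefficients listed earlier as a technique of their own (sign conventions vary between texts).

```python
import random

def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0 .. max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients from autocorrelation (Levinson-Durbin recursion).

    Returns (a, k, err): a[i - 1] is a_i in the prediction
    x[n] ~ a_1*x[n-1] + ... + a_p*x[n-p]; k lists the reflection
    (PARCOR) coefficients; err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused so indices match the math
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        k.append(ki)
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= 1.0 - ki * ki
    return a[1:], k, err

# Test signal: x[n] = 0.9*x[n-1] + noise. Can the analysis recover the 0.9?
rng = random.Random(1)
x, prev = [], 0.0
for _ in range(2000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
a, k, err = levinson_durbin(autocorrelation(x, 1), 1)
print(round(a[0], 3))  # should come out close to 0.9
```

The recursion builds the order-p solution from the order-(p-1) one, which is why the reflection coefficients appear along the way at no extra cost.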
“Phoneme” or “allophone” synthesizers start with as few as 64 basic sounds (the phonemes) or their variants (the allophones), which can be combined to make up most of the words of a spoken language. These synthesizers use a number of techniques, including LPC, to concatenate these fundamental sounds into words. In this case there is virtually no off-chip ROM requirement: simple 8-bit codes representing the phonemes can be stored in the RAM of your computer and fed to the synthesizer one at a time. Speech quality is often not as high as with ROM-based word LPC or PCM, because of the limited number of phonemes and of ways of combining them. The General Instrument/Voicetech units mentioned in Radio-Electronics, March 1983, and used in the R.I.S.T. Parrot, and Votrax’s SC-01 chips are of the allophone type (the G.I. parts build their allophones with LPC). G.I. also makes ROM-based LPC chips (SP0250) (see Radio-Electronics, June 1983, on talking computer games).
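The RAM-based approach can be simulated in a few lines. The allophone codes below are placeholders, not any real chip's code table, and `FakeSynth` is a hypothetical stand-in for a synthesizer board; real hardware would be driven through an output port with a busy/ready status line, with details that vary from board to board.

```python
# Hypothetical allophone codes: placeholders only, NOT the code table of
# any real chip. One byte per allophone, as described above.
ALLOPHONES = {"PA": 0x00, "HH": 0x01, "EH": 0x02, "LL": 0x03, "OW": 0x04}

class FakeSynth:
    """Hypothetical stand-in for a speech chip on an output port.

    A real board would be written through a port (for example via a Z80 PIO)
    and would report 'busy' on a status line while it speaks an allophone.
    """

    def __init__(self, polls_per_allophone=3):
        self.polls_per_allophone = polls_per_allophone
        self.remaining = 0  # polls left before the current sound finishes
        self.spoken = []    # codes the "chip" has accepted, in order

    def busy(self):
        # Each status poll lets the simulated sound progress a little.
        if self.remaining:
            self.remaining -= 1
        return self.remaining > 0

    def write(self, code):
        if self.remaining:
            raise RuntimeError("code sent while the chip was still busy")
        self.spoken.append(code)
        self.remaining = self.polls_per_allophone

def say(synth, phrase):
    """Pack allophone names into bytes in RAM, then feed them one at a time."""
    codes = bytes(ALLOPHONES[name] for name in phrase)
    for code in codes:
        while synth.busy():
            pass  # wait for the chip to finish the previous allophone
        synth.write(code)
    return codes

synth = FakeSynth()
codes = say(synth, ["HH", "EH", "LL", "OW", "PA"])  # roughly "hello", then a pause
print(len(codes), "bytes:", codes.hex())
```

The word costs five bytes of RAM (the printout reads `5 bytes: 0102030400`). Storing the same half second (an assumed length for "hello") as 8-bit PCM at 8 kHz would take about 4,000 bytes.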
Synthesizer Chips
The synthesizer chips themselves have been dropping in price even faster than the TS1000 in recent months; chips that used to sell for up to $100 are now going to OEMs for less than $10, and in some cases less than $5.
Complete synthesizer units, consisting of the synthesizer itself, its operating system, and ROM (if required), can now be purchased for between $30 (Cheaptalk) and $100 (Digitalker). Most of these can easily be interfaced with a ZX/TS through a Z80 PIO or other peripheral interface.
Uses of Speech Synthesis
What can you use speech synthesis for? In a security system, a synthesized voice can give you spoken warning of impending problems. Other annunciator uses include overtemperature, high-water level, and “lights on” alerts, all of which warn you of situations requiring your attention. In educational uses, a voiced response can be more “friendly” for young or novice students. Speech- or visually handicapped people can even use their ZX/TS to communicate with the world. How about adding some interesting byplay to your favorite game, or making the “voice” your third eye when running complicated action/adventure games? The voice can describe your general circumstances while you concentrate on the visual information on the screen.
The Best Technique
“Which is the best technique for the long term?” has been a big question in the field of voice synthesis for a long time. Generally, as we said, the more memory-intensive systems sound better but cost more and are relatively inflexible. The allophone systems are cheaper and more versatile, but produce speech that is far from human sounding. The dividing line between ROM-based and allophone systems seems to be blurring as hardware manufacturers strive to get the best of both worlds. For example, in some ROM-based systems the prefixes of many words (e.g., the AT in ATTACK) can be addressed individually; with such slicing we may be getting very close to using phonemes. Similarly, certain pairs of English letters cannot be reproduced correctly by simply joining the two individual letter sounds, because each sound changes depending on its neighbor in a particular word (this is called coarticulation). The only way to get really accurate reproduction of such pairs is to add them to our basic list of allophones in ROM.
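The coarticulation fix can be pictured as a lookup that prefers the longest stored unit. In this sketch the inventory, its letter-based units, and the greedy longest-match rule are all our own illustrative assumptions; real text-to-allophone rules are far more elaborate. Adding the pair `sh` lets "cash" end with one properly blended sound instead of an "s" followed by an "h".

```python
# Hypothetical unit inventory: single-letter sounds plus a couple of
# two-letter units recorded as a whole because of coarticulation.
SINGLES = {"a", "c", "h", "s", "t"}
PAIRS = {"sh", "ch"}

def segment(word, inventory, max_len=2):
    """Greedy longest match: use a stored pair before two single units."""
    units, i = [], 0
    while i < len(word):
        # Try the longest piece first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in inventory:
                units.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no unit for {word[i]!r} in {word!r}")
    return units

print(segment("cash", SINGLES))          # ['c', 'a', 's', 'h']
print(segment("cash", SINGLES | PAIRS))  # ['c', 'a', 'sh']
print(segment("chat", SINGLES | PAIRS))  # ['ch', 'a', 't']
```

The cost of better sound is a larger inventory: every pair added to ROM is one more unit to record and store.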
A judicious blend of hardware, software (e.g., in a small on-board ROM), and expandability should provide a system capable of realistic, infinitely variable speech. This is, we understand, the sort of approach which Votrax, one of the leaders in the field, is following with its second-generation systems.
One final note: while adequate hardware and quite a few word libraries exist today, there is very little adequate software for users or even OEMs. The development of user-friendly, comprehensive software packages for the various personal microcomputers will greatly enhance the usefulness of your “talking” computer.