Say What?

See all articles from SYNC v3 n6

You have just completed entering a program containing almost 1,000 statements, yet during all that programming your fingers never touched a keyboard. Anticipating the best, you lean back in your chair and with a single spoken command you see what an evening’s worth of effort will bring. “RUN” you command . . . and RUN the program does.

Is this scenario a dream? Hardly. Take a look through any current technology-related publication and you are likely to come across an article or advertisement concerning speech recognition. It is a technology which offers an alternative to the normal human-machine interface of direct physical contact. Today you flip a switch, press a key, or turn a dial. Tomorrow you may only have to speak the command to perform these tasks.

But back to today, and in particular to the subject of this article: a simple speech recognition program for ZX/TS computers (with at least 16K RAM). The word “simple” should be emphasized. The program is relatively simple to enter and run. It is limited to recognizing only ten simple words (but ten words of your choice!). It is not designed to replace your keyboard (sorry); rather, it is an experimental tool. With consistent pronunciation the program will recognize and display the correct word approximately 9 out of 10 times. Not bad . . . but I hope not to hear from someone who has interfaced the system to control the brake and accelerator in his automobile!

The speech recognition program has three major parts:

1) A speech input routine. This displays a “voice print” in the form of a histogram, which is actually a pseudo frequency spectrum of a vocalized word or sound.

2) A file system. Up to ten separate voice prints along with the corresponding word (string) are stored. These voice prints and strings are employed during speech recognition for comparison.

3) A speech recognition routine. This compares a newly spoken voice print to the prints stored in the file. A string corresponding to the best matching voice print entry is then displayed on the TV screen.

Each of these three parts consists of Basic statements with calls to appropriate machine code routines when fast and efficient program execution is required.

Hardware

To use the program, a small piece of hardware must be constructed. This provides an amplified voice signal to the computer’s ear input. Parts for the board can be readily obtained at most Radio Shack stores and from G. Russell Electronics. Even for those with limited construction experience, assembly should take no more than a couple of hours.

Step by Step Construction

A complete parts list and required tools are given in Table 1.

1) Lay out the board. Photo 1 shows our prototype. We found this configuration comfortable to hold in the palm of a hand. Component layout is given in Figure 1 (Note: the copper pad side of the grid board is down).

2) Drill a 7/32″ hole for the DPDT power switch at hole W-3 (W-3 are grid board coordinates). With the 1/16″ drill bit, enlarge the holes at positions C-13, D-13, V-13, and W-13 to allow the battery snap leads to pass. Also with the 1/16″ drill bit, enlarge the hole at J-30 and drill two new holes to accommodate the 3 conductor mini jack. Make sure the threaded portion of the jack extends past the board edge so that the cassette cable mini plug can be fully inserted.

3) Install the DPDT switch in the 7/32″ hole with the supplied hardware.

4) Install the 3 conductor mini jack into the proper holes. A small amount of silicon rubber sealer (or other available adhesive) will help to fix the jack onto the board. Solder the jack terminal at hole J-30 to its respective pad.

5) Solder a 1-1/2″ piece of the 22 gauge single conductor wire to each of the crystal mike element terminals. Bend these wires perpendicular to the element, and insert the (-) terminal lead in hole K-6 and the (+) terminal lead in hole N-6. Solder these wires to their respective pads. Let the excess wire remain unconnected temporarily.

6) Insert the 8 pin IC socket into the board. The socket should occupy holes K-10 through K-13 and holes N-10 through N-13. Pin 1 is located at hole N-13; pin 8 is located at hole K-13. After insertion, bend the leads of the socket outward to hold it in place. Do not install the op amp at this time.

7) Insert resistor leads as in Table 2. After positioning a resistor, solder the protruding lead to its respective pad.

8) Bend the leads of the resistors to make the required connection(s). See Figure 2. Solder where necessary. As a check for completeness, trace over the schematic with a highlighting pen after each connection has been made. Photo 2 shows the back side of the prototype.

9) Wire the connections to the mini jack. Connect IC socket pin 6 (hole K-11) to the jack terminal extending through hole J-30. This is the amplifier output. It must be connected to the jack terminal which makes contact with the tip of the cassette cable plug. Connect either of the other two jack terminals to ground.

10) Complete the circuit by wiring the battery and switch leads. Ground is established by tying and soldering together the black lead of the right side battery snap (hole V-13) with the red lead of the left side battery snap (hole D-13). Solder this junction to the resistor leads extending through holes T-11 and T-12. The ground lead of the crystal mike (hole K-6) is also connected to this point. With a piece of the 22 gauge wire, connect one of the mini jack ground terminals to the same ground junction.

11) Connect and solder the remaining red and black battery snap leads to the lower two switch terminals. Solder a 2″ wire to each of the two center terminals of the switch. Identify which of these two wires makes contact through the switch with the red battery lead, and solder that wire to IC socket pin 7 (V+). Solder the other center switch terminal wire to IC socket pin 4 (V-).

12) Install the op amp. The recessed dot on top of the op amp indicates pin 1, and should be away from the crystal mike element (approximately lined up with hole N-13).

13) Finally, making sure the DPDT switch is off (lever down), clip the two 9 volt batteries onto the battery snaps.

This completes amplifier board assembly.

Speech Recognition Program

Listing 1 provides the speech recognition Basic program. Line 1 is a REM statement which contains the machine code routines. Once entered, Line 1 must never be edited. Doing so may unintentionally alter the machine code routines. Enter the program as follows. Remember, SAVE frequently.

1) Type a REM statement containing 270 spaces. Although this statement does not have to be numbered Line 1, it must be the first statement in the program.

2) Enter Listing 2.

3) RUN this program. From Table 3 INPUT the appropriate decimal entry for the displayed address. For example, the entry for 16520 is 33, for 16566 is 200. After typing each entry, press ENTER. Continue until all entries have been made. To exit the program, input a non-numeric character (such as W). The program will abort, giving an error code.

4) Check your work by typing the following line (without a line number):

PRINT USR 16758

Press ENTER. If the result displayed is 64, skip step 5 and proceed with step 6.

5) If you did not get 64 in step 4, you must find the error in the machine code. Listing 3 gives a Basic routine which dumps forty sequential bytes of memory in decimal format, starting with the byte at address 16520. The data is displayed in two columns reading down without the addresses. The second column starts at an address which is 20 locations higher than the beginning of the first. The next 40 bytes can be examined by inputting any number and pressing ENTER. You must keep track of the number of screens which have been displayed (the first screen starts at 16520, the second at 16560, the third at 16600, etc.). When an error is found, determine the address of the error and abort the program by entering a non-numeric character. Then POKE the correct value into this location. Repeat step 4.

6) With the machine code implanted in the first REM statement, enter lines 5 through 1200 of Listing 1. This will overwrite Listing 2 which is not needed any more.

7) SAVE at least one copy on tape.

8) The program can now be RUN.
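
The bookkeeping asked for in step 5 comes down to a small calculation. The sketch below is written in Python purely for illustration (it does not run on the ZX/TS). It converts a screen number, column, and row of the Listing 3 dump into a memory address, assuming twenty rows in each column as the description implies. The function name dump_address is hypothetical and does not appear in any of the listings.

```python
# Constants taken from the article; the function name is hypothetical.
FIRST_ADDRESS = 16520   # first byte shown on the first screen
BYTES_PER_SCREEN = 40   # each screen shows forty sequential bytes
COLUMN_OFFSET = 20      # second column starts 20 locations higher


def dump_address(screen, column, row):
    """Return the address of a byte on the Listing 3 dump display.

    screen: 1 for the first screen, 2 for the second, and so on
    column: 1 (left) or 2 (right)
    row:    1 through 20, counting down the column (assumed layout)
    """
    if screen < 1 or column not in (1, 2) or not 1 <= row <= 20:
        raise ValueError("position is outside the dump display")
    return (FIRST_ADDRESS
            + (screen - 1) * BYTES_PER_SCREEN
            + (column - 1) * COLUMN_OFFSET
            + (row - 1))


print(dump_address(1, 1, 1))   # 16520, the first byte of the machine code
print(dump_address(2, 1, 7))   # 16566, the address whose entry is 200 in step 3
```

Once the address of a bad entry is known, the correct value from Table 3 can be POKEd into it as described in step 5.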

Program Operation

Before RUNning the program, connect the amplifier board to the ear input of the computer. Remove the plug from the ear jack of the tape recorder and place it into the amplifier board jack. For convenience, you may also want to disconnect the mic cable from the recorder. The amplifier power switch can now be turned on.

RUN the program. The screen should appear as shown below:

MENU

  1. VOICEPRINT DISPLAY
  2. VOICEPRINT FILE
  3. RECOGNITION
  4. CLEAR FILES
  5. DISPLAY STRING FILE
  6. STOP

INPUT SELECTION

Any selection can be made at any time by entering only the corresponding command number. However, we will discuss the commands in the numbered order.

Voiceprint Display

Option 1 provides a pseudo frequency spectrum of any vocalized word or sound. After selection, a machine code routine is entered which monitors the ear input. Since this routine is designed to wait indefinitely for an input signal (a sound), the time between command selection and the actual signal input is not critical.

The routine samples the input 255 times or until a pause (silence) of at least 0.75 seconds is detected. The acquired data is then manipulated to form the histogram.

The histogram consists of 255 individual frequency channels, although only 64 can be displayed at one time. The left and right arrow keys (5 and 8, respectively, without shifting) permit other channels to be observed by shifting the display. The histogram is plotted highest to lowest frequency going from left to right. The y axis is the number of occurrences of a particular frequency (or channel). In this manner, a voice print is created. Data similar to that displayed in the histogram makes the rest of the program work. Typical voice prints for the words “six” and “four” are illustrated in Figures 3 and 4. Major differences in the voice prints of the two words are readily apparent. Due to the limited amount of data which is acquired, the system is best suited to single syllable words.
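
The idea of the histogram can be sketched in a few lines of Python (used here only for illustration). The sketch assumes that each sample has already been reduced to a channel number from 1 to 255; how the machine code routine actually derives those numbers is not described in this article. The names build_histogram and visible_window are hypothetical.

```python
NUM_CHANNELS = 255   # channels in the full histogram (from the article)
WINDOW = 64          # channels visible on the screen at one time


def build_histogram(samples):
    """Count how many times each channel number (1..255) occurs."""
    counts = [0] * NUM_CHANNELS
    for channel in samples:
        if 1 <= channel <= NUM_CHANNELS:   # ignore out-of-range readings
            counts[channel - 1] += 1
    return counts


def visible_window(counts, first_channel):
    """Return the 64 counts shown when the display starts at first_channel."""
    # Clamp so the window never runs past either end of the histogram.
    first = max(1, min(first_channel, NUM_CHANNELS - WINDOW + 1))
    return counts[first - 1:first - 1 + WINDOW]


samples = [3, 3, 10, 64, 3, 200]        # made-up channel readings
hist = build_histogram(samples)
print(hist[2], hist[9], hist[199])      # 3 1 1
print(len(visible_window(hist, 250)))   # 64
```

Shifting the display with the arrow keys corresponds to changing first_channel; the counts themselves are unchanged.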

After making sure the amplifier board is connected and turned on, bring the microphone approximately two to three inches away from your mouth. Type 1 for the command selection and press ENTER. The screen should go blank. Now say a word naturally, but firmly. The screen should immediately display a voice print histogram and the query “AGAIN? (Y/N)”. If nothing appears, gently tap the microphone. An almost blank histogram should appear. If there is still no response, remove the plug from the amplifier board and insert it into the cassette player ear jack. Then play a previously recorded tape (program, music, voice, etc.) at maximum volume setting. If a histogram appears now but did not appear with the amplifier board, the amplifier board has a problem. Recheck your work; look for solder bridges and “cold” solder joints.

In response to “AGAIN? (Y/N)” enter Y to input and display another voice print or N to return to the main menu. Try different words and sounds. Pure tones, such as a crisp whistle, produce sharp histograms. Noisy phonemes, e.g., the “f” in four, produce a broader frequency spectrum.

After experimenting a while, you may have noticed the lack of data in the lower frequency channels. In fact, it is rare to find data below the 64th channel (where the 1st channel represents the highest frequency). This is due to the filtering characteristics of the computer’s ear input circuitry. Essentially, the circuit performs as a high pass filter which significantly attenuates low frequency signals (below approximately 300 Hz). It is not surprising then that the lower range of the histogram appears blank. We employ this knowledge further by using only the first 64 channels for voiceprint file creation and recognition. This significantly saves both processing time and memory.

Voiceprint File

Option 2 asks you to type in the word (up to 10 letters long) that you wish to have recognized. Then a file position is requested for storing the string. There are ten file positions, each one corresponding to a word (or sound). Any string and voice print file entry may be replaced at any time, with no effect on the others.

After you enter a file position (1 through 10), the program waits for you to situate the microphone comfortably, preferably so that you can see the screen. Pressing ENTER continues the program.

When the screen goes blank, pronounce the word which was input as a string. The screen displays a 1 and goes blank. Say the word again. The screen responds with a 2, and goes blank. Continue this repetition of the word for a total of eight times. Do not begin pronouncing the word until the screen goes blank. After the eighth entry, “AGAIN? (Y/N)” will appear. Entering Y permits you to create another string and voiceprint file entry; N returns you to the main menu.

The reason for repetition is to create an “average” voice print for a particular word (or sound). You are actually “teaching” the computer to recognize a word or phrase. This significantly enhances the recognition ability of the system, but it also makes the recognition dependent upon the speaker and, as I somewhat embarrassingly discovered, room acoustics. So do not attempt to demonstrate the system to your users group (which may meet in a large classroom) with a set of voice prints you made in your paneled and carpeted den. Rather, make a set of voice prints in the location where the demonstration is to be made.
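
One plausible way to form such an “average” is a channel-by-channel mean of the eight repetitions. The Python sketch below (for illustration only) assumes that approach and uses integer division; the article does not state how the machine code routine actually combines the eight entries, and the name average_print is hypothetical.

```python
CHANNELS = 64   # only the first 64 channels are stored (from the article)
REPEATS = 8     # the word is spoken eight times (from the article)


def average_print(prints):
    """Combine several 64-channel voice prints into one averaged print."""
    if not prints:
        raise ValueError("at least one voice print is required")
    if any(len(p) != CHANNELS for p in prints):
        raise ValueError("every voice print must have 64 channels")
    # Channel-by-channel mean; integer division is an assumption of this sketch.
    return [sum(column) // len(prints) for column in zip(*prints)]


# Eight made-up repetitions whose counts drift from 0 to 7 in every channel.
repetitions = [[i] * CHANNELS for i in range(REPEATS)]
averaged = average_print(repetitions)
print(averaged[0], len(averaged))   # 3 64
```

Averaging smooths out the variation between repetitions, which is also why a print made in one room or by one speaker may match poorly elsewhere.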

Recognition

Immediately after you input 3 for the recognition command, the program waits for voice input. This is indicated by a blank screen. Upon sensing an input, voice prints are compared and the string corresponding to the “best” matching voice print is displayed. This display appears for a period of time determined by the PAUSE statement in line 625. After this PAUSE another voice (sound) input is awaited.

To exit the recognition routine, press any key (except BREAK). The program will return to the main menu.
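
The comparison step can be pictured as a nearest-match search. The Python sketch below (for illustration only) scores each stored print by the sum of absolute channel differences and returns the word with the lowest score. That measure is an assumption; the article does not say how the machine code routine decides which print is “best”. The names distance and recognize are hypothetical, and four-channel prints are used in the example only to keep it short.

```python
def distance(a, b):
    """Sum of absolute channel differences (an assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(spoken, file_entries):
    """Return the word whose stored print is closest to the spoken print.

    file_entries is a list of (word, voice_print) pairs; empty file
    positions are simply left out of the list.
    """
    if not file_entries:
        return None
    best_word, _ = min(file_entries,
                       key=lambda entry: distance(spoken, entry[1]))
    return best_word


# Made-up four-channel prints standing in for the stored 64-channel prints.
entries = [("SIX", [9, 4, 0, 0]), ("FOUR", [1, 2, 6, 5])]
print(recognize([8, 5, 1, 0], entries))   # SIX
```

Because a match is always returned when the file is not empty, a search like this reports the closest word even for a sound it was never taught.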

Clear Files

Option 4 clears all entries in both the voice print and string files.

Display String File

Option 5 displays the string file. This provides assistance in locating particular file entries before replacement or reentry.

Stop

Option 6 is self-explanatory.

General Comments

As is to be expected, the more dissimilar sounding the words to be recognized, the more accurate the system is in selecting the correct word. In other words, homonyms are out. This is a problem for a language such as Basic where the commands “for” and “four” and “to” and “two” are frequently encountered. Context becomes important in these cases.

The DIM statement in line 10 serves no purpose other than moving ELINE beyond the area where voice print files are created and stored. This permits a SAVE command to save existing voice print files on tape. To SAVE the voice print files, change line 10 to: 10 DIM C(1412) then RUN the program. Voice print files will now be SAVEd with the program. After LOADing a program which contains voice prints, start with a GOTO 20 command. The RUN command will first clear the variable area.

A Few Last Words

After experimenting for a while, you may wonder how the program works. A complete assembly listing along with a detailed explanation of each machine code routine and a commercially reproduced tape of the speech recognition program are available from G. Russell Electronics.

I hope this article will stimulate you into making the simple amplifier and trying the program. A project like this can open your eyes as well as your computer’s ears.
