Browsing by Subject "Speech synthesis"
Now showing 1 - 6 of 6
Item: A microcomputer-based system for speech recognition and synthesis (Texas Tech University, 1980-08) Yang, Ming-Yuan
Abstract not available.

Item: Effects of repeated listening experiences on the perception of synthetic speech by individuals with mental retardation (Texas Tech University, 1999-04) Lees, Kathryn Carla
This study evaluated the effects of training on the comprehension of specific words and sentences by individuals with intellectual disabilities (n = 18) and a matched control group (n = 10). More specifically, the effects of training on novel versus repeated words and sentences produced by the ETI Eloquence speech synthesizer were studied over three training sessions. Stimulus materials included four word lists of 20 words each and four sentence lists of 20 sentences each. One word list and one sentence list were identified as repeated; the remaining three of each were identified as novel. The synthetic speech used was the ETI Eloquence (1998) adult male voice, Wade. All stimuli were recorded onto a digital compact disc and presented via a Sony Discman. To assess subject responses, 80 cards (8.5 × 11 inches) were developed. Each card contained a black-and-white line drawing of the target word or sentence and three foil pictures, arranged in a grid of two rows by two columns with one drawing in each cell. The subjects were asked to point to the drawing depicting the stimulus item. A pretest was administered to eliminate subjects who could not obtain perfect scores when the experimental stimuli were presented via natural speech. There were a total of three experimental sessions, each separated by at least 24 hours. During each session, subjects were presented with a list of novel and repeated words and sentences. The repeated word and sentence lists were presented in each of the three sessions.
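The four-picture response cards described above amount to a simple randomization routine: pair the target drawing with three foils and shuffle them into a 2 × 2 grid. A minimal sketch follows; the function name, seed, and row-major layout are illustrative assumptions, not details from the thesis.

```python
import random

def make_card(target, foils, seed=0):
    """Build one response card: the target drawing plus three foil
    pictures, shuffled into a 2 x 2 grid (row-major)."""
    if len(foils) != 3:
        raise ValueError("each card pairs the target with exactly three foils")
    rng = random.Random(seed)          # fixed seed for a reproducible sketch
    pictures = [target] + list(foils)
    rng.shuffle(pictures)              # randomize the target's grid position
    return [pictures[0:2], pictures[2:4]]

card = make_card("dog", ["cat", "ball", "cup"])
flat = [p for row in card for p in row]
```

With a different seed per card, the target's position varies across the 80 cards, which avoids subjects learning a fixed response location.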
All experimental group subjects met the following criteria: (a) a diagnosis of mild to moderate mental retardation; (b) reliable pointing skills to serve as an expressive response modality; (c) no uncorrected visual impairment and adequate visual discrimination skills; and (d) the ability to identify all pictures on a picture identification task. Hearing thresholds of 25 dB or better were obtained by all but two subjects, who demonstrated mild hearing loss at 4000 Hz. All control group subjects were required to meet the same criteria as the experimental group subjects except that they were not intellectually disabled, as determined by the TONI-2 (Brown, Sherbenou, & Johnsen, 1990). In the experimental group, a significant main effect was found for stimulus complexity [F (1,136) = 42.37, p < .01] and for listening trials [F (2,136) = 88.12, p < .01]. There was no significant effect for stimulus type (i.e., repeated versus novel) on word identification and sentence comprehension accuracy [F (1,136) = .008, p > .01]. Similarly, in the control group, analysis revealed a significant main effect for stimulus complexity [F (1,72) = 13.03, p < .01] and for listening trials [F (2,72) = 45.94, p < .01]. Additionally, there was a significant two-way interaction between listening trials and complexity [F (1,72) = 3.54, p < .05]. Because there were no empirical studies on the intelligibility of the ETI Eloquence synthesizer (1998) with non-disabled individuals, the present study was designed to gather preliminary data on this synthesizer. Intelligibility of the ETI Eloquence synthesizer was almost identical to that of the high-quality DECtalk synthesizer, which is used widely in research and clinical applications. Control group participants had a mean word identification accuracy score of 92% and a mean sentence comprehension accuracy score of 82% on the first training session.
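The pattern of the reported F tests can be checked against tabulated critical values. A small sketch, assuming approximate alpha = .05 critical values read from a standard F table (the table entries below are illustrative approximations and are not taken from the thesis):

```python
# Approximate F critical values at alpha = .05, keyed by (df1, df2).
# Values interpolated from a standard F table; treat as illustrative.
F_CRIT_05 = {
    (1, 136): 3.91,
    (2, 136): 3.06,
    (1, 72): 3.97,
    (2, 72): 3.12,
}

def significant(f_obtained, df):
    """True when the obtained F exceeds the tabulated .05 critical value."""
    return f_obtained > F_CRIT_05[df]

# Reported effects from the experimental group:
complexity = significant(42.37, (1, 136))  # stimulus complexity
trials     = significant(88.12, (2, 136))  # listening trials
stim_type  = significant(0.008, (1, 136))  # repeated vs. novel
```

The large F values for complexity and trials sit far above any reasonable critical value, while the near-zero F for stimulus type is consistent with the reported null result.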
These results were comparable to results obtained in a large number of previous studies in which the DECtalk synthesizer was used. In conclusion, both experimental and control data revealed that perception of synthetic speech was enhanced as a result of repeated listening experiences. Synthetic speech comprehension was significantly better (p < .05) across groups on the word identification task than on the sentence comprehension task. Repeated stimuli were not significantly (p < .05) more intelligible than novel stimuli across groups, which indicated that both experimental and control group subjects were able to generalize their knowledge of the acoustic-phonetic properties of synthetic speech to novel stimuli.

Item: Perceptual learning of synthetic speech by individuals with severe mental retardation (Texas Tech University, 2002-05) Hester, Kasey Lynne
The purpose of this study was to evaluate the magnitude and type of practice effects in individuals with severe mental retardation as a result of systematic exposure to synthetic speech. This study compared the performance of a group of individuals with severe mental retardation (n = 14) with a matched control group (n = 14) on word identification accuracy and latency tasks. Specifically, the effects of training on novel versus repeated words produced by the DECtalk synthesizer were analyzed. Stimulus materials included 4 lists of 10 words each. These words were selected from a list of the first 50 words used by typically developing preschoolers (Nelson, 1973) and a dictionary of symbol vocabulary used by youth with severe mental retardation (Adamson, Romski, Deffebach, & Sevcik, 1992). One list was designated as repeated and the remaining three as novel. Within each list, 20% of the words were repeated to judge intra-subject reliability. The synthetic speech used was DECtalk Betty (i.e., a simulated adult female voice).
A Microsoft Visual Basic program was developed to present the stimuli and prompts and to record responses. The experimental stimuli were presented using a laptop computer and external speakers placed approximately 12 inches in front of the subject, at 75 dB SPL as determined by a sound level meter. Subjects were instructed that they would hear a series of words and that their job was to touch the picture on the computer screen depicting the stimulus item. A touch screen mounted on the computer screen, in conjunction with the Visual Basic program, automatically recorded responses. The touch screen was calibrated to ignore "miss hits" (i.e., the subject slid his hand across the screen and activated a wrong selection) by using a timed-activation direct selection strategy. The computer screen displayed one target picture (a visual representation of the synthetic word) and three unrelated foils. The position of the pictures within each experiment was randomized to avoid position effects; the order of presentation of the lists was randomized to avoid order effects; and a constant inter-stimulus interval of 10 seconds was maintained during presentation of the words within each list. All subjects had to pass a pretest in order to participate in this study. The pretest was designed to exclude subjects who were unable to obtain 100% correct scores for the experimental stimuli presented via live natural speech. In the absence of perfect scores on the pretest, it would be difficult to determine whether the performance demonstrated by individuals with mental retardation was due to difficulty in processing synthetic speech or to a lack of conceptual knowledge of the stimulus items. The pre-experimental procedures were conducted at least one week prior to the beginning of the experiment. There were a total of 3 experimental sessions, each separated by at least 24 hours.
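The timed-activation strategy described above can be sketched as a dwell-time filter: a touch registers only if the finger stays on one target long enough, so brief contacts made while sliding across the screen are ignored. The function name, dwell threshold, and sampling period below are illustrative assumptions, not the thesis's actual parameters.

```python
def select_target(samples, dwell_ms=500, sample_period_ms=50):
    """Timed-activation direct selection: return the target the user
    dwells on for at least `dwell_ms`, or None if every contact is a
    brief 'miss hit'. `samples` is the touched target sampled every
    `sample_period_ms`, with None meaning no contact."""
    needed = dwell_ms // sample_period_ms   # consecutive samples required
    run_target, run_len = None, 0
    for pos in samples:
        if pos is not None and pos == run_target:
            run_len += 1                    # still dwelling on the same target
        else:
            run_target, run_len = pos, (0 if pos is None else 1)
        if run_target is not None and run_len >= needed:
            return run_target
    return None

# A hand slides across pictures A and B, then rests on C long enough:
stream = ["A", "B", "B"] + ["C"] * 12
```

Sliding contacts reset the dwell counter, so only a deliberate sustained touch produces a selection.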
During each session, subjects were presented with a list of novel words and a list of repeated words. The same repeated word list was presented across all sessions, while a new novel word list was presented in each session. Subjects were instructed that they would hear a series of words preceded by a carrier phrase and that they were to point to the drawing depicting the word. Additionally, they were told to make their best guess if they were uncertain. Immediately prior to each experimental session, practice items were run to ensure that subjects were familiar with the task. The practice items were different from those used in the experimental task. Data were analyzed using a repeated-measures design. The two dependent measures were (1) word identification accuracy and (2) word identification latency. Data for each measure were analyzed using a repeated-measures (2 × 2 × 3) ANOVA in which group served as a between-subjects factor, while type of stimuli and listening session served as within-subjects factors. Analysis revealed a significant main effect for group [F (1, 52) = 7.523, p < .05] on the word identification accuracy task, indicating that individuals with severe mental retardation had significantly lower word identification accuracy scores (mean = 80.95) than the control group (mean = 91.19). A non-significant trend toward improved word identification accuracy across sessions [F (2, 104) = 2.635, p = .0765] was noted. The most interesting finding of this study was the lack of a significant effect [F (1, 52) = 0.199, p > .05] for stimulus type (i.e., repeated vs. novel) across groups on the word identification accuracy task. A significant session-by-group interaction on word identification latency [F (2, 104) = 8.53, p < .01] indicated that individuals with mental retardation processed synthetic speech more quickly as a result of repeated exposure.
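The 2 × 2 × 3 repeated-measures design maps naturally onto a long-format data table with one row per observation. A sketch of that layout follows, assuming group is the between-subjects factor and stimulus type and session are the within-subjects factors; the field names and placeholder scores are illustrative, not the study's data file.

```python
from itertools import product

# One row per observation in the 2 (group) x 2 (stimulus type) x 3 (session)
# design. Subject counts follow the study (n = 14 per group); the accuracy
# field is a placeholder where measured scores would go.
groups = {"experimental": 14, "control": 14}
rows = []
for group, n in groups.items():
    for subj, stim, session in product(range(n), ["repeated", "novel"], [1, 2, 3]):
        rows.append({
            "group": group,
            "subject": f"{group}-{subj}",
            "stimulus": stim,
            "session": session,
            "accuracy": None,   # measured word identification score
        })
```

In this shape each subject contributes six rows (2 stimulus types × 3 sessions), which is the layout most repeated-measures ANOVA routines expect.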
In summary, the current results indicated that perception of synthetic speech in individuals with mental retardation was enhanced (i.e., a significant decrease in latency) as a result of systematic exposure to synthetic speech. Also, the absence of a significant effect for stimulus type indicated that individuals with mental retardation generalized their knowledge of the acoustic-phonetic properties of synthetic speech to novel stimuli. These results were significant because they indicated that individuals with mental retardation became more skilled at recognizing synthetic speech with repeated exposure. This was an important finding in the context of increased use of VOCAs by individuals with significant communicative and cognitive impairments.

Item: Speech data compression (Texas Tech University, 1996-08) Ho, Chien-Te
The analysis-by-synthesis method is the most useful application of the parametric representation. The necessary components of the model are derived from signal analysis procedures, while the output speech waveform is obtained from the synthesis procedure. This method, such as the Codebook Excited Coder (CELP) [1], was first implemented in the time domain. The basic approach is to model the correlation among the speech samples by using a linear time-varying filter. An excitation model can then be obtained by removing the correlation. Since the filter will not ignore the noise, the parametric representation does have problems with noisy speech data. An alternative is to implement the technique in the frequency domain. This leads to a flexible method for lower bit-rate transmission. Furthermore, it provides a suitable way to model the filter in a noisy environment. Methods such as the harmonic vocoder and the Multiband Excitation Coder (MBE) [4] are all frequency-domain techniques. Since the speech data is recovered from the parametric model, the output depends on the model parameters, which may greatly affect the quality of the speech.
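A harmonic vocoder of the kind mentioned above must first recover the fundamental frequency of each voiced frame. A minimal sketch of one classic approach, autocorrelation-based pitch estimation at an 8000 Hz sampling rate, is shown below on a synthetic two-harmonic frame; the function, search bounds, and test signal are illustrative assumptions, not the thesis's algorithm.

```python
import math

FS = 8000  # sampling rate in Hz

def estimate_f0(x, fs=FS, f0_min=130, f0_max=400, max_lag=80):
    """Pick the fundamental as the lag that maximizes the frame's
    autocorrelation within a plausible pitch range."""
    lo, hi = fs // f0_max, fs // f0_min   # candidate pitch lags in samples
    n = len(x) - max_lag                  # fixed window so all lags compare fairly
    def r(lag):
        return sum(x[i] * x[i + lag] for i in range(n))
    best = max(range(lo, hi + 1), key=r)
    return fs / best

# Synthetic "voiced" frame: 200 Hz fundamental plus one harmonic at 400 Hz.
frame = [math.sin(2 * math.pi * 200 * t / FS)
         + 0.5 * math.sin(2 * math.pi * 400 * t / FS)
         for t in range(800)]
```

The autocorrelation peaks at the pitch period (40 samples here), even though the strongest single component need not be the fundamental; real vocoders refine this raw estimate, as the thesis's analysis procedure does.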
The objective of this thesis is to develop efficient algorithms for implementing the harmonic vocoder in the frequency domain. A reliable method is developed to realize the analysis procedure and to recover the correct fundamental elements of the speech signal. An efficient method is proposed to synthesize the output speech signal and to improve speech quality. The techniques of model refinement and enhancement are also described in this thesis. In practice, the analogue speech signal is sampled at 8000 Hz, and this rate is used throughout this research. The research concentrates on methods for speech data compression and speech quality improvement rather than on coding schemes.

Item: Speech recognition system (Texas Tech University, 1996-08) Mehta, Milan G.
Automatic Speech Recognition (ASR) has progressed considerably over the past several decades, but still has not achieved the potential imagined at its very beginning. Almost all existing applications of ASR systems are PC based. This thesis is an attempt to develop a speech recognition system that is independent of any PC support and is small enough to be used in an everyday consumer appliance. This system would recognize isolated utterances from a limited vocabulary, provide speaker independence, require less memory, and be cost-efficient compared to present ASR systems. In this system, speech recognition is performed with the help of algorithms such as Vector Quantization and Zero Crossing. Several features of a Digital Signal Processor (DSP) have been utilized to generate and execute the recognition algorithms. The final system has been implemented on a Texas Instruments TMS320C30 DSP. The system, when implemented using the Vector Quantizer approach, achieved an accuracy of 94% for a vocabulary of 6 words and a recognition time of 6 seconds.
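The zero-crossing feature mentioned above is one of the cheapest speech features to compute, which is why it suits a standalone DSP with little memory. A minimal sketch of the zero-crossing rate follows; the function and the two test tones are illustrative, not the thesis's TMS320C30 implementation.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# A higher-frequency tone crosses zero more often than a lower one,
# which is the property that makes ZCR a crude spectral feature.
fs = 8000
low  = [math.sin(2 * math.pi * 100 * t / fs) for t in range(800)]
high = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]
```

In isolated-word recognition, per-frame ZCR values roughly separate noisy fricative segments (high ZCR) from voiced segments (low ZCR) at negligible computational cost.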
The zero-crossing approach resulted in an accuracy of 89% for the same vocabulary, while the recognition time was 0.8 seconds.

Item: Word identification and sentence comprehension of synthetic speech by individuals with mental retardation (Texas Tech University, 1995-12) Hanners, Jennifer
The purpose of this study was to examine the performance of two text-to-speech systems (DECtalk and Real Voice) with individuals with mental retardation and matched controls. Each subject participated in two experimental sessions designed to measure word recognition, sentence verification accuracy, and sentence response latency. A pretest was administered to exclude subjects who were unable to recognize the words or sentences when presented via natural speech. A total of 40 words was selected for the evaluation of word recognition from a list of words provided by parents of nonspeaking children. Twenty three-word sentences were constructed to measure sentence verification accuracy and latency. The results indicated that both individuals with mental retardation and nondisabled individuals performed significantly better on DECtalk synthetic speech than on Real Voice. Additionally, the performance of individuals with mental retardation was significantly poorer than that of nondisabled individuals on the sentence verification task. Across groups, subjects performed significantly better on the word identification task than on the sentence verification task. A non-significant trend toward greater response latencies was observed for individuals with mental retardation. In summary, the results of this study indicate that individuals with mental retardation have significant difficulty identifying and comprehending synthetic speech. The results of this investigation raise several issues related to the comprehension of synthetic speech by nonspeaking individuals who rely on voice output communication aids to achieve effective and efficient communication.