Browsing by Subject "Speech perception"
Now showing 1 - 16 of 16
Item Audiovisual integration for perception of speech produced by nonnative speakers (2013-08)
Yi, Han-Gyol; Chandrasekaran, Bharath; Smiljanic, Rajka, 1967-

Speech often occurs in challenging listening environments, such as masking noise. Visual cues have been found to enhance speech intelligibility in noise. Although the facilitatory role of audiovisual integration for perception of speech has been established in native speech, it is relatively unclear whether it also holds true for speech produced by nonnative speakers. Native listeners were presented with English sentences produced by native English and native Korean speakers. The sentences were presented in either audio-only or audiovisual conditions. Korean speakers were rated as more accented in the audiovisual than in the audio-only condition. Visual cues enhanced speech intelligibility in noise for native English speech but less so for nonnative speech. Reduced intelligibility of audiovisual nonnative speech was associated with implicit Asian-Foreign association, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for perception of speech produced by nonnative speakers.

Item Cross-language speech perception in context : advantages for recent language learners and variation across language-specific acoustic cues (2016-05)
Blanco, Cynthia Patricia; Smiljanic, Rajka, 1967-; Bannard, Colin; Meier, Richard P; Quinto-Pozos, David; Echols, Catharine H; Chandrasekaran, Bharath

This dissertation explores the relationship between language experience and sensitivity to language-specific segmental cues by comparing cross-language speech perception in monolingual English listeners and Spanish-English bilinguals. The three studies in this project use a novel language categorization task to test language-segment associations in listeners’ first and second languages. Listener sensitivity is compared at two stages of development and across a variety of language backgrounds.
These studies provide a more complete analysis of listeners’ language-specific phonological categories than offered in previous work by using word-length stimuli to evaluate segments in phonological contexts and by testing speech perception in listeners’ first language as well as their second language. The inclusion of bilingual children also allows connections to be drawn between previous work on infants’ perception of segments and the sensitivities of bilingual adults. In three experiments, participants categorized nonce words containing different classes of English- and Spanish-specific sounds as sounding more English-like or Spanish-like; target segments were either a phonemic cue, a cue for which there is no analogous sound in the other language, or a phonetic cue, a cue for which English and Spanish share the category but for which each language varies in its phonetic implementation. The results reveal a largely consistent categorization pattern across target segments. Listeners from all groups succeeded and struggled with the same subsets of language-specific segments. The same pattern of results held in a task where more time was given to make categorization decisions. Interestingly, for some segments, namely the English phonemic cues, the late bilinguals were significantly more accurate than monolingual and early bilingual listeners. There were few differences in the sensitivity of monolinguals and early bilinguals to language-specific cues, suggesting that the early bilinguals’ exposure to Spanish did not fundamentally change their representations of English phonology, but neither did their proficiency in Spanish give them an advantage over monolinguals. The comparison of adult listeners with children indicates that the Spanish-speaking children who grow to be early bilingual adults categorize segments more accurately than monolinguals, a pattern that is neutralized in the adult results.
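Sensitivity in a two-alternative categorization task of this kind is often summarized with the signal-detection index d′. The sketch below is a generic illustration: the hit/false-alarm framing and the example response rates are assumptions for demonstration, not data from this dissertation.

```python
# Sketch: d-prime for a two-alternative categorization task, treating
# "English-like" responses to English-cue items as hits and
# "English-like" responses to Spanish-cue items as false alarms.
# The example rates are invented for illustration.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# A listener who labels 85% of English-cue nonce words "English-like"
# but only 20% of Spanish-cue nonce words "English-like":
print(round(d_prime(0.85, 0.20), 2))  # 1.88
```

Higher d′ means sharper language-specific categorization; chance-level responding (equal hit and false-alarm rates) yields d′ = 0.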
These findings suggest that variation in listener sensitivity to language-specific cues is largely driven by inherent differences in the salience of the segments themselves. Listener language experience modulates the salience of some of these sounds, and these differences in cross-language speech perception may reflect how recently a language was learned and under what circumstances.

Item Effects of repeated listening experiences on the perception of synthetic speech by individuals with mental retardation (Texas Tech University, 1999-04)
Lees, Kathryn Carla

This study evaluated the effects of training on the comprehension of specific words and sentences by individuals with intellectual disabilities (n=18) and a matched control group (n=10). More specifically, the effects of training on novel versus repeated words and sentences produced by the ETI Eloquence speech synthesizer were studied over three training sessions. Stimulus materials included four word lists of 20 words each and four sentence lists of 20 sentences each. One of the word and sentence lists was identified as repeated and the remaining three were identified as novel. The synthetic speech used was the ETI Eloquence (1998) adult male voice Wade. All stimuli were recorded onto a digital compact disc and were presented via a Sony Discman. To assess subject responses, 80 cards (8.5 x 11 inches) were developed. Each card contained a black and white line drawing of the target word or sentence presented and three foil pictures. Each card used a grid of two rows by two columns, with one black and white drawing in each cell. The subjects were asked to point to the drawing depicting the stimulus item. A pretest was administered to eliminate subjects who could not obtain perfect scores when experimental stimuli were presented via natural speech. There were a total of three experimental sessions, each separated by a period of at least 24 hours.
During each session, subjects were presented with a list of novel and repeated words and sentences. The repeated word and sentence lists were presented in each of the three sessions. All experimental group subjects met the following criteria: (a) a diagnosis of mild to moderate mental retardation; (b) reliable pointing skills to serve as an expressive response modality; (c) no uncorrected visual impairment and adequate visual discrimination skills; (d) ability to identify all pictures on a picture identification task. Thresholds of 25 dB or better were obtained by all but two subjects, who demonstrated mild hearing loss at 4000 Hz. All control group subjects were required to meet the same criteria as the experimental group subjects except that they were not intellectually disabled as determined by the TONI-2 (Brown, Sherbenou, & Johnsen, 1990). In the experimental group, a significant main effect for stimulus complexity [F(1,136) = 42.37, p < .01] was found. A significant main effect was also noted for listening trials [F(2,136) = 88.12, p < .01]. There was no significant effect for stimulus type (i.e., repeated versus novel) on word identification and sentence comprehension accuracy [F(1,136) = .008, p > .01]. Similarly, in the control group, analysis revealed a significant main effect for stimulus complexity [F(1,72) = 13.03, p < .01] and for listening trials [F(2,72) = 45.94, p < .01]. Additionally, there was a significant two-way interaction between listening trials and complexity [F(1,72) = 3.54, p < .05]. Because there were no empirical studies on the intelligibility of the ETI-Eloquence synthesizer (1998) with non-disabled individuals, the present study was designed to gather preliminary data on this synthesizer. Intelligibility of the ETI-Eloquence synthesizer was almost identical to that of the high-quality DECtalk synthesizer, which is used widely in research and clinical applications.
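The bracketed F-ratios above can be sanity-checked from the statistic and its degrees of freedom alone. A minimal sketch, assuming scipy is available (the function names are illustrative, not from the study); the effect-size formula is the standard partial eta squared:

```python
# Sketch: recompute p-values and effect sizes from reported F-ratios.
# scipy is an assumed dependency; the numbers come from the abstract.
from scipy.stats import f

def f_p_value(F, df1, df2):
    """Upper-tail p-value of an F-ratio with (df1, df2) degrees of freedom."""
    return f.sf(F, df1, df2)

def partial_eta_squared(F, df1, df2):
    """Effect size recoverable from F alone: (F * df1) / (F * df1 + df2)."""
    return (F * df1) / (F * df1 + df2)

# Experimental group, stimulus complexity: F(1,136) = 42.37, reported p < .01
print(f_p_value(42.37, 1, 136) < 0.01)   # True
# Stimulus type (repeated vs. novel): F(1,136) = .008, reported p > .01
print(f_p_value(0.008, 1, 136) > 0.01)   # True
print(round(partial_eta_squared(42.37, 1, 136), 3))  # 0.238
```

This kind of check only confirms internal consistency of the reported statistics; it cannot recover the underlying data.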
Control group participants had a mean word identification accuracy score of 92% and a mean sentence comprehension accuracy score of 82% on the first training session. These results were comparable to results obtained in a large number of previous studies in which the DECtalk synthesizer was used. In conclusion, both experimental and control data revealed that perception of synthetic speech was enhanced as a result of repeated listening experiences. Synthetic speech comprehension was significantly superior (p < .05) across groups in the word identification task as opposed to the sentence comprehension task. Repeated stimuli were not significantly (p < .05) more intelligible than novel stimuli across groups, which indicated that both experimental and control group subjects were able to generalize their knowledge of the acoustic-phonetic properties of synthetic speech to novel stimuli.

Item Effects of Short-Term Dehydration and Rehydration on Acoustic Measures of Voice (Texas Tech University, 1998-03)
Dane, Rebecca L

The purpose of this study was to determine whether short-term dehydration (i.e., no fluid or food intake for a 10-hour period) and subsequent rehydration (i.e., 4-ounce fluid intake every 17 minutes over a period of 2 hours and 15 minutes) resulted in significant changes in acoustic measures of voice. Employing a within-subject, quasi-time series design, a total of 25 healthy subjects (3 males and 22 females) between the ages of 20 and 30 years participated in the study. Baseline data were established through speech samples obtained on four successive evenings and mornings. Speech samples consisted of phonating the randomly ordered vowels /a/, /i/, /u/, and /o/ within the carrier phrase "Say /hAb_b/ again." During experimental procedures, speech samples were obtained at 17-minute intervals following intake of 4 ounces of water.
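Among the acoustic measures of voice tracked in studies like this one are the cycle-to-cycle perturbation measures jitter and shimmer. The sketch below gives their simple "local" definitions as an illustration only; it is not the analysis software these studies used, and clinical tools apply more elaborate variants.

```python
# Sketch: local jitter and local shimmer.
# Local jitter: mean absolute difference between consecutive glottal
# periods, divided by the mean period. Local shimmer is the analogous
# ratio computed over cycle peak amplitudes.

def local_jitter(periods):
    """periods: consecutive glottal cycle durations (seconds)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """amplitudes: peak amplitude of each glottal cycle."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# A perfectly periodic voice has zero jitter:
print(local_jitter([0.005] * 10))  # 0.0
# Slight cycle-to-cycle variation yields a small positive value:
print(round(local_jitter([0.0050, 0.0051, 0.0049, 0.0050]), 4))  # 0.0267
```

Both measures are dimensionless ratios, usually reported as percentages; higher values indicate a less stable voice source.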
It was hypothesized that the morning pretest samples would exhibit decreased fundamental frequency and greater values for jitter, shimmer, and harmonic-to-noise ratio across vowels due to decreased hydration levels. During post-tests, it was hypothesized that fundamental frequency values would increase and jitter, shimmer, and harmonic-to-noise ratio measures would decrease for all vowels over time as a function of increased hydration. Results indicated a significant (p < 0.01) main effect for vowel type across all acoustic measures; however, a significant (p < 0.01) main effect for time was only noted for fundamental frequency, jitter, and shimmer. Descriptive statistics revealed trends which supported the hypotheses for all vowels and acoustic measures with the exception of harmonic-to-noise ratio. Results of this study contribute to normative data and have implications for voice therapy and care of the professional voice.

Item Environment- and listener-oriented speaking style adaptations across the lifespan (2014-08)
Gilbert, Rachael Celia; Smiljanic, Rajka, 1967-

This dissertation examines how age affects the ability to produce intelligibility-enhancing speaking style adaptations in response to environment-related difficulties (noise-adapted speech) and in response to listeners’ perceptual difficulties (clear speech). Materials consisted of conversational and clear speech sentences produced in quiet and in response to noise by children (11-13 years), young adults (18-29 years), and older adults (60-84 years). Acoustic measures of global, segmental, and voice characteristics were obtained. Young adult listeners participated in word-recognition-in-noise and perceived age tasks. The study also examined relative talker intelligibility as well as the relationship between the acoustic measurements and intelligibility results. Several age-related differences in speaking style adaptation strategies were found.
Children increased mean F0 and F1 more than adults in response to noise, and exhibited greater changes to voice quality when producing clear speech (increased HNR, decreased shimmer). Older adults lengthened pause duration more in clear speech compared to younger talkers. Word recognition in noise results revealed no age-related differences in the intelligibility of conversational speech. Noise-adapted and clear speech modifications increased intelligibility for all talker groups. However, the acoustic changes implemented by children when producing noise-adapted and clear speech were less efficient in enhancing intelligibility compared to those of the young adult talkers. Children were also less intelligible than older adults for speech produced in quiet. Results confirmed that the talkers formed three perceptually distinct age groups. Correlation analyses revealed that relative talker intelligibility was consistent for conversational and clear speech in quiet. However, relative talker intelligibility was found to be more variable with the inclusion of additional speaking style adaptations. Energy in the 1-3 kHz range, speaking rate, and vowel and pause durations all emerged as significant acoustic-phonetic predictors of intelligibility. This is the first study to investigate how clear speech and noise-adapted speech benefits interact with each other across multiple talker groups. The findings enhance our understanding of intelligibility variation across the lifespan and have implications for a number of applied realms, from audiologic rehabilitation to speech synthesis.

Item How auditory discontinuities and linguistic experience affect the perception of speech and non-speech in English- and Spanish-speaking listeners (2005)
Hay, Jessica Sari Fleming; Diehl, Randy L.

Speech perception results from a complex interplay between the operating characteristics of the auditory system (i.e., auditory discontinuities) and linguistic experience.
Research in human infants and animals, and research using tone-onset-time (TOT) stimuli, a type of non-speech analogue of voice-onset-time (VOT) stimuli, has suggested that there is an underlying auditory basis for the perception of stop consonants based on a threshold for detecting temporal onset asynchronies in the vicinity of ±20 ms. Languages, however, differ in their reliance on temporal onset asynchrony-based auditory discontinuities in their [voice] categories. This dissertation sought to examine whether long-term linguistic experience with different [voice] categories (i.e., English or Spanish) affects the perception of non-speech stimuli that are analogous in their acoustic timing characteristics. This research was also designed to investigate the joint effects of linguistic experience and auditory mechanisms on phoneme structure and category learning. Three cross-linguistic studies were designed to look at (1) the production and perception of VOT and the perception of TOT, (2) the effects of stimulus range on the perception of VOT, and (3) the effects of auditory discontinuities on non-speech category learnability. Results indicate that linguistic experience does affect the perception of non-speech stimuli, at least in certain circumstances. Thus, there is some commonality in the processes used to discriminate between non-speech sounds and those used to discriminate between speech sounds. Additionally, auditory discontinuities were found to influence both phoneme structure and category learning. It is suggested that English- and Spanish-speaking listeners use different cues to discriminate their [voice] categories. Results also suggest that there are perceptual asymmetries between the positive and the negative onset asynchrony-based auditory discontinuities.
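The language-specific use of the VOT continuum discussed above can be illustrated with a toy two-category classifier. The boundary values below are rough textbook-style assumptions for illustration (English contrasts short-lag with long-lag stops; Spanish contrasts prevoiced with short-lag stops); they are not the stimuli or thresholds used in this dissertation.

```python
# Toy illustration of language-specific [voice] category boundaries
# along the VOT continuum. Boundary values (in ms) are rough
# assumptions for illustration only.

BOUNDARIES_MS = {
    "English": 20.0,   # short-lag /b d g/ vs. long-lag /p t k/
    "Spanish": -10.0,  # prevoiced /b d g/ vs. short-lag /p t k/
}

def classify_stop(vot_ms, language):
    """Label a stop as voiced or voiceless from its VOT, per language."""
    return "voiced" if vot_ms < BOUNDARIES_MS[language] else "voiceless"

# The same short-lag token (VOT = +10 ms) falls on different sides of
# the two languages' boundaries:
print(classify_stop(10, "English"))  # voiced
print(classify_stop(10, "Spanish"))  # voiceless
```

The point of the sketch is only that identical acoustic input can receive different [voice] labels depending on the listener's language-specific boundary.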
The relationships between auditory discontinuities, linguistic experience, discriminability, phoneme category structure, and learnability are discussed.

Item Informational masking of multi-talker babble in English vowel identification for Spanish-English bilinguals (2016-05)
Estrella, Alexandra; Liu, C. (Chang), Ph.D.; Chandrasekaran, Bharath

Speech perception studies with bilinguals have demonstrated that bilinguals perform comparably to native speakers in quiet listening conditions. However, when listening conditions include different types of noise and different signal-to-noise ratios (SNRs), bilinguals have more difficulty and perform worse than native speakers when tested in their L2. With Spanish-English bilinguals becoming a large part of the U.S. population, the present study investigated their speech perception abilities using English vowels in different quiet and noise conditions. The participants were controlled for their age of acquisition of English in order to determine if the amount of exposure to the language affected their overall performance. In addition, the amount of informational masking was evaluated using comparisons between the babble and temporally modulated noise conditions. Results indicated that the later bilinguals experienced more difficulties throughout the different conditions when compared to the simultaneous and early bilinguals, but significant differences were only noted for a few of the conditions. Additionally, there were no major effects for informational masking.

Item Measuring phonetic convergence : segmental and suprasegmental speech adaptations during native and non-native talker interactions (2013-12)
Rao, Gayatree Nandan; Diehl, Randy L.; Smiljanic, Rajka, 1967-

Phonetic convergence (PC) is speech-specific accommodation characterized by an increase in similarity in a dyad’s speech patterns due to an interaction. Previous research has demonstrated that PC occurs in dyads during various interactive tasks (e.g.
map completion and picture matching) and in cross-linguistic conditions (e.g. dyads who speak the same or different native languages) (Pardo, 2006; Kim et al., 2011). Studies suggest that speakers who are closer in linguistic distance (i.e. share the same native language) are more likely to converge than speakers who are farther apart (i.e. speak different native languages) (Kim et al., 2011). However, interdialectal conditions, where speakers use different national dialects of the same language, have been studied to a far lesser extent (Babel, 2010). Similarly, studies have examined both segmental and suprasegmental features that are susceptible to PC, but rhythm has not been studied extensively (Krivokapic, 2013; Rao et al., 2011). Though initial studies postulated that PC is the result of either automatic or social processes, more current research suggests that a combination of both kinds of processes may better account for PC (Goldinger, 1997; Shepard et al., 2001; Babel, 2009a). The current dissertation uses novel measures such as Interlocutor Similarity and EMS + centroid to implicate global properties of vowels and rhythm, respectively, as acoustic correlates of PC. Moreover, it finds that speakers showed both convergence and divergence in vowels and rhythm as moderated by their language background. Close interactions between native speakers of American English (AE) resulted in convergence, whereas interdialectal interactions (between AE and Indian English speakers) and mixed-language interactions (between native speakers of AE and non-native speakers of AE whose native language is Spanish) resulted in both convergence and divergence.
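Phonetic convergence is often quantified as a shrinking acoustic distance between interlocutors over the course of an interaction. The difference-in-distance sketch below is a generic illustration; the feature values and the Euclidean-distance choice are assumptions, not the Interlocutor Similarity or EMS + centroid measures used in this dissertation.

```python
# Sketch: difference-in-distance (DID) measure of convergence.
# Each talker is represented as a vector of acoustic features (here,
# invented (F1, F2) values for one vowel). Convergence means the
# dyad's distance shrinks from early to late in the interaction.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def did(t1_early, t2_early, t1_late, t2_late):
    """Positive DID = convergence; negative = divergence."""
    return distance(t1_early, t2_early) - distance(t1_late, t2_late)

# Two talkers' (F1, F2) values in Hz, early vs. late in a conversation:
early = ([500.0, 1500.0], [560.0, 1650.0])
late = ([510.0, 1540.0], [545.0, 1610.0])
print(did(early[0], early[1], late[0], late[1]) > 0)  # True -> convergence
```

The same scalar summary can be computed over any feature set (formants, rhythm metrics, duration), which is why convergence and divergence can be assessed dimension by dimension.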
The results from this study may shed light on how speakers attenuate the highly variable nature of speech by adapting speech patterns to aid intelligibility and information sharing (Shepard et al., 2001), and on how this attenuation is moderated by social demands such as identity and cultural distinctiveness.

Item Perception of vowel quality in the F2/F3 plane (2002)
Molis, Michelle Renee; Diehl, Randy L.

Item Perceptual learning of synthetic speech by individuals with severe mental retardation (Texas Tech University, 2002-05)
Hester, Kasey Lynne

The purpose of this study was to evaluate the magnitude and type of practice effects in individuals with severe mental retardation as a result of systematic exposure to synthetic speech. This study compared the performance of a group of individuals with severe mental retardation (n=14) with a matched control group (n=14) on word identification accuracy and latency tasks. Specifically, the effects of training on novel versus repeated words produced by the DECtalk synthesizer were analyzed. Stimulus materials included 4 lists of 10 words each. These words were selected from a list of the first 50 words used by typically developing preschoolers (Nelson, 1973) and a dictionary of symbol vocabulary used by youth with severe mental retardation (Adamson, Romski, Deffebach, & Sevcik, 1992). One list was designated as repeated and the remaining three as novel. Within each list, 20% of the words were repeated to judge intra-subject reliability. The synthetic speech used was DECtalk Betty (i.e., a simulated adult female voice). A Microsoft Visual Basic program was developed to present the stimuli and the prompts, and to record responses. The experimental stimuli were presented using a laptop computer and external speakers that were placed approximately 12 inches in front of the subject. The experimental stimuli were presented at 75 dB SPL as determined by a sound level meter.
Subjects were instructed that they would hear a series of words and that their job was to touch the picture on the computer screen depicting the stimulus item. A touch screen mounted on the computer screen in conjunction with the Visual Basic program automatically recorded responses. The touch screen was calibrated to ignore "miss hits" (i.e., the subject slid his hand across the screen and activated a wrong selection) by using a timed activation direct selection strategy. The computer screen displayed one target picture, a visual representation of the synthetic word, and three unrelated foils. The position of the pictures within each experiment was randomized to avoid position effects; the order of presentation of the lists was randomized to avoid order effects; and a constant inter-stimulus interval of 10 seconds was maintained during presentation of the words within each list. All subjects had to pass a pretest in order to participate in this study. This pretest was designed to exclude subjects who were unable to obtain 100% correct scores for experimental stimuli presented via live natural speech. In the absence of perfect scores on the pretest, it would be difficult to determine whether the performance demonstrated by individuals with mental retardation was due to difficulty in processing synthetic speech or due to a lack of conceptual knowledge of the stimulus items. The pre-experimental procedures were conducted at least one week prior to the beginning of the experiment. There were a total of 3 experimental sessions, each separated by a period of at least 24 hours. During each session, subjects were presented with a list of novel words and a list of repeated words. The same repeated word list was presented across all sessions while a new novel word list was presented in each session. Subjects were instructed that they would hear a series of words preceded by a carrier phrase and that they were to point to the drawing depicting the word.
Additionally, they were told to make their best guess if they were uncertain. Immediately prior to each experimental session, practice items were run to ensure that subjects were familiar with the task. The practice items were different from those used in the experimental task. Data were analyzed using a repeated measures design. The two dependent measures were (1) word identification accuracy and (2) word identification latency. Data for word identification accuracy and latency were analyzed using a repeated measures (2x2x3) ANOVA in which group served as a between-subjects factor while type of task, type of stimuli, and listening sessions served as within-subjects variables. Analysis revealed a significant main effect for group [F(1,52) = 7.523, p < .05] on the word identification accuracy task, indicating that individuals with severe mental retardation had significantly lower word identification accuracy scores (mean = 80.95) than the control group (mean = 91.19). A non-significant trend toward improved word identification accuracy across sessions [F(2,104) = 2.635, p = .0765] was noted. The most interesting finding of this study was the lack of a significant effect [F(1,52) = 0.199, p > .05] for stimulus type (i.e., repeated vs. novel) across groups on the word identification accuracy task. A significant interaction between listening session and group on word identification latency [F(2,104) = 8.53, p < .01] indicated that individuals with mental retardation were processing synthetic speech more quickly as a result of repeated exposure. In summary, current results indicated that perception of synthetic speech in individuals with mental retardation was enhanced (i.e., a significant decrease in latency) as a result of systematic exposure to synthetic speech.
Also, the absence of a significant effect for stimulus type indicated that individuals with mental retardation generalized their knowledge of the acoustic-phonetic properties of synthetic speech to novel stimuli. These results were significant because they indicated that individuals with mental retardation became more skilled at recognizing synthetic speech with repeated exposure. This was an important finding in the context of increased use of voice output communication aids (VOCAs) by individuals with significant communicative and cognitive impairments.

Item Phonetic training for learners of Arabic (2013-08)
Burnham, Kevin Robert; Al-Batal, Mahmoud

This dissertation assesses a new technique intended to improve Arabic learning outcomes by enhancing the ability of learners to perceive a phoneme contrast in Arabic that is notoriously difficult for native speakers of English. Adopting a process approach to foreign language listening comprehension pedagogy, we identify and isolate an important listening subskill, phonemic identification, and develop a methodology for improving that skill. An online training system is implemented that is based upon known principles of speech perception and second language speech learning and has previously been used to improve phonemic perception in a laboratory setting. An empirical study investigating the efficacy of the training methodology was conducted with 24 second- and third-year students of Arabic in several different intensive Arabic programs in American universities. The contrast under investigation was the Arabic pharyngeal (/ħ/) versus laryngeal (/h/) voiceless fricatives. Training participants completed 100 training modules, each consisting of a 24-item minimal pair test featuring the /ħ/-/h/ contrast in word-initial position, for a total of 2400 training trials over 4 weeks. The training website design was based on the high variability training protocol (Logan, Lively & Pisoni, 1991).
The experiment finds significantly greater improvement (F(1,22) = 8.89, p = .007, ηp² = .288) on a minimal pair test contrasting /ħ/ and /h/ for a group that received approximately 5 hours of phonetic training (n=10) compared to a control group (n=14) with no training. Critically, these perceptual improvements were measured with stimuli that were not part of the training set, suggesting language learning and not just stimulus learning. Qualitative data from participants suggested that these perceptual gains were not restricted to the simple minimal pair task, but carried over to listening activities and perhaps even pronunciation. The dissertation concludes with a discussion of phonemic perception and foreign language instruction and the implementation of phonetic training within an Arabic curriculum.

Item Recognition memory in noise for speech of varying intelligibility (2013-05)
Gilbert, Rachael Celia; Smiljanic, Rajka, 1967-

This study investigated the extent to which noise impacts speech processing of sentences that vary in intelligibility for normal-hearing young adults. Intelligibility and recognition memory in noise were examined for conversational and clear speech sentences recorded in quiet (QS) and in response to environmental noise, i.e., noise-adapted speech (NAS). Results showed that 1) increased intelligibility through conversational-to-clear speech modifications led to improved recognition memory and 2) NAS presented a more naturalistic speech adaptation to noise compared to QS, leading to more accurate word recognition and better sentence recall. These results demonstrate that acoustic-phonetic modifications implemented in listener-oriented speech enhance speech processing beyond word recognition.
The results are in line with the effortfulness hypothesis (McCoy et al., 2005), which states that speech perception in challenging listening environments requires additional processing resources that might otherwise be available for encoding speech in memory. This resource reallocation may be offset by speaking style adaptations on the part of the talker. In addition to enhanced intelligibility, a substantial improvement in recognition memory can be achieved through speaker adaptations to the environment and to the listener when in adverse conditions.

Item Speech recognition in individuals with dysarthria (Texas Tech University, 2000-05)
Acrey, Adrienne M.

The purpose of this study was to compare the effects of speech training on the recognition accuracy of a speech recognition system (i.e., DragonDictate) for three speakers with moderate dysarthria and three typical speakers. A pretest was administered to measure speech intelligibility and mental state. Each subject participated in training sessions with the computer that involved the repetition of 70 stimulus items. Stimulus items were selected from a word list which contained acoustic-phonetic contrasts. The results indicated superior recognition accuracy scores for typical speakers in contrast to speakers with dysarthria. Additionally, speakers with dysarthria required more sessions to achieve ceiling on recognition scores in comparison to the typical speakers. In summary, the speakers with dysarthria were able to obtain high recognition accuracy scores after training the system.

Item The relationship between hypernasality and vowel formants for English and Spanish speaking subjects (Texas Tech University, 1977-08)
Bell, Raylene

Not available

Item The relationship of speech discrimination to loudness discomfort level (Texas Tech University, 1981-12)
Wei, Hwe-wen

Not available