Speech Analysis and Single Channel Enhancement to Improve Speech Intelligibility for Cochlear Implant Recipients

Date

2017-05

Abstract

Cochlear implant (CI) devices help deaf individuals recover hearing through electrode arrays that are surgically inserted into the inner ear to stimulate the auditory nerve, which carries sound information to the auditory cortex in the brain. CI listeners achieve high speech intelligibility in quiet environments; however, their speech perception degrades dramatically in noisy backgrounds. This is especially true in fluctuating noise, such as competing-speaker or babble noise, where CI users have difficulty understanding speech. One reason is that the low spectral resolution provided by CI encoding strategies is insufficient to distinguish speech components from noise. In this dissertation, we propose a new speech enhancement solution to improve speech intelligibility for CI recipients in noise. Speech energy is carried primarily in the harmonic structure, located at integer multiples of the fundamental frequency. To combat noise, we propose to use harmonic structure as a frequency-domain cue for estimating the noise. The proposed speech enhancement combines harmonic structure estimation with a traditional statistical-model-based solution. This dissertation investigates robust fundamental frequency estimation in noise and integrates the resulting estimates into harmonic-based speech enhancement for both stationary and non-stationary noise scenarios.
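As a rough illustration of what "integer multiples of the fundamental frequency" means in a discrete spectrum, the sketch below maps a given pitch to the nearest FFT bins of its harmonics. The function name `harmonic_bins` and its parameters are hypothetical and are not taken from the dissertation; the sketch assumes the fundamental frequency is already known.

```python
def harmonic_bins(f0_hz, sample_rate_hz, n_fft, max_freq_hz=None):
    """Return the FFT bin indices nearest to integer multiples of f0_hz.

    Illustrative helper only (hypothetical name and signature).
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    nyquist = sample_rate_hz / 2.0
    # Harmonics above the Nyquist frequency cannot be represented.
    limit = nyquist if max_freq_hz is None else min(max_freq_hz, nyquist)
    bin_width = sample_rate_hz / n_fft  # spacing between FFT bins in Hz
    bins = []
    k = 1
    while k * f0_hz <= limit:
        # k-th harmonic sits at k * f0; pick the closest bin.
        bins.append(int(round(k * f0_hz / bin_width)))
        k += 1
    return bins


# Example: 100 Hz pitch, 16 kHz sampling, 512-point FFT, harmonics up to 500 Hz.
print(harmonic_bins(100.0, 16000, 512, max_freq_hz=500))
```

Bins that fall between the returned indices are the places where, under this simplified view, energy is more likely to come from noise than from voiced speech.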

Noise-robust pitch estimation is proposed based on temporal harmonic structure in local time-frequency (TF) segments. To reduce the effect of noise, we exploit the sparsity of the speech signal: only TF segments with a high signal-to-noise ratio (SNR) are used for pitch estimation. Robust harmonic features are investigated for pitch estimation based on neural network classification. The harmonic features map the pitch candidates into a more separable space for classification. Experimental results show that the proposed pitch estimation method reduces global pitch error in noise. Next, harmonic structure estimation is combined with a traditional statistical-model-based method for speech enhancement. Noise estimation is performed based on harmonic structures. The estimated noise variance is employed in a traditional MMSE framework for a priori and a posteriori SNR estimation to obtain a gain function for the target speech. Listening experiments with CI subjects demonstrated improved speech intelligibility in both stationary and non-stationary noise.
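The step from an estimated noise variance to a gain function can be sketched for a single time-frequency bin as follows. This is a minimal sketch under stated assumptions, not the dissertation's method: it uses the well-known decision-directed rule for the a priori SNR and a Wiener gain as a simple stand-in for the MMSE gain, which the abstract does not specify. The function name `dd_wiener_gain`, the smoothing constant `alpha`, and the floor `snr_floor` are hypothetical.

```python
def dd_wiener_gain(noisy_power, noise_var, prev_clean_power,
                   alpha=0.98, snr_floor=1e-3):
    """Gain for one time-frequency bin from an estimated noise variance.

    Assumptions (not from the source): decision-directed a priori SNR
    and a Wiener gain in place of the unspecified MMSE gain.
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    # A posteriori SNR: observed noisy power relative to the noise variance.
    gamma = noisy_power / noise_var
    # A priori SNR: weighted mix of the previous frame's clean-speech
    # estimate and the current instantaneous SNR estimate (gamma - 1).
    xi = (alpha * prev_clean_power / noise_var
          + (1.0 - alpha) * max(gamma - 1.0, 0.0))
    xi = max(xi, snr_floor)  # floor avoids a gain of exactly zero
    gain = xi / (1.0 + xi)   # Wiener gain, between 0 and 1
    return gain, xi, gamma


gain, xi, gamma = dd_wiener_gain(4.0, 1.0, 3.0, alpha=0.5)
print(gain, xi, gamma)
```

The enhanced spectral amplitude for the bin would then be the noisy amplitude multiplied by `gain`; bins dominated by noise receive a gain close to zero, while bins with a high a priori SNR pass nearly unchanged.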
