New insights into the noise reduction Wiener filter



1218 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 4, JULY 2006

New Insights Into the Noise Reduction Wiener Filter

Jingdong Chen, Member, IEEE, Jacob Benesty, Senior Member, IEEE, Yiteng (Arden) Huang, Member, IEEE, and Simon Doclo, Member, IEEE

Abstract: The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. When multiple microphone sensors are available, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

Index Terms: Microphone arrays, noise reduction, speech distortion, Wiener filter.

I. INTRODUCTION

Since we are living in a natural environment where noise is inevitable and ubiquitous, speech signals are generally immersed in acoustic ambient noise and can seldom be recorded in pure form. Therefore, it is essential for speech processing and communication systems to apply effective noise reduction/speech enhancement techniques in order to extract the desired speech signal from its corrupted observations.

Noise reduction techniques have a broad range of applications, from hearing aids to cellular phones, voice-controlled systems, multiparty teleconferencing, and automatic speech recognition (ASR) systems. The choice between using and not using a noise reduction technique may have a significant impact on the functioning of these systems. In multiparty conferencing, for example, the background noise picked up by the microphone at each point of the conference combines additively at the network bridge with the noise signals from all other points. The loudspeaker at each location of the conference therefore reproduces the combined sum of the noise processes from all other locations. Clearly, this problem can be extremely serious if the number of conferees is large, and without noise reduction, communication is almost impossible in this context.

Manuscript received December 20, 2004; revised September 2, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Li Deng.

J. Chen and Y. Huang are with Bell Labs, Lucent Technologies, Murray Hill, NJ 07974 USA (e-mail: jingdong@934be84d9b6648d7c1c746c1; arden@934be84d9b6648d7c1c746c1).

J. Benesty is with the Université du Québec, INRS-EMT, Montréal, QC, H5A 1K6, Canada (e-mail: benesty@emt.inrs.ca).

S. Doclo is with the Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Leuven 3001, Belgium (e-mail: simon.doclo@esat.kuleuven.be).

Digital Object Identifier 10.1109/TSA.2005.860851

Noise reduction is a very challenging and complex problem for several reasons. First of all, the nature and the characteristics of the noise signal change significantly from application to application, and moreover vary in time. It is therefore very difficult, if not impossible, to develop a versatile algorithm that works in diversified environments. Secondly, the objective of a noise reduction system is heavily dependent on the specific context and application. In some scenarios, for example, we want to increase the intelligibility or improve the overall speech perception quality, while in other scenarios, we expect to ameliorate the accuracy of an ASR system, or simply reduce the listeners' fatigue. It is very hard to satisfy all objectives at the same time. In addition, the complex characteristics of speech and the broad spectrum of constraints make the problem even more complicated.

Research on noise reduction/speech enhancement can be traced back 40 years to two patents by Schroeder [1], [2], where an analog implementation of the spectral magnitude subtraction method was described. Since then it has become an area of active research. Over the past several decades, researchers and engineers have approached this challenging problem by exploiting different facets of the properties of the speech and noise signals. Some good reviews of such efforts can be found in [3]–[7]. Principally, the solutions to the problem can be classified from the following points of view.

• The number of channels available for enhancement; i.e., single-channel and multichannel techniques.

• How the noise is mixed to the speech; i.e., additive noise, multiplicative noise, and convolutional noise.

• Statistical relationship between the noise and speech; i.e., uncorrelated or even independent noise, and correlated noise (such as echo and reverberation).

• How the processing is carried out; i.e., in the time domain or in the frequency domain.

1558-7916/$20.00 © 2006 IEEE

CHEN et al.: NEW INSIGHTS INTO THE NOISE REDUCTION WIENER FILTER 1219

In general, the more microphones are available, the easier the task of noise reduction. For example, when multiple realizations of the signal can be accessed, beamforming, source separation, or spatio-temporal filtering techniques can be applied to extract the desired speech signal or to attenuate the unwanted noise [8]–[13]. If we have two microphones, where the first microphone picks up the noisy signal, and the second microphone is able to measure the noise field, we can use the second microphone signal as a noise reference and eliminate the noise in the first microphone by means of adaptive noise cancellation. However, in most situations, such as mobile communications, only one microphone is available. In this case, noise reduction techniques need to rely on assumptions about the speech and noise signals, or need to exploit aspects of speech perception, speech production, or a speech model. A common assumption is that the noise is additive and slowly varying, so that the noise characteristics estimated in the absence of speech can be used subsequently in the presence of speech. If in reality this premise does not hold, or only partially holds, the system will either have less noise reduction, or introduce more speech distortion.

Even with the limitations outlined above, single-channel noise reduction has attracted a tremendous amount of research attention because of its wide range of applications and relatively low cost. A variety of approaches have been developed, including Wiener filter [3], [14]–[19], spectral or cepstral restoration [17], [20]–[27], signal subspace [28]–[35], parametric-model-based method [36]–[38], and statistical-model-based method [5], [39]–[46].

Most of these algorithms were developed independently of each other and generally their noise reduction performance was evaluated by assessing the improvement of signal-to-noise ratio (SNR), subjective speech quality, or ASR performance (when the ASR system is trained in clean conditions and additive noise is the only distortion source). Almost without exception, these algorithms achieve noise reduction by introducing some distortion to the speech signal. Some algorithms, such as the subspace method, are even explicitly formulated based on the tradeoff between noise reduction and speech distortion. However, so far, few efforts have been devoted to analyzing such a tradeoff behavior even though it is a very important issue. In this paper, we attempt to provide an analysis of the compromise between noise reduction and speech distortion. On one hand, such a study may offer us some insight into the range of existing algorithms that can be employed in practical noisy environments. On the other hand, a good understanding may help us to find new algorithms that can work more effectively than the existing ones. Since there are so many algorithms in the literature, it is extremely difficult, if not impossible, to find a universal analytical tool that can be applied to any algorithm. In this paper, we choose the Wiener filter as the basis since it is one of the most fundamental approaches, and many algorithms are closely connected to this technique. For example, the minimum-mean-square-error (MMSE) estimator presented in [21], which belongs to the category of spectral restoration, converges to the Wiener filter at a high SNR. In addition, it is widely known that the Kalman filter is tightly related to the Wiener filter.

Starting from optimal Wiener filtering theory, we introduce a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated. We then show that for the single-channel Wiener filter, the amount of noise reduction is in general proportional to the amount of speech degradation, implying that when the noise reduction is maximized, the speech distortion is maximized as well.

Depending on the nature of the application, some practical noise-reduction systems require very high-quality speech, but can tolerate a certain amount of residual noise, whereas other systems require the speech signal to be as clean as possible, but may allow some degree of speech distortion. Therefore, it is necessary that we have some management scheme to control the compromise between noise reduction and speech distortion in the context of Wiener filtering. To this end, we discuss three approaches. The first approach leads to a suboptimal filter where a parameter is introduced to control the tradeoff between speech distortion and noise reduction. The second approach leads to the well-known parametric-model-based noise reduction technique, where an AR model is exploited to achieve noise reduction, while maintaining a low level of speech distortion. The third approach pertains to a multichannel approach where spatio-temporal filtering techniques are employed to obtain noise reduction with less or even no speech distortion.

II. ESTIMATION OF THE CLEAN SPEECH SAMPLES

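We consider a zero-mean clean speech signal $x(n)$ contaminated by a zero-mean noise process $v(n)$ [white or colored but uncorrelated with $x(n)$], so that the noisy speech signal at the discrete time sample $n$ is

$$y(n) = x(n) + v(n). \tag{1}$$

Define the error signal between the clean speech sample at time $n$ and its estimate

$$e_x(n) = x(n) - \mathbf{h}^T \mathbf{y}(n) \tag{2}$$

where superscript $T$ denotes transpose of a vector or a matrix, $\mathbf{h} = [h_0\ h_1\ \cdots\ h_{L-1}]^T$ is an FIR filter of length $L$, and $\mathbf{y}(n) = [y(n)\ y(n-1)\ \cdots\ y(n-L+1)]^T$ is a vector containing the $L$ most recent samples of the observation signal $y(n)$.

We now can write the mean-square error (MSE) criterion

$$J_x(\mathbf{h}) = E\{e_x^2(n)\} \tag{3}$$

where $E\{\cdot\}$ denotes mathematical expectation. The optimal estimate $\hat{x}_o(n)$ of the clean speech sample $x(n)$ tends to contain less noise than the observation sample $y(n)$, and the optimal filter that forms $\hat{x}_o(n)$ is the Wiener filter, which is obtained as follows:

$$\mathbf{h}_o = \arg\min_{\mathbf{h}} J_x(\mathbf{h}). \tag{4}$$

Consider the particular filter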

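$$\mathbf{u}_1 = [1\ 0\ \cdots\ 0]^T.$$

This means that the observed signal $y(n)$ will pass this filter unaltered (no noise reduction), thus the corresponding MSE is

$$J_x(\mathbf{u}_1) = E\{v^2(n)\} = \sigma_v^2. \tag{5}$$

In principle, for the optimal filter $\mathbf{h}_o$, we should have

$$J_x(\mathbf{h}_o) < J_x(\mathbf{u}_1). \tag{6}$$

In other words, the Wiener filter will be able to reduce the level of noise in the noisy speech signal $y(n)$.

From (4), we easily find the Wiener–Hopf equation

$$\mathbf{R}_y \mathbf{h}_o = \mathbf{p} \tag{7}$$

where

$$\mathbf{R}_y = E\{\mathbf{y}(n)\mathbf{y}^T(n)\} \tag{8}$$

is the correlation matrix of the observed signal $y(n)$ and

$$\mathbf{p} = E\{\mathbf{y}(n)x(n)\} \tag{9}$$

is the cross-correlation vector between the noisy and clean speech signals. However, $x(n)$ is unobservable; as a result, an estimation of $\mathbf{p}$ may seem difficult to obtain. But

$$\mathbf{p} = E\{\mathbf{y}(n)[y(n) - v(n)]\} = E\{\mathbf{y}(n)y(n)\} - E\{\mathbf{v}(n)v(n)\} = \mathbf{r}_y - \mathbf{r}_v. \tag{10}$$

Now $\mathbf{p}$ depends on the correlation vectors $\mathbf{r}_y$ and $\mathbf{r}_v$. The vector $\mathbf{r}_y$ (which is also the first column of $\mathbf{R}_y$) can be easily estimated during speech-and-noise periods, while $\mathbf{r}_v$ can be estimated during noise-only intervals, assuming that the statistics of the noise do not change much with time.

Using (10) and the fact that $\mathbf{h}_o = \mathbf{R}_y^{-1}\mathbf{p}$, we obtain the optimal filter

$$\mathbf{h}_o = \left[\mathbf{I} - \mathbf{R}_y^{-1}\mathbf{R}_v\right]\mathbf{u}_1 = \left[\frac{\mathbf{I}}{\mathrm{SNR}} + \tilde{\mathbf{R}}_v^{-1}\tilde{\mathbf{R}}_x\right]^{-1}\tilde{\mathbf{R}}_v^{-1}\tilde{\mathbf{R}}_x\,\mathbf{u}_1 \tag{11}$$

where

$$\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_v^2} \tag{12}$$

is the signal-to-noise ratio, $\mathbf{I}$ is the identity matrix, and $\tilde{\mathbf{R}}_x = \mathbf{R}_x/\sigma_x^2$, $\tilde{\mathbf{R}}_v = \mathbf{R}_v/\sigma_v^2$ are the normalized correlation matrices of the clean speech and the noise, with $\mathbf{R}_x = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}$, $\mathbf{R}_v = E\{\mathbf{v}(n)\mathbf{v}^T(n)\}$, and $\mathbf{R}_y = \mathbf{R}_x + \mathbf{R}_v$.

We have

$$\lim_{\mathrm{SNR}\to\infty} \mathbf{h}_o = \mathbf{u}_1 \tag{13}$$

$$\lim_{\mathrm{SNR}\to 0} \mathbf{h}_o = \mathbf{0} \tag{14}$$

where $\mathbf{0}$ has the same size as $\mathbf{h}_o$ and consists of all zeros. The minimum MSE (MMSE) is

$$J_x(\mathbf{h}_o) = \sigma_v^2 - \mathbf{r}_v^T\mathbf{R}_y^{-1}\mathbf{r}_v. \tag{15}$$

We see clearly from the previous expression that $J_x(\mathbf{h}_o) < J_x(\mathbf{u}_1)$; therefore, noise reduction is possible. The normalized MMSE is

$$\tilde{J}_x(\mathbf{h}_o) = \frac{J_x(\mathbf{h}_o)}{J_x(\mathbf{u}_1)} = \frac{J_x(\mathbf{h}_o)}{\sigma_v^2} \tag{16}$$

and $0 < \tilde{J}_x(\mathbf{h}_o) < 1$.

III. ESTIMATION OF THE NOISE SAMPLES

In this section, we will estimate the noise samples from the observations $y(n)$. Define the error signal between the noise sample at time $n$ and its estimate

$$e_v(n) = v(n) - \mathbf{g}^T\mathbf{y}(n) \tag{17}$$

where $\mathbf{g} = [g_0\ g_1\ \cdots\ g_{L-1}]^T$ is an FIR filter of length $L$. The MSE criterion associated with (17) is

$$J_v(\mathbf{g}) = E\{e_v^2(n)\}. \tag{18}$$

The estimation of $v(n)$ in the MMSE sense will tend to attenuate the clean speech. The minimization of (18) leads to the Wiener–Hopf equation

$$\mathbf{R}_y\mathbf{g}_o = \mathbf{r}_v. \tag{19}$$

We have

$$\lim_{\mathrm{SNR}\to\infty} \mathbf{g}_o = \mathbf{0} \tag{20}$$

$$\lim_{\mathrm{SNR}\to 0} \mathbf{g}_o = \mathbf{u}_1. \tag{21}$$

The MSE for the particular filter $\mathbf{u}_1$ (no clean speech reduction) is

$$J_v(\mathbf{u}_1) = E\{x^2(n)\} = \sigma_x^2. \tag{22}$$

Therefore, the MMSE and the normalized MMSE are, respectively,

$$J_v(\mathbf{g}_o) = \sigma_v^2 - \mathbf{r}_v^T\mathbf{R}_y^{-1}\mathbf{r}_v \tag{23}$$

$$\tilde{J}_v(\mathbf{g}_o) = \frac{J_v(\mathbf{g}_o)}{J_v(\mathbf{u}_1)} = \frac{J_v(\mathbf{g}_o)}{\sigma_x^2}. \tag{24}$$

Since $J_v(\mathbf{g}_o) < J_v(\mathbf{u}_1)$, the Wiener filter will be able to reduce the level of the clean speech in the signal $y(n)$. As a result, $0 < \tilde{J}_v(\mathbf{g}_o) < 1$. In Section IV, we will see that while the normalized MMSE, $\tilde{J}_x(\mathbf{h}_o)$, of the clean speech estimation plays a key role in noise reduction, the normalized MMSE, $\tilde{J}_v(\mathbf{g}_o)$, of the noise process estimation plays a key role in speech distortion.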
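The estimation steps of Sections II and III can be illustrated numerically. What follows is a minimal sketch under stated assumptions, not the authors' implementation: it uses model correlation matrices rather than sample estimates, an AR(1)-like speech correlation, and white noise. The helper names `ar1_corr` and `wiener_filters` are hypothetical.

```python
import numpy as np

def ar1_corr(L, power, rho):
    """L x L Toeplitz correlation matrix power * rho**|i-j| (illustrative model, an assumption)."""
    k = np.arange(L)
    return power * rho ** np.abs(k[:, None] - k[None, :])

def wiener_filters(R_y, r_v):
    """Solve the two Wiener-Hopf systems R_y h = r_y - r_v and R_y g = r_v."""
    r_y = R_y[:, 0]                          # first column of R_y
    h_o = np.linalg.solve(R_y, r_y - r_v)    # clean-speech estimator
    g_o = np.linalg.solve(R_y, r_v)          # noise estimator
    return h_o, g_o

L = 8
R_x = ar1_corr(L, power=1.0, rho=0.9)        # assumed speech correlation
R_v = 0.5 * np.eye(L)                        # assumed white noise, variance 0.5
R_y = R_x + R_v                              # speech and noise uncorrelated
r_v = R_v[:, 0]                              # in practice: estimated in noise-only intervals

h_o, g_o = wiener_filters(R_y, r_v)
sigma_v2 = R_v[0, 0]
mmse = sigma_v2 - r_v @ g_o                  # sigma_v^2 - r_v^T R_y^{-1} r_v
print("h_o =", np.round(h_o, 3))
print("normalized MMSE of the speech estimate =", mmse / sigma_v2)
```

In a real system `R_y` and `r_v` would be replaced by time averages over noisy-speech and noise-only segments; the sketch only shows that the filter is obtained without access to the clean speech.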

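IV. IMPORTANT RELATIONSHIPS BETWEEN NOISE REDUCTION AND SPEECH DISTORTION

Obviously, there are some important relationships between the estimation of the clean speech and noise samples. From (11) and (19), we get a relation between the two optimal filters

$$\mathbf{h}_o = \mathbf{u}_1 - \mathbf{g}_o. \tag{25}$$

In fact, minimizing $J_x(\mathbf{h})$ or $J_v(\mathbf{u}_1 - \mathbf{h})$ with respect to $\mathbf{h}$ is equivalent. In the same manner, minimizing $J_v(\mathbf{g})$ or $J_x(\mathbf{u}_1 - \mathbf{g})$ with respect to $\mathbf{g}$ is the same thing. At the optimum, we have

$$e_{x,o}(n) = x(n) - \mathbf{h}_o^T\mathbf{y}(n) = -\left[v(n) - \mathbf{g}_o^T\mathbf{y}(n)\right] = -e_{v,o}(n). \tag{26}$$

From (15) and (23), we see that the two MMSEs are equal

$$J_x(\mathbf{h}_o) = J_v(\mathbf{g}_o). \tag{27}$$

However, the normalized MMSEs are not, in general. Indeed, we have a relation between the two

$$\tilde{J}_x(\mathbf{h}_o) = \mathrm{SNR}\cdot\tilde{J}_v(\mathbf{g}_o). \tag{28}$$

So the only situation where the two normalized MMSEs are equal is when the SNR is equal to 1. For $\mathrm{SNR} < 1$, $\tilde{J}_x(\mathbf{h}_o) < \tilde{J}_v(\mathbf{g}_o)$, and for $\mathrm{SNR} > 1$, $\tilde{J}_v(\mathbf{g}_o) < \tilde{J}_x(\mathbf{h}_o)$. Also, $\tilde{J}_x(\mathbf{h}_o) < \mathrm{SNR}$ and $\tilde{J}_v(\mathbf{g}_o) < 1/\mathrm{SNR}$. It can easily be verified that

[Equation (29) and the inequalities stated after it are missing from the source.]

The optimal estimation of the clean speech, in the Wiener sense, is in fact what we call noise reduction

$$\hat{x}_o(n) = \mathbf{h}_o^T\mathbf{y}(n) \tag{30}$$

or equivalently, if the noise is estimated first

$$\hat{v}_o(n) = \mathbf{g}_o^T\mathbf{y}(n) \tag{31}$$

we can use this estimate to reduce the noise from the observed signal

$$\hat{x}_o(n) = y(n) - \hat{v}_o(n). \tag{32}$$

The power of the estimated clean speech signal with the optimal Wiener filter is

$$E\{\hat{x}_o^2(n)\} = \mathbf{h}_o^T\mathbf{R}_y\mathbf{h}_o = \mathbf{h}_o^T\mathbf{R}_x\mathbf{h}_o + \mathbf{h}_o^T\mathbf{R}_v\mathbf{h}_o \tag{33}$$

which is the sum of two terms. The first one is the power of the attenuated clean speech and the second one is the power of the residual noise (always greater than zero). While noise reduction is feasible with the Wiener filter, expression (33) shows that the price to pay for this is also a reduction of the clean speech [by a quantity equal to $\sigma_x^2 - \mathbf{h}_o^T\mathbf{R}_x\mathbf{h}_o$, and this implies distortion], since $\mathbf{h}_o^T\mathbf{R}_x\mathbf{h}_o < \sigma_x^2$. In other words, the power of the attenuated clean speech signal is always smaller than the power of the clean speech itself; this means that parts of the clean speech are attenuated in the process and, as a result, distortion is unavoidable with this approach.

We now define the speech-distortion index due to the optimal filtering operation as

$$\upsilon_{sd}(\mathbf{h}_o) = \frac{E\{[x(n) - \mathbf{h}_o^T\mathbf{x}(n)]^2\}}{\sigma_x^2} = \frac{(\mathbf{u}_1 - \mathbf{h}_o)^T\mathbf{R}_x(\mathbf{u}_1 - \mathbf{h}_o)}{\sigma_x^2} \tag{34}$$

where $\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-L+1)]^T$. Clearly, this index is always between 0 and 1 for the optimal filter. Also

$$\lim_{\mathrm{SNR}\to\infty}\upsilon_{sd}(\mathbf{h}_o) = 0 \tag{35}$$

$$\lim_{\mathrm{SNR}\to 0}\upsilon_{sd}(\mathbf{h}_o) = 1. \tag{36}$$

So when $\upsilon_{sd}(\mathbf{h}_o)$ is close to 1, the speech signal is highly distorted, and when $\upsilon_{sd}(\mathbf{h}_o)$ is near 0, the speech signal is only slightly distorted. We deduce that for low SNRs, the Wiener filter can have a disastrous effect on the speech signal.

Similarly, we define the noise-reduction factor due to the Wiener filter as

$$\xi_{nr}(\mathbf{h}_o) = \frac{\sigma_v^2}{\mathbf{h}_o^T\mathbf{R}_v\mathbf{h}_o} \tag{37}$$

and $\xi_{nr}(\mathbf{h}_o) \geq 1$. The greater $\xi_{nr}(\mathbf{h}_o)$ is, the more noise reduction we have. Also

$$\lim_{\mathrm{SNR}\to\infty}\xi_{nr}(\mathbf{h}_o) = 1 \tag{38}$$

$$\lim_{\mathrm{SNR}\to 0}\xi_{nr}(\mathbf{h}_o) = \infty. \tag{39}$$

Using (34) and (37), we obtain important relations between the speech-distortion index and the noise-reduction factor

$$\upsilon_{sd}(\mathbf{h}_o) = \frac{1}{\mathrm{SNR}}\left[\tilde{J}_x(\mathbf{h}_o) - \frac{1}{\xi_{nr}(\mathbf{h}_o)}\right] = \tilde{J}_v(\mathbf{g}_o) - \frac{1}{\mathrm{SNR}\cdot\xi_{nr}(\mathbf{h}_o)} \tag{40}$$

$$\xi_{nr}(\mathbf{h}_o) = \frac{1}{\tilde{J}_x(\mathbf{h}_o) - \mathrm{SNR}\cdot\upsilon_{sd}(\mathbf{h}_o)} = \frac{1}{\mathrm{SNR}\left[\tilde{J}_v(\mathbf{g}_o) - \upsilon_{sd}(\mathbf{h}_o)\right]}. \tag{41}$$

Therefore, for the optimum filter, when the SNR is very large, there is little speech distortion and little noise reduction (which is not really needed in this situation). On the other hand, when the SNR is very small, speech distortion is large as well as noise reduction.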
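The link between noise reduction and speech distortion can be checked numerically. The sketch below is illustrative only and rests on assumptions: the speech-distortion index is taken as the power of $x(n) - \mathbf{h}_o^T\mathbf{x}(n)$ normalized by $\sigma_x^2$, the noise-reduction factor as the ratio of the noise power before and after filtering, and the correlation model (AR(1)-like speech, unit-variance white noise) and the names `ar1_corr` and `wiener_tradeoff` are hypothetical.

```python
import numpy as np

def ar1_corr(L, power, rho):
    """L x L Toeplitz correlation matrix power * rho**|i-j| (illustrative model, an assumption)."""
    k = np.arange(L)
    return power * rho ** np.abs(k[:, None] - k[None, :])

def wiener_tradeoff(R_x, R_v):
    """Return (SNR, speech-distortion index, noise-reduction factor, normalized MMSE)."""
    L = R_x.shape[0]
    u1 = np.eye(L)[:, 0]
    sigma_x2, sigma_v2 = R_x[0, 0], R_v[0, 0]
    R_y = R_x + R_v
    r_v = R_v @ u1
    g_o = np.linalg.solve(R_y, r_v)          # optimal noise estimator
    h_o = u1 - g_o                           # optimal speech estimator
    v_sd = (g_o @ R_x @ g_o) / sigma_x2      # assumed definition of the distortion index
    xi_nr = sigma_v2 / (h_o @ R_v @ h_o)     # assumed definition of the noise-reduction factor
    j_x = (sigma_v2 - r_v @ g_o) / sigma_v2  # normalized MMSE of the speech estimate
    return sigma_x2 / sigma_v2, v_sd, xi_nr, j_x

results = []
for snr_db in (-10, 0, 10, 20):
    R_x = ar1_corr(8, 10 ** (snr_db / 10), 0.9)
    R_v = np.eye(8)
    results.append(wiener_tradeoff(R_x, R_v))
    snr, v_sd, xi_nr, j_x = results[-1]
    # The normalized MMSE splits into a distortion part and a residual-noise part.
    gap = j_x - (snr * v_sd + 1.0 / xi_nr)
    print(f"{snr_db:>4} dB  v_sd={v_sd:.4f}  xi_nr={xi_nr:.2f}  J_x={j_x:.4f}  gap={gap:.1e}")
```

Under these assumptions the distortion index and the noise-reduction factor both grow as the SNR drops, which is the behavior described in the text.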

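Another way to examine the noise-reduction performance is to inspect the SNR improvement. Let us define the a posteriori SNR, after noise reduction with the Wiener filter, as

$$\mathrm{SNR}_o = \frac{\mathbf{h}_o^T\mathbf{R}_x\mathbf{h}_o}{\mathbf{h}_o^T\mathbf{R}_v\mathbf{h}_o}. \tag{42}$$

It can be shown that the a posteriori SNR and the a priori SNR satisfy $\mathrm{SNR}_o \geq \mathrm{SNR}$ (see Appendix), indicating that the Wiener filter is always able to improve the SNR of the noisy speech signal. Knowing that $\mathrm{SNR}_o \geq \mathrm{SNR}$, we can now give the lower bound for $\xi_{nr}(\mathbf{h}_o)$. As a matter of fact, it follows from (42) that

$$\mathrm{SNR}_o = \mathrm{SNR}\cdot\xi_{nr}(\mathbf{h}_o)\left[1 - \upsilon_{sd}(\mathbf{h}_o)\right] - 2. \tag{43}$$

Since $\mathrm{SNR}_o \geq \mathrm{SNR}$ and $0 \leq \upsilon_{sd}(\mathbf{h}_o) \leq 1$, it can be easily shown that

$$\xi_{nr}(\mathbf{h}_o) \geq \frac{\mathrm{SNR} + 2}{\mathrm{SNR}}. \tag{44}$$

Similarly, we can derive the upper bound for $\upsilon_{sd}(\mathbf{h}_o)$, i.e.,

$$\upsilon_{sd}(\mathbf{h}_o) \leq \frac{1}{2\cdot\mathrm{SNR} + 1}. \tag{45}$$

Fig. 1 illustrates expressions (44) and (45).

Fig. 1. Illustration of the areas where the noise-reduction factor and the speech-distortion index take their values as a function of the SNR. The noise-reduction factor can take any value above the solid line, while the speech-distortion index can take any value under the dotted line.

We now introduce another index for noise reduction

[Equation (46) is missing from the source.]

The closer this index is to 1, the more noise reduction we get. This index will be helpful to use in Sections V–VII.

V. PARTICULAR CASE: WHITE GAUSSIAN NOISE

In this section, we assume that the additive noise is white, so that

$$\mathbf{r}_v = \sigma_v^2\mathbf{u}_1. \tag{47}$$

From (16) and (24), we observe that the two normalized MMSEs are

$$\tilde{J}_x(\mathbf{h}_o) = h_{o,0} \tag{48}$$

$$\tilde{J}_v(\mathbf{g}_o) = \frac{1 - g_{o,0}}{\mathrm{SNR}} \tag{49}$$

where $h_{o,0}$ and $g_{o,0}$ are the first components of the vectors $\mathbf{h}_o$ and $\mathbf{g}_o$, respectively. Clearly, $0 < h_{o,0} < 1$ and $0 < g_{o,0} < 1$. Hence, the normalized MMSE is completely governed by the first element of the Wiener filter $\mathbf{h}_o$.

Now, the speech-distortion index and the noise-reduction factor for the optimal filter can be simplified

$$\upsilon_{sd}(\mathbf{h}_o) = \frac{h_{o,0} - \mathbf{h}_o^T\mathbf{h}_o}{\mathrm{SNR}} \tag{50}$$

$$\xi_{nr}(\mathbf{h}_o) = \frac{1}{\mathbf{h}_o^T\mathbf{h}_o}. \tag{51}$$

We also deduce from (50) that $h_{o,0} \geq \mathbf{h}_o^T\mathbf{h}_o$ [a second inequality is missing from the source].

We know from linear prediction theory that [47]

$$\mathbf{R}_y\begin{bmatrix}1\\ -\mathbf{a}_y\end{bmatrix} = E_y\,\mathbf{u}_1 \tag{52}$$

where $\mathbf{a}_y$ is the forward linear predictor and $E_y$ is the corresponding error energy. Replacing the previous equation in (11), we obtain

$$\mathbf{h}_o = \mathbf{u}_1 - \frac{\sigma_v^2}{E_y}\begin{bmatrix}1\\ -\mathbf{a}_y\end{bmatrix} \tag{53}$$

so that, from (48),

$$\tilde{J}_x(\mathbf{h}_o) = 1 - \frac{\sigma_v^2}{E_y}. \tag{54}$$

Equation (53) shows how the Wiener filter is related to the forward predictor of the observed signal $y(n)$. This expression also gives a hint on how to choose the length of the optimal filter $\mathbf{h}_o$: it should be equal to the length of the predictor required to have a good prediction of the observed signal $y(n)$. Equation (54) contains some very interesting information. Indeed, if the clean speech signal is completely predictable, this means that $E_y \approx \sigma_v^2$ and $\tilde{J}_x(\mathbf{h}_o) \approx 0$. On the other hand, if $x(n)$ is not predictable, we have $E_y \approx \sigma_x^2 + \sigma_v^2$ and $\tilde{J}_x(\mathbf{h}_o) \approx \mathrm{SNR}/(1 + \mathrm{SNR})$. This implies that the Wiener filter is more efficient at reducing the level of noise for predictable signals than for unpredictable ones.

VI. BETTER WAYS TO MANAGE NOISE REDUCTION AND SPEECH DISTORTION

For a noise-reduction/speech-enhancement system, we always expect that it can achieve maximal noise reduction without much speech distortion. From the previous section, however, it follows that while noise reduction is maximized with the optimal Wiener filter, speech distortion is also maximized. One may ask the legitimate question: are there better ways to control the tradeoff between the conflicting requirements of noise reduction and speech distortion? Examining (34), one can see that to control the speech distortion, we need to minimize $E\{[x(n) - \mathbf{h}^T\mathbf{x}(n)]^2\}$. This can be achieved in different ways. For example, a speech signal can be modeled as an AR process. If the AR coefficients are known a priori or can be estimated from the noisy speech, these coefficients can be exploited to minimize $E\{[x(n) - \mathbf{h}^T\mathbf{x}(n)]^2\}$, while simultaneously achieving a reasonable level of noise attenuation. This is often referred to as the parametric-model-based technique [36], [37]. We will not discuss the details of this technique here. Instead, in what follows we will discuss two other approaches to manage noise reduction and speech distortion in a better way.

A. A Suboptimal Filter

Consider the suboptimal filter

$$\mathbf{h}_{sub} = \mathbf{u}_1 - \alpha\,\mathbf{g}_o \tag{55}$$

where $\alpha$ is a real number. The MSE of the clean speech estimation corresponding to $\mathbf{h}_{sub}$ is

$$J_x(\mathbf{h}_{sub}) = \sigma_v^2 - \alpha(2 - \alpha)\,\mathbf{r}_v^T\mathbf{R}_y^{-1}\mathbf{r}_v \tag{56}$$

and, obviously, $J_x(\mathbf{h}_{sub}) \geq J_x(\mathbf{h}_o)$; we have equality for $\alpha = 1$. In order to have noise reduction, $\alpha$ must be chosen in such a way that $J_x(\mathbf{h}_{sub}) < J_x(\mathbf{u}_1)$, therefore

$$0 < \alpha < 2. \tag{57}$$

We can check that

[Equation (58) is missing from the source.]

Let

$$\hat{x}_{sub}(n) = \mathbf{h}_{sub}^T\mathbf{y}(n) \tag{59}$$

denote the estimation of the clean speech at time $n$ with respect to $\mathbf{h}_{sub}$. The power of $\hat{x}_{sub}(n)$ is

$$E\{\hat{x}_{sub}^2(n)\} = \mathbf{h}_{sub}^T\mathbf{R}_x\mathbf{h}_{sub} + \mathbf{h}_{sub}^T\mathbf{R}_v\mathbf{h}_{sub}. \tag{60}$$

The speech-distortion index corresponding to the filter $\mathbf{h}_{sub}$ is

$$\upsilon_{sd}(\mathbf{h}_{sub}) = \frac{E\{[x(n) - \mathbf{h}_{sub}^T\mathbf{x}(n)]^2\}}{\sigma_x^2} = \alpha^2\,\upsilon_{sd}(\mathbf{h}_o). \tag{61}$$

The previous expression shows that the ratio of the speech-distortion indices corresponding to the two filters $\mathbf{h}_{sub}$ and $\mathbf{h}_o$ depends on $\alpha$ only. In order to have less distortion with the suboptimal filter $\mathbf{h}_{sub}$ than with the Wiener filter $\mathbf{h}_o$, we must find $\alpha$ in such a way that

$$\upsilon_{sd}(\mathbf{h}_{sub}) < \upsilon_{sd}(\mathbf{h}_o) \tag{62}$$

hence, the condition on $\alpha$ should be

$$-1 < \alpha < 1. \tag{63}$$

Finally, the suboptimal filter $\mathbf{h}_{sub}$ can reduce the level of noise of the observed signal $y(n)$ but with less distortion than the Wiener filter $\mathbf{h}_o$ if $\alpha$ is taken such that

$$0 < \alpha < 1. \tag{64}$$

For the extreme cases $\alpha = 0$ and $\alpha = 1$ we obtain respectively $\mathbf{h}_{sub} = \mathbf{u}_1$, no noise reduction at all but no additional distortion added, and $\mathbf{h}_{sub} = \mathbf{h}_o$, maximum noise reduction with maximum speech distortion.

Since

[Equation (65) is missing from the source.]

it follows immediately that the speech-distortion index and the noise-reduction factor due to $\mathbf{h}_{sub}$ are

[Equations (66) and (67) are missing from the source.]

From (61), one can see that $\upsilon_{sd}(\mathbf{h}_{sub})/\upsilon_{sd}(\mathbf{h}_o) = \alpha^2$, which is a function of $\alpha$ only. Unlike $\upsilon_{sd}(\mathbf{h}_{sub})/\upsilon_{sd}(\mathbf{h}_o)$, the ratio $\xi_{nr}(\mathbf{h}_{sub})/\xi_{nr}(\mathbf{h}_o)$ does not only depend on $\alpha$, but on the characteristics of both the speech and noise signal as well. However, using (56) and (15), we find that

$$\frac{J_x(\mathbf{u}_1) - J_x(\mathbf{h}_{sub})}{J_x(\mathbf{u}_1) - J_x(\mathbf{h}_o)} = \alpha(2 - \alpha). \tag{68}$$

Fig. 2 plots the speech-distortion ratio $\alpha^2$ and the noise-reduction ratio $\alpha(2 - \alpha)$, both as a function of $\alpha$. We can see that when $\alpha = 0.7$, the suboptimal filter achieves about 90% of the noise reduction with the Wiener filter, while the speech distortion is only 49% of that of the Wiener filter. In real applications, we may want the system to achieve maximal noise reduction, while keeping the speech distortion as low as possible. If we define a cost function to measure the compromise between the noise reduction and the speech distortion as

$$J(\alpha) = \alpha(2 - \alpha) - \alpha^2 = 2\alpha(1 - \alpha) \tag{69}$$

it is trivial to see that the $\alpha$ that maximizes $J(\alpha)$ is

$$\alpha = \frac{1}{2}. \tag{70}$$

In this case, the suboptimal filter achieves 75% of the noise reduction with the Wiener filter, while the speech distortion is only 25% of that of the Wiener filter.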
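The tradeoff controlled by the suboptimal filter can be reproduced with a few lines of code. This is a hedged sketch rather than the authors' procedure: it assumes the suboptimal filter has the form $\mathbf{u}_1 - \alpha\mathbf{g}_o$ (an assumption that is consistent with the 49% and 75% figures quoted above), measures noise reduction as the drop in MSE relative to the unfiltered signal, and uses a hypothetical AR(1)-like speech model with white noise. The names `ar1_corr`, `mse_clean`, `distortion`, and `tradeoff` are illustrative.

```python
import numpy as np

def ar1_corr(L, power, rho):
    """L x L Toeplitz correlation matrix power * rho**|i-j| (illustrative model, an assumption)."""
    k = np.arange(L)
    return power * rho ** np.abs(k[:, None] - k[None, :])

def mse_clean(h, R_x, R_v):
    """MSE of the speech estimate h^T y for uncorrelated speech and noise."""
    d = np.eye(len(h))[:, 0] - h
    return d @ R_x @ d + h @ R_v @ h

def distortion(h, R_x):
    """Power of x(n) - h^T x(n), normalized by the speech power (assumed index)."""
    d = np.eye(len(h))[:, 0] - h
    return (d @ R_x @ d) / R_x[0, 0]

L = 8
R_x = ar1_corr(L, 1.0, 0.9)
R_v = 0.5 * np.eye(L)
u1 = np.eye(L)[:, 0]
g_o = np.linalg.solve(R_x + R_v, R_v @ u1)   # Wiener noise estimator
h_o = u1 - g_o                               # Wiener speech estimator

def tradeoff(alpha):
    """Distortion and noise-reduction ratios of the assumed suboptimal filter vs. h_o."""
    h_sub = u1 - alpha * g_o                 # assumed form of the suboptimal filter
    sd_ratio = distortion(h_sub, R_x) / distortion(h_o, R_x)
    sigma_v2 = R_v[0, 0]                     # MSE when the signal is left unfiltered
    nr_ratio = (sigma_v2 - mse_clean(h_sub, R_x, R_v)) / (sigma_v2 - mse_clean(h_o, R_x, R_v))
    return sd_ratio, nr_ratio

for alpha in (0.0, 0.5, 0.7, 1.0):
    sd, nr = tradeoff(alpha)
    print(f"alpha={alpha:.1f}  distortion ratio={sd:.2f}  noise-reduction ratio={nr:.2f}")
```

Under these assumptions the two ratios depend on $\alpha$ alone and not on the chosen correlation model, so changing `rho` or the noise variance leaves the printed values unchanged.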
