
Empirical Bayesian Spatial Prediction Using Wavelets

Hsin-Cheng Huang
Noel Cressie

ABSTRACT Wavelet shrinkage methods, introduced by Donoho and Johnstone (1994, 1995, 1998) and Donoho et al. (1995), are a powerful way to carry out signal denoising, especially when the underlying signal has a sparse wavelet representation. Wavelet shrinkage based on the Bayesian approach involves specifying a prior distribution for the wavelet coefficients. In this chapter, we consider a Gaussian prior with nonzero means for the wavelet coefficients, which is different from other priors used in the literature. An empirical Bayes approach is taken: the mean parameters are estimated using Q-Q plots, and the hyperparameters of the prior covariance are estimated by a pseudo maximum likelihood method. A simulation study shows that our empirical Bayesian spatial prediction approach outperforms the well-known VisuShrink and SureShrink methods for recovering a wide variety of signals.

1 Introduction

Wavelets in $\mathbb{R}$ are functions with varying scales and locations, obtained by dilating and translating a basic function (the mother wavelet) that has (almost) bounded support. For certain functions $\psi \in L^2(\mathbb{R})$, the family

    \psi_{j,k}(x) \equiv 2^{j/2}\,\psi(2^j x - k), \qquad j, k \in \mathbb{Z},

constitutes an (orthonormal) basis of $L^2(\mathbb{R})$ (Daubechies, 1992). Associated with each mother wavelet $\psi$ is a scaling function $\phi$ that together yield a multiresolution analysis of $L^2(\mathbb{R})$. That is, after choosing an initial scale $J_0$, any $f \in L^2(\mathbb{R})$ can be expanded as

    f = \sum_{k \in \mathbb{Z}} c_{J_0,k}\,\phi_{J_0,k} + \sum_{j=J_0}^{\infty} \sum_{k \in \mathbb{Z}} d_{j,k}\,\psi_{j,k}.
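To make the expansion concrete, here is a minimal numerical sketch using the PyWavelets package (an assumption of this illustration; the chapter itself does not rely on any software). The sampled function, the wavelet 'db4', and the decomposition depth are arbitrary choices.

```python
# A hedged sketch of the multiresolution expansion above: decompose a
# sampled function into one block of scaling-function coefficients
# {c_{J0,k}} plus wavelet coefficients {d_{j,k}} per scale, then verify
# that the expansion is exact. PyWavelets ('pywt') is assumed installed.
import numpy as np
import pywt

n = 2 ** 10                                   # n = 2^J sample points
t = np.arange(1, n + 1) / n
f = np.sin(4 * np.pi * t) + (t > 0.5)         # toy function with a jump

coeffs = pywt.wavedec(f, 'db4', level=5)      # [c_{J0}, d_{J0}, ..., d_{J-1}]
f_rec = pywt.waverec(coeffs, 'db4')           # invert the transform

print(np.max(np.abs(f - f_rec)))              # ~1e-12: the expansion is exact
```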

Spatial wavelets in $\mathbb{R}^d$ are an easy generalization of one-dimensional wavelets (Mallat, 1989). For the purpose of illustration, we consider the case $d = 2$, where wavelet analysis of two-dimensional images is an important application. A two-dimensional scaling function can be defined in the


following separable form:

    \phi(x, y) = \phi(x)\,\phi(y), \qquad x, y \in \mathbb{R},

and there are three wavelet functions given by

    \psi^{(1)}(x, y) = \phi(x)\,\psi(y), \quad
    \psi^{(2)}(x, y) = \psi(x)\,\phi(y), \quad
    \psi^{(3)}(x, y) = \psi(x)\,\psi(y).

For $j, k_1, k_2 \in \mathbb{Z}$, write

    \phi_{j,k_1,k_2}(x, y) \equiv 2^j\,\phi(2^j x - k_1,\, 2^j y - k_2),
    \psi^{(m)}_{j,k_1,k_2}(x, y) \equiv 2^j\,\psi^{(m)}(2^j x - k_1,\, 2^j y - k_2), \qquad m = 1, 2, 3.

Then any function $g \in L^2(\mathbb{R}^2)$ can be expanded as

    g(x, y) = \sum_{k_1,k_2} c_{k_1,k_2}\,\phi_{J_0,k_1,k_2}(x, y)
            + \sum_{j=J_0}^{\infty} \sum_{k_1,k_2} \sum_{m=1}^{3} d^{(m)}_{j,k_1,k_2}\,\psi^{(m)}_{j,k_1,k_2}(x, y).
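The separable two-dimensional transform can be sketched the same way. In PyWavelets (assumed available), one decomposition step returns one approximation block and three detail blocks, corresponding, up to labeling conventions, to $\phi$ and $\psi^{(1)}, \psi^{(2)}, \psi^{(3)}$ above.

```python
# One step of Mallat's separable 2-D wavelet transform: four blocks at
# half the resolution. The random "image" and the Haar wavelet are
# illustrative choices only.
import numpy as np
import pywt

g = np.random.default_rng(0).normal(size=(256, 256))
cA, (cH, cV, cD) = pywt.dwt2(g, 'haar')        # scaling + three detail blocks
print(cA.shape)                                # (128, 128), likewise cH, cV, cD

g_rec = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
print(np.allclose(g, g_rec))                   # True: exact reconstruction
```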

Because of this direct connection between one-dimensional wavelets and spatial wavelets, we shall present most of the methodological development in $\mathbb{R}$. However, in a subsequent section, we do give an application of our wavelet methodology to two-dimensional spatial prediction of an image.

Wavelets have proved to be a powerful way of analyzing complicated functional behavior because, in wavelet space, most of the "energy" tends to be concentrated in only a few of the coefficients $\{c_{J_0,k}\}, \{d_{j,k}\}$. It is interesting to look at the statistical properties of wavelet expansions; that is, if $f(\cdot)$ is a random function in $L^2(\mathbb{R})$, what is the law of its wavelet coefficients? We shall formulate this question more specifically in terms of the discrete wavelet transform, which we now discuss.

Suppose that we observe $Y(\cdot)$ at a discrete number $n = 2^J$ of points; that is, we have data $Y = (Y_1, \ldots, Y_n)'$, where $Y_i = Y(t_i)$ and $t_i = i/n$, $i = 1, \ldots, n$. The discrete wavelet transform matrix $W_n$ of $Y$ is an orthogonal matrix such that

    w \equiv \big( (w^{J_0})', w_{J_0}', \ldots, w_{J-1}' \big)' = W_n Y    (1)

is a vector of scaling-function coefficients at scale $J_0$ and wavelet coefficients at scales $J_0, \ldots, J-1$ (Mallat, 1989). Thus, if $Y$ is random, so too is $w$. In all that is to follow, we shall construct probability models directly for $w$, although it should be noted that if $Y(\cdot)$ is a stationary process, then $w^{J_0}$ and $\{w_j : j = J_0, \ldots, J-1\}$ are also stationary processes, except for some points near the boundary (Cambanis and Houdre, 1995). We assume the following Bayesian model:

    w \mid \theta, \sigma^2 \sim \mathrm{Gau}(\theta, \sigma^2 I),    (2)
    \theta \mid \beta, \eta \sim \mathrm{Gau}(\beta, \Sigma(\eta)),    (3)


where $\Sigma(\eta)$ is an $n \times n$ covariance matrix with structure (depending on parameters $\eta$) to be specified. In a like manner to the definition of $w$ in (1), we write $\theta = \big( (\theta^{J_0})', \theta_{J_0}', \ldots, \theta_{J-1}' \big)'$ and $\beta = \big( (\beta^{J_0})', \beta_{J_0}', \ldots, \beta_{J-1}' \big)'$. Notice that there are hyperparameters $\beta$, $\sigma^2$, and $\eta$ still to be dealt with in the Bayesian model.

A couple of comments are worth making. The first level of the Bayesian model is the so-called data model that incorporates measurement error; indeed, we can write (2) equivalently as

    w = \theta + \epsilon,    (4)

where $\epsilon \sim \mathrm{Gau}(0, \sigma^2 I)$. Hence $\theta$ is the signal, which we do not observe because it is convolved with the noise $\epsilon$. Our goal is prediction of $\theta$, which we assume has a prior distribution given by (3). This prior is different from other priors used in the literature, in that we assume it has a nonzero mean $\beta$. We regard $\beta$ as a prior parameter to be specified, representing the large-scale variation in $\theta$. Thus, we may write

    \theta = \beta + \delta,    (5)

where $\beta$ is deterministic and $\delta \sim \mathrm{Gau}(0, \Sigma(\eta))$ is the stochastic component representing the small-scale variation.

The optimal predictor of $\theta$ is $E(\theta \mid w)$, which we would like to transform back to the original data space. The inverse transform of $W_n$ is $W_n'$, since $W_n$ is an orthogonal matrix. Hence (4) becomes

    W_n' w = W_n' \theta + W_n' \epsilon,

and because $W_n' \epsilon$ is white-noise measurement error in the data space, $S \equiv W_n' \theta$ represents the signal that we would like to predict. By linearity, the optimal predictor of $S \equiv (S_1, \ldots, S_n)'$ is

    E(S \mid Y) = E(W_n' \theta \mid w) = W_n'\, E(\theta \mid w).

Thus, it is a simple matter to predict the signal $S$ optimally once $E(\theta \mid w)$ has been found. The Gaussian assumptions (2) and (3) make the calculation of $\hat{\theta}(\beta, \sigma^2, \eta) \equiv E(\theta \mid w)$ a very simple exercise (Section 3).

Most of this chapter is concerned with specification of the hyperparameters $\beta$, $\sigma^2$, and $\eta$. Our approach is empirical, but we show that it offers improvements over the a priori specification $\beta = (\beta_{J_0}', \ldots, \beta_{J-1}')' \equiv 0$ and over previous methods of estimating $\sigma^2$. In Section 3, we outline our methodology for estimating $\beta$ based on Q-Q plots and for estimating $\sigma^2$ based on the variogram.
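The step from $E(\theta \mid w)$ back to $E(S \mid Y)$ rests only on the orthogonality of $W_n$. A minimal numerical sketch with an explicit single-level Haar matrix (an illustrative choice; any orthogonal DWT matrix behaves the same way):

```python
# W_n' W_n = I, so the inverse DWT is the transpose, and any predictor
# formed in wavelet space maps back to the data space via W_n'.
import numpy as np

s = 1 / np.sqrt(2)
W = np.array([[s,  s, 0,  0],    # scaling-function coefficients w^{J0}
              [0,  0, s,  s],
              [s, -s, 0,  0],    # wavelet (detail) coefficients
              [0,  0, s, -s]])
print(np.allclose(W.T @ W, np.eye(4)))   # orthogonality

Y = np.array([1.0, 3.0, 2.0, 2.0])
w = W @ Y                                # eq. (1): wavelet-domain data
theta_hat = w                            # stand-in for E(theta | w)
S_hat = W.T @ theta_hat                  # predictor in the data space
print(S_hat)                             # equals Y for this trivial predictor
```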


Section 4 contains a brief discussion of different covariance models for $\Sigma(\eta)$, although in all the applications of this chapter we use a simple model that assumes independence and homoskedasticity within a scale but heteroskedasticity (and independence) across scales. Estimation of $\eta$ is also discussed in Section 4. Based on the estimates $\hat{\beta}$ (Section 3) and $\hat{\sigma}^2$, $\hat{\eta}$ (Section 4), we use the empirical Bayes spatial predictor

    \hat{S} \equiv W_n'\, \hat{\theta}(\hat{\beta}, \hat{\sigma}^2, \hat{\eta})

to make inference on the unobserved signal $S$. Section 5 contains a small simulation study showing the value of our empirical Bayesian spatial prediction approach applied to a few test functions, together with an application to a two-dimensional image. Discussion and conclusions are given in Section 6.

2 Wavelet Shrinkage

In a series of papers, Donoho and Johnstone (1994, 1995, 1998) and Donoho et al. (1995) developed the wavelet shrinkage method for reconstructing signals from noisy data, where the noise is assumed to be Gaussian white noise. The wavelet shrinkage method proceeds as follows. First, the data $Y$ are transformed using a discrete wavelet transform, yielding the empirical wavelet coefficients $w$. Next, to suppress the noise, the empirical wavelet coefficients are "shrunk" toward zero based on a shrinkage rule. Usually, wavelet shrinkage is carried out by thresholding the wavelet coefficients; that is, the wavelet coefficients that have an absolute value below a prespecified threshold are replaced by zero. Finally, the processed empirical wavelet coefficients are transformed back to the original domain using the inverse wavelet transform. In practice, the discrete wavelet transform and its inverse can be computed very quickly, in only $O(n)$ operations, using the pyramid algorithm (Mallat, 1989). With a properly chosen shrinkage method, Donoho and Johnstone (1994, 1995, 1998) and Donoho et al. (1995) show that the resulting estimate of the unknown function is nearly minimax over a large class of function spaces and for a wide range of loss functions. More importantly, it is computationally fast, and it adapts automatically to the smoothness of the true function, without the need to adjust a "bandwidth" as in kernel smoothing. Of course, the crucial step of this procedure is the choice of a thresholding (or shrinkage) method. A number of approaches have been proposed, including minimax (Donoho and Johnstone, 1994, 1995, 1998), cross-validation (Nason, 1995, 1996), hypothesis testing (Abramovich and Benjamini, 1995, 1996; Ogden and Parzen, 1996a, 1996b), and Bayesian methods (Vidakovic, 1998; Clyde et al., 1996, 1998; Chipman et al., 1997; Crouse et al., 1998; Abramovich et al., 1998; Ruggeri and Vidakovic, 1999).
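As a concrete rendering of the three-step recipe (transform, shrink, invert), here is a hedged sketch using PyWavelets; it amounts to the VisuShrink rule with the MAD noise estimate of equation (6) below. The wavelet 'sym4' mirrors the symmlet used in Section 5, but all choices here are illustrative, not prescribed by the chapter.

```python
# Transform -> threshold -> inverse transform, the generic wavelet
# shrinkage pipeline. Scaling-function coefficients are left unshrunk.
import numpy as np
import pywt

def wavelet_shrink(Y, wavelet='sym4', level=5):
    coeffs = pywt.wavedec(Y, wavelet, level=level)
    d_fine = coeffs[-1]                          # finest-scale coefficients
    sigma = np.median(np.abs(d_fine - np.median(d_fine))) / 0.6745  # eq. (6)
    lam = sigma * np.sqrt(2.0 * np.log(len(Y)))  # universal threshold
    shrunk = [coeffs[0]] + [pywt.threshold(d, lam, mode='soft')
                            for d in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)

rng = np.random.default_rng(1)
S = np.repeat([0.0, 7.0, -3.0, 4.0], 512)        # a crude blocks-like signal
Y = S + rng.normal(size=S.size)
S_hat = wavelet_shrink(Y)
print(np.mean((S_hat - S) ** 2))                 # MSE of the reconstruction
```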


Donoho and Johnstone (1994) proposed the hard-thresholding and soft-thresholding strategies. For a wavelet coefficient $w_{j,k}$ and a threshold $\lambda$, the hard-thresholding value is given by

    T_\lambda^H(w_{j,k}) = \begin{cases} w_{j,k}, & \text{if } |w_{j,k}| > \lambda; \\ 0, & \text{if } |w_{j,k}| \le \lambda; \end{cases}

and the soft-thresholding value is given by

    T_\lambda^S(w_{j,k}) = \begin{cases} w_{j,k} - \lambda, & \text{if } w_{j,k} > \lambda; \\ 0, & \text{if } |w_{j,k}| \le \lambda; \\ w_{j,k} + \lambda, & \text{if } w_{j,k} < -\lambda. \end{cases}

For these thresholding rules, the choice of the threshold parameter $\lambda$ is important. The VisuShrink method proposed by Donoho and Johnstone (1994) uses the universal threshold $\lambda = \sqrt{2 \log n}$ for all levels. Donoho and Johnstone (1995) also proposed the SureShrink method, which selects the threshold parameter level by level. For a given resolution level $j$, the threshold $\lambda_j$ is chosen to minimize Stein's unbiased risk estimate (SURE), provided the wavelet representation at that level is not too sparse; the level is declared sparse when

    \frac{1}{2^j} \sum_{k=0}^{2^j - 1} \left( \frac{w_{j,k}^2}{\sigma^2} - 1 \right) \le \frac{j^{3/2}}{2^{j/2}},

in which case the threshold $\lambda_j = \sqrt{2 \log(2^j)}$ is chosen instead.
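A sketch of the SureShrink threshold choice just described, under two standard assumptions that the chapter does not spell out: the soft rule is used, and the candidate thresholds can be restricted to the observed $|w_{j,k}|$.

```python
# Level-j threshold: SURE-minimizing value unless the level is "too
# sparse", in which case fall back to sqrt(2 log 2^j) * sigma.
import numpy as np

def sure_threshold(d, sigma):
    """Minimize SURE(t) = n - 2 #{|x_i| <= t} + sum_i min(x_i, t)^2."""
    x = np.sort(np.abs(d / sigma))
    n = len(x)
    k = np.arange(1, n + 1)
    risks = n - 2.0 * k + np.cumsum(x ** 2) + (n - k) * x ** 2
    return sigma * x[np.argmin(risks)]           # SURE evaluated at t = x[k-1]

def sureshrink_threshold(d, sigma):
    x = d / sigma
    n = len(d)                                   # n = 2^j at level j
    if (np.sum(x ** 2) - n) / n <= np.log2(n) ** 1.5 / np.sqrt(n):
        return sigma * np.sqrt(2.0 * np.log(n))  # sparse level: universal
    return sure_threshold(d, sigma)
```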

Otherwise, the threshold j= 2 log(2j ) is chosen. In practice, as suggested by Donoho and Johnstone (1994, 1995), the scaling-function coe cients wJ0 are not shrunk. Usually the noise parameter is unknown, in which case Donoho et al. (1995) proposed a robust estimator, the median of absolute deviations (MAD) of wavelet coe cients at the highest resolution: median j wJ?1;k? median (wJ?1;k )j: (6)~= MAD fwJ?1;k g 0:6745 A Bayesian wavelet shrinkage rule is obtained by specifying a certain prior for both and 2 based on (4). Vidakovic (1998) assumes that f j;k g are independent and identically t-distributed with n degrees of freedom and is independent of f j;k g with an exponential distribution. However, their wavelet shrinkage rule, either based on the posterior mean or via a Bayesian hypotheses testing procedure, requires numerical integration. Chipman et al. (1997) also assume an independent prior for f j;k g. Since a signal is likely to have a sparse wavelet distribution with a heavy tail, they consider a mixture of two zero-mean normal components for f j;k g; one has a very small variance and the other has a large variance. Treating 2 as a hyperparameter, their shrinkage rule based on the posterior mean??

p


has a closed-form representation. Both Clyde et al. (1998) and Abramovich et al. (1998) consider a mixture of a normal component and a point mass at zero for the wavelet coefficients $\{\theta_{j,k}\}$. Clyde et al. (1998) assume that the prior distribution for $\sigma^2$ is inverse gamma and that the $\{\theta_{j,k}\}$ are independent, conditional on $\sigma^2$. They use the stochastic search variable selection (SSVS) algorithm (George and McCulloch, 1993, 1997) to search for nonzero wavelet coefficients of the signal, and they use Markov chain Monte Carlo to obtain the posterior mean by averaging over all selected models. Closed-form approximations to the posterior mean and the posterior variance are also provided. Abramovich et al. (1998) consider a sum of weighted absolute errors as their loss function, resulting in a thresholding rule (i.e., coefficients with an absolute value below a certain threshold level are replaced by zero) that is Bayes, rather than the shrinkage rule that a Bayesian approach with squared-error loss would yield. Their thresholding rule, based on the posterior median, also has a closed-form representation, under the assumption that $\sigma^2$ is known. Lu et al. (1997) apply a nonparametric mixed-effects model, where the scaling-function coefficients are assumed to be fixed and the wavelet coefficients are assumed to be random with zero mean. Their empirical Bayes estimator is shown to have a Gauss-Markov type of optimality.

Though the discrete wavelet transform is an excellent decorrelator for a wide variety of stochastic processes, it does not yield completely uncorrelated wavelet coefficients. In practice, the wavelet coefficients of an observed process are still somewhat correlated, especially for coefficients that are close together in the same scale or at nearby scales around the same locations. To describe this structure, Crouse et al. (1998) consider $m$-state Gaussian mixture models for wavelet coefficients, with the state variables linked via a tree structure in a Markovian manner. An empirical Bayes approach is taken, and the prior parameters are estimated using an EM algorithm.

Vidakovic and Muller (1995) consider a conjugate normal-inverse-gamma prior on $(\theta, \sigma^2)$. That is, they assume that (2) holds, that

    \theta \mid \sigma^2 \sim \mathrm{Gau}(0, \sigma^2 \Sigma),    (7)

and that $\sigma^2$ has an inverse-gamma distribution. The resulting shrinkage rule is

    E(\theta \mid w) = (I + \Sigma^{-1})^{-1} w.    (8)

Note that different choices of $\Sigma$ can lead to various shrinkage procedures. Comparing (3) with (7), the main difference between our method and theirs is that we apply a prior with a nonzero mean for the signal $\theta$. We also provide several parametric model classes for $\Sigma$. Details of our method will be given in the next two sections.
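For a diagonal $\Sigma$, rule (8) acts coefficient by coefficient: each $w_i$ is multiplied by $s_i/(s_i + 1)$, where $s_i$ is the $i$-th prior variance. A toy illustration with arbitrary numbers:

```python
# Linear shrinkage under the conjugate prior (7) with diagonal Sigma.
import numpy as np

w = np.array([5.0, 1.0, -0.3])          # wavelet-domain data
s = np.array([10.0, 1.0, 0.1])          # diag(Sigma): prior signal variances
theta_hat = s / (s + 1.0) * w           # eq. (8): (I + Sigma^{-1})^{-1} w
print(theta_hat)                        # small s_i => strong shrinkage
```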


3 The DecompShrink Method

Recall that our approach is empirical Bayesian, which requires estimation of the hyperparameters of the Bayesian model. First, we estimate the noise parameter $\sigma$ using the variogram method proposed by Huang and Cressie (1997a). Specifically, we estimate $\sigma$ by

    \hat{\sigma} = \begin{cases} \{2\hat{\gamma}(1) - \hat{\gamma}(2)\}^{1/2}, & \text{if } 2\hat{\gamma}(1) \ge \hat{\gamma}(2) \ge \hat{\gamma}(1); \\ \{(\hat{\gamma}(1) + \hat{\gamma}(2))/2\}^{1/2}, & \text{if } \hat{\gamma}(2) < \hat{\gamma}(1); \\ 0, & \text{otherwise}; \end{cases}    (9)

where

    \hat{\gamma}(k) \equiv \big( \mathrm{MAD}\{Z_{t+k} - Z_t : t = 1, \ldots, n - k\} \big)^2 / 2, \qquad k = 1, 2,

is a robust estimator of the semivariogram at lag $k$.

We then estimate the deterministic signal $\beta \equiv \big( (\beta^{J_0})', \beta_{J_0}', \ldots, \beta_{J-1}' \big)'$. Since the scaling-function coefficients $w^{J_0}$ correspond to large-scale features of the signal, we estimate $\beta^{J_0}$ by

    \hat{\beta}^{J_0} = w^{J_0}.    (10)

Further, we shall assume that $\delta^{J_0} \equiv 0$ when modeling the stochastic part of the scaling function; that is, we declare the scaling-function coefficients $w^{J_0}$ to be purely deterministic.
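Before turning to the wavelet coefficients, here is a hedged sketch of the variogram estimate (9). We take the MAD in $\hat{\gamma}(k)$ to include the $0.6745$ scaling of equation (6); the chapter does not restate this, so treat it as an assumption.

```python
# Robust noise estimate from lag-1 and lag-2 differences of the data.
import numpy as np

def mad(u):
    return np.median(np.abs(u - np.median(u))) / 0.6745

def sigma_hat_variogram(Z):
    g1 = mad(Z[1:] - Z[:-1]) ** 2 / 2.0     # semivariogram at lag 1
    g2 = mad(Z[2:] - Z[:-2]) ** 2 / 2.0     # semivariogram at lag 2
    if 2.0 * g1 >= g2 >= g1:
        return np.sqrt(2.0 * g1 - g2)       # first branch of (9)
    if g2 < g1:
        return np.sqrt((g1 + g2) / 2.0)     # second branch of (9)
    return 0.0                              # otherwise
```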

For the wavelet coefficients at the $j$-th level, the deterministic trend $\beta_j$ can be regarded as coming from components that are potential outliers in the normal probability plot of $w_j$. For $k = 0, 1, \ldots, 2^j - 1$, let $q_{j,k}$ be the normal quantile corresponding to $w_{j,k}$. We estimate the slope of the fitted line in the normal probability plot by

    \hat{\sigma}_j \equiv \max\{\mathrm{MAD}(w_j), \hat{\sigma}\},

and we estimate $\beta_{j,k}$, $k = 0, 1, \ldots, 2^j - 1$, by a soft-thresholding function:

    \hat{\beta}_{j,k} = \begin{cases} w_{j,k} - \lambda_j, & \text{if } w_{j,k} > \lambda_j; \\ 0, & \text{if } |w_{j,k}| \le \lambda_j; \\ w_{j,k} + \lambda_j, & \text{if } w_{j,k} < -\lambda_j; \end{cases}    (11)

where the threshold parameter $\lambda_j$ is determined by

    \lambda_j \equiv \hat{\sigma}_j \max\big\{ |q_{j,k}| : |w_{j,k}| < \hat{\sigma}_j |q_{j,k}| \big\}.

Note that if the $w_j$ are actually normally distributed, the threshold value $\lambda_j$ will be large; therefore, only a few points will be estimated as deterministic trend components and, importantly, those values will be small.
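A sketch of the Q-Q-plot trend extraction just described, equations (10)-(11). The plotting positions $(k + 0.5)/2^j$ for the normal quantiles, and the $0.6745$ scaling inside MAD, are standard conventions assumed here rather than stated in the chapter.

```python
# Match normal quantiles to the order statistics of w_j, estimate the
# slope by max(MAD, sigma_hat), and soft-threshold at lambda_j.
import numpy as np
from scipy.stats import norm

def beta_hat_level(wj, sigma_hat):
    n = len(wj)
    q = np.empty(n)
    q[np.argsort(wj)] = norm.ppf((np.arange(n) + 0.5) / n)  # Q-Q quantiles
    mad = np.median(np.abs(wj - np.median(wj))) / 0.6745
    slope = max(mad, sigma_hat)                              # sigma_hat_j
    inside = np.abs(wj) < slope * np.abs(q)                  # near the Q-Q line
    # fallback to lambda_j = 0 when no point lies inside (an assumption)
    lam = slope * np.abs(q[inside]).max() if inside.any() else 0.0
    return np.sign(wj) * np.maximum(np.abs(wj) - lam, 0.0)   # soft rule (11)
```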


The final set of parameters to estimate is $\eta$, the vector of prior covariance parameters in (3). In the next section, we propose using a pseudo maximum likelihood estimator $\hat{\eta}$.

The wavelet shrinkage rule, which we call DecompShrink, is given by

    \hat{\theta} = \hat{\beta} + E(\delta \mid w, \hat{\beta}, \hat{\sigma}^2, \hat{\eta})
                 = \hat{\beta} + \Sigma(\hat{\eta})\big( \Sigma(\hat{\eta}) + \hat{\sigma}^2 I \big)^{-1} (w - \hat{\beta}),    (12)

where $\hat{\beta}$ is obtained from (10) and (11). Several parametric models for $\Sigma(\eta)$ are given in the next section. Hence the empirical Bayes spatial predictor of $S \equiv W_n' \theta$ can be written as

    \hat{S} \equiv W_n' \Big[ \hat{\beta} + \Sigma(\hat{\eta})\big( \Sigma(\hat{\eta}) + \hat{\sigma}^2 I \big)^{-1} (w - \hat{\beta}) \Big].

4 Prior Covariance $\Sigma(\eta)$

In this section, we consider multiscale models for the prior covariance $\Sigma(\eta)$ or, equivalently, for the corresponding stochastic component

    \delta \equiv \big( (\delta^{J_0})', \delta_{J_0}', \ldots, \delta_{J-1}' \big)'.

But first we briefly discuss estimation of the prior covariance parameters $\eta$.

4.1 Estimation of Prior Covariance Parameters

For a given prior covariance $\Sigma(\eta)$, the hyperparameter $\eta$ can be estimated by a pseudo maximum likelihood estimator, based on the distribution $p(w \mid \hat{\beta}, \hat{\sigma}^2, \eta)$. That is,

    \hat{\eta} = \arg\sup_{\eta}\ p(w \mid \hat{\beta}, \hat{\sigma}^2, \eta)
               = \arg\inf_{\eta}\ \Big\{ \log\big| \Sigma(\eta) + \hat{\sigma}^2 I \big| + (w - \hat{\beta})' \big( \Sigma(\eta) + \hat{\sigma}^2 I \big)^{-1} (w - \hat{\beta}) \Big\}.

This method of pseudo maximum likelihood estimation was made popular by Gong and Samaniego (1981).
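In the general case the infimum has no closed form; a hedged numerical sketch follows, in which make_Sigma and the starting value eta0 are user-supplied model ingredients assumed for the illustration.

```python
# Pseudo maximum likelihood for eta: minimize
#   log|Sigma(eta) + sigma2*I| + r' (Sigma(eta) + sigma2*I)^{-1} r,
# with r = w - beta_hat and the plug-in estimates held fixed.
import numpy as np
from scipy.optimize import minimize

def pseudo_mle(w, beta_hat, sigma2_hat, make_Sigma, eta0):
    r = w - beta_hat
    def neg_loglik(eta):
        C = make_Sigma(eta) + sigma2_hat * np.eye(len(w))
        _, logdet = np.linalg.slogdet(C)
        return logdet + r @ np.linalg.solve(C, r)
    return minimize(neg_loglik, eta0, method='Nelder-Mead').x
```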

4.2 Scale-Independent Models

First, we consider scale-independent models, which correspond to block-diagonal matrices $\Sigma(\eta)$. Specifically, we assume that the wavelet coefficients $\delta_j$, $j = J_0, \ldots, J-1$, are statistically independent across scales, with zero means. Therefore, we can model $\delta_j$ at each scale (or level) $j$ separately. Note that $\delta^{J_0} \equiv 0$, which corresponds to the earlier assumption that all the scaling-function coefficients $w^{J_0}$ are attributed to the deterministic trend component $\beta^{J_0}$ (Section 3).


If $\delta$ comes from a temporal process (i.e., $d = 1$), it is natural to assume a Gaussian autoregressive moving average (ARMA) model, independently for each $j$, $j = J_0, \ldots, J-1$. If $\delta$ is a $d$-dimensional process, we could specify a Gaussian Markov random field model, independently for each $j$, $j = J_0, \ldots, J-1$. If one further assumes that the wavelet coefficients are also independent within each scale, with $\mathrm{var}(\delta_j) \equiv \tau_j^2 I$, $j = J_0, \ldots, J-1$, then $\Sigma(\eta)$ becomes a diagonal matrix and $\eta = (\tau_{J_0}^2, \ldots, \tau_{J-1}^2)'$. Therefore, from (12), the DecompShrink rule based on this simple model can be written as

    \hat{\theta}_{j,k} = \hat{\beta}_{j,k} + \frac{\hat{\tau}_j^2}{\hat{\tau}_j^2 + \hat{\sigma}^2}\, (w_{j,k} - \hat{\beta}_{j,k}),    (13)

for $j = J_0, \ldots, J-1$ and $k = 0, \ldots, 2^j - 1$, where

    \hat{\tau}_j^2 = \max\Big\{ 2^{-j} (w_j - \hat{\beta}_j)' (w_j - \hat{\beta}_j) - \hat{\sigma}^2,\; 0 \Big\},    (14)

and $\hat{\sigma}^2$ is given by (9). Note that wavelet coefficients are usually sparse (i.e., most of the coefficients are essentially zero) and have a distribution that is highly non-Gaussian, with heavy tails. We are able to use a Gaussian distribution in (3) because the non-Gaussian components are accounted for by the mean $\beta$.
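Under this diagonal model, the level-$j$ DecompShrink computation is short; a sketch:

```python
# Level-wise DecompShrink: moment estimate (14) of tau_j^2, then the
# linear shrinkage (13) toward the estimated trend beta_j.
import numpy as np

def decompshrink_level(wj, beta_j, sigma2_hat):
    r = wj - beta_j
    tau2 = max(r @ r / len(wj) - sigma2_hat, 0.0)    # eq. (14)
    return beta_j + tau2 / (tau2 + sigma2_hat) * r   # eq. (13)
```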

4.3 Scale-Dependent Models

Though the discrete wavelet transform is an excellent decorrelator for a wide variety of stochastic processes, it does not yield completely uncorrelated wavelet coefficients. A natural way to describe this structure is to use scale-dependent multiscale models for the covariance $\Sigma(\eta)$. These models take the dependencies, both within scales and across scales, into account. Moreover, if $\delta$ is convolved with noise, the optimal predictor of $\delta$ can be computed efficiently using the change-of-scale Kalman-filter algorithm (Huang and Cressie, 1997b). A multiscale model consists of a series of processes $\delta_j$, $j = J_0, \ldots, J-1$, with the following Markovian structure:

    \delta_j = A_j \delta_{j-1} + \nu_j, \qquad j = J_0 + 1, \ldots, J-1,

where

    \delta_{J_0} \sim \mathrm{Gau}(0, \sigma_{J_0}^2 I), \qquad \nu_j \sim \mathrm{Gau}(0, \sigma_j^2 I), \quad j = J_0 + 1, \ldots, J-1,

all random vectors are independent of one another, and $A_{J_0+1}, \ldots, A_{J-1}$ are deterministic matrices describing the causal relations between scales.
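A small simulation sketch of this Markov structure; the map make_A supplying the $2^j \times 2^{j-1}$ matrices $A_j$, and the variance sequence, are placeholders for model ingredients the chapter leaves to Huang and Cressie (1997b).

```python
# Simulate delta_{J0}, ..., delta_{J-1} from the multiscale model:
# delta_j = A_j delta_{j-1} + nu_j, with independent Gaussian terms.
# sigma2 maps each level j to its variance sigma_j^2.
import numpy as np

def simulate_multiscale(J0, J, sigma2, make_A, seed=0):
    rng = np.random.default_rng(seed)
    delta = [rng.normal(0.0, np.sqrt(sigma2[J0]), size=2 ** J0)]
    for j in range(J0 + 1, J):
        nu = rng.normal(0.0, np.sqrt(sigma2[j]), size=2 ** j)
        delta.append(make_A(j) @ delta[-1] + nu)
    return delta                     # [delta_{J0}, ..., delta_{J-1}]
```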


For more details on these models, the reader is referred to Huang and Cressie (1997b). The performance of multiscale models as priors for wavelet coefficients will be considered elsewhere. In all that follows, we assume the simplest model of independence, both between and within scales; that is, $\Sigma(\eta)$ is a diagonal matrix with $\eta = (\tau_{J_0}^2, \ldots, \tau_{J-1}^2)'$, estimated in the manner described in Section 4.1.

5 Applications

5.1 Simulation Study

We consider three test signals $S$, each with sample size $n = 2^J = 2048$. First, consider the "blocks" signal, created by Donoho and Johnstone (1994), rescaled so that the sample standard deviation $\mathrm{SD}(S) = 7$ (see Figure 1 (a1)); second, consider a Gaussian AR(1) stationary process with autoregressive parameter $0.95$ and $\mathrm{SD}(S) = 7$ (see Figure 2 (a1)); and third, consider a mixture of the two above, with the sample variance of the "blocks" component equal to 35 and the variance of the AR(1) component equal to 14 (see Figure 3 (a1)). To each signal, standard Gaussian white noise is added, which yields $Y$. Since the noise variance $\sigma^2 = 1$, we have $\mathrm{SD}(S)/\sigma = 7$. Our goal is to reconstruct the original signal $S \equiv (S_1, \ldots, S_n)'$ from the data $Y \equiv (Y_1, \ldots, Y_n)'$, and we assume that the noise parameter $\sigma$ is unknown.

We apply our DecompShrink method with the shrinkage rule given by (13), and we compare it with two commonly used wavelet shrinkage methods, VisuShrink and SureShrink, described in Section 2. For these two methods, the noise parameter $\sigma$ is estimated by (6), based on the finest-scale wavelet coefficients, as proposed by Donoho et al. (1995). In all cases, we chose $J_0 = 5$ and a nearly symmetric wavelet with 4 vanishing moments from the family of wavelets called symmlets (Daubechies, 1992).

The noisy signals and the reconstructions from the three methods for the "blocks", AR(1), and "blocks" + AR(1) signals are shown in Figures 1, 2, and 3, respectively. From the figures, it can be seen that DecompShrink performs best, with SureShrink second and VisuShrink third, in terms of smaller prediction errors. For example, in Figure 1, it is clear that the DecompShrink reconstruction catches the jumps better than the reconstructions from the other two methods. The performance of the various wavelet methods for recovering the signal $S$ is compared using the mean-squared error (MSE) criterion:

    \mathrm{MSE}(\hat{S}) \equiv \frac{1}{n} \sum_{i=1}^{n} (\hat{S}_i - S_i)^2.


[FIGURE 1 appeared here: the noisy "blocks" signal and its VisuShrink, SureShrink, and DecompShrink reconstructions, in panels (a1)-(d1) and (a2)-(d2) on the interval [0, 1]; only axis ticks and panel labels survived extraction.]


[FIGURE 2 appeared here: the noisy AR(1) signal and its VisuShrink, SureShrink, and DecompShrink reconstructions, in panels (a1)-(d1) and (a2)-(d2) on the interval [0, 1]; only axis ticks and panel labels survived extraction.]


[FIGURE 3 appeared here: the noisy "blocks" + AR(1) signal and its VisuShrink, SureShrink, and DecompShrink reconstructions, in panels (a1)-(d1) and (a2)-(d2) on the interval [0, 1]; only axis ticks and panel labels survived extraction.]


[FIGURE 4 appeared here: side-by-side boxplots for VisuShrink, SureShrink, and DecompShrink; only the axis labels survived extraction.]

FIGURE 4. Boxplots of MSE performance for the "blocks" signal using various wavelet shrinkage techniques, based on 100 replications ($n = 2048$, $\mathrm{SD}(S) = 7$, $\sigma = 1$).

[FIGURE 5 appeared here: side-by-side boxplots for VisuShrink, SureShrink, and DecompShrink; only the axis labels survived extraction.]

FIGURE 5. Boxplots of MSE performance for the AR(1) signal using various wavelet shrinkage techniques, based on 100 replications ($n = 2048$, $\mathrm{SD}(S) = 7$, $\sigma = 1$).

Based on the random variation in the stochastic signal and in the noise, 100 replications of $Y$ were obtained, each giving a value of $\mathrm{MSE}(\hat{S})$. Figures 4, 5, and 6 show boxplots of these MSE values for the "blocks", AR(1), and "blocks" + AR(1) signals, respectively, based on the 100 replications. The simulation results are also summarized in Table 1 by averaging over the 100 replications, for each shrinkage method and each test signal. From Figures 4-6 and Table 1, we see that the MSE values obtained from the DecompShrink method have a distribution that is closer to zero than those from the other two methods. It is quite clear that DecompShrink is superior to VisuShrink and SureShrink for the recovery of these three signals. Table 2 shows the estimation of the noise parameter $\sigma$ for $n = 2048$, comparing the MAD estimator (6) with the variogram estimator (9).

[FIGURE 6 (boxplots of MSE for the "blocks" + AR(1) signal) and TABLE 1 (MSEs averaged over the 100 replications) appeared here; apart from the method labels VisuShrink, SureShrink, and DecompShrink, they did not survive extraction.]

TABLE 2. Estimation of the noise parameter $\sigma$ using the MAD and variogram methods, based on 100 replications ($n = 2048$, $\mathrm{SD}(S) = 7$, $\sigma = 1$).

                        MAD                          Variogram
                        Mean    Bias    MSE          Mean    Bias     MSE
    Blocks              1.008   0.008   0.0012       0.978   -0.022   0.0022
    AR(1)               1.655   0.655   0.4324       1.054    0.054   0.0425
    Blocks + AR(1)      1.244   0.244   0.0618       1.019    0.019   0.0094

5.2 Image Example

We use the $256 \times 256$ pixel "Boat" image, shown in Figure 7 (a1), as our test image. The signal-to-noise ratio (SNR) is used as a quantitative measure of image quality; it is expressed in decibels (dB) as

    \mathrm{SNR} \equiv 10 \log_{10} \left( \frac{\text{variance of signal}}{\text{variance of noise}} \right).

Figure 7 (b1) and Figure 7 (c1) show two boat images degraded by adding Gaussian white noise (SNR = 5.5 dB and 2 dB, respectively). The images reconstructed from Figure 7 (a1), (b1), and (c1) by the SureShrink method are shown in Figure 7 (a2), (b2), and (c2), respectively, and the reconstructions based on our DecompShrink method are shown in Figure 7 (a3), (b3), and (c3), respectively. Here $J = 8$, and we chose $J_0 = 3$ for both methods.

[FIGURE 7 appeared here: the original, degraded, SureShrink-reconstructed, and DecompShrink-reconstructed "Boat" images, panels (a1)-(c3); only axis ticks survived extraction.]

The results show that, for DecompShrink, the image reconstructed from the original image with no noise added is exactly the same as the original image (compare Figure 7 (a3) to Figure 7 (a1)), as we would hope for but would not always expect. Further, the SNR of the reconstructed images increases substantially over that of the noisy images. For example, for the boat images shown in Figure 7 (b1) and Figure 7 (b3), we see an increase in SNR from 5.5 dB to 12.04 dB and, for Figure 7 (c1) and Figure 7 (c3), an increase from 2 dB to 10.51 dB. Notice that the DecompShrink method performs somewhat better than the SureShrink method for recovering the noisy images, both in SNR and in visual quality.
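For completeness, the SNR computation used above as a two-line sketch (the arrays stand in for the images):

```python
# SNR in dB, as defined above: 10 log10(var(signal) / var(noise)).
import numpy as np

def snr_db(signal, noisy):
    return 10.0 * np.log10(np.var(signal) / np.var(noisy - signal))
```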

6 Discussion and Conclusions

In this chapter, we have investigated the empirical Bayesian spatial prediction approach to recovering signals proposed in Huang and Cressie (1997a). The DecompShrink method has an advantage over current shrinkage methods in that it is adaptive to the smoothness of the signal, regardless of whether the signal has a sparse wavelet representation: our nonzero-mean Gaussian prior for the signal not only catches the signal components


with larger wavelet coefficients, but also catches the signal components with smaller wavelet coefficients, through the prior covariance $\Sigma(\eta)$. It is interesting to note that when the signal is piecewise smooth with a sparse wavelet representation (e.g., "blocks"), we have $\hat{\tau}_j^2 \ll \hat{\sigma}^2$, $j = J_0, \ldots, J-1$, in (14). In this case, the DecompShrink rule (13) is approximately equal to a soft-thresholding function. Our simulation study shows that the method, based on a simple independence model, has MSE performance superior to the VisuShrink and SureShrink methods for both deterministic and stochastic signals. It is not surprising that VisuShrink and SureShrink do not perform well in recovering stochastic signals, since shrinkage rules based on simple hard- or soft-thresholding functions rely heavily on a sparse wavelet representation, which stochastic signals typically do not have. It is encouraging that DecompShrink does well not only for stochastic signals but also for deterministic signals with sparse wavelet representations.

It may be advantageous to put a further prior on $\beta$, or even to apply a fully Bayesian analysis. It would also be interesting to see whether more general multiscale models that account for the dependencies of wavelet coefficients both within scales and across scales (Huang and Cressie, 1997b) yield better predictors. These are subjects to be explored in the future.

Acknowledgments

This research was supported by the Office of Naval Research under grant no. N00014-93-1-0001, and was partially carried out at Iowa State University.

7 References

Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. In Antoniadis, A. and Oppenheim, G., editors, Wavelets and Statistics, volume 103 of Lecture Notes in Statistics, pages 5-14. Springer-Verlag, New York.

Abramovich, F. and Benjamini, Y. (1996). Adaptive thresholding of wavelet coefficients. Computational Statistics and Data Analysis, 22:351-361.

Abramovich, F., Sapatinas, T., and Silverman, B. W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society B, 60:725-749.

Cambanis, S. and Houdre, C. (1995). On the continuous wavelet transform of second-order random processes. IEEE Transactions on Information Theory, 41:628-642.


Chipman, H. A., Kolaczyk, E. D., and McCulloch, R. E. (1997). Adaptive Bayesian wavelet shrinkage. Journal of the American Statistical Association, 92:1413-1421.

Clyde, M., Parmigiani, G., and Vidakovic, B. (1996). Bayesian strategies for wavelet analysis. Statistical Computing and Statistical Graphics Newsletter, 7:3-9.

Clyde, M., Parmigiani, G., and Vidakovic, B. (1998). Multiple shrinkage and subset selection in wavelets. Biometrika, 85:391-401.

Crouse, M., Nowak, R., and Baraniuk, R. (1998). Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46:886-902.

Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.

Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81:425-455.

Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90:1200-1224.

Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. Annals of Statistics, 26. Forthcoming.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., and Picard, D. (1995). Wavelet shrinkage: asymptopia? (with discussion). Journal of the Royal Statistical Society B, 57:301-369.

George, E. I. and McCulloch, R. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88:881-889.

George, E. I. and McCulloch, R. (1997). Approaches to Bayesian variable selection. Statistica Sinica, 7:339-373.

Gong, G. and Samaniego, F. J. (1981). Pseudo maximum likelihood estimation: theory and applications. Annals of Statistics, 9:861-869.

Huang, H.-C. and Cressie, N. (1997a). Deterministic/stochastic wavelet decomposition for recovery of signal from noisy data. Technical Report 97-23, Department of Statistics, Iowa State University, Ames, IA.

Huang, H.-C. and Cressie, N. (1997b). Multiscale spatial modeling. In ASA 1997 Proceedings of the Section on Statistics and the Environment, pages 49-54, Alexandria, VA.


Lu, H. H.-C., Huang, S.-Y., and Tung, Y.-C. (1997). Wavelet shrinkage for nonparametric mixed-effects models. Technical report, Institute of Statistics, National Chiao-Tung University.

Mallat, S. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:674-693.

Nason, G. P. (1995). Choice of the threshold parameter in wavelet function estimation. In Antoniadis, A. and Oppenheim, G., editors, Wavelets and Statistics, volume 103 of Lecture Notes in Statistics, pages 261-280. Springer-Verlag, New York.

Nason, G. P. (1996). Wavelet shrinkage using cross-validation. Journal of the Royal Statistical Society B, 58:463-479.

Ogden, R. T. and Parzen, E. (1996a). Change-point approach to data analytic wavelet thresholding. Statistics and Computing, 6:93-99.

Ogden, R. T. and Parzen, E. (1996b). Data dependent wavelet thresholding in nonparametric regression with change-point applications. Computational Statistics and Data Analysis, 22:53-70.

Ruggeri, F. and Vidakovic, B. (1999). A Bayesian decision theoretic approach to wavelet thresholding. Statistica Sinica, 9. To appear.

Vidakovic, B. (1998). Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. Journal of the American Statistical Association, 93:173-179.

Vidakovic, B. and Muller, P. (1995). Wavelet shrinkage with affine Bayes rules with applications. Discussion Paper 95-24, ISDS, Duke University, Durham, NC.
