Kernel k-means, Spectral Clustering and Normalized Cuts
Inderjit S. Dhillon
Dept. of Computer Sciences
University of Texas at Austin
Austin, TX 78712
inderjit@cs.utexas.edu

Yuqiang Guan
Dept. of Computer Sciences
University of Texas at Austin
Austin, TX 78712
yguan@cs.utexas.edu

Brian Kulis
Dept. of Computer Sciences
University of Texas at Austin
Austin, TX 78712
kulis@cs.utexas.edu
ABSTRACT
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts, b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as speed of kernel k-means. Finally, we present results on several interesting data sets, including diametrical clustering of large gene-expression matrices and a handwriting recognition data set.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Search and Retrieval; I.5.3 [Pattern Recognition]: Clustering
General Terms
Algorithms, Theory
Keywords
Spectral Clustering, Kernel k-means, Graph Partitioning
1. INTRODUCTION
Clustering has received a significant amount of attention in the last few years as one of the fundamental problems in data mining. k-means is one of the most popular clustering algorithms. Recent research has generalized the algorithm in many ways; for example, similar algorithms for clustering can be obtained using arbitrary Bregman divergences as the distortion measure [2]. Other advances include using local search to improve the clustering results [5] and using the triangle inequality to speed up the computation [4].
A major drawback to k-means is that it cannot separate clusters that are non-linearly separable in input space. Two recent approaches have emerged for tackling such a problem. One is kernel k-means, where, before clustering, points are mapped to a higher-dimensional feature space using a nonlinear function, and then kernel k-means partitions the points by linear separators in the new space. The other approach is spectral clustering algorithms, which use the eigenvectors of an affinity matrix to obtain a clustering of the data. A popular objective function used in spectral clustering is to minimize the normalized cut [12].
On the surface, kernel k-means and spectral clustering appear to be completely different approaches. In this paper we first unite these two forms of clustering under a single framework. By generalizing the k-means objective function to use both weights and kernels, we show how the two approaches to clustering are related. Specifically, we can rewrite the weighted kernel k-means objective function as a trace maximization problem whose relaxation can be solved with eigenvectors. The result shows how a particular kernel and weight scheme is connected to the spectral algorithm of Ng, Jordan, and Weiss [10]. However, the advantage to our approach is that we can generalize the clustering algorithm to use arbitrary kernels and weights.
Further, we show that by choosing the weights in particular ways, the weighted kernel k-means objective function is identical to the normalized cut. Thus far, only eigenvector-based algorithms have been employed to minimize normalized cuts in spectral clustering and image segmentation. However, software to compute eigenvectors of large sparse matrices (often based on the Lanczos algorithm) can have substantial computational overheads, especially when a large number of eigenvectors are to be computed. In such situations, our equivalence has an important implication: we can use k-means-like iterative algorithms for directly minimizing the normalized cut of a graph.
We show the usefulness of our approach to the application of clustering gene expression data by applying a quadratic kernel (squared correlation) to obtain anti-correlated gene clusters, and we illustrate the scalability of our algorithm in terms of computation time by applying it to a large handwriting recognition data set.
A word about notation. Capital letters such as A, X, Y and Φ denote matrices; lower-case bold letters such as a, b denote column vectors; script letters such as A, B, V, E represent sets; ||a|| denotes the L2 norm of a vector; and ||X||_F denotes the Frobenius norm of a matrix, given by ||X||_F = (Σ_{i,j} X_{ij}^2)^{1/2}.

Polynomial Kernel:  κ(a, b) = (a · b + c)^d
Gaussian Kernel:    κ(a, b) = exp(−||a − b||^2 / (2σ^2))
Sigmoid Kernel:     κ(a, b) = tanh(c(a · b) + θ)

Table 1: Examples of kernel functions
2. THE ESSENTIALS
In this section, we summarize the seemingly different approaches of weighted kernel k-means and spectral clustering.
2.1 Weighted Kernel k-means
The k-means clustering algorithm can be enhanced by the use of a kernel function; by using an appropriate nonlinear mapping from the original (input) space to a higher-dimensional feature space, one can extract clusters that are non-linearly separable in input space. Furthermore, we can generalize the kernel k-means algorithm by introducing a weight for each point a, denoted by w(a). As we shall see later, this generalization is powerful and encompasses the normalized cut of a graph.
Let us denote clusters by π_j, and a partitioning of points as {π_j}_{j=1}^k. Using the non-linear function φ, the objective function of weighted kernel k-means is defined as:

D({π_j}_{j=1}^k) = Σ_{j=1}^k Σ_{a∈π_j} w(a) ||φ(a) − m_j||^2,    (1)

where

m_j = Σ_{b∈π_j} w(b) φ(b) / Σ_{b∈π_j} w(b).

Note that m_j is the "best" cluster representative since

m_j = argmin_z Σ_{a∈π_j} w(a) ||φ(a) − z||^2.

The Euclidean distance from φ(a) to center m_j is given by

||φ(a) − Σ_{b∈π_j} w(b) φ(b) / Σ_{b∈π_j} w(b)||^2
  = φ(a)·φ(a) − 2 Σ_{b∈π_j} w(b) φ(a)·φ(b) / Σ_{b∈π_j} w(b)
    + Σ_{b,c∈π_j} w(b) w(c) φ(b)·φ(c) / (Σ_{b∈π_j} w(b))^2.    (2)

The dot products φ(a)·φ(b) are computed using the kernel function κ (see Table 1 for examples of popular kernel functions), and are contained in the kernel matrix K. All computation is in the form of such inner products, hence we can replace all inner products by entries of the kernel matrix.
The weighted kernel k-means algorithm (Algorithm 1) shares many properties of standard k-means; for example, the objective function value defined in (1) monotonically decreases with each iteration.
Algorithm 1: Weighted Kernel k-means.
Weighted_Kernel_kmeans(K, k, w, C_1, ..., C_k)
Input: K: kernel matrix, k: number of clusters, w: weights for each point
Output: C_1, ..., C_k: partitioning of the points
1. Initialize the k clusters: C_1^{(0)}, ..., C_k^{(0)}.
2. Set t = 0.
3. For each point a, find its new cluster index as j*(a) = argmin_j ||φ(a) − m_j||^2, using (2).
4. Compute the updated clusters as C_j^{(t+1)} = {a : j*(a) = j}.
5. If not converged, set t = t + 1 and go to Step 3; otherwise, stop.

Assuming we are able to store the whole affinity matrix in main memory, we can analyze the time complexity of Algorithm 1. It is clear that the bottleneck is Step 3, i.e., the computation of distances. The first term in (2), φ(a)·φ(a), need not be computed since it is a constant for a and thus does not affect the assignment of a to clusters. The second term is calculated once per data point, and costs O(n) each time it is computed, leading to a cost of O(n^2) per iteration. For the third term, notice that Σ_{b,c∈π_j} w(b) w(c) φ(b)·φ(c) / (Σ_{b∈π_j} w(b))^2 is fixed for cluster j, so in each iteration it is computed once and stored. Thus the complexity is O(n^2) scalar operations per iteration. Initially, we must compute the kernel matrix K, which usually takes time O(n^2 m), where m is the dimension of the original points. If the total number of iterations is τ, then the time complexity of Algorithm 1 is O(n^2(τ + m)).
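To make the discussion concrete, here is a minimal NumPy sketch of Algorithm 1; it is not the authors' C++ implementation, and the function and variable names are my own. Distances are evaluated via (2), dropping the constant first term φ(a)·φ(a) since it does not affect assignments, and computing the third term once per cluster per iteration, as described above.

import numpy as np

def weighted_kernel_kmeans(K, k, w, labels, max_iter=100):
    """K: n x n kernel matrix, w: length-n weights, labels: initial partitioning."""
    n = K.shape[0]
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for j in range(k):
            in_j = (labels == j)
            wj = np.where(in_j, w, 0.0)        # weights restricted to cluster j
            sj = wj.sum()
            if sj == 0:                        # skip an emptied cluster
                continue
            # second term of (2): -2 * sum_b w(b) K(a, b) / s_j, for every point a
            second = -2.0 * (K @ wj) / sj
            # third term of (2): sum_{b,c} w(b) w(c) K(b, c) / s_j^2, fixed per cluster
            third = (wj @ K @ wj) / (sj * sj)
            dist[:, j] = second + third
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # converged: no assignment changed
        labels = new_labels
    return labels

With unit weights this reduces to ordinary kernel k-means; with the weights and kernel of Section 4.1 below, the same iteration monotonically decreases the normalized cut.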
2.2 Spectral Clustering
Spectral clustering has emerged recently as a popular clustering method that uses eigenvectors of a matrix derived from the data. Several algorithms have been proposed in the literature [9, 10, 12], each using the eigenvectors in slightly different ways. In this paper, we will focus on the normalized cut spectral algorithm.
2.2.1 Normalized Cuts
In [13], the authors consider the k-way normalized cut problem. We are given a graph G = (V, E, A), where V is the set of vertices, E is the set of edges connecting vertices, and A is an edge affinity matrix, assumed to be nonnegative and symmetric. For A, B ⊆ V, we define

links(A, B) = Σ_{i∈A, j∈B} A(i, j).

Then the normalized linkratio of A, B is:

normlinkratio(A, B) = links(A, B) / links(A, V).

The k-way normalized cut problem is to minimize the links that escape a cluster relative to the total "weight" of the cluster. For a k-way partitioning of the vertices, we are interested in solving the following problem:

minimize (1/k) Σ_{j=1}^k normlinkratio(V_j, V \ V_j).

The authors of [13] obtain the following spectral relaxation to this problem: let D be the diagonal matrix whose (i, i) entry is the sum of the entries of row i in matrix A. Then the normalized cut criterion is equivalent to the following trace maximization problem:

maximize (1/k) trace(Z^T A Z),

where Z = X(X^T D X)^{−1/2}, and X is an n × k indicator matrix for the partitions. Note that Z^T D Z = I_k.

Letting Z̃ = D^{1/2} Z and relaxing the constraint that X is an indicator matrix results in the following problem: maximize the trace of Z̃^T D^{−1/2} A D^{−1/2} Z̃, where the constraints on Z̃ are relaxed such that Z̃^T Z̃ = I_k. A well-known solution to this problem is obtained by setting the matrix Z̃ to be the top k eigenvectors of the matrix D^{−1/2} A D^{−1/2}. These eigenvectors are then used to compute a discrete partitioning of the points.
3. THE SPECTRAL CONNECTION
At first glance, weighted kernel k-means and normalized cuts using spectral clustering appear to be quite different. After all, spectral clustering uses eigenvectors to help determine the partitions, whereas eigenvectors do not appear to figure in kernel k-means. However, we saw that the normalized cut problem can be expressed as a trace maximization problem, and in this section, we show how we can express weighted kernel k-means as a trace maximization problem as well. This will show how to connect the two methods of clustering.
For ease in presentation, let us denote the "distortion" of a cluster π_j to be d(π_j) = Σ_{a∈π_j} w(a) ||φ(a) − m_j||^2. Then we have that D({π_j}_{j=1}^k) = Σ_{j=1}^k d(π_j). Moreover, let us denote, for a cluster π_j, the sum of the w weights of the points in π_j to be s_j; in other words, s_j = Σ_{a∈π_j} w(a). Finally, let us denote W to be the diagonal matrix of all the w weights, and W_j to be the diagonal matrix of the weights in π_j. Then we can rewrite the mean vector m_j as

m_j = Φ_j W_j e / s_j,

where Φ_j is the matrix of points associated with cluster π_j (after the φ mapping), i.e., Φ = [φ(a_1), φ(a_2), ..., φ(a_n)], and e is the vector of all ones of appropriate size.

We can rewrite the distortion of cluster π_j to be:

d(π_j) = Σ_{a∈π_j} w(a) ||φ(a) − m_j||^2
       = Σ_{a∈π_j} w(a) ||φ(a) − Φ_j W_j e / s_j||^2
       = ||(Φ_j − Φ_j W_j e e^T / s_j) W_j^{1/2}||_F^2
       = ||Φ_j W_j^{1/2} (I − W_j^{1/2} e e^T W_j^{1/2} / s_j)||_F^2.

Using the fact that trace(A A^T) = trace(A^T A) = ||A||_F^2, and noting that I − W_j^{1/2} e e^T W_j^{1/2} / s_j = P is an orthogonal projection, i.e., P^2 = P since s_j = e^T W_j e, we get that

d(π_j) = trace(Φ_j W_j^{1/2} (I − W_j^{1/2} e e^T W_j^{1/2} / s_j)^2 W_j^{1/2} Φ_j^T)
       = trace(Φ_j W_j^{1/2} (I − W_j^{1/2} e e^T W_j^{1/2} / s_j) W_j^{1/2} Φ_j^T)
       = trace(W_j^{1/2} Φ_j^T Φ_j W_j^{1/2}) − e^T W_j Φ_j^T Φ_j W_j e / s_j.

If we represent the full matrix of points as Φ = [Φ_1, Φ_2, ..., Φ_k], then we have that

D({π_j}_{j=1}^k) = trace(W^{1/2} Φ^T Φ W^{1/2}) − trace(Y^T W^{1/2} Φ^T Φ W^{1/2} Y),

where Y is the n × k block-diagonal matrix

Y = diag(W_1^{1/2} e / √s_1, W_2^{1/2} e / √s_2, ..., W_k^{1/2} e / √s_k),

whose j-th column carries the vector W_j^{1/2} e / √s_j in the rows belonging to π_j and zeros elsewhere. Note that Y is an n × k orthonormal matrix, i.e., Y^T Y = I. Since trace(Φ W Φ^T) is a constant, the minimization of the objective function in (1) is equivalent to the maximization of trace(Y^T W^{1/2} Φ^T Φ W^{1/2} Y). The matrix Φ^T Φ is simply the kernel matrix K of the data, so we can rewrite it as the maximization of trace(Y^T W^{1/2} K W^{1/2} Y).

A standard result in linear algebra [8] provides a global solution to a relaxed version of this problem. By allowing Y to be an arbitrary orthonormal matrix, we can obtain an optimal Y by taking the top k eigenvectors of W^{1/2} K W^{1/2}. Similarly, the sum of the top k eigenvalues of W^{1/2} K W^{1/2} gives the optimal trace value.
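The identity above is easy to check numerically. The following sketch (toy data, a linear kernel, and all names are my own) builds the orthonormal indicator matrix Y for a fixed partitioning, evaluates the objective in its trace form, and compares the trace term with its relaxed upper bound, the sum of the top k eigenvalues of W^{1/2} K W^{1/2}.

import numpy as np

n, k = 8, 3
X = np.random.rand(n, 2)
K = X @ X.T                                   # linear kernel, so K = Phi^T Phi
w = np.random.rand(n) + 0.5                   # positive weights
labels = np.arange(n) % k                     # a fixed partitioning with no empty cluster

# column j of Y equals W_j^{1/2} e / sqrt(s_j) on cluster j and zero elsewhere
Y = np.zeros((n, k))
for j in range(k):
    in_j = (labels == j)
    Y[in_j, j] = np.sqrt(w[in_j]) / np.sqrt(w[in_j].sum())

W_half = np.diag(np.sqrt(w))
M = W_half @ K @ W_half
trace_term = np.trace(Y.T @ M @ Y)
objective = np.trace(np.diag(w) @ K) - trace_term        # equals D({pi_j}) of (1)
relaxed_max = np.sort(np.linalg.eigvalsh(M))[-k:].sum()   # upper bound on trace_term
print(objective, trace_term, relaxed_max)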
4. IMPLICATIONS
The previous two sections show that the seemingly unrelated graph cut problem and weighted kernel k-means problem can both be written as trace maximization problems. This hints at a connection between these problems. We now make this connection precise, and discuss its implications.
4.1 Normalized Cuts using Weighted Kernel k-means
As discussed in Section 2.2.1, the normalized cut problem can be recast as a trace maximization problem, where we attempt to maximize trace(Z̃^T D^{−1/2} A D^{−1/2} Z̃). A simple calculation reveals that Z̃ is analogous to the matrix Y from the previous section.

We now show a direct relationship between the trace maximizations of the normalized cut and kernel k-means problems. Consider weighted kernel k-means with W = D and K = D^{−1} A D^{−1}. The trace maximization of weighted kernel k-means is then trace(Y^T D^{−1/2} A D^{−1/2} Y), which is equivalent to the trace maximization for normalized cut. If the affinity matrix K is positive definite, we can use the weighted kernel k-means procedure described in Algorithm 1 in order to minimize the normalized cut (positive definiteness allows us to factor K into Φ^T Φ, and allows us to prove convergence). Indeed, any starting partition can potentially be improved by Algorithm 1.
One advantage to our use of an iterative algorithm for these graph problems is that we can use different improvement methods, such as local search, to increase the quality of the results. In situations where eigenvector computation is difficult, for example, when the affinity matrix is large and sparse, and many eigenvectors are desired, our iterative algorithm is particularly useful.
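A minimal sketch of this recipe (the helper name is mine; weighted_kernel_kmeans refers to the Section 2.1 sketch): set the weights to the row sums of the affinity matrix and the kernel to D^{−1} A D^{−1}, then run Algorithm 1. As noted above, the monotonic decrease of the normalized cut is guaranteed when the resulting kernel matrix is positive definite.

import numpy as np

def normalized_cut_setup(A):
    """Map a nonnegative, symmetric affinity matrix A to the (weights, kernel)
    pair under which Algorithm 1 monotonically decreases the normalized cut."""
    d = A.sum(axis=1)                  # degrees: the diagonal of D
    K = A / np.outer(d, d)             # K = D^{-1} A D^{-1}
    return d, K

# usage on a toy graph (hypothetical); initial_labels is any starting partition
A = np.array([[0., 1, 1, 0],
              [1., 0, 1, 0],
              [1., 1, 0, 1],
              [0., 0, 1, 0]])
w, K = normalized_cut_setup(A)
# labels = weighted_kernel_kmeans(K, k=2, w=w, labels=initial_labels)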
4.2 Kernel k-means using Eigenvectors

The reformulation of the kernel k-means objective function allows us to solve a relaxed problem using the eigenvectors of the matrix W^{1/2} K W^{1/2}. This yields a spectral approach to minimizing the objective function: we first compute the top k eigenvectors of the matrix W^{1/2} K W^{1/2}. This maps the original points to a lower-dimensional space. Once postprocessing is performed and a discrete clustering solution has been attained, one can treat the resulting partitioning as a good initialization to kernel k-means on the full data set. This two-layer approach, first running spectral clustering to get an initial partitioning and then refining the partitioning by running kernel k-means on it, typically results in a robust partitioning of the data.
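A sketch of this two-layer approach follows; the discretization step shown here, plain Lloyd iterations on the rows of the eigenvector matrix, is one common choice and an assumption on my part, not necessarily the authors' postprocessing.

import numpy as np

def spectral_initialization(K, w, k, rounds=20, seed=0):
    """Embed points with the top k eigenvectors of W^{1/2} K W^{1/2}, then run a
    few Lloyd iterations on the embedding; the result can initialize kernel k-means."""
    rng = np.random.default_rng(seed)
    W_half = np.diag(np.sqrt(w))
    vals, vecs = np.linalg.eigh(W_half @ K @ W_half)
    U = vecs[:, -k:]                                   # top k eigenvectors, one row per point
    centers = U[rng.choice(len(U), size=k, replace=False)]
    labels = np.zeros(len(U), dtype=int)
    for _ in range(rounds):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels      # pass as the initial partitioning to weighted kernel k-means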
4.3 Interpreting NJW As Kernel k-means

The results from the previous sections give us novel ways to interpret the spectral algorithm of Ng, Jordan, and Weiss [10]. Their algorithm first computes the kernel matrix K, where the kernel that is used is the Gaussian kernel. They compute a diagonal matrix D such that the diagonal entries of D are the sums of the rows of K. Then they compute the eigenvectors of the matrix D^{−1/2} K D^{−1/2}, and form a discrete clustering using these eigenvectors.
Hence, we see that the NJW algorithm can be viewed as either a spectral relaxation to the weighted kernel k-means objective function or as a normalized cut problem. The connection to normalized cuts is clear: we view the affinity matrix K in the spectral algorithm as defining the edge weights of a graph, and their algorithm attempts to minimize the normalized cut in this graph.
5. SCALABILITY ISSUES

In this section, we discuss methods for scaling the kernel k-means algorithm to large data sets.

To speed up the distance computation in our weighted kernel k-means algorithm, we can adapt the pruning procedure used in [4]. The idea behind the acceleration scheme is that we can use the triangle inequality to avoid unnecessary computation. We compute the distances between corresponding new and old centers, ||m_j^n − m_j^o|| for all j, and store the information in a k × k matrix D. Similarly, we keep a k × n matrix L that contains a lower bound for the distance from each point to each center. The distance from a point to its cluster center is exact in L. After the centers are updated, we estimate the lower bound from each point a to a new cluster center, say m_j^n, to be the difference between the lower bound from a to m_j^o and ||m_j^n − m_j^o||. We actually compute the distance from a to m_j^n only if this estimate is smaller than the distance from a to its current cluster center. Figure 3 shows significant computation savings due to this estimation.
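A sketch of the pruning test (the structure and names are mine, modeled on the description above): after the centers move, each stored lower bound is reduced by the movement of its center; the exact kernel-space distance to another center is then recomputed only when the reduced bound could beat the point's distance to its own updated center.

import numpy as np

def prune_mask(L, center_shift, labels, dist_to_own):
    """L:            k x n lower bounds on point-to-center distances (previous iteration)
       center_shift: length-k vector of ||m_j_new - m_j_old||
       labels:       current cluster index of each point
       dist_to_own:  exact distance of each point to its own updated center
       Returns updated lower bounds and a k x n mask of distances to recompute."""
    L = np.maximum(L - center_shift[:, None], 0.0)       # still valid lower bounds
    must_compute = L < dist_to_own[None, :]              # only these could change an assignment
    must_compute[labels, np.arange(L.shape[1])] = False  # own center is already exact
    return L, must_compute

In the worst case all nk distances are still computed, but as the centers stabilize most entries of the mask become False, which is the effect reported in Figure 3.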
6. EXPERIMENTAL RESULTS

We now provide experimental results to validate the usefulness of the results presented in the previous sections. We first illustrate "diametric clustering" of genes with degree-2 polynomial kernel k-means. Then, with the handwriting recognition data set, we show that using eigenvectors to initialize kernel k-means gives better initial and final objective function values and better clustering results. Thus the theoretical connection between spectral clustering and kernel k-means helps in obtaining higher quality results. Finally, we show that our distance estimation techniques save a considerable amount of computation time, verifying the scalability of our approach.
6.1 Data sets

The human fibroblast gene expression dataset has 517 genes across 12 conditions, and the yeast dataset of Rosetta Inpharmatics has 5246 genes across 300 conditions. They are used and preprocessed as in [6].
The Pendigits dataset was downloaded from the UCI machine learning repository (ftp://ftp.ics.uci.edu/pub/machine-learning-databases/pendigits); it contains (x, y) coordinates of hand-written digits. This dataset contains 7494 training digits and 3498 testing digits. Each digit is represented as a vector in 16-dimensional space.
6.2 Implementation details

Our kernel k-means algorithm is implemented in C++ and all experiments are done on a PC (Linux, two AMD 1.19 GHz processors, 1 GB main memory). In our implementation, we store the kernel matrix in main memory. All the plots are generated using Matlab.
6.3 Results

Diametrical Clustering of Gene Expression Data

In gene expression clustering, identifying anti-correlated relationships among genes is important, as it has been observed that genes whose expression patterns are strongly anti-correlated may also be functionally similar. We show that degree-2 polynomial kernel k-means can identify anti-correlated genes as was done in [6]. We cluster human fibroblast genes into 5 clusters. Then for each cluster A_i, i = 1, ..., 5, we compute the dot product of each gene vector and the leading eigenvector of A_i A_i^T, and plot genes across experiments in red or blue depending on whether the dot product value is positive or negative. The first 3 plots of Figure 1 show some sample results. Then we cluster 5246 yeast genes into 40 clusters. This took approximately 4.5 minutes, including forming the kernel matrix and clustering. The last 3 plots in Figure 1 correspond to one cluster of yeast genes. We magnify three parts of one cluster plot across 300 experiments in order to show the details. From the plots we see that degree-2 polynomial kernel k-means captures anti-correlation similar to that captured by the diametric clustering algorithm. For example, cluster 2 of the human fibroblast dataset includes a number of genes involved in inter-cellular signaling, inflammation, angiogenesis and re-epithelialization, such as IL1beta, thrombomodulin, IL8, etc., corresponding to Figure 3d in [6].
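A small sketch of this setup follows. The preprocessing shown, centering each gene profile and scaling it to unit norm so that dot products become correlations and the degree-2 polynomial kernel becomes the squared correlation, is my assumption about how the quadratic kernel is realized; see [6] for the actual preprocessing.

import numpy as np

def squared_correlation_kernel(X):
    """X: genes x conditions expression matrix.  Returns the degree-2 polynomial
    kernel (a . b)^2 on standardized profiles, i.e. the squared correlation."""
    Xc = X - X.mean(axis=1, keepdims=True)             # center each gene profile
    Xc = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)  # unit-norm rows
    return (Xc @ Xc.T) ** 2

# K = squared_correlation_kernel(expression_matrix)    # then cluster with Algorithm 1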
Figure 1: In each figure are plotted the mean expression profiles of two opposed clusters obtained on the human fibroblast dataset (first 3 plots) and the Rosetta dataset (last 3 plots). The clustering algorithm used is degree-2 polynomial kernel k-means.

Clustering Handwriting Recognition Data Set

Figure 2: Two sample runs of sigmoid kernel k-means clustering of the Pendigits.tes dataset; a = 0.0045 and b = 0.11 are used.

            initial    final    NMI
random      .0213      .0062    .666
spectral    .0081      .0059    .698

Table 2: Average initial and final objective function values and normalized mutual information values over 10 runs for the Pendigits.tes dataset.

Figure 3: Computation savings due to triangle inequality estimation.

Figure 4: Computation time required to cluster the whole Pendigits dataset.

Since the eigenvectors of the kernel matrix are the optimizers of a relaxed trace maximization problem, using the output of spectral clustering to initialize kernel k-means can often produce better clustering results than pure random initialization. To illustrate this, we run sigmoid kernel k-means on the Pendigits.tes dataset 10 times with random initialization, then average the initial and final objective function values. Then we run sigmoid kernel k-means 10 times again,
but initialized with the output of the spectral clustering algorithm. We also compute the average of the initial and final objective function values. For this dataset, since we know the underlying class labels, we can evaluate clustering results by forming a confusion matrix, where entry (i, j), n_i^{(j)}, gives the number of points in cluster i and class j. From such a confusion matrix, we compute normalized mutual information (NMI) as

NMI = 2 Σ_{l=1}^{k} Σ_{h=1}^{c} (n_l^{(h)}/n) log( n_l^{(h)} n / (Σ_{i=1}^{k} n_i^{(h)} Σ_{i=1}^{c} n_l^{(i)}) ) / (H(π) + H(ζ)),

where c is the number of classes, H(π) = −Σ_{i=1}^{k} (n_i/n) log(n_i/n), and H(ζ) = −Σ_{j=1}^{c} (n^{(j)}/n) log(n^{(j)}/n). A high NMI value indicates that the clustering and the true class labels match well. In Table 2, we compare the average initial and final objective function values as well as the average normalized mutual information values of 10 runs each with random initialization and spectral initialization. Clearly, using spectral initialization can improve clustering quality. Figure 2 shows two sample runs of sigmoid kernel k-means clustering, one with random initialization and the other with spectral initialization.
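For reference, a short sketch (mine, not the paper's evaluation code) of computing NMI from a confusion matrix N, where N[i, h] is the number of points in cluster i and class h, following the formula above:

import numpy as np

def normalized_mutual_information(N):
    N = np.asarray(N, dtype=float)
    n = N.sum()
    p_cluster = N.sum(axis=1) / n                    # n_i / n
    p_class = N.sum(axis=0) / n                      # n^{(h)} / n
    mi = 0.0
    for i in range(N.shape[0]):
        for h in range(N.shape[1]):
            if N[i, h] > 0:
                p = N[i, h] / n
                mi += p * np.log(p / (p_cluster[i] * p_class[h]))
    h_pi = -np.sum(p_cluster[p_cluster > 0] * np.log(p_cluster[p_cluster > 0]))
    h_zeta = -np.sum(p_class[p_class > 0] * np.log(p_class[p_class > 0]))
    return 2.0 * mi / (h_pi + h_zeta)

# example: a 2-cluster, 2-class confusion matrix with toy counts
print(normalized_mutual_information([[50, 3], [2, 45]]))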
The bottleneck in Algorithm 1 is the computation of Euclidean distances in kernel space. In order to avoid unnecessary computation, we incorporate the triangle inequality estimation mentioned in Section 5 into our kernel k-means software. Figure 3 shows the considerable savings in the number of Euclidean distance computations as the iteration count increases in a typical run of Algorithm 1 on the whole Pendigits dataset, which contains 10992 digits. Without using the estimation, in every iteration nk distance calculations, 109920 in this case, need to be performed. However, after incorporating the estimation, we save considerable computation; for example, in the ninth iteration only 621 distances need to be computed. Figure 4 gives the computation time taken to cluster 10992 digits into a varied number of clusters. Times for both the original clustering algorithm and the one with distance estimation are shown. The distance estimation technique yields more savings in computation time as the number of clusters grows.

Figure 5: Objective function value of kernel k-means and normalized cut values monotonically decrease in Algorithm 1. Corresponding objective function value and cut value at each iteration differ by a constant.
Figure 5 shows the objective function values of kernel k-means and the corresponding normalized cut values at each iteration of Algorithm 1 on the human fibroblast gene data. Again, a degree-2 polynomial kernel is used and 5 diametrical clusters are generated. We see that cut values decrease monotonically along with the objective function value of the kernel k-means algorithm. As can be seen, the difference between the corresponding cut value and objective function value at each iteration is a constant, which is equal to k − trace(D^{−1}A).
7. RELATED WORK

Our work has been largely inspired by recent results on spectral clustering and relaxation methods presented in [10, 13, 1, 14]. In [14], the authors show that the traditional k-means objective function can be recast as a trace maximization problem of the Gram matrix for the original data. We generalize their work to the case when non-linear kernels are used, plus we treat the weighted version of the kernel k-means algorithm, which allows us to encompass spectral algorithms that use normalized cuts.
Although focusing on different issues, [13, 1] also discuss a relation to normalized cuts, as well as a method for finding a good discrete clustering from the eigenvector matrix. In [1], the authors hint at a way to run an iterative algorithm for normalized cuts, but their algorithm considers the factorization of a semi-definite matrix K such that K = GG^T, which takes O(n^3) time, and thus is computationally worse than our kernel k-means approach.

The notion of using a kernel to enhance the k-means objective function was first described in [11]. Kernel-based learning methods have appeared in a number of other areas, especially in the area of Support Vector Machines [3].
In [7], the objective function was also recast as a trace maximization, but the authors developed an EM-style algorithm to solve the kernel k-means problem.
8. CONCLUSION AND FUTURE WORK

In this paper, we have presented a theoretical connection between weighted kernel k-means and spectral clustering. We show that the weighted kernel k-means formulation is very general, and that the normalized cut objective can be recast as a special case of the weighted kernel k-means objective function. We also show that, given weights and a kernel matrix, it is possible to derive a spectral algorithm that solves a relaxed version of the objective function. We also provide new interpretations of the spectral algorithm of Ng, Jordan, and Weiss, while generalizing it to use other kernels, such as the degree-2 kernel for diametric clustering.

In future work, we would like to incorporate alternate objectives, such as ratio cut and ratio association, into our framework. So far, we have assumed that the affinity matrix is positive definite. In the future, we would like to be able to handle indefinite matrices.

Acknowledgments. This research was supported by NSF CAREER Award No. ACI-0093404, Texas Advanced Research Program grant 003658-0431-2001 and NSF ITR Award No. IIS-0325116.
9. REFERENCES

[1] F. Bach and M. Jordan. Learning spectral clustering. In Proc. of NIPS-16. MIT Press, 2004.
[2] A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh. Clustering with Bregman divergences. In Proceedings of the SIAM Data Mining Conference, pages 234-245, 2004.
[3] N. Cristianini and J. Shawe-Taylor. Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, U.K., 2000.
[4] I. S. Dhillon, J. Fan, and Y. Guan. Efficient clustering of very large document collections. In Data Mining for Scientific and Engineering Applications, pages 357-381. Kluwer Academic Publishers, 2001.
[5] I. S. Dhillon, Y. Guan, and J. Kogan. Iterative clustering of high dimensional text data augmented by local search. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 131-138, 2002.
[6] I. S. Dhillon, E. M. Marcotte, and U. Roshan. Diametrical clustering for identifying anti-correlated gene clusters. Bioinformatics, 19(13):1612-1619, September 2003.
[7] M. Girolami. Mercer kernel based clustering in feature space. IEEE Transactions on Neural Networks, 13(4):669-688, 2002.
[8] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
[9] R. Kannan, S. Vempala, and A. Vetta. On clusterings - good, bad, and spectral. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000.
[10] A. Y. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proc. of NIPS-14, 2001.
[11] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.
[12] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888-905, August 2000.
[13] S. X. Yu and J. Shi. Multiclass spectral clustering. In International Conference on Computer Vision, 2003.
[14] H. Zha, C. Ding, M. Gu, X. He, and H. Simon. Spectral relaxation for k-means clustering. In Neural Information Processing Systems, 2001.