function [W,Covar,priors]=mlgmm(x,c) % Usage: [W,Covar,priors]=mlgmm(x,c) % Interactively develop Gaussian mixture model for a class of training data % x: K x N matrix. K samples, N dimensional feature vectors % W: c x N with c clusters % Covar: N x N x c covariance matrices % use netlab routines: gmm__.m, consist.m % use mfiles: kmeansf.m, kmeantest.m, csm.m % c: if omitted, or c = 0, means # of cluster interactively determined (default) % else, c clusters will be used. % (C) 2001 by Yu Hen Hu % created: 9/26/2001 % modified: 10/22/2001 add input argument c disp('mlgmm is running ...'); [K,N]=size(x); er = 1e-5; % assume N > 1 xmean=mean(x); xstd=std(x); itmax=min(K,30); count=1; if nargin < 2 | c == 0, % interactively determine # of clusters % preliminary estimate of possible number of clusters covx=(x-ones(K,1)*xmean)'*(x-ones(K,1)*xmean)/K; % covariance matrix of x [vx,dx]=eig(covx); % eigenvalue decomposition of covx dx0=diag(dx); % make dx a vector [tmp,idx]=sort(-dx0); % sort dx from large to small ex=dx0(idx); vx=vx(:,idx); % sort column orders of eigenvectors too figure(1),clf, subplot(211),plot([1:N],xmean,'-',[1:N],xstd,':'), legend('Mean','Std Dev'), subplot(212), stem([1:N],ex),title('eigenvalues of covariance matrix'), disp(['# of larger eigenvalues, if < ' int2str(N) ' may hint on # of clusters']); figure(2), if N > 2, imagesc(x'),colormap('gray'),title('display of vectors'), elseif N==2, plot(x(:,1),x(:,2),'.g'),title('distribution of data'), end end while count==1, % while not done yet, if nargin<2|c==0, c=input('Enter number of cluster to use: '); if isempty(c)|c<=0, c=input('# of cluster must be a positive integer. Please enter again: '); end W0 = vx(:,1:c)'; % initilize with eigenvectors else W0=0.1*randn(c,N)+ones(c,1)*xmean; % intialize cluster center end [W,iter,Sw,Sb,Covar]=kmeansf(x,W0,er,itmax); if iterPrints out error values. options(14) = 1000; % Max. number of iterations. options(5) = 1; % prevent covariance collapse [mix, options, errlog] = gmmem(mix, x, options); NumIts=size(errlog,1); if NumIts == options(14) warning(' EM ran out of iterations!',1); end W=mix.centres; Covar=mix.covars; priors=mix.priors; % Validity of clustering using different criteria % (1) if c > 1, display center distance table, and variance table if c > 1, cdtable=dist(W,W); disp('the distance between cluster centers are: ') cdtable disp(['averge cluster-cluster center distance = ' ... num2str(sum(sum(cdtable))/(c*c-c))]); stdtable=[]; for i=1:c, stdtable=[stdtable sqrt(sort(eig(Covar(:,:,i))))]; end disp('standard deviations of each Gaussian cluster are (each column per cluster):') disp('last row is the ratio between smallest to largest std dev for each cluster') stdtable=[stdtable; stdtable(1,:)./stdtable(N,:)], % N+1 X C end % (2) visualize the clustered data for up to 6 clusters [dd,member]=kmeantest(x,W); % display each cluster if c <=6 % if feature dimension <=2, plot the points, otherwise % visualize the feature vector using imagesc if c <=6, figure(1),clf, if N > 2, for i=1:c, if c > 1, eval(['subplot(' int2str(c) '2' int2str(i) '),']), end imagesc(x(find(member==i),:)'), title(['cluster # ' int2str(i)]), end elseif N==2, % figure out the extent of x in the current class xmax=max(x); xmin=min(x); axis=[xmin(1)-.1 xmax(1)+.1 xmin(2)-.1 xmax(2)+.1]; hold on, for i=1:c, idx=find(member==i); plot(x(idx,1),x(idx,2),'.'), end hold off, elseif N==1, xmax=max(x); xmin=min(x); axis=[xmin(1)-.1 xmax(1)+.1 -.1 .1]; hold on, for i=1:c, idx=find(member==i); plot(x(idx,1),0,'.'), end hold off, end % if N > 2 end % if c < 6 % (3) evaluate clustering separations measure (CSM) if c > 1, std=zeros(1,c); for i=1:c, std(i)=mean(sqrt(sort(eig(Covar(:,:,i))))); % std is 1 x c end gsmeasure=csm(W,std'); disp(['Cluster separation measure: CSM(' int2str(c) ') = ' num2str(gsmeasure)]); end if nargin < 2 | c == 0, disp('Enter 0 (default) if current # of clusters is satisfactory.'); chos=input('Otherwise, enter 1 or other number to continue: '); if isempty(chos)|chos==0, count=0; % done else count=1; % not done end end end