
[1] Zeng Hong, Lu Wei, Song Aiguo. Gaussian mixture model clustering with completed likelihood minimum message length criterion [J]. Journal of Southeast University (English Edition), 2013, 29(1): 43-47. [doi:10.3969/j.issn.1003-7985.2013.01.009]

Gaussian mixture model clustering with completed likelihood minimum message length criterion

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
29
Issue:
2013, No. 1
Page:
43-47
Research Field:
Automation
Publishing date:
2013-03-20

Info

Title:
Gaussian mixture model clustering with completed likelihood minimum message length criterion
Author(s):
Zeng Hong¹, Lu Wei², Song Aiguo¹
¹ School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
² College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
曾洪1 卢伟2 宋爱国1
1东南大学仪器科学与工程学院, 南京 210096; 2南京农业大学工学院, 南京 210031
Keywords:
Gaussian mixture model; non-Gaussian distribution; model selection; expectation-maximization algorithm; completed likelihood minimum message length criterion
CLC number:
TP181
DOI:
10.3969/j.issn.1003-7985.2013.01.009
Abstract:
An improved Gaussian mixture model (GMM)-based clustering method is proposed for the difficult case in which the true distribution of the data does not match the assumed GMM. First, an improved model selection criterion, the completed likelihood minimum message length criterion, is derived. It measures both the goodness-of-fit of the candidate GMM to the data and the goodness-of-partition of the data. Second, using the proposed criterion as the clustering objective function, an improved expectation-maximization (EM) algorithm is developed to estimate the model parameters; compared with the standard EM algorithm, it is better able to avoid poor local optima. The experimental results demonstrate that, when the data distribution does not match the assumed GMM, the proposed method can rectify the over-fitting tendency of representative GMM-based clustering approaches and robustly provide more accurate clustering results.
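
The abstract does not state the criterion itself, so the following is only a rough sketch of the general idea of scoring a candidate GMM on both fit and partition quality. It is not the authors' completed likelihood minimum message length criterion: as an assumption, it swaps in a BIC-style penalty and adds the entropy of the posterior assignments, which is the form of the well-known integrated completed likelihood (ICL) approximation. The function names, the 1-D setting, and the toy data are all hypothetical.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def responsibilities(x, weights, means, variances):
    """Return (log p(x), posterior probability of each component) for one point."""
    logs = [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    log_px = top + math.log(sum(math.exp(l - top) for l in logs))
    return log_px, [math.exp(l - log_px) for l in logs]

def scores(data, weights, means, variances):
    """Return (fit_only, completed); lower is better for both.

    fit_only  : negative log-likelihood plus a BIC-style penalty (assumed
                stand-in for the paper's MML term).
    completed : fit_only plus the entropy of the soft assignments, which
                penalizes models whose components overlap.
    """
    n, k = len(data), len(weights)
    log_lik, entropy = 0.0, 0.0
    for x in data:
        log_px, post = responsibilities(x, weights, means, variances)
        log_lik += log_px
        entropy -= sum(p * math.log(p) for p in post if p > 0.0)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    fit_only = -log_lik + 0.5 * n_params * math.log(n)
    return fit_only, fit_only + entropy

# Toy data: two well-separated groups.
data = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]

# Candidate A: one component per group.
fit_a, comp_a = scores(data, [0.5, 0.5], [-5.0, 5.0], [1.0, 1.0])
# Candidate B: same mixture density, but the right-hand group is
# represented by two identical, fully overlapping components.
fit_b, comp_b = scores(data, [0.5, 0.25, 0.25], [-5.0, 5.0, 5.0], [1.0, 1.0, 1.0])

print("A: fit-only", fit_a, "completed", comp_a)
print("B: fit-only", fit_b, "completed", comp_b)
```

In this toy case both candidates describe the same density, so the fit-only score separates them only through the parameter count, while the completed score additionally charges candidate B for splitting one group across two indistinguishable components. That extra charge is the kind of "goodness-of-partition" signal the abstract refers to, shown here under the stated assumptions only.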


Memo

Biography: Zeng Hong (1981-), male, Ph.D., lecturer, hzeng@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61105048, 60972165), the Doctoral Fund of Ministry of Education of China (No. 20110092120034), the Natural Science Foundation of Jiangsu Province (No. BK2010240), the Technology Foundation for Selected Overseas Chinese Scholar, Ministry of Human Resources and Social Security of China (No. 6722000008), and the Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK201005).
Citation: Zeng Hong, Lu Wei, Song Aiguo. Gaussian mixture model clustering with completed likelihood minimum message length criterion [J]. Journal of Southeast University (English Edition), 2013, 29(1): 43-47. [doi:10.3969/j.issn.1003-7985.2013.01.009]
Last Update: 2013-03-20