
[1] Lu Yansheng, Hu Rong, Zou Lei, Zhou Chong. Mining maximal pattern-based subspace clusters in high dimensional space [J]. Journal of Southeast University (English Edition), 2006, 22 (4): 490-495. [doi:10.3969/j.issn.1003-7985.2006.04.010]

Mining maximal pattern-based subspace clusters in high dimensional space

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
22
Issue:
4 (2006)
Page:
490-495
Research Field:
Computer Science and Engineering
Publishing date:
2006-12-30

Info

Title:
Mining maximal pattern-based subspace clusters in high dimensional space
Author(s):
Lu Yansheng (卢炎生), Hu Rong (胡蓉), Zou Lei (邹磊), Zhou Chong (周翀)
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Keywords:
subspace clustering; pattern similarity; maximal pattern-based subspace clusters
CLC number:
TP311
DOI:
10.3969/j.issn.1003-7985.2006.04.010
Abstract:
This paper studies pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as its similarity measure. Unlike most traditional clustering algorithms, which group objects whose values are close in all dimensions or in a subset of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall across a subspace. A novel approach named EMaPle is designed to mine maximal pattern-based subspace clusters. Because typical data sets have far fewer attributes than objects, EMaPle searches for clusters only in the attribute enumeration space, which is small compared with the number of row combinations, and it exploits novel pruning techniques. EMaPle finds clusters that satisfy coherence constraints, size constraints, and the sign constraints neglected by MaPle. Experiments on both synthetic and real data sets demonstrate that EMaPle is more effective and scalable than MaPle.
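The notion of a "coherent pattern of rise and fall" can be illustrated with a small sketch. The abstract does not give a formal definition, so the code below assumes the pScore measure from the pattern-similarity work listed as reference [12]: for two objects x, y and two attributes a, b, pScore = |(x_a - y_a) - (x_b - y_b)|, and a set of objects over a set of attributes is treated as coherent when every such 2x2 submatrix has pScore at most a threshold delta. The function names and the toy data are hypothetical, and the sketch checks the coherence condition only; it does not implement EMaPle's attribute enumeration, its pruning, or its size and sign constraints.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on attributes a, b."""
    return abs((x[a] - y[a]) - (x[b] - y[b]))


def is_coherent(rows, attrs, delta):
    """True if every pair of rows and pair of attributes has pScore <= delta."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(rows, 2)
        for a, b in combinations(attrs, 2)
    )


# Hypothetical toy data: three objects over four attributes.
# On attributes 0-2 the objects are shifted copies of each other,
# so they rise and fall together even though their values are far apart.
data = [
    [1.0, 5.0, 3.0, 9.0],
    [11.0, 15.0, 13.0, 2.0],
    [21.0, 25.0, 23.0, 7.0],
]

print(is_coherent(data, [0, 1, 2], delta=0.5))     # subspace {0, 1, 2}
print(is_coherent(data, [0, 1, 2, 3], delta=0.5))  # attribute 3 breaks the pattern
```

The three objects are never close in value, so a distance-based method would not group them, yet they form a coherent cluster in the subspace of attributes 0 to 2. Adding attribute 3 breaks the shared pattern.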

References:

[1] Han J, Kamber M. Data mining: concepts and techniques [M]. San Francisco: Morgan Kaufmann, 2001.
[2] Cheng Y, Church G M. Biclustering of expression data [A]. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology [C]. San Diego, CA, 2000. 93-103.
[3] Ng R T, Han J. Efficient and effective clustering methods for spatial data mining [A]. In: Proceedings of the 20th International Conference on Very Large Data Bases [C]. Santiago, Chile, 1994. 144-155.
[4] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [A]. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining [C]. Portland, Oregon, 1996. 226-231.
[5] Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Seattle, USA, 1998. 73-84.
[6] Beyer K, Goldstein J, Ramakrishnan R, et al. When is "nearest neighbor" meaningful? [A]. In: Proceedings of the 7th International Conference on Database Theory [C]. Jerusalem, Israel, 1999. 217-235.
[7] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Seattle, USA, 1998. 94-105.
[8] Aggarwal C C, Procopiuc C, Wolf J L, et al. Fast algorithms for projected clustering [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Pennsylvania, USA, 1999. 61-72.
[9] Aggarwal C C, Yu P S. Finding generalized projected clusters in high dimensional spaces [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Dallas, Texas, 2000. 70-81.
[10] Goil S, Nagesh H, Choudhary A. MAFIA: efficient and scalable subspace clustering for very large data sets [R]. Evanston: Northwestern University, 1999.
[11] Yang J, Wang W, Wang H, et al. Delta-clusters: capturing subspace correlation in a large data set [A]. In: Proceedings of the 18th International Conference on Data Engineering [C]. San Jose, CA, 2002. 517-528.
[12] Wang H, Wang W, Yang J, et al. Clustering by pattern similarity in large data sets [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Madison, Wisconsin, 2002. 394-405.
[13] Pei J, Zhang X, Cho M, et al. MaPle: a fast algorithm for maximal pattern-based clustering [A]. In: Proceedings of the Third IEEE International Conference on Data Mining [C]. Florida, USA, 2003. 259-266.

Memo

Memo:
Biography: Lu Yansheng(1949—), male, professor, lys@mail.hust.edu.cn.
Last Update: 2006-12-20