[1] Lu Yansheng, Hu Rong, Zou Lei, Zhou Chong, et al. Mining maximal pattern-based subspace clusters in high dimensional space [J]. Journal of Southeast University (English Edition), 2006, 22 (4): 490-495. [doi:10.3969/j.issn.1003-7985.2006.04.010]

Mining maximal pattern-based subspace clusters in high dimensional space

Maximal subspace clustering based on pattern similarity in high-dimensional space (translated from the Chinese title)

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 22
Issue:: 4 (2006)

Page:: 490-495

Research Field:: Computer Science and Engineering

Publishing date:: 2006-12-30

Info

Title:: Mining maximal pattern-based subspace clusters in high dimensional space

Title (translated from Chinese):: Maximal subspace clustering based on pattern similarity in high-dimensional space

Author(s):: Lu Yansheng, Hu Rong, Zou Lei, Zhou Chong; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Author(s) (Chinese):: 卢炎生, 胡蓉, 邹磊, 周翀; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074

Keywords:: subspace clustering; pattern similarity; maximal pattern-based subspace clusters

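Keywords (translated from Chinese):: subspace clustering; pattern similarity; maximal subspace clustering based on pattern similarity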

PACS:: TP311

DOI:: 10.3969/j.issn.1003-7985.2006.04.010

Abstract:: This paper studies pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as its similarity measure. Unlike most traditional clustering algorithms, which group objects whose values are close in all dimensions or in a set of dimensions, clustering by pattern similarity groups objects that exhibit a coherent pattern of rise and fall in subspaces. A novel approach named EMaPle is designed to mine the maximal pattern-based subspace clusters. EMaPle searches for clusters only in the attribute enumeration space, which is small relative to the large number of row combinations in typical datasets, and it exploits novel pruning techniques. EMaPle can find clusters that satisfy coherence constraints and size constraints, as well as the sign constraints neglected in MaPle. Experiments on both synthetic and real data sets demonstrate that EMaPle is more effective and scalable than MaPle.

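Abstract (translated from Chinese):: This paper studies pattern-based subspace clustering, which uses pattern similarity as the similarity measure. Unlike traditional clustering methods, which group objects that are close in distance over all dimensions or a subset of dimensions, pattern similarity looks for an interesting kind of pattern: objects that rise and fall coherently on a subset of dimensions. A method named EMaPle is proposed for mining maximal pattern-based subspace clusters. Because the number of attributes in a dataset is generally far smaller than the number of objects, EMaPle searches for clusters only in the attribute enumeration space and then applies several pruning strategies. The method can find clusters that simultaneously satisfy the coherence constraint, the size constraint, and the sign constraint ignored by MaPle. Experimental results on synthetic and real datasets show that the algorithm outperforms the original MaPle algorithm.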
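
The abstract does not define pattern similarity formally. One common formalization from the earlier pCluster work (Wang et al., listed in the references) scores a 2x2 submatrix on objects x, y and attributes a, b as pScore = |(d_xa - d_xb) - (d_ya - d_yb)|, and calls a set of objects and attributes a delta-pCluster when every such 2x2 submatrix has pScore at most delta. The sketch below illustrates only that coherence check. It is not EMaPle itself: the function names and toy data are hypothetical, and the size constraints, sign constraints, maximality test, and pruning mentioned in the abstract are not modeled.

```python
from itertools import combinations


def p_score(data, x, y, a, b):
    """pScore of the 2x2 submatrix on objects x, y and attributes a, b."""
    return abs((data[x][a] - data[x][b]) - (data[y][a] - data[y][b]))


def is_delta_pcluster(data, objects, attributes, delta):
    """True if every 2x2 submatrix of objects x attributes has pScore <= delta."""
    return all(
        p_score(data, x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(attributes, 2)
    )


# Hypothetical toy data: rows are objects, columns are attributes.
data = [
    [1.0, 5.0, 3.0, 9.0],     # object 0
    [11.0, 15.0, 13.0, 2.0],  # object 1: object 0 shifted by +10 on attributes 0-2
    [4.0, 8.5, 6.0, 7.0],     # object 2: nearly the same rise and fall on attributes 0-2
]

# The three objects are far apart in value but move together on attributes 0-2.
coherent = is_delta_pcluster(data, [0, 1, 2], [0, 1, 2], delta=1.0)
# Adding attribute 3 breaks the shared pattern.
with_extra_attribute = is_delta_pcluster(data, [0, 1, 2], [0, 1, 2, 3], delta=1.0)
```

Objects 0 and 1 differ by a constant offset of 10 on attributes 0-2, so a distance-based method would separate them, while the pattern-based check groups them. This brute-force check examines every pair of objects and every pair of attributes, which is why enumeration order and pruning matter on real data.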

References:

[1] Han J, Kamber M. Data mining: concepts and techniques [M]. San Francisco: Morgan Kaufmann, 2001.
[2] Cheng Y, Church G M. Biclustering of expression data [A]. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology [C]. San Diego, CA, 2000. 93-103.
[3] Han J, Ng R T. Efficient and effective clustering method for spatial data mining [A]. In: Proceedings of the 8th International Conference on Very Large Data Bases [C]. Santiago, Chile, 1994. 144-155.
[4] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [A]. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining [C]. Portland, Oregon, 1996. 226-231.
[5] Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases [A]. In: Proceedings of ACM SIGMOD International Conference on Management of Data [C]. Seattle, USA, 1998. 73-84.
[6] Beyer K, Goldstein J, Ramakrishnan R, et al. When is “nearest neighbor” meaningful? [A]. In: Proceedings of the 7th International Conference on Database Theory [C]. Jerusalem, Israel, 1999. 217-235.
[7] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Seattle, USA, 1998. 94-105.
[8] Aggarwal C C, Procopiuc C, Wolf J L, et al. Fast algorithms for projected clustering [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Pennsylvania, USA, 1999. 61-72.
[9] Aggarwal C C, Yu P S. Finding generalized projected clusters in high dimensional spaces [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Dallas, Texas, 2000. 70-81.
[10] Goil S, Nagesh H, Choudhary A. Mafia: efficient and scalable subspace clustering for very large data sets [R]. Evanston: Northwestern University, 1999.
[11] Yang J, Wang W, Wang H, et al. Delta-clusters: capturing subspace correlation in a large data set [A]. In: Proceedings of the 18th International Conference on Data Engineering [C]. San Jose, CA, 2002. 517-528.
[12] Wang H, Wang W, Yang J, et al. Clustering by pattern similarity in large data sets [A]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. Madison, Wisconsin, 2002. 394-405.
[13] Pei J, Zhang X, Cho M, et al. MaPle: a fast algorithm for maximal pattern-based clustering [A]. In: Proceedings of the Third IEEE International Conference on Data Mining [C]. Florida, USA, 2003. 259-266.

Memo

Memo:: Biography: Lu Yansheng (1949-), male, professor, lys@mail.hust.edu.cn.

Last Update: 2006-12-20
