
[1] Xu Junling, Xu Baowen, Zhang Weifeng, et al. Heuristic feature selection method for clustering [J]. Journal of Southeast University (English Edition), 2006, 22 (2): 169-175. [doi:10.3969/j.issn.1003-7985.2006.02.006]

Heuristic feature selection method for clustering

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
22
Issue:
2
Page:
169-175
Research Field:
Computer Science and Engineering
Publishing date:
2006-06-30

Info

Title:
Heuristic feature selection method for clustering
Author(s):
Xu Junling1 Xu Baowen1 2 Zhang Weifeng1 3 Cui Zifeng1
1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
3 College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
feature selection; clustering; unsupervised learning
PACS:
TP391
DOI:
10.3969/j.issn.1003-7985.2006.02.006
Abstract:
In order to enable clustering to be performed in a lower-dimensional space, a new feature selection method for clustering is proposed. The method consists of three steps, all carried out in a wrapper framework. First, all the original features are ranked according to their importance, using an evaluation function E(f) introduced to measure the importance of a feature. Secondly, the set of important features is selected sequentially. Finally, possible redundant features are removed from the important feature subset. Because the features are selected sequentially, there is no need to search the large space of feature subsets, so efficiency is improved. Experimental results show that this method finds the set of features important for clustering and discards unimportant features as well as features that may hinder the clustering task.
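The three-step procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the evaluation function E(f) and the cluster-quality criterion here are stand-ins (a between/within scatter ratio computed over a toy k-means), since the paper's exact definitions are not reproduced on this page.

```python
# Sketch of the three-step wrapper scheme: (1) rank features by a score
# standing in for E(f), (2) add features sequentially while clustering
# quality improves, (3) drop redundant features. All scoring functions
# below are hypothetical stand-ins, not the paper's definitions.
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Minimal k-means; returns labels and total within-cluster scatter."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    within = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return labels, within

def quality(X, k=2):
    """Cluster quality: between/within scatter ratio (higher is better)."""
    _, within = kmeans(X, k)
    total = ((X - X.mean(0)) ** 2).sum()
    return (total - within) / (within + 1e-12)

def select_features(X, k=2, eps=1e-3):
    n_feats = X.shape[1]
    # Step 1: rank features by importance (stand-in for E(f): quality of
    # clustering on each feature alone, evaluated in the wrapper).
    scores = [quality(X[:, [f]], k) for f in range(n_feats)]
    order = np.argsort(scores)[::-1]
    # Step 2: select important features sequentially, keeping a feature
    # only if it improves the clustering quality.
    selected, best = [], -np.inf
    for f in order:
        q = quality(X[:, selected + [int(f)]], k)
        if q > best + eps:
            selected.append(int(f))
            best = q
    # Step 3: remove redundant features whose deletion does not hurt quality.
    for f in list(selected):
        rest = [g for g in selected if g != f]
        if rest and quality(X[:, rest], k) >= best - eps:
            selected = rest
            best = quality(X[:, rest], k)
    return selected
```

On a toy data set with one well-separated informative feature and one pure-noise feature, the sequential pass tends to keep only the informative feature, mirroring the claim that features hindering the clustering task are discarded without searching the exponential space of feature subsets.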

References:

[1] Dy J G, Brodley C E. Feature subset selection and order identification for unsupervised learning [A]. In: Proceedings of the 17th International Conference on Machine Learning [C]. Stanford, 2000. 247-254.
[2] Han J, Kamber M. Data mining: concepts and techniques [M]. San Francisco: Morgan Kaufmann, 2001.
[3] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications [A]. In: Proceedings of ACM SIGMOD International Conference on Management of Data [C]. Seattle, Washington, USA, 1998. 94-105.
[4] Ganti V, Gehrke J, Ramakrishnan R. CACTUS—clustering categorical data using summaries [A]. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [C]. San Diego, 1999. 73-83.
[5] Dash M, Liu H. Feature selection for clustering [A]. In: Proceedings of the Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining [C]. Kyoto, Japan, 2000. 110-121.
[6] Kohavi R, John G H. Wrappers for feature subset selection [J]. Artificial Intelligence, 1997, 97(1-2): 273-324.
[7] Talavera L. Dependency-based feature selection for clustering symbolic data [J]. Intelligent Data Analysis, 2000, 4(1): 19-28.
[8] Dash M, Choi K, Scheuermann P, et al. Feature selection for clustering—a filter solution [A]. In: Proceedings of the 2002 IEEE International Conference on Data Mining [C]. Maebashi City, Japan, 2002. 115-122.
[9] Aggarwal C C, Procopiuc C M, Wolf J L, et al. Fast algorithms for projected clustering [A]. In: Proceedings of ACM SIGMOD International Conference on Management of Data [C]. Philadelphia, 1999. 61-72.
[10] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases [A]. In: Proceedings of the 20th International Conference on Very Large Data Bases [C]. Santiago, Chile, 1994. 487-499.
[11] Kim Y S, Street W N, Menczer F. Feature selection in unsupervised learning via evolutionary search [A]. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [C]. Boston, 2000. 365-369.
[12] Dy J G, Brodley C E. Visualization and interactive feature selection for unsupervised data [A]. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [C]. Boston, 2000. 360-364.
[13] Devaney M, Ram A. Efficient feature selection in conceptual clustering [A]. In: Proceedings of the 14th International Conference on Machine Learning [C]. Nashville, Tennessee, USA, 1997. 92-97.
[14] Talavera L. Feature selection as a preprocessing step for hierarchical clustering [A]. In: Proceedings of the 16th International Conference on Machine Learning [C]. Bled, Slovenia, 1999. 389-397.
[15] Fukunaga K. Statistical pattern recognition [M]. 2nd ed. San Diego: Academic Press, 1990.
[16] Blake C, Merz C. UCI repository of machine learning databases [EB/OL]. http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998/2005-10-05.

Memo

Memo:
Biographies: Xu Junling (born 1984), male, graduate student; Xu Baowen (corresponding author), male, Ph.D., professor, bwxu@seu.edu.cn.
Last Update: 2006-06-20