
[1] Xu Junling, Xu Baowen, Zhang Weifeng, et al. Heuristic feature selection method for clustering [J]. Journal of Southeast University (English Edition), 2006, 22 (2): 169-175. [doi:10.3969/j.issn.1003-7985.2006.02.006]


Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
22
Issue:
2
Page:
169-175
Research Field:
Computer Science and Engineering
Publishing date:
2006-06-30

Info

Title:
Heuristic feature selection method for clustering
Author(s):
Xu Junling(1), Xu Baowen(1,2), Zhang Weifeng(1,3), Cui Zifeng(1)
1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
3 College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
feature selection; clustering; unsupervised learning
PACS:
TP391
DOI:
10.3969/j.issn.1003-7985.2006.02.006
Abstract:
In order to enable clustering to be carried out in a lower-dimensional space, a new feature selection method for clustering is proposed. The method has three steps, all performed within a wrapper framework. First, all the original features are ranked according to their importance; an evaluation function E(f), used to evaluate the importance of a feature, is introduced. Secondly, the set of important features is selected sequentially. Finally, possibly redundant features are removed from the important feature subset. Because the features are selected sequentially, it is not necessary to search the large space of feature subsets, so efficiency is improved. Experimental results show that the method finds the set of features important for clustering and discards features that are unimportant or that may hinder the clustering task.
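The abstract outlines the three-step wrapper procedure but does not define E(f) here, so the following is only a minimal sketch of that style of procedure, not the authors' algorithm: it assumes k-means as the clusterer and the silhouette score as a hypothetical stand-in for the importance/evaluation function, using scikit-learn and NumPy.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def feature_importance(X, f, k=3, seed=0):
        # Stand-in for the paper's E(f): cluster on the single feature f
        # and score the resulting partition (silhouette used as a proxy).
        col = X[:, [f]]
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(col)
        return silhouette_score(col, labels)

    def cluster_score(X, feats, k=3, seed=0):
        # Wrapper evaluation of a candidate feature subset.
        sub = X[:, feats]
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(sub)
        return silhouette_score(sub, labels)

    def select_features(X, k=3, min_gain=0.01):
        n_features = X.shape[1]
        # Step 1: rank all features by their importance score.
        ranked = sorted(range(n_features),
                        key=lambda f: feature_importance(X, f, k),
                        reverse=True)
        # Step 2: add features sequentially while clustering quality keeps improving.
        selected, best = [], -np.inf
        for f in ranked:
            cand = selected + [f]
            score = cluster_score(X, cand, k)
            if score > best + min_gain:
                selected, best = cand, score
        # Step 3: drop features whose removal does not hurt (possible redundancy).
        for f in list(selected):
            rest = [g for g in selected if g != f]
            if not rest:
                continue
            score = cluster_score(X, rest, k)
            if score >= best - min_gain:
                selected, best = rest, score
        return selected

Any clusterer and any clustering-quality criterion could be substituted for these stand-ins; the sequential structure is what avoids searching the exponential space of feature subsets.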

Memo

Memo:
Biographies: Xu Junling (1984—), male, graduate; Xu Baowen (corresponding author), male, doctor, professor, bwxu@seu.edu.cn.
Last Update: 2006-06-20