
[1] Ji Xianghua, Chen Chao, Shao Zhengrong, Yu Nenghai. Fuzzy c-means text clustering based on topic concept sub-space [J]. Journal of Southeast University (English Edition), 2007, 23(3): 439-442. [doi:10.3969/j.issn.1003-7985.2007.03.028]

Fuzzy c-means text clustering based on topic concept sub-space

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

2007, No. 3
Research Field:
Computer Science and Engineering

Ji Xianghua1, Chen Chao2, Shao Zhengrong2, Yu Nenghai1
1. MOE-MS Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230027, China
2. Library, University of Science and Technology of China, Hefei 230027, China
Keywords: TCS2FCM; topic concept space; fuzzy c-means clustering; text clustering
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for clustering texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet®. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal topic concept sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, in contrast to the random initialization of traditional fuzzy c-means clustering, initialization tied to the content of the texts can improve clustering precision.
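
As a rough illustration of why the initial centers matter, the Python sketch below runs standard fuzzy c-means with the centers seeded along topic axes instead of at random. It is not the authors' TCS2FCM implementation: the two-dimensional concept vectors, the axis-aligned seed centers, and all function names are assumptions made for this example, and key-phrase extraction and the WordNet step are left out entirely.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(dj == 0.0 for dj in d):
            # The point sits exactly on one or more centers: split membership among them.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return u


def update_centers(points, u, m=2.0):
    """Standard FCM center update: mean of the points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(
            [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until the memberships stop changing."""
    centers = [list(c) for c in init_centers]
    u = memberships(points, centers, m)
    for _ in range(max_iter):
        centers = update_centers(points, u, m)
        new_u = memberships(points, centers, m)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical concept vectors: each text is scored against two orthogonal topic axes.
docs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]

# Assumed content-based seeding: one initial center per topic axis (not random).
seed_centers = [[1.0, 0.0], [0.0, 1.0]]

centers, u = fuzzy_c_means(docs, seed_centers)
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding one center per topic axis means cluster j starts out aligned with topic j, so the final clusters can be described by their topic phrases. With random seeding the same data may converge to the same partition, but the cluster indices carry no such meaning and the result can vary between runs.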




Biographies: Ji Xianghua (1982-), male, graduate; Yu Nenghai (corresponding author), male, doctor, professor, ynh@ustc.edu.cn.
Last Update: 2007-09-20