
[1] Ji Xianghua, Chen Chao, Shao Zhengrong, Yu Nenghai. Fuzzy c-means text clustering based on topic concept sub-space [J]. Journal of Southeast University (English Edition), 2007, 23(3): 439-442. [doi:10.3969/j.issn.1003-7985.2007.03.028]

Fuzzy c-means text clustering based on topic concept sub-space

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
23
Issue:
3 (2007)
Page:
439-442
Research Field:
Computer Science and Engineering
Publishing date:
2007-09-30

Info

Title:
Fuzzy c-means text clustering based on topic concept sub-space
Author(s):
Ji Xianghua¹ (吉翔华), Chen Chao² (陈超), Shao Zhengrong² (邵正荣), Yu Nenghai¹ (俞能海)
¹ MOE-MS Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230027, China
² Library, University of Science and Technology of China, Hefei 230027, China
Keywords:
TCS2FCM; topic concept space; fuzzy c-means clustering; text clustering
CLC number:
TP391
DOI:
10.3969/j.issn.1003-7985.2007.03.028
Abstract:
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Key phrases are extracted using a weighted combination of five evaluation functions. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet®. The initial centers and the initial membership matrix are the most important factors affecting clustering performance. Mutually orthogonal topic concept sub-spaces are built from the concept phrases that represent the topics of the texts, and the centers and the membership matrix are initialized from the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization tied to the text content can improve clustering precision.
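
The initialization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that documents are already given as concept-weight vectors, that each topic sub-space is a disjoint group of coordinates (which makes the sub-spaces mutually orthogonal), and that a document's initial membership in a topic is the share of its squared norm lying in that group. The function names and the toy data are hypothetical; key-phrase extraction and the WordNet mapping are omitted.

```python
import math

def init_membership(docs, subspaces):
    """Content-based initial membership (an assumed scheme): the share of
    each document's squared norm that falls in each topic's coordinates."""
    c = len(subspaces)
    u = []
    for x in docs:
        energy = [sum(x[j] ** 2 for j in dims) for dims in subspaces]
        total = sum(energy)
        # Fall back to uniform membership for an all-zero document.
        u.append([e / total for e in energy] if total > 0 else [1.0 / c] * c)
    return u

def update_centers(docs, u, m, eps=1e-12):
    """Standard FCM center update: mean of documents weighted by u^m."""
    n, dim, c = len(docs), len(docs[0]), len(u[0])
    centers = []
    for k in range(c):
        w = [u[i][k] ** m for i in range(n)]
        total = max(sum(w), eps)  # guard against an empty cluster
        centers.append(
            [sum(w[i] * docs[i][j] for i in range(n)) / total for j in range(dim)]
        )
    return centers

def update_membership(docs, centers, m, eps=1e-12):
    """Standard FCM membership update: u proportional to distance^(-2/(m-1))."""
    u = []
    for x in docs:
        w = [max(math.dist(x, v), eps) ** (-2.0 / (m - 1.0)) for v in centers]
        s = sum(w)
        u.append([wk / s for wk in w])
    return u

def fcm(docs, u, m=2.0, iters=50):
    """Alternate the two FCM updates, starting from the supplied membership."""
    for _ in range(iters):
        centers = update_centers(docs, u, m)
        u = update_membership(docs, centers, m)
    return u, centers

# Toy data: four documents over four concept axes, two topic sub-spaces.
docs = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.8, 0.9],
    [0.0, 0.2, 0.9, 0.7],
]
subspaces = [[0, 1], [2, 3]]

u0 = init_membership(docs, subspaces)
u, centers = fcm(docs, u0)
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
print(labels)
```

Replacing `init_membership` with a random row-normalized matrix gives the traditional random initialization that the abstract compares against.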


Memo

Memo:
Biographies: Ji Xianghua (1982-), male, graduate; Yu Nenghai (corresponding author), male, doctor, professor, ynh@ustc.edu.cn.
Last Update: 2007-09-20