
[1] Luo Na, Zuo Wanli, Yuan Fuyu, et al. Using ontology semantics to improve text documents clustering [J]. Journal of Southeast University (English Edition), 2006, 22 (3): 370-374. [doi:10.3969/j.issn.1003-7985.2006.03.017]

Using ontology semantics to improve text documents clustering

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
22
Issue:
3
Page:
370-374
Research Field:
Automation
Publishing date:
2006-09-30

Info

Title:
Using ontology semantics to improve text documents clustering
Author(s):
Luo Na (1, 2), Zuo Wanli (1), Yuan Fuyu (1), Zhang Jingbo (2), Zhang Huijie (2)
(1) College of Computer Science and Technology, Jilin University, Changchun 130012, China
(2) School of Computer Science, Northeast Normal University, Changchun 130024, China
Keywords:
ontology; text clustering; lexicon; WordNet
CLC number:
TP181
DOI:
10.3969/j.issn.1003-7985.2006.03.017
Abstract:
To improve both the clustering results and the selection among them, ontology semantics are combined with document clustering. A new document clustering algorithm that applies WordNet in the document-processing phase is proposed. First, after the documents are represented by tf-idf, each word vector is extended with new entities. Then a feature extraction algorithm is applied to the documents. Finally, an ontology aggregation clustering (OAC) algorithm is proposed to improve the document clustering results. Experiments are conducted on the Reuters and 20 Newsgroups data sets, and the results are compared with those obtained using mutual information (MI). The experiments show that the proposed ontology-based document clustering algorithm performs better than existing clustering algorithms such as MNB, CLUTO and co-clustering.
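The abstract does not say exactly how the WordNet entities are added to the tf-idf vectors. The Python sketch below shows one plausible way such an extension can work, assuming a single hypernym per term and a concept weight equal to the sum of the tf-idf weights of the terms that map to it. The `HYPERNYMS` table, the `concept:` feature prefix, and all function names are hypothetical illustrations, not the authors' method; a real system would look up synsets and hypernyms in WordNet instead of using a hand-written table.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet (assumption): maps a term to one hypernym concept.
HYPERNYMS = {"dog": "animal", "cat": "animal", "stock": "finance", "market": "finance"}


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (tf = count/len, idf = ln(N/df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def extend_with_concepts(vec, hypernyms=HYPERNYMS):
    """Add a concept feature whose weight is the sum of the tf-idf weights of its member terms."""
    extended = dict(vec)
    for term, weight in vec.items():
        concept = hypernyms.get(term)
        if concept is not None:
            key = "concept:" + concept  # prefix keeps concepts apart from ordinary words
            extended[key] = extended.get(key, 0.0) + weight
    return extended


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [["dog", "bark"], ["cat", "purr"], ["stock", "market"]]
plain = tfidf(docs)
ext = [extend_with_concepts(v) for v in plain]
print(round(cosine(plain[0], plain[1]), 3))  # 0.0: no shared words
print(round(cosine(ext[0], ext[1]), 3))      # 0.333: shared concept "animal"
```

With plain tf-idf, "dog bark" and "cat purr" share no terms and their cosine similarity is 0; after the extension both carry the concept feature `animal`, so their similarity rises to 1/3 while the finance document stays unrelated. This is the kind of effect that ontology-based extension aims for before clustering.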


Memo

Memo:
Biographies: Luo Na (born 1980), female, graduate student; Zuo Wanli (corresponding author), male, PhD, professor, wanli@jlu.edu.cn.
Last Update: 2006-09-20