
Wang Hongwei, Yi Lei, Wang Jianhui. New text classification algorithm based on interdependence and equivalent radius [J]. Journal of Southeast University (English Edition), 2007, 23 (1): 63-69. [doi:10.3969/j.issn.1003-7985.2007.01.014]

New text classification algorithm based on interdependence and equivalent radius

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
23
Issue:
1 (2007)
Page:
63-69
Research Field:
Automation
Publication date:
2007-03-30

Info

Title:
New text classification algorithm based on interdependence and equivalent radius
Author(s):
Wang Hongwei¹, Yi Lei², Wang Jianhui³
¹School of Economics and Management, Tongji University, Shanghai 200092, China
²Department of Mathematics, Fudan University, Shanghai 200433, China
³School of Information Science and Engineering, Fudan University, Shanghai 200433, China
Keywords:
classification; equivalent radius; vector space; interdependence; interdependence and equivalent radius
CLC number:
TP139
DOI:
10.3969/j.issn.1003-7985.2007.01.014
Abstract:
To overcome the highly complicated computation and poor scalability of traditional classification methods, such as those based on the vector space model (VSM), a new classification method, called IER, is presented based on two new concepts: interdependence and equivalent radius. In IER, attributes are selected according to their interdependence values, and the classification rule is based on the equivalent radius and the center of gravity. Algorithm analysis shows that IER can classify large numbers of samples with higher scalability and lower computational complexity. Several experiments on classifying Chinese texts show that IER outperforms the k-nearest neighbor (kNN) method and classification based on the center of classes (CCC), so IER can be used online to automatically classify large numbers of samples while keeping higher precision and recall.
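The abstract does not define interdependence or equivalent radius, so the following is only an illustrative sketch of what a classification rule built on a class's center of gravity plus an equivalent radius might look like, not the authors' IER algorithm. It assumes documents are already numeric vectors, that a class's "equivalent radius" is the mean Euclidean distance of its training vectors to the class center, and that a document is assigned to the class whose center is nearest relative to that radius. The function names `train`, `classify`, and `nearest_centroid` are hypothetical, and attribute selection by interdependence is not modelled.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean (center of gravity) of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples):
    """samples: iterable of (vector, label) pairs.

    Returns {label: (center, radius)}. ASSUMPTION (not from the paper):
    the per-class "equivalent radius" is the mean Euclidean distance of
    the class's training vectors to their center of gravity.
    """
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    model = {}
    for label, vecs in by_class.items():
        center = centroid(vecs)
        radius = sum(euclidean(v, center) for v in vecs) / len(vecs)
        # Guard against a zero radius when all members of a class coincide.
        model[label] = (center, max(radius, 1e-12))
    return model


def classify(model, vec):
    """ASSUMED rule: distance to each class center, scaled by that class's radius."""
    return min(model, key=lambda lbl: euclidean(vec, model[lbl][0]) / model[lbl][1])


def nearest_centroid(model, vec):
    """Plain center-of-class rule for comparison: ignores the radius."""
    return min(model, key=lambda lbl: euclidean(vec, model[lbl][0]))


if __name__ == "__main__":
    data = [([0, 0], "a"), ([0, 2], "a"), ([2, 0], "a"), ([2, 2], "a"),
            ([10, 10], "b"), ([10, 11], "b")]
    model = train(data)
    print(classify(model, [6, 6]), nearest_centroid(model, [6, 6]))  # a b
```

In this toy data the query [6, 6] lies closer to the center of the tight class "b" (about 6.02 versus 7.07), but measured in units of each class's radius it is 5 radii from "a" and about 12 radii from "b". The radius-scaled rule therefore picks "a", while the plain center-of-class rule picks "b"; this is the kind of difference a per-class radius is meant to capture.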


Memo:
Biography: Wang Hongwei (born 1973), male, Ph.D., lecturer, hwwang@mail.tongji.edu.cn.
Last Update: 2007-03-20