
[1] Zhang Weifeng, Xu Baowen, et al. Document classification approach by rough-set-based corner classification neural network [J]. Journal of Southeast University (English Edition), 2006, 22(3): 439-444. [doi:10.3969/j.issn.1003-7985.2006.03.032]

Document classification approach by rough-set-based corner classification neural network
一种基于粗糙集角分类神经网络的文档分类方法

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
22
Issue:
2006, No. 3
Page:
439-444
Research Field:
Automation
Publishing date:
2006-09-30

Info

Title:
Document classification approach by rough-set-based corner classification neural network
一种基于粗糙集角分类神经网络的文档分类方法
Author(s):
Zhang Weifeng(1,2,3), Xu Baowen(2,3), Cui Zifeng(2,3), Xu Junling(2,3)
1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. College of Computer Science and Engineering, Southeast University, Nanjing 210096, China
3. Jiangsu Institute of Software Quality, Nanjing 210096, China
张卫丰(1,2,3), 徐宝文(2,3), 崔自峰(2,3), 徐峻岭(2,3)
1. 南京邮电大学计算机学院, 南京 210003; 2. 东南大学计算机科学与工程学院, 南京 210096; 3. 江苏省软件质量研究所, 南京 210096
Keywords:
document classification; neural network; rough set; meta search engine
文档分类; 神经网络; 粗糙集; 元搜索引擎
CLC number:
TP183
DOI:
10.3969/j.issn.1003-7985.2006.03.032
Abstract:
A rough-set-based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as the representation of documents of different sizes, document feature selection and document feature encoding. In the Rough-CC4, documents are described by the equivalence classes of approximate words (near-synonyms). This reduces the number of dimensions needed to represent a document, which addresses the precision problems caused by differing document sizes and also blurs the differences between approximate words. The Rough-CC4 introduces a binary encoding method that encodes the importance of a document relative to each equivalence class. This encoding greatly improves the precision of the Rough-CC4 and reduces its space complexity. The Rough-CC4 can be used for the automatic classification of documents.
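To address the problems of representing documents of different sizes, selecting document features and encoding document features during document classification, a rough-set-based corner classification neural network, Rough-CC4, is proposed. Near-synonyms are grouped into equivalence classes, and documents are represented by these classes; this reduces the dimensionality of the document representation, resolves the precision problems caused by differing document sizes, and blurs the differences between near-synonyms. Encoding document features with a binary encoding method improves the precision of Rough-CC4 while reducing its space complexity. Rough-CC4 can be widely used for the automatic classification of large document collections.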
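The two ideas in the abstract, representing a document by equivalence classes of near-synonyms and binary-encoding its importance for each class, can be illustrated with a small Python sketch. This is not the encoding used in the paper, which the abstract does not specify. The word classes, the 2-bit code width, the use of relative frequency as the "importance" measure, and the names `EQUIVALENCE_CLASSES`, `class_counts` and `encode_document` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical equivalence classes of near-synonymous words (illustrative only).
EQUIVALENCE_CLASSES = [
    {"car", "automobile", "vehicle"},
    {"buy", "purchase"},
    {"fast", "quick", "rapid"},
]

BITS_PER_CLASS = 2  # assumed width of the binary code for each class


def class_counts(tokens, classes=EQUIVALENCE_CLASSES):
    """Count how many tokens of a document fall into each equivalence class."""
    counts = Counter()
    for token in tokens:
        for index, members in enumerate(classes):
            if token in members:
                counts[index] += 1
                break
    return [counts[i] for i in range(len(classes))]


def encode_document(tokens, classes=EQUIVALENCE_CLASSES, bits=BITS_PER_CLASS):
    """Encode a document as a fixed-length binary vector.

    Each class contributes `bits` binary digits. The digits hold the class's
    relative frequency, scaled to the range 0..2**bits - 1 and rounded up,
    so the vector length does not depend on document size.
    """
    levels = (1 << bits) - 1
    total = max(len(tokens), 1)  # avoid division by zero for empty documents
    vector = []
    for count in class_counts(tokens, classes):
        level = (count * levels + total - 1) // total  # integer ceiling
        vector.extend(int(b) for b in format(level, f"0{bits}b"))
    return vector


if __name__ == "__main__":
    print(encode_document(["car", "automobile", "buy"]))        # [1, 0, 0, 1, 0, 0]
    print(encode_document(["automobile", "vehicle", "purchase"]))  # [1, 0, 0, 1, 0, 0]
```

Under these assumptions, two documents that differ only in their choice of near-synonyms map to the same vector, and the vector length is fixed by the number of classes rather than by vocabulary or document size. A binary vector of this kind is the sort of input a corner classification network would take.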

Memo

Memo:
Biographies: Zhang Weifeng (1975-), male, doctor, associate professor, wfzhang@yahoo.com; Xu Baowen (1961-), male, doctor, professor, bwxu@seu.edu.cn.
Last Update: 2006-09-20