
[1] Yang Nan, Gao Jie, Xue Honghu, Liu Xiude. Extracting and evaluating method of web dense cores [J]. Journal of Southeast University (English Edition), 2008, 24(3): 276-280. [doi:10.3969/j.issn.1003-7985.2008.03.006]

Extracting and evaluating method of web dense cores
Web紧密核的抽取和评价方法

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
24
Issue:
3
Page:
276-280
Research Field:
Computer Science and Engineering
Publishing date:
2008-09-30

Info

Title:
Extracting and evaluating method of web dense cores
Web紧密核的抽取和评价方法
Author(s):
Yang Nan, Gao Jie, Xue Honghu, Liu Xiude
School of Information, Renmin University of China, Beijing 100872, China
杨楠 高洁 薛鸿鹄 刘秀德
中国人民大学信息学院, 北京100872
Keywords:
dense cores; link analysis; hierarchical clustering; modularity measure
紧密核 链接分析 层次聚类 模块化度量
PACS:
TP311
DOI:
10.3969/j.issn.1003-7985.2008.03.006
Abstract:
This paper addresses several key problems in web community discovery and link analysis. Building on topic-oriented techniques, it studies the characteristics of bipartite graphs and introduces the x bipartite core set to define the extraction procedure more precisely. An x bipartite graph is constructed by scanning the topic subgraph and is then pruned with the parameters (i, j), which yields an x bipartite core set, the minimal element of a community. A hierarchical clustering algorithm is then applied to the extracted x bipartite core sets to obtain a dendrogram of the community's internal structure. The correctness of the construction and pruning method is proved, and the algorithm is designed. The experimental datasets are prepared following the approach used in HITS (hyperlink-induced topic search): ten topics and four search engines are chosen, and the returned results are merged. Modularity, a measure of the strength of community structure in social networks, is used to validate the effectiveness of the proposed method. The experimental results show that the proposed algorithm is effective and efficient.
针对web社区的发现和链接分析技术的一些关键问题, 基于面向主题的技术, 重点研究了二分图的特征, 引入了x二分核集来更为明确地定义抽取的方法.通过扫描主题子图构造x二分图, 对该子图的(i, j)裁剪后得到x二分核集, 这也是社区的最小元素.最后, 对所抽取的所有x二分核集应用层次聚类的方法得到社区内部结构的树状图, 证明了构造和裁剪方法的正确性并设计了算法.实验采用HITS(hyperlink-induced topic search)算法中的典型数据集获取方法, 选择了10个主题和4个搜索引擎并综合返回的结果.采用社会网中测量社区结构强度的模块化度量来验证所提方法的有效性, 实验结果表明所提方法是有效并可行的.
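
The abstract names two building blocks that a short sketch can make concrete: (i, j) pruning of a bipartite graph and the modularity measure. The Python below is only an illustration under stated assumptions, not the authors' algorithm. It assumes that (i, j) pruning means the usual iterative degree filter (drop fan pages with fewer than j out-links and center pages with fewer than i in-links until nothing changes), and it computes modularity in the Newman-Girvan form Q = sum over communities c of (e_cc - a_c^2) on an undirected graph. The function names `ij_prune` and `modularity` and the edge-list input format are hypothetical; the paper's x bipartite core set construction is not specified in the abstract and is not reproduced here.

```python
from collections import defaultdict


def ij_prune(edges, i, j):
    """Iteratively prune a bipartite edge set of (fan, center) pairs.

    Assumption: a fan survives only with at least j out-links and a
    center survives only with at least i in-links; repeat to a fixpoint.
    """
    edges = set(edges)
    while True:
        out_deg = defaultdict(int)  # links leaving each fan
        in_deg = defaultdict(int)   # links entering each center
        for fan, center in edges:
            out_deg[fan] += 1
            in_deg[center] += 1
        kept = {(f, c) for f, c in edges
                if out_deg[f] >= j and in_deg[c] >= i}
        if kept == edges:           # nothing removed: fixpoint reached
            return kept
        edges = kept


def modularity(edges, community_of):
    """Newman-Girvan modularity Q for an undirected edge list.

    community_of maps each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)  # edges with both ends in community c
    ends = defaultdict(int)    # edge ends attached to community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            inside[cu] += 1
    # e_cc = inside/m, a_c = ends/(2m)
    return sum(inside[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


if __name__ == "__main__":
    # A complete 3x3 fan/center block plus two stray links.
    block = [(f, c) for f in ("f1", "f2", "f3") for c in ("c1", "c2", "c3")]
    print(len(ij_prune(block + [("f4", "c1"), ("f1", "c4")], i=3, j=3)))

    # Two triangles joined by a single bridge edge.
    g = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(modularity(g, labels))
```

Recomputing the degrees on every pass keeps the pruning easy to check by hand but costs a full scan per pass. A worklist that updates only the affected neighbors would scale better on a large topic subgraph.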

Memo

Memo:
Biography: Yang Nan (1962-), male, associate professor, yangnan@ruc.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 60773216), the National High Technology Research and Development Program of China (863 Program) (No. 2006AA010109), the Natural Science Foundation of Renmin University of China (No. 06XNB052), Free Exploration Project (985 Project of Renmin University of China) (No. 21361231).
Citation: Yang Nan, Gao Jie, Xue Honghu, et al. Extracting and evaluating method of web dense cores [J]. Journal of Southeast University (English Edition), 2008, 24(3): 276-280.
Last Update: 2008-09-20