
[1] Xue Yewei, Shen Junyi, Zhang Yun, Bao Junpeng, et al. Method of acquiring web features and its application in web search [J]. Journal of Southeast University (English Edition), 2008, 24 (3): 330-334. [doi:10.3969/j.issn.1003-7985.2008.03.019]

Method of acquiring web features and its application in web search

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

2008, Issue 3
Research Field:
Computer Science and Engineering
Publishing date:


Method of acquiring web features and its application in web search
Xue Yewei, Shen Junyi, Zhang Yun, Bao Junpeng
Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
web search; relevance ranking; retrieval effectiveness
Multi-field web information takes many forms, which makes it hard to exploit in large-scale web search. To address this problem, a novel approach is proposed that automatically acquires features from web pages using a set of well-defined rules. The features describe the content of web pages from different aspects and can be used to improve ranking performance in web search. The acquired features have a unified form and contain less noise, so they can be used easily in web page relevance ranking. A dedicated specification for judging the relevance between user queries and acquired features is also proposed. Experimental results show that the features acquired by the proposed approach, together with the feature relevance specification, significantly improve relevance ranking performance for web search.
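
The abstract does not spell out the acquisition rules or the relevance specification, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical rule table (TAG_RULES) that maps a few HTML tags to feature fields, hypothetical field weights (FIELD_WEIGHTS), and a simple weighted term-overlap score standing in for the feature relevance specification. All names here are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re

# Hypothetical rules: which HTML tag feeds which feature field.
TAG_RULES = {"title": "title", "h1": "heading", "h2": "heading", "a": "anchor"}
# Hypothetical per-field weights; the paper's actual values are not given.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "url": 1.5, "anchor": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FeatureExtractor(HTMLParser):
    """Collects text from rule-matched tags into a unified token form."""

    def __init__(self):
        super().__init__()
        self.features = {field: [] for field in FIELD_WEIGHTS}
        self._stack = []  # currently open tags that match a rule

    def handle_starttag(self, tag, attrs):
        if tag in TAG_RULES:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text outside rule-matched tags is treated as noise and skipped.
        if self._stack:
            field = TAG_RULES[self._stack[-1]]
            self.features[field].extend(tokenize(data))


def acquire_features(url, html):
    """Return a dict mapping each field name to its list of tokens."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    parsed = urlparse(url)
    parser.features["url"] = tokenize(parsed.netloc + " " + parsed.path)
    return parser.features


def relevance(query, features):
    """Weighted fraction of query terms matched in each feature field."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    score = 0.0
    for field, tokens in features.items():
        vocab = set(tokens)
        matched = sum(1 for term in terms if term in vocab)
        score += FIELD_WEIGHTS[field] * matched / len(terms)
    return score
```

In this sketch, calling acquire_features(url, html) on a page and then relevance(query, features) yields a score that could be combined with a content-based ranking function. How the scores are actually combined, and which fields matter most, would have to come from the paper's experiments rather than from these placeholder weights.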




Biographies: Xue Yewei (born 1980), male, graduate; Shen Junyi (corresponding author), male, professor, jyshen@mail.xjtu.edu.cn.
Foundation item: The National Natural Science Foundation of China (No. 60673087).
Last Update: 2008-09-20