[1] Guo Xi, Yang Xiaochun, Yu Ge, Li Guangao, et al. Choosing meaningful structure data for improving web search [J]. Journal of Southeast University (English Edition), 2008, 24 (3): 343-346. [doi:10.3969/j.issn.1003-7985.2008.03.022]
Choosing meaningful structure data for improving web search

Structured data extraction techniques for improving web search

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 24
Issue:: 2008, No. 3

Page:: 343-346

Research Field:: Computer Science and Engineering

Publishing date:: 2008-09-30

Info

Title:: Choosing meaningful structure data for improving web search

: Structured data extraction techniques for improving web search

Author(s):: Guo Xi, Yang Xiaochun, Yu Ge, Li Guangao; College of Information Science and Engineering, Northeastern University, Shenyang 110004, China

: 郭茜, 杨晓春, 于戈, 李广翱; College of Information Science and Engineering, Northeastern University, Shenyang 110004

Keywords:: web; semantic; attributes relationship; structure data; query expansion

: web; semantics; attribute relationship; structured data; query expansion

CLC number:: TP311

DOI:: 10.3969/j.issn.1003-7985.2008.03.022

Abstract:: To improve the quality of web search, a new query expansion method that chooses meaningful structure data from a domain database is proposed. It categorizes attributes into three classes, namely concept attributes, context attributes, and meaningless attributes, according to their semantic features, which are document frequency features and distinguishing capability features. It also defines the semantic relevance between two attributes when they are correlated in the database. A trie-bitmap structure and pair pointer tables are then proposed to implement efficient algorithms for discovering attribute semantic features and detecting their semantic relevance. Using the semantic attributes and their semantic relevance, expansion words can be generated and embedded into a vector space model with interpolation parameters. The experiments use an IMDB movie database and real text collections to evaluate the proposed method by comparing its performance with that of a classical vector space model. The results show that the proposed method improves text search effectively and that both semantic features and semantic relevance have good separation capabilities.

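: To improve the quality of web text search, a query expansion method based on semantic structured data is proposed. By analyzing the semantic features of attributes (document frequency features and distinguishing capability features), attributes are divided into three classes: concept attributes, context attributes, and useless attributes, and a criterion for measuring the semantic relevance of attributes is proposed. The trie-bitmap and pair pointer table data structures are designed to implement efficient algorithms for discovering attribute semantic features and detecting attribute semantic relevance. Using suitable attributes and their semantic relationships, expansion words can be generated for the query keywords and embedded into a vector space model with interpolation parameters. The experiments use the IMDB movie database and real text data sets to compare the performance of the proposed method with that of the original vector space model. The results demonstrate that the proposed query expansion method can effectively improve text search performance, and that both attribute semantic features and attribute semantic relevance have good classification ability.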
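
The abstract mentions embedding expansion words into a vector space model with interpolation parameters. The following is only a minimal sketch of that general idea, not the authors' algorithm: the expansion table `EXPANSIONS`, its relevance scores, the parameter name `lam`, and the plain term-frequency weighting are all assumptions made for illustration. The paper's trie-bitmap and pair pointer table structures are not modeled here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector for lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def expand_query(query, expansions, lam=0.7):
    """Weight original terms by lam and expansion words by (1 - lam).

    `expansions` maps a query term to (word, relevance) pairs; in the
    paper such words would come from semantically related database
    attributes, but here the table is simply supplied by the caller.
    """
    original = tf_vector(query)
    vec = {t: lam * w for t, w in original.items()}
    extra = Counter()
    for term in original:
        for word, relevance in expansions.get(term, []):
            extra[word] += relevance
    for word, w in extra.items():
        vec[word] = vec.get(word, 0.0) + (1.0 - lam) * w
    return vec


# Hypothetical expansion table (illustrative values, not from the paper).
EXPANSIONS = {"titanic": [("cameron", 0.9), ("dicaprio", 0.8)]}

docs = [
    "james cameron directed a film starring leonardo dicaprio",
    "the ship sank in the north atlantic in 1912",
]
query = "titanic"

# Scores without and with expansion, one per document.
plain = [cosine(tf_vector(query), tf_vector(d)) for d in docs]
expanded = [cosine(expand_query(query, EXPANSIONS), tf_vector(d)) for d in docs]
```

In this toy setup neither document contains the query word, so the unexpanded query matches nothing, while the expanded query gives the first document a non-zero score through the related words. Setting `lam` closer to 1 keeps the ranking closer to the unexpanded model.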

Memo

Memo:: Biographies: Guo Xi (1983-), female, graduate; Yang Xiaochun (corresponding author), female, doctor, associate professor, yangxc@mail.neu.edu.cn.
Foundation items: Program for New Century Excellent Talents in University (No. NCET-06-0290), the National Natural Science Foundation of China (No. 60503036), the Fok Ying Tong Education Foundation Award (No. 104027).
Citation: Guo Xi, Yang Xiaochun, Yu Ge, et al. Choosing meaningful structure data for improving web search [J]. Journal of Southeast University (English Edition), 2008, 24 (3): 343-346.

Last Update: 2008-09-20
