
Mao Qinjiao, Feng Boqin, Pan Shanliang. Latent semantic analysis for query interfaces of deep web sites[J]. Journal of Southeast University (English Edition), 2008, 24(3): 312-314. [doi:10.3969/j.issn.1003-7985.2008.03.014]

Latent semantic analysis for query interfaces of deep web sites

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
24
Issue:
3
Page:
312-314
Research Field:
Computer Science and Engineering
Publishing date:
2008-09-30

Info

Title:
Latent semantic analysis for query interfaces of deep web sites
Author(s):
Mao Qinjiao (1), Feng Boqin (1), Pan Shanliang (2)
(1) Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
(2) Information Science and Engineering Institute, Ningbo University, Ningbo 315211, China
Keywords:
deep web; information retrieval; latent semantic analysis; singular value decomposition
CLC number:
TP311
DOI:
10.3969/j.issn.1003-7985.2008.03.014
Abstract:
Latent semantic analysis (LSA) is a simple and effective way to further improve the efficiency of search engines and to enable them to search, index and locate the information in the deep web. By applying LSA to the form attributes of query interfaces, which serve as the unique entrances to deep web sites, the hidden semantic structure of those attributes can be retrieved and the dimensionality can be reduced to a certain extent. Using this semantic structure, the data content of a site can be inferred and the similarity measures between deep web sites can be revised. Experimental results show that LSA revises and improves the semantic understanding of deep web query forms and overcomes some shortcomings of purely keyword-based matching. The approach can be used to find the sites most similar to a given site and to obtain a list of sites whose query forms match the attributes a user specifies.
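
As a rough illustration only, the following is a minimal pure-Python sketch of generic latent semantic analysis applied to query-form attributes: build an attribute-by-site matrix A, take a rank-k truncated SVD A ≈ U_k Σ_k V_k^T, compare sites by cosine similarity in the reduced space, and fold a user's attribute list into that space to rank sites. The attribute names, site names, binary weighting, the choice k = 2, and the helper names (truncated_svd, most_similar, rank_sites_for_attributes) are all hypothetical; the abstract does not state the paper's actual weighting, dimension choice or similarity revision. The SVD is computed by power iteration on A^T A with deflation so that the sketch needs only the standard library.

```python
import math
import random

# Hypothetical toy data: rows are form attributes, columns are deep web sites.
# A 1 means the site's query interface contains that attribute. Binary weighting
# is an assumption for illustration, not necessarily the paper's scheme.
ATTRIBUTES = ["title", "author", "isbn", "publisher", "from", "to", "date", "passengers"]
SITES = ["books_a", "books_b", "books_c", "flights_a", "flights_b"]
A = [
    [1, 1, 0, 0, 0],  # title
    [1, 1, 0, 0, 0],  # author
    [1, 0, 1, 0, 0],  # isbn
    [0, 1, 1, 0, 0],  # publisher
    [0, 0, 0, 1, 1],  # from
    [0, 0, 0, 1, 1],  # to
    [0, 0, 0, 1, 1],  # date
    [0, 0, 0, 1, 0],  # passengers
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx > 0 and ny > 0 else 0.0


def truncated_svd(a, k, iters=500, seed=0):
    """Top-k singular triplets of a via power iteration on G = A^T A with deflation."""
    cols = [list(c) for c in zip(*a)]                  # site vectors (columns of A)
    g = [[dot(ci, cj) for cj in cols] for ci in cols]  # G = A^T A (sites x sites)
    rng = random.Random(seed)
    sigmas, vs = [], []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in cols])
        for _ in range(iters):
            w = [dot(row, v) for row in g]
            for u in vs:  # project out the eigenvectors already found
                d = dot(w, u)
                w = [x - d * y for x, y in zip(w, u)]
            v = normalize(w)
        lam = dot([dot(row, v) for row in g], v)  # Rayleigh quotient = sigma^2
        sigmas.append(math.sqrt(max(lam, 0.0)))
        vs.append(v)
    # Left singular vectors u_i = A v_i / sigma_i (assumes sigma_i > 0, i.e. k <= rank).
    us = [[dot(row, v) / s for row in a] for s, v in zip(sigmas, vs)]
    return us, sigmas, vs


def most_similar(site, site_vecs):
    """Name of the site whose latent vector has the highest cosine with `site`."""
    j = SITES.index(site)
    scores = [(cosine(site_vecs[j], sv), name)
              for name, sv in zip(SITES, site_vecs) if name != site]
    return max(scores)[1]


def rank_sites_for_attributes(attrs, us, sigmas, vs):
    """Fold a set of attributes into the latent space and rank sites by cosine."""
    q = [1.0 if a in attrs else 0.0 for a in ATTRIBUTES]
    q_hat = [dot(u, q) / s for u, s in zip(us, sigmas)]     # q^T U_k Sigma_k^{-1}
    rows = [[v[j] for v in vs] for j in range(len(SITES))]  # rows of V_k
    return sorted(SITES, key=lambda name: -cosine(q_hat, rows[SITES.index(name)]))


# Usage: a 2-dimensional latent space (k = 2 is an arbitrary choice for this toy data).
us, sigmas, vs = truncated_svd(A, k=2)
# Site coordinates in the reduced space: column j of Sigma_k V_k^T.
site_vecs = [[s * v[j] for s, v in zip(sigmas, vs)] for j in range(len(SITES))]
print(most_similar("flights_a", site_vecs))  # flights_b
print(rank_sites_for_attributes({"from", "to"}, us, sigmas, vs))
```

In this toy matrix, books_a and books_c share only the isbn attribute (raw cosine 1/√6, about 0.41), yet in the two-dimensional latent space they become almost parallel, which loosely mirrors the similarity revision the abstract describes. For real interface collections, a numerical library's SVD routine would be a better choice than hand-rolled power iteration.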

Memo:
Biographies: Mao Qinjiao (born 1983), female, graduate student; Feng Boqin (corresponding author), male, professor, bqfeng@mail.xjtu.edu.cn.
Last Update: 2008-09-20