
[1] Mao Qinjiao, Feng Boqin, Pan Shanliang. Latent semantic analysis for query interfaces of deep web sites [J]. Journal of Southeast University (English Edition), 2008, 24(3): 312-314. [doi:10.3969/j.issn.1003-7985.2008.03.014]

Latent semantic analysis for query interfaces of deep web sites

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

2008, Issue 3
Research Field:
Computer Science and Engineering


Latent semantic analysis for query interfaces of deep web sites
Mao Qinjiao¹, Feng Boqin¹, Pan Shanliang²
¹ Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
² Information Science and Engineering Institute, Ningbo University, Ningbo 315211, China
Keywords: deep web; information retrieval; latent semantic analysis; singular value decomposition
Latent semantic analysis (LSA) is a simple and effective way to further improve the efficiency of search engines and to give them the ability to search, index, and locate information in the deep web. By applying LSA to the attributes in the query interfaces and the unique entrances of deep web sites, the hidden semantic structure information can be retrieved and a degree of dimension reduction can be achieved. Using this semantic structure information, the contents of a site can be inferred and the similarity measures among deep web sites can be revised. Experimental results show that LSA revises and improves the semantic understanding of query forms in the deep web, which overcomes the shortcomings of keyword-based methods. The approach can be used to effectively find the most similar site for any given site and to obtain a list of sites that conform to user-specified restrictions.
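The idea summarized above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a small binary attribute-by-site matrix, uses a truncated singular value decomposition to place each site in a low-dimensional latent space, and compares sites by cosine similarity. The site names, attribute labels, matrix values, the choice k = 2, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical query-interface attributes (rows) and deep web sites (columns).
attributes = ["title", "author", "isbn", "make", "model", "price"]
sites = ["books_a", "books_b", "cars_a", "cars_b"]

# A[i, j] = 1 if attribute i appears in the query form of site j (toy data).
A = np.array([
    [1, 1, 0, 0],  # title
    [1, 0, 0, 0],  # author
    [0, 1, 0, 0],  # isbn
    [0, 0, 1, 1],  # make
    [0, 0, 1, 0],  # model
    [1, 1, 1, 1],  # price
], dtype=float)


def lsa_site_vectors(matrix, k):
    """Return one k-dimensional latent vector per site (column of matrix)."""
    # Thin SVD: matrix = U @ diag(s) @ Vt, singular values in descending order.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; scale the site coordinates by them.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def most_similar(site_index, sim):
    """Index of the site most similar to site_index, excluding itself."""
    scores = sim[site_index].copy()
    scores[site_index] = -np.inf
    return int(np.argmax(scores))


site_vectors = lsa_site_vectors(A, k=2)
sim = cosine_similarity_matrix(site_vectors)

for i, name in enumerate(sites):
    print(name, "->", sites[most_similar(i, sim)])
```

In this toy setting the two book sites share no distinguishing attribute beyond "title", yet they land on the same point in the latent space because their attribute patterns are structurally equivalent; this is the kind of similarity revision that plain keyword matching would understate.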




Biographies: Mao Qinjiao (1983-), female, graduate; Feng Boqin (corresponding author), male, professor, bqfeng@mail.xjtu.edu.cn.
Citation: Mao Qinjiao, Feng Boqin, Pan Shanliang. Latent semantic analysis for query interfaces of deep web sites [J]. Journal of Southeast University (English Edition), 2008, 24(3): 312-314.
Last Update: 2008-09-20