
[1] Li Wenwu**, Jin Yuanping, Tong Mina. Lossless Mapping from Semi-Structured Data to Structured Data* [J]. Journal of Southeast University (English Edition), 2002, 18 (1): 46-53. [doi:10.3969/j.issn.1003-7985.2002.01.009]

Lossless Mapping from Semi-Structured Data to Structured Data*
半结构化数据到结构化数据的无损映射

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
18
Issue:
2002 1
Page:
46-53
Research Field:
Computer Science and Engineering
Publishing date:
2002-03-30

Info

Title:
Lossless Mapping from Semi-Structured Data to Structured Data*
半结构化数据到结构化数据的无损映射
Author(s):
Li Wenwu**, Jin Yuanping, Tong Mina
Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
李文武, 金远平, 童咪娜
东南大学计算机科学与工程系, 南京 210096
Keywords:
semi-structured data; DTD; RDB; schema mapping; DOM; overflow data
半结构化数据; DTD; 关系数据库; 模式映射; DOM; 溢出数据
CLC number:
TP311
DOI:
10.3969/j.issn.1003-7985.2002.01.009
Abstract:
Most semi-structured data exhibit some structural regularity. Once stored as structured data in a relational database (RDB), they can be managed effectively by a database management system (DBMS). Some semi-structured data are difficult to transform because their structure is irregular. We design an efficient algorithm and data structure to ensure that the transformation is lossless. We propose a schema-extraction approach based on data mining, in which different kinds of elements are transformed separately, so that a lossless mapping from semi-structured data to structured data is achieved.
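Chinese abstract, translated: Most semi-structured data have a certain structural regularity; converting them into structured data stored in a relational database makes it possible to process them effectively with DBMS technology. Data that are not easily converted receive special handling so that the mapping of the data as a whole remains lossless. After completing the DTD conversion, this paper starts from the simplest mapping method and proposes an improved scheme: using a schema-extraction method based on data mining, it handles different types of elements separately and designs an effective way of handling overflow data, achieving a lossless mapping from semi-structured data to structured data.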
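
The idea of keeping regular fields in fixed relational columns and routing irregular fields to overflow storage can be illustrated with a small sketch. This is not the authors' algorithm: the frequency threshold `min_support` stands in for the data-mining step, and the function names, the sample records, and the convention that `None` means "field absent" are all assumptions made for illustration.

```python
from collections import Counter

def extract_schema(records, min_support=0.6):
    """Return the field names present in at least min_support of the records."""
    counts = Counter(name for rec in records for name in rec)
    return sorted(n for n, c in counts.items() if c / len(records) >= min_support)

def to_tables(records, schema):
    """Split records into a fixed-column main table and a key/value overflow table."""
    main, overflow = [], []
    for rid, rec in enumerate(records):
        # One main-table row per record: (record id, then one value per schema column).
        main.append((rid,) + tuple(rec.get(col) for col in schema))
        # Fields outside the extracted schema go to the overflow table.
        for name, value in rec.items():
            if name not in schema:
                overflow.append((rid, name, value))
    return main, overflow

def from_tables(main, overflow, schema):
    """Rebuild the records. Assumes no real field value is None (None = absent)."""
    records = [
        {col: val for col, val in zip(schema, row[1:]) if val is not None}
        for row in main
    ]
    for rid, name, value in overflow:
        records[rid][name] = value
    return records

# Hypothetical sample data: two regular fields and two rare ones.
records = [
    {"title": "A", "author": "X", "year": "1999"},
    {"title": "B", "author": "Y"},
    {"title": "C", "author": "Z", "note": "draft"},
]
schema = extract_schema(records)
main, overflow = to_tables(records, schema)
restored = from_tables(main, overflow, schema)
```

In this sketch the mapping is lossless in the sense that `restored` equals `records`: every field ends up either in a main-table column or in an overflow row, so nothing is discarded when a record does not fit the extracted schema.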

References:

[1] Deutsch A, Fernandez M, Suciu D. Storing Semistructured Data with STORED[A]. ACM SIGMOD International Conference on Management of Data[C]. Philadelphia, 1999, 28(2):431-442.
[2] Wang K, Liu H. Discovering Typical Structures of Documents: a road map approach[A]. ACM SIGIR Conference on Research and Development in Information Retrieval[C]. New York, 1998.146-154.
[3] Christophides V, Abiteboul S, Cluet S, et al. From Structured Document to Novel Query Facilities[A]. In: Snodgrass R, Winslett M, eds. Proceedings of 1994 ACM SIGMOD International Conference on Management of Data[C]. Minneapolis, 1994.313-324.

Memo

Memo:
* The project is supported by the plan for key university faculty members of the State Education Ministry and the “333” Talent Plan of Jiangsu Province.
** Born in 1978, male, graduate.
Last Update: 2002-03-20