
[1] Li Huayang, Liu Yubao, Li Youkui, et al. Application of fuzzy equivalence theory in data cleaning [J]. Journal of Southeast University (English Edition), 2004, 20 (4): 454-457. [doi:10.3969/j.issn.1003-7985.2004.04.012]

Application of fuzzy equivalence theory in data cleaning

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
20
Issue:
2004 (4)
Page:
454-457
Research Field:
Computer Science and Engineering
Publishing date:
2004-12-30

Info

Title:
Application of fuzzy equivalence theory in data cleaning
Author(s):
Li Huayang (1, 2), Liu Yubao (1), Li Youkui (3)
(1) College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
(2) UFsoft School of Software, Jiangxi University of Finance and Economics, Nanchang 330013, China
(3) Nanjing Institute, Huawei Technologies Co., Ltd, Nanjing 210001, China
Keywords:
equivalence theory; equivalence degree; data cleaning
PACS:
T391
DOI:
10.3969/j.issn.1003-7985.2004.04.012
Abstract:
This paper presents a rule merging and simplification method and an improved deviation analysis algorithm. Fuzzy equivalence theory avoids the rigid either-or judgments of traditional equivalence theory. During a data cleaning task, some rules stand in an “including”/“being included” relation to one another. The equivalence degree of an included rule is smaller than that of the rule that includes it, so a rule merging and simplification method is introduced to reduce the total computing time. This kind of relation also affects the deviation of the fuzzy equivalence degree, so an improved deviation analysis algorithm that omits the influence of the included rules’ equivalence degrees is also presented. Normally, duplicate records are logged in a file, and users have to check and verify them one by one, which is time-consuming. The proposed algorithm reduces the users’ effort in checking duplicate records. Finally, an experiment is presented that demonstrates the feasibility of the proposed method.

References:

[1] Rahm E, Hai Do H. Data cleaning: problems and current approaches[J]. Data Engineering, 2000, 23(4): 3-13.
[2] Davidson Susan B, Kosky Anthony S. Specifying database transformations in WOL [J]. Data Engineering, 1999, 22(1): 25-31.
[3] Haas Laura, Miller Renee, Niswonger Bartholomew, et al. Transforming heterogeneous data with database middleware: beyond integration [J]. Data Engineering, 1999, 22(1): 31-37.
[4] Raman V, Joseph M. Potter’s wheel: an interactive data cleaning system[A]. In: Very Large Data Bases [C]. ACM Press, 2001. 381-390.
[5] Galhardas Helena, Florescu Daniela, Shasha Dennis. Declarative data cleaning: language, model and algorithms[A]. In: Very Large Data Bases [C]. ACM Press, 2001. 371-380.
[6] Hernandez Mauricio A, Stolfo Salvatore J. The merge/purge problem for large databases [A]. In: SIGMOD Conf [C]. ACM Press, 1995. 127-138.
[7] Li Huayang, Liu Yubao, Li Youkui. The equivalence theory based on fuzzy theory [A]. In: The 3rd International Conf on Machine Learning and Cybernetics [C]. Shanghai, 2004. 1272-1276.

Memo

Memo:
Biography: Li Huayang(1973—), male, doctor, associate professor, dariusli@tom.com.
Last Update: 2004-12-20