|Table of Contents|

[1] Wang Tao, Lu Yansheng,. Mining condensed frequent subtree base [J]. Journal of Southeast University (English Edition), 2006, 22 (1): 48-53. [doi:10.3969/j.issn.1003-7985.2006.01.011]
Copy

Mining condensed frequent subtree base()
挖掘频繁子树精简基
Share:

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volumn:
22
Issue:
2006 1
Page:
48-53
Research Field:
Computer Science and Engineering
Publishing date:
2006-03-20

Info

Title:
Mining condensed frequent subtree base
挖掘频繁子树精简基
Author(s):
Wang Tao Lu Yansheng
College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
王涛 卢炎生
华中科技大学计算机科学与技术学院, 武汉 430074
Keywords:
data mining tree pattern condensed subtree base
数据挖掘 树模式 子树精简基
PACS:
TP311
DOI:
10.3969/j.issn.1003-7985.2006.01.011
Abstract:
In frequent tree pattern mining, the number of frequent subtrees generated is often too large.To tackle this problem, the concept of condensed frequent subtree base is proposed.The base consists of the maximal frequent subtrees for a series of support thresholds.It is a subset of frequent subtrees, and is used to approximate the support of arbitrary frequent subtrees with guaranteed maximal error bound.In addition, an algorithm is developed to mine such a condensed subtree base in a database of labeled rooted ordered trees.The algorithm adopts the way of right-most extension to generate systematically all frequent rooted ordered subtrees.Several techniques are proposed to prune the branches that do not correspond to the maximal frequent subtrees.Heuristic techniques are used to arrange the order of computation so that relatively expensive computation is avoided as much as possible.Experimental results show that the size of the base is less than 10% of that of the complete set, and the algorithm outperforms the previous algorithms.
为了解决频繁树模式挖掘中频繁子树的数目通常太大的问题, 提出了频繁子树精简基的概念, 精简基由相对于一系列支持度阈值的最大频繁子树组成, 它是频繁子树的一个子集, 可用来估计任一频繁子树的支持度, 并能将误差控制在确定范围内.提出了一个在带标号的有根的有序树的数据库中挖掘这种子树精简基的算法, 该算法采用最右扩展方法系统地生成所有的频繁有序有根子树.采用的剪枝技术能尽早地剪掉一些不可能生成最大频繁子树的分枝, 还采用了启发式的技术来安排计算的次序以尽可能避免代价高的计算.实验结果表明该精简基的大小不到全集的10%, 算法的性能也比挖掘全集的算法要高.

References:

[1] Zaki M J.Efficiently mining frequent trees in a forest [C]//8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Alberta, Canada, 2002: 68-75.
[2] Asai T, Abe K, Kawasoe S, et al.Efficient substructure discovery from large semi-structured data [C]//Proc 2002 SIAM Int Conf on Data Mining (SDM’02).Arlington, VA, 2002: 457-473.
[3] Pei J, Dong G, Zou W, et al.On computing condensed frequent pattern bases[C]//Proc 2002 Int Conf on Data Mining (ICDM’02).Maebashi, Japan, 2002: 378-385.
[4] Wang C, Hong M, Pei J, et al.Efficient pattern-growth methods for frequent tree pattern mining [C]//The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04).Sydney, Australia, 2004: 441-451.
[5] Chi Y, Yang Y, Xia Y, et al.CMTreeMiner:mining both closed and maximal frequent subtrees [C]//The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04).Sydney, Australia, 2004: 63-73.
[6] Kudo T.FREQT:an implementation of FREQT [EB/OL].(2003-03-21)[2004-12-15].http://chasen.org/~taku/software/freqt/.

Memo

Memo:
Biographies: Wang Tao(1969—), female, graduate;Lu Yansheng(corresponding author), male, professor, lys@mail.hust.edu.cn.
Last Update: 2006-03-20