[1] Zou Lingjun, Chen Ling, Tu Li, et al. Clustering algorithm for multiple data streams based on spectral component similarity [J]. Journal of Southeast University (English Edition), 2008, 24 (3): 264-266. [doi:10.3969/j.issn.1003-7985.2008.03.003]

Clustering algorithm for multiple data streams based on spectral component similarity

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
24
Issue:
3 (2008)
Page:
264-266
Research Field:
Computer Science and Engineering
Publishing date:
2008-09-30

Info

Title:
Clustering algorithm for multiple data streams based on spectral component similarity
Original Chinese title: 一种基于谱分量相似度的多数据流聚类算法
Author(s):
Zou Lingjun (邹凌君) [1], Chen Ling (陈崚) [1, 2], Tu Li (屠莉) [3]
[1] Information Engineering College, Yangzhou University, Yangzhou 225009, China
[2] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
[3] College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
data streams; clustering; AR model; spectral component
CLC number:
TP311
DOI:
10.3969/j.issn.1003-7985.2008.03.003
Abstract:
A new algorithm for clustering multiple data streams is proposed. The algorithm effectively clusters data streams that behave similarly but with unknown time delays. It uses autoregressive (AR) modeling to measure lag correlations between data streams and exploits estimated frequency spectra to extract the essential features of each stream. Each stream is represented as a sum of spectral components, and correlation is measured component-wise. Each spectral component is described by four parameters: amplitude, phase, damping rate, and frequency. The ε-lag-correlation between pairs of spectral components is calculated and used as the similarity measure for clustering the streams. Based on a sliding window model, the algorithm continuously reports the most recent clustering results and dynamically adjusts the number of clusters. Experiments on real and synthetic streams show that the proposed method is faster and yields higher clustering quality than other similar methods.
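The idea of treating two streams as similar when one repeats the other after an unknown delay can be illustrated with a small sketch. This is not the paper's algorithm: it does not fit an AR model or compare spectral components, and it replaces the ε-lag-correlation with a brute-force time-domain Pearson correlation tried at every candidate delay. The damped-sinusoid form of a component (amplitude, phase, damping rate, frequency) is an assumption based on the four parameters named in the abstract, and all function names and numeric values are hypothetical.

```python
import math

def damped_sinusoid(amplitude, phase, damping, frequency, length):
    """Samples of amplitude * exp(-damping*n) * cos(2*pi*frequency*n + phase)."""
    return [
        amplitude * math.exp(-damping * n)
        * math.cos(2 * math.pi * frequency * n + phase)
        for n in range(length)
    ]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_lag_correlation(x, y, max_lag):
    """Try every delay of y behind x in 0..max_lag; return (correlation, lag) of the best."""
    best = (-2.0, 0)  # below any possible correlation value
    for lag in range(max_lag + 1):
        # Align x at time i with y at time i + lag over the overlapping part.
        r = pearson(x[: len(x) - lag], y[lag:])
        if r > best[0]:
            best = (r, lag)
    return best

# Two hypothetical streams built from the same pair of components;
# stream_y shows the same behaviour as stream_x, 5 samples later.
delay, window = 5, 200
base = [
    a + b
    for a, b in zip(
        damped_sinusoid(1.0, 0.3, 0.005, 0.05, window + delay),
        damped_sinusoid(0.5, 1.1, 0.010, 0.13, window + delay),
    )
]
stream_x = base[delay:]
stream_y = base[:window]

corr, lag = best_lag_correlation(stream_x, stream_y, max_lag=20)
print(f"best lag = {lag}, correlation = {corr:.4f}")
```

A clustering step could then treat `1 - corr` as a distance between the two streams within the current window. The brute-force search costs time proportional to the window length times `max_lag` per pair, which is the kind of cost a compact spectral representation is meant to avoid.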

Memo

Biographies: Zou Lingjun (1984-), female, graduate; Chen Ling (corresponding author), male, professor, lchen@yzcn.net.
Foundation items: The National Natural Science Foundation of China (No. 60673060), the Natural Science Foundation of Jiangsu Province (No. BK2005047).
Citation: Zou Lingjun, Chen Ling, Tu Li. Clustering algorithm for multiple data streams based on spectral component similarity [J]. Journal of Southeast University (English Edition), 2008, 24(3): 264-266.
Last Update: 2008-09-20