
[1] Zou Lingjun, Chen Ling, Tu Li, et al. Clustering algorithm for multiple data streams based on spectral component similarity [J]. Journal of Southeast University (English Edition), 2008, 24 (3): 264-266. [doi:10.3969/j.issn.1003-7985.2008.03.003]

Clustering algorithm for multiple data streams based on spectral component similarity

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
24
Issue:
3 (2008)
Page:
264-266
Research Field:
Computer Science and Engineering
Publishing date:
2008-09-30

Info

Title:
Clustering algorithm for multiple data streams based on spectral component similarity
Author(s):
Zou Lingjun (1), Chen Ling (1, 2), Tu Li (3)
(1) Information Engineering College, Yangzhou University, Yangzhou 225009, China
(2) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
(3) College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
data streams; clustering; AR model; spectral component
PACS:
TP311
DOI:
10.3969/j.issn.1003-7985.2008.03.003
Abstract:
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams, exploiting the estimated frequency spectra to extract the essential features of the streams. Each stream is represented as a sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters: amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure in clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method is faster and achieves higher clustering quality than other similar methods.
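
The abstract does not give the estimation procedure or the ε-lag-correlation formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a least-squares AR fit followed by a Prony-style least-squares estimate of the complex amplitudes; the function names `fit_ar` and `spectral_components` and the test signal are hypothetical. It illustrates why a spectral-component representation tolerates unknown delays: shifting a stream in time changes only the amplitude and phase of a component, while its damping rate and frequency stay the same.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR fit: x[t] ~ sum_k a[k] * x[t-1-k] (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    # Column k holds the sample k+1 steps before each target sample.
    X = np.column_stack([x[order - 1 - k: len(x) - 1 - k] for k in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def spectral_components(x, order):
    """Describe x as a sum of c_k * z_k**n and return four parameters per term."""
    a = fit_ar(x, order)
    # Poles z_k are the roots of z^p - a[0] z^(p-1) - ... - a[p-1].
    poles = np.roots(np.concatenate(([1.0], -a))).astype(complex)
    n = np.arange(len(x))
    # Complex amplitudes c_k by least squares on the basis z_k**n.
    basis = poles[np.newaxis, :] ** n[:, np.newaxis]
    c, *_ = np.linalg.lstsq(basis, np.asarray(x, dtype=complex), rcond=None)
    return [
        {
            # For a real sinusoid of amplitude A, each pole of the
            # conjugate pair carries |c_k| = A / 2.
            "amplitude": float(abs(ck)),
            "phase": float(np.angle(ck)),
            "damping": float(np.log(abs(zk))),               # per sample
            "frequency": float(np.angle(zk) / (2 * np.pi)),  # cycles per sample
        }
        for ck, zk in zip(c, poles)
    ]

# Hypothetical test signal: one damped sinusoid and a copy delayed by 5 samples.
n = np.arange(64)
stream = 2.0 * np.exp(-0.05 * n) * np.cos(2 * np.pi * 0.1 * n + 0.3)
delayed = 2.0 * np.exp(-0.05 * (n - 5)) * np.cos(2 * np.pi * 0.1 * (n - 5) + 0.3)

for name, s in (("stream", stream), ("delayed", delayed)):
    # Keep the positive-frequency member of the conjugate pole pair.
    comp = max(spectral_components(s, order=2), key=lambda d: d["frequency"])
    print(name, {k: round(v, 4) for k, v in comp.items()})
```

In this sketch the two streams yield the same damping rate and frequency and differ only in amplitude and phase, which is the property a component-wise, delay-tolerant similarity measure can build on. Noisy streams would need a higher AR order and a rule for selecting the dominant components, neither of which is specified here.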

Memo

Memo:
Biographies: Zou Lingjun (1984-), female, graduate; Chen Ling (corresponding author), male, professor, lchen@yzcn.net.
Foundation items: The National Natural Science Foundation of China (No. 60673060), the Natural Science Foundation of Jiangsu Province (No. BK2005047).
Citation: Zou Lingjun, Chen Ling, Tu Li. Clustering algorithm for multiple data streams based on spectral component similarity [J]. Journal of Southeast University (English Edition), 2008, 24(3): 264-266.
Last Update: 2008-09-20