|Table of Contents|

[1] Chen Ningtao, Wang Nengchao, Shi Baochang,. Fast algorithm for constructing neighbor-joining phylogenetic trees [J]. Journal of Southeast University (English Edition), 2006, 22 (2): 176-179. [doi:10.3969/j.issn.1003-7985.2006.02.007]
Copy

Fast algorithm for constructing neighbor-joining phylogenetic trees()
创建neighbor-joining进化树的快速算法
Share:

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volumn:
22
Issue:
2006 2
Page:
176-179
Research Field:
Computer Science and Engineering
Publishing date:
2006-06-30

Info

Title:
Fast algorithm for constructing neighbor-joining phylogenetic trees
创建neighbor-joining进化树的快速算法
Author(s):
Chen Ningtao1, Wang Nengchao2, Shi Baochang2
1 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
2 Parallel Computing Institute, Huazhong University of Science and Technology, Wuhan 430074, China
陈宁涛1, 王能超2, 施保昌2
1华中科技大学计算机科学与技术学院, 武汉 430074; 2华中科技大学并行计算研究所, 武汉 430074
Keywords:
phylogenetic tree neighbor-joining method fast algorithm progressive multiple alignment
进化树 邻接法 快速算法 进化多序列比对
PACS:
TP301.6
DOI:
10.3969/j.issn.1003-7985.2006.02.007
Abstract:
To improve the performance of Saitou and Nei’s algorithm(SN)and Studier and Keppler’s improved algorithm(SK)for constructing neighbor-joining phylogenetic trees and reduce the time complexity of the computation, a fast algorithm is proposed.The proposed algorithm includes three techniques.First, a linear array A[N] is introduced to store the sum of every row of the distance matrix(the same as SK), which can eliminate many repeated computations.Secondly, the value of A[i] is computed only once at the beginning of the algorithm, and is updated by three elements in the iteration.Thirdly, a very compact formula for the sum of all the branch lengths of operational taxonomic units(OTUs)i and j is designed, and the correctness of the formula is proved.The experimental results show that the proposed algorithm is from tens to hundreds times faster than SN and roughly two times faster than SK when N increases, constructing a tree with 2 000 OTUs in 3 min on a current desktop computer.To earn the time with the cost of the space and reduce the computations in the inner-most loop are the basic solutions for algorithms with many loops.
为了改善Saitou 和Nei提出的neighbor-joining进化树算法(SN)及Studier和Keppler的改进算法(SK), 降低计算的时间复杂度, 设计了一种快速算法.该算法涉及3种技术:第一, 引入一个线性数组A[N], 用于存储距离矩阵每一行的值, 以减少许多重复计算;第二, A[i]的值在算法开始时全部计算, 在迭代步中间只进行更新3个变化的值;第三, 设计了一个紧凑的公式用于计算OTUs之间的边长, 并对该公式进行了证明.实验结果表明:随着节点数的增多, 该算法比SN算法快几十倍到上百倍, 比SK算法快2倍以上;在一台桌面计算机上, 该算法能在3 min左右创建具有2 000个节点的进化树.以空间换时间, 减少最内层循环的计算量是设计多重循环算法的基本思路.

References:

[1] Yang C, Khuri S.PTC:an interactive tool for phylogenetic tree construction [A]. In:IEEE Proceedings of the Com-putational Systems Bioinformatics[C].California:Stanford University, 2003, 8:476-477.
[2] Foulds L R, Graham R L.The sterner problem in phylogeny is NP-complete [J].Advances Appl Math, 1982, 3:43-49.
[3] Day W.Computational complexity of inferring phylogenies from dissimilarity matrices [J].Bull Math Biol, 1987, 49(4):461-467.
[4] Saitou N, Nei M.The neighbor-joining method:a new method for reconstructing phylogenetic trees [J].Mol Biol Evol, 1987, 4(4):406-425.
[5] Studier J A, Keppler K J.A note on the neighbor-joining algorithm of Saitou and Nei [J].Mol Biol Evol, 1988, 5(6):729-731.
[6] Saitou N, Imanishi T.Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree [J].Mol Biol Evol, 1989, 6(5):514-525.
[7] Thompson J D, Higgins D G, Gibson T J.CLUSTAL W:improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice [J].Nucleic Acids Res, 1994, 22(22):4673-4680.
[8] Notredame C, Higgins D G, Heringa J.T-Coffee:a novel method for fast and accurate multiple sequence alignment [J].J Mol Biol, 2000, 302(1):205-217.
[9] Katoh K, Misawa K, Kuma K, et al.MAFFT:a novel method for rapid multiple sequence alignment based on fast Fourier transform [J].Nucleic Acids Res, 2002, 30(14):3059-3066.
[10] Edgar R C.MUSCLE:multiple sequence alignment with high accuracy and high throughput [J].Nucleic Acids Res, 2004, 32(5):1792-1797.
[11] Sneath P H A, Sokal R R.Numerical taxonomy [M].San Francisco:Freeman, 1973.
[12] Edgar R C.Local homology recognition and distance measures in linear time using compressed amino acid alphabets [J].Nucleic Acids Res, 2004, 32(1):380-385.

Memo

Memo:
Biographies: Chen Ningtao(1976—), male, graduate;Shi Baochang(corresponding author), male, doctor, professor, sbchust0382@sina.com.
Last Update: 2006-06-20