
Xie Xueying, Sun Xiao, Lu Zuhong. Auto-selection order of Markov chain for background sequences with chi-square test [J]. Journal of Southeast University (English Edition), 2003, 19 (4): 311-316. [doi:10.3969/j.issn.1003-7985.2003.04.002]

Auto-selection order of Markov chain for background sequences with chi-square test
Determining the order of the Markov chain model for background sequences by the chi-square test

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
19
Issue:
4 (2003)
Page:
311-316
Research Field:
Biological Science and Medical Engineering
Publishing date:
2003-12-30


Author(s):
Xie Xueying, Sun Xiao, Lu Zuhong
Chien-Shiung Wu Laboratory, Southeast University, Nanjing 210096, China
Keywords:
non-coding sequences; regulatory elements; chi-square test; Markov chain
PACS:
Q52
DOI:
10.3969/j.issn.1003-7985.2003.04.002
Abstract:
Modeling non-coding background sequences appropriately is important for detecting regulatory elements in DNA sequences. Based on the chi-square statistical test, this paper explains why a higher-order Markov chain model should be chosen and how its proper order can be selected automatically. The chi-square test is first run on synthetic data sets to show that it can efficiently find the proper order of a Markov chain. Using the chi-square test, distinct higher-order context dependences were found in ten sets of sequences from the yeast S. cerevisiae taken from the literature. Therefore, a higher-order Markov chain is more suitable than an independent model for modeling non-coding background sequences.
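The abstract does not spell out the test statistic. A standard form of the chi-square test for "order k versus order k+1" is shown below. It is offered as an assumption, not as the paper's exact formula. Let $\mathcal{A} = \{A, C, G, T\}$. For a length-$k$ context $u$, let $n_{aub}$ count occurrences of the word $aub$. The null hypothesis $P(b \mid au) = P(b \mid u)$ says the extra symbol $a$ carries no information. Dots denote summation over the replaced index, and cells with $\hat{e}_{aub} = 0$ are omitted.

```latex
\chi^2_k = \sum_{u \in \mathcal{A}^k} \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{A}}
  \frac{\left(n_{aub} - \hat{e}_{aub}\right)^2}{\hat{e}_{aub}},
\qquad
\hat{e}_{aub} = \frac{n_{au\cdot}\, n_{\cdot ub}}{n_{\cdot u \cdot}},
\qquad
\mathrm{df} = |\mathcal{A}|^k \left(|\mathcal{A}| - 1\right)^2 = 9 \cdot 4^k
```

A large $\chi^2_k$ relative to its degrees of freedom rejects order $k$ in favour of order $k+1$. An automatic procedure can raise $k$ until the test is no longer significant.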
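Appropriate modeling of non-coding sequences is very important for correctly identifying regulatory elements in DNA sequences. Based on the chi-square statistical test, the paper explains why a Markov chain model is used to model the background distribution of sequences and how the order of the Markov chain can be determined. Analysis of simulated data shows that the chi-square test can effectively determine the model order. Analysis of the upstream sequence sets of ten classes of genes in the brewer's yeast S. cerevisiae shows that all sequence sets have context dependence of at least first order. Apart from one group of genes, the remaining nine data sets have second- or third-order context dependence. This indicates that modeling background sequences with a higher-order Markov chain is more reasonable than a single-nucleotide (zero-order) model.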
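The order test and the automatic selection described above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' implementation:

- The function names `order_test`, `chi2_sf` and `select_order` are hypothetical.
- The statistic is a standard Pearson test of order k against order k+1, computed on 4x4 tables per length-k context.
- P-values use the Wilson-Hilferty normal approximation, swapped in to avoid a SciPy dependency.
- The selection rule (return the smallest order whose test is not rejected) is assumed.
- Only A, C, G and T are counted.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def order_test(seq, k):
    """Pearson chi-square test of H0: order k against H1: order k+1.

    For each length-k context u, the symbol a just before u is tabulated
    against the symbol b just after u; the statistics of these 4x4
    tables are summed. Returns (statistic, degrees_of_freedom).
    """
    # Count every word of length k+2 (a + u + b) in the sequence.
    counts = Counter(seq[i:i + k + 2] for i in range(len(seq) - k - 1))
    stat, df = 0.0, 0
    for u in map("".join, product(ALPHABET, repeat=k)):
        table = [[counts[a + u + b] for b in ALPHABET] for a in ALPHABET]
        rows = [sum(r) for r in table]
        cols = [sum(c) for c in zip(*table)]
        total = sum(rows)
        if total == 0:
            continue  # context never observed: contributes nothing
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                expected = r * c / total
                if expected > 0:
                    stat += (table[i][j] - expected) ** 2 / expected
        # Degrees of freedom only count rows/columns that were observed.
        nz_rows = sum(1 for r in rows if r > 0)
        nz_cols = sum(1 for c in cols if c > 0)
        df += (nz_rows - 1) * (nz_cols - 1)
    return stat, df


def chi2_sf(x, df):
    """Approximate upper-tail chi-square p-value (Wilson-Hilferty)."""
    if df <= 0:
        return 1.0  # no free cells: nothing to reject
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def select_order(seq, max_order=5, alpha=0.05):
    """Return the smallest k whose order-k vs order-(k+1) test is not rejected."""
    for k in range(max_order):
        stat, df = order_test(seq, k)
        if chi2_sf(stat, df) >= alpha:
            return k
    return max_order


if __name__ == "__main__":
    # "ACGT" repeated: each base fixes the next one, so order 1 suffices.
    print(select_order("ACGT" * 500))
    # "AAC" repeated: two previous bases are needed, so order 2.
    print(select_order("AAC" * 400))
```

Both toy sequences are deterministic, so the selected orders (1 and 2) follow directly from their construction. On real upstream sequences, the chosen order also depends on `alpha`, `max_order` and the amount of data. The number of contexts grows as 4^k, so high orders need long sequences.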

References:

[1] Lawrence C E, Altschul S F, Boguski M S, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment [J]. Science, 1993, 262(5131): 208-214.
[2] Bailey T L, Elkan C. Unsupervised learning of multiple motifs in biopolymers using expectation maximization [J]. Mach Learn, 1995, 21(1, 2): 51-83.
[3] Roth F P, Hughes J D, Estep P W, et al. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation [J]. Nat Biotechnol, 1998, 16(10): 939-945.
[4] Hughes J D, Estep P W, Tavazoie S, et al. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae [J]. J Mol Biol, 2000, 296(5): 1205-1214.
[5] Delcher A L, Harmon D, Kasif S, et al. Improved microbial gene identification with Glimmer [J]. Nucleic Acids Res, 1999, 27(23): 4636-4641.
[6] Lukashin A V, Borodovsky M. GeneMark.hmm: new solutions for gene finding [J]. Nucleic Acids Res, 1998, 26(4): 1107-1115.
[7] Liu X, Brutlag D L, Liu J S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes [A]. In: Altman R B, ed. Proceedings of the 6th Pacific Symposium on Biocomputing [C]. USA: World Scientific Pub Co, 2001. 127-138.
[8] Thijs G, Lescot M, Marchal K, et al. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling [J]. Bioinformatics, 2001, 17(12): 1113-1122.
[9] Schbath S, Prum B, de Turckheim E. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences [J]. J Comput Biol, 1995, 2(3): 417-437.
[10] Schbath S. An efficient statistic to detect over- and under-represented words in DNA sequences [J]. J Comput Biol, 1997, 4(2): 189-192.
[11] Reinert G, Schbath S, Waterman M S. Probabilistic and statistical properties of words: an overview [J]. J Comput Biol, 2000, 7(1, 2): 1-46.
[12] van Helden J, Andre B, Collado-Vides J. Extracting regulatory sites from upstream region of yeast genes by computational analysis of oligo-nucleotide frequencies [J]. J Mol Biol, 1998, 281(5): 827-842.
[13] Goffeau A, Barrell B G, Bussey H, et al. Life with 6000 genes [J]. Science, 1996, 274(5287): 546-567.
[14] Dolinski K, Balakrishnan R, Christie K R, et al. Saccharomyces genome database [EB/OL]. http://genome-www.stanford.edu/saccharomyces/, 2002.


Memo:
Biographies: Xie Xueying (1977-), female, graduate student; Sun Xiao (corresponding author), male, doctor, professor, xsun@seu.edu.cn.
Last Update: 2003-12-20