
[1] Wang Shikui, Tang Yibin, You Hongyan, et al. Frame erasure concealment in wideband speech coding based on large hidden Markov model [J]. Journal of Southeast University (English Edition), 2009, 25 (2): 152-155. [doi:10.3969/j.issn.1003-7985.2009.02.002]

Frame erasure concealment in wideband speech coding based on large hidden Markov model

基于大型隐马尔可夫模型的宽带语音丢帧补偿


Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 25
Issue:: 2 (2009)

Page:: 152-155

Research Field:: Computer Science and Engineering

Publishing date:: 2009-06-30

Info

Title:: Frame erasure concealment in wideband speech coding based on large hidden Markov model

: 基于大型隐马尔可夫模型的宽带语音丢帧补偿

Author(s):: Wang Shikui¹,²; Tang Yibin¹; You Hongyan¹; Wu Zhenyang¹
¹School of Information Science and Engineering, Southeast University, Nanjing 210096, China
²School of Physics and Electronic Information, Anhui Normal University, Wuhu 241000, China

: 王仕奎¹,²; 汤一彬¹; 尤红岩¹; 吴镇扬¹
¹东南大学信息科学与工程学院, 南京 210096
²安徽师范大学物理与电子信息学院, 芜湖 241000

Keywords:: frame erasure concealment; wideband speech; large hidden Markov model; immittance spectral frequency (ISF) parameter

: 丢帧补偿; 宽带语音; 大型隐马尔可夫模型; ISF参数

CLC number:: TP391

DOI:: 10.3969/j.issn.1003-7985.2009.02.002

Abstract:: Frame erasure concealment is studied to counter the rapid degradation of speech quality caused by the loss of speech parameters during transmission. A large hidden Markov model is used to model the immittance spectral frequency (ISF) parameters of the AMR-WB codec, and the lost ISFs are estimated optimally under the minimum mean square error (MMSE) criterion. Each estimated ISF vector is then weighted with those of the preceding frames to smooth the speech, yielding the concealed ISF vector that replaces the lost one during speech synthesis at the receiver. Speech concealed by this algorithm is compared with speech concealed by the baseline method in Appendix I of the G.722.2 specification. Simulations show that the proposed algorithm outperforms the baseline, raising the signal-to-noise ratio (SNR) by 2.41 dB and lowering the frequency-weighted spectral distortion by 0.885 dB.
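The estimate-then-smooth step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each HMM state is summarized by a mean ISF vector, that a state posterior for the last received frame is already available, and that smoothing uses only the single previous frame. The function name `mmse_conceal`, its arguments, and the weight `w` are hypothetical.

```python
import numpy as np

def mmse_conceal(state_post, trans, state_means, prev_isf, w=0.5):
    """Sketch of HMM-based MMSE concealment of one lost ISF vector.

    state_post  : (N,) posterior over HMM states at the last received frame
    trans       : (N, N) transition matrix, trans[i, j] = P(next = j | current = i)
    state_means : (N, D) mean ISF vector of each state (assumed summary of the model)
    prev_isf    : (D,) ISF vector of the previous frame
    w           : weight given to the MMSE estimate (hypothetical value)
    """
    state_post = np.asarray(state_post, dtype=float)
    trans = np.asarray(trans, dtype=float)
    state_means = np.asarray(state_means, dtype=float)
    prev_isf = np.asarray(prev_isf, dtype=float)

    # Propagate the state posterior one step to the lost frame.
    pred = state_post @ trans
    pred = pred / pred.sum()

    # With no observation for the lost frame, the MMSE estimate is the
    # conditional mean: the state means weighted by predicted probabilities.
    mmse_isf = pred @ state_means

    # Smooth by weighting the estimate against the previous frame's ISFs.
    concealed_isf = w * mmse_isf + (1.0 - w) * prev_isf
    return mmse_isf, concealed_isf
```

With a two-state toy model, `mmse_conceal([1, 0], [[0.5, 0.5], [0, 1]], [[100, 200], [300, 400]], [100, 200])` averages the two state means and then pulls the result halfway back toward the previous frame. A real decoder would also need to keep the concealed ISFs ordered and within valid bounds, which this sketch does not check.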

: 研究了在语音传输过程中由于参数丢失导致语音质量急剧下降的丢帧补偿问题. 利用大规模隐式马尔可夫模型对自适应多速率宽带语音编码(AMR-WB)的ISF参数进行建模, 然后对丢失的ISF参数进行基于最小均方误差(MMSE)准则的最优估计, 将估计的ISF参数和前帧的ISF参数进行加权以平滑估计值, 得到补偿的ISF参数. 在接收端, 利用ISF参数的估计值进行语音合成. 将本算法的合成语音和由G.722.2标准附件I的基准补偿的合成语音进行比较, 仿真结果表明, 本补偿算法可以得到更好的性能, 在频率加权谱失真和信噪比这2种评价准则上都有所改善, 信噪比提高约2.41 dB, 频率加权谱失真下降约0.885 dB, 证明了该算法的有效性.
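For orientation on the reported 2.41 dB gain, the sketch below computes a plain waveform SNR in decibels between a reference signal and a concealed one. It assumes the textbook definition over the whole signal; the paper may use a different variant (for example a segmental SNR), and the function name `snr_db` is hypothetical.

```python
import numpy as np

def snr_db(reference, test):
    """Signal-to-noise ratio in dB, treating (reference - test) as noise."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    noise = reference - test
    # Ratio of signal energy to error energy, expressed in decibels.
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))
```

Under this definition, a gain of 2.41 dB corresponds to the error energy shrinking by a factor of about 10^(0.241) ≈ 1.74 for the same reference signal.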

References:

[1] Ehsan M S, Kubin G. Frame change ratio: a measure to model short-time stationarity of speech [C]//Innovations in Information Technology. Dubai, United Arab Emirates, 2006: 1-5.
[2] Vaillancourt T, Jelinek M, Salami R, et al. Efficient frame erasure concealment in predictive speech codecs using glottal pulse resynchronization [C]//IEEE International Conference on Acoustics, Speech and Signal Processing. Honolulu, HI, USA, 2007: 1113-1116.
[3] Thyssen J, Zopf R, Chen Juin-Hwey, et al. A candidate for the ITU-T G.722 packet loss concealment standard [C]//IEEE International Conference on Speech and Signal Processing. Honolulu, HI, USA, 2007: 549-552.
[4] Telecommunication Standardization Sector of ITU. ITU-T recommendation G.722.2 wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB) [S]. 2003.
[5] Telecommunication Standardization Sector of ITU. Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB). Appendix I: Error concealment of erroneous or lost frames [S]. 2002.
[6] Rabiner L, Juang B H. Fundamentals of speech recognition [M]. New Jersey: Prentice-Hall, 1993: 321-389.
[7] Rodbro C A, Murthi M N, Andersen S V, et al. Hidden Markov model-based packet loss concealment for voice over IP [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14 (5): 1609-1623.
[8] Pepper D J, Clements M A. Phonemic recognition using a large hidden Markov model [J]. IEEE Transactions on Signal Processing, 1992, 40 (6): 1590-1595.
[9] Pepper D J, Clements M A. On the phonetic structure of a large hidden Markov model [C]//International Conference on Acoustics, Speech, and Signal Processing. Toronto, Ont, Canada, 1991: 465-468.
[10] Collura J S, McCree A, Tremain T E. Perceptually based distortion measurements for spectrum quantization [C]//IEEE Workshop on Speech Coding for Telecommunications. 1995: 49-50.
[11] Gilbert E N. Capacity of a burst-noise channel [J]. Bell Syst Tech J, 1960, 39: 1253-1266.

Memo

Memo:: Biographies: Wang Shikui (1970-), male, graduate; Wu Zhenyang (corresponding author), male, professor, zhenyang@seu.edu.cn.
Foundation items: The Science Foundation of Southeast University (No. XJ0704268), the Natural Science Foundation of the Education Department of Anhui Province (No. KJ2007B088).
Citation: Wang Shikui, Tang Yibin, You Hongyan, et al. Frame erasure concealment in wideband speech coding based on large hidden Markov model [J]. Journal of Southeast University (English Edition), 2009, 25 (2): 152-155.

Last Update: 2009-06-20
