[1] Song Peng, Jin Yun, Bao Yongqiang, et al. Efficient fundamental frequency transformation for voice conversion [J]. Journal of Southeast University (English Edition), 2012, 28(2): 140-144. [doi:10.3969/j.issn.1003-7985.2012.02.002]

Efficient fundamental frequency transformation for voice conversion

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
28
Issue:
2012, No. 2
Page:
140-144
Research Field:
Information and Communication Engineering
Publishing date:
2012-06-30

Info

Title:
Efficient fundamental frequency transformation for voice conversion
Author(s):
Song Peng (1), Jin Yun (1, 2), Bao Yongqiang (3), Zhao Li (1), Zou Cairong (1)
1 Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2 School of Physics and Electronic Engineering, Xuzhou Normal University, Xuzhou 221116, China
Keywords:
-
PACS:
TN912.3
DOI:
10.3969/j.issn.1003-7985.2012.02.002
Abstract:
To improve the performance of voice conversion, fundamental frequency (F0) transformation methods are investigated and an efficient F0 transformation algorithm is proposed. First, unlike traditional linear transformation methods, the proposed approach explores the relationship between F0 and the spectral parameters: in each component of the Gaussian mixture model (GMM), F0 is predicted from the converted spectral parameters using support vector regression (SVR). Then, to reduce the over-smoothing caused by the statistical averaging of the GMM, a mixed transformation method combining SVR with the traditional mean-variance linear (MVL) conversion is presented. In addition, the adaptive median filter, widely used in image processing, is adopted to solve the discontinuity problem caused by frame-wise transformation. Objective and subjective experiments are carried out to evaluate the proposed method. The results demonstrate that it outperforms traditional F0 transformation methods in terms of both similarity and quality.
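
As a rough illustration of two building blocks named in the abstract, the sketch below shows a mean-variance linear (MVL) conversion of F0 and a median smoothing pass over the frame-wise result. It is not the authors' implementation. The following are assumptions: the conversion is done in the log-F0 domain, unvoiced frames are marked with F0 = 0, and a plain fixed-window median filter stands in for the adaptive median filter. The function names `log_f0_stats`, `mvl_convert`, and `median_smooth` are hypothetical, and the SVR-based prediction and the mixing step are not shown.

```python
import numpy as np

def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()

def mvl_convert(f0_src, src_stats, tgt_stats):
    """Mean-variance linear conversion in the log-F0 domain (assumed domain).

    Voiced frames are shifted and scaled so that their log-F0 mean and
    standard deviation match the target speaker; unvoiced frames stay 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_x, sigma_x = src_stats
    mu_y, sigma_y = tgt_stats
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    out[voiced] = np.exp(mu_y + (sigma_y / sigma_x) * (log_f0 - mu_x))
    return out

def median_smooth(f0, width=3):
    """Fixed-window median filter over voiced frames only.

    A simplified stand-in for an adaptive median filter: the window size
    never changes, and unvoiced neighbours are ignored.
    """
    f0 = np.asarray(f0, dtype=float)
    out = f0.copy()
    half = width // 2
    for i in np.flatnonzero(f0 > 0):
        window = f0[max(0, i - half): i + half + 1]
        out[i] = np.median(window[window > 0])
    return out

# Toy F0 contours in Hz (made-up values, 0 = unvoiced frame).
src = np.array([0.0, 110.0, 115.0, 120.0, 0.0, 118.0, 112.0])
tgt = np.array([200.0, 210.0, 0.0, 220.0, 230.0, 0.0, 215.0])

converted = mvl_convert(src, log_f0_stats(src), log_f0_stats(tgt))
smoothed = median_smooth(converted)
```

After `mvl_convert`, the voiced frames of `converted` have the same log-F0 mean and standard deviation as the toy target contour. The smoothing pass then suppresses isolated frame-to-frame jumps, which is the role the abstract assigns to the adaptive median filter.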

Memo

Memo:
Biographies: Song Peng (1983-), male, graduate; Zhao Li (corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 60975017), the Natural Science Foundation of Guangdong Province (No. 10252800001000001), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (No. 10KJB510005).
Citation: Song Peng, Jin Yun, Bao Yongqiang, et al. Efficient fundamental frequency transformation for voice conversion [J]. Journal of Southeast University (English Edition), 2012, 28(2): 140-144. [doi:10.3969/j.issn.1003-7985.2012.02.002]
Last Update: 2012-06-20