[1] Lamel L, Rabiner L, Rosenberg A, et al. An improved endpoint detector for isolated word recognition[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(4): 777-785. DOI:10.1109/TASSP.1981.1163642.
[2] Lu L, Jiang H, Zhang H J. A robust audio classification and segmentation method[C]//Proceedings of the Ninth ACM International Conference on Multimedia. Ottawa, Canada, 2001: 203-211. DOI:10.1145/500141.500173.
[3] Song J L, Meng Y, Cao J M, et al. Research on digital hearing aid speech enhancement algorithm[C]//2018 37th Chinese Control Conference(CCC). Wuhan, China, 2018: 4316-4320. DOI:10.23919/chicc.2018.8482732.
[4] Çolak R, Akdeniz R. A novel voice activity detection for multi-channel noise reduction[J]. IEEE Access, 2021, 9: 91017-91026.
[5] Jaiswal R. Speech activity detection under adverse noisy conditions at low SNRs[C]//2021 6th International Conference on Communication and Electronics Systems(ICCES). Coimbatore, India, 2021: 97-101. DOI:10.1109/ICCES51350.2021.9488934.
[6] Masumura R, Matsui K, Koizumi Y, et al. Context-aware neural voice activity detection using auxiliary networks for phoneme recognition, speech enhancement and acoustic scene classification[C]//2019 27th European Signal Processing Conference(EUSIPCO). A Coruna, Spain, 2019: 1-5. DOI:10.23919/EUSIPCO.2019.8902703.
[7] Moldovan A, Stan A, Giurgiu M. Improving sentence-level alignment of speech with imperfect transcripts using utterance concatenation and VAD[C]//2016 IEEE 12th International Conference on Intelligent Computer Communication and Processing(ICCP). Cluj-Napoca, Romania, 2016: 171-174. DOI:10.1109/ICCP.2016.7737141.
[8] Rabiner L R, Sambur M R. An algorithm for determining the endpoints of isolated utterances[J]. The Bell System Technical Journal, 1975, 54(2): 297-315. DOI:10.1002/j.1538-7305.1975.tb02840.x.
[9] Nemer E, Goubran R, Mahmoud S. Robust voice activity detection using higher-order statistics in the LPC residual domain[J]. IEEE Transactions on Speech and Audio Processing, 2001, 9(3): 217-231. DOI:10.1109/89.905996.
[10] Marzinzik M, Kollmeier B. Speech pause detection for noise spectrum estimation by tracking power envelope dynamics[J]. IEEE Transactions on Speech and Audio Processing, 2002, 10(2): 109-118. DOI:10.1109/89.985548.
[11] Shi L, Ahmad I, He Y J, et al. Hidden Markov model based drone sound recognition using MFCC technique in practical noisy environments[J]. Journal of Communications and Networks, 2018, 20(5): 509-518. DOI:10.1109/JCN.2018.000075.
[12] Al-Ali A K H, Dean D, Senadji B, et al. Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions[J]. IEEE Access, 2017, 5: 15400-15413. DOI:10.1109/ACCESS.2017.2728801.
[13] Pham T V, Stark M, Rank E. Performance analysis of wavelet subband based voice activity detection in cocktail party environment[C]//The 2010 International Conference on Advanced Technologies for Communications. Ho Chi Minh City, Vietnam, 2010: 85-88. DOI:10.1109/ATC.2010.5672718.
[14] Ghosh P K, Tsiartas A, Narayanan S. Robust voice activity detection using long-term signal variability[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(3): 600-613. DOI:10.1109/TASL.2010.2052803.
[15] Tsiartas A, Chaspari T, Katsamanis N, et al. Multi-band long-term signal variability features for robust voice activity detection[C]//14th Annual Conference of the International Speech Communication Association(INTERSPEECH 2013). ISCA, 2013: 718-722. DOI:10.21437/interspeech.2013-201.
[16] Haider F, Luz S. Attitude recognition using multi-resolution cochleagram features[C]//2019 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Brighton, UK, 2019: 3737-3741. DOI:10.1109/ICASSP.2019.8682974.
[17] Aneeja G, Yegnanarayana B. Single frequency filtering approach for discriminating speech and nonspeech[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(4): 705-717. DOI:10.1109/TASLP.2015.2404035.
[18] Makowski R, Hossa R. Voice activity detection with quasi-quadrature filters and GMM decomposition for speech and noise[J]. Applied Acoustics, 2020, 166: 107344. DOI:10.1016/j.apacoust.2020.107344.
[19] Sohn J, Kim N S, Sung W. A statistical model-based voice activity detection[J]. IEEE Signal Processing Letters, 1999, 6(1): 1-3. DOI:10.1109/97.736233.
[20] Dey J, Bin Hossain M S, Haque M A. An ensemble SVM-based approach for voice activity detection[C]//2018 10th International Conference on Electrical and Computer Engineering(ICECE). Dhaka, Bangladesh, 2018: 297-300. DOI:10.1109/ICECE.2018.8636745.
[21] Krishnakumar H, Williamson D S. A comparison of boosted deep neural networks for voice activity detection[C]//2019 IEEE Global Conference on Signal and Information Processing(GlobalSIP). Ottawa, ON, Canada, 2019: 1-5. DOI:10.1109/GlobalSIP45357.2019.8969258.
[22] Germain F G, Sun D L, Mysore G J. Speaker and noise independent voice activity detection[C]//14th Annual Conference of the International Speech Communication Association(INTERSPEECH 2013). Lyon, France, 2013: 732-736. DOI:10.21437/interspeech.2013-204.
[23] Tachioka Y. DNN-based voice activity detection using auxiliary speech models in noisy environments[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Calgary, AB, Canada, 2018: 5529-5533. DOI:10.1109/ICASSP.2018.8461551.
[24] Paseddula C, Gangashetty S V. DNN based acoustic scene classification using score fusion of MFCC and inverse MFCC[C]//2018 IEEE 13th International Conference on Industrial and Information Systems(ICIIS). Rupnagar, India, 2018: 18-21. DOI:10.1109/ICIINFS.2018.8721379.
[25] Sun Y N, Yen G G, Yi Z. Evolving unsupervised deep neural networks for learning meaningful representations[J]. IEEE Transactions on Evolutionary Computation, 2019, 23(1): 89-103. DOI:10.1109/TEVC.2018.2808689.
[26] Long J Y, Zhang S H, Li C. Evolving deep echo state networks for intelligent fault diagnosis[J]. IEEE Transactions on Industrial Informatics, 2020, 16(7): 4928-4937. DOI:10.1109/TII.2019.2938884.