PEVD-Based Source Separation Using Informed Spherical Microphone Arrays
This website demonstrates a source separation algorithm based on the polynomial eigenvalue decomposition (PEVD) of spherical microphone array beamformed signals. This work was developed by Vincent W. Neo, Christine Evers and Patrick A. Naylor, jointly between the Speech and Audio Processing Lab at Imperial College London and the School of Electronics and Computer Science at the University of Southampton in the U.K.
About This Work
Anechoic speech signals sampled at 16 kHz are taken from the TIMIT corpus [1]. The room impulse responses (RIRs) from the 2 sources to 32 microphones on a rigid sphere are simulated using the SMIRgen tool [2]. The reverberation time T60 is varied over 0 s, 0.3 s and 0.7 s.
For each speech source, short utterances from a randomly selected speaker are concatenated to generate signals of 8 to 10 s duration. Each source signal is convolved with its RIRs, and the contributions are mixed together with sensor noise at 30 dB SNR to generate the microphone signals. Eigenbeam signals are then obtained using the spherical harmonic (SH) transform [3, 4]. The modal beamformers are directed at (90°, 90°) for Source 1 and (90°, 90°) for Source 2. For the comparative methods, the microphone closest to each source is selected, giving two microphone signals. The sequential matrix diagonalisation (SMD) algorithm [5] is used for all PEVD processing.
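As an illustration of this signal-generation pipeline, a minimal Python/NumPy sketch is given below. It assumes the RIRs have already been simulated (e.g. with SMIRgen [2]) and that the microphone directions on the sphere are known; the function names mix_at_mics and eigenbeams, the interpretation of the 30 dB figure as a per-microphone SNR, and the least-squares form of the SH transform are illustrative assumptions rather than our exact implementation, and the rigid-sphere mode-strength compensation of [4] is omitted.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.special import sph_harm

def mix_at_mics(sources, rirs, snr_db=30.0, seed=0):
    """Convolve each source with its RIRs and add white sensor noise.

    sources : list of 1-D arrays of equal length (anechoic speech).
    rirs    : list of (n_mics, rir_len) arrays, one per source.
    """
    rng = np.random.default_rng(seed)
    n_mics, rir_len = rirs[0].shape
    n_out = len(sources[0]) + rir_len - 1
    x = np.zeros((n_mics, n_out))
    for s, h in zip(sources, rirs):
        for m in range(n_mics):
            x[m, :] += fftconvolve(s, h[m])[:n_out]
    noise = rng.standard_normal(x.shape)
    # scale the noise for the desired average SNR at the microphones
    noise *= np.sqrt(np.mean(x**2) / np.mean(noise**2)) * 10.0 ** (-snr_db / 20.0)
    return x + noise

def eigenbeams(x, mic_dirs, order=3):
    """Least-squares spherical harmonic transform of the array signals [3, 4].

    x        : (n_mics, n_samples) microphone signals.
    mic_dirs : (n_mics, 2) array of (azimuth, inclination) in radians.
    Returns the ((order + 1)**2, n_samples) eigenbeam signals; rigid-sphere
    mode-strength compensation is omitted in this sketch.
    """
    az, incl = mic_dirs[:, 0], mic_dirs[:, 1]
    # SciPy's sph_harm takes (order m, degree n, azimuth, inclination)
    Y = np.column_stack([sph_harm(m, n, az, incl)
                         for n in range(order + 1)
                         for m in range(-n, n + 1)])
    return np.linalg.pinv(Y) @ x
```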
Demonstration
The algorithms used in the comparison include:
- Auxiliary function-based IVA (AuxIVA) [6]
- AuxIVA using 32 microphone signals instead of 2 (AuxIVA32) [6]
- Fast Multi-channel Non-negative Matrix Factorisation (FastMNMF) [7]
- Independent Low-Rank Matrix Analysis (ILRMA) [8]
- 3rd Order Hypercardioid Modal Beamformer Optimised for Maximum Directivity (MaxDir) [9]
- 3rd Order Modified Hypercardioid Modal Beamformer (MHCARD)
- PEVD [9, 10], which uses four beamformed signals per source: χ_1^-1, χ_3^-3, χ_3^-1 and the modified hypercardioid (MHCARD) directed at Source 1, and χ_1^1, χ_3^1, χ_3^3 and the modified hypercardioid directed at Source 2 (a sketch of this step follows the list)
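Because the PEVD method above operates on only a few beamformed channels, its core polynomial-matrix step can be sketched compactly. The Python/NumPy sketch below, for an illustrative 4 x N array beams holding the four beamformed signals for one source, estimates the para-Hermitian space-time covariance R(τ) = E[x(n) x(n − τ)^H] and runs a simplified SMD [5] loop: delay the dominant off-diagonal column to lag zero, then diagonalise the lag-zero slice with an EVD applied across all lags. The function names, lag support and iteration count are illustrative, and the accumulation of the paraunitary filter that produces the actual separated signals in [5], [10] is omitted; this is a sketch, not our implementation.

```python
import numpy as np

def space_time_cov(x, max_lag):
    """Estimate R(tau) = E[x(n) x(n - tau)^H] for tau = -max_lag..max_lag.

    x : (n_chan, n_samples) array, e.g. the 4 beamformed signals for one source.
    Returns a para-Hermitian array of shape (n_chan, n_chan, 2 * max_lag + 1).
    """
    n_chan, n = x.shape
    R = np.zeros((n_chan, n_chan, 2 * max_lag + 1), dtype=complex)
    for i, tau in enumerate(range(-max_lag, max_lag + 1)):
        if tau >= 0:
            R[:, :, i] = x[:, tau:] @ x[:, :n - tau].conj().T / (n - tau)
        else:
            R[:, :, i] = x[:, :n + tau] @ x[:, -tau:].conj().T / (n + tau)
    return R

def smd_diagonalise(R, n_iter=50, tol=1e-8):
    """Simplified sequential matrix diagonalisation (SMD) [5] of R(tau).

    Each iteration delays the dominant off-diagonal column to lag zero and
    diagonalises the lag-zero slice with an EVD applied across all lags.
    The paraunitary filter needed to reconstruct the enhanced signals in
    [5], [10] is not accumulated here, for brevity.
    """
    S = R.copy()
    lag0 = S.shape[2] // 2
    _, Q = np.linalg.eigh(S[:, :, lag0])
    S = np.einsum('ij,jkt,kl->ilt', Q.conj().T, S, Q)
    for _ in range(n_iter):
        lag0 = S.shape[2] // 2
        # off-diagonal energy of every column at every lag
        energy = np.sum(np.abs(S) ** 2, axis=0) - np.abs(np.diagonal(S).T) ** 2
        energy[:, lag0] = 0.0                      # exclude the lag-zero slice
        k, t = np.unravel_index(np.argmax(energy), energy.shape)
        if energy[k, t] < tol:
            break
        tau = t - lag0
        # zero-pad the lag axis so the shifts below cannot wrap around
        S = np.pad(S, ((0, 0), (0, 0), (abs(tau), abs(tau))))
        S[:, k, :] = np.roll(S[:, k, :], -tau, axis=-1)   # move column k energy to lag 0
        S[k, :, :] = np.roll(S[k, :, :], tau, axis=-1)    # keep S para-Hermitian
        lag0 = S.shape[2] // 2
        _, Q = np.linalg.eigh(S[:, :, lag0])
        S = np.einsum('ij,jkt,kl->ilt', Q.conj().T, S, Q)
    return S
```

A call such as smd_diagonalise(space_time_cov(beams, max_lag=100)) (the lag value is only an example) returns an approximately diagonalised polynomial covariance; in the full algorithm the accumulated paraunitary filter is applied to the beamformed signals to recover the source of interest, as in [10].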
The audio player is built using the trackswitch.js tool [11].
Two female speakers in an anechoic room.
Two male speakers in a reverberant room with T60 = 0.3 s.
One speaker and localised fan noise in a reverberant room with T60 = 0.3 s.
One male and one female speaker in a reverberant room with T60 = 0.7 s.
References
[1] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,” Linguistic Data Consortium (LDC), Philadelphia, Corpus, 1993.
[2] D. P. Jarrett, E. A. P. Habets, M. R. P. Thomas and P. A. Naylor, “Rigid sphere room impulse response simulation: Algorithm and applications,” J. Acoust. Soc. Am., vol. 132, no. 3, pp. 1462-1472, Sep. 2012.
[3] B. Rafaely, Fundamentals of Spherical Array Processing, Springer Topics in Signal Processing. Springer-Verlag, 2015.
[4] D. P. Jarrett, E. A. P. Habets, and P. A. Naylor, Theory and Applications of Spherical Microphone Array Processing, Springer Topics in Signal Processing. Springer-Verlag, 2017.
[5] S. Redif, S. Weiss, and J. G. McWhirter, “Sequential matrix diagonalisation algorithms for polynomial EVD of para-Hermitian matrices,” IEEE Trans. Signal Process., vol. 63, no. 1, pp. 81-89, Jan. 2015.
[6] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique,” in Proc. IEEE Workshop on Appl. of Signal Process. to Audio and Acoust. (WASPAA), pp. 189-192, Oct. 2011.
[7] K. Sekiguchi, A. A. Nugraha, Y. Bando, and K. Yoshii, “Fast multichannel source separation based on jointly diagonalizable spatial covariance matrices,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2019.
[8] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1626-1641, Sep. 2016.
[9] J. Meyer, “A highly scalable spherical microphone array based on orthonormal decomposition of the soundfield,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), May 2002, pp. 1781-1784.
[10] V. W. Neo, C. Evers, and P. A. Naylor, “Polynomial matrix eigenvalue decomposition of spherical harmonics for speech enhancement,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), 2021.
Listening Examples Audio Tool
[11] N. Werner, S. Balke, F.-R. Stöter, M. Müller, and B. Edler, “trackswitch.js: A versatile web-based audio player for presenting scientific results,” in Proc. 3rd Web Audio Conf. (WAC), London, UK, 2017. [Online]. Available: https://github.com/audiolabs/trackswitch.js
PEVD-Based Algorithms for Speech Enhancement
[12] V. W. Neo, C. Evers, and P. A. Naylor, “Speech enhancement using polynomial eigenvalue decomposition,” in Proc. IEEE Workshop on Appl. of Signal Process. to Audio and Acoust. (WASPAA), 2019.
[13] V. W. Neo, C. Evers, and P. A. Naylor, “PEVD-based speech enhancement in reverberant environments,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), 2020.
[14] V. W. Neo, C. Evers, and P. A. Naylor, “Speech dereverberation performance of a polynomial-EVD subspace approach,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2020.
Related Works on PEVD Algorithms
[15] J. G. McWhirter, P. D. Baxter, T. Cooper, S. Redif, and J. Foster, “An EVD algorithm for para-Hermitian polynomial matrices,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2158–2169, May 2007.
[16] V. W. Neo and P. A. Naylor, “Second order sequential best rotation algorithm with Householder transformation for polynomial matrix eigenvalue decomposition,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), 2019.
[17] S. Redif, S. Weiss, and J. G. McWhirter, “An approximate polynomial matrix eigenvalue decomposition algorithm for para-Hermitian matrices,” in Proc. Intl. Symp. on Signal Process. and Inform. Technology (ISSPIT), 2011, pp. 421–425.
Contact Us
Email: vincent.neo09@imperial.ac.uk
Visit Our Websites: SAP | Vincent W. Neo | Christine Evers | Patrick A. Naylor