Jing Wang, Xiang Xie, Jingming Kuang
School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Speech enhancement technology is of great importance in voice communication, with the purpose of improving speech quality in noisy environments and increasing the performance of subsequent processing systems. By installing multiple microphones at the front end of a speech processing system, various means become available to overcome the adverse impact of background noise, and the quality of speech communication and speech recognition can be improved beyond what single-channel speech enhancement techniques achieve. Traditional single-microphone speech denoising methods such as spectral subtraction, Wiener filtering, statistical-model and subspace methods are widely used because of their simplicity and efficiency. However, such single-channel algorithms may cause signal distortion or introduce the so-called musical noise when carrying out too much noise reduction. Hence, multi-microphone based speech enhancement has become a widely used strategy offering better solutions to the speech enhancement problem. A microphone array obtains the signal with time and space information simultaneously, so it is more flexible for improving the performance of noise reduction.
This paper proposes a novel microphone array speech denoising scheme based on tensor filtering methods including truncated HOSVD (High-Order Singular Value Decomposition), low-rank tensor approximation and multi-mode Wiener filtering.
The adaptive beamforming approach [1] and the subspace decomposition approach [2] are good strategies that can be applied in the multi-microphone situation. There are also many improved algorithms based on these strategies, such as the optimal distributed minimum-variance beamforming approaches [3] and the perceptual-properties-based subspace approaches [4]. Such microphone array denoising methods use classical matrix-based algebraic techniques. However, the received signals can be regarded as a multi-dimensional model containing channel, time and spectrum, which is indeed a higher-order tensor structure. A tensor is a multidimensional array in a multi-dimensional space which can handle signals with multiple factors more efficiently than vectors and matrices, and can be seen as a high-dimensional expansion of the vector and the matrix. With the development of multi-channel and multi-dimensional data in recent years, tensor analysis has been widely studied and used in many fields such as quantum physics, text mining, data analysis and image processing [5, 6]. The development of tensor analysis allows one to model the signal in a tensor form so as to solve signal problems in high-dimensional space [7-10]. Approaches based on tensor algebra are especially suitable for the analysis and processing of multidimensional array signals.
This paper proposes a multi-mode tensor filtering model consisting of three approaches, truncated HOSVD, low-rank tensor approximation and multi-mode Wiener filtering, in which a 3-order tensor is constructed from the multi-microphone speech signal according to three dimensions: channel, time and frequency. By applying tensor algebra theory, multi-microphone speech denoising can be realized more efficiently. Since similar channels and frames share the same underlying speech structure, this redundancy can be exploited to remove noise by decreasing the dimension of the channel mode and the frequency mode. For the truncated HOSVD method, which is in fact a means of computing the Tucker decomposition, the key point is to select the leading left singular vectors that best retain the original signal information, with the purpose of reducing the dimensionality. For the low-rank approximation filtering method, multi-mode linear filters are constructed to reduce noise according to the low-rank tensor approximation with the Tucker model and the ALS (Alternating Least Squares) algorithm [11, 12]. The third method, multi-mode Wiener filtering, can be regarded as the extension of one-mode Wiener filtering based on tensor analysis. On the whole, the three proposed tensor filtering methods perform well and show their potential in our experimental simulation.
The rest of this paper is organized as follows: Section II introduces the tensor operations used in this paper. Section III gives the tensor space representation and the tensor filtering model. Section IV describes the three proposed speech denoising tensor filtering approaches in detail. Section V presents the parameter setup and experimental results, from the point of view of waveforms and spectrograms, objective indexes, and subjective test results of the traditional and proposed methods. Finally, we conclude this paper in Section VI.
For multilinear analysis, the existing framework of vector and matrix algebra appears to be insufficient and/or inappropriate [13]. A tensor is a multidimensional array which represents an element of an N-order multifactor space. The order of a tensor is the number of dimensions or factors, also known as modes or ways. Two important and basic tensor operations are the mode-n unfolding and the mode-n product. The mode n refers to the n-th dimension, i.e. the n-th subspace of the N-order tensor space. In this paper, matrices are denoted by italic boldface capital letters and tensors by standard boldface capital letters.
The mode-n unfolding, also called mode-n flattening, is the matricization of the tensor $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ in the subspace of mode n, which can be expressed as the flattened matrix $X_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$. Here $\mathbb{R}$ stands for the real number space and the sequence of indexes $(I_1, I_2, \ldots, I_N)$ stands for the dimension $I_n$ of each mode n of the high-order space.
The mode-n product includes tensor times vector, tensor times matrix and tensor times tensor, which can all be processed based on the tensor matricization [14]. This paper only makes use of the tensor-times-matrix product. The mode-n product of an N-order tensor $\mathbf{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ by a matrix $\boldsymbol{U} \in \mathbb{R}^{J \times I_n}$ is denoted by $\mathbf{Y} = \mathbf{X} \times_n \boldsymbol{U}$, which is still an N-order tensor, of size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$. The symbol $\times_n$ stands for the tensor product along mode n, which is different from the traditional product between two elements. This kind of tensor product can be expressed as a matrix product using the mode-n unfolding as follows

$$Y_{(n)} = \boldsymbol{U} X_{(n)}$$
Each element of tensor Y can be expressed as

$$y_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n}$$
Here the variables y, x and u are the elements of tensor Y, tensor X and matrix $\boldsymbol{U}$, respectively. The norm of a tensor $\mathbf{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ can be defined as the square root of the sum of the squares of all its elements as follows

$$\|\mathbf{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^{2}}$$
For an N-order tensor multiplied by a sequence of matrices $\boldsymbol{U}^{(1)}, \boldsymbol{U}^{(2)}, \ldots, \boldsymbol{U}^{(N)}$, the resulting tensor can be expressed as

$$\mathbf{Y} = \mathbf{X} \times_1 \boldsymbol{U}^{(1)} \times_2 \boldsymbol{U}^{(2)} \cdots \times_N \boldsymbol{U}^{(N)}$$

whose mode-n unfolding is

$$Y_{(n)} = \boldsymbol{U}^{(n)} X_{(n)} \left( \boldsymbol{U}^{(N)} \otimes \cdots \otimes \boldsymbol{U}^{(n+1)} \otimes \boldsymbol{U}^{(n-1)} \otimes \cdots \otimes \boldsymbol{U}^{(1)} \right)^{T}$$
Here $\otimes$ is the Kronecker product between matrices. This kind of tensor multiplication is very useful in tensor decomposition applied to data arrays for extracting and explaining their properties.
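As a concrete illustration of the two operations above, the short NumPy sketch below computes a mode-n unfolding and a mode-n product through that unfolding. The helper names (unfold, fold, mode_n_product) are ours and not part of the paper; this is only a sketch of the algebra, not the authors' implementation.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold: reshape a flattened matrix back into a tensor of `shape`."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def mode_n_product(X, U, mode):
    """Mode-n product Y = X x_n U, computed as Y_(n) = U X_(n)."""
    shape = list(X.shape)
    shape[mode] = U.shape[0]
    return fold(U @ unfold(X, mode), mode, shape)

# toy check: an 8 x 150 x 512 tensor multiplied along mode 0 by a 4 x 8 matrix
X = np.random.randn(8, 150, 512)
U = np.random.randn(4, 8)
print(mode_n_product(X, U, 0).shape)   # (4, 150, 512)
```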
Tensor decomposition can use the basic tensor operations like tensor matricization and the tensor product which are shown in Section 2.1. There are many choices for tensor decomposition, which generally combine a choice of orthonormal bases in the domain of the tensor with a suitable truncation of its expansion. Two main kinds of tensor decomposition are CP (CANDECOMP/PARAFAC) and Tucker decomposition [14]. The latter can be regarded as a multilinear generalization of the traditional matrix SVD (Singular Value Decomposition) [15] and plays an important role in tensor-based signal processing.
For an N-order tensor $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the rank-$(K_1, K_2, \ldots, K_N)$ approximation with the truncated Tucker decomposition is represented as

$$\mathbf{X} \approx \mathbf{G} \times_1 \boldsymbol{U}^{(1)} \times_2 \boldsymbol{U}^{(2)} \cdots \times_N \boldsymbol{U}^{(N)}$$
where $\boldsymbol{U}^{(n)} \in \mathbb{R}^{I_n \times K_n}$, n = 1, ..., N, are the truncated components or factor matrices (usually orthogonal matrices) in the mode-1, mode-2, ..., mode-N subspaces, respectively. $\mathbf{G} \in \mathbb{R}^{K_1 \times K_2 \times \cdots \times K_N}$ is the truncated core tensor, whose entries show the level of interaction between the different components, and is computed with

$$\mathbf{G} = \mathbf{X} \times_1 \boldsymbol{U}^{(1)T} \times_2 \boldsymbol{U}^{(2)T} \cdots \times_N \boldsymbol{U}^{(N)T}$$
The Tucker model shown in Eq. (5) provides a kind of tensor structure. One advantage of the Tucker decomposition is that it transforms the original tensor into a core tensor together with factor matrices. Many methods for computing a Tucker decomposition have been proposed, such as HOSVD, ALS, etc.
HOSVD is a convincing generalization of the matrix SVD and gives a way to efficiently compute the singular vectors of $X_{(n)}$. When the condition $K_n < \operatorname{rank}\!\left( X_{(n)} \right)$ is satisfied for one or more n, the decomposition is called the truncated HOSVD.
The truncated HOSVD is not optimal, but it is a good starting point for an iterative alternating least squares (ALS) algorithm [14]. Kroonenberg proposed an ALS algorithm, called TUCKALS3, for computing a Tucker decomposition of three-way arrays, and later many approaches were developed on the basis of TUCKALS3, extending to N ways for N > 3.
For a 3-order tensor, the Tucker decomposition with truncated HOSVD in Eq. (5) is shown in figure 1. The red dotted lines in figure 1 indicate the dimensionality reduction at different modes under the truncated Tucker decomposition.
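For reference, a truncated HOSVD of the kind used throughout the rest of the paper can be sketched in a few lines of NumPy: take the $K_n$ leading left singular vectors of each unfolding as factor matrices, then project the tensor onto them to obtain the core. This is a generic sketch with our own function names, not the authors' code.

```python
import numpy as np

def _unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def _fold(M, mode, shape):
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def truncated_hosvd(X, ranks):
    """Rank-(K1,...,KN) truncated HOSVD: returns (core, factor matrices)."""
    factors = []
    for mode, K in enumerate(ranks):
        U, _, _ = np.linalg.svd(_unfold(X, mode), full_matrices=False)
        factors.append(U[:, :K])                   # K leading left singular vectors
    G = X
    for mode, U in enumerate(factors):             # core: G = X x_n U^(n)T for every n
        shape = list(G.shape); shape[mode] = U.shape[1]
        G = _fold(U.T @ _unfold(G, mode), mode, shape)
    return G, factors

def tucker_reconstruct(G, factors):
    """X_hat = G x_1 U^(1) x_2 U^(2) ... x_N U^(N)."""
    X = G
    for mode, U in enumerate(factors):
        shape = list(X.shape); shape[mode] = U.shape[0]
        X = _fold(U @ _unfold(X, mode), mode, shape)
    return X
```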
We consider the noisy multi-microphone data as a three-order tensor $\mathbf{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, where $I_1$ is the channel mode, $I_2$ is the time mode and $I_3$ is the frequency mode. Here, the multiple microphones are considered to be multiple channels. Each channel is separated into a sequence of data frames along the time axis, and each frame is transformed into the frequency domain. There are several common transforms in the signal processing field, such as the Fourier Transform (FT), the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). The DCT is used in our method to obtain real-valued spectral coefficients.
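As an illustration of this construction, the sketch below stacks a multichannel recording into a (channel, frame, DCT-bin) tensor. The frame length and hop (32 ms Hamming frames with 50% overlap at 16 kHz, i.e. 512 samples and a 256-sample hop) follow the experimental setup in Section V; the function name and all other details are our own assumptions.

```python
import numpy as np
from scipy.fft import dct

def build_speech_tensor(x_multi, frame_len=512, hop=256):
    """x_multi: (n_channels, n_samples) array -> (channel, frame, DCT bin) tensor."""
    n_ch, n_samp = x_multi.shape
    win = np.hamming(frame_len)
    n_frames = 1 + (n_samp - frame_len) // hop
    T = np.empty((n_ch, n_frames, frame_len))
    for c in range(n_ch):
        for f in range(n_frames):
            frame = x_multi[c, f * hop : f * hop + frame_len] * win
            T[c, f] = dct(frame, type=2, norm='ortho')   # real-valued DCT spectrum
    return T
```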
The observed three-order tensor T with dimensions $I_1$, $I_2$, $I_3$ in each mode can be represented as

$$\mathbf{T} = \mathbf{S} + \mathbf{N}$$
Fig. 1. Tucker decomposition.
where T is the three-order tensor constructed from the noisy signal data, S is constructed from the clean speech data and N is constructed from the noise part. This paper assumes that the noise N is independent of the clean signal S, and that the nth-mode rank $K_n$ is smaller than the nth-mode dimension $I_n$ for all n = 1 to N.
Then, the nth-mode space $E^{(n)}$ is assumed to consist of two orthogonal subspaces, $E_1^{(n)}$ and $E_2^{(n)}$. Consequently, extending the traditional subspace notion to tensor space is feasible, and the definitions are as follows [16].
(i) $E_1^{(n)}$: the signal subspace, with dimension $K_n$, spanned by the $K_n$ singular vectors associated with the $K_n$ largest singular values of the matrix $S_{(n)}$.
(ii) $E_2^{(n)}$: the noise subspace, with dimension $I_n - K_n$, spanned by the $I_n - K_n$ singular vectors associated with the $I_n - K_n$ smallest singular values of the matrix $S_{(n)}$.
Hence, estimating the speech signal tensor S from the noisy data tensor T translates into estimating the signal subspace $E_1^{(n)}$ in every nth mode of tensor T.
The high-dimensional noisy speech space contains a noise subspace, which indicates that the clean speech data has a lower intrinsic dimensionality. The multi-microphone speech data has spectral correlations between different channels and different frames, which are assumed to be stationary along time. This allows us to obtain the clean speech data through a lower-dimensional approximation by removing the inter-channel and intra-channel redundancy of the multi-microphone data [12]. This can be done with the Tucker decomposition.
The Tucker decomposition is considered as a multilinear generalization of the SVD. As discussed above, the noisy multi-microphone data is constructed as a three-order tensor $\mathbf{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ with a channel mode, a time mode and a spectrum mode, respectively. Following the Tucker model, we have

$$\mathbf{T} = \mathbf{G} \times_1 \boldsymbol{P}^{(1)} \times_2 \boldsymbol{P}^{(2)} \times_3 \boldsymbol{P}^{(3)}$$
where $\boldsymbol{P}^{(j)}$, j = 1, 2, 3, are the factor matrices and G is the core tensor.
In order to reduce the dimensionality, the key point here is to select the leading left singular vectors of each flattened matrix $T_{(j)}$, which best retain the original signal information, through the truncated high-order SVD (HOSVD) [13, 17]. As shown in figure 2, the columns with deeper color in the flattened matrices represent the leading singular vectors and the columns with lighter color are the trivial components of the noisy speech tensor. Suppose the leading $K_j$ singular vectors of each flattened matrix are selected and formed as $\widetilde{\boldsymbol{P}}^{(j)} \in \mathbb{R}^{I_j \times K_j}$; the core tensor of the lower-dimensional approximation is computed by

$$\widetilde{\mathbf{G}} = \mathbf{T} \times_1 \widetilde{\boldsymbol{P}}^{(1)T} \times_2 \widetilde{\boldsymbol{P}}^{(2)T} \times_3 \widetilde{\boldsymbol{P}}^{(3)T}$$
Then the estimated clean speech tensor $\widehat{\mathbf{S}}$ can be approximated by

$$\widehat{\mathbf{S}} = \widetilde{\mathbf{G}} \times_1 \widetilde{\boldsymbol{P}}^{(1)} \times_2 \widetilde{\boldsymbol{P}}^{(2)} \times_3 \widetilde{\boldsymbol{P}}^{(3)}$$
After enhancement, the tensor $\widehat{\mathbf{S}}$ can be restored back to the M channels of data according to how it was formed originally.
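In code, this first method amounts to a single truncated HOSVD of the noisy tensor followed by the reconstruction, reusing the truncated_hosvd and tucker_reconstruct helpers sketched earlier. The ranks shown are the fixed ranks reported in Section 5.2.1; everything else in this usage fragment is an assumption.

```python
# T_noisy: (8, 150, 512) tensor built from the noisy multichannel recording
G, P = truncated_hosvd(T_noisy, ranks=(1, 120, 512))   # leading singular vectors per mode
S_hat = tucker_reconstruct(G, P)                        # lower-rank tensor taken as the denoised speech
```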
4.2.1 LRTA with Tucker model
HOSVD and Low-Rank Tensor Approximation (LRTA) are widely used in tensor signal processing, especially in signal compression, blind source separation, etc.
As to the 3-order noisy signal $\mathbf{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, low-rank tensor approximation aims to find an approximate tensor $\widehat{\mathbf{T}}$ with a lower rank $(K_1, K_2, K_3)$, $K_n < I_n$. It can be expressed as the least-squares problem

$$\widehat{\mathbf{T}} = \arg\min_{\mathbf{X}} \left\| \mathbf{T} - \mathbf{X} \right\|^{2} \quad \text{s.t.} \quad \operatorname{rank}\!\left( X_{(n)} \right) \leq K_n, \; n = 1, 2, 3$$
The target clean speech can be reconstructed from the approximate tensor $\widehat{\mathbf{T}}$, which can be modeled by the Tucker decomposition using Eq. (3) as follows

$$\widehat{\mathbf{T}} = \mathbf{G} \times_1 \boldsymbol{P}^{(1)} \times_2 \boldsymbol{P}^{(2)} \times_3 \boldsymbol{P}^{(3)}$$
The core tensor G can be computed using Eq. (4).
Then the optimized rank-$(K_1, K_2, K_3)$ tensor $\widehat{\mathbf{T}}$ can be expressed as

$$\widehat{\mathbf{T}} = \mathbf{T} \times_1 \boldsymbol{P}^{(1)} \boldsymbol{P}^{(1)T} \times_2 \boldsymbol{P}^{(2)} \boldsymbol{P}^{(2)T} \times_3 \boldsymbol{P}^{(3)} \boldsymbol{P}^{(3)T}$$
Here $\boldsymbol{P}^{(n)}$, n = 1, 2, 3, can be obtained from the clean signal space, which is initialized by the HOSVD method and optimized by the ALS (Alternating Least Squares) algorithm [18]. In general, the low-rank approximation is realized by the TUCKALS3 algorithm, which has been used to perform multimode PCA (Principal Component Analysis) in order to remove white noise in image processing [19] and to denoise multicomponent seismic data [20].
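The NumPy sketch below shows a HOOI/ALS-style loop in the spirit of TUCKALS3: the factors are initialized with the truncated HOSVD and refined alternately, and the denoised tensor is the projection of T onto the product of the estimated n-mode signal subspaces, as in the rank-$(K_1, K_2, K_3)$ expression above. It is a simplified sketch under our own naming, not the exact algorithm of [18-20].

```python
import numpy as np

def _unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def _multiply(X, M, mode):
    """Mode-n product X x_n M using tensordot."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)

def lrta_als(T, ranks, n_iter=5):
    """Rank-(K1,K2,K3) low-rank tensor approximation via an ALS (HOOI-style) loop."""
    N = T.ndim
    # HOSVD initialization: K_n leading left singular vectors of each unfolding
    P = [np.linalg.svd(_unfold(T, n), full_matrices=False)[0][:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            Y = T
            for m in range(N):                      # project on the other modes' subspaces
                if m != n:
                    Y = _multiply(Y, P[m].T, m)
            P[n] = np.linalg.svd(_unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
    T_hat = T
    for n in range(N):                              # T_hat = T x_n P^(n) P^(n)T for every n
        T_hat = _multiply(T_hat, P[n] @ P[n].T, n)
    return T_hat, P
```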
4.2.2 Tensor rank estimation with MDL
As to the tensor filtering model, the purpose is to obtain an optimal approximation of the noisy tensor under the least-squares error rule. In the low-rank tensor approximation, the key problem is to find the optimal parameters $K_n$, which finally affect the performance of speech denoising: for each nth mode, if $K_n$ is too small, some speech information may be lost, and if it is too large, some noise may remain after restoration.
Fig. 2. Truncated HOSVD.
Usually, the rank is set up according to experience, and whether the rank is proper or not depends on the estimation approach [21].
For an N-order tensor T, the problem is to find the rank $K_n$ in each mode, for n = 1 to N, i.e. the number of dominant eigenvalues of the flattened matrix $T_{(n)}$, that permits an optimal approximation of T in the least-squares sense. Here we estimate the optimal $K_n$ by use of the MDL (Minimum Description Length) criterion [21]. The optimal signal subspace dimension is just the optimal rank in each mode. Thus, for each mode, the MDL criterion can be expressed as

$$\mathrm{MDL}(k) = -\log\left( \frac{\prod_{i=k+1}^{I_n} \lambda_{i}^{1/(I_n-k)}}{\frac{1}{I_n-k}\sum_{i=k+1}^{I_n} \lambda_{i}} \right)^{(I_n-k)M_n} + \frac{1}{2}\, k\, (2 I_n - k) \log M_n$$

and $K_n$ is chosen as the value of k that minimizes $\mathrm{MDL}(k)$.
In this expression, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{I_n}$ stand for the $I_n$ singular values of $T_{(n)}$, and $M_n$ is the number of columns of the n-mode matrix $T_{(n)}$. As to the following multi-mode Wiener filtering method, the $\lambda_i$ are the $I_n$ eigenvalues of either the $\boldsymbol{q}^{(n)}$-weighted or the $\boldsymbol{Q}^{(n)}$-weighted covariance matrix defined in Section 4.3.
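A direct transcription of the Wax-Kailath MDL criterion is given below; it returns the rank $K_n$ minimizing the criterion from the eigenvalues of an n-mode covariance with $M_n$ observations. The function name is ours and the snippet is a sketch, not the authors' code.

```python
import numpy as np

def mdl_rank(eigvals, M):
    """Return (K, mdl_values): the MDL-optimal signal-subspace dimension."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    I = len(lam)
    mdl = np.empty(I)
    for k in range(I):
        tail = lam[k:]                                       # the I-k smallest eigenvalues
        geo = np.exp(np.mean(np.log(tail)))                  # geometric mean
        arith = np.mean(tail)                                # arithmetic mean
        mdl[k] = -(I - k) * M * np.log(geo / arith) + 0.5 * k * (2 * I - k) * np.log(M)
    return int(np.argmin(mdl)), mdl
```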
Multi-mode Wiener filtering can be considered as the extension of single-mode Wiener filtering, with a similar basic theory built on the MSE (Mean Square Error). Supposing that the 3-order tensor $\mathbf{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ results from the recording of a microphone array with additive, white, Gaussian noise independent from the signal, we have T = S + N. Like the single-channel Wiener filtering method, the target speech tensor data S can be obtained by filtering the noisy tensor data T with multi-mode Wiener filters. To obtain the n-mode Wiener filters, we calculate the Mean Square Error (MSE) between the expected signal tensor S and the estimated signal tensor $\widehat{\mathbf{S}} = \mathbf{T} \times_1 \boldsymbol{P}^{(1)} \times_2 \boldsymbol{P}^{(2)} \times_3 \boldsymbol{P}^{(3)}$ as follows

$$e\!\left( \boldsymbol{P}^{(1)}, \boldsymbol{P}^{(2)}, \boldsymbol{P}^{(3)} \right) = E\left[ \left\| \mathbf{S} - \mathbf{T} \times_1 \boldsymbol{P}^{(1)} \times_2 \boldsymbol{P}^{(2)} \times_3 \boldsymbol{P}^{(3)} \right\|^{2} \right]$$
In this formula, the filters $\boldsymbol{P}^{(n)}$ are called nth-mode Wiener filters. The computation is presented in [22]: the minimization of Eq. (16) with respect to the filter $\boldsymbol{P}^{(n)}$, for fixed $\boldsymbol{P}^{(m)}$, m ≠ n, leads to the following expression of the nth-mode Wiener filter

$$\boldsymbol{P}^{(n)} = \boldsymbol{\gamma}_{ST}^{(n)} \left[ \boldsymbol{\Gamma}_{TT}^{(n)} \right]^{-1}$$

The detailed expressions can be found in [22]; the cross term $\boldsymbol{\gamma}_{ST}^{(n)}$ is related to both S and T, whereas $\boldsymbol{\Gamma}_{TT}^{(n)}$ depends only on T.
In order to obtain $\boldsymbol{P}^{(n)}$ through (17), we suppose that the filters $\boldsymbol{P}^{(m)}$, m = 1 to 3, m ≠ n, are known. The tensor T is available, but the signal tensor S is unknown.
So, only the term $\boldsymbol{\Gamma}_{TT}^{(n)}$ can be derived from the data, and not the term $\boldsymbol{\gamma}_{ST}^{(n)}$. Hence, some further assumptions on S have to be made in order to overcome the indetermination over $\boldsymbol{\gamma}_{ST}^{(n)}$ [16]. In the one-dimensional case, a classical assumption is to consider that a signal vector is a weighted combination of the signal subspace basis vectors. Extending this to the tensor case, we assume that the n-mode unfolding matrix of the signal, $S_{(n)}$, can be expressed as a weighted combination of $K_n$ orthogonal vectors from the n-mode signal subspace basis

$$S_{(n)} = \boldsymbol{V}_{s}^{(n)} \boldsymbol{O}^{(n)}$$
where $\boldsymbol{V}_{s}^{(n)}$ is composed of the $K_n$ orthogonal vectors of the n-mode signal subspace basis, and $\boldsymbol{O}^{(n)}$, a random weighting matrix, is supposed to carry the whole information on the expected signal tensor S. Since the signal subspace and the noise subspace are mutually orthogonal, the unfolding matrix $S_{(n)}$ is orthogonal to the noise unfolding matrix $N_{(n)}$.
$\boldsymbol{P}^{(n)}$ can then be formulated by the following expression; the detailed calculation can be found in [22]

$$\boldsymbol{P}^{(n)} = \boldsymbol{V}_{s}^{(n)} \boldsymbol{\Lambda}^{(n)} \boldsymbol{V}_{s}^{(n)T}$$
Here, the matrices $\boldsymbol{q}^{(n)}$ and $\boldsymbol{Q}^{(n)}$ used below are built from Kronecker products (denoted $\otimes$) of the fixed filters $\boldsymbol{P}^{(m)}$, m ≠ n; their exact definitions are given in [22].
In formula (22), $\boldsymbol{\Lambda}^{(n)}$ is a diagonal weight matrix given by

$$\boldsymbol{\Lambda}^{(n)} = \mathrm{diag}\!\left( \frac{\lambda_{1}^{\gamma}}{\lambda_{1}^{\Gamma}}, \ldots, \frac{\lambda_{K_n}^{\gamma}}{\lambda_{K_n}^{\Gamma}} \right)$$
in which $\lambda_{k}^{\Gamma}$, k = 1, ..., $K_n$, are the $K_n$ largest eigenvalues of the $\boldsymbol{Q}^{(n)}$-weighted covariance matrix $\boldsymbol{\Gamma}_{TT}^{(n)} = E\left[ T_{(n)} \boldsymbol{Q}^{(n)} T_{(n)}^{T} \right]$.
The parameters $\lambda_{k}^{\gamma}$ depend on the $K_n$ largest eigenvalues of the $\boldsymbol{q}^{(n)}$-weighted covariance matrix $\boldsymbol{\gamma}_{TT}^{(n)} = E\left[ T_{(n)} \boldsymbol{q}^{(n)} T_{(n)}^{T} \right]$, denoted $\lambda_{k}^{\gamma,T}$, according to the following relation

$$\lambda_{k}^{\gamma} = \lambda_{k}^{\gamma,T} - \sigma_{\gamma}^{(n)2}, \quad k = 1, \ldots, K_n$$
The superscripts γ and Γ refer to the $\boldsymbol{q}^{(n)}$-weighted and the $\boldsymbol{Q}^{(n)}$-weighted covariance, respectively, and $\sigma_{\gamma}^{(n)2}$ is the degenerated eigenvalue of the noise $\boldsymbol{q}^{(n)}$-weighted covariance matrix $\boldsymbol{\gamma}_{NN}^{(n)}$.
Considering the additive and independent properties of the noise, $\sigma_{\gamma}^{(n)2}$ corresponds to the $I_n - K_n$ smallest eigenvalues of $\boldsymbol{\gamma}_{TT}^{(n)}$, and it can be estimated as

$$\hat{\sigma}_{\gamma}^{(n)2} = \frac{1}{I_n - K_n} \sum_{k=K_n+1}^{I_n} \lambda_{k}^{\gamma,T}$$
In order to obtain the filters $\boldsymbol{P}^{(n)}$, the alternating least squares (ALS) algorithm has been proposed; its summary is given in Section 3.2.4 of [22]. In addition, the MDL criterion described in Section 4.2.2 can be used to estimate the parameter $K_n$.
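The sketch below gives a much-simplified version of this procedure: for each mode it estimates the signal subspace from the n-mode covariance of the current data, builds a Wiener-like diagonal weighting from the eigenvalues (signal part over noisy part, with the noise level taken as the mean of the discarded eigenvalues), and applies the three filters alternately. The $\boldsymbol{q}^{(n)}$/$\boldsymbol{Q}^{(n)}$-weighted covariances of [22] are replaced here by plain n-mode sample covariances, so this is an illustration of the structure of the method, not a faithful implementation.

```python
import numpy as np

def n_mode_wiener(T, mode, K):
    """One simplified n-mode Wiener-type filter P^(n) of size I_n x I_n."""
    Tn = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    cov = Tn @ Tn.T / Tn.shape[1]                     # n-mode sample covariance
    w, V = np.linalg.eigh(cov)
    w, V = w[::-1], V[:, ::-1]                        # eigenvalues in descending order
    sigma2 = w[K:].mean() if K < len(w) else 0.0      # degenerated noise eigenvalue
    gains = np.clip((w[:K] - sigma2) / w[:K], 0.0, None)
    Vs = V[:, :K]                                     # n-mode signal subspace basis
    return Vs @ np.diag(gains) @ Vs.T

def multimode_wiener(T, ranks, n_iter=5):
    """Alternately estimate and apply the three n-mode filters: S_hat = T x_1 P1 x_2 P2 x_3 P3."""
    S = T
    for _ in range(n_iter):
        S = T
        for n in range(T.ndim):
            P = n_mode_wiener(S, n, ranks[n])
            S = np.moveaxis(np.tensordot(P, S, axes=(1, n)), 0, n)
    return S
```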
The microphone array database used in the experiments comes from Carnegie Mellon University (CMU) [23]. This database was recorded in a real environment and contains 10 male speakers each speaking 14 utterances. There are two kinds of microphone array, with 8 elements and 15 elements. We used the 8-element array, whose microphones are spaced linearly with a spacing of 7 cm between elements. The speaker sits directly in front of the array at a distance of 1 meter from the center, and the noise mainly comes from disk drives and cooling fans of many computers in the real environment. The data sampling rate is 16 kHz with a quantization precision of 16-bit PCM. We chose 15 utterances randomly from the database. A Hamming window of length 32 ms with 50% overlap is chosen as the frame window. Taking the utterance 'beeoer' as an example, the original dimensions of the noisy tensor signal $\mathbf{T} \in \mathbb{R}^{I_c \times I_t \times I_s}$ are $I_c = 8$, $I_t = 150$ and $I_s = 512$.
We compare the three proposed tensor filtering approaches with traditional beamforming, adaptive beamforming such as the GSC (Generalized Sidelobe Canceller) method [1], and the subspace approach [2]. Among the conventional denoising methods, the subspace approach performs best; it separates the noisy space into two subspaces, the signal subspace and the noise subspace, and the clean signal is reconstructed in the signal subspace. Conventional denoising methods depend on an accurate noise estimation algorithm and may lead to perceptual distortion because of the imbalance between noise reduction and signal distortion.
5.2.1 Tensor rank setup for method 1
The original noisy tensor $\mathbf{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ has three modes, channel, time and frequency, with dimensions $I_1 = 8$, $I_2 = 150$ and $I_3 = 512$. Here, based on our experience in tensor experiments, the fixed tensor ranks of the truncated HOSVD in each mode are set to $K_1 = 1$, $K_2 = 120$ and $K_3 = 512$. Although the results differ under different tensor ranks, we select the fixed tensor ranks that best balance speech distortion against noise reduction, so that the comparison with other methods can be done in a simple way.
5.2.2 Tensor rank estimation with MDL for method 2
In the ALS iteration process of the TUCKALS3 algorithm in the LRTA method, we use the MDL criterion to estimate the dimension of the signal subspace, i.e. the optimal tensor rank. Figure 3 shows the estimation procedure using the MDL criterion. In the figure, the horizontal axis represents the different modes, the vertical axis represents the MDL estimation values, and the asterisk marks the minimum point.
From figure 3, we can see that the minima of the MDL estimation are attained at 7, 94 and 74 for the different modes ($I_1$, $I_2$ and $I_3$), respectively. Thus, for the different subspaces, the optimal tensor ranks are set to $K_1 = 7$, $K_2 = 94$ and $K_3 = 74$. The MDL criterion can effectively compute the optimal dimension of each subspace instead of selecting a fixed dimension artificially, and it makes the best compromise between signal distortion and noise reduction.
5.2.3 Parameter setup for method 3
We also apply the MDL criterion in the multi-mode Wiener filtering method to estimate $K_n$, the dimension of the signal subspace, i.e. the optimal tensor rank. As to the MSE criterion in Eq. (16), the error threshold ε is set to an empirical value of 0.1, which decides the extent of convergence in every iteration. Besides, the total number of iterations of the filter computation is set to 5 considering the computational complexity.
Fig. 3. Estimation results of tensor rank with MDL.
We compare the three proposed tensor filtering methods with the three traditional approaches: beamforming, adaptive beamforming and the subspace method. This section shows the waveforms and spectrograms of the clean speech, the noisy signal and the denoised signals, respectively.
From the figures above, some conclusions can be drawn. Comparing figure 4(c) with figure 4(a) and (b), the beamforming method reduces part of the noise, especially in the high frequencies, but there is no obvious improvement in the low frequencies. In figure 4(d), the GSC method with adaptive beamforming performs a little better than beamforming in filtering the noise; moreover, when the sound source moves, the GSC method would have more advantages. As to the mature subspace method in figure 4(e), when contrasted with beamforming and GSC, its performance in filtering noise in both low and high frequencies is apparently good. From the point of view of the waveform, there is no obvious distortion for the beamforming and GSC approaches, while for the subspace method the energy of the denoised signal is lower than that of the input. With the subspace method the noise is reduced in both high and low frequencies, but in the high frequencies the speech signal suffers some distortion: the more the noise is eliminated, the more the signal is distorted.
The results of the proposed tensor-based methods are given in figure 4(f), (g) and (h). Figure 4(f) shows the result of the truncated HOSVD method. We can see that the noise above 2000 Hz has been basically removed except for a small residual, but there is little reduction of the noise below 2000 Hz. Consequently, from the point of view of perceptual effect, a little background noise remains in the speech signal when listening. Figure 4(g) shows the waveform and spectrogram of the denoised signal when applying the LRTA approach. Compared with the truncated HOSVD method, the noise in the high frequencies is further reduced, with only the noise below 1000 Hz remaining. Besides, there is little distortion in the speech signal after denoising, mostly because of the tensor ranks optimized with the MDL criterion. Figure 4(h) shows the result of the multi-mode Wiener filter. It is obvious that more noise is reduced and the performance is better compared with the other two tensor methods in figure 4(f) and (g). From the point of view of the spectrogram, the three tensor-based methods retain more low-frequency noise than the subspace method.
Fig. 4. The waveforms and spectrograms of utterance “beeoer”.
Several objective speech quality evaluation indexes, including the segmental signal-to-noise ratio enhancement (SNRseg), the log-likelihood ratio (LLR), the overall quality composite measure (Covl) and the background distortion composite measure (Cbak) [24, 25], are used to compare the performance of the different speech denoising algorithms. The smaller the LLR and the bigger the other indexes, the better the performance. The segmental SNR of the enhanced signal is defined as

$$\mathrm{SNRseg} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10} \frac{\sum_{n=Nm}^{Nm+N-1} s^{2}(n)}{\sum_{n=Nm}^{Nm+N-1} \left( s(n) - \hat{s}(n) \right)^{2}}$$
where s(n) is the clean speech signal, $\hat{s}(n)$ is the enhanced speech signal after denoising, N is the frame length and M is the number of frames. The log-likelihood ratio between the clean and the denoised speech of each frame is computed as

$$\mathrm{LLR}(\boldsymbol{a}_{s}, \boldsymbol{a}_{\hat{s}}) = \log \frac{\boldsymbol{a}_{\hat{s}} \boldsymbol{R}_{s} \boldsymbol{a}_{\hat{s}}^{T}}{\boldsymbol{a}_{s} \boldsymbol{R}_{s} \boldsymbol{a}_{s}^{T}}$$
where $\boldsymbol{a}_{s}$ and $\boldsymbol{a}_{\hat{s}}$ are the LPC coefficients of the clean speech s and the denoised speech $\hat{s}$, respectively, and $\boldsymbol{R}_{s}$ is the autocorrelation matrix of the clean speech. The LPC filter order is set to a fixed value of 10. The composite measures Covl and Cbak are computed as [24, 25]

$$\mathrm{Covl} = 1.594 + 0.805\,\mathrm{PESQ} - 0.512\,\mathrm{LLR} - 0.007\,\mathrm{WSS}$$

$$\mathrm{Cbak} = 1.634 + 0.478\,\mathrm{PESQ} - 0.007\,\mathrm{WSS} + 0.063\,\mathrm{SNRseg}$$
Table I. Objective experiment results.
where PESQ is the standard objective speech quality score [26] and WSS is the weighted spectral slope distance measure [24].
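As one example of how these indexes are computed, the sketch below implements the frame-wise segmental SNR between the clean and the enhanced signal. The frame length, hop and the conventional clamping of per-frame values to [-10, 35] dB are our own assumptions, not values stated in the paper.

```python
import numpy as np

def snr_seg(clean, enhanced, frame_len=512, hop=256, lo=-10.0, hi=35.0):
    """Average per-frame SNR in dB between clean and enhanced signals."""
    n_frames = 1 + (len(clean) - frame_len) // hop
    vals = []
    for m in range(n_frames):
        s = clean[m * hop : m * hop + frame_len]
        e = enhanced[m * hop : m * hop + frame_len]
        num = np.sum(s ** 2) + 1e-12
        den = np.sum((s - e) ** 2) + 1e-12
        vals.append(np.clip(10.0 * np.log10(num / den), lo, hi))
    return float(np.mean(vals))
```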
In Table 1, “Noisy input” represents the speech signal of the middle microphone of the array. All scores are averaged over the 15 chosen utterances. Note that higher SNRseg, Covl and Cbak scores and a smaller LLR score represent better objective performance.
From Table 1, we can see that the subspace approach performs better than the other methods for most indexes. As to 'SNRseg', the three proposed tensor filtering methods perform better than the traditional GSC and beamforming methods but worse than the subspace approach, because the subspace approach uses a noise estimation which describes the noise space better than the other methods and leads to a large reduction of the noise. As to the background noise distortion 'Cbak', the three tensor filtering methods perform better, especially the multi-mode Wiener filter, which illustrates that the multi-mode filtering by the tensor model leads to a comfortable background feeling without distorting the residual noise. As to the overall quality 'Covl', the tensor filtering methods obtain results similar to the subspace approach, among which the multi-mode Wiener filter performs best, because in the multi-mode filtering method the optimal tensor ranks can be obtained, which leads to an optimal balance between speech and noise. As to the 'LLR' index, the traditional beamforming method achieves the highest similarity of the spectral envelope. The objective results indicate that both the degree of noise reduction and the spectral similarity are lower for the tensor filtering methods, mostly because they do not use noise estimation; this could be further improved by taking the subspace signal features into account.
In order to check the listening performance of the proposed approaches, we also carried out a subjective test using the A/B listening method. A is the signal enhanced by a traditional method (GSC or subspace); B is the tensor-filtered signal. A and B are unknown to the listeners and presented in random order. We invited 4 males and 4 females to join the test. They were asked to select the better enhanced utterance according to two listening standards: least signal distortion and least residual noise. The test results are shown in Table 2.
From Table 2, we can see that most of the listeners considered that the subspace approach removes most of the noise and leaves the least residual, which is consistent with the objective index 'SNRseg'. Meanwhile, the tensor filtering methods obtain a better result in the subjective feeling of signal distortion, mostly because of the accurate rank estimation in each mode. From the point of view of hearing, the subspace method has the least residual noise and the tensor filtering methods based on tensor analysis have the least signal distortion.
This paper constructs the noisy multi-microphone speech data as a 3-order tensor model with three dimensions: channel, time and spectrum. The traditional single-channel filtering can then be replaced by tensor filtering methods, realized by truncated HOSVD, LRTA and multi-mode Wiener filtering. The three proposed higher-order speech denoising methods based on the tensor model extend speech signal processing approaches to high dimensions. Preliminary results show a potential improvement of the denoised signal quality by use of the tensor filtering methods, especially multi-mode Wiener filtering, although the tensor filtering methods still perform worse in noise reduction ability than the subspace approach and should be enhanced. The research of this paper provides new schemes and some useful results on tensor filtering in microphone array speech denoising. In order to obtain better noise reduction, the tensor construction and filtering could be improved by referring to the processing scheme of the subspace approach.
Table II. Subjective A/B test results.
The authors would like to thank the reviewers for their kind suggestions. The work in this paper is supported by the National Natural Science Foundation of China (No. 61571044, No. 11590772, No. 61473041 and No. 61620106002).
References
[1] D. V. Compernolle, “Switching Adaptive Filters for Enhancing Noisy and Reverberant Speech from Microphone Array Recordings,” IEEE International Conference on Acoustics, Speech and Signal Processing, Albuquerque, USA, 1990, vol. 2, pp. 833-836.
[2] Y. Ephraim and H. L. Van Trees, “A Signal Subspace Approach for Speech Enhancement,” IEEE Trans. Speech and Audio Processing, vol. 3, pp. 251-266, July 1995.
[3] S. Markovich-Golan, A. Bertrand, and M. Moonen, “Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks,” Signal Processing, vol. 107, pp. 4-20, February 2015.
[4] N. Cheng and W. Liu, “Perceptual Properties Based Signal Subspace Microphone Array Speech Enhancement Algorithm,” Acta Automatica Sinica, vol. 35, no. 12, pp. 1481-1487, December 2009.
[5] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Multilinear Singular Value Decomposition,” SIAM J. Matrix Anal. Appl., vol. 21, pp. 1253–1278, 2000.
[6] T. G. Kolda and B. W. Bader, “Tensor Decompositions and Applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[7] Q. Wu, L. Zhang, and G. Shi, “Robust Multifactor Speech Feature Extraction Based on Gabor Analysis,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 927-936, 2011.
[8] M. Kilmer, K. Braman, and N. Hao, “Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 148–172, 2013.
[9] J. Wang, X. Xie, and J. Kuang, “A Novel Multichannel Audio Signal Compression Method Based on Tensor Representation and Decomposition,” China Communications, vol. 11, no. 3, pp. 80-90, March 2014.
[10] Z. Wang, J. Wang, C. Xing, Z. Fei, and J. Kuang, “Tensor Based Blind Signal Recovery for Multi-Carrier Amplify-and-Forward Relay Networks,” SCIENCE CHINA Information Sciences, vol. 57, no. 10, October 2014.
[11] S. Weiland and F. van Belzen, “Singular Value Decompositions and Low Rank Approximations of Tensors,” IEEE Trans. Signal Processing, vol. 58, pp. 1171-1182, March 2010.
[12] J. Wang, C. Xu, X. Xie, and J. Kuang, “Multichannel Audio Signal Compression Based on Tensor Decomposition,” IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, pp. 26-31, May 2013.
[13] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Multilinear Singular Value Decomposition,” SIAM J. Matrix Anal. Appl., vol. 21, pp. 1253–1278, 2000.
[14] T. G. Kolda and B. W. Bader, “Tensor Decompositions and Applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[15] S. Weiland and F. van Belzen, “Singular Value Decompositions and Low Rank Approximations of Tensors,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1171-1182, 2010.
[16] J. Marot, C. Fossati, and S. Bourennane, “About Advances in Tensor Data Denoising Methods,” EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, 2008.
[17] D. Muti, S. Bourennane, and J. Marot, “Lower-rank tensor approximation and multiway filtering,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1172-1204, 2008.
[18] B. W. Bader and T. G. Kolda, “Efficient MATLAB Computations with Sparse and Factored Tensors,” SIAM J. Scientific Computing, vol. 30, pp. 205-231, 2007.
[19] D. Muti and S. Bourennane, “Multiway filtering based on fourth-order cumulants,” EURASIP Journal on Applied Signal Processing, vol. 2005, no. 7, pp. 1147–1158, 2005.
[20] D. Muti and S. Bourennane, “Multidimensional signal processing using lower-rank tensor approximation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), Hong Kong, April 2003, vol. 3, pp. 457–460.
[21] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 33, no. 2, pp. 387–392, 1985.
[22] D. Muti and S. Bourennane, “Multidimensional filtering based on a tensor approach,” Signal Processing, vol. 85, no. 12, pp. 2338–2353, 2005.
[23] T. Sullivan, CMU Microphone Array Database [Online], available: http://www.speech.cs.cmu.edu/databases/micarray, Aug. 2008.
[24] Y. Hu and P. C. Loizou, “Evaluation of Objective Measures for Speech Enhancement,” Proc. Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, September 2006.
[25] Y. Hu and P. C. Loizou, “Evaluation of Objective Quality Measures for Speech Enhancement,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, pp. 229-238, January 2008.
[26] ITU-T P.862.2, Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs, Geneva, Switzerland: International Telecommunication Union, 2007.