Are the ISTFT and STFT Inverses of Each Other?

2023-04-06 15:42:36

Introduction:

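A few days ago I attended a lecture by Prof. DeLiang Wang (汪德亮) and ran into a puzzling problem. Under low SNR and strong reverberation, the time-frequency magnitude spectrum of the original signal was modified, and then $\mathrm{istft}$ followed by $\mathrm{stft}$ was applied. The resulting time-frequency spectrum was not the same as the modified spectrum we started from, and the time-domain signal produced by $\mathrm{istft}$ was not dereverberated at all; instead it sounded very strange. My colleagues were all puzzled by this at the time. My understanding had been that for any complex-valued $H \in \mathbb{C}^{MN}$, where $M$ is the number of frames and $N$ is the number of frequency bins, the relation $\mathrm{stft}(\mathrm{istft}(H)) = H$ holds. If that relation does not hold, then the standard recipe of most current audio enhancement algorithms (modify the magnitude spectrum, then apply the istft with the phase spectrum of the noisy signal to obtain the enhanced time-domain speech) carries some risk. The rest of this post explains the issue.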

Code:

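% STFT and ISTFT are user-defined helper functions that are not listed in this post.
% Test 1: arbitrary complex matrix (257 frequency bins x 100 frames) -> ISTFT -> STFT.
realData = rand(257,100);
%realData = [realData;realData(end-1:-1:2,:)];
imgData = rand(257,100);
%imgData = [imgData;-imgData(end-1:-1:2,:)];
comData = realData + 1i*imgData;
overLap = 0.5;      % 50% overlap between adjacent frames
frameSize = 512;    % frame length in samples
y = ISTFT(comData, frameSize, overLap);
[ftbin,Nframe,Nbin,Lspeech,speechFrame] = STFT(y, frameSize, overLap, frameSize);
error0 = squeeze(ftbin) - comData;   % STFT(ISTFT(H)) - H (named error0 so the built-in error() is not shadowed)

% Test 2: real signal -> STFT -> ISTFT -> STFT.
data = ones(10240,1);
[ftbin1,Nframe,Nbin,Lspeech,speechFrame] = STFT(data, frameSize, overLap, frameSize);
y1 = ISTFT(squeeze(ftbin1), frameSize, overLap);
[ftbin2,Nframe,Nbin,Lspeech,speechFrame] = STFT(y1, frameSize, overLap, frameSize);
error1 = data - y1;                          % time-domain round-trip error
error2 = squeeze(ftbin1) - squeeze(ftbin2);  % round-trip error for a spectrogram that came from a signal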
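The listing above calls `STFT` and `ISTFT` helpers that are not shown in the post. As a rough, self-contained sketch of the same experiment, the script below builds its own transform pair from `fft`/`ifft`. The window (periodic sqrt-Hann at 50% overlap) and every name in it (`stftFn`, `istftFn`, `idxMat`, `keepHalf`, `relErrSpec`, `relErrSig`, `relErrCons`) are assumptions made for illustration, not the author's implementation.

```matlab
% Sketch only: all names and the window choice are assumptions, not the post's STFT/ISTFT.
N   = 512;                         % frame length (frameSize in the post)
hop = N/2;                         % 50% overlap
M   = 100;                         % number of frames
win = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));   % periodic sqrt-Hann; win.^2 overlap-adds to 1

% idxMat(:, m) holds the sample indices covered by frame m (implicit expansion, R2016b+).
idxMat = (1:N)' + (0:M-1)*hop;

keepHalf = @(S) S(1:N/2+1, :);     % keep the non-negative frequency bins
stftFn   = @(x) keepHalf(fft(x(idxMat) .* win));
% Rebuild the conjugate-symmetric spectrum, invert each frame, window, overlap-add.
istftFn  = @(X) accumarray(idxMat(:), ...
    reshape(real(ifft([X; conj(X(end-1:-1:2, :))])) .* win, [], 1));

% Case 1: arbitrary complex matrix -> iSTFT -> STFT.
H = rand(N/2+1, M) + 1i*rand(N/2+1, M);
G = stftFn(istftFn(H));
relErrSpec = norm(G - H, 'fro') / norm(H, 'fro');

% Case 2: real signal -> STFT -> iSTFT, compared on fully overlapped samples only.
x     = randn(hop*(M+1), 1);
X     = stftFn(x);
xr    = istftFn(X);
inner = (hop+1 : numel(x)-hop)';
relErrSig = norm(xr(inner) - x(inner)) / norm(x(inner));

% Case 3: a spectrogram that came from a signal, compared on interior frames.
X2 = stftFn(xr);
relErrCons = norm(X2(:, 2:M-1) - X(:, 2:M-1), 'fro') / norm(X(:, 2:M-1), 'fro');

fprintf('STFT(iSTFT(H)) vs H      : %.3g\n', relErrSpec);
fprintf('iSTFT(STFT(x)) vs x      : %.3g\n', relErrSig);
fprintf('STFT(iSTFT(X)) vs X      : %.3g\n', relErrCons);
```

Under these assumptions, the second and third figures should sit near machine precision, while the first stays clearly away from zero. The first and last half-frames are excluded from the comparisons because this sketch does not normalize the overlap-add at the signal edges.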

Are the ISTFT and STFT inverses of each other?

$H \in \mathbb{C}^{MN}$: an arbitrary complex matrix

$F$: operator

$G$: operator

$F(H) = G(H) - H$

$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$

It is commonly assumed that $F(H) = 0$ always holds; however, as described in the introduction, this equation does not hold in general.

Here is the definition, quoted directly from the paper:

The set of consistent spectrograms can thus be described as the kernel (or null space) of the $\mathbb{R}$-linear operator from $\mathbb{C}^{MN}$ to itself defined by

$F(H) = G(H) - H$

$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$

Let $H(m,n)$ be a set of complex numbers, where $m$ will correspond to the frame index and $n$ to the frequency band index, and $W$ and $S$ be analysis and synthesis windows verifying the perfect reconstruction conditions for a frame shift $S$. For the set $H$ to be a consistent STFT spectrogram, it needs to be the STFT spectrogram of a signal $X(t)$. But by consistency, this signal can be none other than the result of the inverse STFT of the set $H(m,n)$. A necessary and sufficient condition for $H$ to be a consistent spectrogram is thus for it to be equal to the STFT of its inverse STFT. The point here is that, for a given window length $N$ and a given frame shift, if we denote the inverse STFT by iSTFT, the operation iSTFT–STFT from the space of real signals to itself is the identity, while STFT–iSTFT from $\mathbb{C}^{MN}$ to itself is not.
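A simple counting argument may help make the quoted statement concrete. The notation below is an assumption of this sketch rather than the paper's: $L$ is the frame length, $R$ the frame shift ($R \le L$), $M$ the number of frames, and each frame stores $L/2+1$ complex bins.

```latex
\begin{align*}
\dim_{\mathbb{R}}\{\text{real signals spanning } M \text{ frames}\} &= T = (M-1)R + L,\\
\dim_{\mathbb{R}}\{H : M \text{ frames of } L/2+1 \text{ complex bins}\} &= M(L+2),\\
\dim_{\mathbb{R}}\{\mathrm{STFT}(x)\} \;\le\; T \;\le\; ML &< M(L+2).
\end{align*}
% Values from the code in this post: L = 512, R = 256, M = 100
\begin{align*}
T = 99 \cdot 256 + 512 = 25856, \qquad M(L+2) = 100 \cdot 514 = 51400.
\end{align*}
% With x = iSTFT(H) and the perfect-reconstruction identity iSTFT(STFT(x)) = x:
\begin{align*}
G(G(H)) = \mathrm{STFT}\big(\mathrm{iSTFT}(\mathrm{STFT}(x))\big) = \mathrm{STFT}(x) = G(H).
\end{align*}
```

So the consistent spectrograms occupy only a proper subspace of all complex matrices, and $G$ acts as a projection onto that subspace: $F(H) = 0$ exactly when $H$ already lies in it.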

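The takeaway for us is this: after speech enhancement, if the time-domain signal recovered from the modified frequency-domain magnitude spectrum is transformed back to the time-frequency domain, its magnitude spectrum is not the same as the modified one. So when front-end signal processing finishes its work in the frequency domain and hands a time-domain signal to the recognizer, the MFCC features the recognizer extracts from it may not be optimal. For a more rigorous derivation, see the referenced papers.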
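The mismatch described here can be illustrated with a small sketch of the usual enhancement recipe: scale the magnitude, keep the noisy phase, go back to the time domain, and re-analyze. Everything in it is a stand-in chosen for illustration: the "noisy" signal is white noise, the gain `mask` is random rather than estimated, and the transform pair (`stftFn`, `istftFn`) uses an assumed periodic sqrt-Hann window at 50% overlap.

```matlab
% Sketch only: the signal, the mask and all names below are illustrative assumptions.
N = 512; hop = N/2; M = 100;
win    = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));   % periodic sqrt-Hann window
idxMat = (1:N)' + (0:M-1)*hop;                   % sample indices of each frame
keepHalf = @(S) S(1:N/2+1, :);
stftFn   = @(x) keepHalf(fft(x(idxMat) .* win));
istftFn  = @(X) accumarray(idxMat(:), ...
    reshape(real(ifft([X; conj(X(end-1:-1:2, :))])) .* win, [], 1));

noisy  = randn(hop*(M+1), 1);                    % stand-in for a noisy recording
Y      = stftFn(noisy);
mask   = rand(size(Y));                          % stand-in for an estimated gain in [0, 1]
target = mask .* abs(Y);                         % modified magnitude spectrum
Hmod   = target .* exp(1i*angle(Y));             % modified magnitude + noisy phase
yEnh   = istftFn(Hmod);                          % time-domain output handed downstream
magRe  = abs(stftFn(yEnh));                      % magnitude a downstream front end would see

relMagErr = norm(magRe - target, 'fro') / norm(target, 'fro');
fprintf('relative magnitude mismatch: %.3g\n', relMagErr);
```

A random per-bin gain is close to a worst case, so the mismatch here is likely larger than with a smooth, well-estimated mask; the sketch only shows that the re-analyzed magnitude is not the one that was designed.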

References:

1. Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction.

2. Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency.

author:longtaochen

email:[email protected]
