Are ISTFT and STFT Inverses of Each Other?

2023-04-06 15:42:36

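Introduction:

A few days ago I attended a talk by Prof. DeLiang Wang (汪德亮) and ran into a puzzling problem. Under low SNR and strong reverberation, the time-frequency magnitude spectrum of the original signal was modified, then passed through $\mathrm{istft}$ followed by $\mathrm{stft}$. The resulting spectrogram differed from the modified spectrogram, and the time-domain signal produced by $\mathrm{istft}$ showed no dereverberation effect at all; it sounded very strange instead. My colleagues were puzzled by this at the time. My own understanding had been that for any complex-valued $H \in \mathbb{C}^{MN}$, where $M$ is the number of frames and $N$ is the number of frequency bins, the relation $\mathrm{stft}(\mathrm{istft}(H)) = H$ holds. If that relation does not hold, then the standard recipe of most current audio enhancement algorithms carries some risk: modify the magnitude spectrum, then run istft with the phase spectrum of the noisy signal to obtain the enhanced time-domain speech. The rest of this post works through the question.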

Code:

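% STFT and ISTFT are the author's helper functions (not shown in this post).

% Experiment 1: arbitrary complex spectrogram H -> ISTFT -> STFT
realData = rand(257,100);                        % real part, 257 bins x 100 frames
%realData = [realData;realData(end-1:-1:2,:)];   % optional: mirror to a full 512-bin spectrum
imgData = rand(257,100);                         % imaginary part
%imgData = [imgData;-imgData(end-1:-1:2,:)];     % optional: odd-symmetric mirror
comData = realData + 1i*imgData;
overLap = 0.5;
frameSize = 512;
y = ISTFT(comData, frameSize, overLap);
[ftbin,Nframe,Nbin,Lspeech,speechFrame] = STFT(y, frameSize, overLap, frameSize);
% F(H) = STFT(ISTFT(H)) - H; named err so it does not shadow MATLAB's built-in error()
err = squeeze(ftbin) - comData;

% Experiment 2: real signal -> STFT -> ISTFT -> STFT
data = ones(10240,1);
overLap = 0.5;
[ftbin1,Nframe,Nbin,Lspeech,speechFrame] = STFT(data, frameSize, overLap, frameSize);
y1 = ISTFT(squeeze(ftbin1), frameSize, overLap);
[ftbin2,Nframe,Nbin,Lspeech,speechFrame] = STFT(y1, frameSize, overLap, frameSize);
error1 = data - y1;                              % time-domain reconstruction error
error2 = squeeze(ftbin1) - squeeze(ftbin2);      % spectrogram error after the round trip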
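The `STFT` and `ISTFT` helpers called above are not included in the post. The following is a minimal, self-contained sketch of the same two experiments under assumed conventions: a periodic square-root Hann window for both analysis and synthesis, 50% overlap, a one-sided 257-bin spectrum, and no padding. It may differ from the author's helpers in scaling and edge handling, and names such as `relErrH` and `errX` are introduced here only for illustration.

```matlab
% Sketch under assumed conventions; NOT the post's STFT/ISTFT helpers.
frameSize = 512;
hop       = frameSize / 2;                 % 50% overlap
nBin      = frameSize / 2 + 1;             % 257 one-sided bins
nFrame    = 100;
n         = (0:frameSize-1)';
win       = sqrt(0.5 - 0.5*cos(2*pi*n/frameSize));  % win.^2 overlap-adds to 1

% Sample index of every frame entry (frameSize x nFrame)
idx  = (1:frameSize)' + (0:nFrame-1) * hop;
nSig = hop * (nFrame + 1);                 % samples covered by the frames
mid  = 2:nFrame-1;                         % frames away from the signal edges

% Experiment 1: arbitrary complex H -> iSTFT -> STFT
H      = rand(nBin, nFrame) + 1i * rand(nBin, nFrame);
Hfull  = [H; conj(H(end-1:-1:2, :))];      % Hermitian extension to 512 bins
frames = real(ifft(Hfull)) .* win;         % real() drops imag part of DC/Nyquist
y      = accumarray(idx(:), frames(:), [nSig 1]);   % overlap-add
G      = fft(y(idx) .* win);               % STFT of the resynthesized signal
G      = G(1:nBin, :);
relErrH = norm(G(:,mid) - H(:,mid), 'fro') / norm(H(:,mid), 'fro');

% Experiment 2: real signal -> STFT -> iSTFT
x      = randn(nSig, 1);
X      = fft(x(idx) .* win);               % full spectrum of each windowed frame
frames = real(ifft(X)) .* win;
xr     = accumarray(idx(:), frames(:), [nSig 1]);
inner  = hop+1 : nSig-hop;                 % skip the half-covered edges
errX   = max(abs(xr(inner) - x(inner)));

fprintf('STFT(iSTFT(H)) vs H, relative error: %.3f\n', relErrH);
fprintf('iSTFT(STFT(x)) vs x, max abs error:  %.3e\n', errX);
```

Under these assumptions the first figure should be far from zero, while the second should sit at floating-point round-off level: going from a signal to its spectrogram and back is lossless, but going from an arbitrary complex matrix to a signal and back is not.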

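Are ISTFT and STFT inverses of each other?

$H \in \mathbb{C}^{MN}$: an arbitrary complex matrix

$F$: an operator

$G$: an operator

$F(H) = G(H) - H$

$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$

The common assumption is that $F(H) = 0$ always holds. As described in the introduction, however, this equality does not hold for every $H$.

The definition from the paper is quoted directly: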

The set of consistent spectrograms can thus be described as the kernel (or null space) of the $\mathbb{R}$-linear operator from $\mathbb{C}^{MN}$ to itself defined by

$F(H) = G(H) - H$

$G(H) = \mathrm{STFT}(\mathrm{iSTFT}(H))$

Let $H(m,n)$ be a set of complex numbers, where $m$ will correspond to the frame index and $n$ to the frequency band index, and $W$ and $S$ be analysis and synthesis windows verifying the perfect reconstruction conditions for a frame shift $S$. For the set $H$ to be a consistent STFT spectrogram, it needs to be the STFT spectrogram of a signal $X(t)$. But by consistency, this signal can be none other than the result of the inverse STFT of the set $H(m,n)$. A necessary and sufficient condition for $H$ to be a consistent spectrogram is thus for it to be equal to the STFT of its inverse STFT. The point here is that, for a given window length $N$ and a given frame shift, if we denote the inverse STFT by iSTFT, the operation iSTFT–STFT from the space of real signals to itself is the identity, while STFT–iSTFT from $\mathbb{C}^{MN}$ to itself is not.
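A dimension count makes the asymmetry in the quoted passage concrete. The sketch below assumes a real signal of $T$ samples, a full $N$-bin spectrum per frame, and a frame shift no larger than the window length; $T$ is a symbol introduced here for illustration and does not come from the paper.

```latex
\begin{aligned}
\dim_{\mathbb{R}} \mathbb{C}^{MN} &= 2MN, \\
\dim_{\mathbb{R}} \operatorname{range}(\mathrm{STFT}) &\le T \approx M \cdot (\text{frame shift}) \le MN < 2MN, \\
F(H) = 0 \;&\Longleftrightarrow\; H = \mathrm{STFT}(\mathrm{iSTFT}(H)) \;\Longleftrightarrow\; H \in \operatorname{range}(\mathrm{STFT}).
\end{aligned}
```

The consistent spectrograms therefore form a proper subspace of $\mathbb{C}^{MN}$, and an arbitrarily chosen $H$ will almost never lie in it. For the sizes used in the code (257 one-sided bins, 100 frames, frame length 512, 50% overlap), an arbitrary spectrogram carries $2 \cdot 257 \cdot 100 = 51{,}400$ real numbers, whereas a signal covered by 100 frames at a hop of 256 samples has roughly $256 \cdot 101 = 25{,}856$ samples, assuming no padding.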

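The takeaway for us is this: after speech enhancement, if a time-domain signal is recovered from the enhanced magnitude spectrum and then transformed back into a time-frequency magnitude spectrum, the two magnitude spectra are not the same. When front-end signal processing does its work in the frequency domain and then hands a time-domain signal to the recognizer, the MFCC features the recognizer extracts may therefore not be optimal. For a more rigorous derivation, see the papers listed below.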
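The effect can be reproduced with a small sketch: start from a consistent spectrogram, change only its magnitude while keeping the phase, resynthesize, and analyze again. The conventions (periodic square-root Hann window, 50% overlap, one-sided spectrum) and the random `gain` matrix are assumptions made for illustration; `gain` merely stands in for an enhancement mask and is not any particular algorithm.

```matlab
% Sketch: modify the magnitude of a consistent spectrogram, keep its phase.
frameSize = 512; hop = frameSize/2; nBin = frameSize/2 + 1; nFrame = 100;
n    = (0:frameSize-1)';
win  = sqrt(0.5 - 0.5*cos(2*pi*n/frameSize));   % assumed sqrt-Hann window pair
idx  = (1:frameSize)' + (0:nFrame-1) * hop;     % sample index of each frame entry
nSig = hop * (nFrame + 1);
mid  = 2:nFrame-1;                              % frames away from the signal edges

x = randn(nSig, 1);                             % stand-in for a noisy recording
X = fft(x(idx) .* win);
X = X(1:nBin, :);                               % consistent by construction

gain = rand(nBin, nFrame);                      % hypothetical mask in [0, 1]
Y    = gain .* abs(X) .* exp(1i * angle(X));    % new magnitude, original phase

% Resynthesize with overlap-add, then analyze again
Yfull  = [Y; conj(Y(end-1:-1:2, :))];
frames = real(ifft(Yfull)) .* win;
y      = accumarray(idx(:), frames(:), [nSig 1]);
Z      = fft(y(idx) .* win);
Z      = Z(1:nBin, :);

% Mismatch between the magnitude we asked for and the one the signal has
magErr = norm(abs(Z(:,mid)) - abs(Y(:,mid)), 'fro') / norm(abs(Y(:,mid)), 'fro');
fprintf('relative magnitude mismatch after resynthesis: %.3f\n', magErr);
```

A mask that varies independently from bin to bin, as here, is a worst case; smoother masks should give a smaller mismatch, but in general the magnitude a downstream feature extractor sees is not exactly the magnitude the enhancer produced.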

References:

1. Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction.

2. Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency.

author:longtaochen

email:[email protected]
