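When it comes to methods for evaluating speech enhancement, I used to know only one classification: subjective test methods and objective test methods. This is also what textbooks on speech signal testing teach us.
1) Subjective test methods: mean opinion score (MOS).
2) Objective test methods: signal-to-noise ratio (SNR), segmental SNR, Itakura distance, PESQ, and so on.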
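I have now seen in the literature that objective test methods can be divided further, into intrusive and non-intrusive methods.
Intrusive methods rely on some form of distance between the reference speech and the test speech to predict the subjective Mean Opinion Score (MOS). Non-intrusive methods predict speech quality from the test speech alone, which makes them more challenging. Simply put, an intrusive test needs the original clean speech as a reference, while a non-intrusive test does not.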
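To make the "needs a reference" point concrete, here is a minimal sketch of the two simplest intrusive measures mentioned above, SNR and segmental SNR. The function names, the 160-sample frame length, and the [-10, 35] dB clamping range are my own assumptions for illustration (the clamp is a common convention, not something the text above specifies). Both functions are useless without the clean signal, which is exactly what makes them intrusive.

```python
import math

def snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over error energy."""
    signal = sum(c * c for c in clean)
    noise = sum((c - p) ** 2 for c, p in zip(clean, processed))
    if noise == 0.0:
        return math.inf  # processed signal equals the reference
    return 10.0 * math.log10(signal / noise)

def segmental_snr_db(clean, processed, frame_len=160, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB (assumed range)."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal = sum(v * v for v in c)
        noise = sum((a - b) ** 2 for a, b in zip(c, p))
        if signal == 0.0:
            continue  # skip frames where the reference is silent
        frame = hi if noise == 0.0 else 10.0 * math.log10(signal / noise)
        values.append(min(hi, max(lo, frame)))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)

# Toy data: a sine wave as "clean" speech plus a small alternating error.
clean = [math.sin(2.0 * math.pi * k / 20.0) for k in range(320)]
processed = [c + 0.1 * (-1) ** k for k, c in enumerate(clean)]
print(snr_db(clean, processed), segmental_snr_db(clean, processed))
```

Averaging per frame keeps loud segments from dominating the score, which is the usual motivation for the segmental variant.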
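I came across a document that summarizes quite a few objective test methods. Some I had heard of, such as SII, but most were completely new to me. It also lists download addresses, for example for the non-intrusive method P.563, so one can get the source code and study it, which is very nice. The list is as follows: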
Intrusive measures: need a reference clean signal (x) to judge the noisy signal (y)
LSP based weights:
Inverse harmonic mean weighting (IHMW)
● higher weights to regions where LSP are closer to each other → strong resonance
Inverse variance weighting (IVW)
● Euclidean distance between LSP extracted from x and y, normalized by variance → approximation of log spectral distortion
Gardner weighting (GW)
● sensitivity matrix for LSP → approximation of log spectral distortion
Formant bounded weight (FBW)
● combines IHMW and GW
Positions and distance weighting
● it is the weighted sum of the Euclidean distances taken with regard to LSP values and their relative position
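As an illustration of the IHMW idea above (closer LSPs get higher weights), here is a rough sketch. It assumes the commonly cited form of the weight, w_i = 1/(l_i - l_{i-1}) + 1/(l_{i+1} - l_i), with the boundary values 0 and π. The function names and the weighted distance helper are hypothetical and only meant for illustration, not the exact definition used in the summarized document.

```python
import math

def ihmw_weights(lsp):
    """Inverse harmonic mean weights for ascending LSP frequencies in (0, pi).

    Assumed form: w_i = 1/(l_i - l_{i-1}) + 1/(l_{i+1} - l_i),
    with l_0 = 0 and l_{p+1} = pi as boundary values.
    """
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]

def weighted_lsp_distance(lsp_ref, lsp_test):
    """Weighted squared distance, weights taken from the reference LSPs."""
    weights = ihmw_weights(lsp_ref)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, lsp_ref, lsp_test))

# The first two LSPs are close together (a strong resonance), so they
# receive much larger weights than the widely spaced ones.
lsp_ref = [0.30, 0.35, 1.00, 2.00]
lsp_test = [0.32, 0.36, 1.05, 1.90]
print(ihmw_weights(lsp_ref))
print(weighted_lsp_distance(lsp_ref, lsp_test))
```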
Standards for Quality or Intelligibility of speech
Perceptual Evaluation of Speech Quality (PESQ) [1]
● based mostly on Perceptual Speech Quality Measure (PSQM) and Perceptual Analysis Measurement System (PAMS)
● shows high correlation with subjective measures
● works only for sampling frequencies up to 16 kHz
Speech Intelligibility Index (SII) [3]
● a weighted SNR in frequency domain
● it compares the clean signal x with the noise
● the internal representation is the critical band filtered signal
● the weights and bands are defined in the standard
Coherence SII (cSII)
● extension of the Speech Intelligibility Index (SII)
● incorporates the coherence into the SNR (SDR) calculation, so it also includes distortion effects
● coherence is the normalised cross spectral density and is calculated in 3 different levels of spectral amplitude regions
● for additive noise cSII == SII
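Going back to the SII entry, the "weighted SNR in frequency domain" idea can be sketched as follows. This is a heavily simplified version: it keeps only the band audibility step (per-band SNR clipped to [-15, 15] dB and mapped to [0, 1]) and a weighted sum. Masking, hearing thresholds, and level distortion from the standard are left out, and the band levels and importance weights below are made-up illustration values, not the ones defined in the standard.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to [0, 1]: -15 dB -> 0, +15 dB -> 1."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def simplified_sii(speech_band_db, noise_band_db, importance):
    """Importance-weighted average of band audibilities (simplified SII)."""
    total = sum(importance)
    weighted = sum(
        w * band_audibility(s - n)
        for w, s, n in zip(importance, speech_band_db, noise_band_db)
    )
    return weighted / total

# Hypothetical three-band example (levels in dB, weights sum to 1).
speech = [60.0, 60.0, 60.0]
noise = [75.0, 60.0, 45.0]
importance = [0.2, 0.5, 0.3]
print(simplified_sii(speech, noise, importance))
```

Note that the noise enters the calculation directly, which matches the remark above that SII compares the clean signal x with the noise rather than with the noisy mixture.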
Measures based on perceptual models:
Dau measure [4]
● based on the Dau model for the effective processing in the human auditory system
● calculates time-frequency domain internal representation of x and y
● the internal representation considers:
○ filter banks
○ spectral and temporal masking
○ hair cell transformation
○ nonlinear adaptation → realistic dynamic compression, temporal masking effects
● the measure is the average normalized linear correlation coefficient taken across overlapping frames of the internal representation signals
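The last step of the Dau measure, the averaged frame-wise correlation, is easy to sketch once the internal representations exist. The code below does not implement the Dau auditory model itself; it assumes the internal representations of x and y are already available as lists of channels over time, and the frame length and hop size are hypothetical values chosen for the toy data.

```python
import math

def pearson(a, b):
    """Normalized linear (Pearson) correlation coefficient of two sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0.0 else 0.0

def mean_frame_correlation(rep_x, rep_y, frame_len=4, hop=2):
    """Average correlation over overlapping time frames.

    rep_x, rep_y: internal representations as lists of channels,
    each channel being a list of values over time (same shape).
    """
    n_time = len(rep_x[0])
    scores = []
    for start in range(0, n_time - frame_len + 1, hop):
        frame_x = [v for ch in rep_x for v in ch[start:start + frame_len]]
        frame_y = [v for ch in rep_y for v in ch[start:start + frame_len]]
        scores.append(pearson(frame_x, frame_y))
    return sum(scores) / len(scores)

# Toy two-channel "internal representation" and a scaled, shifted copy.
rep_x = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0],
         [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]
rep_y = [[2.0 * v + 3.0 for v in ch] for ch in rep_x]
print(mean_frame_correlation(rep_x, rep_y))
```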
Glimpse proportion [5]
● based on the Glimpse model
● calculates time-frequency domain internal representation of x and the noise
● the internal representation considers:
○ gammatone filter banks
● the measure is the proportion of time-frequency bins where clean speech x has higher energy levels than noise
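The counting step of the glimpse proportion can be sketched in a few lines. The code assumes the time-frequency energies of clean speech and noise are already computed (for example by a gammatone filter bank, which is not implemented here) and given in dB. The `threshold_db` parameter is my own addition: with the default of 0 dB it matches the description above, where a bin counts whenever speech energy exceeds noise energy.

```python
def glimpse_proportion(speech_db, noise_db, threshold_db=0.0):
    """Fraction of time-frequency bins where speech exceeds noise.

    speech_db, noise_db: 2-D lists (channels x frames) of energies in dB.
    threshold_db: required local SNR margin (assumed parameter, default 0).
    """
    total = 0
    glimpsed = 0
    for speech_row, noise_row in zip(speech_db, noise_db):
        for s, n in zip(speech_row, noise_row):
            total += 1
            if s - n > threshold_db:
                glimpsed += 1
    return glimpsed / total

# Toy example: 2 channels x 3 frames.
speech_db = [[60.0, 50.0, 40.0],
             [55.0, 45.0, 35.0]]
noise_db = [[50.0, 50.0, 50.0],
            [50.0, 50.0, 50.0]]
print(glimpse_proportion(speech_db, noise_db))
```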
HNS
● also based on the Dau model, with an extra frequency weighting at the output (higher weights for higher frequencies)
PAR
● calculated in the frequency domain
● it is based on soft frequency masking thresholds
● designed for sinusoidal type of distortions (sinusoidal audio coders)
TAA
● based on spectro-temporal masking curves
● computational complexity as low as a spectral masking model
● PAR with parts of DAU (log instead of NL and hair cell transformation)
Non-intrusive measures: don't need a reference clean speech signal
ITU-T P.563 [6]
● vocal tract analysis
● speech reconstruction from corrected vocal tract parameters -> reference signal
● parameter extraction and classification of degradations: low static SNR, mutes, low sSNR, unnatural voice, unnatural male voice, unnatural female voice.
HMM based approach for speech synthesis
● the measure is the normalized log likelihood of features extracted from the synthesised signal and evaluated in an HMM trained with natural speech (models are gender dependent)
Resources for distance measure code
1 – P. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, CRC, 2007.
Matlab code that comes with the book
(LLR, IS, CEP, WSS, FWS, nFWS, PESQ)
2 – P. Loizou, COLEA: A Matlab software tool for speech analysis
Available to download: http://www.utdallas.edu/~loizou/speech/colea.htm
(LLR, IS, CEP, WSS, SNR)
3 – Implementation of the standard SII, Matlab and C codes
Available to download: http://www.sii.to/html/programs.html
(SII)
4 – Computational Auditory Signal Processing and Perception (CASP) model
Available to download upon request: http://www.dtu.dk/centre/cahr/English/downloads.aspx
(Dau model)
5 – Glimpse proportion measure
Ask Martin Cooke for code.
6 – Implementation of the standard ITU-T P.563
Available to download: http://www.itu.int/rec/T-REC-P.563-200405-I
(P.563)