
Methods for Evaluating Speech Enhancement Performance

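Until recently, the only classification of speech enhancement evaluation methods I knew was subjective versus objective, which is also what the textbooks on speech signal testing teach.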

1) Subjective methods: Mean Opinion Score (MOS).

2) Objective methods: signal-to-noise ratio (SNR), segmental SNR, Itakura distance, PESQ, and so on.
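
As a minimal sketch of the two simplest objective measures named above, SNR and segmental SNR can be computed from a clean reference and a processed signal as follows. The function names, the frame length of 256 samples, and the clamping range of -10 to 35 dB are assumptions chosen for illustration, not values taken from the text. Plain Python lists are used so the example runs without extra packages.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero and log of zero


def snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over error energy."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10((signal_energy + EPS) / (error_energy + EPS))


def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB (assumed limits)."""
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(v * v for v in c)
        err = sum((a - b) ** 2 for a, b in zip(c, p))
        value = 10.0 * math.log10((sig + EPS) / (err + EPS))
        frame_snrs.append(min(max(value, lo), hi))
    return sum(frame_snrs) / len(frame_snrs)
```

Both functions need the clean signal, so in the terminology introduced below they are intrusive measures. Clamping each frame keeps silent or perfectly reconstructed frames from dominating the average.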

I have since seen in the literature that objective methods can themselves be divided into intrusive and non-intrusive ones.

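Intrusive methods predict the subjective Mean Opinion Score (MOS) from some form of distance between the reference speech and the test speech. Non-intrusive methods predict speech quality from the test speech alone, which makes them more challenging. Put simply, intrusive testing needs the original clean speech as a reference, while non-intrusive testing does not.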

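One document I came across summarizes a large number of objective measures. I had heard of some of them, such as SII, but most were completely new to me. It also gives download links for several of them, for example the non-intrusive method P.563, so the source code can be studied directly, which is a nice bonus. The list follows: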

Intrusive measures: need a reference clean signal (x) to judge the noisy signal (y)

LSP based weights:

Inverse harmonic mean weighting (IHMW)

● higher weights to regions where LSP are closer to each other → strong resonance

Inverse variance weighting (IVW)

● Euclidean distance between LSPs extracted from x and y, normalized by variance → approximation of log spectral distortion

Gardner weighting (GW)

● sensitivity matrix for LSP → approximation of log spectral distortion

Formant bounded weight (FBW)

● combines IHMW and GW

Positions and distance weighting

● it is the weighted sum of the Euclidean distances taken with regard to LSP values and their relative position
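
To make the LSP-based weighting idea concrete, here is a minimal sketch of inverse harmonic mean weighting in one commonly cited form: each LSP is weighted by the sum of the inverse distances to its two neighbours, so closely spaced LSPs (strong resonances) count more. The function names, the use of radians, and the boundary values 0 and π for the outermost neighbours are assumptions made for this illustration.

```python
import math


def ihm_weights(lsp):
    """Inverse harmonic mean weights for ascending LSPs in radians, inside (0, pi)."""
    padded = [0.0] + list(lsp) + [math.pi]  # assumed boundary convention
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(lsp_ref, lsp_test):
    """Weighted squared Euclidean distance, with weights from the reference LSPs."""
    weights = ihm_weights(lsp_ref)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, lsp_ref, lsp_test))
```

For the reference LSPs `[0.5, 0.6, 2.0]`, the first two values lie close together and receive much larger weights than the isolated third one.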

Standards for Quality or Intelligibility of speech

Perceptual Evaluation of Speech Quality (PESQ) 1

● based mostly on Perceptual Speech Quality Measure (PSQM) and Perceptual Analysis Measurement System (PAMS)

● shows high correlation with subjective measures

● works only for sampling frequencies up to 16 kHz

Speech Intelligibility Index (SII) 3

● a weighted SNR in frequency domain

● it compares the clean signal x with the noise

● the internal representation is the critical band filtered signal

● the weights and bands are defined in the standard
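
The "weighted SNR in the frequency domain" idea can be sketched as follows. This is a simplified illustration, not an implementation of the standard: it maps each band SNR linearly from the range -15 to +15 dB onto an audibility between 0 and 1 and sums the audibilities with band importance weights. The function name is hypothetical, and the weights passed in are placeholders; the real bands, weights, and additional correction terms are defined in the standard.

```python
def sii_sketch(snr_db_per_band, importance):
    """Importance-weighted band audibility; SNR in [-15, 15] dB maps to [0, 1]."""
    total = sum(importance)  # normalize so the weights sum to 1
    score = 0.0
    for snr, weight in zip(snr_db_per_band, importance):
        audibility = min(max((snr + 15.0) / 30.0, 0.0), 1.0)
        score += (weight / total) * audibility
    return score
```

With four equally weighted bands at 15, -15, 0, and 30 dB, the audibilities are 1, 0, 0.5, and 1, giving a score of 0.625.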

Coherence SII (cSII)

● extension of the Speech Intelligibility Index (SII)

● incorporates the coherence for the SNR (SDR) calculation, so it also includes distortion effects

● coherence is the normalised cross spectral density and is calculated in 3 different levels of spectral amplitude regions

● for additive noise cSII == SII
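
The last bullet can be checked with a short derivation. The symbols are assumptions introduced here: S_xx, S_yy, and S_nn are the power spectral densities of the clean signal, the processed signal, and the noise, and S_xy is the cross spectral density. The sketch uses the usual magnitude-squared coherence and the signal-to-distortion ratio derived from it, and it leaves out the split into three amplitude levels.

```latex
\gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\, S_{yy}(f)}, \qquad
\mathrm{SDR}(f) = \frac{\gamma^2(f)}{1 - \gamma^2(f)}

% Additive, uncorrelated noise: y = x + n, so S_{xy} = S_{xx} and S_{yy} = S_{xx} + S_{nn}
\gamma^2(f) = \frac{S_{xx}(f)}{S_{xx}(f) + S_{nn}(f)}
\quad\Longrightarrow\quad
\mathrm{SDR}(f) = \frac{S_{xx}(f)}{S_{nn}(f)} = \mathrm{SNR}(f)
```

With purely additive noise the coherence-based SDR therefore reduces to the ordinary SNR, which is why the two indices coincide in that case.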

Measures based on perceptual models:

Dau measure 4

● based on the Dau model for the effective processing in the human auditory system

● calculates time-frequency domain internal representation of x and y

● the internal representation considers:

○ filter banks

○ spectral and temporal masking

○ hair cell transformation

○ nonlinear adaptation -> realistic dynamic compression, temporal masking effects

● the measure is the average normalized linear correlation coefficient taken across overlapping frames of the internal representation signals
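
The final scoring step described above can be sketched on its own, assuming the internal representations of x and y have already been computed by some auditory model and are available as one-dimensional sequences (for example, one filter-bank channel). The function names, frame length, and hop size are hypothetical, and the auditory model itself is not implemented here.

```python
import math


def pearson(a, b):
    """Normalized linear correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0  # constant frames score 0


def mean_frame_correlation(rep_x, rep_y, frame_len=4, hop=2):
    """Average correlation over overlapping frames; inputs need >= frame_len samples."""
    n = min(len(rep_x), len(rep_y))
    starts = range(0, n - frame_len + 1, hop)
    scores = [pearson(rep_x[s:s + frame_len], rep_y[s:s + frame_len]) for s in starts]
    return sum(scores) / len(scores)
```

A hop smaller than the frame length gives the overlapping frames mentioned in the bullet. A score near 1 means the two internal representations move together in every frame.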

Glimpse proportion 5

● based on the Glimpse model

● calculates time-frequency domain internal representation of x and the noise

● the internal representation considers:

○ gammatone filter banks

● the measure is the proportion of time-frequency bins where clean speech x has higher energy levels than noise
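
The counting step of this measure is simple enough to sketch directly, assuming the time-frequency energies of the clean speech and of the noise are already available in dB as two equally shaped nested lists (frames by bands). The function name and the `threshold_db` parameter are hypothetical; a threshold of 0 dB reproduces the description above, where a bin counts whenever the speech energy exceeds the noise energy.

```python
def glimpse_proportion(speech_db, noise_db, threshold_db=0.0):
    """Fraction of time-frequency bins whose local SNR exceeds threshold_db."""
    glimpsed = 0
    total = 0
    for speech_row, noise_row in zip(speech_db, noise_db):
        for s, n in zip(speech_row, noise_row):
            total += 1
            if s - n > threshold_db:  # local SNR in dB
                glimpsed += 1
    return glimpsed / total
```

For speech energies `[[10, 0], [5, 5]]` and noise energies `[[0, 10], [5, 0]]`, two of the four bins have speech above noise, so the proportion is 0.5.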

HNS

● based on Dau model as well with an extra frequency weight in the output (higher weights to higher frequencies)

PAR

● calculated in the frequency domain

● it is based on soft frequency masking thresholds

● designed for sinusoidal type of distortions (sinusoidal audio coders)

TAA

● based on spectro-temporal masking curves

● computational complexity as low as a spectral masking model

● PAR with parts of DAU (log instead of NL and hair cell transformation)

Non intrusive measures (don’t need a reference clean speech signal)

ITU-T P.563 6

● vocal tract analysis

● speech reconstruction from corrected vocal tract parameters -> reference signal

● parameter extraction and classification of degradations: low static SNR, mutes, low sSNR, unnatural voice, unnatural male voice, unnatural female voice.

HMM based approach for speech synthesis

● measure is the normalized log likelihood of features extracted from the synthesised signal and evaluated in a HMM trained with natural speech (models are gender dependent)

Resources for distance measure code

1 – P. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, CRC, 2007.

Matlab code that comes with the book

(LLR, IS, CEP, WSS, FWS, nFWS, PESQ)

2 – P. Loizou, COLEA: A Matlab software tool for speech analysis

Available to download: http://www.utdallas.edu/~loizou/speech/colea.htm

(LLR, IS, CEP, WSS, SNR)

3 – Implementation of the standard SII, Matlab and C codes

Available to download: http://www.sii.to/html/programs.html

(SII)

4 – Computational Auditory Signal Processing and Perception (CASP) model

Available to download upon request: http://www.dtu.dk/centre/cahr/English/downloads.aspx

(Dau model)

5 – Glimpse proportion measure

Ask Martin Cooke for code.

6 – Implementation of the standard ITU-T P.563

Available to download: http://www.itu.int/rec/T-REC-P.563-200405-I

(P.563)
