Reverberation Suppression based on sparse
linear prediction in noisy environments
Nicolás LOPEZ (1,2), Gaël RICHARD (2), Yves GRENIER (2), Ivan BOURMEYSTER (1)
(1) Arkamys - Paris, France
(2) Institut Mines-Télécom ; Télécom ParisTech ; CNRS LTCI - Paris, France
nlopez@arkamys.com

Proposed Approach
A single-channel method for suppressing late reverberation and noise is presented:
[Block diagram: x(t) → STFT → $X$ → Estimate Interferences → $X^{l}$, $Z$ → Filter → $Y$ → inverse STFT → y(t)]
• Late reverberation is estimated using frequency-domain linear prediction with sparsity constraints
• Background noise is estimated when speech is absent
• Blind processing is assumed
• Real-time method

Reducing Complexity: Subband Gathering
• Subsample the spectrogram $X$ from $K$ to $J < K$ channels:
  - Define a $J$-segment partition $P$ of $[1, K]$
  - Take the average in each segment to obtain the subsampled spectrogram $X^{s}$
  - Solve a LASSO in each subsampled channel
[Figure: $X_1 \dots X_K$ → (partition $P$) → $X^{s}_1 \dots X^{s}_J$ → LASSO → $\alpha_1 \dots \alpha_J$ → $\Phi$, with dictionaries $D_1 \dots D_K$]
• Estimate the late reverberation using dictionary (1) and the $J$ subsampled predictors, mapped to $K$ channels
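
The subband gathering steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`make_partition`, `gather_subbands`, `map_predictors`) are hypothetical, the partition is assumed to be uniform and contiguous (the poster does not say how $P$ is chosen), and each predictor is assumed to be copied unchanged to every channel of its segment. Channels are indexed from 0 here, whereas the text uses $[1, K]$.

```python
def make_partition(K, J):
    """Split channel indices 0..K-1 into J contiguous, near-equal segments.

    Assumption: a uniform contiguous partition; the source does not
    specify how the partition P is built.
    """
    if not 1 <= J <= K:
        raise ValueError("need 1 <= J <= K")
    bounds = [(j * K) // J for j in range(J + 1)]
    return [list(range(bounds[j], bounds[j + 1])) for j in range(J)]


def gather_subbands(X, partition):
    """Average a K-channel magnitude spectrogram inside each segment.

    X is a list of K rows (one per channel), each holding one value per frame.
    Returns J rows, one per subsampled channel.
    """
    n_frames = len(X[0])
    return [
        [sum(X[k][n] for k in segment) / len(segment) for n in range(n_frames)]
        for segment in partition
    ]


def map_predictors(alphas, partition, K):
    """Give every channel k the predictor of the segment it belongs to.

    Assumption: plain copying; the source only says the J predictors
    are "mapped to K channels".
    """
    mapped = [None] * K
    for alpha, segment in zip(alphas, partition):
        for k in segment:
            mapped[k] = alpha
    return mapped


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # K = 4 channels, 2 frames
    P = make_partition(4, 2)       # [[0, 1], [2, 3]]
    Xs = gather_subbands(X, P)     # [[2.0, 3.0], [6.0, 7.0]]
    print(P, Xs)
```

With this layout the number of LASSO problems per block drops from $K$ to $J$, which is the complexity saving the section describes.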
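
Reducing Complexity: Block-wise processing
• For each subsampled frequency $j$, estimate a single predictor for $N$ adjacent frames.
• Define an observation vector:
$$V_{j,n} = [\, X^{s}_{j,n} \;\dots\; X^{s}_{j,n-N+1} \,]^{T}$$
• Define a block-based dictionary:
$$D^{V}_{j,n} = [\, V_{j,n-\delta} \;\dots\; V_{j,n-\delta-L+1} \,] \in \mathbb{R}^{N \times L}$$
• Find the $j$-th predictor and map it to the $k$ channels:
$$\min_{\alpha_j} \; \| V_{j,n} - D^{V}_{j,n}\, \alpha_j \|_2 \quad \text{s.t.} \quad \| \alpha_j \|_1 \le \lambda$$
• Estimate the late reverberation using dictionary (1):
$$V^{late}_{k,n} = [\, X^{late}_{k,n} \;\dots\; X^{late}_{k,n-N+1} \,]^{T}$$
[Block diagram: x(t) → STFT → |·| → $X$ → Subband Gathering → $X^{s}$ → Observation Vector → $D^{V}$ → Solve LASSO (parameter $\lambda$) → $\alpha$ → Estimate Reverberation (with $\Phi$ and $D$)]

Late Reverberation Estimation
• Observation model:
$$X_{k,n} = X^{early}_{k,n} + X^{late}_{k,n}$$
• Late reverberation model:
$$\hat{X}^{late}_{k,n} = \sum_{i=0}^{L-1} \alpha_{k,i}\, X_{k,n-i-\delta} = D_{k,n}\, \alpha_k$$
$L$: model order, $\delta$: delay
• Sparse prediction vector:
$$\alpha_k = [\, \alpha_{k,0} \;\dots\; \alpha_{k,L-1} \,]^{T}$$
• Signal-based dictionary:
$$D_{k,n} = [\, X_{k,n-\delta} \;\dots\; X_{k,n-\delta-L+1} \,] \qquad (1)$$
• Estimation with the LASSO:
$$\min_{\alpha_k} \; \| X_{k,n} - D_{k,n}\, \alpha_k \|_2 \quad \text{s.t.} \quad \| \alpha_k \|_1 \le \lambda$$
• Solution with the Least Angle Regression (LARS) algorithm
• Late reverberation psd:
$$R^{late}_{k,n} = \beta_{\ell}\, R^{late}_{k,n-1} + (1 - \beta_{\ell})\, |\hat{X}^{late}_{k,n}|^{2}$$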
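
To make the sparse prediction step concrete, here is a small dependency-free sketch for one channel. Two substitutions are assumptions of this example and are not taken from the poster: the solver is plain coordinate descent rather than LARS, and it minimises the penalised form $\tfrac{1}{2}\|v - D\alpha\|_2^2 + \mu\|\alpha\|_1$ rather than the constrained form (the two agree for a suitable pairing of $\mu$ and $\lambda$). The names `build_dictionary`, `lasso_cd`, `predict_late` and the parameter `mu` are hypothetical. The dictionary stacks $N$ frames as rows, so each row has the layout of dictionary (1).

```python
def build_dictionary(x, n, N, L, delta):
    """Stack N delayed-frame rows for one channel.

    x is the magnitude sequence of a single channel. The row for frame m is
    [x[m-delta], ..., x[m-delta-L+1]], i.e. the layout of dictionary (1);
    rows run over frames n, n-1, ..., n-N+1.
    Requires n >= N + L + delta - 2 so that every index is non-negative.
    """
    return [[x[m - delta - i] for i in range(L)] for m in range(n, n - N, -1)]


def soft_threshold(value, t):
    """Shrink value towards zero by t (proximal operator of the l1 norm)."""
    return max(value - t, 0.0) - max(-value - t, 0.0)


def lasso_cd(D, v, mu, n_iter=200):
    """Coordinate descent for min_a 0.5*||v - D a||^2 + mu*||a||_1.

    Stand-in for LARS, chosen only to avoid external dependencies.
    """
    N, L = len(D), len(D[0])
    a = [0.0] * L
    residual = list(v)  # holds v - D a, kept up to date after every change
    col_sq = [sum(D[m][i] ** 2 for m in range(N)) for i in range(L)]
    for _ in range(n_iter):
        for i in range(L):
            if col_sq[i] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column i with the residual, with a[i] added back.
            rho = sum(D[m][i] * residual[m] for m in range(N)) + col_sq[i] * a[i]
            new = soft_threshold(rho, mu) / col_sq[i]
            step = new - a[i]
            if step != 0.0:
                for m in range(N):
                    residual[m] -= D[m][i] * step
                a[i] = new
    return a


def predict_late(dictionary_row, a):
    """Late reverberation estimate for one frame: the row of D times alpha."""
    return sum(d * c for d, c in zip(dictionary_row, a))


if __name__ == "__main__":
    D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [0.5, 0.0, 0.5]            # generated by the sparse predictor [0.5, 0]
    alpha = lasso_cd(D, v, mu=0.01)
    print(alpha, predict_late(D[0], alpha))
```

In the demo the second coefficient is driven exactly to zero while the first is slightly shrunk, which is the sparsity behaviour the $\ell_1$ constraint is meant to produce.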

Background Noise Estimation
• Use Voice Activity Detection (VAD) with a hard threshold
• Update the noise psd if speech is absent:
$$Z_{k,n} = \beta_Z\, Z_{k,n-1} + (1 - \beta_Z)\, |X_{k,n}|^{2}$$
• If reverberation is high, it is likely to be estimated as noise ⇒ if $Z_{k,n} \approx R^{late}_{k,n}$: suppress reverberation only
• Filtering uses the LSA estimator for multiple interferences
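
The noise update rule can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the VAD feature (frame energy), the default smoothing constant, holding the previous value while speech is present, and the relative tolerance used to decide that $Z_{k,n} \approx R^{late}_{k,n}$ are not given in the poster, and the function names are hypothetical.

```python
def smooth_psd(previous, power, beta):
    """First-order recursive average: beta * previous + (1 - beta) * power."""
    return beta * previous + (1.0 - beta) * power


def hard_vad(frame_energy, threshold):
    """Hard-threshold VAD: speech is declared present above the threshold.

    Assumption: frame energy as the decision feature.
    """
    return frame_energy > threshold


def update_noise_psd(Z_prev, X_mag, speech_present, beta_z=0.9):
    """Update the noise psd Z only when speech is absent.

    Assumption: the previous estimate is held while speech is present.
    """
    if speech_present:
        return Z_prev
    return smooth_psd(Z_prev, X_mag ** 2, beta_z)


def interferences_to_suppress(Z, R_late, rel_tol=0.1):
    """Decide which interferences the filter should remove.

    If the noise psd is close to the late reverberation psd, the noise
    estimate is assumed to be reverberation in disguise, so only
    reverberation is suppressed. rel_tol is a hypothetical tolerance.
    """
    if abs(Z - R_late) <= rel_tol * max(Z, R_late):
        return ("reverberation",)
    return ("reverberation", "noise")


if __name__ == "__main__":
    Z = 1.0
    speech = hard_vad(frame_energy=0.2, threshold=0.5)   # False: speech absent
    Z = update_noise_psd(Z, X_mag=2.0, speech_present=speech, beta_z=0.5)
    print(Z, interferences_to_suppress(Z, R_late=2.6))
```

The same `smooth_psd` recursion also matches the form of the late reverberation psd update, with $\beta_{\ell}$ in place of $\beta_Z$.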

Evaluation
Speech Enhancement Task: real-time factor (RTF) of 9.41% on SimData and 9.29% on RealData
Systems compared: Baseline, DRVNR, DRV and NR; measures include CD and LLR

CD
            Room 1         Room 2         Room 3
            Near   Far     Near   Far     Near   Far     Ave.
Baseline    1.99   2.67    4.63   5.21    4.38   4.96    3.97
DRVNR       2.67   3.03    4.32   4.87    4.14   4.63    3.94
DRV         3.88   4.21    4.65   5.22    4.61   5.07    4.61
NR          4.45   4.82    4.41   5.35    4.86   5.64    4.92

LLR
            Room 1         Room 2         Room 3
            Near   Far     Near   Far     Near   Far     Ave.
Baseline    0.35   0.38    0.49   0.75    0.65   0.84    0.58
DRVNR       0.42   0.45    0.51   0.72    0.67   0.81    0.60
DRV         0.79   0.83    0.81   1.02    0.94   1.06    0.91
NR          0.78   0.86    1.01   1.23    1.09   1.28    1.04

SRMR
            SimData                                              RealData
            Room 1         Room 2         Room 3                 Room 1
            Near   Far     Near   Far     Near   Far     Ave.    Near   Far    Ave.
Baseline    4.50   4.58    3.74   2.97    3.57   2.73    3.68    3.17   3.19   3.18
DRVNR       6.96   8.19    6.59   7.21    6.23   6.28    6.91    7.40   7.68   7.54
DRV         5.91   6.39    5.83   5.94    5.70   5.77    5.92    9.05   8.83   8.94
NR          4.70   5.09    4.28   4.01    4.28   3.76    4.35    4.62   4.76   4.69

FWSNR
            Room 1         Room 2         Room 3
            Near   Far     Near   Far     Near   Far     Ave.
Baseline    8.12   6.68    3.35   1.04    2.27   0.24    3.62
DRVNR       6.47   6.29    4.05   2.91    3.51   2.42    4.27
DRV         4.95   4.63    5.16   3.90    4.62   3.54    4.47
NR          6.18   5.50    5.69   1.06    3.56   0.93    3.82

ASR Task: focus on the DRVNR approach
WER (%)
            SimData                                                 RealData
            Room 1          Room 2          Room 3                  Room 1
            Near    Far     Near    Far     Near    Far     Ave.    Near    Far     Ave.
Baseline    12.93   17.72   24.03   72.54   30.46   79.72   39.53   83.16   84.48   83.81
AMclean     17.54   22.42   24.04   45.60   30.78   56.92   32.87   74.58   71.71   73.14
AMmmc       19.13   21.42   21.00   29.89   24.45   35.24   25.35   52.06   51.08   51.57
• AMclean: acoustic model trained on clean data
• AMmmc: acoustic model trained on multi-condition data processed with the DRVNR approach
• Good performance in the far field and in large rooms
• Over-subtraction in small rooms: a consequence of blind processing