close

Вход

Log in using OpenID

embedDownload
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
1
Texture Retrieval via the Scattering Transform
arXiv:1501.02655v1 [cs.IR] 12 Jan 2015
Alexander Sagel, Dominik Meyer, Student Member, IEEE and Hao Shen, Member, IEEE
Abstract—This work studies the problem of content-based
image retrieval, specifically, texture retrieval. Our approach
employs a recently developed method, the so-called Scattering
transform, for the process of feature extraction in texture
retrieval. It shares a distinctive property of providing a robust
representation, which is stable with respect to spatial deformations. Recent work has demonstrated its capability for texture
classification, and hence as a promising candidate for the problem
of texture retrieval. Moreover, we adopt a common approach of
measuring the similarity of textures by comparing the subband
histograms of a filterbank transform via the Kullback-Leibler
divergence. Despite the popularity of describing histograms
using parametrized probability density functions, such as the
Generalized Gaussian Distribution (GGD), it is unfortunately
not applicable for describing most of the Scattering transform
subbands, due to the complex modulus performed on each one
of them. In this work, we propose to use the Weibull distribution
to model the Scattering subbands of descendant layers. Our
numerical experiments demonstrated the effectiveness of the
proposes approach, in comparison with several state of the arts.
Index Terms—Content-based image retrieval, feature extraction, Kullback-Leibler divergence, Scattering transform, similarity measure, Weibull distribution.
I. I NTRODUCTION
A. Overview
ONTENT-BASED image retrieval (CBIR) is a special
case of image classification. It can be viewed as the
process of assigning a query image to a set of image classes,
where each class represents a database image. Content-based
hereby refers to the mode of formulating the search query. As
opposed to metadata-based image search, which relies on prelabeling the images beforehand, a CBIR system retrieves the
n best matches within the database with respect to the visual
similarity to the query image.
In CBIR, the concepts of feature extraction (FE) and
similarity measure (SM) play crucial roles. FE converts an
image into a feature vector of numerical values with the aim to
produce a low-dimensional, yet sensible representation of the
input image in the context of some particular application. An
SM assigns a numerical value to a pair of two feature vectors.
In this work, we assume that the SM is nonnegative and that a
smaller SM value indicates a higher similarity and vice versa.
A typical CBIR system is depicted in Fig. 1. The images
to be extracted are processed via an FE algorithm and the
extracted feature vectors (signatures) are stored in a database.
C
The authors are with the Department of Electrical, Electronic and Computer
Engineering, Technische Universit¨at M¨unchen, M¨unchen, Germany. e-mail:
{a.sagel,dominik.meyer,hao.shen}@tum.de.
This work has partially been supported by the Cluster of Excellence CoTeSys – Cognition for Technical Systems, funded by the Deutsche
Forschungsgemeinschaft (DFG) and International Graduate School of Science
and Engineering (IGSSE), Technische Universit¨at M¨unchen.
Signature
database
Input
images
FE
DB
Output
images
Query
image
FE
Fig. 1.
SM
A typical architecture of content-based image retrival.
The query image is fed to the same FE algorithm and the
output is compared with all the feature vectors in the database
by means of the defined SM. As a result, the images with
the lowest SM values are returned. Consequently, designing
a CBIR system boils down to the construction of an FE/SM
framework.
Nowadays, CBIR is employed to deal with big databases
and vast amounts of data. Thus, in additional to high retrieval
efficiency, execution speed is of great concern. This requires
sufficient dimensionality reduction within the FE and a computationally efficient SM.
B. Related Work and Our Contribution
Extracting sensible information from images has been a
field of research in digital signal processing for roughly half a
century now. Naturally, the choice of the FE method depends
on the specific application. For instance, when comparing
images of landscapes, pictures taken in a forest will have
different color distributions from those taken in a desert, so an
FE based on the color properties of the image is a reasonable
approach. On the other hand, images of objects are strongly
characterized by shapes, so in this case the FE could rely on
extracted contours. This work focuses on CBIR for textures.
The earliest approaches for texture FE, such as the graylevel co-occurrence matrices (GLCM) [1] and the biologically
motivated proposed Tamura features [2] have been presented
in the 1970s and are often based on numerical measures for
how nearby pixels relate to each other.
More recent approaches include incorporating spatialfrequency representations of the images [3], such as the Fast
Wavelet Transform (FWT) [4], Gabor frames [5] or Curvelet
frames [6]. A conventional approach is designing the features
based on the energy of the respective transform coefficients,
but it can be beneficial to look at their statistical properties
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
[7], [8]. Thus, FE approaches based on the histograms of
filterbank transforms have become more prominent, cf. [9]–
[14]. Techniques of this kind are referred to as Subband
Histogram methods in the work.
Linear filterbank transforms feed the input image to a bank
of frequency-selective filters, yielding a set of band-pass signals as a representation. One problem of constructing FE based
on such a decomposition is that the higher frequency subbands
are prone to deformations in the spatial domain. Thereby,
even slight deformations of an image will yield significantly
different transform coefficients. However, simply neglecting
the higher-frequency subbands is not desirable since they carry
important information about such distinctive features as edges
and corners. As a remedy, Mallat introduced the Scattering
transform [15], a non-linear signal representation involving
filterbanks and the modulus operation which transforms highfrequency components into low-pass representations. The Scattering transform appears to perform well in texture classification tasks [16]–[18]. The key contribution of this work is to
define a statistical model in order to design subband histogram
FE/SM for texture images based on their Scattering transform
coefficients.
The paper is organized as follows. Section II provides
necessary mathematical preliminaries of the problem of CBIR,
subband histogram methods, and the basics of the Scattering
transform. In Section III we develop a statistical framework
of CBIR based on the Scattering transform. In Section IV
several numerical experiments are presented to investigate the
performance of the proposed approach in comparison with the
state of the art methods.
II. M ATHEMATICAL P RELIMINARIES
2
The likelihood of a set of random realizations {x1 , ..., xN }
for a PDF p is defined as
L(x1 , ..., xN |p) =
N
Y
p(xi ),
(1)
i=1
and is a measure for the probability that {x1 , ..., xN } is
subjected to p. That is to say, L(x1 , ..., xN |pd ) can be viewed
as a measure of similarity between the query image and the
database images. Since the natural logarithm ln is a monotone
function, this is also true for the log-likelihood ln L.
Let A denote the set of the PDFs of a number of database
images. The best match for a query {x1 , ..., xN } can thus be
determined via
p∗d = arg max ln L(x1 , ..., xN |pd )
pd ∈A
= arg max
pd ∈A
N
X
(2)
ln pd (xi ).
i=1
Expression (2) is called a Maximum Likelihood (ML) solution.
The Kullback-Leibler-Divergence (KLD) between two
PDFs p and q with the support S is defined as
Z
p(x)
dx.
(3)
D(pkq) =
p(x) ln
q(x)
S
Suppose that {x1 , ..., xN } are subjected to the PDF pq . Then it
can be derived from the law of large numbers, that minimizing
for the KLD between pq and pd yields the same results
as maximizing for the (log-)likelihood of {x1 , ..., xN } for
pd , provided that N is sufficiently large. Thus, (2) can be
reformulated as
p∗d = arg min D(pq kpd ).
(4)
pd ∈A
A. Notations
Unless stated otherwise, a signal denotes an element of the
Lebesgue space L2 (R2 ). The terms almost everywhere and
almost all refer to conditions which hold for all R2 \N , where
N can be any null set. For the sake of simplicity, we refer to
L2 (R2 ) as a Hilbert space of functions, even though it would
be more precise to call them equivalence classes of functions
that are equal all most everywhere. Bold-faced lowercase
letters x or xi describe vectors, while regular lowercase letters
like x or xi describe scalar values. Depending on the context,
uppercase letters stand either for scalar values or for matrices.
For a complex value z, z¯ denotes its conjugate, and <(z), =(z)
the real and imaginary part, respectively. Angle brackets h·, ·i
denote the scalar product of L2 (R2 ) and k · k the canonical
norm induced by it. An asterisk denotes the convolution f ∗ g
of two signals f, g.
B. Subband Histogram Methods
The coefficients {x1 , ..., xN } produced by a transform of
the query image can be viewed as a set of realizations of a
random experiment. Let pd describe the probability density
function (PDF) of the transform coefficients of some database
image.
The authors of [9] propose features based on the statistical
moments of the coefficients generated by an FWT of a texture
image. To this end, each FWT subband is modeled by a twoparameter GGD. The FE consists of obtaining the two GGD
parameters for each subband. Since the KLD of two GGDs
can be directly computed from the two GGD parameters,
for every pair of images, the KLD for each subband can be
estimated directly from the feature vectors. Summing up all
the subband KLDs for a pair of images yields an SM. The
summation is justified by the fact that the KLD of two joint
distributions of statistically independent variables is the sum of
the KLDs belonging to the respective marginal distributions.
The described SM is hence the estimated KLD between joint
distributions for all the subbands produced by an image.
In ML based CBIR, the KLD was employed as a measure
of probability that a match is correct. However, modeling the
subband histograms with a parametrized PDF can be also
motivated bilogically. In fact, it has been observed [19] that the
gray-value distributions of the filterbank outputs of a texture
correspond to perceivable texture features. This gives rise to a
variety of similar FE/SM approaches. In particular, replacing
the FWT by any other filterbank transform might yield more
descriptive features. Furthermore, the KLD could be replaced
by any of the vast number [20] of SMs for PDFs. For instance,
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
3
the symmetrized KLD between two distributions p and q is
defined as
D(qkp) + D(pkq)
,
2
and has the advantage that the symmetry property
Dsym (pkq) =
Dsym (pkq) = Dsym (qkp)
(5)
Sφ,ψ,J,R [f ; p] = φJ ∗ |ψjm ,Rm ∗ | · · · ∗ |ψj1 ,R1 ∗ f | · · · ||. (13)
Let us denote by PM the (ordered) set of all possible paths p
of size 0 to M . We can then define the Finite-path WST as
(6)
always holds.
C. The Scattering transform
1) Definition: Let θ ∈ L2 (R2 ) be a rotationally symmetric
window function with low-pass characteristics. Let η ∈ R2 \
{0} and J ∈ Z be fixed and R ⊂ SO(2) a finite group
|
of rotation matrices. With ψ(x) = θ(x)eiη x , we define the
wavelet ψj,R as
ψj,R (x) = 4−j ψ(2−j Rx), j ∈ {J, J − 1, ...}, R ∈ R. (7)
Further, let us assume a low-pass and rotationally symmetric
scaling function [21] φ ∈ L2 (R2 ) and define
φJ (x) = 4−J φ(2−J x),
2. The WST along the path p = ((j1 , R1 ), ..., (jm , Rm )) of
scaling factors and rotations is defined as
(8)
Sφ,ψ,J,R [f ; PM ] = {Sφ,ψ,J,R [f ; p]}p∈PM .
2) Properties: The output signals of the Finite-path WST
are the elements of (14), also referred to as WST subbands in
the following, which are all low-pass signals due to the filtering with φJ . With increasing M , Sφ,ψ,J,R [f ; PM ] captures
more and more energy of the input signal f and the WST
representation becomes more expressive as the maximum path
length grows. Let the Scattering norm be defined as
X
2
kSφ,ψ,J,R [f ; p]k2 .
(15)
kSφ,ψ,J,R [f ; PM ]kS =
p∈PM
As a consequence of the tight frame property (9) of D, the
infinite-path WST is unitary, i.e.
kSφ,ψ,J,R [f ; P∞ ]kS = lim kSφ,ψ,J,R [f ; PM ]kS
M →∞
= kf k.
ˆ φ,
ˆ
such that for the respective Fourier transforms ψ,
ˆ J ω)|2 +
|φ(2
J
X
X
ˆ J Rω)|2 = 1
|ψ(2
(9)
j=−∞ R∈R
holds, for almost all ω ∈ R2 . Then the set
D = {φ¯J (u − x)}u∈R2
∪ {ψ¯j,R (u − x)}u∈R2 ,j∈Z∧j≤J,R∈R
(10)
constitutes a Parseval-tight frame [21]. It will be assumed to
span L2 (R2 ) in the following. Note that for two signals f, g,
the equation
(f ∗ g)(u) = hf (x), g¯(u − x)i
(11)
holds.
At the core of the Windowed Scattering transform (WST)
[15] is a dyadic wavelet decomposition Uφ,ψ,J,R [f ; j, R] of the
input signal f ∈ L2 (R2 ) with the complex modulus performed
on the band-pass components, defined as
(
|ψj,R ∗ f |, j ≤ J,
Uφ,ψ,J,R [f ; j, R] =
(12)
φJ ∗ f,
j > J.
The modulus operation | · | traverses some of the energy of
the band-pass signals towards lower frequencies. Therefore,
Uφ,ψ,J,R can be applied to the output signals |ψj,R ∗ f | again.
This yields a tree structure like the one depicted in Fig. 2. Note
that (12) yields an infinite (but countable) number of output
signals. However, in practice we deal with bandlimited input
signals f . Hence, there exists an integer Jl with Jl < J, such
that Eq. (12) needs to be evaluated for j ≥ Jl only, which
corresponds to a tree with finite breadth.
Basically, the idea of the WST is to apply Uφ,ψ,J,R successively to the input signal and only keep the low-pass signals,
i.e. to neglect signals represented by the black nodes in Fig.
(14)
(16)
In practice, maximum path depths of M = 3 are expected
to capture all essential information [22], [23]. Actually reconstructing f from its WST involves a phase recovery problem.
However, it is known [24] that digital realizations of wavelet
representations similar to those employed in the WST are
uniquely determined by their complex modulus.
The WST is non-expansive [15], i.e. for two signals f1 , f2 ,
it can be shown [15] that for any positive integer M ,
kSφ,ψ,J,R [f1 ; PM ] − Sφ,ψ,J,R [f2 ; PM ]k2S
X
=
kSφ,ψ,J,R [f1 ; p] − Sφ,ψ,J,R [f2 ; p]k2
(17)
p∈PM
≤kf1 − f2 k2
holds, implying that the WST is robust with respect to additive
noise.
Since WST is a representation consisting solely of lowpass signals, it is stable with respect to small spatial translations and deformations. Specifically, let f˜ be a deformed
or translated version of f . Under certain mild assumptions
about the underlying wavelet frame, it has been proven that
the error kSφ,ψ,J,R [f ; PM ] − Sφ,ψ,J,R [f˜; PM ]k is bounded
asymptotically, cf. [15]. Note, that we are given some freedom
of choice in the wavelet frame which allows for considerable
flexibility in terms of parameters such as frequency selectivity
or directionality.
Due to its stability and flexibility, the WST is a convenient
signal representation and can be used as the foundation for the
FE from images.
3) Normalized WST: In order to reduce redundancy
and to increase invariance to distortions, the Normalized WST (NWST) [23] was introduced. Let us define
p0 to be the “empty” path corresponding to the layer
m = 0 and p˜ ∈ PM to be the predecessor of
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
4
f
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Uφ,ψ,J,R
Fig. 2. WST tree produced by successive application of Uφ,ψ,J,R on the input signal f . Lowpass signals (The WST subbands) are depicted as white nodes.
The complex moduli of band-pass signals are depicted as black nodes.
p ∈ PM , i.e. p = ((j1 , R1 ), ..., (jm , Rm )) implies p˜ =
((j1 , R1 ), ..., (jm−1 , Rm−1 )).
The NWST is defined as
(
φJ ∗ f
if p = p0 ,
¯
Sφ,ψ,J,R [f ; p] = Sφ,ψ,J,R [f ;p]
(18)
otherwise.
|Sφ,ψ,J,R [f ;p]|
˜
In words, each subband of the WST is normalized by the
respective parent subband. In practice, a small constant is
added to the denominator in order to avoid division by zero.
III. S TATISTICAL S CATTERING CBIR
A. Subband Modeling
We propose to model the gray-value distributions of the
different WST subbands with parametrized PDFs and describe
the images in terms of their respective parameters to obtain a
complete FE mechanism on top of the WST.
1) Root layer: We refer to the sets of WST subbands
belonging to the same path length as layers. The root layer,
i.e. the layer with m = 0 consists solely of the signal
Sφ,ψ,J,R [f ; p0 ] = φJ ∗ f.
(19)
The low-pass filter φJ serves as a blurring filter, implying that
φJ ∗ f is a low-resolution version of f . Hence, the gray-value
distribution of φJ ∗ f is expected to have a shape similar to
the one of f . Typically, histograms of natural images can be
modeled by multimodal PDFs. As a less elaborate alternative
which is widely used as a model for bandpass components,
cf. [25], we propose to utilize the GGD, i.e.
pGGD (x|α, β) =
β
β
e−(|x|/α) ,
2αΓ(1/β)
where Γ denotes the Gamma-Function,
Z ∞
Γ(t) =
xt−1 e−x dx,
(20)
(21)
0
and the parameter α ∈ R+ determines the scale of the
distribution while β ∈ R+ describes the shape. Despite the
simplicity of the GGD, its two parameters provide effective
measures of perceivable visual features such as contrast.
The GGD parameters α and β can be estimated as a
solution of an ML problem. Detailed derivation of a numerical
algorithm for estimating the parameters (α, β) is given in
Appendix A.
2) Descendant layer: The layers m ≥ 1 contain signals of
the form
Sφ,ψ,J,R [f ; p] =
(22)
φJ ∗ |ψjm ,Rm ∗ | · · · ∗ |ψj1 ,R1 ∗ f | · · · ||, p 6= p0 .
Clearly, the modulus eliminates all negative values which is
why a symmetric model such as the GGD is not appropriate
anymore. The histograms of signals from descendant layers
suggest a PDF that occupies only positive values and can describe a skew to the left, which makes the Weibull Distribution
(WD) a nearby choice. Fig. 3 exemplarily shows histograms of
WST subbands of different textures with a fitted WD curve.
Despite variations in skewness and spread, the WD fittings
describe the actual histograms fairly well. The PDF of the
WD is defined for x ≥ 0 as
k
pWD (x|λ, k) = λk · (λx)k−1 e−(λ·x) .
+
(23)
Analogous to the GGD, λ ∈ R is the scale parameter,
whereas k ∈ R+ determines the shape.
The above argument is purely empirical. The following
discussion aims to justify further the choice of the WD as
a model for WST subbands (22) in the layers m ≥ 1: Like
for the layer m = 0, we neglect the impact of the low-pass
filter φJ . The approximately analytical band-pass filters ψj,R
produce signals with similar zero-mean distributions in their
real and imaginary parts. Moreover, it is known that bandpass components of a natural image can be well modeled by
a GGD. It is therefore natural to assume the GGD with the
same parameters for the distribution of the real and imaginary
parts of the band-pass signals ψjm ,Rm ∗ | · · · ∗ |ψj1 ,R1 ∗ f | · · · |
involved in the WST. By neglecting statistical dependence between real and imaginary components, let us define a “complex
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
bark2, Layer 2
bark2, Layer 1
10
B. Scattering Similarity
50
5
0.05
0.1 0.15 0.2
floor1, Layer 1
0
0
0.01 0.02 0.03 0.04 0.05
floor1, Layer 2
20
0.05 0.1 0.15 0.2
floor2, Layer 1
0.05
0.1
0.15
pebbles, Layer 1
0
0.01 0.02 0.03 0.04 0.05
floor2, Layer 2
5
0
0.01 0.02 0.03 0.04 0.05
pebbles, Layer 2
0.05 0.1 0.15 0.2 0.25
glass2, Layer 1
0
0.02 0.04 0.06
glass2, Layer 2
0.08
50
50
10
0
50
10
50
10
0
water, Layer 2
water, Layer 1
10
50
5
0
5
10
5
0
0.05 0.1 0.15 0.2
carpet1, Layer 1
0
0
0.05
0.02
0.04
0.06
carpet1, Layer 2
0.1 0.15 0.2
curdory, Layer 1
0
0.02
0.04
0.06
curdory, Layer 2
10
5
0
20
0.1
0.2
0.3
0
20
5
0
0.1
0.02 0.04 0.06 0.08
1) PDF Similarity with parametrized KLD: The extracted
GGD and WD parameters constitute a feature vector for each
image. Following the approach from [9], we utilize the KLD as
defined in (3) as a measure of similarity between two PDFs.
The equation (3) can be further simplified when evaluating
the KLD for two GGDs, p(x) = pGGD (x|α1 , β1 ), q(x) =
pGGD (x|α2 , β2 ), or two WDs, p(x) = pWD (x|λ1 , k1 ), q(x) =
pWD (x|λ2 , k2 )). With the application of some standard integration rules, the KLD for two GGDs amounts to
0.2
0.3
0
0.02 0.04 0.06 0.08
1
β1 α2 Γ(1/β2 )
−
β2 α1 Γ(1/β1 ) β1
(27)
β2
α1
Γ((β2 + 1)/β1 )
+
,
α2
Γ(1/β1 )
DGGD (α1 , β1 kα2 , β2 ) = ln
Fig. 3. Histograms (blue) and respective ML Weibull fittings (red) of WST
subbands from ayers m = 1 and m = 2 for different texture patches from
the UIUC texture database
GGD” with statistically independent real and imaginary parts
as
pCGGD (z|α, β) = pGGD (<(z)|α, β)pGGD (=(z)|α, β). (24)
In order to determine the PDF for the modulus x = |z| ≥ 0, we
need to fix |z| and integrate pCGGD over the circles centered
at 0 in the complex plane:
Z
pCGGD,abs (x|α, β) =
pCGGD (z|α, β)dz
|z|=x
=
β
αΓ(1/β)
2 Z
x
π
2
e−x
β
(25)
((cos ϕ)β +(sin ϕ)β )/αβ
dϕ
0
Evaluating the integral analytically is demanding, if not impossible. However, it is known that the modulus of a complex
variable with statistically independent and Gaussian distributed
real and imaginary parts abides the Rayleigh distribution,
√
which can be easily verified by substituting α = 2σ and
β = 2 into Eq. (25), i.e.
√
2
x 2
pCGGD,abs (x| 2σ, 2) = 2 ex /(2σ ) .
σ
(26)
The usage of the GGD for modeling FWT subbands was
motivated by the need to model histograms which are more
or less peaked in shape than it is the case for the Gaussian
distribution. For this purpose, the additional shape parameter β
was introduced. This corresponds to introducing an additional
shape parameter to the Rayleigh distribution. Altering the
peakedness of the real and imaginary part corresponds roughly
to altering the skewness of the respective modulus distribution.
From Eq. (25), we can observe that pCGGD,abs (0|α, β) = 0,
so a good alternative to pCGGD,abs for modeling the WST
subbands would be a two-parameter generalization of the
Rayleigh distribution which is controllable in its skewness
without changing its value at 0. For k > 1, these requirements
are fulfilled by the WD. Similarly, we employ a numerical approach to estimate the Weibull parameters via minimizing the
corresponding ML function. The derivation of the algorithm
can be found in Appendix B.
as indicated in [9]. With γ ≈ 0.57722 being the EulerMascheroni constant, it can be derived that the KLD between
two WDs is given by
!
k1 λk22
−1
DWD (λ1 , k1 kλ2 , k2 ) = ln
k2 λk11
k2 λ1
k2
(28)
+
Γ
+1
λ2
k1
γ
+ (k1 − k2 ) ln λ1 −
,
k1
see [26] for details.
2) KLD Sum: An SM for two images I1 , I2 can be defined
by summing up all the KLDs for the layer m = 0 (GGD)
and the remaining layers 1 ≤ m ≤ M (WD). To this
end let us assume an indexing i ∈ {0, ..., N } of all the
subbands of a WST with i = 0 denoting the subband at
the layer m = 0. Let us further denote the probability that
the gray value xi appears at a pixel in the subband i as
p1,i (xi ) for image I1 and as p2,i (xi ) for image I2 . The joint
probabilities that a combination (x1 , ..., xN )> of gray values
appears for the whole WST representation shall be given by
p1 (x0 , ..., xN ) and p2 (x0 , ..., xN ), respectively. Let us assume
statistical independence of the WST subbands, i.e.
pn (x0 , ..., xN ) =
N
Y
pn,i (xi ), n ∈ {1, 2}.
(29)
i=0
Then, the expression
DW ST (I1 kI2 ) = D(p1 kp2 )
= D(p1,0 kp2,0 ) +
N
X
D(p1,i kp2,i )
i=1
= DGGD (α1 , β1 kα2 , β2 )
+
N
X
(30)
DWD (λ1,i , k1,i kλ2,i , k2,i )
i=1
describes the KLD between p1 and p2 , where the last equation
of (30) expresses the KLD in terms of the GGD and WD parameters of the different subbands, extracted from the images
I1 , I2 .
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
DWF+GGD
78.12%
C. Implementing the Framework
The two previous sections provide the theoretical basis for
building a complete CBIR based on WST Subband histograms.
In order to extract the features, the WST (Section II-C) is
performed on each image. Each subband is then subjected
to ML to extract the GGD parameters for layer m = 0 and
the WD parameters for all the other layers. If the number of
subbands is N , this amounts to feature vectors of size 2N . In
order to compare the query feature vector to the feature vectors
in the database, the parametrized KLDs are evaluated for each
subband via (27) and (28). Their sum (30) yields the SM. As
most of these steps can be performed independently of each
other, the whole system is well suited to be implemented in a
parallel manner, for example in applications for Big Data.
IV. N UMERICAL E XPERIMENTS
A. Experimental Settings
1) Implementation: All experiments were performed in a
Matlab 2014a environment. The code reproducing the key results is available online1 . In addition to our approach, the FWT
+ GGD method was implemented according to [9] with two
minor changes in order to further increase performance. First,
the low-pass components were included in the FE. Second,
the symmetrized version (5) of the KLD was employed. As a
further development of the FWT + GGD method, we implemented the Dual-Tree Complex Wavelet Transform (DT-CWT)
+ GGD approach which is almost identical to the former, with
the exception that instead of FWT subbands, the real parts
of the DT-CWT [27] subbands, obtained via the DT-CWT
Transform Pack2 , version 4.3, were fed to the ML estimator.
For our approach, the WST was performed by means of the
ScatNet3 toolbox, version 0.2. The Weibull parameters were
extracted via wblfit from the Statistical Toolbox. Likewise,
the similarity was measured via the symmetrized KLD.
2) Choice of parameters: Along with the regular WST, all
experiments were also conducted with its normalized version
as defined in (18). The directionality parameter, corresponding
to the cardinality of the rotation group R, is fixed to be L = 4.
All computations were performed with double precision and
the subbands of the WST are critically sampled according to
the Shannon-Nyquist sampling theorem. The bandwidth of the
low-pass filter φJ is chosen in a way such that it produces
signals of size 16 × 16 = 256, in accordance with [9]. The
WST is evaluated for the maximum path lengths M = 2 and
M = 3. In theory, a linear increase in M leads to an exponential increase in the number of subbands. However, since
the modulus operation hardly traverses any energy toward
higher frequencies, they become negligible with increasing
path length. Apart from the mentioned, default settings of
ScatNet were used. In particular, the Morlet Wavelet was used
as the band-pass filter ψ on each layer of the WST. For the
FWT + GGD experiments, the 4-tap orthogonal Daubechies
filter, cf. [21], was employed. The DT-CWT transform was
run with the options antonini and qshift_a. In both
1 https://www.ldv.ei.tum.de/uploads/media/texture
6
FWT+GΓD
78.40%
WD-HMM
80.05%
Rotated Wavelets
82.81%
TABLE II
P ERFORMANCE OF STATE OF THE ART METHODS ON DATABASE D1
cases, the size of the smallest subband was 16 × 16, which
corresponds to three levels of decomposition.
3) Data: The database D1 is the same as used in [9] and is
based on the VisTex database4 . It consists of 40 texture images
of the size 512 × 512, each divided into 16 non-overlapping
128 × 128 patches. This amounts to 640 image samples of
40 different textures depicted in Fig. 4. The database D2
was generated from images of the following subset of the
UIUC texture database5 : Bark1, Bark2, Wood2, Wood3, Water,
Marble, Floor1, Floor2, Pebbles, Wall, Brick1, Glass1, Glass2,
Carpet1, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid. Two
640 × 480 images from each class are used to create five
overlapping 256 × 256 patches which are then scaled down
to half the edge size. Hence, we get a database of 20 different
texture classes each containing ten 128 × 128 patches. Fig. 5
depicts one patch from each utilized image. All image patches
are normalized to zero mean and unit energy, in order to
avoid any bias caused by the overall lighting condition of each
original texture image. The set of all patches generated from
the same texture is considered a class. Its cardinality will be
denoted by c in the following. Consequently, c = 16 for D1
and c = 10 for D2. For each image patch, the c−1 most similar
patches were retrieved. The retrieval rate for each patch is
defined as the ratio of the number of retrieved patches from
the same class to c−1. The overall retrieval rate is the average
of the retrieval rates for all the images in the database.
B. Results
1) Overview: The retrieval rates are summarized in Table I.
As a benchmark, the results are compared to those produced
by two other subband histogram methods, FWT + GGD and
DT-CWT + GGD. All methods have in common that they
first obtain a statistical model for each texture image and then
measure the similarity by means of the KLD.
While WST + WD fails in comparison to DT-CWT + GGD
on Database D1, NWST + WD is able to outperform all of
the competing frameworks by 1.84% for M = 2 and 2.96%
for M = 3. Database D1 is widely used as a benchmark for
CBIR retrieval. In order to provide a sense for the state of the
art, Table II summarizes the results from recent publications
on comparable approaches: Discrete Wavelet Frames (DWF)
+ GGD [9], DWT + Generalized Gamma Distribution (GΓD)
[13], Wavelet Domain Hidden Markov Models (WD-HMM)
[10] and Rotated Complex Wavelets [28]. To the authors’
best knowledge, the best published result for this experiment
so far was produced by Rotated Complex Wavelets with a
retrieval rate of 82.81% which is still outperformed by 0.53%
by NWST + WD with M = 3.
retrieval scattering 14.zip
2 http://www-sigproc.eng.cam.ac.uk/Main/NGK
4 http://vismod.media.mit.edu/vismod/imagery/VisionTexture/distribution.html
3 http://www.di.ens.fr/data/software/scatnet/
5 http://www-cvr.ai.uiuc.edu/ponce
grp/data/
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
7
Fig. 4. Database D1 from left to right and top to bottom, with row and column in brackets: Bark0 (1,1), Bark6 (1,2), Bark8 (1,3), Bark9 (1,4), Brick1 (1,5),
Brick4 (1,6), Brick5 (1,7), Buildings9 (1,8), Fabric0 (2,1), Fabric4 (2,2), Fabric7 (2,3), Fabric9 (2,4), Fabric11 (2,5), Fabric14 (2,6), Fabric15 (2,7), Fabric17
(2,8), Fabric18 (3,1), Flowers5 (3,2), Food0 (3,3), Food5 (3,4), Food 8 (3,5), Grass1 (3,6), Leaves8 (3,7), Leaves10 (3,8), Leaves11 (4,1), Leaves12 (4,2),
Leaves16 (4,3), Metal0 (4,4), Metal2 (4,5), Misc2 (4,6), Sand0 (4,7), Stone1 (4,8), Stone4 (5,1), Terrain10 (5,2), Tile1 (5,3), Tile4 (5,4), Tile7 (5,5), Water5
(5,6), Wood1 (5,7), and Wood2 (5,8).
Fig. 5. Database D2 (patches) from left to right and top to bottom: Bark1, Bark2, Wood2, Wood3, Water, Marble, Floor1, Floor2, Pebbles, Wall, Brick1,
Glass1, Glass2, Carpet1, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid.
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
Database D1
Database D2
8
FWT + GGD
DT-CWT + GGD
WST + WD
M=2
WST + WD
M=3
NWST + WD
M=2
NWST + WD
M=3
77.40%
53.28%
80.38%
59.89%
78.93%
67.00%
77.92%
65.72%
82.22%
64.56%
83.34%
63.67%
TABLE I
P ERFORMANCE OF WST + WD AND NWST + WD IN COMPARISON WITH FWT + GGD AND DT-CWT + GGD ON DATABASES 1 AND 2
100
DT-CWT + GGD
WST + WD
NWST + WD
90
80
Retrieval rate (%)
70
60
50
40
30
20
10
0
0
5
10
15
20
25
30
35
40
Texture #
Fig. 6. Retrieval rates for each texture class of D1 individually. From left to right, with index in brackets: Bark0 (1), Bark6 (2), Bark8 (3), Bark9 (4), Brick1
(5), Brick4 (6), Brick5 (7), Buildings9 (8), Fabric0 (9), Fabric4 (10), Fabric7 (11), Fabric9 (12), Fabric11 (13), Fabric14 (14), Fabric15 (15), Fabric17 (16),
Fabric18 (17), Flowers5 (18), Food0 (19), Food5 (20), Food8 (21), Grass1 (22), Leaves8 (23), Leaves10 (24), Leaves11 (25), Leaves12 (26), Leaves16 (27),
Metal0 (28), Metal2 (29), Misc2 (30), Sand0 (31), Stone1 (32), Stone4 (33), Terrain10 (34), Tile1 (35), Tile4 (36), Tile7 (37), Water5 (38), Wood1 (39), and
Wood2 (40).
68
DT−CWT + GGD
WST + WD
NWST + WD
66
64
62
Retrieval rate (%)
Database D2 is considerably smaller than D1, but involves
more variation within the classes, for instance, in terms
of camera angle and deformation. Again, the WST greatly
improves the retrieval performance. However, this time the
regular WST along with shorter path lengths produces the best
results with a significant margin of 7.11% in comparison with
DT-CWT + GGD. Nevertheless, in both experiments, NWST
+ WD consistently outperforms comparable approaches not
involving Scattering.
2) Impact of Directionality: In general, natural textures
contain strongly directional features. Hence, it is often beneficial to incorporate a frame of higher directionality as it is
done for the Rotated Wavelets approach, for example. The DTCWT + GGD method which mostly draws its benefits from increased directional selectivity is consistently and significantly
outperformed by NWST + WD by at least 1.84% for D1
and 3.78% for D2, despite the same directionality properties.
That is to say, there are reasons to believe that the success of
the WST based methods is not exclusively due to increased
directionality.
3) Impact of Textural Structure: To further analyze the
capabilities of the involved methods, consider Fig. 6. It shows
the retrieval rates for each class in Database D1 produced by
DT-CWT + GGD as well as by WST + WD and NWST +
WD with M = 2. The results produced by FWT + GGD
were omitted in order to preserve comparability in terms of
60
58
56
54
52
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
Standard deviation
Fig. 7. Retrieval rates for D2, blurred with a Gaussian filter of different bandwidths. The standard deviation refers to the respective Gaussian distribution.
A larger deviation is equivalent to a smaller bandwidth.
directionality characteristics. DT-CWT + GGD performed well
in comparison to the WST based techniques for the classes
Brick5, Buildings9, Metal0 and Wood2 and badly for Bark9,
Fabric4, Food0 and Leaves10. In general, it appears that DTCWT + GGD is most suitable to flat and (piecewise) smooth
surfaces, while the two methods involving Scattering seem
to be preferable for surfaces that are rather uneven, be it
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
9
because they are rough or because they are bent or dented.
This correlates with the premise that Scattering coefficients
are less sensitive to spatial deformations, since on the one
hand, images of rough surfaces carry significant fractions of
high-frequency components and on the other hand spatial
deformation in the image is often caused by locally bending
or denting the depicted material. More specifically, NWST +
WD performed comparably well for the classes Bark0, Bark6,
Flowers5, Terrain10 and regular WST for the classes Bark8,
Fabric11, Grass1, Stone4. This suggests that WST + WD
works better on fine structures while NWST + WD yields
better results for coarse structures.
This assumption is corroborated by the following experiment. Fig. 7 depicts the retrieval rates for the Database D2
after blurring it with a Gaussian filter of different bandwidths. Unsurprisingly, the retrieval performance declines with
stronger blurring for all methods. However, it is evident that
the retrieval rate for WST + WD declines significantly faster
than the other two. This can be interpreted as a consequence of
the distortion invariance of normalized Scattering coefficients.
More importantly, this confirms that the regular WST method
relies on fine features since they are most heavily affected by
blurring. On the other hand, images of coarse patterns and flat
and piecewise smooth surfaces are less affected by blurring
such that the other two methods appear to be relatively robust.
A PPENDIX
A. Estimation of GGD parameters
In this section, we describe an optimization algorithm for
estimating the GGD parameters. Recall the log-likelihood of
a GGD p(x|α, β) for a set of realizations {x1 , . . . , xN } in the
following equation
V. C ONCLUSION AND O UTLOOK
Setting the right hand side of Eq. (33) be equal to 0 uniquely
determines
v
u
N
uβ X
β
∗
|xi |β
(35)
α = t
N i=1
In this work, we propose a subband histogram FE method
based on the statistics of the WST of a texture image. Our
derivation and analysis demonstrate how to extract statistical
features from a WST representation of a texture image by
means of matching a Weibull model via ML and how to
measure the similarity of the respective feature vectors via the
KLD. The proposed techniques come in handy when it comes
to reducing the enormous degree of redundancy introduced
by the WST since they represent each subband by as much
as two real numbered parameters. The experiments show that
the proposed method can outperform comparable algorithms
based on filter bank transforms in terms of retrieval accuracy
when designed as a CBIR system. Regular WST coefficients
seem to work better on fine structures and Normalized WST
coefficients on coarse structures.
Since approaches for rotation invariant Scattering representations are already available [16], [17], it appears feasible
to extend the presented ideas towards a rotation invariant
CBIR framework which would allow for more general problem
settings.
The summation (30) of the KLDs was partly motivated by
assuming statistical independence of the subbands. However,
neither does statistical independence hold for the WST subbands, nor for the subbands of other filterbank transforms, in
practice. Accounting for statistical dependence in the model
can lead to significant improvements in efficiency [10], [12].
Another approach could be based on recalling that analyzing
the subband histograms is also motivated biologically. Thus,
weight factors could be introduced to the summands of (30) in
order to account for the different significance of each subband.
fGGD (α, β) = ln LGGD (x1 , . . . , xN |α, β)
X
β
N (31)
1
|xi |
β
.
− lnΓ
−
=N ln
2α
β
α
i=1
Then the parameters (α∗ , β ∗ ) for approximating the distribution of the realizations can be identified by solving the
following maximization problem
(α∗ , β ∗ ) = argmax fGGD (α, β).
(32)
(α,β)
In this work, we employ a numerical optimization approach to
solve the maximization problem given in Eq. (32). We firstly
take the first derivative of fGGD as
N
X
N
∂fGGD
= α−β−1 β
|xi |β −
∂α
α
i=1
(33)
and
β
N ∂fGGD
N N Γ0 (1/β) X
|xi |
|xi |
=
+ 2
−
ln
. (34)
∂β
β
β Γ(1/β) i=1
α
α
for β > 0. Let us denote the Digamma function by Ψ(z) =
Γ0 (z)/Γ(z), similarly, setting the right hand side of Eq. (34)
to be equal to 0 leads to
β
N β X |xi |
|xi |
Ψ(1/β)
−
ln
= 0.
(36)
1+
β
N i=1
α
α
By substituting Eq. (35) into Eq. (36), we have
N
N
P
β P
|xi |β
|xi |β ln |xi | ln N
Ψ(1/β) i=1
i=1
1+
−
+
= 0, (37)
N
P
β
β
|xi |β
i=1
which characterizes the critical points of the function fGGD
in terms of β for α = α∗ . Iterative numerical algorithms,
such as the Newton-Raphson method, can be used to solve
the above equation for β, cf. [9]. Due to the summand
−N ln Γ (1/β), however, we can not guarantee concavity of
fGGD . Nevertheless, a choice of good initial point β0 , as
proposed in [9], [25], can increase the probability of finding
the global maximum of the function fGGD , i.e. via solving
the following equation for β0 via interpolation,
N
2
P
N
|xi |
2
(Γ(2/β0 ))
i=1
=
.
(38)
N
P
Γ(1/β0 )Γ(3/β0 )
2
|xi |
i=1
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING
10
Once the solution β ∗ is found, it can be substituted into
Eq. (35) for α∗ .
B. Estimation of Weibull parameters
In this section, we employ a numerical optimization approach to estimate the Weibull parameter. The log-likelihood
amounts to
fW D (k, λ) = ln LW D (x1 , . . . , xN |k, λ)
= N k ln λ + N ln k
+ (k − 1)
N
X
ln xi −
i=1
∗
N
X
(λxi )k .
(39)
i=1
∗
The optimal parameters (k , λ ) are identified by solving
(k ∗ , λ∗ ) = argmax fW D (k, λ),
(40)
(k,λ)
by employing a similar technique as demonstrated in Appendix A. The first derivatives of fW D with respect to k and
λ are computed as
!
N
N
X
N X
∂fW D
k
k
=
+
ln xi + N − λ
xi ln λ
∂k
k
i=1
i=1
(41)
N
X
k
k
−λ
(ln xi )xi
i=1
and
∂fW D
=k
∂λ
N
X
N
− λk−1
xki
λ
i=1
!
.
(42)
Setting Eq. (42) to be equal to 0 uniquely determines λ∗ for
k > 0. Solving for λ yields
!− k1
N
X
1
λ∗ =
xk
.
(43)
N i=1 i
Substituting (43) in Eq. (41) and equating it with 0 characterize
the critical points of fW D in terms of k for λ = λ∗ , i.e.
PN
N
N i=1 (ln xi )xki
N X
+
ln xi −
= 0.
(44)
PN k
k
i=1 xi
i=1
Since the function fW D as defined in Eq. (39) is concave with
respect to k > 0, a numerical algorithm such as the NewtonRaphson method can be used to obtain k ∗ . Substituting this
solution into Eq. (43) yields λ∗ .
R EFERENCES
[1] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for
image classification,” Systems, Man and Cybernetics, IEEE Transactions
on, no. 6, pp. 610–621, 1973.
[2] H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding
to visual perception,” Systems, Man and Cybernetics, IEEE Transactions
on, vol. 8, no. 6, pp. 460–473, 1978.
[3] T. Randen and J. Husoy, “Filtering for texture classification: a comparative study,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 21, no. 4, pp. 291–310, Apr 1999.
[4] J. Smith and S.-F. Chang, “Transform features for texture classification
and discrimination in large image databases,” in Image Processing, 1994.
Proceedings. ICIP-94., IEEE International Conference, vol. 3, Nov 1994,
pp. 407–411 vol.3.
[5] B. S. Manjunath and W.-Y. Ma, “Texture features for browsing and
retrieval of image data,” Pattern Analysis and Machine Intelligence,
IEEE Transactions on, vol. 18, no. 8, pp. 837–842, 1996.
[6] I. Sumana, M. Islam, D. Zhang, and G. Lu, “Content based image
retrieval using curvelet transform,” in Multimedia Signal Processing,
2008 IEEE 10th Workshop on, Oct 2008, pp. 11–16.
[7] G. Van de Wouwer, P. Scheunders, and D. Van Dyck, “Statistical
texture characterization from discrete wavelet representations,” Image
Processing, IEEE Transactions on, vol. 8, no. 4, pp. 592–598, 1999.
[8] N. Vasconcelos, “On the efficient evaluation of probabilistic similarity
functions for image retrieval,” Information Theory, IEEE Transactions
on, vol. 50, no. 7, pp. 1482–1496, July 2004.
[9] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using
generalized gaussian density and kullback-leibler distance,” IEEE Trans.
Image Processing, vol. 11, no. 2, pp. 146–158, 2002.
[10] ——, “Rotation invariant texture characterization and retrieval using
steerable wavelet-domain hidden markov models,” Multimedia, IEEE
Transactions on, vol. 4, no. 4, pp. 517–527, 2002.
[11] D.-Y. Po and M. Do, “Directional multiscale modeling of images using
the contourlet transform,” in Statistical Signal Processing, 2003 IEEE
Workshop on, Sept 2003, pp. 262–265.
[12] G. Tzagkarakis, B. Beferull-Lozano, and P. Tsakalides, “Rotationinvariant texture retrieval with gaussianized steerable pyramids,” Image
Processing, IEEE Transactions on, vol. 15, no. 9, pp. 2702–2718, Sept
2006.
[13] S. Choy and C. Tong, “Statistical wavelet subband characterization based
on generalized gamma density and its application in texture retrieval,”
Image Processing, IEEE Transactions on, vol. 19, no. 2, pp. 281–289,
Feb 2010.
[14] M. Allili, “Wavelet modeling using finite mixtures of generalized gaussian distributions: Application to texture discrimination and retrieval,”
Image Processing, IEEE Transactions on, vol. 21, no. 4, pp. 1452–1464,
April 2012.
[15] S. Mallat, “Group invariant scattering,” Communications on Pure and
Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012.
[16] L. Sifre and S. Mallat, “Combined scattering for rotation invariant texture analysis,” in European Symposium on Artificial Neural Networks,
2012.
[17] ——, “Rotation, scaling and deformation invariant scattering for texture
discrimination,” in Computer Vision and Pattern Recognition (CVPR),
2013 IEEE Conference on. IEEE, 2013, pp. 1233–1240.
[18] J. Bruna, “Scattering representations for recognition,” Ph.D. dissertation,
CMAP, Ecole Polytechnique, 2012.
[19] J. R. Bergen and M. S. Landy., “Computational modeling of visual
texture segregation,” in Computational Models of Visual Processing, ser.
A Bradford book, M. Landy and J. Movshon, Eds. MIT Press, 1991.
[20] S.-H. Cha, “Comprehensive survey on distance/similarity measures
between probability density functions,” International Journal of
Mathematical Models and Methods in Applied Sciences, vol. 1, no. 4,
pp. 300–307, 2007.
[21] S. Mallat, A Wavelet Tour of Signal Processing - The Sparse Way.
Elsevier Science, 2008, pp. 89–376.
[22] J. Bruna and S. Mallat, “Invariant scattering convolution networks,”
Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 35, no. 8, pp. 1872–1886, 2013.
[23] J. And`en and S. Mallat, “Deep scattering spectrum,” IEEE Transactions
on Signal Processing, 2013.
[24] I. Waldspurger, A. d’Aspremont, and S. Mallat, “Phase recovery, maxcut
and complex semidefinite programming,” Mathematical Programming,
pp. 1–35, 2012.
[25] S. Mallat, “A theory for multiresolution signal decomposition: the
wavelet representation,” Pattern Analysis and Machine Intelligence,
IEEE Transactions on, vol. 11, no. 7, pp. 674–693, Jul 1989.
[26] C. Bauckhage, “Computing the Kullback-Leibler Divergence between
two Weibull Distributions,” ArXiv e-prints, 2013.
[27] I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree
complex wavelet transform,” Signal Processing Magazine, IEEE, vol. 22,
no. 6, pp. 123–151, 2005.
[28] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Texture image retrieval using new rotated complex wavelet filters,” Systems, Man, and
Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 35, no. 6,
pp. 1168–1178, 2005.
1/--pages
Пожаловаться на содержимое документа