
Image segmentation based on learned discriminative dictionaries

Benjamin Berkels¹, Martin Rumpf², Marc Kotowski³, Carlo Schaller³

¹ Interdisciplinary Mathematics Institute, University of South Carolina
² Institute for Numerical Simulation, University of Bonn
³ Service de Neurochirurgie, Hôpitaux Universitaires de Genève

New Frontiers in Imaging and Sensing, February 17th – 22nd, Columbia

Learning reconstructive dictionaries

Sparse signal representations based on overcomplete dictionaries

Key assumption: Finite-dimensional signals can be well approximated by sparse linear combinations of atom signals.

signals and atoms are considered to be elements of $\mathbb{R}^N$

a set of atoms $d_1, \dots, d_K$ is called a dictionary, represented by the matrix $D \in \mathbb{R}^{N \times K}$ whose $j$-th column is $d_j$

dictionaries are usually overcomplete, i.e. $K > N$

in imaging, signals are usually small patches of an image

Sparse approximation problem (sparsity-constrained)
Given an input signal $y$, find its best sparse approximation, i.e.

$$\min_{x \in \mathbb{R}^K} \|y - Dx\|^2 \quad \text{such that} \quad \|x\|_0 \le L \text{ for a fixed } L \in \mathbb{N}.$$

Problem: How to design suitable dictionaries?
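The sparsity-constrained problem above is NP-hard in general; greedy pursuit methods are the standard workaround. Below is a minimal NumPy sketch of orthogonal matching pursuit (OMP, the pursuit used later in the talk); the function name and the toy dictionary are illustrative assumptions, not from the slides.

```python
import numpy as np

def omp(D, y, L):
    """Greedy orthogonal matching pursuit for
    min_x ||y - D x||^2  s.t.  ||x||_0 <= L,
    assuming the atoms (columns of D) are l2-normalized."""
    K = D.shape[1]
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(K)
    for _ in range(L):
        # select the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # refit the coefficients on the current support by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# toy usage: random overcomplete dictionary (K > N), exactly 2-sparse signal
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 3] - 0.5 * D[:, 17]
print(np.linalg.norm(y - D @ omp(D, y, L=2)))   # ~0 up to rounding
```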


Learning reconstructive dictionaries

Designing overcomplete dictionaries

Two distinct design approaches

Use a predefined dictionary associated with a signal transform

Popular transforms:

the short-time Fourier transform
the wavelet transform
the curvelet transform
the contourlet transform
...

Learn from input or representative training data

Popular algorithms:

the Method of Optimal Directions (MOD)
the K-SVD algorithm
...


Learning reconstructive dictionaries

A variational approach to dictionary design

Given: $M$ patches $y_1, \dots, y_M \in \mathbb{R}^N$

Task: Find a dictionary optimal to reconstruct these patches

A dictionary $D$ is suitable to reconstruct a patch $y$ if the sparse reconstruction error

$$\min_{x \in \mathbb{R}^K} \|y - Dx\|^2 \quad \text{such that} \quad \|x\|_0 \le L \text{ for a fixed } L \in \mathbb{N},$$

is small.

$\longrightarrow$ Minimization problem:

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L \text{ for } 1 \le l \le M.$$

Difficult, nonconvex problem → sophisticated algorithms needed
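For orientation, the objective of this joint minimization is cheap to evaluate for a given candidate pair of dictionary and codes; a small NumPy sketch with illustrative names:

```python
import numpy as np

def learning_objective(D, X, Y, L):
    """Value of sum_l ||y_l - D x_l||^2 for patches Y (N, M) and
    codes X (K, M), checking the constraint ||x_l||_0 <= L."""
    assert np.all(np.count_nonzero(X, axis=0) <= L), "sparsity violated"
    return float(np.sum((Y - D @ X) ** 2))
```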


Learning reconstructive dictionaries

The K-SVD algorithm

Strategy: [Aharon, Elad, Bruckstein TSP ’06]

Split the minimization into two alternating stages

Sparse coding stage

For $1 \le l \le M$ solve

$$\min_{x_l \in \mathbb{R}^K} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L$$

Codebook update stage

Update $D$ and $x_1, \dots, x_M$ with fixed sparsity pattern using

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2$$

Sparse coding is a well-known problem → use any pursuit algorithm to handle it, e.g. OMP
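A sketch of the alternating structure in Python, with the sparse coding stage done by scikit-learn's orthogonal_mp. For brevity, the codebook update shown here is the MOD least-squares step mentioned earlier; K-SVD's per-atom SVD update is sketched after the next slide. All names are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(Y, K, L, n_iter=20, seed=0):
    """Alternating dictionary learning on a patch matrix Y of shape (N, M).
    Sparse coding stage: OMP per patch.  Codebook update stage: shown here
    as the MOD least-squares step; K-SVD replaces it with per-atom SVD
    updates (see the separate sketch)."""
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    D = rng.normal(size=(N, K))
    D /= np.linalg.norm(D, axis=0)              # normalized atoms
    X = np.zeros((K, M))
    for _ in range(n_iter):
        # sparse coding: x_l ~ argmin ||y_l - D x||^2 s.t. ||x||_0 <= L
        X = orthogonal_mp(D, Y, n_nonzero_coefs=L)
        # codebook update (MOD): D = argmin_D ||Y - D X||_F^2
        D = Y @ np.linalg.pinv(X)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, X
```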


Learning reconstructive dictionaries

The K-SVD algorithm

Codebook update stage

For $1 \le j \le K$:

determine the patches using $d_j$, i.e. $\omega_j = \{l : (x_l)_j \neq 0\}$

determine the restricted error matrix $E_j \in \mathbb{R}^{N \times |\omega_j|}$ whose column corresponding to $m \in \omega_j$ is $y_m - D x_m + (x_m)_j d_j$

Note: $\sum_{l \in \omega_j} \|y_l - D x_l\|^2 = \big\| E_j - d_j \, ((x_l)_j)_{l \in \omega_j} \big\|^2$, treating $((x_l)_j)_{l \in \omega_j}$ as a row vector

calculate the SVD of $E_j$, i.e. $E_j = U \Delta V^T$

$d_j \leftarrow$ "first column of $U$"

$((x_m)_j)_{m \in \omega_j} \leftarrow$ "$\Delta_{11} \cdot$ first column of $V$"
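A NumPy sketch of this per-atom sweep under the usual K-SVD conventions (patches as columns of Y, codes as columns of X); names are illustrative:

```python
import numpy as np

def ksvd_codebook_update(D, X, Y):
    """One K-SVD codebook sweep: for each atom d_j, rebuild it and the
    coefficients using it from a rank-1 SVD of the restricted error
    matrix E_j.  Shapes: Y (N, M), D (N, K), X (K, M)."""
    K = D.shape[1]
    for j in range(K):
        omega = np.nonzero(X[j, :])[0]           # patches using d_j
        if omega.size == 0:
            continue                             # atom unused: leave it
        # columns y_m - D x_m + (x_m)_j d_j for m in omega
        E_j = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
        U, s, Vt = np.linalg.svd(E_j, full_matrices=False)
        D[:, j] = U[:, 0]                        # "first column of U"
        X[j, omega] = s[0] * Vt[0, :]            # "Delta_11 * first column of V"
    return D, X
```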


Learning reconstructive dictionaries

Reconstructive dictionary application example

Denoising

Dictionary-based denoising of a very noisy input image

(input image courtesy of Douglas Blom)

Learning reconstructive dictionaries

Reconstructive dictionary application example

Denoising

Gaussian blur next to the dictionary-based denoising

(input image courtesy of Douglas Blom)

Learning discriminative dictionaries

Discriminative dictionaries

Learning discriminative dictionaries

From reconstructive to discriminative dictionaries

Notation

approximation residual

$R(y, D, x) := \|y - Dx\|^2$

coefficients of the sparse best approximation

$x^*(y, D) := \operatorname{argmin}_{x \in \mathbb{R}^K,\, \|x\|_0 \le L} R(y, D, x)$

best approximation error

$R(y, D) := R(y, D, x^*(y, D))$

Recall: Reconstructive dictionary learning problem

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L \text{ for } 1 \le l \le M$$

Reformulation of the reconstructive dictionary learning problem

$$\min_{D \in \mathbb{R}^{N \times K}} \sum_{l=1}^{M} R(y_l, D)$$
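In code, these three quantities map onto thin helpers; a sketch that, as an assumption, uses scikit-learn's orthogonal_mp as the pursuit algorithm:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def R(y, D, x):
    """Approximation residual R(y, D, x) = ||y - D x||^2."""
    return float(np.sum((y - D @ x) ** 2))

def x_star(y, D, L):
    """Coefficients of the sparse best approximation; the exact argmin
    is NP-hard, so OMP serves as a greedy stand-in here."""
    return orthogonal_mp(D, y, n_nonzero_coefs=L)

def R_best(y, D, L):
    """Best approximation error R(y, D) = R(y, D, x*(y, D))."""
    return R(y, D, x_star(y, D, L))
```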


Learning discriminative dictionaries


A variational approach to discriminative dictionaries

Given: Patches $y_1, \dots, y_{M_1+M_2}$ of two different classes $P_1$ and $P_2$

($P_i := \{y_l : l \in S_i\}$, $S_1 = \{1, \dots, M_1\}$, $S_2 = M_1 + \{1, \dots, M_2\}$)

Task: Find a dictionary pair to distinguish between $P_1$ and $P_2$

$\longrightarrow$ Minimization problem: [Mairal et al. CVPR '08]

$$\min_{D_1, D_2} \sum_{i=1}^{2} \frac{1}{M_i} \sum_{l \in S_i} \Big[ C_\lambda\big((-1)^i (R(y_l, D_1) - R(y_l, D_2))\big) + \lambda \gamma \, R(y_l, D_i) \Big]$$

where $C_\lambda(s) = \ln(1 + \exp(-\lambda s))$ (logistic loss function)
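Evaluating this objective is mechanical once the residuals $R(y_l, D_i)$ are available; a NumPy sketch with illustrative names (OMP again stands in for the exact sparse best approximation):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def C(s, lam):
    """Logistic loss C_lambda(s) = ln(1 + exp(-lambda * s)),
    computed stably as log(exp(0) + exp(-lambda * s))."""
    return float(np.logaddexp(0.0, -lam * s))

def R_best(y, D, L):
    """Best sparse approximation error R(y, D), approximated via OMP."""
    x = orthogonal_mp(D, y, n_nonzero_coefs=L)
    return float(np.sum((y - D @ x) ** 2))

def discriminative_objective(D1, D2, patches, labels, L, lam, gamma):
    """Objective of Mairal et al.: patches y_l carry labels i in {1, 2};
    class-averaged logistic term plus lambda * gamma times the
    reconstruction error under the patch's own dictionary."""
    total = 0.0
    for i in (1, 2):
        S_i = [l for l, lab in enumerate(labels) if lab == i]
        term = 0.0
        for l in S_i:
            r1 = R_best(patches[l], D1, L)
            r2 = R_best(patches[l], D2, L)
            r_own = r1 if i == 1 else r2
            term += C((-1) ** i * (r1 - r2), lam) + lam * gamma * r_own
        total += term / len(S_i)
    return total
```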

Learning discriminative dictionaries

A variational approach to discriminative dictionaries

[Plot: influence of $\lambda$ on $C_\lambda$, showing $C_5$ (red), $C_{10}$ (green) and $C_{20}$ (blue) on $s \in [-1, 1]$, with values from 0 to 10]
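The curves follow directly from the definition of $C_\lambda$; a minimal matplotlib sketch reproducing the figure (colors and axis ranges taken from the original plot):

```python
import numpy as np
import matplotlib.pyplot as plt

s = np.linspace(-1.0, 1.0, 400)
for lam, color in ((5, "red"), (10, "green"), (20, "blue")):
    # C_lambda(s) = ln(1 + exp(-lambda * s)), computed stably
    plt.plot(s, np.logaddexp(0.0, -lam * s), color=color,
             label=rf"$C_{{{lam}}}$")
plt.xlabel("s")
plt.ylabel(r"$C_\lambda(s)$")
plt.ylim(0, 10)                 # axis range as in the original figure
plt.title(r"Influence of $\lambda$ on $C_\lambda$")
plt.legend()
plt.show()
```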

Learning discriminative dictionaries

A variational approach to discriminative dictionaries


Note:

$$C_\lambda\big((-1)^1 (R(y_l, D_1) - R(y_l, D_2))\big) \begin{cases} \approx 0 & \text{if } R(y_l, D_1) \ll R(y_l, D_2) \\ \gg 0 & \text{if } R(y_l, D_1) \gg R(y_l, D_2) \end{cases}$$

Learning discriminative dictionaries

Discriminative dictionary learning algorithm by Mairal et al.

Like K-SVD, split the minimization into two alternating stages

Sparse coding stage

For $1 \le i \le 2$ and $1 \le l \le M := M_1 + M_2$ use OMP to get

$$x_l^i \approx x^*(y_l, D_i) = \operatorname{argmin}_{x \in \mathbb{R}^K,\, \|x\|_0 \le L} R(y_l, D_i, x)$$

Codebook update stage

For $1 \le i \le 2$, update $D_i$ and $x_1^i, \dots, x_M^i$ with fixed sparsity pattern using a truncated Newton iteration ("truncated" → second derivatives of $C_\lambda$ are neglected)
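These slides do not state the final classification rule, but the construction suggests the natural one: assign a patch to the class whose dictionary reconstructs it with the smaller error $R(y, D_i)$. A sketch under that assumption:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def classify_patch(y, D1, D2, L):
    """Assumed decision rule: assign the class whose dictionary yields
    the smaller best sparse approximation error R(y, D_i)."""
    errors = []
    for D in (D1, D2):
        x = orthogonal_mp(D, y, n_nonzero_coefs=L)
        errors.append(float(np.sum((y - D @ x) ** 2)))
    return 1 if errors[0] <= errors[1] else 2
```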


Learning discriminative dictionaries

The remaining foils will be published later.