
Image segmentation based on learned discriminative dictionaries

Benjamin Berkels¹, Martin Rumpf², Marc Kotowski³, Carlo Schaller³

¹ Interdisciplinary Mathematics Institute, University of South Carolina
² Institute for Numerical Simulation, University of Bonn
³ Service de Neurochirurgie, Hôpitaux Universitaires de Genève

New Frontiers in Imaging and Sensing, February 17th – 22nd, Columbia

Learning reconstructive dictionaries

Sparse signal representations based on overcomplete dictionaries

Key assumption: Finite-dimensional signals can be well approximated by sparse linear combinations of atom signals.

signals and atoms are considered to be elements of $\mathbb{R}^N$

a set of atoms $d_1, \dots, d_K$ is called a dictionary, represented by the matrix $D \in \mathbb{R}^{N \times K}$ whose $j$-th column is $d_j$

dictionaries are usually overcomplete, i.e. $K > N$

in imaging, signals are usually small patches of an image

Sparse approximation problem (sparsity-constrained)
Given an input signal $y$, find its best sparse approximation, i.e.

$$\min_{x \in \mathbb{R}^K} \|y - Dx\|^2 \quad \text{such that} \quad \|x\|_0 \le L \text{ for a fixed } L \in \mathbb{N}.$$

Problem: How to design suitable dictionaries?
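The sparsity-constrained problem above is NP-hard in general; greedy pursuit methods are the standard workaround. Below is a minimal NumPy sketch of orthogonal matching pursuit (OMP, the pursuit used later in the talk); the function name and the toy dictionary are illustrative assumptions, not from the slides.

```python
import numpy as np

def omp(D, y, L):
    """Greedy orthogonal matching pursuit for
    min_x ||y - D x||^2  s.t.  ||x||_0 <= L,
    assuming the atoms (columns of D) are l2-normalized."""
    K = D.shape[1]
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(K)
    for _ in range(L):
        # select the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # refit the coefficients on the current support by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# toy usage: random overcomplete dictionary (K > N), exactly 2-sparse signal
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 3] - 0.5 * D[:, 17]
print(np.linalg.norm(y - D @ omp(D, y, L=2)))   # ~0 up to rounding
```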


Learning reconstructive dictionaries

Designing overcomplete dictionaries

Two distinct design approaches

Use a predefined dictionary associated with a signal transform

Popular transforms:

the short-time Fourier transform
the wavelet transform
the curvelet transform
the contourlet transform
...

Learn from input or representative training data

Popular algorithms:

the Method of Optimal Directions (MOD)
the K-SVD algorithm
...


Learning reconstructive dictionaries

A variational approach to dictionary design

Given: $M$ patches $y_1, \dots, y_M \in \mathbb{R}^N$

Task: Find a dictionary optimal to reconstruct these patches

A dictionary $D$ is suitable to reconstruct a patch $y$ if the sparse reconstruction error

$$\min_{x \in \mathbb{R}^K} \|y - Dx\|^2 \quad \text{such that} \quad \|x\|_0 \le L \text{ for a fixed } L \in \mathbb{N},$$

is small.

$\longrightarrow$ Minimization problem:

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L \text{ for } 1 \le l \le M.$$

Difficult, nonconvex problem → sophisticated algorithms needed
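For orientation, the objective of this joint minimization is cheap to evaluate for a given candidate pair of dictionary and codes; a small NumPy sketch with illustrative names:

```python
import numpy as np

def learning_objective(D, X, Y, L):
    """Value of sum_l ||y_l - D x_l||^2 for patches Y (N, M) and
    codes X (K, M), checking the constraint ||x_l||_0 <= L."""
    assert np.all(np.count_nonzero(X, axis=0) <= L), "sparsity violated"
    return float(np.sum((Y - D @ X) ** 2))
```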


Learning reconstructive dictionaries

The K-SVD algorithm

Strategy: [Aharon, Elad, Bruckstein TSP ’06]

Split the minimization into two alternating stages

Sparse coding stage

For $1 \le l \le M$ solve

$$\min_{x_l \in \mathbb{R}^K} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L$$

Codebook update stage

Update $D$ and $x_1, \dots, x_M$ with fixed sparsity pattern using

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2$$

Sparse coding is a well-known problem → use any pursuit algorithm to handle it, e.g. OMP
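A sketch of the alternating structure in Python, with the sparse coding stage done by scikit-learn's orthogonal_mp. For brevity, the codebook update shown here is the MOD least-squares step mentioned earlier; K-SVD's per-atom SVD update is sketched after the next slide. All names are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(Y, K, L, n_iter=20, seed=0):
    """Alternating dictionary learning on a patch matrix Y of shape (N, M).
    Sparse coding stage: OMP per patch.  Codebook update stage: shown here
    as the MOD least-squares step; K-SVD replaces it with per-atom SVD
    updates (see the separate sketch)."""
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    D = rng.normal(size=(N, K))
    D /= np.linalg.norm(D, axis=0)              # normalized atoms
    X = np.zeros((K, M))
    for _ in range(n_iter):
        # sparse coding: x_l ~ argmin ||y_l - D x||^2 s.t. ||x||_0 <= L
        X = orthogonal_mp(D, Y, n_nonzero_coefs=L)
        # codebook update (MOD): D = argmin_D ||Y - D X||_F^2
        D = Y @ np.linalg.pinv(X)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, X
```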


Learning reconstructive dictionaries

The K-SVD algorithm

Codebook update stage

For $1 \le j \le K$:

determine the patches using $d_j$, i.e. $\omega_j = \{l : (x_l)_j \neq 0\}$

determine the restricted error matrix $E_j \in \mathbb{R}^{N \times |\omega_j|}$ whose column corresponding to $m \in \omega_j$ is $y_m - D x_m + (x_m)_j d_j$

Note: $\sum_{l \in \omega_j} \|y_l - D x_l\|^2 = \big\| E_j - d_j \, ((x_l)_j)_{l \in \omega_j} \big\|^2$, treating $((x_l)_j)_{l \in \omega_j}$ as a row vector

calculate the SVD of $E_j$, i.e. $E_j = U \Delta V^T$

$d_j \leftarrow$ "first column of $U$"

$((x_m)_j)_{m \in \omega_j} \leftarrow$ "$\Delta_{11} \cdot$ first column of $V$"
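A NumPy sketch of this per-atom sweep under the usual K-SVD conventions (patches as columns of Y, codes as columns of X); names are illustrative:

```python
import numpy as np

def ksvd_codebook_update(D, X, Y):
    """One K-SVD codebook sweep: for each atom d_j, rebuild it and the
    coefficients using it from a rank-1 SVD of the restricted error
    matrix E_j.  Shapes: Y (N, M), D (N, K), X (K, M)."""
    K = D.shape[1]
    for j in range(K):
        omega = np.nonzero(X[j, :])[0]           # patches using d_j
        if omega.size == 0:
            continue                             # atom unused: leave it
        # columns y_m - D x_m + (x_m)_j d_j for m in omega
        E_j = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
        U, s, Vt = np.linalg.svd(E_j, full_matrices=False)
        D[:, j] = U[:, 0]                        # "first column of U"
        X[j, omega] = s[0] * Vt[0, :]            # "Delta_11 * first column of V"
    return D, X
```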


Learning reconstructive dictionaries

Reconstructive dictionary application example

Denoising

Dictionary-based denoising of a very noisy input image

(input image courtesy of Douglas Blom)

Learning reconstructive dictionaries

Reconstructive dictionary application example

Denoising

Gaussian blur next to the dictionary-based denoising

(input image courtesy of Douglas Blom)

Learning discriminative dictionaries

Discriminative dictionaries

Learning discriminative dictionaries

From reconstructive to discriminative dictionaries

Notation

approximation residual

$R(y, D, x) := \|y - Dx\|^2$

coefficients of the sparse best approximation

$x^*(y, D) := \operatorname{argmin}_{x \in \mathbb{R}^K,\, \|x\|_0 \le L} R(y, D, x)$

best approximation error

$R(y, D) := R(y, D, x^*(y, D))$

Recall: Reconstructive dictionary learning problem

$$\min_{\substack{x_1, \dots, x_M \in \mathbb{R}^K \\ D \in \mathbb{R}^{N \times K}}} \sum_{l=1}^{M} \|y_l - D x_l\|^2 \quad \text{such that} \quad \|x_l\|_0 \le L \text{ for } 1 \le l \le M$$

Reformulation of the reconstructive dictionary learning problem

$$\min_{D \in \mathbb{R}^{N \times K}} \sum_{l=1}^{M} R(y_l, D)$$
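In code, these three quantities map onto thin helpers; a sketch that, as an assumption, uses scikit-learn's orthogonal_mp as the pursuit algorithm:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def R(y, D, x):
    """Approximation residual R(y, D, x) = ||y - D x||^2."""
    return float(np.sum((y - D @ x) ** 2))

def x_star(y, D, L):
    """Coefficients of the sparse best approximation; the exact argmin
    is NP-hard, so OMP serves as a greedy stand-in here."""
    return orthogonal_mp(D, y, n_nonzero_coefs=L)

def R_best(y, D, L):
    """Best approximation error R(y, D) = R(y, D, x*(y, D))."""
    return R(y, D, x_star(y, D, L))
```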


Learning discriminative dictionaries


A variational approach to discriminative dictionaries

Given: Patches $y_1, \dots, y_{M_1+M_2}$ of two different classes $P_1$ and $P_2$

($P_i := \{y_l : l \in S_i\}$, $S_1 = \{1, \dots, M_1\}$, $S_2 = M_1 + \{1, \dots, M_2\}$)

Task: Find a dictionary pair to distinguish between $P_1$ and $P_2$

$\longrightarrow$ Minimization problem: [Mairal et al. CVPR '08]

$$\min_{D_1, D_2} \sum_{i=1}^{2} \frac{1}{M_i} \sum_{l \in S_i} \Big[ C_\lambda\big((-1)^i (R(y_l, D_1) - R(y_l, D_2))\big) + \lambda \gamma \, R(y_l, D_i) \Big]$$

where $C_\lambda(s) = \ln(1 + \exp(-\lambda s))$ (logistic loss function)
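Evaluating this objective is mechanical once the residuals $R(y_l, D_i)$ are available; a NumPy sketch with illustrative names (OMP again stands in for the exact sparse best approximation):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def C(s, lam):
    """Logistic loss C_lambda(s) = ln(1 + exp(-lambda * s)),
    computed stably as log(exp(0) + exp(-lambda * s))."""
    return float(np.logaddexp(0.0, -lam * s))

def R_best(y, D, L):
    """Best sparse approximation error R(y, D), approximated via OMP."""
    x = orthogonal_mp(D, y, n_nonzero_coefs=L)
    return float(np.sum((y - D @ x) ** 2))

def discriminative_objective(D1, D2, patches, labels, L, lam, gamma):
    """Objective of Mairal et al.: patches y_l carry labels i in {1, 2};
    class-averaged logistic term plus lambda * gamma times the
    reconstruction error under the patch's own dictionary."""
    total = 0.0
    for i in (1, 2):
        S_i = [l for l, lab in enumerate(labels) if lab == i]
        term = 0.0
        for l in S_i:
            r1 = R_best(patches[l], D1, L)
            r2 = R_best(patches[l], D2, L)
            r_own = r1 if i == 1 else r2
            term += C((-1) ** i * (r1 - r2), lam) + lam * gamma * r_own
        total += term / len(S_i)
    return total
```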

Learning discriminative dictionaries

A variational approach to discriminative dictionaries

[Plot: influence of $\lambda$ on $C_\lambda$, showing $C_5$ (red), $C_{10}$ (green) and $C_{20}$ (blue) on $s \in [-1, 1]$, with values from 0 to 10]
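The curves follow directly from the definition of $C_\lambda$; a minimal matplotlib sketch reproducing the figure (colors and axis ranges taken from the original plot):

```python
import numpy as np
import matplotlib.pyplot as plt

s = np.linspace(-1.0, 1.0, 400)
for lam, color in ((5, "red"), (10, "green"), (20, "blue")):
    # C_lambda(s) = ln(1 + exp(-lambda * s)), computed stably
    plt.plot(s, np.logaddexp(0.0, -lam * s), color=color,
             label=rf"$C_{{{lam}}}$")
plt.xlabel("s")
plt.ylabel(r"$C_\lambda(s)$")
plt.ylim(0, 10)                 # axis range as in the original figure
plt.title(r"Influence of $\lambda$ on $C_\lambda$")
plt.legend()
plt.show()
```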

Learning discriminative dictionaries

A variational approach to discriminative dictionaries


Note:

$$C_\lambda\big((-1)^1 (R(y_l, D_1) - R(y_l, D_2))\big) \begin{cases} \approx 0 & \text{if } R(y_l, D_1) \ll R(y_l, D_2) \\ \gg 0 & \text{if } R(y_l, D_1) \gg R(y_l, D_2) \end{cases}$$

Learning discriminative dictionaries

Discriminative dictionary learning algorithm by Mairal et al.

Like K-SVD, split the minimization into two alternating stages

Sparse coding stage

For $1 \le i \le 2$ and $1 \le l \le M := M_1 + M_2$ use OMP to get

$$x_l^i \approx x^*(y_l, D_i) = \operatorname{argmin}_{x \in \mathbb{R}^K,\, \|x\|_0 \le L} R(y_l, D_i, x)$$

Codebook update stage

For $1 \le i \le 2$, update $D_i$ and $x_1^i, \dots, x_M^i$ with fixed sparsity pattern using a truncated Newton iteration ("truncated" → second derivatives of $C_\lambda$ are neglected)
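These slides do not state the final classification rule, but the construction suggests the natural one: assign a patch to the class whose dictionary reconstructs it with the smaller error $R(y, D_i)$. A sketch under that assumption:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def classify_patch(y, D1, D2, L):
    """Assumed decision rule: assign the class whose dictionary yields
    the smaller best sparse approximation error R(y, D_i)."""
    errors = []
    for D in (D1, D2):
        x = orthogonal_mp(D, y, n_nonzero_coefs=L)
        errors.append(float(np.sum((y - D @ x) ** 2)))
    return 1 if errors[0] <= errors[1] else 2
```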


Learning discriminative dictionaries

The remaining foils will be published later.