Image segmentation based on learned discriminative dictionaries
Benjamin Berkels (1), Martin Rumpf (2), Marc Kotowski (3), Carlo Schaller (3)
(1) Interdisciplinary Mathematics Institute, University of South Carolina
(2) Institute for Numerical Simulation, University of Bonn
(3) Service de Neurochirurgie, Hôpitaux Universitaires de Genève
New Frontiers in Imaging and Sensing, February 17–22, in Columbia
Learning reconstructive dictionaries
Sparse signal representations based on overcomplete dictionaries
Key assumption: Finite-dimensional signals can be well approximated by sparse linear combinations of atom signals.
signals and atoms are considered to be elements of R^N
a set of atoms d_1, ..., d_K is called a dictionary, represented by the matrix D ∈ R^{N×K} whose j-th column is d_j
dictionaries are usually overcomplete, i.e. K > N
in imaging, signals are usually small patches of an image
Sparse approximation problem (sparsity-constrained): Given an input signal y, find its best sparse approximation, i.e.
min_{x ∈ R^K} ‖y − Dx‖² such that ‖x‖_0 ≤ L for a fixed L ∈ N.
Problem: How to design suitable dictionaries?
Learning reconstructive dictionaries
Designing overcomplete dictionaries
Two distinct design approaches
Use a predefined dictionary associated with a signal transform
Popular transforms: the short-time Fourier transform, the wavelet transform, the curvelet transform, the contourlet transform, ...
Learn from input or representative training data
Popular algorithms: the Method of Optimal Directions (MOD), the K-SVD algorithm, ...
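The two learning algorithms above differ mainly in how they update the dictionary. The MOD update has a closed form: with the coefficients fixed, the dictionary minimizing the reconstruction error is a least-squares solution. A minimal numpy sketch (the function name is my own; renormalizing atoms to unit length is a common convention, not part of the least-squares step itself):

```python
import numpy as np

def mod_update(Y, X):
    """Method of Optimal Directions codebook update.

    With the coefficient matrix X (K x M) held fixed, the dictionary
    minimizing ||Y - D X||_F^2 is D = Y X^+ (pseudoinverse solution).
    Y holds the M training patches as columns (N x M).
    """
    D = Y @ np.linalg.pinv(X)
    # atoms are conventionally rescaled to unit norm afterwards
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    return D / norms
```

If the patches are exactly representable by some dictionary with the given coefficients, this update recovers it in one step.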
Learning reconstructive dictionaries
A variational approach to dictionary design
Given: M patches y_1, ..., y_M ∈ R^N
Task: Find a dictionary optimal for reconstructing these patches
A dictionary D is suitable to reconstruct a patch y if the sparse reconstruction error
min_{x ∈ R^K} ‖y − Dx‖² such that ‖x‖_0 ≤ L for a fixed L ∈ N
is small.
→ Minimization problem:
min_{x_1,...,x_M ∈ R^K, D ∈ R^{N×K}} ∑_{l=1}^{M} ‖y_l − D x_l‖² such that ‖x_l‖_0 ≤ L for 1 ≤ l ≤ M.
Difficult, nonconvex problem → sophisticated algorithms needed
Learning reconstructive dictionaries
The K-SVD algorithm
Strategy: [Aharon, Elad, Bruckstein TSP '06]
Split the minimization into two alternating stages
Sparse coding stage
For 1 ≤ l ≤ M solve
min_{x_l ∈ R^K} ‖y_l − D x_l‖² such that ‖x_l‖_0 ≤ L
Codebook update stage
Update D and x_1, ..., x_M with fixed sparsity pattern using
min_{x_1,...,x_M ∈ R^K, D ∈ R^{N×K}} ∑_{l=1}^{M} ‖y_l − D x_l‖²
Sparse coding is a well-known problem
→ Use any pursuit algorithm to handle it, e.g. OMP (orthogonal matching pursuit)
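The sparse coding stage is NP-hard in general; OMP approximates it greedily by adding one atom at a time and re-fitting the active coefficients by least squares. A minimal numpy sketch, assuming unit-norm atoms (the function name is my own):

```python
import numpy as np

def omp(D, y, L):
    """Orthogonal matching pursuit for
    min_x ||y - D x||^2  s.t.  ||x||_0 <= L  (greedy approximation).

    Assumes the columns of D have unit norm."""
    K = D.shape[1]
    support, residual = [], y.astype(float).copy()
    coeffs = np.zeros(0)
    for _ in range(L):
        # select the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom would improve the fit
            break
        support.append(j)
        # re-fit all active coefficients by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(K)
    x[support] = coeffs
    return x
```

The least-squares re-fit after each atom selection is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to all selected atoms.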
Learning reconstructive dictionaries
The K-SVD algorithm
Codebook update stage
For 1 ≤ j ≤ K:
determine the patches using d_j, i.e. ω_j = {l : (x_l)_j ≠ 0}
determine the restricted error matrix E_j ∈ R^{N×|ω_j|} whose column corresponding to m ∈ ω_j is y_m − D x_m + (x_m)_j d_j
Note: ∑_{l ∈ ω_j} ‖y_l − D x_l‖² = ‖E_j − d_j · ((x_m)_j)_{m ∈ ω_j}‖²_F, a rank-1 fitting problem in the atom d_j and its row of coefficients
calculate the SVD of E_j, i.e. E_j = U∆V^T
d_j ← first column of U
((x_m)_j)_{m ∈ ω_j} ← ∆_{11} · first column of V
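The per-atom step above can be sketched in numpy (a sketch under the stated conventions; names are my own). Since the rank-1 SVD fit is optimal, the step cannot increase the reconstruction error on the patches in ω_j, and patches outside ω_j are untouched because their j-th coefficient is zero:

```python
import numpy as np

def ksvd_atom_update(D, X, Y, j):
    """One K-SVD codebook-update step for atom j.

    D: N x K dictionary, X: K x M sparse coefficients,
    Y: N x M patches (one per column). Updates D[:, j] and the
    nonzero entries of row j of X in place."""
    omega = np.flatnonzero(X[j, :])  # omega_j: patches using atom j
    if omega.size == 0:
        return
    # restricted error matrix: error without atom j's contribution
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]              # best rank-1 left factor (unit norm)
    X[j, omega] = s[0] * Vt[0, :]  # matching coefficient row
```

A full K-SVD sweep applies this for j = 1, ..., K after each sparse coding stage.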
Learning reconstructive dictionaries
Reconstructive dictionary application example
Denoising
Dictionary-based denoising of a very noisy input image
(input image courtesy of Douglas Blom)
Learning reconstructive dictionaries
Reconstructive dictionary application example
Denoising
Gaussian blur shown next to the dictionary-based denoising
(input image courtesy of Douglas Blom)
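The denoising idea behind these examples: sparse-code each noisy patch over the learned dictionary and keep only the sparse reconstruction; the noise, which is not sparsely representable over D, is discarded. A minimal 1-sparse sketch (real pipelines use a larger L, e.g. via OMP, and average overlapping patches; the function name is my own):

```python
import numpy as np

def denoise_patches_1sparse(Y, D):
    """Replace every patch (column of Y) by its best 1-sparse
    approximation over the unit-norm dictionary D.

    For L = 1 the sparse approximation problem has a closed form:
    pick the atom maximizing |<d_j, y>| and use that inner product
    as the coefficient."""
    C = D.T @ Y                          # K x M atom/patch correlations
    best = np.argmax(np.abs(C), axis=0)  # best atom index per patch
    coeff = C[best, np.arange(Y.shape[1])]
    return D[:, best] * coeff            # N x M reconstructions
```

A patch that is exactly a multiple of one atom is reproduced perfectly; anything orthogonal to the chosen atom is removed.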
Learning discriminative dictionaries
From reconstructive to discriminative dictionaries
Notation
approximation residual: R(y, D, x) := ‖y − Dx‖²
coefficients of the sparse best approximation: x*(y, D) := argmin_{x ∈ R^K, ‖x‖_0 ≤ L} R(y, D, x)
best approximation error: R(y, D) := R(y, D, x*(y, D))
Recall: Reconstructive dictionary learning problem
min_{x_1,...,x_M ∈ R^K, D ∈ R^{N×K}} ∑_{l=1}^{M} ‖y_l − D x_l‖² such that ‖x_l‖_0 ≤ L for 1 ≤ l ≤ M
Reformulation of the reconstructive dictionary learning problem
min_{D ∈ R^{N×K}} ∑_{l=1}^{M} R(y_l, D)
Learning discriminative dictionaries
A variational approach to discriminative dictionaries
[Figure: influence of λ on C_λ — graphs of C_5 (red), C_10 (green) and C_20 (blue) over s ∈ [−1, 1]]
Learning discriminative dictionaries
A variational approach to discriminative dictionaries
Given: Patches y_1, ..., y_{M_1+M_2} of two different classes P_1 and P_2
(P_i := {y_l : l ∈ S_i}, S_1 = {1, ..., M_1}, S_2 = M_1 + {1, ..., M_2})
Task: Find a dictionary pair to distinguish between P_1 and P_2
→ Minimization problem: [Mairal et al. CVPR '08]
min_{D_1, D_2} ∑_{i=1}^{2} (1/M_i) ∑_{l ∈ S_i} [ C_λ((−1)^i (R(y_l, D_1) − R(y_l, D_2))) + λγ R(y_l, D_i) ]
where C_λ(s) = ln(1 + exp(−λs)) (logistic loss function)
Note:
C_λ((−1)^1 (R(y_l, D_1) − R(y_l, D_2))) ≈ 0 if R(y_l, D_1) ≪ R(y_l, D_2), and ≫ 0 if R(y_l, D_1) ≫ R(y_l, D_2)
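The behavior of the logistic loss noted above can be checked directly: C_λ(s) vanishes for large positive s, grows roughly like λ|s| for negative s, and a larger λ sharpens the transition. A small, numerically stable sketch (evaluating via logaddexp avoids overflow for s ≪ 0):

```python
import numpy as np

def C(s, lam):
    """Logistic loss C_lambda(s) = ln(1 + exp(-lambda * s)),
    evaluated stably via logaddexp."""
    return np.logaddexp(0.0, -lam * np.asarray(s, dtype=float))
```

For a class-1 patch the loss argument is R(y_l, D_2) − R(y_l, D_1), so the penalty is near zero exactly when D_1 reconstructs the patch much better than D_2, which is the intended discriminative behavior.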
Learning discriminative dictionaries
Discriminative dictionary learning algorithm by Mairal et al.
Like K-SVD, split the minimization into two alternating stages
Sparse coding stage
For 1 ≤ i ≤ 2 and 1 ≤ l ≤ M := M_1 + M_2 use OMP to get
x_l^i ≈ x*(y_l, D_i) = argmin_{x ∈ R^K, ‖x‖_0 ≤ L} R(y_l, D_i, x)
Codebook update stage
For 1 ≤ i ≤ 2, update D_i and x_1^i, ..., x_M^i with fixed sparsity pattern using a truncated Newton iteration
("truncated" → second derivatives of C_λ are neglected)