arXiv:2108.05551v1 [math-ph] 12 Aug 2021

Stochastic processes in classical and quantum physics and engineering

Harish Parthasarathy
ECE division, NSUT

August 13, 2021

Preface

This book covers a wide range of problems involving the applications of stochastic processes, stochastic calculus, large deviation theory, group representation theory and quantum statistics to diverse fields in dynamical systems, electromagnetics, statistical signal processing, quantum information theory, quantum neural network theory, quantum filtering theory, quantum electrodynamics, quantum general relativity, string theory, problems in biology, and classical and quantum fluid dynamics. The selection of the problems has been based on courses taught by the author to undergraduates and postgraduates in electronics and communication engineering. The theory of stochastic processes has been applied to problems of system identification based on time series models of the process concerned, to derive time and order recursive parameter estimation followed by statistical performance analysis. The theory of matrices has been applied to problems such as quantum hypothesis testing between two states, including a proof of the quantum Stein lemma involving asymptotic formulas for the error probabilities involved in discriminating between a sequence of ordered pairs of tensor product states. This problem is the quantum generalization of the large deviation result regarding discriminating between two probability distributions from iid measurements. The error rates are obtained in terms of the quantum relative entropy between the two states. Classical neural networks are used to match an input-output signal pair. This means that the weights of the network are trained so that it takes as input the true system input and reproduces a good approximation of the system output. But if the input and output signals are stochastic processes characterized by probability densities and not by signal values, then the network must take as input the input probability distribution and reproduce a good approximation of the output probability distribution.

This is a very hard task, since a probability distribution is specified by an infinite number of values. A quantum neural network solves this problem easily, since Schrodinger's equation evolves a wave function whose magnitude square is always a probability density; hence, by controlling the potential in this equation using the input pdf (probability distribution function) and the error between the desired output pdf and that generated by Schrodinger's equation, we are able to track a time varying output pdf successfully during the training process. This has applications to the problem of transforming speech data to EEG data. Speech is characterized by a pdf and so is EEG. Further, the speech pdf is correlated to the EEG pdf, because if the brain acquires a disease, the EEG signal will get distorted and correspondingly the speech signal will also get slurred. Thus, the quantum neural network based on Schrodinger's equation can exploit this correlation to get a fairly good approximation of the EEG pdf given the speech pdf. These issues have been discussed in this book. Other problems include a proof of the celebrated Lieb inequality, which is used in proving convexity and concavity properties of various kinds of quantum entropies and quantum relative entropies that are used in establishing coding theorems in quantum information theory, such as the maximum rate at which classical information can be transmitted over a Cq channel so that a decoder with error probability converging to zero can be constructed. Some of the inequalities discussed in this book find applications to more advanced problems in quantum information theory that have not been discussed here, such as how to generate a sequence of maximally entangled states having maximal dimension asymptotically from a sequence of tensor product states using local quantum operations with one way or two way classical communication using a sequence of TPCP maps; or conversely, how to generate a sequence of tensor product states optimally from a sequence of maximally entangled states by a sequence of TPCP maps, with the maximally entangled states having minimal dimension; or how to generate maximal shared randomness between two users by a sequence of quantum operations on a sequence of maximally entangled states. The performance of such algorithms is characterized by the limit of the logarithm of the length divided by the number of times that the tensor product is taken. These quantities are related to the notion of information rate. Another such problem is the maximal rate at which information can be transmitted from one user to another when they share a partially entangled state. This book also covers problems involving quantization of the action functionals in general relativity and expressing the resulting approximate Hamiltonian as perturbations to harmonic oscillator Hamiltonians. It also covers topics such as writing down the Schrodinger equation for a mixed state of a particle moving in a potential coupled to a noisy bath in terms of the Wigner distribution function for position and momentum.
The utility of this lies in the fact that the position and momentum observables in quantum mechanics do not commute, and hence, by the Heisenberg uncertainty principle, they do not have a joint probability density, unlike the classical case of a particle moving in a potential with fluctuation and dissipation forces present, where we have a Fokker-Planck equation for the joint pdf of the position and momentum. The Wigner distribution, being complex, is not a joint probability distribution function, but its marginals are, and it is the closest that one can come in quantum mechanics to making an analogy with classical joint probabilities. We show in this book that its dynamics is the same as that of a classical Fokker-Planck equation, but with small quantum correction terms that are represented by a power series in Planck's constant. We then also apply this formalism to Belavkin's quantum filtering theory based on the Hudson-Parthasarathy quantum stochastic calculus, by showing that when this equation for a particle moving in a potential with damping and noise is described in terms of the Wigner distribution function, it is just the same as the Kushner-Kallianpur stochastic filter but with quantum correction terms expressed as a power series in Planck's constant. This book also covers some aspects of classical large deviation theory, especially the derivation of the rate function of the empirical distribution of a Markov process, applied to the Brownian motion and the Poisson process. This problem has applications to stochastic control, where we desire to control the probability of deviation of the empirical distribution of the output process of a stochastic system from a desired probability distribution by incorporating control terms in the stochastic differential equation. This book also discusses some aspects of string theory, especially the problem of how the field equations of physics get modified when string theory is taken into account and conformal invariance is imposed.
It also discusses how supersymmetry breaking stochastic terms in the Lagrangian can be reduced by feedback stochastic control. Some problems deal with modeling blood flow, breathing mechanisms, etc., using stochastic ordinary and partial differential equations, and how to control them by applying large deviation theory. One of the problems deals with the analysis of the motion of a magnet through a tube with a coil wound around it in a gravitational field, taking into account electromagnetic induction and back reaction. This problem is the result of discussions with one of my colleagues. The book discusses quantization of this motion when the electron and electromagnetic fields are also quantized. In short, this book will be useful to research workers in applied mathematics, physics and quantum information theory who wish to explore the range of applications of classical and quantum stochastics to problems of physics and engineering.

Table of Contents

Chapter 1: Stability theory of nonlinear systems based on Taylor approximations of the nonlinearity
1.1 Stochastic perturbation of nonlinear dynamical systems: Linearized analysis
1.2 Large deviation analysis of the stability problem
1.3 LDP for Yang-Mills gauge fields and LDP for the quantized metric tensor of space-time
1.4 Strings in background fields

Chapter 2
2.1 String theory and large deviations
2.2 Evaluating quantum string amplitudes using functional integration
2.3 LDP for Poisson fields
2.4 LDP for strings perturbed by mixture of Gaussian and Poisson noise

Chapter 3: Ordinary, partial and stochastic differential equation models for biological systems, quantum gravity, Hawking radiation, quantum fluids, statistical signal processing, group representations, large deviations in statistical physics
3.1 A lecture on biological aspects of virus attacking the immune system
3.2 More on quantum gravity
3.3 Large deviation properties of virus combat with antibodies
3.4 Large deviations problems in quantum fluid dynamics
3.5 Lecture schedule for statistical signal processing
3.6 Basic concepts in group theory and group representation theory
3.7 Some remarks on quantum gravity using the ADM action
3.8 Diffusion exit problem
3.9 Large deviations for the Boltzmann kinetic transport equation
3.10 Orbital integrals for SL(2,R) with applications to statistical image processing
3.11 Loop Quantum Gravity (LQG)
3.12 Group representations in relativistic image processing
3.13 Approximate solution to the wave equation in the vicinity of the event horizon of a Schwarzschild black-hole
3.14 More on group theoretic image processing
3.15 A problem in quantum information theory
3.16 Hawking radiation

Chapter 4: Research topics on representations with image processing applications, breathing dynamics
4.1 Remarks on induced representations and Haar integrals
4.2 On the dynamics of breathing in the presence of a mask
4.3 Plancherel formula for SL(2,R)
4.4 Some applications of the Plancherel formula for a Lie group to image processing

Chapter 5: Some problems in quantum information theory
5.1 Quantum information theory and quantum black hole physics
5.2 Harmonic analysis on the Schwartz space of G = SL(2,R)
5.3 Monotonicity of quantum entropy under pinching

Chapter 6
6.1 Lectures on statistical signal processing
6.2 Large deviation analysis of the RLS algorithm
6.3 Problem suggested by my colleague Prof. Dhananjay
6.4 Fidelity between two quantum states
6.5 Stinespring's representation of a TPCP map, i.e., of a quantum operation
6.6 Approximate computation of the matrix elements of the principal and discrete series for G = SL(2,R)
6.7 Proof of the converse part of the Cq coding theorem
6.8 Proof of the direct part of the Cq coding theorem
6.9 Induced representations
6.10 Estimating non-uniform transmission line parameters and line voltage and current using the extended Kalman filter
6.11 Multipole expansion of the magnetic field produced by a time varying current distribution

Chapter 7: Some aspects of stochastic processes
7.1 Problem suggested by Prof. Dhananjay
7.2 Converse of the Shannon coding theorem

Chapter 8: Problems in electromagnetics and gravitation

Chapter 9: Some more aspects of quantum information theory
9.1 An expression for state detection error
9.2 Operator convex functions and operator monotone functions
9.3 A selection of matrix inequalities
9.4 Matrix Holder inequality

Chapter 10
10.1 Lecture plan for statistical signal processing
10.2 End-semester exam, duration: four hours, pattern recognition
10.3 Quantum Stein's theorem
10.4 Question paper, short test, statistical signal processing, M.Tech
10.5 Question paper on statistical signal processing, long test
10.5.1 Atom excited by a quantum field
10.5.2 Estimating the power spectral density of a stationary Gaussian process using an AR model: Performance analysis of the algorithm
10.5.3 Statistical performance analysis of the MUSIC and ESPRIT algorithms
10.6 Quantum neural networks for estimating EEG pdf from speech pdf based on Schrodinger's equation (Reference: paper by Vijay Upreti, Harish Parthasarathy, Vijayant Agarwal and Guguloth Sagar)
Statistical signal processing, M.Tech, long test, max marks: 50
10.7 Monotonicity of quantum relative Renyi entropy

Chapter 11: My commentary on the Ashtavakra Gita

Chapter 12: LDP-UKF based controller

Chapter 13: Abstracts for a book on quantum signal processing using quantum field theory

Chapter 14: Quantum neural networks and learning using mixed state dynamics for open quantum systems

Chapter 15: Quantum gate design, magnet motion, quantum filtering, Wigner distributions
15.1 Designing a quantum gate using QED with a control c-number four potential
15.2 On the motion of a magnet through a pipe with a coil under gravity and the reaction force exerted by the induced current through the coil on the magnet
15.3 Quantum filtering theory applied to quantum field theory for tracking a given probability density function
15.4 Lecture on the role of statistical signal processing in general relativity and quantum mechanics
15.5 Problems in classical and quantum stochastic calculus
15.6 Some remarks on the equilibrium Fokker-Planck equation for the Gibbs distribution
15.7 Wigner distributions in quantum mechanics: construction of the quantum Fokker-Planck equation for a specific choice of the Lindblad operator; problems in quantum corrections to classical theories in probability theory and in mechanics
15.8 Motion of an element having both an electric and a magnetic dipole moment in an external electromagnetic field plus a coil

Chapter 16: Large deviations for Markov chains, quantum master equation, Wigner distribution for the Belavkin filter
16.1 Large deviation rate function for the empirical distribution of a Markov chain
16.2 Quantum master equation in general relativity
16.3 Belavkin filter for the Wigner distribution function

Chapter 17: String and field theory with large deviations, stochastic algorithm convergence
17.1 String theoretic corrections to field theories
17.2 Large deviations problems in string theory
17.3 Convergence analysis of the LMS algorithm
17.4 String interacting with scalar field, gauge field and a gravitational field

Chapter 1

Stability theory of nonlinear systems based on Taylor approximations of the nonlinearity

1.1 Stochastic perturbation of nonlinear dynamical systems: Linearized analysis

Consider the nonlinear system defined by the differential equation

dx(t)/dt = f(x(t)), t ≥ 0

where f : R^n → R^n is a continuously differentiable mapping. Let x0 be a fixed point of this system, i.e.,

f(x0) = 0

Denoting by δx(t) a small fluctuation of the state x(t) around the equilibrium point x0, we get, on first order Taylor expansion, i.e., linearization,

dδx(t)/dt = f ′(x0)δx(t) +O(|δx(t)|2)

If we neglect the O(|δx|^2) terms, then we obtain the approximate solution to this differential equation as

δx(t) = exp(tA)δx(0), t ≥ 0

where
A = f′(x0)

is the Jacobian matrix of f at the equilibrium point x0, and hence it follows that this linearized system is asymptotically stable under small perturbations in all directions iff all the eigenvalues of A have negative real parts. By asymptotically stable, we mean that lim_{t→∞} δx(t) = 0. The linearized system is critically stable at x0 if all the eigenvalues of A have non-positive real part, i.e., an eigenvalue being purely imaginary is also allowed. By critically stable, we mean that δx(t), t ≥ 0 remains bounded in time.
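This eigenvalue criterion is easy to check numerically. As a minimal sketch (the damped-pendulum vector field and its parameter values below are illustrative choices, not taken from the text), we evaluate the Jacobian A = f′(x0) at the equilibrium and test the signs of the real parts of its eigenvalues:

```python
import numpy as np

# Illustrative nonlinear system: damped pendulum, x = (angle, angular rate),
# f(x) = (x[1], -(g/l)*sin(x[0]) - k*x[1]); equilibrium x0 = (0, 0).
k, g_over_l = 0.5, 9.8

def jacobian_f(x):
    # Analytic Jacobian f'(x) of the pendulum vector field
    return np.array([[0.0, 1.0],
                     [-g_over_l * np.cos(x[0]), -k]])

A = jacobian_f(np.zeros(2))               # A = f'(x0)
eigs = np.linalg.eigvals(A)
asymptotically_stable = bool(np.all(eigs.real < 0))
print(eigs, asymptotically_stable)
```

For this choice of A the trace is negative and the determinant positive, so both eigenvalues have negative real part and the linearization is asymptotically stable.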

Stochastic perturbations: Consider now the stochastic dynamical system obtained by adding a source term on the rhs of the above dynamical system

dx(t)/dt = f(x(t)) + w(t)

where w(t) is stationary Gaussian noise with mean zero and autocorrelation

Rw(τ) = E(w(t + τ)w(t))

Now it is meaningless to speak of a fixed point, so we choose a trajectory x0(t) that solves the noiseless problem

dx0(t)/dt = f(x0(t))

Let δx(t) be a small perturbation around x0(t) so that x(t) = x0(t) + δx(t) solves the noisy problem. Then, assuming w(t) to be small and δx(t) to be of the same order of magnitude as w(t), we get approximately, up to linear order in w(t), the linearized equation

dδx(t)/dt = Aδx(t) + w(t), A = f ′(x0)

which has solution

δx(t) = Φ(t)δx(0) + ∫_0^t Φ(t − τ)w(τ)dτ, t ≥ 0,

where
Φ(t) = exp(tA)

Assuming δx(0) to be a nonrandom initial perturbation, we get

E(δx(t)) = Φ(t)δx(0)

which converges to zero as t → ∞ provided that all the eigenvalues of A have negative real part. Moreover, in this case,

E(|δx(t)|^2) = Tr ∫_0^t ∫_0^t Φ(t − t1)Rw(t1 − t2)Φ(t − t2)^T dt1 dt2

= Tr ∫_0^t ∫_0^t Φ(t1)Rw(t1 − t2)Φ(t2)^T dt1 dt2

which converges as t → ∞ to

Px = Tr ∫_0^∞ ∫_0^∞ Φ(t1)Rw(t1 − t2)Φ(t2)^T dt1 dt2


Suppose now that A is diagonalizable with spectral representation

A = −∑_{k=1}^r λ(k)E_k

Then

Φ(t) = ∑_{k=1}^r exp(−tλ(k))E_k
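The spectral form of Φ(t) can be verified numerically: build the projectors E_k from an eigendecomposition of a diagonalizable A and compare ∑_k exp(−tλ(k))E_k with the matrix exponential. The 2×2 stable matrix below is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])            # arbitrary stable, diagonalizable matrix
w, V = np.linalg.eig(A)                # columns of V are eigenvectors of A
Vinv = np.linalg.inv(V)
E = [np.outer(V[:, k], Vinv[k, :]) for k in range(2)]  # spectral projectors E_k
lam = -w                               # text convention: A = -sum_k lambda(k) E_k

t = 0.7
Phi_spectral = sum(np.exp(-lam[k] * t) * E[k] for k in range(2))
Phi = expm(A * t)
match = bool(np.allclose(Phi_spectral, Phi))
print(match)
```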

and we get

Px = ∑_{k,m=1}^r ∫_0^∞ ∫_0^∞ Tr(E_k · Rw(t1 − t2) · E_m) exp(−λ(k)t1 − λ(m)t2) dt1 dt2

A sufficient condition for this to be finite is that |Rw(t)| remain bounded, in which case, when

sup_t |Rw(t)| ≤ B < ∞,

we get

Px ≤ B ∑_{k,m=1}^r (Re(λ(k) + λ(m)))^{−1} ≤ 2Br^2/λ_R(min)

where
λ_R(min) = min_k Re(λ(k))
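For the special case of white noise, Rw(τ) = σ²δ(τ)I, the limiting quantity Px above equals Tr(P), where P solves the Lyapunov equation AP + PA^T + σ²I = 0. A short numerical cross-check of this (the stable matrix A and intensity σ² are arbitrary illustrative choices, not from the text):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])      # stable: eigenvalues -1 and -3
sigma2 = 0.2                     # Rw(tau) = sigma2 * delta(tau) * I

# Steady-state covariance: solves A P + P A^T + sigma2*I = 0
P = solve_continuous_lyapunov(A, -sigma2 * np.eye(2))

# Riemann-sum check of P = sigma2 * int_0^inf Phi(t) Phi(t)^T dt, Phi(t) = exp(tA)
dt, T = 0.01, 20.0
P_int = sigma2 * sum(expm(A * t) @ expm(A * t).T
                     for t in np.arange(0.0, T, dt)) * dt
Px = np.trace(P)                 # the limit of E|dx(t)|^2 from the text
print(Px)
```

The direct integral and the Lyapunov solution agree to the accuracy of the Riemann sum.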

1.2 Large deviation analysis of the stability problem

Let x(t) be a diffusion process starting at some point x0 within an open set V with boundary S = ∂V. If the noise level is low, say parametrized by the parameter ε, then the diffusion equation can be expressed as

dx(t) = f(x(t))dt + √ε g(x(t))dB(t)

Let Pε(x) denote the probability of a sample path of the process that starts at x0 and ends at the stopping time τε = t at which the process first hits the boundary S. Then, from basic LDP, we know that the approximate probability of such a path is given by

exp(−ε^{−1} inf_{y∈S} V(x0, y, t))

where

V(x0, x, t) = inf_{u: dx(s)/ds = f(x(s)) + g(x(s))u(s), 0 ≤ s ≤ t, x(0) = x0} (1/2)∫_0^t u(s)^2 ds


andV (x0, y, t) = infx:x(0)=x0,x(t)=yV (x0, x, t)

The greater the probability of the process hitting S, the lesser the mean valueof the hitting time. This is valid with an inverse proportion. Thus, we can saythat

τǫ ≈ exp(V/ǫ), ǫ→ 0

whereV = infy∈S,t≥0V (x0, y, t)

More precisely, we have the LDP

lim_{ε→0} P(exp((V − δ)/ε) < τε < exp((V + δ)/ε)) = 1

Note that τε = exp(V/ε) for a given path x implies that the probability of x is Pε(x) = exp(−V/ε). This is because of the relative frequency interpretation of probability given by von Mises, or equivalently because of ergodic theory: if the path hits the boundary N(ε) times in a total of N trials, with the time taken to hit being τε for each path, then the probability of hitting the boundary is N(ε)/N. However, since it takes a time τε for each trial hit, it follows that the number of hits in time T is T/τε. Thus, if each trial run in the set of N trials is of duration T, then the probability of a hit is

Pε = N(ε)/N = T/(N·τε)

and if we assume that T, N are fixed, or equivalently that T, N → ∞ with N/T converging to a finite nonzero constant, then we deduce from the above

τε ≈ exp(V/ε)

in the sense that ε·log(τε) converges to V as ε → 0.
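The scaling ε·log(τε) → V can be observed in a toy Monte Carlo experiment. Everything below is an illustrative choice, not from the text: a 1-D Ornstein-Uhlenbeck drift f(x) = −x with g = 1, exit from the interval |x| < 1, simulated with the Euler-Maruyama scheme. The mean exit time should grow steeply as ε decreases:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_exit_time(eps, n_paths=400, dt=0.01, t_max=500.0):
    """Mean Euler-Maruyama exit time of dx = -x dt + sqrt(eps) dB from |x| < 1."""
    x = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)      # paths still inside the interval
    exit_time = np.full(n_paths, t_max)
    t = 0.0
    while alive.any() and t < t_max:
        n_alive = int(alive.sum())
        x[alive] += -x[alive] * dt + np.sqrt(eps * dt) * rng.standard_normal(n_alive)
        t += dt
        exited = alive & (np.abs(x) >= 1.0)
        exit_time[exited] = t
        alive &= ~exited
    return float(exit_time.mean())

tau_big_noise = mean_exit_time(0.5)     # larger eps: exits quickly
tau_small_noise = mean_exit_time(0.25)  # smaller eps: exponentially slower
print(tau_big_noise, tau_small_noise)
```

Plotting ε·log(τε) against several decreasing values of ε would exhibit the approach to the quasi-potential V.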

1.3 LDP for Yang-Mills gauge fields and LDP for the quantized metric tensor of space-time

Reference: Thomas Thiemann, "Modern Canonical Quantum General Relativity", Cambridge University Press.

Let L(q_ab, q′_ab, N, N^a) denote the ADM Lagrangian of space-time. Here, N^a defines the diffeomorphism constraint and N the Hamiltonian constraint. The canonical momenta are

P^ab = ∂L/∂q′_ab

The Hamiltonian is then
H = P^ab q′_ab − L


Note that the time derivatives N′, N^a′ of N and N^a do not appear in L. Quantum general relativity is therefore described by a constrained Hamiltonian, with the constraints being

P_N = ∂L/∂N′ = 0,

leading to the equation of motion

H = ∂L/∂N = 0

which is the Wheeler-DeWitt equation, the analog of Schrodinger's equation in non-relativistic particle mechanics. It should be interpreted as

H(N)ψ = 0 ∀N

where

H(N) = ∫ N·H d^3x

The large deviation problem: If there is a small random perturbation in the initial conditions of the Einstein field equations, then what will be the statistics of the perturbed spatial metric q_ab after time t? This is the classical LDP. The quantum LDP is: if there is a small random perturbation in the initial wave function of a system, and also small random Hamiltonian perturbations as well as small random time varying perturbations in the potentials appearing in the Hamiltonian, then what will be the statistics of quantum averaged observables?

The Yang-Mills Lagrangian in curved space-time: Let A_µ = A^a_µ τ_a denote the Lie algebra valued Yang-Mills connection potential field. Let

D_µ = ∂_µ + Γ_µ

denote the spinor gravitational covariant derivative. Then the Yang-Mills field tensor is given by

F_µν = D_µ A_ν − D_ν A_µ + [A_µ, A_ν]

The action functional of the Yang-Mills field in the background gravitational field is given by

S_YM(A) = ∫ g^{µα} g^{νβ} Tr(F_µν F_αβ) √−g d^4x

Problem: Derive the canonical momentum fields for this curved space-time Yang-Mills Lagrangian and hence obtain a formula for the canonical Hamiltonian density.


1.4 String in background fields

The Lagrangian is

L(X, X_{,α}, α = 1, 2) = (1/2)g_{µν}(X)h^{αβ}X^µ_{,α}X^ν_{,β} + Φ(X) + H_{µν}(X)ε^{αβ}X^µ_{,α}X^ν_{,β}

where a conformal transformation has been applied to bring the string world-sheet metric h^{αβ} to the canonical form diag[1, −1]. Here, Φ(X) is a scalar field. The string action functional is

S[X] = ∫ L dτ dσ

The string field equations

∂_α(∂L/∂X^µ_{,α}) − ∂L/∂X^µ = 0

are

∂_α(g_{µν}(X)h^{αβ}X^ν_{,β}) + ε^{αβ}(H_{µν}(X)X^ν_{,β})_{,α} = (1/2)g_{ρσ,µ}(X)h^{ab}X^ρ_{,a}X^σ_{,b} + Φ_{,µ}(X) + H_{ρσ,µ}(X)ε^{ab}X^ρ_{,a}X^σ_{,b}

or equivalently,

g_{µν,ρ}(X)h^{ab}X^ρ_{,a}X^ν_{,b} + g_{µν}(X)h^{ab}X^ν_{,ab} + ε^{ab}H_{µν,ρ}(X)X^ρ_{,a}X^ν_{,b} + ε^{ab}H_{µν}(X)X^ν_{,ab}

= (1/2)g_{ρσ,µ}(X)h^{ab}X^ρ_{,a}X^σ_{,b} + Φ_{,µ}(X) + H_{ρσ,µ}(X)ε^{ab}X^ρ_{,a}X^σ_{,b}

These differential equations can be put in the form

η^{ab}X^µ_{,ab} = δ·F^µ(X^ρ, X^ρ_{,a}, X^ρ_{,ab})

where δ is a small perturbation parameter and ((η^{ab})) = diag[1, −1]. The problem is to derive the correction to the string propagator caused by the nonlinear perturbation term. The propagator is

D^{µν}(τ, σ|τ′, σ′) = <0|T(X^µ(τ, σ)·X^ν(τ′, σ′))|0>

and we see that it satisfies the following fundamental propagator equations. Let

x = (τ, σ), x′ = (τ′, σ′), ∂_0 = ∂_τ, ∂_1 = ∂_σ

Then,

∂_0 D^{µν}(x|x′) = δ(x^0 − x′^0)<[X^µ(x), X^ν(x′)]> + <T(X^µ_{,0}(x)·X^ν(x′))>

= <T(X^µ_{,0}(x)·X^ν(x′))>

and likewise

∂_0^2 D^{µν}(x|x′) = δ(x^0 − x′^0)<[X^µ_{,0}(x), X^ν(x′)]> + <T(X^µ_{,00}(x)·X^ν(x′))>

= δ(x^0 − x′^0)<[X^µ_{,0}(x), X^ν(x′)]> + ∂_1^2 D^{µν}(x|x′) + δ·<T(F^µ(X, X_{,a}, X_{,ab})(x)·X^ν(x′))>

or equivalently

η^{ab}∂_a∂_b D^{µν}(x|x′) = δ·<T(F^µ(X, X_{,a}, X_{,ab})(x)·X^ν(x′))> + δ(x^0 − x′^0)<[X^µ_{,0}(x), X^ν(x′)]> ——— (1)

Now consider the canonical momenta

P_µ(x) = ∂L/∂X^µ_{,0} = g_{µν}(X(x))X^ν_{,0}(x) − 2H_{µν}(X(x))X^ν_{,1}(x)

Using the canonical equal time commutation relations

[X^µ(τ, σ), X^ν(τ, σ′)] = 0

so that

[X^µ(τ, σ), X^ν_{,1}(τ, σ′)] = 0

and

[X^µ(τ, σ), P_ν(τ, σ′)] = iδ^µ_ν δ(σ − σ′)

we therefore deduce that

[X^µ(τ, σ), g_{νρ}(X(τ, σ′))X^ρ_{,0}(τ, σ′)] = iδ^µ_ν δ(σ − σ′)

or equivalently,

[X^µ(τ, σ), X^ν_{,0}(τ, σ′)] = i g^{µν}(X(τ, σ))δ(σ − σ′)

and therefore (1) can be expressed as

η^{ab}∂_a∂_b D^{µν}(x|x′) = δ·<T(F^µ(X, X_{,a}, X_{,ab})(x)·X^ν(x′))> − i<g^{µν}(X(x))>δ^2(x − x′) ——— (2)

Now suppose we interpret the quantum string field X^µ as the sum of a large classical component X^µ_0(τ, σ) and a small quantum fluctuating component δX^µ(τ, σ). Then we can express the above exact relation as

D^{µν}(x|x′) = X^µ_0(x)X^ν_0(x′) + <T(δX^µ(x)·δX^ν(x′))>

and further we have, approximately up to first order of smallness,

F^µ(X, X′, X″) = F^µ(X_0, X′_0, X″_0) + F^µ_{,1ρ}(X_0, X′_0, X″_0)δX^ρ + F^µ_{,2ρa}(X_0, X′_0, X″_0)δX^ρ_{,a} + F^µ_{,3ρab}(X_0, X′_0, X″_0)δX^ρ_{,ab}


and hence, up to second orders of smallness, (2) becomes

η^{ab}∂_a∂_b[X^µ_0(x)X^ν_0(x′) + <T(δX^µ(x)·δX^ν(x′))>]

= δ·F^µ(X_0, X′_0, X″_0)(x)X^ν_0(x′)

+ δ·F^µ_{,1ρ}(X_0, X′_0, X″_0)(x)<T(δX^ρ(x)δX^ν(x′))>

+ δ·F^µ_{,2ρa}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,a}(x)·δX^ν(x′))>

+ δ·F^µ_{,3ρab}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,ab}(x)·δX^ν(x′))>

− i<g^{µν}(X(x))>δ^2(x − x′)

To proceed further, we require the dynamics of the quantum fluctuation part δX^µ. To get this, we apply first order perturbation theory to the system

η^{ab}X^µ_{,ab} = δ·F^µ(X^ρ, X^ρ_{,a}, X^ρ_{,ab})

to get

η^{ab}X^µ_{0,ab} = 0,

η^{ab}δX^µ_{,ab} = δ·F^µ(X_0, X′_0, X″_0)

and solve this to express δX^µ as the sum of a classical part (i.e., a particular solution to the inhomogeneous equation) and a quantum part (i.e., the general solution to the homogeneous equation). The quantum component is

δX^µ_q(τ, σ) = −i ∑_{n≠0} (a^µ(n)/n)exp(in(τ − σ))

where
[a^µ(n), a^ν(m)] = n η^{µν} δ[n + m]

Note that in particular η^{ab}X^µ_{0,ab}(x) = 0, and hence the approximate corrected propagator equation becomes

η^{ab}∂_a∂_b<T(δX^µ(x)·δX^ν(x′))>

= δ·F^µ(X_0, X′_0, X″_0)(x)X^ν_0(x′)

+ δ·F^µ_{,1ρ}(X_0, X′_0, X″_0)(x)<T(δX^ρ(x)δX^ν(x′))>

+ δ·F^µ_{,2ρa}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,a}(x)·δX^ν(x′))>

+ δ·F^µ_{,3ρab}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,ab}(x)·δX^ν(x′))>

− i g^{µν}(X_0(x))δ^2(x − x′)

provided that we neglect the infinite term

g_{µν,αβ}(X_0)<δX^α(x)δX^β(x)>


If we do not attach the perturbation tag δ to the F-term, i.e., we do not regard F as small, then the unperturbed Bosonic string X_0 satisfies

η^{ab}X^µ_{0,ab} = F^µ(X_0, X′_0, X″_0)

and then the corrected propagator D^{µν} satisfies the approximate equation

η^{ab}∂_a∂_b D^{µν}(x|x′)

= δ·F^µ_{,1ρ}(X_0, X′_0, X″_0)(x)<T(δX^ρ(x)δX^ν(x′))>_0

+ δ·F^µ_{,2ρa}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,a}(x)·δX^ν(x′))>_0

+ δ·F^µ_{,3ρab}(X_0, X′_0, X″_0)(x)<T(δX^ρ_{,ab}(x)·δX^ν(x′))>_0

− i η^{µν}δ^2(x − x′)

provided that we assume that the metric is a very small perturbation of that of Minkowski flat space-time. Here, <·>_0 denotes the unperturbed quantum average, i.e., the quantum average in the absence of the external field H_{µν} and also in the absence of metric perturbations of flat space-time. We write

D^{µν}_0(x|x′) = <T(δX^µ(x)·δX^ν(x′))>_0

Now observe that in the absence of such perturbations,

δX^µ(x) = −i ∑_{n≠0} (a^µ(n)/n)exp(in(τ − σ))

so

<T(δX^µ(x)·δX^ν(x′))>_0 =

− ∑_{n>0} (θ(τ − τ′)/n)exp(−in(τ − τ′ − σ + σ′))

− ∑_{n>0} (θ(τ′ − τ)/n)exp(−in(τ′ − τ − σ′ + σ))

<T(δX^µ_{,0}(x)·δX^ν(x′))>_0 =

− ∑_{n>0} (θ(τ − τ′)/n)(−in)exp(−in(τ − τ′ − σ + σ′))

− ∑_{n>0} (θ(τ′ − τ)/n)(in)exp(−in(τ′ − τ − σ′ + σ))

= i ∑_{n>0} θ(τ − τ′)exp(−in(τ − τ′ − σ + σ′))

− i ∑_{n>0} θ(τ′ − τ)exp(−in(τ′ − τ − σ′ + σ))

<T(δX^µ_{,1}(x)·δX^ν(x′))>_0 =

− ∑_{n>0} (θ(τ − τ′)/n)(in)exp(−in(τ − τ′ − σ + σ′))

− ∑_{n>0} (θ(τ′ − τ)/n)(−in)exp(−in(τ′ − τ − σ′ + σ))

= −i ∑_{n>0} θ(τ − τ′)exp(−in(τ − τ′ − σ + σ′))

+ i ∑_{n>0} θ(τ′ − τ)exp(−in(τ′ − τ − σ′ + σ))

= −<T(δX^µ_{,0}(x)·δX^ν(x′))>_0

Note that we have the alternate expression

<T(δX^µ_{,1}(x)·δX^ν(x′))>_0 = ∂_1 D^{µν}_0(x|x′)

because differentiation w.r.t. the spatial variable σ commutes with the time ordering operation T. Further,

<T(δX^µ_{,00}(x)·δX^ν(x′))>_0 = <T(δX^µ_{,11}(x)·δX^ν(x′))>_0 = ∂_1^2 D^{µν}_0(x|x′)

<T(δX^µ_{,01}(x)·δX^ν(x′))>_0 = ∂_1<T(δX^µ_{,0}(x)·δX^ν(x′))>_0

= ∂_1[i ∑_{n>0} θ(τ − τ′)exp(−in(τ − τ′ − σ + σ′)) − i ∑_{n>0} θ(τ′ − τ)exp(−in(τ′ − τ − σ′ + σ))]

= i ∑_{n>0} θ(τ − τ′)(in)exp(−in(τ − τ′ − σ + σ′)) − i ∑_{n>0} θ(τ′ − τ)(−in)exp(−in(τ′ − τ − σ′ + σ))

= − ∑_{n>0} θ(τ − τ′)n·exp(−in(τ − τ′ − σ + σ′)) − ∑_{n>0} θ(τ′ − τ)n·exp(−in(τ′ − τ − σ′ + σ))

Chapter 2

String theory and large deviations

2.1 Evaluating quantum string amplitudes using functional integration

Assume that external lines are attached to a given string at space-time points (τ_r, σ_r), r = 1, 2, ..., M. Note that the spatial points σ_r are fixed while the interaction times τ_r vary. Let P_r(σ) denote the momentum of the r-th external line at σ. At the point σ = σ_r where it is attached, this momentum is specified as P_r(σ_r). Thus, noting that

X(τ, σ) = −i ∑_{n≠0} a(n)exp(in(τ − σ))/n

we have

P_r(σ_r) = X_{,τ}(σ_r) = ∑_{n≠0} a(n)exp(−inσ_r)

This equation constrains the values of a(n). We thus define the numbers P_{rn} by

P_r(σ_r) = ∑_n P_{rn} exp(−inσ_r) ——— (1)

and the numbers P_{rn} characterize the momentum of the external line attached at the point σ_r. Note that for reality of the momentum, we must have P_{rn} = P_{r,−n} if we assume that the P_{rn}'s are real, and therefore

P_r(σ_r) = 2 ∑_{n>0} P_{rn} cos(nσ_r)


Note that P_r(σ) is the momentum field corresponding to the r-th external line, and this is given to us. Thus, the P_{rn}'s are determined by the equation

P_r(σ) = 2 ∑_{n>0} P_{rn} cos(nσ)

Equivalently, we fix the numbers P_{rn}, r, n, corresponding to the momenta carried by the external lines, and then obtain the momentum P_r(σ_r) of the line attached at σ_r using (1).

The large deviation problem: Express the Hamiltonian density of the string having Lagrangian density (1/2)∂_α X^µ ∂^α X_µ in terms of X^µ, P_µ = ∂_0 X_µ. Show that it is given by

H_0 = (1/2)P^µ P_µ + (1/2)∂_1 X^µ ∂_1 X_µ

In the presence of a vector potential field A_µ(σ), the charged string field has the Hamiltonian density H with P_µ replaced by P_µ + eA_µ. Write down the string field equations for this Hamiltonian and, assuming that the vector potential is a weak Gaussian field, evaluate the rate function of the string field X^µ.
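The first part of this exercise can be checked symbolically. In the sketch below (conformal gauge, one space-time component treated as a scalar; the symbols Xdot = ∂_0 X and Xs = ∂_1 X are our own naming), the Legendre transform H = P·Ẋ − L of L = (1/2)(Ẋ² − (∂_1 X)²) reproduces H_0 = (1/2)P² + (1/2)(∂_1 X)² with P = Ẋ:

```python
import sympy as sp

Xdot, Xs = sp.symbols('Xdot Xs', real=True)  # Xdot = d0 X, Xs = d1 X

# Lagrangian density (1/2) d_alpha X d^alpha X with eta = diag[1, -1]
L = sp.Rational(1, 2) * (Xdot**2 - Xs**2)

P = sp.diff(L, Xdot)            # canonical momentum: P = Xdot
H = sp.expand(P * Xdot - L)     # Legendre transform

# Claimed H0, written with P = Xdot substituted in
H0 = sp.Rational(1, 2) * Xdot**2 + sp.Rational(1, 2) * Xs**2
print(H, sp.simplify(H - H0) == 0)
```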

LDP in general relativistic fluid dynamics.

2.2 LDP for Poisson fields

Small random perturbations in the metric field, having a statistical model of the sum of a Gaussian and a Poisson field, affect the fluid velocity field. Let X1(x), x ∈ R^n be a zero mean Gaussian random field with correlation R(x, y) = E(X1(x)X1(y)), and let N(x), x ∈ R^n be a Poisson random field with mean E(N(dx)) = λ(x)dx. Consider the family of random Poisson measures Nε(E) = ε^a N(ε^{−1}E) for any Borel subset E of R^n. We have

E[exp(∫_{R^n} f(x)N(dx))] = exp(∫_{R^n} λ(x)(exp(f(x)) − 1)dx)

Thus,

Mε(f) = E exp(∫ f(x)Nε(dx)) = E exp(∫ ε^a f(x)N(dx/ε)) = E exp(∫ ε^a f(εx)N(dx))

= exp(∫ λ(x)(exp(ε^a f(εx)) − 1)dx)

Thus,

log(Mε(ε^{−a}f)) = ∫ λ(x)(exp(f(εx)) − 1)dx


= ε^{−n} ∫ λ(ε^{−1}x)(exp(f(x)) − 1)dx

Then,

lim_{ε→0} ε^n·log(Mε(ε^{−a}f)) = λ(∞) ∫ (exp(f(x)) − 1)dx

assuming that
λ(∞) = lim_{ε→0} λ(x/ε)

exists. More generally, suppose we assume that this limit exists and depends only on the direction of the vector x ∈ R^n, so that we can write

λ(∞, x) = lim_{ε→0} λ(x/ε)

Then we get, for the Gartner-Ellis limiting logarithmic moment generating function,

lim_{ε→0} ε^n·log(Mε(ε^{−a}f)) = ∫ λ(∞, x)(exp(f(x)) − 1)dx = Λ(f)
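The Poisson moment generating functional used at the start of this section, E[exp(∫ f dN)] = exp(∫ λ(x)(e^{f(x)} − 1)dx) (Campbell's theorem), is easy to verify by Monte Carlo in one dimension. The homogeneous rate λ = 2 on [0, 1] and the test function f(x) = x/2 below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0                    # homogeneous intensity on [0, 1]
f = lambda x: 0.5 * x        # test function

# Exact value: int_0^1 lam*(e^{x/2} - 1) dx = lam*(2*e^{1/2} - 3)
exact = np.exp(lam * (2.0 * np.exp(0.5) - 3.0))

# Monte Carlo estimate of E[exp(sum_i f(x_i))] over Poisson point sets
n_samples = 20000
vals = np.empty(n_samples)
for i in range(n_samples):
    pts = rng.uniform(0.0, 1.0, rng.poisson(lam))  # one Poisson point set
    vals[i] = np.exp(f(pts).sum())
mc = vals.mean()
print(exact, mc)
```

With 20000 samples the estimate agrees with the closed form to within a few percent.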

2.3 LDP for strings perturbed by mixture of Gaussian and Poisson noise

Consider the string field equations

X^µ(σ) = f^µ_k(σ)√ε W^k(τ) + g^µ_k ε dN^k(τ)/dτ

where the W^k are independent white Gaussian noise processes and the N^k are independent Poisson processes. Assume that the rate of N^k is λ_k/ε. Calculate the LDP rate functional of the string process X^µ. Consider now a superstring with Lagrangian density

L = (1/2)∂_α X^µ ∂^α X_µ + ψρ^α ∂_α ψ

Add to this an interaction Lagrangian

ε^{αβ}B_{µν}(X)X^µ_{,α}X^ν_{,β}

and another interaction Lagrangian

J_α(X)ψρ^α ψ

Assume that B_{µν}(X) and J_α(X) are small Gaussian fields. Calculate the joint rate function of the Bosonic and Fermionic string fields X^µ, ψ.


Chapter 3

Ordinary, partial and stochastic differential equation models for biological systems, quantum gravity, Hawking radiation, quantum fluids, statistical signal processing, group representations, large deviations in statistical physics

3.1 A lecture on biological aspects of virus attacking the immune system

We discuss here, in terms of a mathematical model based on elementary fluid dynamics and quantum mechanics, how a virus attacks the immune system and how antibodies generated by the body's immune system and by vaccination can wage a counterattack against this virus, thereby saving the patient's life. According to recent



research, a virus consists of a spherical body covered with dots/rods/globs of protein. When the virus enters the body it is as such harmless, but it becomes dangerous when it lodges itself in the vicinity of a cell in the lung, where the protein blobs are released. The protein blobs then penetrate into the cell fluid, where the ribosome gives the command to replicate in the same way that normal cell replication occurs. Thus a single protein blob is replicated several times, and hence the entire cell is filled with protein and gets destroyed completely. When this happens to every cell in the lung, breathing is hampered and death results. However, if our immune system is strong enough, then the blood generates antibodies the moment it recognizes that a stranger in the form of protein blobs has entered the cell. These antibodies fight against the protein blobs; in fact, the antibodies lodge themselves on the boundary of the cell body at precisely those points where the protein from the virus has penetrated, thus preventing further protein blobs from entering the cell. While this fight between the antibodies and the virus goes on, the body develops fever, because an increase in the body temperature disables the virus and enables the antibodies to fight against the proteins generated by the virus. If, however, the immune system is not powerful enough, a sufficient concentration of antibodies is not generated and the viral protein dominates over the antibodies in their fight, finally resulting in the death of the body. The Covid vaccine is prepared by generating a sample virus in the laboratory within a fluid like that present within the body, so that antibodies are produced within the fluid by this artificial process, in the same way that the small-pox vaccine was invented by Edward Jenner, who injected the small-pox virus into a cow to generate the required antibodies. Now we discuss the mathematics of the virus, protein, cells and antibodies.

Assume that the virus particles have masses $m_1, ..., m_N$ and charges $e_1, ..., e_N$. The blood fluid has a velocity field $v(t, r)$ whose oxygenated component circulates to all the organs of the body and whose de-oxygenated component is directed toward the lungs. This circulation is caused by the pumping of the heart. Assume that the heart pumps blood with a generation rate $g(t, r)$ having units of density per unit time, or equivalently mass per unit volume per unit time. Assume also that the force of this pumping is represented by a force density field $f(t, r)$. There may also be present external electromagnetic fields $E(t, r), B(t, r)$ within the blood which interact with the current field within the blood caused by the non-zero conductivity $\sigma$ of the blood. This current density is, according to Ohm's law, given by

J(t, r) = σ(E(t, r) + v(t, r) ×B(t, r))

The relevant MHD equations and the continuity equation are given by

ρ(∂tv + (v,∇)v) = −∇p+ η∇2v + f + σ(E + v ×B)×B

∂tρ+ div(ρ.v) = g

Sometimes a part of the electromagnetic field within the blood is of internal origin, ie, generated by the motion of this conducting blood fluid. Thus, if we


write $E_{ext}, B_{ext}$ for the externally applied electromagnetic field and $E_{int}, B_{int}$ for the internally generated field, then we have

Eint(t, r) = −∇φint − ∂tAint, Bint = ∇×Aint

where

$$A_{int}(t, r) = (\mu/4\pi)\int J(t - |r-r'|/c,\, r')\,d^3r'/|r-r'|,$$

$$\Phi_{int}(t, r) = -c^2\int_0^t \mathrm{div}\,A_{int}\,dt$$

The total electromagnetic field within the blood fluid is the sum of these twocomponents:

E = Eext + Eint, B = Bext +Bint

We have control only over $E_{ext}, B_{ext}$ and can use these control fields to manipulate the blood fluid velocity appropriately so that the virus does not enter the lung region. Let $r_k(t)$ denote the position of the kth virus particle. It satisfies the obvious differential equation

drk(t)/dt = v(t, rk(t))

with initial conditions $r_k(0)$ depending on where on the surface of the body the virus entered. If the virus is modeled as a quantum particle with nucleus at the classical position $r_k(t)$ at time t, then the protein molecules that are stuck to the virus surface have relative positions $r_{kj}, j = 1, 2, ..., M$, and their joint quantum mechanical wave function $\psi_k(t, r_{kj}, j = 1, 2, ..., M)$ will satisfy the Schrodinger equation

$$ih\partial_t\psi_k(t, r_{kj}, j = 1,2,...,M) = \sum_{j=1}^M(-h^2/2m_{kj})\nabla^2_{r_{kj}}\psi_k(t, r_{kj}, j = 1,2,...,M)$$

$$+\sum_{j=1}^M e_k V_k(t, r_{kj} - r_k(t))\psi_k(t, r_{kj}, j = 1,2,...,M)$$

where $V_k(t, r)$ is the binding electrostatic potential energy of the viral nucleus. There may also be other vector potential and scalar potential terms in the above Schrodinger equation, these potentials coming from the prevalent electromagnetic field. Sometimes it is more accurate to model the entire swarm of viruses as a quantum field rather than as individual quantum particles, but the resulting dynamics becomes more complicated as it involves elaborate concepts from quantum field theory. Suppose we have solved for the joint wave functions $\psi_k$ of the protein blobs on the kth virus. Then the average position of the lth blob on the kth viral surface is given by

$$\langle r_{kl}\rangle(t) = \int r_{kl}\,|\psi_k(t, r_{kj}, j = 1,2,...,M)|^2\,\Pi_{j=1}^M d^3r_{kj},\quad 1\le k\le N$$

where the squared modulus of the wave function supplies the position probability density.


We consider the time at which the average position of the lth protein blob on the kth viral surface hits the boundary S = ∂V of a particular cell. This time is given by

$$\tau(k, l) = \inf\{t\ge 0 : \langle r_{kl}\rangle(t)\in S\}$$

At this time, replication of this protein blob due to cell ribosome action begins. Let λ(n)dt denote the probability that in the time interval [t, t+dt], n protein blobs will be generated by the ribosome action on a single blob. Let $N_{kl}(t)$ denote the number of blobs due to the (k, l)th one present at time t. Then for

Pkln(t) = Pr(Nkl(t) = n), n = 1, 2, ...

we have the Chapman-Kolmogorov Markov chain equations

$$dP_{kln}(t)/dt = \sum_{1\le r<n}P_{klr}(t)\lambda(kl, r, n) - P_{kln}(t)\sum_{r>n}\lambda(kl, n, r)$$

where λ(kl, r, n)dt is the probability that in time [t, t+dt], n blobs will be generated from r blobs. Clearly,

$$\lambda(kl, r, n) = r\lambda(n-r)$$

because each protein blob generates blobs independently of the others, and the probability of two blobs each generating more than one in [t, t+dt] is zero. In another, discrete time, version of blob replication we can make use of the theory of branching processes.
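The master equation above can be integrated numerically. As a check, taking λ(1) = 1 and λ(m) = 0 otherwise (an illustrative special case, not from the text) gives the Yule process, whose law $P_n(t) = e^{-t}(1-e^{-t})^{n-1}$ is known in closed form:

```python
import numpy as np

# Euler integration of dP_n/dt = sum_{r<n} P_r r λ(n-r) - P_n n Σ_m λ(m),
# truncated at NMAX states. With λ(1)=1 and λ(m)=0 otherwise (illustrative
# choice) this is the Yule process: P_n(t) = e^{-t}(1-e^{-t})^{n-1}.
NMAX = 120
lam = np.zeros(NMAX + 1); lam[1] = 1.0   # λ(m)
P = np.zeros(NMAX + 1); P[1] = 1.0       # start from a single blob
idx = np.arange(NMAX + 1)
dt, T = 1e-4, 1.0
for _ in range(int(round(T / dt))):
    w = P * idx                              # P_r · r
    gain = np.convolve(w, lam)[:NMAX + 1]    # sum_r P_r r λ(n-r)
    P = P + dt * (gain - w * lam.sum())      # gain minus exit rate n·Σλ
n = idx[1:]
exact = np.exp(-T) * (1 - np.exp(-T)) ** (n - 1)
err = np.abs(P[1:] - exact).max()
print(err)
```

The printed sup-norm error is small, confirming that the reconstructed gain and loss terms balance correctly for this special case.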

Let X(n) denote the number of blobs at time n. At time n + 1, each blob splits into several similar blobs, each splitting being independent of the others. Let ξ(n, r) denote the number of blobs into which the rth blob splits. Then, we have the obvious relation

$$X(n+1) = \sum_{r=1}^{X(n)}\xi(n, r)$$

Hence, if

$$F_n(z) = E(z^{X(n)})$$

is the moment generating function of X(n) and if

$$\phi(z) = E(z^{\xi(n,r)})$$

is the moment generating function of the number of blobs into which each blob splits, then we get the recursion

$$F_{n+1}(z) = E(\phi(z)^{X(n)}) = F_n(\phi(z))$$
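With X(0) = 1 and $F_0(z) = z$, this recursion means $F_n$ is the n-fold composition of φ with itself, which is easy to verify by simulation. The offspring law below (0, 1 or 2 children with probabilities 1/4, 1/2, 1/4) is an illustrative assumption:

```python
import random

# Galton-Watson check of F_{n+1}(z) = F_n(φ(z)): with X(0) = 1, F_n is the
# n-fold composition of the offspring pgf φ. The offspring law (0, 1 or 2
# children with probabilities 1/4, 1/2, 1/4) is an illustrative assumption.
p = [0.25, 0.5, 0.25]
phi = lambda z: sum(pk * z**k for k, pk in enumerate(p))

def F(n, z):                 # F_n(z) = φ∘φ∘...∘φ (n times) applied to z
    for _ in range(n):
        z = phi(z)
    return z

random.seed(1)
def sample_X(n):             # one realization of X(n)
    x = 1
    for _ in range(n):
        x = sum(random.choices(range(3), weights=p)[0] for _ in range(x))
    return x

z, n = 0.7, 4
mc = sum(z ** sample_X(n) for _ in range(100_000)) / 100_000
print(F(n, z), mc)
```

The Monte-Carlo estimate of $E(z^{X(n)})$ matches the composed generating function to sampling accuracy.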

In order to mimic this dynamics in the continuous time situation, we assume that there are X(t) blobs at time t. Each blob, independently of the others, waits for a time T that has a probability density f(t) and then replicates into N blobs. Usually, we assume f(t) to be the exponential density. The aim is to calculate the moment generating function $F(t, z) = E(z^{X(t)})$ of the number of blobs at time t. Assuming f to be exponential with parameter λ, we find that X(t + dt) will be the sum of X(t) and the number of blobs generated in [t, t + dt]; this computation is carried out in the discussion of continuous time branching processes below.

The effect of quantum gravity on the motion of the virus globules: Since the size of the virus is on the nano/quantum scale, the effects of gravity on its motion will correspondingly be quantum mechanical effects. Consider the ADM action for quantum gravity obtained as the integral

$$S[q_{ab}, N, N^a] = \int L(q_{ab}, q'_{ab}, N, N^a)\,d^4x$$

where $q_{ab}$ are the spatial components of the metric tensor of the embedded 3-dimensional space at each time t, and $N, N^a$ appear in the Lagrangian but not their time derivatives. The $q_{ab}$ are six canonical position fields and the corresponding canonical momentum fields are

P ab = ∂L/∂q′ab

The ADM Hamiltonian is

$$H = \int(P^{ab}q'_{ab} - L)\,d^3x$$

and these turn out to be fourth degree polynomials in the momenta but highly nonlinear in the position fields. Let $e^i_a$ denote a tetrad basis for the three dimensional embedded manifold. Thus,

qab = eiaeib

The metric relative to this tetrad basis is therefore the identity $\delta_{ij}$, and the group of transformations of this tetrad that preserve the metric is therefore SO(3), which can be regarded as SU(2) using the adjoint map. Note that SU(2) is in fact the covering group of SO(3). The spin connection involves representing the spatial connection in the SU(2)-spin Lie algebra representation. The form of this connection in terms of the spatial connection $\Gamma^a_{bc}$ can be derived from the fact that the corresponding covariant derivative $D_b$ should annihilate the tetrad $e^i_a$. Thus, if $\Gamma^{ij}_b$ denotes the spin connection, then

$$0 = D_b e^i_a = e^i_{a,b} - \Gamma^c_{ba}e^i_c + \Gamma^{ij}_b e^j_a$$

or equivalently,

$$\Gamma^{ij}_b = -e^a_j(e^i_{a,b} - \Gamma^c_{ba}e^i_c) = -e^a_j e^i_{a:b}$$

where a colon denotes the usual spatial covariant derivative. Equivalently, we can lower the index i using the identity SO(3) metric and write

$$\Gamma_{ijb} = -e^a_j e_{ia:b} = e^a_{j:b}e_{ia} = e_{ja:b}e^a_i = -\Gamma_{jib}$$


Thus, the spin connection $\Gamma^{ij}_a$ is antisymmetric w.r.t (ij) and hence can be expressed as

$$\Gamma^{ij}_a = \Gamma^k_a(\tau_k)_{ij}$$

where $\tau_i, i = 1, 2, 3$ is the standard basis for the Lie algebra of SO(3), satisfying the usual angular momentum commutation relations:

$$[\tau_i, \tau_j] = \epsilon(ijk)\tau_k$$

Note that $(\tau_i)_{jk} = \epsilon(ijk)$ up to a proportionality constant. We observe that the spatial connection coefficients of the embedded 3-D manifold are

$$\Gamma_{abc} = (1/2)(q_{ab,c} + q_{ac,b} - q_{bc,a})$$

and the spatial curvature tensor is constructed out of these in the usual way. Our aim is to develop the calculus of general relativity in a form that resembles the calculus of Yang-Mills non-Abelian gauge theories. This means that we must introduce a non-Abelian SO(3) connection $\Gamma = \Gamma^i\tau_i$ so that the curvature tensor has the representation

R = dΓ + Γ ∧ Γ = [d+ Γ, d+ Γ]

where Γ is an SU(2)-Lie algebra valued one form. Note that

$$q_{ab,c} = (e^i_ae^i_b)_{,c} = e^i_{a,c}e^i_b + e^i_ae^i_{b,c}$$

In fact, Cartan's equations of structure precisely formulate the expression for the curvature tensor in the language of non-Abelian gauge theory. We write

$$\Gamma^k = \Gamma^k_a dx^a$$

or equivalently,

$$\Gamma = \Gamma^k\tau_k = \Gamma^k_a\tau_k dx^a$$

and hence

$$R = d\Gamma + [\Gamma, \Gamma] = \Gamma^k_{a,b}\tau_k\,dx^b\wedge dx^a + \Gamma^k_a\Gamma^l_b[\tau_k, \tau_l]\,dx^a\wedge dx^b$$

$$= (\Gamma^m_{a,b} - \Gamma^k_a\Gamma^l_b\epsilon(klm))\tau_m\,dx^b\wedge dx^a = R^m_{ba}\tau_m\,dx^b\wedge dx^a$$

where

$$R^m_{ba} = (1/2)(\Gamma^m_{a,b} - \Gamma^m_{b,a}) - \Gamma^k_a\Gamma^l_b\epsilon(klm)$$

is the Riemann curvature tensor in the spin representation. When the metric fluctuation components $\delta q_{ab}, \delta N^a, \delta N$ are quantum fields in a coherent state of the gravitational field, we can write the corresponding fluctuations in the space-time components of the metric tensor along the following lines:

$$g^{\mu\nu} = q^{\mu\nu} + N^2 n^\mu n^\nu$$


where

$$q^{\mu\nu} = q^{ab}X^\mu_{,a}X^\nu_{,b}$$

$$T^\mu = X^\mu_{,0} = N^\mu + N n^\mu = N^a X^\mu_{,a} + N n^\mu$$

where $N^a, N$ satisfy the orthogonality relations

$$g_{\mu\nu}(T^\mu - N^a X^\mu_{,a})X^\nu_{,b} = 0,\quad g_{\mu\nu}n^\mu n^\nu = 1$$

We can equivalently write

$$g^{\mu\nu} = q^{ab}X^\mu_{,a}X^\nu_{,b} + N^2(X^\mu_{,0} - N^a X^\mu_{,a})(X^\nu_{,0} - N^b X^\nu_{,b})$$

When the metric $g^{\mu\nu}$ fluctuates, it results correspondingly in the fluctuation of $q^{ab}$ and $N, N^a$ (these are in total ten in number, corresponding to the fact that the original metric tensor has ten components). The coordinate mapping functions $X^\mu(x)$ are assumed to be fixed and non-fluctuating in this formalism. Thus, the fluctuations in the metric tensor can be expressed in terms of the corresponding fluctuations in $q^{ab}, N^a, N$ as

$$\delta g^{\mu\nu} = \delta q^{ab}X^\mu_{,a}X^\nu_{,b} + 2N\,\delta N\,(X^\mu_{,0} - N^a X^\mu_{,a})(X^\nu_{,0} - N^b X^\nu_{,b})$$

$$-N^2\delta N^a X^\mu_{,a}(X^\nu_{,0} - N^b X^\nu_{,b}) - N^2(X^\mu_{,0} - N^a X^\mu_{,a})X^\nu_{,b}\,\delta N^b$$

Given the metric quantum fluctuations, the corresponding quantum fluctuations in the equation of motion of the virus particle will be

$$d^2\delta X^\mu/d\tau^2 + 2\Gamma^\mu_{\alpha\beta}(X)(d\delta X^\alpha/d\tau)(dX^\beta/d\tau)$$

$$+\,\delta\Gamma^\mu_{\alpha\beta}(X)(dX^\alpha/d\tau)(dX^\beta/d\tau) + \Gamma^\mu_{\alpha\beta,\rho}(X)\delta X^\rho(dX^\alpha/d\tau)(dX^\beta/d\tau) = 0$$

The fluctuations in the particle position are quantum observables evolving in time. According to the Heisenberg picture, these observables evolve dynamically in time according to the above equation, with the state of the gravitational field being prescribed. So, solving the above equation for $\delta X^\mu(\tau)$ in terms of the metric fluctuation field $\delta g_{\mu\nu}(X)$, we can in principle evaluate the mean, correlation and higher order moments of the particle position field in the given state of the gravitational field. Once the mean and correlations of the virus position are known, we can calculate the probability dispersion of this position at the first time that the mean position of the virus hits a given cell. This dispersion gives us an estimate of what range relative to its mean position will be affected by the virus. We could also give a classical stochastic description of the dynamics of the virus, but since the virus size is of atomic dimensions, such a quantum probabilistic description is more accurate.

A cruder model for the virus motion would be to express its trajectory as the sum of a classical and a quantum trajectory, in accordance with a Newton-Heisenberg equation of motion with a force that is the sum of a classical component and a purely quantum fluctuating component:

d2/dt2(r(t)) = f0(r(t)) + δf(r(t))

Solving this differential equation using first order perturbation theory,

r(t) = r0(t) + δr(t)

with r0(t) a purely classical trajectory and δr(t) a quantum, ie, operator valued trajectory, we get

d2r0(t)/dt2 = f0(r0(t)),

d2δr(t)/dt2 = f ′0(r0(t))δr(t) + δf(r0(t))

Here δf(r) is an operator valued field and f0(r) is a classical c-number field. Expressing the first order perturbed trajectory in state variable form gives us

dδr(t)/dt = δv(t), dδv(t)/dt = f ′0(r0(t))δr(t) + δf(r0(t))

or equivalently,

$$\frac{d}{dt}\begin{pmatrix}\delta r(t)\\ \delta v(t)\end{pmatrix} = \begin{pmatrix}0 & I\\ f'_0(r_0(t)) & 0\end{pmatrix}\begin{pmatrix}\delta r(t)\\ \delta v(t)\end{pmatrix} + \begin{pmatrix}0\\ \delta f(r_0(t))\end{pmatrix}$$

Now we can also include damping terms in this model by allowing the force to be both position and velocity dependent:

$$d^2(r_0(t)+\delta r(t))/dt^2 = f_0(r_0(t)+\delta r(t),\, r'_0(t)+\delta r'(t)) + \delta f(r_0(t)+\delta r(t),\, r'_0(t)+\delta r'(t))$$

which gives using first order perturbation theory,

$$d^2r_0(t)/dt^2 = f_0(r_0(t), r'_0(t)),$$

$$d^2\delta r(t)/dt^2 = f_{0,1}(r_0(t), r'_0(t))\delta r(t) + f_{0,2}(r_0(t), r'_0(t))\delta r'(t) + \delta f(r_0(t), r'_0(t))$$

After introducing the state vector as above, and solving the linearized equations using the usual state transition matrix obtained in terms of the Dyson series expansion, we get

$$\delta r(t) = \int_0^t\Phi(t, s)\,\delta f(r_0(s), r'_0(s))\,ds$$


where Φ(t, s) is a classical c-number function and δf(r, v) is a quantum operator valued function of the phase space variables, ie, a quantum field on R6.
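The Φ-integral representation can be sanity-checked numerically in a scalar toy case with constant $f'_0(r_0) = -\omega^2$ (an illustrative assumption), where Φ is known in closed form and the position component of the convolution is $\int_0^t \sin(\omega(t-s))/\omega\cdot\delta f(s)\,ds$; for the purposes of the check, δf is treated as a c-number forcing:

```python
import numpy as np

# Toy check of δr(t) = ∫_0^t Φ(t,s) δf(s) ds against direct time-stepping of
# δr'' = -ω² δr + δf, δr(0) = δr'(0) = 0. The constant coefficient -ω² and the
# forcing δf are illustrative assumptions; δf is treated as a classical signal.
w = 2.0
df = lambda s: np.cos(0.5 * s)
t, nsteps = 3.0, 100_000
dt = t / nsteps
s = np.arange(nsteps) * dt

# position row of the convolution with the transition matrix: sin(ω(t-s))/ω
dr_phi = np.sum(np.sin(w * (t - s)) / w * df(s)) * dt

# symplectic Euler integration of the same linearized system
dr = dv = 0.0
for si in s:
    dv += dt * (-w * w * dr + df(si))
    dr += dt * dv
print(dr_phi, dr)
```

The two results agree to discretization accuracy, confirming the state-transition-matrix solution in this special case.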

MATLAB simulation studies:
[1] Simulation of branching processes in discrete time.
[2] Simulation of blood flow in an electromagnetic field.

Continuous time branching processes: Let X(t) denote the number of particles at time t. Then X(t + dt) will equal X(t) + ξ(dt), where, conditioned on X(t), ξ(dt) takes the value r with probability X(t)λ(r)dt for r = 1, 2, ... and the value 0 with probability $1 - X(t)\sum_r\lambda(r)dt$. This is because each particle has the probability λ(r)dt of giving birth to r offspring. This gives us the recursion

$$E(z^{X(t+dt)}) = \sum_{r\ge 1}E(X(t)z^{X(t)}z^r)\lambda(r)dt + E\Big(\big(1 - \sum_r X(t)\lambda(r)dt\big)z^{X(t)}\Big)$$

So writing

$$F(t, z) = E(z^{X(t)})$$

and using $E(X(t)z^{X(t)}) = z\,\partial F(t,z)/\partial z$, we get

$$F(t+dt, z) = F(t, z) + \phi(z)z(\partial F(t, z)/\partial z)dt - \phi(1)z(\partial F(t, z)/\partial z)dt$$

where

$$\phi(z) = \sum_r\lambda(r)z^r$$

and hence we derive the following differential equation for the evolution of the generating function of the number of particles at time t:

∂F (t, z)/∂t = z(φ(z)− φ(1))∂F (t, z)/∂z
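Differentiating this pde in z at z = 1 gives $m'(t) = \phi'(1)m(t)$ for the mean $m(t) = EX(t)$, i.e. $m(t) = \exp(t\sum_r r\lambda(r))$. This can be checked against a direct Gillespie-type simulation of the particle process; the rates λ(1) = 0.5, λ(2) = 0.25 are illustrative assumptions:

```python
import random, math

# Gillespie simulation of the continuous-time branching process: each of the
# X(t) particles independently generates r new ones at rate λ(r). The pde
# implies the mean obeys m'(t) = φ'(1) m(t), m(t) = exp(t Σ_r r λ(r)).
lam = {1: 0.5, 2: 0.25}                        # illustrative rates λ(r)
Lam = sum(lam.values())                        # total per-particle event rate
growth = sum(r * v for r, v in lam.items())    # φ'(1) = Σ_r r λ(r) = 1.0

random.seed(2)
def sample_X(T):
    x, t = 1, 0.0
    while True:
        t += random.expovariate(x * Lam)       # waiting time to next event
        if t > T:
            return x
        x += random.choices(list(lam), weights=list(lam.values()))[0]

T = 1.0
mean = sum(sample_X(T) for _ in range(50_000)) / 50_000
print(mean, math.exp(growth * T))
```

The simulated mean agrees with $e^{\phi'(1)T}$ to Monte-Carlo accuracy.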

3.2 More on quantum gravity

Let

$$E^a_i = \sqrt{\det(q)}\,e^a_i,\quad q = ((q_{ab}))$$

Then,

$$\det(E) = \det((E^a_i)) = \det(q)^{3/2}\det(e)^{-1} = \det(q)^{3/2}\det(q)^{-1/2} = \det(q)$$

Note that

$$q_{ab} = e^i_a e^i_b,\quad \det(q) = \det(e)^2,\quad \sqrt{\det(q)} = \det(e) = \det((e^i_a))$$
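The identity det(E) = det(q) is easy to confirm numerically for a random nondegenerate triad (the random matrix below is purely illustrative):

```python
import numpy as np

# Numeric check of det(E) = det(q) for the densitized triad E^a_i = √(det q) e^a_i,
# with q_ab = e^i_a e^i_b and e^a_i the inverse tetrad. The triad is a random
# (illustrative) nondegenerate 3x3 matrix, oriented so that det(e) > 0.
rng = np.random.default_rng(0)
e = rng.normal(size=(3, 3))          # e[i, a] = e^i_a
if np.linalg.det(e) < 0:
    e[0] *= -1.0                     # flip orientation so det(e) > 0
q = e.T @ e                          # q_ab = Σ_i e^i_a e^i_b
E = np.sqrt(np.linalg.det(q)) * np.linalg.inv(e)   # E[a, i] = √(det q) e^a_i
print(np.linalg.det(E), np.linalg.det(q))
```

The two determinants coincide, since $\det(q)^{3/2}/\det(e) = \det(q)$ when $\det(e) = \sqrt{\det(q)}$.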

$$D_b(E^a_i) = 0$$

follows from

$$D_b e^i_a = 0$$


and the expression

$$E^a_i = \det(e)\,e^a_i = \epsilon(a_1a_2a_3)e^1_{a_1}e^2_{a_2}e^3_{a_3}\,e^a_i$$

Note also that, defining

$$F^{ia} = \epsilon(aa_2a_3)\epsilon(ii_2i_3)e^{a_2}_{i_2}e^{a_3}_{i_3}$$

we get

$$e^a_i F^{ia} = \epsilon(a_1a_2a_3)\epsilon(i_1i_2i_3)e^{a_1}_{i_1}e^{a_2}_{i_2}e^{a_3}_{i_3} = 6\det((e^a_i)) = 6\det(e)^{-1}$$

The factor of 6 comes from the identity

$$\epsilon(i_1i_2i_3)\epsilon(i_1i_2i_3) = 6$$

More generally,

$$e^b_i F^{ia} = \epsilon(aa_2a_3)\epsilon(i_1i_2i_3)e^b_{i_1}e^{a_2}_{i_2}e^{a_3}_{i_3} = \epsilon(aa_2a_3)\epsilon(ba_2a_3)\det(e)^{-1} = 2\delta^b_a\det(e)^{-1}$$

since $\epsilon(aa_2a_3)\epsilon(ba_2a_3) = 2\delta^b_a$ (contracting this over $b = a$ recovers the factor 6 above). Thus,

$$F^{ia} = 2\det(e)^{-1}e^i_a$$

Taking the inverse gives

$$F^a_i = 2^{-1}\det(e)\,e^a_i = 2^{-1}E^a_i$$

or equivalently,

$$E^a_i = 2F^a_i$$

We can derive another alternate expression for $F^a_i$, or equivalently for $E^a_i$. Let

$$G^a_i = \epsilon(aa_2a_3)\epsilon(ii_2i_3)e^{i_2}_{a_2}e^{i_3}_{a_3}$$

Then,

$$e^i_b G^a_i = \epsilon(aa_2a_3)\epsilon(ba_2a_3)\det(e) = 2\delta^a_b\det(e)$$

Thus

$$G^a_i = 2\,e^a_i\det(e) = 2E^a_i$$

In particular,

$$D_a E^a_i = 0$$

or equivalently,

$$(\sqrt{q}\,e^a_i)_{,a} - \Gamma^j_{ia}E^a_j = 0$$

Note that this is the same as

$$\sqrt{q}\,(e^a_{i:a} - \Gamma^j_{ia}e^a_j) = 0$$

which is the same as

$$\sqrt{q}\,D_a e^a_i = 0$$

ie,

$$D_a e^a_i = 0$$

Substituting for the spin connection

$$\Gamma^j_{ia} = \Gamma^k_a(\tau_k)_{ji} = \Gamma^k_a\epsilon(kji)$$

we get

$$E^a_{i,a} - \Gamma^k_a\epsilon(kji)E^a_j = 0\quad---(1)$$

In other words, the covariant derivative of $E^a_i$ w.r.t the spin connection $\Gamma^k_a$ vanishes. This result is of fundamental importance because it states that these position fields $E^a_i$, which are called the Ashtekar position fields, are covariantly flat w.r.t the spin connection. We wish to express the SO(3) spin connection $\Gamma^k_a$ in terms of the Ashtekar canonical position fields $E^a_i$. To this end, we note that (1) can be rearranged as

$$\Gamma^k_a\epsilon(kji)E^a_j = E^a_{i,a}$$

with the spatial connection coefficients expressed in terms of the tetrads as

$$\Gamma_{abc} = (1/2)((e^i_ae^i_b)_{,c} + (e^i_ae^i_c)_{,b} - (e^i_ce^i_b)_{,a})$$

References

[1] Thomas Thiemann, "Modern Canonical Quantum General Relativity", Cambridge University Press.

[2] M. Green, J. Schwarz and E. Witten, "Superstring Theory", Cambridge University Press.

[3] Vijay Upreti, G. Sagar, H. Parthasarathy and V. Agarwal, Technical report, NSUT, on mathematical models for the Corona virus and their MATLAB simulation.

3.3 Large deviation properties of virus combat with antibodies

A virus is a small spherical blob around which the blood fluid moves. Let n(t, r) denote the concentration in terms of number of virus blobs per unit volume and let g(t, r) denote the net rate of virus generation per unit volume. The virus gradient current density is given by

Jv(t, r) = −D∇n(t, r)

and hence the conservation law for the virus number is given by

divJv + ∂tn = g

or equivalently,

$$\partial_t n(t, r) - D\nabla^2 n(t, r) = g(t, r)$$


Now let us also take into account the drift current, caused by the presence of charge on the virus body interacting with the external electric field. Let v(t, r) denote the velocity of the fluid in which the virus particles are immersed. Then, if E(t, r) is the electric field and μ the virus mobility, the total drift plus diffusion current density is given by

J = nµE −D∇n

and its conservation law is given by

divJ + ∂tn = g

or equivalently,

$$\partial_t n - D\nabla^2 n + \mu n\,\mathrm{div}E + \mu E.\nabla n = g$$

Let q denote the charge carried by a single virus particle. Then the virus chargedensity is given by

ρ = qn

and by Gauss' law,

$$\mathrm{div}E = \rho/\epsilon$$

so our equation of continuity for the virus particles is given by

∂tn−D∇2n+ (µq/ǫ)n2 = g

This is the pde satisfied by the virus number density. If g(t, r) is the sum of a random Gaussian field and a random Poisson field, both of small amplitude, then the problem is to calculate the rate functional of the field n(t, r). Using this rate functional, we can predict the probability of the rare event that the virus concentration in the fluid will exceed, by an amount more than a given threshold, the critical concentration level defined by the solution to the above pde without the generation term. Note that the mobility/drift term for the current is obtained as follows: let v(t) denote the velocity of a given virus particle, m its mass, and γ the viscous damping coefficient. Then, the equation of motion of the virus particle is given by

mv′(t) = −γv(t) + qE(t)

and at equilibrium, v′ = 0 so that

v(t) = qE(t)/γ = µE(t), µ = q/γ
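Returning to the drift-diffusion pde for n(t, r): a minimal explicit finite-difference sketch in one space dimension (grid sizes, coefficients and the periodic boundary are illustrative assumptions; stability requires $D\,dt/dx^2 \le 1/2$). Setting the nonlinear coefficient and source to zero lets us check the scheme against the exact decay of a single heat-equation Fourier mode:

```python
import numpy as np

# Explicit finite-difference sketch of ∂_t n = D ∇²n - (μq/ε) n² + g on a
# periodic 1-D grid (all sizes and coefficients are illustrative; stability
# needs D·dt/dx² ≤ 1/2). With kappa = g = 0 the scheme reduces to the heat
# equation, whose single-mode decay e^{-D k² t} we use as a check.
def step(n, D, kappa, g, dx, dt):
    lap = (np.roll(n, 1) - 2 * n + np.roll(n, -1)) / dx**2
    return n + dt * (D * lap - kappa * n**2 + g)

L, M, D = 2 * np.pi, 200, 0.1
x = np.arange(M) * (L / M)
n = np.sin(x)
dt = 1e-3
for _ in range(1000):
    n = step(n, D, kappa=0.0, g=0.0, dx=L / M, dt=dt)
err = np.abs(n - np.sin(x) * np.exp(-D * 1.0)).max()
print(err)
```

With the nonlinear coefficient kappa set to μq/ε and a nonzero g, the same step function discretizes the full pde above.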

Now let us consider the motion of the virus particle taking into account random fluctuating forces. Its equation of motion is

$$m\,dv(t) = -\gamma v(t)dt + qE(t)dt + \sigma dB(t)$$

where B is standard three dimensional Brownian motion. The solution to this stochastic differential equation is given by

$$v(t) = \int_0^t \exp(-\gamma(t-s)/m)(q/m)E(s)ds + (\sigma/m)\int_0^t \exp(-\gamma(t-s)/m)dB(s)$$

taking v(0) = 0; the factor σ/m comes from dividing the equation of motion through by m.
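An Euler-Maruyama simulation of this velocity process (with E = 0 and illustrative parameter values) confirms the stationary velocity variance $\sigma^2/(2\gamma m)$ implied by the solution above:

```python
import numpy as np

# Euler-Maruyama simulation of m dv = -γ v dt + σ dB (external field E = 0),
# checking the stationary velocity variance σ²/(2γm). Parameters illustrative.
rng = np.random.default_rng(3)
m, gam, sig = 1.0, 2.0, 0.5
dt, nsteps, npaths = 1e-3, 10_000, 2_000
v = np.zeros(npaths)
for _ in range(nsteps):
    v += -(gam / m) * v * dt + (sig / m) * rng.normal(0.0, np.sqrt(dt), npaths)
print(v.var(), sig**2 / (2 * gam * m))
```

The empirical variance across paths matches the Ornstein-Uhlenbeck stationary value to sampling accuracy.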


The corresponding total drift current density is given by

$$J_d(t, r) = nv = n(t, r)\Big(\int_0^t \exp(-\gamma(t-s)/m)(q/m)E(s, r)ds + (\sigma/m)\int_0^t \exp(-\gamma(t-s)/m)dB(s, r)\Big)$$

where now B(t, r) is a Gaussian random field, ie, a zero mean Gaussian field having correlations

$$E(B(t, r).B(s, r')) = \min(t, s)R_B(r, r')$$

Note that the pdf p(t, v) of v(t), ie, of a single virus particle, satisfies the Fokker-Planck equation

$$\partial_t p(t, v) = (\gamma/m)\mathrm{div}_v(vp(t, v)) - (q/m)E(t).\nabla_v p(t, v) + (\sigma^2/2m^2)\nabla_v^2 p(t, v)$$

p(t, v)d³v gives us the probability that the virus particle will have a velocity in the range d³v around v, and n(t, r)p(t, v)d³rd³v is the number of particles having position in the range d³r around r and velocity in the range d³v around v.

MATLAB simulations for virus dynamics based on a stochastic differential equation model:

[a] Simulation of blood velocity flow: The heart pumping force field per unit volume can be represented as

f(t, r) = f0(t)K(r)

where f0(t) is a periodic function and K(r) is a point spread function that is a smeared version of the Dirac δ-function at the point r0, the position of the heart. The equation of motion of a virus particle in this force field is given by

$$dr(t) = v(t)dt,\quad m\,dv(t) = -\gamma v(t)dt + f(t, r(t))dt + \sigma dB(t)\quad---(1)$$

The corresponding pdf is p(t, r, v) in phase space. If n0 is the total number of virus particles in the fluid, then n0 p(t, r, v)d³rd³v equals the total number of virus particles in the position range d³r and velocity range d³v centred around (r, v). It satisfies the Fokker-Planck equation

$$\partial_t p(t, r, v) = -v.\nabla_r p(t, r, v) - (1/m)f(t, r).\nabla_v p(t, r, v) + (\gamma/m)\mathrm{div}_v(v\,p(t, r, v)) + (\sigma^2/2m^2)\nabla_v^2 p(t, r, v)\quad---(2)$$

Equations (1) and (2) can be discretized, respectively in time and in space-time, and simulated. For example, simulation of (1) will be based on the discrete time model

$$r(t+\delta) = r(t) + \delta.v(t),$$

$$v(t+\delta) = v(t) - (\gamma\delta/m)v(t) + (\delta/m)f(t, r(t)) + (\sigma/m)\sqrt{\delta}\,W(t)$$

where W (t), t = 1, 2, ... are iid standard normal R3-valued random vectors.
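This discrete-time model is straightforward to code. The sketch below uses a Gaussian point-spread function K centred at a hypothetical heart position and a periodic pumping waveform f0; these choices and all parameter values are illustrative assumptions:

```python
import numpy as np

# Discretized Langevin scheme for a charged virus particle in the pumping
# force field f(t, r) = f0(t) K(r). K, f0 and all parameters are illustrative.
rng = np.random.default_rng(4)
m, gam, sig, delta = 1.0, 0.8, 0.2, 1e-2
r_heart = np.array([0.0, 0.0, 0.0])                   # hypothetical heart position
f0 = lambda t: 1.0 + np.cos(2 * np.pi * t)            # periodic pumping waveform
K = lambda r: np.exp(-np.sum((r - r_heart) ** 2))     # smeared delta function
force = lambda t, r: f0(t) * K(r) * np.array([1.0, 0.0, 0.0])

def simulate(T, r, v, sig=sig):
    for k in range(int(round(T / delta))):
        t = k * delta
        r = r + delta * v
        v = v - (gam * delta / m) * v + (delta / m) * force(t, r) \
            + (sig / m) * np.sqrt(delta) * rng.normal(size=3)
    return r, v

r, v = simulate(2.0, np.array([1.0, 0.0, 0.0]), np.zeros(3))
print(r, v)
```

Far from the heart the force vanishes, so with σ = 0 the velocity simply decays geometrically by the factor (1 − γδ/m) per step, which provides a quick consistency check on the scheme.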


Simulation of blood flow in the body: The force exerted by the heart is localized around the position of the heart, and as such this force density field can be represented as

f(t, r) = f0(t)K(r)

where f0(t) is a periodic function of time and K(r) is a smeared version of the Dirac δ-function centred around the heart's position. The fluid-dynamical equations to be simulated are

ρ(∂tv(t, r) + (v,∇)v(t, r)) = −∇p+ f + η∇2v

div(ρ.v) + ∂tρ = g

where ρ(t, r) is the virus density and g(t, r) is the rate of generation of virus mass per unit volume. Assuming an equation of state

p = F (ρ)

we get a set of four pde's for the four fields ρ, v by replacing ∇p with F′(ρ)∇ρ. Standard CFD (computational fluid dynamics) methods can be used to discretize this dynamics and solve it in MATLAB.

The spread of virus from a body within a room: Let C(t) denote the concentration of the virus at a certain point within the room. It increases due to the presence of a living organism carrying the virus and decreases due to the presence of ventilation, medicine and other such effects. Thus, the differential equation for its dynamics can be expressed in the form

dC(t)/dt = I(t)− β(X(t))C(t)

where I(t) denotes the rate of increase due to the external living organism and X(t) denotes the other factors like medicine, ventilation etc. Formally, we can solve this equation to give

$$C(t) = \Phi(t, 0)C(0) + \int_0^t\Phi(t, s)I(s)ds$$

where

$$\Phi(t, s) = \exp\Big(-\int_s^t\beta(X(\tau))d\tau\Big)$$
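This closed form is easy to verify against direct numerical integration of dC/dt = I(t) − β(X(t))C(t); the time courses chosen for β∘X and I below are illustrative assumptions:

```python
import numpy as np

# Check C(t) = Φ(t,0)C(0) + ∫_0^t Φ(t,s) I(s) ds, Φ(t,s) = exp(-∫_s^t β dτ),
# against Euler integration of dC/dt = I(t) - β(t) C(t). β and I illustrative.
beta = lambda t: 0.3 + 0.1 * np.sin(t)      # β(X(t)) as a function of time
I = lambda t: 1.0 + 0.5 * np.cos(2 * t)     # source due to the infected organism
T, nsteps = 5.0, 100_000
dt = T / nsteps
tgrid = np.arange(nsteps) * dt
C0 = 2.0

B = np.cumsum(beta(tgrid)) * dt             # B[k] ≈ ∫_0^{t_k} β dτ
closed = np.exp(-B[-1]) * C0 + np.sum(np.exp(-(B[-1] - B)) * I(tgrid)) * dt

C = C0
for t in tgrid:
    C += dt * (I(t) - beta(t) * C)
print(closed, C)
```

The two values agree to discretization accuracy, confirming the variation-of-constants solution.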

This dynamical equation can be generalized in two directions: one, by including a spatial dependence of the virus concentration and describing the dynamics using a partial differential equation, and two, by including stochastic terms. Let C(t, r) denote the virus concentration in the room at the spatial location r and at time t. The virus diffuses from a region of higher concentration to a region of lower concentration in such a way that, if D denotes the diffusion coefficient, the virus diffusion current density, ie, the number of virus particles flowing per unit area per unit time at (t, r), is given by

J(t, r) = −D.∇C(t, r)


Let X(t, r) denote the external factors at the point r at time t. Their effect on the decrease in the virus concentration at the point r′ at time t can be described via an "influence function" F(r′, r), which usually decreases as the spatial distance |r′ − r| increases. So we can, to a good degree of approximation, describe the concentration dynamics as

$$\partial_t C(t, r) = I(t, r) - \mathrm{div}J(t, r) - \int F(r, r')\beta(X(t, r'))C(t, r')d^3r'$$

$$= I(t, r) + D\nabla^2 C(t, r) - \int F(r, r')\beta(X(t, r'))C(t, r')d^3r'$$

In the particular case, if

F (r, r′) = δ3(r− r′)

so that the external agents influence the decrease in virus only at the point where they are located, the above equation simplifies to

∂tC(t, r) = I(t, r) +D∇2C(t, r) − β(X(t, r))C(t, r)
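For constant β this equation is linear with constant coefficients, so the spatial Fourier modes decouple. A minimal FFT sketch on a periodic 1-D domain (all parameter values illustrative), checking the exact decay $e^{-(Dk^2+\beta)t}$ of a single mode with I = 0:

```python
import numpy as np

# FFT solution sketch for ∂_t C = I + D ∇²C - βC with constant β on a periodic
# 1-D domain: each Fourier mode decouples and, for I = 0, decays as
# exp(-(D k² + β) t). All parameter values are illustrative.
L, M, D, beta = 2 * np.pi, 128, 0.2, 0.5
x = np.arange(M) * (L / M)
k = 2 * np.pi * np.fft.fftfreq(M, d=L / M)     # angular wavenumbers
C0 = np.sin(3 * x)
t = 0.7
Ct = np.fft.ifft(np.exp(-(D * k**2 + beta) * t) * np.fft.fft(C0)).real
exact = np.sin(3 * x) * np.exp(-(D * 9 + beta) * t)
err = np.abs(Ct - exact).max()
print(err)
```

The spectral propagator reproduces the exact mode decay to machine precision, which is the sense in which the equation "can be solved easily using Fourier transform methods."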

This equation can be solved easily using Fourier transform methods. More generally, we can consider a situation where the influence of external agents located at different points r1, ..., rn takes place via nonlinear interactions, so Volterra interaction terms are added to the dynamics to give

$$\partial_t C(t, r) = I(t, r) + D\nabla^2 C(t, r) - \sum_{n\ge 1}\int F_n(r, r_1, ..., r_n)\beta(X(t, r_1), ..., X(t, r_n))C(t, r_1)...C(t, r_n)\,d^3r_1...d^3r_n$$

Of course, this equation does not describe memory in the action of the external agents. To take such memory effects into account, we further modify it to

$$\partial_t C(t, r) = I(t, r) + D\nabla^2 C(t, r)$$

$$-\sum_{n\ge 1}\int_{\tau_j\ge 0,\, j=1,...,n} F_n(r, r_1, ..., r_n)\beta(\tau_1, ..., \tau_n, X(t-\tau_1, r_1), ..., X(t-\tau_n, r_n))$$

$$\times\, C(t-\tau_1, r_1)...C(t-\tau_n, r_n)\,d^3r_1...d^3r_n\,d\tau_1...d\tau_n$$

We could also consider a model in which the delay in the effect of the external agents is caused by the finite velocity with which the influence of virus at the point r′ propagates to r. Virus at r′ has a concentration C(t, r′) at time t and its effect propagates at a speed v, so that its effect at the point r at time t will be described by a term C(t − |r − r′|/v, r′). Our model then assumes the form

∂tC(t, r) = I(t, r) +D∇2C(t, r)


$$-\sum_{n\ge 1}\int F_n(r, r_1, ..., r_n)\beta(r_1, ..., r_n, X(t-|r-r_1|/v, r_1), ..., X(t-|r-r_n|/v, r_n))$$

$$\times\, C(t-|r-r_1|/v, r_1)...C(t-|r-r_n|/v, r_n)\,d^3r_1...d^3r_n$$

Remark: Consider the wave equation pde with source and nonlinear terms:

$$\partial_t^2 F(t, r) - c^2\nabla^2 F(t, r) + \delta\int H(\tau, r, F(t-\tau, r'))\,d\tau\, d^3r' = s(t, r)$$

Solving this using first order perturbation theory gives

$$F(t, r) = F_0(t, r) + \delta F_1(t, r) + O(\delta^2)$$

where

$$\partial_t^2 F_0(t, r) - c^2\nabla^2 F_0(t, r) = s(t, r),$$

$$\partial_t^2 F_1(t, r) - c^2\nabla^2 F_1(t, r) = -\int H(\tau, r, F_0(t-\tau, r'))\,d\tau\, d^3r'$$

The causal solution to this can be expressed in terms of retarded potentials:

$$F_0(t, r) = -\int s(t-|r-r'|/c,\, r')\,d^3r'/4\pi|r-r'|,$$

$$F_1(t, r) = (4\pi)^{-1}\int H(\tau, r', F_0(t-|r-r'|/c-\tau,\, r''))\,d\tau\, d^3r'd^3r''/|r-r'|$$

It is clear from this expression, therefore, that delay terms coming from the finite speed of propagation of the virus effects can be described using second order partial differential equations in space-time, like the wave equation with source terms.

Large deviation problems: Express the above pde as

∂tC(t, r) = L(C(t, r)) + s(t, r) + ǫ.w(t, r)

where L is a nonlinear spatio-temporal integro-differential operator and w is the noise field. Note that the operator L depends upon the external medicine/ventilation etc. fields X(t, r). So we can write L = L(X). The problem is to design these external fields X so that the probability of the concentration field exceeding a given threshold at some point in the room is a minimum. This probability can, for small ǫ, be described using the LDP rate function.


3.4 Large deviations problems in quantum fluid dynamics

The Navier-Stokes equation for an incompressible fluid can be expressed as

∂tv(t, r) = L(v)(t, r) + f(t, r)− ρ−1∇p(t, r)

div(v) = 0

where L is the nonlinear Navier-Stokes vector operator:

L(v) = −(v,∇)v + ηρ−1∇2v

We can write

$$v = \mathrm{curl}\,\psi,\quad \mathrm{div}\,\psi = 0,\quad \Omega = \mathrm{curl}\,v = -\nabla^2\psi$$

and then get

∂tΩ + curl(Ω× v) = ηρ−1∇2Ω+ g(t, r), g = curlf

Let G denote the Green’s function of the Laplacian. Then

∂tψ = −G.curl(∇2ψ × curlψ) + ηρ−1∇2ψ −G.g

Formally, we can express this as

∂tψ = F (ψ) + h

where now F is a nonlinear functional of the entire spatial vector field ψ and h = −G.g = −G.curl f is the modified force field. Here F(ψ) is a functional of ψ at different spatial points but at the same time t, which is why, after spatial discretization, it can be regarded as a nonlinear state variable system. Ideally speaking, we should therefore write

∂tψ(t, r) = F (r, ψ(t, r′), r′ ∈ D) + h(t, r)

where D is the region of three dimensional space over which the fluid flows. Note that

F (ψ) = −G.curl(∇2ψ × curlψ) + ηρ−1∇2ψ

Quantization: Introduce an auxiliary field φ(t, r) and define an action functional

$$S[\psi, \phi] = \int\phi.\partial_t\psi\, d^3r\,dt - \int\phi.F(\psi)\,d^3r\,dt - \int\phi.h\,d^3r\,dt$$

Note that, up to a total time derivative,

$$\int\phi.\partial_t\psi\,d^3r\,dt = (1/2)\int(\phi.\partial_t\psi - \psi.\partial_t\phi)\,d^3r\,dt$$

and, for the nonlinear term of F,

$$\int\phi.(-G.\mathrm{curl}(\nabla^2\psi\times\mathrm{curl}\,\psi))\,d^3r\,dt = -\int(G\phi).\mathrm{curl}(\nabla^2\psi\times\mathrm{curl}\,\psi)\,d^3r\,dt$$

$$= \int\mathrm{curl}(G\phi).(\nabla^2\psi\times\mathrm{curl}\,\psi)\,d^3r\,dt$$

Quantization: The canonical momentum fields are

$$P_\psi = \delta S/\delta\partial_t\psi = \phi/2,\quad P_\phi = \delta S/\delta\partial_t\phi = -\psi/2$$

Then the Hamiltonian is given by

$$H = \int(P_\psi.\partial_t\psi + P_\phi.\partial_t\phi)\,d^3r - L = \int\phi.F(\psi)\,d^3r + \int\phi.h\,d^3r$$

Surprisingly, no kinetic term, ie, no term involving the canonical momenta, appears here. The Hamiltonian equations are then

$$dP_\psi/dt = -\delta H/\delta\psi,\quad d\psi/dt = \delta H/\delta P_\psi = 0$$

and likewise for φ. Note that these equations imply that ψ(t, r) = ψ(r) is time independent and hence $P_\psi$ varies linearly with time.

Second quantization of the Schrodinger equation: Let

L = i(ψ*∂tψ − ψ∂tψ*) − a(∇ψ*).(∇ψ)

be the Lagrangian density. Assume that ψ and ψ* are independent canonical position fields. Then, the corresponding canonical momentum fields are

Pψ = ∂L/∂∂tψ = iψ*, Pψ* = ∂L/∂∂tψ* = −iψ

The Hamiltonian density corresponding to L is then

H = Pψ.∂tψ + Pψ*.∂tψ* − L = a∇ψ*.∇ψ

and hence the second quantized Hamiltonian of the free Schrodinger particle is

H = ∫ H d³x = a ∫ ∇ψ*.∇ψ d³x = −a ∫ ψ*∇²ψ d³x

where, of course, to get agreement with experiment,

a = h²/2m

where h is Planck's constant divided by 2π. In the presence of an external potential, this Hamiltonian becomes

H = ∫ (−aψ*∇²ψ + V(t, r)ψ*ψ) d³r

The canonical commutation relations are

[ψ(t, r), Pψ(t, r′)] = ih δ³(r − r′)

or equivalently,

[ψ(t, r), ψ*(t, r′)] = h δ³(r − r′)

We expand the wave operator field ψ(t, r) in terms of the stationary state energy eigenfunctions of the first quantized Schrodinger Hamiltonian operator

−a∇² + V

to get

ψ(t, r) = ∑n a(n) un(r) exp(−iω(n)t), ω(n) = E(n)/h

so that the above commutation relations give

∑n,m [a(n), a(m)*] un(r) um(r′)* exp(−i(ω(n) − ω(m))t) = h δ³(r − r′)

This implies that

[a(n), a(m)*] = h δn,m

since

∑n un(r) un(r′)* = δ³(r − r′)

Actually, since the electronic field is Fermionic, the above commutation relations should be replaced with anticommutation relations, and hence all the above commutators should actually be anticommutators. This picture does not show the existence of the positron, because the Schrodinger equation is a non-relativistic equation. If we use the Dirac relativistic wave equation, then positrons automatically appear.
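The ladder-operator algebra above can be probed numerically by truncating the Fock space of a single mode (an illustrative sketch, not from the text, with h set to 1): the commutator [a, a*] equals the identity except for an artifact in the last diagonal entry caused by the truncation.

```python
import numpy as np

# Single-mode annihilation operator on an N-dimensional truncated Fock space:
# a |k> = sqrt(k) |k-1>, represented as a superdiagonal matrix.
N = 8
a = np.diag(np.sqrt(np.arange(1.0, N)), k=1)
comm = a @ a.conj().T - a.conj().T @ a     # [a, a*]

print(np.round(np.diag(comm), 3))
# all diagonal entries are 1 except the last, which is 1 - N (truncation artifact)
```

This is the standard matrix realization of the canonical commutation relation; the last entry reminds us that the exact relation only holds on the full infinite dimensional Fock space.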

Large deviations in quantum fluid dynamics: Let ψ(t, r) be an operator valued function of time with f, and hence g, h, being classical random fields. Then solving

∂tψ(t, r) = F (r, ψ(t, r′), r′ ∈ D) + h(t, r)

for ψ(t, r), we obtain ψ(t, r) in terms of its initial values ψ(0, r) and the classical source h. Note that ψ(0, r) is an operator valued function of the spatial coordinates. This solution can be obtained perturbatively. Hence, formally, we can write

ψ(t, r) = Q(ψ(0, .), h(s, .), s ≤ t)


Assuming h to be a "small" classical random field, we can ask about the statistics of ψ(t, r) in a given quantum state ρ, ie, about quantities like the quantum moments

Tr(ρ.ψ(t1, r1)...ψ(tN , rN ))

This will be a functional of h(s, .), s ≤ max(t1, ..., tN), and hence this moment will be a classical random variable, and we can ask about the large deviation properties of this moment after replacing h by εh with ε → 0.

3.5 Lecture schedule for statistical signal processing

[1] Revision of the basic concepts in linear algebra and probability theory.
[a] Probability spaces, random variables, probability distributions, characteristic functions, independence, convergence of random variables, emphasis on weak convergence of random variables in a metric space.
[2] Basic Bayesian detection and estimation theory.
[3] Various kinds of optimal estimators for parameters: Maximum likelihood estimator for random and non-random parameters, maximum aposteriori estimator for random parameters, nonlinear minimum mean square estimator as a conditional expectation, best linear minimum mean square estimator for random parameters, linear minimum unbiased minimum variance estimators of parameters in linear models, expanding the nonlinear mmse as a Taylor series. Emphasis on the orthogonality principle for optimal linear and nonlinear minimum mean square estimators of random parameters.

[4] Estimating parameters in linear and nonlinear time series models: AR, MA, ARMA models.

[5] Recursive estimation of parameters in linear models, the RLS algorithm.
[6] Estimation of random signals in noise: The orthogonality principle, the FIR Wiener filter, the optimal noncausal Wiener filter, with derivation based on the spectral factorization method of Wiener and on the innovations method of Kolmogorov.
[7] Linear prediction theory. The infinite order Wiener predictor. Kolmogorov's formula for the minimum prediction error energy in terms of the power spectral density of the process. Finite order forward and backward linear prediction of stationary processes. The Levinson-Durbin algorithm and lattice filters. Significance of the reflection coefficients from the viewpoint of stability. Lattice filter realizations. Lattice to ladder filter conversion and vice-versa.
[8] An introduction to Brownian motion and stochastic differential equations. Application to real time nonlinear filtering theory. The Kalman-Bucy filter in continuous and discrete time. The Kushner-Kallianpur nonlinear filter and the extended Kalman filter. The disturbance observer in extended Kalman filtering. Applications of the extended Kalman filter to robotics.

[9] Mean and variance propagation in stochastic differential equations for nonlinear systems.


[10] Stochastic differential equations driven by Poisson fields.
[11] Estimating the frequencies of harmonic signals using time series methods and using high resolution eigensubspace estimators like the Pisarenko harmonic decomposition, MUSIC and ESPRIT algorithms.
[12] Estimating bispectra and polyspectra of harmonic processes using multidimensional versions of the MUSIC and ESPRIT algorithms.
[13] An introduction to large deviation theory for evaluating the effects of low amplitude noise on the probabilities of deviation of the system from the stability zone.
[a] Large deviations for iid normal random variables.
[b] Cramer's theorem for empirical means of independent random variables.
[c] Sanov's theorem for empirical distributions of independent random variables.
[d] Large deviations problems in binary hypothesis testing. Evaluation of the asymptotic miss probability rate given that the false alarm probability rate is arbitrarily close to zero using the optimal Neyman-Pearson test.
[e] The Gibbs conditioning principle for noninteracting and interacting particles: Given that the empirical distribution for iid random variables satisfies certain energy constraints corresponding to noninteracting and interacting potentials, evaluate the most probable value assumed by this empirical distribution using Sanov's relative entropy formula for the probability distribution of the empirical distribution.
[f] Diffusion exit from a domain. Calculate the mean exit time of a diffusion process from a domain given that the driving white Gaussian noise is weak.
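As a companion to item [7] above, here is a minimal sketch (not from the text) of the Levinson-Durbin recursion: given autocovariances r(0), ..., r(p) of a stationary process, it computes the order-p forward prediction filter, the reflection coefficients and the prediction error energy, order-recursively.

```python
import numpy as np

def levinson_durbin(r, p):
    """Order-recursive solution of the Yule-Walker equations.
    r: autocovariances r[0..p]; returns (error filter a, reflection coeffs, error energy)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    E = r[0]                    # zeroth-order prediction error energy
    refl = []
    for m in range(1, p + 1):
        a_prev = a[:m].copy()
        k = -np.dot(a_prev, r[m:0:-1]) / E      # reflection coefficient
        a[1:m + 1] = a[1:m + 1] + k * a_prev[::-1]
        E = E * (1.0 - k * k)
        refl.append(k)
    return a, np.array(refl), E

# sanity check on an AR(1) process x_t = 0.5 x_{t-1} + w_t (unit innovation variance):
# r(k) = 0.5^k / (1 - 0.25), and the order-2 fit should recover a = [1, -0.5, 0], E = 1.
r = np.array([1.0, 0.5, 0.25]) / 0.75
a, refl, E = levinson_durbin(r, 2)
print(a, refl, E)
```

The reflection coefficients all lying in (−1, 1) is exactly the stability criterion mentioned in the syllabus item.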

[14] Large deviations for Poisson process driven stochastic differential equations. Let N(t), t ≥ 0 be a Poisson process with rate λ. Consider the scaled process Nε(t) = ε.N(t/ε). Then Nε is a Poisson process whose jumps are in units of ε and whose rate is λ/ε. Consider the stochastic differential equation

dX(t) = F(X(t))dt + dNε(t), t ≥ 0

The rate function of the process Nε, ε → 0, is evaluated as follows:

Λ(f) = ε.log E[exp(ε⁻¹ ∫_0^T f(t) dNε(t))] = λ ∫_0^T (exp(f(t)) − 1) dt
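The scaled log moment generating function above can be checked directly (an illustrative computation, not from the text) for a constant test function f(t) = c on [0, T]: then ε⁻¹∫f dNε = c N(T/ε) with N(T/ε) Poisson of mean λT/ε, and ε.log E[exp(c N(T/ε))] should equal λT(e^c − 1), independently of ε.

```python
import math

lam, T, c, eps = 1.0, 1.0, 0.5, 0.01
mu = lam * T / eps      # mean of N(T/eps)

# log E[e^{cN}] by direct summation over the Poisson pmf, done in log space
terms = [c * k - mu + k * math.log(mu) - math.lgamma(k + 1) for k in range(0, 2000)]
m = max(terms)
log_mgf = m + math.log(sum(math.exp(t - m) for t in terms))

lhs = eps * log_mgf
rhs = lam * T * (math.exp(c) - 1.0)
print(lhs, rhs)   # both close to e^0.5 - 1 ≈ 0.6487
```

The agreement is exact up to truncation of the Poisson tail, matching the closed form λ∫(e^f − 1)dt.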

so the corresponding rate function is

IN(X) = supf ( ∫_0^T f(t)X′(t) dt − λ ∫_0^T (exp(f(t)) − 1) dt )

and this can be expressed in the form

IN(X) = ∫_0^T J(X′(t)) dt

44CHAPTER 3. ORDINARY, PARTIAL AND STOCHASTIC DIFFERENTIAL EQUATIONMODELS FOR BIOLOGICAL SYSTEMS,QUANTUMGRAVITY, HAWKING RADIATION, QUANTUM FLUIDS,STATISTICAL SIGNAL PROCESSING, GROUP REPRESENTATIONS, LARGE DEVIATIONS IN STATISTICAL PHYSICS

where

J(u) = supv (uv − λ(exp(v) − 1))
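This one-dimensional Legendre transform has a closed form for u > 0: the stationarity condition u = λe^v gives v = log(u/λ), so J(u) = u log(u/λ) − u + λ. A quick numerical confirmation (illustrative, with arbitrary parameter values):

```python
import math

lam, u = 2.0, 3.0
closed = u * math.log(u / lam) - u + lam

# crude numerical supremum of uv - lam(e^v - 1) over a fine grid of v
grid = (v * 1e-4 for v in range(-100000, 100000))
numeric = max(u * v - lam * (math.exp(v) - 1.0) for v in grid)
print(closed, numeric)
```

J is the classical Cramer rate function of a unit-rate-λ Poisson increment, as expected.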

and then the rate function of the solution X(t) to the sde over the time interval [0, T] is given by

IX(X) = ∫_0^T J(X′(t) − F(X(t))) dt

More generally, consider a spatial Poisson field N(t, dξ) with a rate measure λ.dF(ξ), ie,

E(N(t, E)) = λ.t F(E)

where E takes values in the Borel sigma field of Rn. X(t) is assumed to satisfy the sde

dX(t) = F(X(t))dt + ∫ g(X(t), u(t), ξ) N(dt, dξ)

where u(t) is an input signal field. Now scale the Poisson field in both time and amplitude to obtain the following sde satisfied by Xε(t):

dXε(t) = F(Xε(t))dt + ε ∫ g(Xε(t), u(t), ξ) N(dt/ε, dξ), t ≥ 0

We require to calculate the rate function of the process Xε over the time interval [0, T]. To this end, we first consider a more general situation: Let N(dξ) be a spatial Poisson field on Rn which we write formally as N(dξ1, ..., dξn). We assume a constant rate field measure, ie,

E(N(dξ1, ..., dξn)) = λ.dξ1...dξn

Consider a different scaling along the different axes:

Nδ1...δn(dξ1, ..., dξn) = δ1...δn N(dξ1/δ1, ..., dξn/δn)

Then,

Mδ1...δn(f) = E exp( ∫ f(ξ1, ..., ξn) Nδ1...δn(dξ1, ..., dξn) )

= exp( λ ∫ (exp(δ1...δn f(δ1ξ1, ..., δnξn)) − 1) dξ1...dξn )

= exp( λ (δ1...δn)⁻¹ ∫ (exp(δ1...δn f(ξ1, ..., ξn)) − 1) dξ1...dξn )

from which we get for the scaled logarithmic moment generating function

δ1...δn log( Mδ1...δn((δ1...δn)⁻¹ f) ) = λ ∫ (exp(f(ξ1, ..., ξn)) − 1) dξ1...dξn

so the rate function of this scaled Poisson field is given by

I(ν) = supf ( ∫ f(ξ1, ..., ξn) dν(ξ1, ..., ξn) − λ ∫ (exp(f(ξ1, ..., ξn)) − 1) dξ1...dξn )


3.6 Basic concepts in group theory and group representation theory

[1] Show that SL(2,C) and SU(2,C) are simply connected.
Hint: SU(2,C) is topologically isomorphic to S³ via the representation

g =
( α    β  )
( −β*  α* )

where α, β ∈ C are arbitrary subject to the only constraint |α|² + |β|² = 1. S³ is simply connected, ie, any closed loop in S³ can be deformed continuously to a single point. Note that S² is also simply connected, but S² with both the north and south poles removed is not simply connected, because a loop around the north pole cannot be deformed continuously to a point without the loop crossing one of the removed poles at some stage.

To see that G = SL(2,C) is simply connected, use the Iwasawa decomposition G = KAN, with K = SU(2,C) maximal compact, A abelian diagonal with positive entries, and N unipotent upper triangular. Topologically, G is then isomorphic to SU(2,C) × R³, whose fundamental group is the same as that of SU(2,C), which is trivial since SU(2,C) is simply connected.

[2] Show that the centre of the covering group of G = SL(2,R) is isomorphic to Z, the additive group of integers. Again by the Iwasawa decomposition, G = KAN, where now K = SO(2), and hence G is topologically isomorphic to SO(2) × R², whose fundamental group is that of SO(2), which is Z.

(Problem taken from V.S. Varadarajan, "Lie groups, Lie algebras and their representations", Springer.)

Problem: Let G be a real analytic group and let π be a finite dimensional irreducible representation of G in a vector space V. Let B = (v1, ..., vn) be a basis for V and let π also denote the matrix representation of π in the basis B, ie, we use the same notation for the representation and its matrix representation in the basis B. Let F denote the vector space of functions on G spanned by the πij(g), 1 ≤ i, j ≤ n. Show that dim F = n², ie, the πij are n² linearly independent functions on G. When G is compact, this result follows from the Schur orthogonality relations. How does one do this problem for any real analytic group?

Problem: Deduce from the above result that the set of all finite linear combinations ∑_{g∈G} f(g)π(g), with f a complex function on G vanishing at all but a finite number of points, equals the complete matrix algebra Mn(C).

Let A be the algebra of all finite linear combinations of the π(g), g ∈ G. This is the group algebra corresponding to the representation π of G. Note that we are assuming that π is an irreducible representation of G. Note that A is a subalgebra of the full matrix algebra Mn(C). We wish to prove that A = Mn(C). Suppose T commutes with A, ie

T ∈ Mn(C), Tπ(g) = π(g)T ∀g ∈ G

Then R(T) and N(T) are π(G)-invariant, ie, A-invariant subspaces. By irreducibility of π, it then follows that R(T) = V and N(T) = 0, ie, T is non-singular. Now let c be an eigenvalue of T. Then N(T − c) is a π-invariant subspace and hence, by irreducibility of π, N(T − c) = V. Hence, T = cI. This shows that the commutant of A consists precisely of all scalar multiples of the identity, and hence, by the double commutant theorem (Burnside's theorem in this finite dimensional setting), A is the full matrix algebra Mn(C). Now we are in a position to prove that the functions πij(g), 1 ≤ i, j ≤ n, form a linearly independent set. Suppose not. Then ∑_{i,j} c(i, j)πij(g) = 0 ∀g ∈ G, where some c(i, j) is non-zero. This equation can be expressed as

Tr(Cπ(g)) = 0 ∀g ∈ G

and hence since the set of linear combinations of π(g), g ∈ G equals Mn(C), itfollows that

Tr(CX) = 0 ∀X ∈ Mn(C)

from which we deduce that C = 0. The claim is completely proved.
Note: This problem was taken from V.S. Varadarajan, "Lie groups, Lie algebras and their representations", Springer.

3.7 Some remarks on quantum gravity using the ADM action

Express

K_{ab} = X^μ_{,a} X^ν_{,b} ∇_μ n_ν

in terms of q_{ab}, q_{ab,c}, q_{ab,0}. We have

∇_μ n_ν = n_{ν,μ} − Γ^σ_{μν} n_σ

X^μ_{,a} X^ν_{,b} n_{ν,μ} = X^ν_{,b} n_{ν,a} = −n_ν X^ν_{,ab}

since

n_ν X^ν_{,b} = 0

Next,

X^μ_{,a} X^ν_{,b} Γ^σ_{μν} n_σ = X^μ_{,a} X^ν_{,b} Γ_{σμν} n^σ = (1/2) X^μ_{,a} X^ν_{,b} (q_{σμ,ν} + q_{σν,μ} − q_{μν,σ}) n^σ

Using

X^μ_{,a} X^ν_{,b} q_{σμ,ν} = X^μ_{,a} q_{σμ,b} = q_{σa,b} − X^μ_{,ab} q_{σμ}

we get

X^μ_{,a} X^ν_{,b} q_{σμ,ν} n^σ = q_{σa,b} n^σ = q_{σa,b} (X^σ_{,0} − N^c X^σ_{,c}) = q_{σa,b} X^σ_{,0} − N^c (q_{ca,b} − q_{σa} X^σ_{,bc})


Likewise,

X^μ_{,a} X^ν_{,b} q_{σν,μ} n^σ = q_{σb,a} n^σ

and

X^μ_{,a} X^ν_{,b} q_{μν,σ} n^σ = X^μ_{,a} X^ν_{,b} q_{μν,σ} (X^σ_{,0} − N^c X^σ_{,c})

= X^μ_{,a} X^ν_{,b} q_{μν,0} − N^c X^μ_{,a} X^ν_{,b} q_{μν,c}

= q_{ab,0} − q_{μν} (X^μ_{,a} X^ν_{,b})_{,0} − N^c (q_{ab,c} − q_{μν} (X^μ_{,a} X^ν_{,b})_{,c})

= q_{ab,0} − q_{aν} X^ν_{,b0} − q_{μb} X^μ_{,a0} − N^c q_{ab,c} + N^c q_{aν} X^ν_{,bc} + N^c q_{μb} X^μ_{,ac}

Now

q_{aν} X^ν_{,b0} = q_{ab,0} − q_{aν,0} X^ν_{,b}, q_{μb} X^μ_{,a0} = q_{ab,0} − q_{μb,0} X^μ_{,a}

Alternately,

q_{aν} X^ν_{,b0} = q_{a0,b} − q_{aν,b} X^ν_{,0}, q_{μb} X^μ_{,a0} = q_{b0,a} − q_{μb,a} X^μ_{,0}

Now,

g_{μν} (X^μ_{,0} − N^a X^μ_{,a}) X^ν_{,b} = 0

gives

q_{0b} = N^a q_{ab}

So we can substitute this expression for q_{a0} in the above equations.

Hamilton-Jacobi equation and large deviations:

3.8 Diffusion exit problem

Consider the diffusion exit problem in which the process X(t) ∈ Rn satisfies the sde

dX(t) = f(X(t))dt + √ε g(X(t))dB(t)

The process starts at x within a connected open set G with boundary ∂G, and let τ(ε) denote the first time at which the process hits the boundary. Then, we know (Reference: A. Dembo and O. Zeitouni, "Large Deviations Techniques and Applications") that

limε→0 ε.log(E(τ(ε))) = V(x)

where

V(x) = inf{ V(t, x, y) : y ∈ ∂G, t ≥ 0 }

where

V(t, x, y) = inf{ ∫_0^t |u(s)|²/2 ds : y = x + ∫_0^t f(X(s))ds + ∫_0^t g(X(s))u(s)ds }

48CHAPTER 3. ORDINARY, PARTIAL AND STOCHASTIC DIFFERENTIAL EQUATIONMODELS FOR BIOLOGICAL SYSTEMS,QUANTUMGRAVITY, HAWKING RADIATION, QUANTUM FLUIDS,STATISTICAL SIGNAL PROCESSING, GROUP REPRESENTATIONS, LARGE DEVIATIONS IN STATISTICAL PHYSICS

Suppose g(X(s)) is an invertible square matrix. Then u(t) = g(X(t))⁻¹(X′(t) − f(X(t))) and

V(T, x, y) = inf{ (1/2) ∫_0^T (X′(t) − f(X(t)))^T (g(X(t)) g(X(t))^T)⁻¹ (X′(t) − f(X(t))) dt : X(0) = x, X(T) = y }

The problem of calculating V(T, x, y) therefore amounts, more generally, to computing the extremum of the functional

S[T, X] = ∫_0^T L(X(t), X′(t)) dt

subject to the boundary conditions X(0) = x, X(T) = y. Denoting this infimum by S(T, x, y), we know from basic classical mechanics that S satisfies the Hamilton-Jacobi equation

∂tS(t, x, y) + H(x, ∂yS(t, x, y)) = 0

where H(x, p), the Hamiltonian, is defined as the Legendre transform of the Lagrangian:

H(x, p) = supy (py − L(x, y))

or equivalently,

H(x, p) = py − L(x, y), p = ∂yL(x, y)
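The Legendre transform above can be illustrated numerically (a sketch, not from the text) for the scalar quadratic Lagrangian L(y) = (y − f)²/(2g²), the one-dimensional analogue of the exit-problem cost |u|²/2 with y = f + g u: the transform should come out as H(p) = pf + g²p²/2.

```python
import math

f, g, p = 0.7, 1.3, 0.4
L = lambda y: (y - f) ** 2 / (2.0 * g * g)

# numerical sup_y (p y - L(y)) over a fine grid; the maximizer is y = f + g^2 p
grid = (y * 1e-4 for y in range(-100000, 100000))
H_num = max(p * y - L(y) for y in grid)
H_closed = p * f + g * g * p * p / 2.0
print(H_num, H_closed)
```

For this quadratic case the stationary-point form p = ∂yL(x, y) of the transform gives the closed form immediately.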

It is now clear that in the context of our diffusion exit problem,

L(x, y) = (1/2)(y − f(x))^T (g(x) g(x)^T)⁻¹ (y − f(x))

and

V (t, x, y) = S(t, x, y)

Then

V(x, y) = inft≥0 V(t, x, y) = inft≥0 S(t, x, y) = S(x, y)

where S(x, y) satisfies the stationary Hamilton-Jacobi equation

H(x, ∂yS(x, y)) = 0

We note that it is hard to compute the exact value of E(τ(ε)) for ε > 0, because this involves computing the Green's function of the diffusion generator. However, computing the limiting value of ε.log E(τ(ε)) as ε → 0 is simpler, as it involves only computing the stationary solution of the Hamilton-Jacobi equation.
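A Monte Carlo sketch of the exit asymptotics (illustrative, with hypothetical parameters, not from the text): for dX = −X dt + √ε dB on G = (−1, 1) started at x = 0, the quasi-potential to the boundary is V(0) = 1 (twice the potential difference for U(x) = x²/2), so ε.log E(τ(ε)) should be of order 1 for small ε.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01

def mean_exit_time(eps, paths=100):
    # Euler-Maruyama simulation of dX = -X dt + sqrt(eps) dB until |X| hits 1
    total = 0.0
    for _ in range(paths):
        x, t = 0.0, 0.0
        while abs(x) < 1.0:
            x += -x * dt + np.sqrt(eps * dt) * rng.standard_normal()
            t += dt
        total += t
    return total / paths

est = {eps: eps * np.log(mean_exit_time(eps)) for eps in (0.5, 0.35, 0.25)}
print(est)   # values of order V(0) = 1
```

The estimates carry prefactor bias at these moderate ε; the limit theorem only pins down the exponential scale.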


3.9 Large deviations for the Boltzmann kinetic transport equation

f,t(t, r, p) + (p, ∇r)f(t, r, p) + (F(t, r, p), ∇p)f(t, r, p) = ∫ (w(p + q, q)f(t, r, p + q) − w(p, q)f(t, r, p)) d³q

The rhs is interpreted as follows: w(p + q, q)d³q is the probability rate for a particle having momentum p + q to get scattered to the momentum p by losing a momentum centred around q in the momentum volume d³q, and likewise w(p, q)d³q is the probability rate for a particle having momentum p to get scattered to p − q by losing a momentum centred around q within a momentum volume of d³q. Making appropriate approximations gives us

w(p + q, q)f(t, r, p + q) ≈ w(p, q)f(t, r, p) + (q, ∇p)(w(p, q)f(t, r, p)) + (1/2)Tr(qq^T ∇p∇p^T)(w(p, q)f(t, r, p))

It follows that

∫ (w(p + q, q)f(t, r, p + q) − w(p, q)f(t, r, p)) d³q ≈ ∫ [ (q, ∇p)(w(p, q)f(t, r, p)) + (1/2)Tr(qq^T ∇p∇p^T)(w(p, q)f(t, r, p)) ] d³q

Now letting

∫ q w(p, q) d³q = A(p), ∫ qq^T w(p, q) d³q = B(p)

we can write down the approximated kinetic transport equation as

∂tf(t, r, p) + (p, ∇r)f(t, r, p) + (F(t, r, p), ∇p)f(t, r, p) = divp(A(p)f(t, r, p)) + (1/2)Tr(∇p∇p^T (B(p)f(t, r, p)))

which is the Fokker-Planck equation that one encounters for diffusion processes described in the form of stochastic differential equations driven by Brownian motion.
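The second-order (Kramers-Moyal type) expansion above can be verified numerically in one dimension (an illustrative sketch, with a hypothetical kernel, not from the text): take a narrow kernel w(p, q) = c(p)N(q; 0, s²) with c(p) = 1 + 0.1p² and f(p) = exp(−p²/2); then A(p) = 0 by symmetry, B(p) = c(p)s², and the collision integral should be close to (1/2)(B f)″(p).

```python
import numpy as np

s = 0.1
q = np.linspace(-8 * s, 8 * s, 4001)
dq = q[1] - q[0]
c = lambda p: 1.0 + 0.1 * p ** 2
w = lambda p, q: c(p) * np.exp(-q ** 2 / (2 * s * s)) / np.sqrt(2 * np.pi * s * s)
f = lambda p: np.exp(-p ** 2 / 2.0)

p0 = 0.3
# left side: the collision integral at p0, by direct quadrature
lhs = (w(p0 + q, q) * f(p0 + q) - w(p0, q) * f(p0)).sum() * dq

# right side: (1/2) (B f)''(p0) by central differences, with B(p) = c(p) s^2
h = 1e-3
Bf = lambda p: c(p) * s * s * f(p)
rhs = 0.5 * (Bf(p0 + h) - 2 * Bf(p0) + Bf(p0 - h)) / h ** 2
print(lhs, rhs)
```

The two sides agree to O(s⁴), which is the accuracy of the truncated Taylor expansion.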

The large deviation problem: If the scattering probability function w(p, q) has small random fluctuations, then we can ask what would be the deviation in the resulting solution for the particle distribution function f. Of course, to answer this question, we must expand the solution f in powers of a perturbation parameter tag δ attached to the random vector and matrix valued functions A(p), B(p). Specifically, assuming w(p, q) to be of order δ, we have that A(p) and B(p) are random functions respectively of orders δ and δ².


3.10 Orbital integrals for SL(2,R) with applications to statistical image processing

G = SL(2,R)

The Iwasawa decomposition of G is

G = KAN

where

K = {u(θ) : 0 ≤ θ < 2π}, u(θ) =
( cos(θ)  −sin(θ) )
( sin(θ)   cos(θ) )

A = {diag(a, a⁻¹) : a > 0}

and N consists of the matrices
( 1  n )
( 0  1 )
with n ∈ R.

Define

H = diag(1, −1), X =
( 0  1 )
( 0  0 ),
Y =
( 0  0 )
( 1  0 )

Then,

[H, X] = 2X, [H, Y] = −2Y, [X, Y] = H

Consider the following element in the universal enveloping algebra of G:

Ω = 4XY + H² − 2H

Then,

[H², X] = 2(XH + HX) = 2(2HX − 2X) = 4HX − 4X, [XY, X] = −XH = −(HX − 2X) = −HX + 2X

Thus,

[Ω, X] = 0

[H², Y] = −2(HY + YH) = −2(2HY + 2Y) = −4HY − 4Y, [XY, Y] = HY

Thus,

[Ω, Y] = 0

and hence

[Ω, H] = [Ω, [X, Y]] = 0
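The bracket computations above are easy to confirm numerically in the defining 2×2 representation (an illustrative check, not from the text):

```python
import numpy as np

H = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [0.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0]])
comm = lambda A, B: A @ B - B @ A

assert np.allclose(comm(H, X), 2 * X)
assert np.allclose(comm(H, Y), -2 * Y)
assert np.allclose(comm(X, Y), H)

Omega = 4 * X @ Y + H @ H - 2 * H
for Z in (X, Y, H):
    assert np.allclose(comm(Omega, Z), 0)
print("all sl(2) identities verified")
```

In the defining representation Ω reduces to a scalar matrix, as Schur's lemma predicts for a central element acting in an irreducible representation.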

Thus, Ω belongs to the centre of the universal enveloping algebra. Now consider the orbital integral

Ff(h) = ∆(h) ∫_{G/A} f(xhx⁻¹) dx, h ∈ A


Noting that

G = KNA

we can write

G/A = KN

and hence

∫_{G/A} f(xhx⁻¹) dx = ∫_{K×N} f(unhn⁻¹u⁻¹) du dn = ∫_N f̄(nhn⁻¹) dn

where f̄ denotes the K-average

f̄(x) = ∫_K f(uxu⁻¹) du

Now let

nhn⁻¹ − h = M, n ∈ N, h ∈ A

It is clear that M is strictly upper triangular since n is unipotent, ie, upper triangular with ones on the diagonal. Then, for fixed h,

dM = d(nhn⁻¹) = |a − 1/a| dn = ∆(h) dn

(Simply write

n =
( 1  m )
( 0  1 ), m ∈ R

and note that dn = dm and

n⁻¹ =
( 1  −m )
( 0   1 ),  h = diag(a, 1/a)

and observe that

nhn⁻¹ − h =
( 0  (1/a − a)m )
( 0       0     ).)

Note that for a = exp(t) and h = diag(a, 1/a), we have

∆(h) = |e^t − e^{−t}|

This gives

∫_N f̄(nhn⁻¹) dn = ∫_n f̄(M + h)(dn/dM) dM = ∆(h)⁻¹ ∫_n f̄(M + h) dM
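The matrix identity inside the parentheses above is a one-line numerical check (illustrative, with arbitrary values of a and m):

```python
import numpy as np

a, m = 1.7, 0.4
n = np.array([[1.0, m], [0.0, 1.0]])
h = np.diag([a, 1.0 / a])
M = n @ h @ np.linalg.inv(n) - h
print(M)
assert np.allclose(M, [[0.0, (1.0 / a - a) * m], [0.0, 0.0]])
```

So the map m ↦ M is linear with Jacobian |1/a − a| = ∆(h), exactly the factor appearing in dM = ∆(h)dn.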

where n is the Lie algebra of N and consists of all strictly upper triangular real matrices. Finally, using Weyl's integration formula,

∫_G f(x) dx = ∫_A |∆(h)|² dh ∫_{G/A} f(xhx⁻¹) dx = ∫_{A×n} |∆(h)| f̄(M + h) dh dM


= ∫_{R×R} |exp(t) − exp(−t)| f̄(a(t) + sX) dt ds, a(t) = diag(exp(t), exp(−t)) = exp(tH)

Reference: V.S. Varadarajan, "Harmonic Analysis on Semisimple Lie Groups", Cambridge University Press.

3.11 Loop quantum gravity (LQG)

Here, we first introduce the SO(3) connection coefficients that realize the covariant derivatives of mixed spinors and tensors in such a way that the mixed covariant derivative of the tetrad becomes zero. In fact, this equation defines the spinor connection coefficients in terms of the spatial Christoffel connection coefficients. We then introduce the quantities K^j_a, which are obtained by contracting the spatial components K_{ab} of the covariant derivative of the unit normal four vector to the three dimensional hypersurface at each time t that has been embedded into R⁴. The Ashtekar momentum fields E^a_j are obtained by scaling the tetrad e^a_j with the square root of the determinant of the spatial metric q_{ab}, where q_{ab} = e^j_a e^j_b. The tetrad basis matrix e^a_j, or rather its inverse matrix e^j_a, is used to contract the K_{ab} to obtain the K^j_a. The fact that the K_{ab} are symmetric is manifested in the vanishing of the antisymmetric part in (a, b) of K^j_a e_{jb}, or equivalently of K^j_a E_{jb}, and this constraint is called the Gauss constraint. By combining this with the fact that the spinor covariant derivative of E^j_a or E^a_j vanishes, we are able to obtain the fundamental relation which states that the covariant derivative of E^j_a or E^a_j with respect to another connection, called the Immirzi-Sen-Barbero-Ashtekar connection and obtained by adding to the spinor connection Γ^j_a any scalar multiple of K^j_a, vanishes. This new connection is denoted by A^j_a. Further, in accordance with the ADM action, the canonical position fields are the spatial metric coefficients q_{ab} and the corresponding canonical momentum fields P^{ab} are obtained in terms of the K_{ab}, namely the spatial components of the covariant derivative of the unit normal to the embedded three dimensional hypersurface. Ashtekar proved that if we replace these canonical position fields by the A^j_a and the canonical momentum fields by the E^a_j, then, because the spin connection coefficients Γ^j_a are homogeneous functions of the E^a_j, these new variables also satisfy the canonical commutation relations and hence can serve as canonical position and momentum fields. Moreover, he showed that the Hamiltonian constraint operator, when expressed in terms of these new canonical variables, is just a polynomial functional in these fields, and hence all the renormalizability problems are overcome. Further, the Gauss constraint in these new variables simplifies to the statement that the covariant derivative of the E^a_j vanishes. This formalism means that the new connection A^j_a behaves like a Yang-Mills connection for general relativity, the curvature of this connection being immediately related, after appropriate contraction with the E^a_j, to the Einstein-Hilbert action. This means that one can discretize space into a graphical grid and replace the curvature terms in the Einstein-Hilbert action by parallel displacements of the connection A^j_a around loops in the graph, and the connection terms A^j_a integrated over edges by SO(3) holonomy group elements. In other words, the whole programme of quantizing the Einstein-Hilbert action using such discretized graphs becomes simply that of quantizing a function of SO(3) holonomy group elements, one element associated with each edge of a graph and a holonomy group element corresponding to the curvature obtained by parallely transporting the connection around a loop.

3.12 Group representations in relativistic image processing

For G = SL(2,R), we've seen that

∫_G f(x) dx = ∫ ∆(t) f̄(a(t) + sX) dt ds, ∆(t) = |exp(t) − exp(−t)|

Now suppose that f, g are two functions on G. Let χ be the character of a representation of G. Consider

I(f, g) = ∫_{G×G} f(x)g(y)χ(y⁻¹x) dx dy

Let u ∈ G and observe that

I(f∘u⁻¹, g∘u⁻¹) = ∫_{G×G} f(u⁻¹x)g(u⁻¹y)χ(y⁻¹x) dx dy = ∫_{G×G} f(x)g(y)χ(y⁻¹u⁻¹ux) dx dy = ∫_{G×G} f(x)g(y)χ(y⁻¹x) dx dy = I(f, g)

Thus, I(f, g) is an invariant function of two image fields defined on G. I can be evaluated using the above integration formula. Note that this formula is true for any function χ on G, not necessarily a character. Now consider

J(f, g) = ∫_{G×G} f(x)g(y)χ(yx⁻¹) dx dy

We shall show that J is invariant whenever χ is a class function and, in particular, when it is a character. We have

J(f∘u⁻¹, g∘u⁻¹) = ∫ f(u⁻¹x)g(u⁻¹y)χ(yx⁻¹) dx dy = ∫ f(x)g(y)χ(uyx⁻¹u⁻¹) dx dy = J(f, g)

because χ is a class function, ie,

χ(uxu⁻¹) = χ(x) ∀u, x ∈ G
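The invariance argument above has a finite-group analogue that can be tested exactly (an illustrative sketch, not from the text): on G = S3, take χ to be the character of the permutation representation (the number of fixed points, a class function) and random image fields f, g on G.

```python
import itertools
import numpy as np

perms = list(itertools.permutations(range(3)))
mats = [np.eye(3)[list(p)] for p in perms]
chi = {p: np.trace(m) for p, m in zip(perms, mats)}   # fixed-point character

def compose(p, q):                 # permutation product p ∘ q
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0, 0, 0]
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

rng = np.random.default_rng(2)
f = {p: rng.standard_normal() for p in perms}
g = {p: rng.standard_normal() for p in perms}

def J(f, g):
    return sum(f[x] * g[y] * chi[compose(y, inverse(x))] for x in perms for y in perms)

base = J(f, g)
for u in perms:
    fu = {x: f[compose(inverse(u), x)] for x in perms}
    gu = {x: g[compose(inverse(u), x)] for x in perms}
    assert abs(J(fu, gu) - base) < 1e-9
print("J invariant under all translations")
```

The same computation with a non-class-function χ would break the invariance, which is the point of the class-function hypothesis.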

Some other formulae regarding orbital integrals, Fourier transforms and Weyl's character formula. Let G be a compact Lie group and consider Weyl's character formula for an irreducible representation of G:

χ = ( ∑_{s∈W} ε(s) exp(s.(λ + ρ)) ) / ( ∑_{s∈W} ε(s) exp(s.ρ) )

where ρ = (1/2) ∑_{α∈∆+} α and λ is a dominant integral weight. Note that the numerator's denominator can be expressed as

∆1(exp(H)) = ∑_{s∈W} ε(s) exp(s.ρ(H)) = Π_{α∈∆+} (exp(α(H)/2) − exp(−α(H)/2)) = exp(ρ(H)) Π_{α∈∆+} (1 − exp(−α(H))) = exp(ρ(H)) ∆(exp(H))

where

∆(exp(H)) = Π_{α∈∆+} (1 − exp(−α(H)))

Note also that

Π_{α∈∆+} (exp(−α(H)) − 1) = c.∆(exp(H))

with c a constant that equals ±1. Note also that H varies over the Cartan subalgebra a, and since G is a compact group, it follows that α(H) is pure imaginary for all α ∈ ∆ and H ∈ a. Now, Weyl's integration formula gives, for any function f on G,

χ(f) = ∫ f(g)χ(g) dg = C ∫_{A×G/A} |∆(h)|² f(xhx⁻¹)χ(h) dh dx = C ∫_A ∆(h) Ff(h) χ(h) dh

where Ff, the orbital integral of f, is defined by

Ff(h) = ∆(h) ∫_{G/A} f(xhx⁻¹) dx


Substituting for χ(h) from Weyl's character formula, we get

χ(f) = C ∫_a Ff(exp(H)) ∑_{s∈W} ε(s) exp(s.λ(H)) dH = C|W| ∫_a Ff(exp(H)) exp(λ(H)) dH = C|W| F̂f(λ)

where ĝ denotes the Fourier transform of a function g defined on a. This formula shows that the ordinary Euclidean Fourier transform of the orbital integral on the Cartan subalgebra, evaluated at a dominant integral weight, equals the character of the corresponding irreducible representation when the group is compact.

Weyl's dimension formula:

u(H) = χ(H)∆(H) = ∑_{s∈W} ε(s) χ^s_{m1...mn}(H)

where χ on the right is given by

χ_{m1...mn}(H) = exp(im1θ1 + ... + imnθn)

with (θ1, ..., θn) denoting the standard toral coordinates of H ∈ a. Define the differential operator

D = Π_{r>s} i⁻¹(∂θr − ∂θs)

Then,

(Du)(0) = Π_{r>s}(mr − ms) ∑_{s∈W} χ^s(0) = |W| Π_{r>s}(mr − ms)

on the one hand; on the other, using the fact that

∆(H) = ∑_{s∈W} ε(s) χ^s_{n,n−1,...,1}(H)

where

χ_{n,n−1,...,1}(H) = exp(inθ1 + i(n−1)θ2 + ... + iθn)

we get

(D∆)(0) = |W| Π_{n≥r>s≥1}(r − s)

and hence

(Du)(0) = χ(0)(D∆)(0) = d(χ).|W|.Π_{r>s}(r − s)

where

d(χ) = χ(0)

is the dimension of the irreducible representation. Equating these two expressions, we obtain Weyl's dimension formula

d(χ) = d(m1, ..., mn) = Π_{r>s}(mr − ms) / Π_{r>s}(r − s)
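The dimension formula just derived is easy to implement (an illustrative sketch, not from the text, assuming the mi are the strictly decreasing shifted weights mi = λi + n − i):

```python
from itertools import combinations
from math import prod

def weyl_dim(m):
    # dimension = prod_{r<s} (m_r - m_s) / prod_{r<s} (s - r) for shifted weights m
    n = len(m)
    num = prod(m[r] - m[s] for r, s in combinations(range(n), 2))
    den = prod(s - r for r, s in combinations(range(n), 2))
    return num // den

print(weyl_dim((2, 1)))      # trivial rep of SU(2): 1
print(weyl_dim((3, 1)))      # fundamental rep of SU(2): 2
print(weyl_dim((3, 2, 1)))   # trivial rep of SU(3): 1
print(weyl_dim((4, 2, 1)))   # fundamental rep of SU(3): 3
```

The Vandermonde-ratio structure makes the normalization d = 1 for m = (n, n−1, ..., 1) automatic.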

56CHAPTER 3. ORDINARY, PARTIAL AND STOCHASTIC DIFFERENTIAL EQUATIONMODELS FOR BIOLOGICAL SYSTEMS,QUANTUMGRAVITY, HAWKING RADIATION, QUANTUM FLUIDS,STATISTICAL SIGNAL PROCESSING, GROUP REPRESENTATIONS, LARGE DEVIATIONS IN STATISTICAL PHYSICS

This argument works for the compact group SU(n), where a dominant integral weight is parametrized by an n-tuple of non-negative integers (m1 ≥ m2 ≥ ... ≥ mn ≥ 0). In the general case of a compact semisimple Lie group G, we use Weyl's character formula in the form

χ(H) = ( ∑_{s∈W} ε(s) exp(s.(λ + ρ)(H)) ) / ∆(H)

∆(H) = ∑_{s∈W} ε(s) exp(s.ρ(H)) = Π_{α∈∆+} (exp(α(H)/2) − exp(−α(H)/2))

Then define

D = Π_{α∈∆+} ∂(Hα)

We get

(D∆)(0) = |W|.Π_{α∈∆+} ρ(Hα)

Thus, on the one hand,

D(χ∆)(0) = χ(0)(D∆)(0) = d(χ).|W|.Π_{α∈∆+} ρ(Hα)

and on the other,

D(∆.χ)(0) = |W| Π_{α∈∆+} (λ + ρ)(Hα)

Equating these two expressions gives us Weyl's dimension formula for the irreducible representation of any compact semisimple Lie group corresponding to the dominant integral weight λ in terms of the roots:

d(χ) = d(λ) = Π_{α∈∆+} (λ + ρ)(Hα) / Π_{α∈∆+} ρ(Hα) = Π_{α∈∆+} ⟨λ + ρ, α⟩ / Π_{α∈∆+} ⟨ρ, α⟩

3.13 Approximate solution to the wave equation in the vicinity of the event horizon of a Schwarzschild blackhole

This calculation would enable us to expand the solution to the wave equation in the vicinity of a blackhole in terms of "normal modes", ie, eigenfunctions of the Helmholtz operator in Schwarzschild space-time. The coefficients in this expansion will be identified with the particle creation and annihilation operators, which would therefore enable us to obtain a formula for the blackhole entropy assuming that it is in a thermal state. It would enable us to obtain the blackhole temperature at which it starts radiating particles in the Rindler space-time metric, which is an approximate transformation of the Schwarzschild metric. By using the relationship between the time variable in the Rindler space-time and the time variable in the Schwarzschild space-time, we can thus obtain the characteristic frequencies of oscillation in the solution of the wave equation in Schwarzschild space-time in terms of the gravitational constant G and the mass M of the blackhole. These characteristic frequencies can be used to calculate the critical Hawking temperature at which the blackhole radiates, in terms of G, M, using the formula for the entropy of a Gaussian state.

Let

g00 = α(r) = 1 − 2GM/c²r, g11 = −α(r)⁻¹/c² = −c⁻²(1 − 2GM/rc²)⁻¹, g22 = −r²/c², g33 = −r²sin²(θ)/c²

The Schwarzschild metric is then

dτ² = gμν dxμ dxν, x0 = t, x1 = r, x2 = θ, x3 = φ

or equivalently,

dτ² = α(r)dt² − α(r)⁻¹dr²/c² − r²dθ²/c² − r²sin²(θ)dφ²/c²

The Klein-Gordon wave equation in Schwarzschild space-time is

(gμν ψ,ν √−g),μ + μ²√−g ψ = 0

where

μ = mc/h

with h denoting Planck's constant divided by 2π. Now,

√−g = (−g00 g11 g22 g33)^{1/2} = r²sin(θ)/c³

so the above wave equation expands to give

α(r)⁻¹ r² sin(θ) ψ,tt − ((c²/α(r)) r² sin(θ) ψ,r),r − (c²/r²) r² (sin(θ) ψ,θ),θ − (c²/(r²sin²(θ))) r² sin(θ) ψ,φφ + μ² r² sin(θ) ψ = 0

This equation can be rearranged as

(1/c²)∂t²ψ = (α(r)/r²)((r²/α(r))ψ,r),r + (α(r)/r²)( (sin(θ))⁻¹(sin(θ)ψ,θ),θ + sin(θ)⁻²ψ,φφ ) − μ²α(r)ψ

This wave equation can be expressed in quantum mechanical notation as

(1/c²)∂t²ψ = (α(r)/r²)((r²/α(r))ψ,r),r − (α(r)/r²)L²ψ − μ²α(r)ψ

58CHAPTER 3. ORDINARY, PARTIAL AND STOCHASTIC DIFFERENTIAL EQUATIONMODELS FOR BIOLOGICAL SYSTEMS,QUANTUMGRAVITY, HAWKING RADIATION, QUANTUM FLUIDS,STATISTICAL SIGNAL PROCESSING, GROUP REPRESENTATIONS, LARGE DEVIATIONS IN STATISTICAL PHYSICS

where

L² = −( (1/sin(θ)) ∂/∂θ (sin(θ) ∂/∂θ) + (1/sin²(θ)) ∂²/∂φ² )

is the square of the angular momentum operator in non-relativistic quantum mechanics. Thus, using separation of variables, we obtain the solution

ψ(t, r, θ, φ) = ∑nlm fnl(t, r) Ylm(θ, φ)

where fnl(t, r) satisfies the radial wave equation

(1/c²)∂t²fnl(t, r) = (α(r)/r²)((r²/α(r))fnl,r),r − ((α(r)/r²)l(l + 1) + μ²α(r))fnl(t, r)

where l = 0, 1, 2, .... We write

fnl(t, r) = exp(iωt)gnl(r)

Then the above equation becomes

(−ω²/c²)gnl(r) = (α(r)/r²)((r²/α(r))g′nl(r))′ − ((α(r)/r²)l(l + 1) + μ²α(r))gnl(r)

For each value of l, this defines an eigenvalue equation, and on applying the boundary conditions, the possible frequencies ω assume discrete values, say ω(n, l), n = 1, 2, .... These frequencies will depend upon GM and μ since α(r) depends on GM. Let gnl(r) denote the corresponding set of orthonormal eigenfunctions w.r.t. the volume measure

c3√−gd3x = r2sin(θ)dr.dθ.dφ
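The discrete frequencies ω(n, l) have no closed form, but the radial eigenvalue problem can be explored numerically. The sketch below (my own illustration, not from the text) discretizes the radial equation by finite differences in units G = M = c = 1, so α(r) = 1 − 2/r, with artificial Dirichlet walls just outside the horizon and at a finite outer radius; the grid, box size and the choices l = 0, µ = 1 are arbitrary.

```python
import numpy as np

# Numerical sketch (my illustration, not from the text): discretize the radial
# eigenvalue problem in units G = M = c = 1, so alpha(r) = 1 - 2/r, with
# Dirichlet walls just outside the horizon and at an artificial outer radius.
l, mu = 0, 1.0
N = 400
r = np.linspace(2.2, 30.0, N)
h = r[1] - r[0]
alpha = 1.0 - 2.0 / r
p = r**2 / alpha                       # coefficient inside the outer derivative
L = np.zeros((N, N))
for i in range(1, N - 1):
    pm = 0.5 * (p[i] + p[i - 1])       # p at the half-grid points
    pp = 0.5 * (p[i] + p[i + 1])
    L[i, i - 1], L[i, i + 1], L[i, i] = pm, pp, -(pm + pp)
L *= (alpha / (r**2 * h**2))[:, None]  # (alpha/r^2) d/dr((r^2/alpha) d/dr)
V = alpha * (l * (l + 1) / r**2 + mu**2)
A = -L + np.diag(V)                    # A g = omega^2 g on the interior grid
w2 = np.sort(np.linalg.eigvals(A[1:-1, 1:-1]).real)
omegas = np.sqrt(w2[w2 > 0][:5])       # first few discrete frequencies omega(n, l)
print(omegas)
```

The computed spectrum depends on the artificial outer wall, mirroring the remark above that the values ω(n, l) are fixed by the boundary conditions imposed.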

The general solution can thus be expanded as

ψ(t, r, θ, φ) = ∑_{nlm} [a(n, l, m)gnl(r)Ylm(θ, φ)exp(iω(nl)t) + a(n, l, m)^* gnl(r)Ylm(θ, φ)exp(−iω(nl)t)]

where n = 1, 2, ..., l = 0, 1, 2, ..., m = −l, −l + 1, ..., l − 1, l. The Hamiltonian operator of the field can be expressed as

H = ∑_{nlm} hω(nl)a(n, l, m)^* a(n, l, m)

and the bosonic commutation relations are

[a(nlm), a(n′l′m′)^*] = hω(nl)δ(nn′)δ(ll′)δ(mm′)


An alternate analysis of the Hawking temperature. The equation of motion of a photon, i.e., a null radial geodesic, is given by

α(r)dt^2 − α(r)^{−1}dr^2 = 0

or

dr/dt = ±α(r) = ±(1 − 2m/r), m = GM/c^2

The solution is

∫ r·dr/(r − 2m) = ±t + c

so that

r + 2m·ln(r − 2m) = ±t + c

The change in time t when the particle goes from 2m + 0 to 2m − 0 is purely imaginary and is given by δt = 2mπi, since ln(r − 2m) = ln|r − 2m| + i·arg(r − 2m) with arg(r − 2m) = 0 for r > 2m and = π for r < 2m. Thus, a photon wave oscillating at a frequency ω has its amplitude changed from exp(iωt) to exp(iω(t + 2mπi)) = exp(iωt − 2mπω). The corresponding ratio of the probability when it goes from just outside to just inside the critical radius is obtained by taking the modulus square of the probability amplitude and is given by P = exp(−4mπω). The energy of the photon within the blackhole is hω (where h is Planck's constant divided by 2π) and hence by Gibbsian theory in equilibrium statistical mechanics, its temperature T must be given by P = exp(−hω/kT). Therefore

hω/kT = 4πmω

or equivalently,

T = h/4πmk = hc^2/4πGMk

This is known as the Hawking temperature; it gives us the critical temperature of the interior of the blackhole at which photons can tunnel through the critical radius, i.e., the temperature at which the blackhole can radiate out photons.
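For orientation, the exact semiclassical result is T = hc^3/8πGMk; the crude tunneling estimate above reproduces it up to a factor of order one. A quick numeric evaluation of the standard formula with SI constants (my own illustration; the constant values are the standard physical constants):

```python
import math

# Standard Hawking temperature T = hbar c^3 / (8 pi G M k) with SI constants
# (values are standard physical constants; formula is the textbook result).
hbar = 1.054571817e-34   # J s
c = 2.99792458e8         # m/s
G = 6.67430e-11          # m^3 kg^-1 s^-2
kB = 1.380649e-23        # J/K
Msun = 1.98892e30        # kg

def hawking_T(M):
    return hbar * c**3 / (8 * math.pi * G * M * kB)

print(hawking_T(Msun))   # about 6.2e-8 K: heavier blackholes are colder
```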

3.14 More on group theoretic image processing

Problem: Compute the Haar measure on SL(2,R) in terms of the Iwasawa decomposition

x = u(θ)a(t)n(s)

where

u(θ) = ( cos(θ) sin(θ) ; −sin(θ) cos(θ) ) = exp(θ(X − Y)),

a(t) = exp(tH) = diag(e^t, e^{−t}), n(s) = exp(sX) = ( 1 s ; 0 1 )


Solution: We first express the left invariant vector fields corresponding to H, X, Y as linear combinations of ∂/∂t, ∂/∂θ, ∂/∂s, and then the reciprocal of the associated determinant gives the Haar density. Specifically, writing

H^* = f11(θ, t, s)∂/∂θ + f12(θ, t, s)∂/∂t + f13(θ, t, s)∂/∂s,

X^* = f21(θ, t, s)∂/∂θ + f22(θ, t, s)∂/∂t + f23(θ, t, s)∂/∂s,

Y^* = f31(θ, t, s)∂/∂θ + f32(θ, t, s)∂/∂t + f33(θ, t, s)∂/∂s,

for the left invariant vector fields corresponding to H, X, Y respectively, it would follow that the Haar measure on SL(2,R) is given by

F(θ, t, s)dθdtds, F = |det((fij))|^{−1}

or more precisely, the Haar integral of a function φ(x) on SL(2,R) would be given by

∫_{SL(2,R)} φ(x)dx = ∫ φ(u(θ)a(t)n(s))F(θ, t, s)dθdtds

We have

d/ds(u(θ)a(t)n(s)) = u(θ)a(t)n(s)X

so

X^* = ∂/∂s

Then,

d/dt(u(θ)a(t)n(s)) = u(θ)a(t)Hn(s) = u(θ)a(t)n(s)·n(s)^{−1}Hn(s)

Now,

n(s)^{−1}Hn(s) = exp(−s·ad(X))(H) = H − s[X, H] = H + 2sX

So

H^* + 2sX^* = ∂/∂t

Finally,

d/dθ(u(θ)a(t)n(s)) = u(θ)(X − Y)a(t)n(s) = u(θ)a(t)n(s)·(n(s)^{−1}a(t)^{−1}(X − Y)a(t)n(s))

and

a(t)^{−1}(X − Y)a(t) = exp(−t·ad(H))(X − Y) = exp(−2t)X − exp(2t)Y

n(s)^{−1}a(t)^{−1}(X − Y)a(t)n(s) = n(s)^{−1}(exp(−2t)X − exp(2t)Y)n(s)

= exp(−2t)X − exp(2t)·exp(−s·ad(X))(Y)

exp(−s·ad(X))(Y) = Y − s[X, Y] + (s^2/2)[X, [X, Y]] = Y − sH − s^2X

so

exp(−2t)X^* − exp(2t)(Y^* − sH^* − s^2X^*) = ∂/∂θ


Problem: Solve these three equations and thereby express X^*, Y^*, H^* as linear combinations of ∂/∂s, ∂/∂t, ∂/∂θ.
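The computation can also be done symbolically from scratch. The sympy sketch below (my own construction; the matrices for H, X, Y are the standard sl(2) basis, an assumption) expresses the left-invariant fields in the coordinates (θ, t, s), solves for the coefficient matrix (fij), and recovers the Haar density |det((fij))|^{−1} = e^{2t}, the familiar e^{2ρ(log a)} factor for an Iwasawa decomposition:

```python
import sympy as sp

# Symbolic sketch (my construction; H, X, Y are the standard sl(2) basis):
# write the left-invariant fields in the coordinates (theta, t, s) and read
# off the Haar density as the reciprocal of |det(f_ij)|.
th, t, s = sp.symbols('theta t s', real=True)
H = sp.Matrix([[1, 0], [0, -1]])
X = sp.Matrix([[0, 1], [0, 0]])
Y = sp.Matrix([[0, 0], [1, 0]])
u = sp.Matrix([[sp.cos(th), sp.sin(th)], [-sp.sin(th), sp.cos(th)]])
g = u * sp.diag(sp.exp(t), sp.exp(-t)) * sp.Matrix([[1, s], [0, 1]])
frame = [g.diff(v) for v in (th, t, s)]   # coordinate frame of g
rows = []
for Z in (H, X, Y):
    c = sp.symbols('c0 c1 c2')
    # Z* at g is g*Z; solve g*Z = c0 dg/dtheta + c1 dg/dt + c2 dg/ds
    eqs = list(c[0]*frame[0] + c[1]*frame[1] + c[2]*frame[2] - g*Z)
    sol = sp.solve(eqs, c, dict=True)[0]
    rows.append([sp.simplify(sol[ci]) for ci in c])
det = sp.simplify(sp.Matrix(rows).det())
haar_density = sp.simplify(1 / sp.Abs(det))
print(det, haar_density)
```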

Problem: Suppose I(f, g) is the invariant functional for a pair of image fields f, g defined on SL(2,R). If f changes by a small random amount and so does g, then what is the rate functional of the change in I(f, g)? Suppose that for each irreducible character χ of G, we have constructed the invariant functional Iχ(f, g) for a pair of image fields f, g. Let Ĝ denote the set of all the irreducible characters of G. Consider the Plancherel formula for G:

f(e) = ∫_Ĝ χ(f)dµ(χ)

where µ is the Plancherel measure on Ĝ. In other words, we have

δe(g) = ∫_Ĝ χ(g)dµ(χ)

Then we can write

Iχ(f, g) = ∫_{G×G} f(x)g(y)χ(yx^{−1})dxdy, χ ∈ Ĝ
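Since SL(2,R) integrals are unwieldy, a finite group makes the invariance of Iχ concrete. In this sketch (entirely my own illustration) G = S3, χ is the character of its 2-dimensional standard representation, and the "image fields" f, g are arbitrary functions on G; because χ is a class function, Iχ(f, g) is unchanged under simultaneous left translation of f and g:

```python
from itertools import permutations

# Finite-group sanity check (my illustration, with S3 standing in for G):
# I_chi(f,g) = sum_{x,y} f(x) g(y) chi(y x^{-1}) is unchanged when f and g
# are simultaneously left-translated, because chi is a class function.
G = list(permutations(range(3)))                # S3 as permutation tuples
mul = lambda p, q: tuple(p[q[i]] for i in range(3))
inv = lambda p: tuple(sorted(range(3), key=lambda i: p[i]))
chi = lambda p: sum(p[i] == i for i in range(3)) - 1   # 2-dim standard irrep

f = {x: hash(x) % 7 + 1 for x in G}             # two arbitrary "image fields"
g = {x: hash(x) % 5 + 2 for x in G}
I = lambda f, g: sum(f[x] * g[y] * chi(mul(y, inv(x))) for x in G for y in G)

z = G[4]                                        # an arbitrary translation
f2 = {x: f[mul(z, x)] for x in G}
g2 = {x: g[mul(z, x)] for x in G}
print(I(f, g) == I(f2, g2))                     # True
```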

Let ψ(g) be any class function on G. Then, by Plancherel's formula, we have

ψ(g) = ψ ∗ δe(g) = ∫_G δe(h^{−1}g)ψ(h)dh = ∫ χ(h^{−1}g)ψ(h)dµ(χ)dh = ∫_Ĝ ψ ∗ χ(g)·dµ(χ)

and hence the invariant Iψ associated with a pair of image fields can be expressed as

Iψ(f, g) = ∫ f(x)g(y)ψ(yx^{−1})dxdy = ∫_{G×G×Ĝ} ψ ∗ χ(yx^{−1})f(x)g(y)dxdy·dµ(χ)

= ∫ ψ(h)χ(h^{−1}yx^{−1})f(x)g(y)dxdy·dµ(χ)dh

= ∫_{G×Ĝ} ψ(h)Iχ(f, g, h)dh·dµ(χ)

where we define

Iχ(f, g, h) = ∫_{G×G} f(x)g(y)χ(yx^{−1}h)dxdy, h ∈ G

Note that (f, g) → Iχ(f, g, h) is not an invariant. We define the Fourier transform of the class function ψ by

ψ(χ) = ∫_G ψ(h)χ(h^{−1})dh


Then, we have

δe(g) = ∫ χ(g)dµ(χ),

For any function f(g) on G, not necessarily a class function,

f(g) = ∫ δe(h^{−1}g)f(h)dh = ∫ χ(h^{−1}g)f(h)dh·dµ(χ)

Formally, let πχ denote an irreducible representation having character χ. Then, defining

f(πχ) = ∫_G f(g)πχ(g^{−1})dg = ∫_G f(g^{−1})πχ(g)dg

(Note that we are assuming G to be a unimodular group. In the case of SL(2,R), this is anyway true), we have

f(g) = ∫ Tr(πχ(h^{−1}g))f(h)dh·dµ(χ) = ∫_Ĝ Tr(f(πχ)πχ(g))dµ(χ)

and hence,

Iχ(f, g, h) = ∫_{G×G} f(x)g(y)χ(yx^{−1}h)dxdy

= ∫ f(x)g(y)Tr(πχ(y)πχ(x^{−1})πχ(h))dxdy

= Tr(g0(πχ)f(πχ)πχ(h))

where

g0(x) = g(x^{−1}), x ∈ G

Then, for a class function ψ,

Iψ(f, g) = ∫ ψ(h)Tr(g0(πχ)f(πχ)πχ(h))dh·dµ(χ)

but,

ψ(χ) = ∫ ψ(g)Tr(πχ(g^{−1}))dg = Tr(ψ(πχ))

We have,

∫ ψ(h)πχ(h)dh = ∫ ψ0(h)πχ(h^{−1})dh = ψ0(πχ)


We have, on the one hand

ψ(x) = ∫ ψ(h)χ(h^{−1}x)dh·dµ(χ)

and since ψ is a class function, ∀x, y ∈ G,

ψ(x) = ψ(yxy^{−1}) = ∫ ψ(h)χ(h^{−1}yxy^{−1})dh·dµ(χ)

= ∫ ψ(h)χ(y^{−1}h^{−1}yx)dh·dµ(χ)

= ∫ ψ(h)χ1(h, x, a)dh·dµ(χ)

where

χ1(h, x, a) = ∫_G a(y)χ(y^{−1}h^{−1}yx)dy = ∫_G a(y)χ(h^{−1}yxy^{−1})dy

where a(x) is any function on G satisfying

∫_G a(x)dx = 1

Remark: SL(2,R), under the adjoint action, acts as a subgroup of the Lorentz transformations in a plane, i.e., in the t−y−z space. Indeed, represent the space-time point (t, x, y, z) by

Φ(t, x, y, z) = ( t + z  x + iy ; x − iy  t − z )

Then define (t′, x′, y′, z′) by

Φ(t′, x′, y′, z′) = A.Φ(t, x, y, z)A∗

where

A ∈ SL(2,R)

Clearly, by taking determinants, we get

t′^2 − x′^2 − y′^2 − z′^2 = t^2 − x^2 − y^2 − z^2

and then choosing A = u(θ) = exp(θ(X − Y)), a(v) = exp(vH), n(s) = exp(sX) successively gives us the results that

t′ + z′ = t + z·cos(2θ) + x·sin(2θ), t′ − z′ = t − z·cos(2θ) − x·sin(2θ)

x′ + iy′ = x·cos(2θ) − z·sin(2θ) + iy


which means that

t′ = t, x′ = x.cos(2θ)− z.sin(2θ), z′ = x.sin(2θ) + z.cos(2θ), y′ = y

for the case A = u(θ), which corresponds to a rotation of the x−z plane around the y axis by an angle 2θ. The case A = a(v) gives

x′ + iy′ = x + iy, t′ + z′ = e^{2v}(t + z), t′ − z′ = e^{−2v}(t − z)

or equivalently,

x′ = x, y′ = y, t′ = t·cosh(2v) + z·sinh(2v), z′ = t·sinh(2v) + z·cosh(2v)

which corresponds to a Lorentz boost along the z direction with a velocity of u = −tanh(2v). Finally, the case A = n(s) gives

t′ = t + sx + s^2(t − z)/2, x′ = x + s(t − z), z′ = z + sx + s^2(t − z)/2, y′ = y
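The three cases can be verified numerically. The check below (my own illustration) pushes a point (t, x, y, z) through Φ′ = AΦA^* for A = u(θ) and A = n(s) and confirms the stated maps together with the invariance of t^2 − x^2 − y^2 − z^2 = det Φ:

```python
import numpy as np

# Numeric check (my illustration): Phi' = A Phi A* reproduces the stated maps
# and preserves t^2 - x^2 - y^2 - z^2 = det(Phi).
def Phi(t, x, y, z):
    return np.array([[t + z, x + 1j * y], [x - 1j * y, t - z]])

def unpack(P):
    return ((P[0, 0] + P[1, 1]).real / 2, P[0, 1].real,
            P[0, 1].imag, (P[0, 0] - P[1, 1]).real / 2)

t, x, y, z = 1.3, 0.4, -0.7, 0.9
th, s = 0.3, 0.5
u = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
n = np.array([[1.0, s], [0.0, 1.0]])

t1, x1, y1, z1 = unpack(u @ Phi(t, x, y, z) @ u.conj().T)
print(np.allclose([t1, x1, y1, z1],
                  [t, x*np.cos(2*th) - z*np.sin(2*th), y,
                   x*np.sin(2*th) + z*np.cos(2*th)]))        # rotation by 2*theta

t2, x2, y2, z2 = unpack(n @ Phi(t, x, y, z) @ n.conj().T)
print(np.allclose([t2, x2, y2, z2],
                  [t + s*x + s*s*(t - z)/2, x + s*(t - z),
                   y, z + s*x + s*s*(t - z)/2]))             # the parabolic case
print(np.isclose(np.linalg.det(n @ Phi(t, x, y, z) @ n.conj().T).real,
                 t*t - x*x - y*y - z*z))                     # invariance of the form
```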

This corresponds to a Lorentz boost along a definite direction in the xz plane. The direction and velocity of the boost are functions of s. To see what the one parameter group of transformations corresponding to A = n(s) corresponds to, we first evaluate it for infinitesimal s to get

t′ = t + sx, x′ = x + s(t − z), z′ = z + sx, y′ = y

This transformation can be viewed as a composite of two infinitesimal transformations, the first of which is

x → x − sz, z → z + sx, t → t

This transformation corresponds to an infinitesimal rotation of the xz plane around the y axis by an angle s. The second is

x → x + st, t → t + sx

which corresponds to an infinitesimal Lorentz boost along the x axis by a velocity of s. To see what the general non-infinitesimal case corresponds to, we assume that it corresponds to a boost with speed u along the unit vector (nx, 0, nz). Then the corresponding Lorentz transformation equations are

(x′, z′) = γ(u)(xnx + znz − ut)(nx, nz) + (x, z)− (xnx + znz)(nx, nz)

t′ = γ(u)(t− xnx − znz)

The first pair of equations is, in component form,

x′ = γ(xnx + znz − ut)nx + x− (xnx + znz)nx,

z′ = γ(xnx + znz − ut)nz + z − (xnx + znz)nz

To get the appropriate matching, we must therefore have

1 + s2/2 = γ(u), s = −γ(u)nx,−s2/2 = γ(u)nz,


which is impossible. Therefore, this transformation cannot correspond to a single Lorentz boost. However, it can be made to correspond to the composition of a rotation in the xz plane followed by a Lorentz boost in the same plane. Let Lx(u) denote the Lorentz boost along the x axis by a speed of u and let Ry(α) denote a rotation around the y axis by an angle α counterclockwise. Then, the finite transformation corresponding to A = n(s) is given by

L(s) = lim_{n→∞}(Lx(−s/n)Ry(−s/n))^n = exp(−s(Kx + Xy))

where Kx is the infinitesimal generator of Lorentz boosts along the x axis while Xy is the infinitesimal generator of rotations around the y axis. Specifically,

Kx = L′x(0), Xy = R′y(0)

The characters of the discrete series of SL(2,R). Let m be a nonpositive integer. Let e_{m−1} be a highest weight vector of a representation πm of sl(2,R). Then

π(X)e_{m−1} = 0, π(H)e_{m−1} = (m − 1)e_{m−1}

Define

π(Y)^r e_{m−1} = e_{m−1−2r}, r = 0, 1, 2, ...

Then

π(H)e_{m−1−2r} = π(H)π(Y)^r e_{m−1} = ([π(H), π(Y)^r] + π(Y)^r π(H))e_{m−1}

= (m − 1 − 2r)π(Y)^r e_{m−1} = (m − 1 − 2r)e_{m−1−2r}, r ≥ 0

Thus, considering the module V of SL(2,R) defined by the span of e_{m−1−2r}, r ≥ 0, we find that

Tr(π(a(t))) = Tr(π(exp(tH))) = ∑_{r≥0} exp(t(m − 1 − 2r)) = exp((m − 1)t)/(1 − exp(−2t)) = exp(mt)/(exp(t) − exp(−t))

Now,

π(X)e_{m−1−2r} = π(X)π(Y)^r e_{m−1} = [π(X), π(Y)^r]e_{m−1}

= (π(H)π(Y)^{r−1} + π(Y)π(H)π(Y)^{r−2} + ... + π(Y)^{r−1}π(H))e_{m−1}

= [−r(r − 1) + r(m − 1)]π(Y)^{r−1}e_{m−1} = r(m − r)e_{m+1−2r}

Also note that

π(Y)e_{m−1−2r} = e_{m−3−2r}, r ≥ 0

Problem: From these expressions, calculate Tr(π(u(θ))) where u(θ) = exp(θ(X − Y)). Show that it equals exp(imθ)/(exp(iθ) − exp(−iθ)). In this way, we are able to determine all the discrete series characters for SL(2,R).
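The geometric series in the trace computation above is easy to sanity-check numerically (my own quick check; m = −3, t = 0.7 are arbitrary values with m nonpositive and t > 0):

```python
import math

# Quick numeric check of the geometric series in the trace computation:
# sum_{r>=0} exp(t(m-1-2r)) = exp(mt)/(exp(t) - exp(-t)) for t > 0.
m, t = -3, 0.7               # arbitrary: m nonpositive, t positive
partial = sum(math.exp(t * (m - 1 - 2 * r)) for r in range(500))
closed = math.exp(m * t) / (math.exp(t) - math.exp(-t))
print(abs(partial - closed) < 1e-12)   # True
```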


3.15 A problem in quantum information theory

Let K be a quantum operation, i.e., a CPTP map. Its action on an operator X is given in terms of the Choi-Kraus representation by

K(X) = ∑_a Ea X Ea^*

Define the following inner product on the space of operators:

< X, Y > = Tr(XY^*)

Then, calculate K^* relative to this inner product in terms of the Choi-Kraus representation of K.

Answer:

< K(X), Y > = Tr(K(X)Y^*) = Tr(∑_a Ea X Ea^* Y^*) = ∑_a Tr(X Ea^* Y^* Ea)

= Tr(X·(∑_a Ea^* Y Ea)^*) = Tr(X·K^*(Y)^*) = < X, K^*(Y) >

where

K^*(X) = ∑_a Ea^* X Ea

Problem: Let A, B be matrices, A square and B rectangular of the appropriate size, so that

X = ( A  B ; B^*  I ) ≥ 0

Then deduce that

A ≥ BB^*

Solution: For any two column vectors u, v of appropriate size

(u∗, v∗)x

(

uv

)

≥ 0

This givesu∗Au+ u∗Bv + v∗B∗u+ v∗v ≥ 0

Takev = Cu

where C is any matrix of appropriate size. Then, we get

u∗(A+BC + C∗B∗ + C∗C)u ≥ 0

this being true for all u, we can write

A+BC + C∗B∗ + C∗C ≥ 0

for all matrices C of appropriate size. In particular, taking C = −B∗ gives us

A−BB∗ ≥ 0
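A quick numeric illustration of both directions of this equivalence (my own sketch; the dimension, seed and the 0.1 margin are arbitrary choices):

```python
import numpy as np

# Numeric illustration of the lemma (my sketch): [[A, B], [B*, I]] >= 0
# holds when A - B B* >= 0, and fails when it is not.
rng = np.random.default_rng(1)
d = 3
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = B @ B.conj().T + M @ M.conj().T        # A - B B* = M M* >= 0 by construction

Xblk = np.block([[A, B], [B.conj().T, np.eye(d)]])
print(np.linalg.eigvalsh(Xblk).min() >= -1e-9)     # block matrix is PSD: True

A2 = B @ B.conj().T - 0.1 * np.eye(d)      # now A - B B* < 0
X2 = np.block([[A2, B], [B.conj().T, np.eye(d)]])
print(np.linalg.eigvalsh(X2).min() < 0)            # and PSD fails: True
```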


3.16 Hawking radiation

Problem: For a spherically symmetric blackhole with a charge Q at its centre, solve the static Einstein-Maxwell equations to obtain the metric. Hence, calculate the equation of motion of a radially moving photon in this metric and determine the complex time taken by the photon to travel from just outside the critical blackhole radius to just inside, and denote this imaginary time by i∆. Note that ∆ will be a function of the parameters G, M, Q of the blackhole. Deduce that if T is the Hawking temperature for quantum emission of photons for such a charged blackhole, then at frequency ω,

exp(−hω/kT ) = exp(−ω∆)

and hence that the Hawking temperature is given by

T = h/(k∆)

Evaluate this quantity.

Let

D(ρ|σ) = Tr(ρ·(log(ρ) − log(σ)))

be the relative entropy between two states ρ, σ defined on the same Hilbert space. We know that it is non-negative and that if K is a quantum operation, i.e., a CPTP map, then

D(ρ|σ) ≥ D(K(ρ)|K(σ))

Consider three subsystems A, B, C and let ρ(ABC) be a joint state on these three subsystems. Let ρm = IB/d(B) be the completely mixed state on B, where IB is the identity operator on HB and d(B) = dim HB. Let K denote the quantum operation corresponding to partial trace on the B system. Then consider the inequality

D(ρ(ABC)|ρm ⊗ ρ(AC)) ≥ D(K(ρ(ABC))|K(ρm ⊗ ρ(AC))) = D(ρ(AC)|ρ(AC)) = 0

Evaluating gives us

Tr(ρ(ABC)log(ρ(ABC))) − Tr(ρ(ABC)·(log(ρ(AC)) − log(d(B)))) ≥ 0

or equivalently,

−H(ABC) + H(AC) + log(d(B)) ≥ 0

Now, let K denote partial tracing on the C-subsystem. Then,

D(ρ(ABC)|ρm ⊗ ρ(AC)) ≥ D(K(ρ(ABC))|K(ρm ⊗ ρ(AC)))

gives

−H(ABC) + H(AC) + log(d(B)) ≥ D(ρ(AB)|ρm ⊗ ρ(A))

= −H(AB) − Tr(ρ(AB)·(log(ρ(A)) − log(d(B)))) = −H(AB) + H(A) + log(d(B))

giving

H(AB) + H(AC) − H(A) − H(ABC) ≥ 0

which is the strong subadditivity property of quantum entropy.

Let the metric of space-time be given in spherical polar coordinates by

dτ^2 = A(r)dt^2 − B(r)dr^2 − C(r)(dθ^2 + sin^2(θ)dφ^2)

The equation of the radial null geodesic is given by

A(r) − B(r)(dr/dt)^2 = 0

or equivalently, for an incoming radial geodesic,

dr/dt = −√(A(r)/B(r)) = −F(r),

say, where

F(r) = √(A(r)/B(r))

Let F (r) have a zero at the critical radius r0 so that we can write

F (r) = G(r)(r − r0)

where G(r0) ≠ 0. Then, the time taken for the photon to go from r0 + 0 to r0 − 0 is given by

∆ = ∫_{r0−0}^{r0+0} dr/((r − r0)G(r)) = G(r0)^{−1}·lim_{ε↓0}(log(ε) − log(−ε)) = −iπ·G(r0)^{−1}

so that the change in the probability amplitude of a photon pulse of frequency ω is by the factor exp(−ωπ·G(r0)^{−1}), and equating this to the Gibbsian probability exp(−hω/kT) gives us the Hawking temperature as

T = h·G(r0)/πk

Chapter 4

Research topics on representations

Using the Schrodinger equation to track the time varying probability density of a signal. The Schrodinger equation naturally generates a family of pdf's as the modulus square of the evolving wave function, and by controlling the time varying potential in which the Schrodinger evolution takes place, it is possible to track a time varying pdf, i.e., encode this time varying pdf into the weights of a neural network. From the stored weights, we can synthesize the time varying pdf of the signal.

4.1 Remarks on induced representations and Haar integrals

Let G be a compact group. Let χ be a character of an Abelian subgroup N of G in a semidirect product G = HN. Let U = Ind_N^G χ be the induced representation. Suppose a function f on G is orthogonal to all the matrix elements of U. Then we can express this condition as follows. First, the representation space V of U consists of all functions h : G → C for which h(xn) = χ(n)^{−1}h(x) ∀x ∈ G, n ∈ N, and U acts on such a function to give U(y)h(x) = h(y^{−1}x), x, y ∈ G. The condition

∫_G f(x)U(x)dx = 0

gives

∫_G f(x)h(x^{−1}y)dx = 0 ∀y ∈ G, ∀h ∈ V

Now for h ∈ V, we have

h(xn) = χ(n)^{−1}h(x), x ∈ G, n ∈ N



and hence for any representation π of G,

∫ h(xn)π(x^{−1})dx = χ(n)^{−1}h(π)

or

∫ h(x)π(n·x^{−1})dx = χ(n)^{−1}h(π)

or

π(n)h(π) = χ(n^{−1})h(π), n ∈ N

Define

Pπ,N = ∫_N π(n)dn

Assume without loss of generality that π is a unitary representation of G. Then, Pπ,N is the orthogonal projection onto the space of all those vectors w in the representation space of π for which π(n)w = w ∀n ∈ N. Then the above equation gives, on integrating w.r.t. n ∈ N,

Pπ,N h(π) = 0, h ∈ V, χ ≠ 1

and

Pπ,N h(π) = h(π), h ∈ V, χ = 1

on using the Schur orthogonality relations for the characters of N. We therefore have: the equation

∫_G f(x)h(x^{−1}y)dx = 0

gives us

∫_G f(x)h(x^{−1}y)π(y^{−1})dxdy = 0

or

∫ f(x)h(z)π(z^{−1}x^{−1})dxdz = 0

or

h(π)f(π) = 0, h ∈ V

It also implies

∫ f(x)h(x^{−1}y)π(y)dxdy = 0

or

∫ f(x)h(z)π(xz)dxdz = 0

or

f(π)^* h(π)^* = 0

which is the same as the above. Suppose on the other hand that f satisfies

∫_N f(xny)dn = 0, x, y ∈ G −−−(1)


Then, it follows that

∫_{G×N} f(xny)π(y^{−1})dydn = 0

which implies

∫_{G×N} f(z)π(z^{−1}xn)dzdn = 0

This implies

f(π)π(x)Pπ,N = 0, x ∈ G

It also implies

∫ f(xny)π(x^{−1})dxdn = 0

which implies

∫ f(z)π(nyz^{−1})dzdn = 0

or equivalently,

Pπ,N π(y)f(π) = 0, y ∈ G

In particular, (1) implies

Pπ,N f(π) = f(π)Pπ,N = 0

for all representations π of G. The condition (1) implies that Pπ(y)f(π) = 0, where P = Pπ,N, for all representations π, and hence, for all functions h on G, it also implies that P h(π)f(π) = 0. More precisely, it implies that for all representations π and all functions h on G,

Pπ,N h(π)f(π) = 0 −−−(2)

Now suppose that h1 belongs to the representation space of U = Ind_N^G χ0, where χ0 is the identity character on N, i.e., the one dimensional unitary representation of N equal to unity at all elements of N. Then, it is clear that h1 can be expressed as

h1(x) = ∫_N h(xn)dn, x ∈ G

for some function h on G, and conversely, if h is an arbitrary function on G for which the above integral exists, then h1 will belong to the representation space of U. That is because h1 as defined above satisfies

h1(xn) = h1(x), x ∈ G, n ∈ N

The condition that f be orthogonal to the matrix elements of U is that

∫ f(x)U(x^{−1})dx = 0


and this condition is equivalent to the condition that for all h on G,

∫_G f(x)h1(xy)dx = 0, y ∈ G, h1 = ∫_N h(xn)dn

and this condition is in turn equivalent to

∫ f(x)h1(xy)π(y^{−1})dxdy = 0

or

∫ f(x)h1(z)π(z^{−1}x)dxdz = 0

or equivalently,

h1(π)f(π)^* = 0

for all unitary representations π of G and for all functions h on G. Now,

h1(π) = ∫ h(xn)π(x^{−1})dxdn = ∫ h(z)π(nz^{−1})dzdn = P·h(π)

and hence the condition is equivalent to

Pπ,N h(π)f(π)^* = 0 −−−(3)

for all unitary representations π of G and for all functions h on G. This condition is the same as (2) if we observe that f(π) can be replaced by f(π)^*, because f(x) being orthogonal to the matrix elements of U(x) is equivalent to f(x^{−1}) being orthogonal to the same, since for compact groups a finite dimensional representation is equivalent to a unitary representation, which means that U in an appropriate basis is a unitary representation. Thus, we have proved that a necessary and sufficient condition that a function f be orthogonal to all the matrix elements of the representation of G induced by the identity character χ0 on N is that f be a cusp form, i.e., it should satisfy

∫_N f(xny)dn = 0 ∀x, y ∈ G

Now suppose χ is an arbitrary character of N (i.e., a one dimensional unitary representation of N) and let U = Ind_N^G χ. If h1 belongs to the representation space of U, then h1(xn) = χ(n)^{−1}h1(x), x ∈ G, n ∈ N. Now let h(x) be any function on G. Define

h1(x) = ∫_N h(xn)χ(n)dn

Then,

h1(xn0) = ∫_N h(xn0n)χ(n)dn = ∫ h(xn)χ(n0^{−1}n)dn = χ(n0)^{−1}h1(x), n0 ∈ N, x ∈ G


It follows that h1 belongs to the representation space of U. Conversely, if h1 belongs to this representation space, then

h1(x) = ∫_N h1(xn)χ(n)dn

Thus we have shown that h1 belongs to the representation space of U = Ind_N^G χ iff there exists a function h on G such that

h1(x) = ∫ h(xn)χ(n)dn

Note that this condition fails for non-compact groups, since the integral dn on N cannot be normalized in general. Then, the condition for f to be orthogonal to U is that

∫_G f(x)U(x^{−1})dx = 0 −−−(4)

and this condition is equivalent to

∫ f(x)h1(xy)dx = 0

for all h1 in the representation space of U, or equivalently,

∫ f(x)h(xyn)χ(n)dxdn = 0

for all functions h on G. This condition is equivalent to

∫ f(x)h(xyn)χ(n)π(y^{−1})dxdydn = 0

for all unitary representations π of G, which is in turn equivalent to

∫_{G×G×N} f(x)h(z)χ(n)π(nz^{−1}x)dxdzdn = 0

or equivalently,

Pχ,π,N·h(π)·f(π)^* = 0 −−−(5)

for all functions h on G and for all unitary representations π of G, where

Pχ,π,N = P = ∫_N χ(n)π(n)dn

Now consider the following condition on a function f defined on G:

∫_N f(xny)χ(n)dn = 0, x, y ∈ G

This condition is equivalent to

∫_{G×N} f(xny)χ(n)π(y^{−1})dydn = 0


for all unitary representations π of G, or equivalently,

∫_{G×N} f(z)χ(n)π(z^{−1}xn)dzdn = 0

and this is equivalent to

f(π)π(x)P = 0, x ∈ G −−−(6)

which is in turn equivalent to

f(π)h(π)P = 0 −−−(7)

for all functions h on G, as follows by multiplying (6) with h(x^{−1}) and integrating over G. This condition is the same as

Pχ,π,N h(π)^* f(π)^* = 0

or, since h(x) can be replaced by h(x^{−1}), the condition is equivalent to

Pχ,π,N h(π)f(π)^* = 0 −−−(8)

for all functions h on G and all unitary representations π of G. This condition is the same as (5). Thus, we have established that a necessary and sufficient condition for f to be orthogonal to all the matrix elements of U = Ind_N^G χ is that f should satisfy

∫_N f(xny)χ(n)dn = 0, x, y ∈ G

In other words, f should be a generalized cusp form corresponding to the character χ of N.

4.2 On the dynamics of breathing in the presence of a mask

Let V denote the volume between the face and the mask. Oxygen passes from outside the mask to within, which the person breathes in. While breathing out, the person ejects CO2 along with, of course, some fraction of oxygen. Let no(t) denote the number of oxygen molecules per unit volume at time t and nc(t) denote the number of carbon-dioxide molecules per unit volume at time t in the region between the face and the mask. Let nr(t) denote the number of molecules of the other gases in the same region per unit volume. Let λc(t)dt denote the probability of a CO2 molecule escaping from the volume between the mask and the face in time [t, t+dt], and let gc(t) denote the number of CO2 molecules being ejected from the nose and mouth of the person per unit time. Then, we have a simple conservation equation

dnc(t)/dt = −nc(t)λc(t) + gc(t)/V

To this differential equation, we can add stochastic terms and express it as

dnc(t)/dt = −nc(t)λc(t) + gc(t)/V + wc(t)

We can ask the question of determining the statistics of the first time τc at which the CO2 concentration nc(t) exceeds a certain threshold Nc. Sometimes there can be a chemical reaction between the CO2 trapped in the volume with some other gases. Likewise, other gases within the volume can react and produce CO2. If we take into account these chemical reactions, then there will be additional terms in the above equation describing the rate of change of concentration of CO2. For example, consider a reaction of the form

a.CO2 + b.X → c.Y + d.Z

where a, b, c, d are positive integers and X denotes the other gases, i.e., gases other than CO2, within the volume. At time t = 0, there are nc(0) molecules of CO2, nX(0) molecules of X and zero molecules of Y and Z. Let K denote the rate constant at a fixed temperature T. Then, we have an equation of the form

dnc(t)/dt = −K·nc(t)^a·nX(t)^b −−−(1)

according to the law of mass action. After time t, therefore, nc(0) − nc(t) molecules of CO2 have reacted with (nc(0) − nc(t))b/a molecules of X. Thus,

nX(t) = nX(0) − (nc(0) − nc(t))b/a

from which we deduce

dnX(t)/dt = (b/a)dnc(t)/dt = (−Kb/a)nc(t)^a·nX(t)^b −−−(2)

Equations (1) and (2) have to be jointly solved for nc(t) and nX(t). This can be reduced to a single differential equation for nc(t):

dnc(t)/dt = −K·nc(t)^a·(nX(0) − (nc(0) − nc(t))b/a)^b −−−(3)
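Equation (3) is a one-dimensional autonomous ODE and is easy to integrate numerically. A minimal forward-Euler sketch (all rate constants and initial counts are made-up illustrative values, not from the text); with a = b = 1 the reaction stops when the limiting reagent X is exhausted, i.e., nc → nc(0) − (a/b)·nX(0):

```python
# Forward-Euler sketch of equation (3); K, a, b and the initial counts are
# made-up illustrative values, not from the text.
K, a, b = 0.05, 1, 1
nc0, nX0 = 100.0, 80.0                 # X is the limiting reagent
dt, steps = 1e-3, 200000
nc = nc0
for _ in range(steps):
    nX = nX0 - (nc0 - nc) * b / a      # remaining X, from the stoichiometry
    nc += dt * (-K * nc**a * max(nX, 0.0)**b)
print(nc)   # approaches nc0 - (a/b)*nX0 = 20: the reaction stops when X runs out
```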

Likewise, we could also have a chemical reaction in which CO2 is produced by a reaction between two compounds U, V. We assume that such a reaction is of the form

p·U + q·V → r·CO2 + s·M

Let nU(0), nV(0) denote the number of molecules of U and V at time t = 0. Then, the rate at which their concentrations change with time is again given by the law of mass action:

dnU(t)/dt = −K1·nU(t)^p·nV(t)^q, nV(t) = nV(0) − q(nU(0) − nU(t))/p

and then the number of CO2 molecules generated after time t due to this reaction will be given by

nc(t) = (r/p)(nU(0) − nU(t))

and also

nV (t) = nV (0)− (q/p)(nU (0)− nU (t)) = nV (0)− (q/r)nc(t)

and hence

dnc(t)/dt = (−r/p)dnU(t)/dt = (r/p)K1·nU(t)^p·nV(t)^q = (K1r/p)(nU(0) − p·nc(t)/r)^p·(nV(0) − (q/r)nc(t))^q

Now consider the combination of these two processes. Writing the two reactions down explicitly, we have

a·CO2 + b·X → c·Y + d·Z, p·U + q·V → r·CO2 + s·Z

At time t = 0, we have initial conditions on nc(0), nX(0), nU(0), nV(0). Then the rate at which CO2 gets generated by the combination of these two reactions and the other non-chemical-reaction processes of diffusion, exit etc. is given by

dnc(t)/dt = −K·nc(t)^a·nX(t)^b + (K1r/p)nU(t)^p·nV(t)^q + f(nc(t), t)

where

dnX(t)/dt = (−Kb/a)nc(t)^a·nX(t)^b, dnU(t)/dt = −K1·nU(t)^p·nV(t)^q, dnV(t)/dt = (−K1q/p)nU(t)^p·nV(t)^q

Here, f(nc(t), t) denotes the rate at which the CO2 concentration changes due to the other effects described above. Of course, we could have more complex chemical reactions of the sort

a0·CO2 + a1·X1 + ... + an·Xn → b1·Y1 + ... + bm·Ym,

for which the law of mass action would give

dnc(t)/dt = −K·nc(t)^{a0}·nX1(t)^{a1}···nXn(t)^{an},

nXj(t) = nXj(0) − (aj/a0)(nc(0) − nc(t)), j = 1, 2, ..., n,

nYj(t) = nYj(0) + (bj/a0)(nc(0) − nc(t)), j = 1, 2, ..., m


4.3 Plancherel formula for SL(2,R)

Let G = KAN be the usual Iwasawa decomposition and let B denote the compact/elliptic Cartan subgroup B = {exp(θ(X − Y)) : θ ∈ [0, 2π)}. Let L denote the non-compact/hyperbolic Cartan subgroup L = {exp(tH) : t ∈ R}. Let GB denote the elliptic set {gBg^{−1} : g ∈ G} of conjugates of B, and likewise GL will denote the hyperbolic set {gLg^{−1} : g ∈ G}. These two sets have only the identity element in common. Now let Θm denote the discrete series characters. We know that

Θm(u(θ)) = −sgn(m)·exp(imθ)/(exp(iθ) − exp(−iθ)), m ∈ Z, u(θ) = exp(θ(X − Y)),

Θm(a(t)) = exp(mt)/(exp(t) − exp(−t)), a(t) = exp(tH)

Weyl’s integration formula gives

Θm(f) = ∫_G Θm(x)f(x)dx = ∫_0^{2π} Θm(u(θ))Ff,B(θ)∆B(θ)dθ + ∫_R Θm(a(t))Ff,L(t)∆L(t)dt

= ∫ sgn(m)exp(imθ)Ff,B(θ)dθ + ∫_R exp(−|mt|)Ff,L(t)dt

= sgn(m)Ff,B(m) + ∫_R exp(−|mt|)Ff,L(t)dt

where Ff,B and Ff,L are respectively the orbital integrals of f corresponding to the elliptic and hyperbolic groups, and if a(θ) is a function of θ, then a(m) denotes its Fourier transform, i.e., its Fourier series coefficients. Thus,

Ff,B(θ) = ∆B(θ) ∫_{G/B} f(xu(θ)x^{−1})dx,

Ff,L(a(t)) = ∆L(t) ∫_{G/L} f(xa(t)x^{−1})dx

where

∆B(θ) = exp(iθ) − exp(−iθ), ∆L(t) = exp(t) − exp(−t)

Now,

∑_{m odd} |m|Θm(f) = ∫_0^{2π} ∆B(θ)Ff,B(θ)·∑_{m odd} |m|Θm(u(θ))dθ + ∫_R ∆L(t)Ff,L(t)·∑_{m odd} |m|Θm(a(t))dt


Now,

∑_{m odd} |m|Θm(a(t)) = ∑_{m odd} |m|exp(−|mt|)/(exp(t) − exp(−t))

∑_{m≥1, m odd} m·exp(−mt) = ∑_{k≥0} (2k + 1)exp(−(2k + 1)t)

∑_{m≥0} m·exp(−mt) = −d/dt(1 − exp(−t))^{−1} = exp(−t)(1 − exp(−t))^{−2}, t ≥ 0

So

∑_{k≥0} (2k + 1)exp(−(2k + 1)t) = exp(−t)(2exp(−2t)(1 − exp(−2t))^{−2} + (1 − exp(−2t))^{−1})

= exp(−t)(1 + exp(−2t))/(1 − exp(−2t))^2 = (exp(t) + exp(−t))/(exp(t) − exp(−t))^2
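The resummation above is easy to sanity-check numerically (my own quick check; t = 0.9 is an arbitrary positive value):

```python
import math

# Quick numeric check of the resummation above:
# sum_{k>=0} (2k+1) exp(-(2k+1)t) = (e^t + e^-t)/(e^t - e^-t)^2 for t > 0.
t = 0.9
partial = sum((2*k + 1) * math.exp(-(2*k + 1) * t) for k in range(400))
closed = (math.exp(t) + math.exp(-t)) / (math.exp(t) - math.exp(-t))**2
print(abs(partial - closed) < 1e-12)   # True
```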

Thus,

∑_{m odd} |m|exp(−|mt|) = 2∑_{k≥0} (2k + 1)exp(−(2k + 1)|t|) = 2(exp(|t|) + exp(−|t|))/(exp(|t|) − exp(−|t|))^2 = w1(t)

say. Similarly,

∑_{m odd} |m|Θm(u(θ)) = −∑_{m odd} m·exp(imθ)/(exp(iθ) − exp(−iθ)) = w2(θ)

say. Then, the Fourier transform of F′f,B(θ) (i.e., its Fourier series coefficients) is given by

F′f,B(m) = ∫ F′f,B(θ)exp(imθ)dθ = −∫ Ff,B(θ)(im)exp(imθ)dθ = −im·Ff,B(m)

Now, we've already seen that

Θm(f) = sgn(m)Ff,B(m) + ∫_R exp(−|mt|)Ff,L(t)dt

So

−imΘm(f) = sgn(m)(−im·Ff,B(m)) − im∫ exp(−|mt|)Ff,L(t)dt = sgn(m)F′f,B(m) − im∫ exp(−|mt|)Ff,L(t)dt

Multiplying both sides by sgn(m) gives us

−i|m|Θm(f) = F′f,B(m) − i|m|∫ exp(−|mt|)Ff,L(t)dt


Summing over odd m gives

−i∑_{m odd} |m|Θm(f) = ∑_{m odd} F′f,B(m) − i∫ w1(t)Ff,L(t)dt

Now, ∑_m F′f,B(m) = F′f,B(0), so that

−i∑_{m odd} |m|Θm(f) = F′f,B(0) − i∫ w1(t)Ff,L(t)dt

Now,

Ff,B(θ) = (exp(iθ) − exp(−iθ)) ∫_{G/B} f(xu(θ)x^{−1})dx

which gives, on differentiating w.r.t. θ at θ = 0,

F′f,B(0) = 2if(1)

Thus, we get the celebrated Plancherel formula for SL(2,R):

2f(1) = −∑_m |m|Θm(f) + ∫ w1(t)Ff,L(t)dt

Now we know that the Fourier transform

Ff,L(µ) = ∫_R Ff,L(t)exp(itµ)dt

equals Tiµ(f), the character of the principal series corresponding to iµ, µ ∈ R. Thus, defining

w1(µ) = ∫_R w1(t)exp(itµ)dt

the Plancherel formula can be expressed in explicit form:

2f(1) = −∑_m |m|Θm(f) + ∫_R w1(µ)^*·Tiµ(f)dµ

This formula was generalized to all semisimple Lie groups and also to p-adic groups by Harish-Chandra.

4.4 Some applications of the Plancherel formula for a Lie group to image processing

Let G be a group and let its irreducible characters be χ, χ ∈ Ĝ. Let dµ(χ) be its Plancherel measure:

δe(g) = ∫ χ(g)·dµ(χ)

or equivalently,

f(e) = ∫_Ĝ χ(f)dµ(χ)

for all f in an appropriate Schwartz space. Consider an invariant for image pairs of the form

Iχ(f1, f2) = ∫_{G×G} f1(x)χ(xy^{−1})f2(y)dxdy

Then if ψ is any class function on G, we have

∫_Ĝ χ(ψ0)·Iχ(f1, f2)dµ(χ) = ∫ f1(x)χ(ψ0)χ(xy^{−1})f2(y)dxdydµ(χ) = ∫ f1(x)ψ1(xy^{−1})f2(y)dxdy

where

ψ1(x) = ∫ χ(ψ0)χ(x)dµ(χ), ψ0(x) = ψ(x^{−1})

so that

χ(ψ0) = ∫ ψ(x^{−1})χ(x)dx

Remark: Let πχ be an irreducible representation having character χ. Then, we can write

f(x) = ∫ Tr(f(πχ)πχ(x))dµ(χ)

where

f(πχ) = ∫ f(x)πχ(x^{−1})dx

Thus, in particular, if f is a class function,

f(x) = ∫ f(y)Tr(πχ(y^{−1})πχ(x))dµ(χ)dy = ∫ f(y)χ(y^{−1}x)dµ(χ)dy

as follows from the Plancherel formula.

Problem: What is the relationship between ψ1(x) and ψ(x) when ψ is a class function?

Note that if the group is compact, then if ψ is a class function and π is an irreducible unitary representation of dimension d, we have

ψ(π) = ∫ ψ(y)π(y)^* dy = ∫ ψ(xyx^{−1})π(y^{−1})dydx = ∫ ψ(y)π(x^{−1}y^{−1}x)dydx


Now,

∫ πab(x^{−1}yx)dx = ∫ πac(x^{−1})πcd(y)πdb(x)dx = πcd(y) ∫ πca(x)^*·πdb(x)dx = d^{−1}πcd(y)δ(c, d)δ(a, b)

= d^{−1}πcc(y)δ(a, b) = d^{−1}χ(y)δ(a, b)

where χ = Tr(π) is the character of π (here πac(x^{−1}) = πca(x)^* since π is unitary, and we have used the Schur orthogonality relations). This means that

ψ(π) = d^{−1}χ(ψ0)·I
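The averaging identity behind this formula can be checked on a finite group. The sketch below (my own illustration) uses the 2-dimensional standard irrep of S3, realized by a 120-degree rotation and a flip, and verifies that (1/|G|)∑_x π(x)^{−1}π(y)π(x) = (χ(y)/d)·I for every y:

```python
import numpy as np

# Finite-group check (my illustration): for an irreducible unitary pi of
# dimension d, (1/|G|) sum_x pi(x)^{-1} pi(y) pi(x) = (chi(y)/d) I.
# G = S3 via its 2-dim standard irrep: a 120-degree rotation and a flip.
c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
r = np.array([[c, -s], [s, c]])
f = np.array([[1.0, 0.0], [0.0, -1.0]])
G = [np.eye(2), r, r @ r, f, r @ f, r @ r @ f]
ok = True
for y in G:
    avg = sum(x.T @ y @ x for x in G) / len(G)   # x^{-1} = x.T (orthogonal)
    ok = ok and np.allclose(avg, (np.trace(y) / 2) * np.eye(2))
print(ok)   # True
```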

Thus, for a class function on a compact group, where the Plancherel measure assigns the mass d(χ) to each χ ∈ Ĝ, we have

ψ(g) = ∑_{χ∈Ĝ} d(χ)Tr(ψ(πχ)πχ(g))

= ∑_{χ∈Ĝ} χ(ψ0)χ(g)

= ∑_{χ∈Ĝ} χ(g) ∫ ψ(x)χ(x^{−1})dx

This is the same as saying that

δg(x) = ∑_{χ∈Ĝ} χ(g^{−1})χ(x)

Is this result true for an arbitrary semisimple Lie group when we replace the sum weighted by the formal dimensions d(χ) by the Plancherel measure on the space of irreducible characters? In other words, is the result

δg(x) = ∫_Ĝ χ(g^{−1})χ(x)dµ(χ)

true?


Chapter 5

Some problems in quantum information theory

A problem in quantum information theory used in the proof of the coding theorem for Cq channels: Let S, T be two non-negative operators with S ≤ 1, i.e., 0 ≤ S ≤ 1, T ≥ 0. Then

1 − (S + T)^{−1/2}S(S + T)^{−1/2} ≤ 2 − 2S + 4T

or equivalently,

(S + T)^{−1/2}S(S + T)^{−1/2} ≥ 2S − 1 − 4T

To prove this, we define the operators

A =√T ((T + S)−1/2 − 1), B =

√T

Then, the equation(A−B)∗(A−B) ≥ 0

or equivalently,A∗A+B∗B ≥ A∗B +B∗A

gives

((T+S)−1/2−1)T+T ((T+S)−1/2−1) ≤ T+((T+S)−1/2−1)T ((T+S)−1/2−1)

This equation can be rearranged go give

2(T + S)−1/2T + 2T (T + S)−1/2 − 4T ≤ (T + S)−1/2T.(T + S)−1/2

= (T + S)−1/2(T + S − S)(T + S)−1/2 = 1− (T + S)−1/2S(T + S)−1/2

A rearrangement of this equation gives

4T+1+(S+T )−1/2S(S+T )−1/2 ≥ 2(S+T )−1/2S(S+T )−1/2+2(S+T )−1/2T+2T (S+T )−1/2



So to prove the stated result, it suffices to prove that

(S+T)^{-1/2}S(S+T)^{-1/2} + (S+T)^{-1/2}T + T(S+T)^{-1/2} ≥ S
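The operator inequality above can be sanity-checked numerically. The following numpy sketch (my own illustration, not from the text; the random-matrix setup is an assumption) draws random operators 0 ≤ S ≤ 1, T ≥ 0 and verifies that 2 − 2S + 4T − [1 − (S+T)^{-1/2}S(S+T)^{-1/2}] is positive semidefinite:

```python
import numpy as np
from numpy.linalg import eigvalsh

rng = np.random.default_rng(0)

def rand_psd(n):
    # random positive definite Hermitian matrix
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X @ X.conj().T) / n

def inv_sqrt(M):
    # M^(-1/2) for a positive definite Hermitian matrix
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.conj().T

ok = True
for _ in range(100):
    n = 4
    S = rand_psd(n)
    S = S / max(1.0, eigvalsh(S).max())   # enforce 0 <= S <= 1
    T = rand_psd(n)
    R = inv_sqrt(S + T)
    lhs = np.eye(n) - R @ S @ R           # 1 - (S+T)^(-1/2) S (S+T)^(-1/2)
    rhs = 2 * np.eye(n) - 2 * S + 4 * T   # 2 - 2S + 4T
    # operator inequality lhs <= rhs means rhs - lhs is positive semidefinite
    ok &= eigvalsh(rhs - lhs).min() > -1e-9
```

Positive semidefiniteness is checked here via the smallest eigenvalue of the difference.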

Let A, B be two states. The hypothesis testing problem amounts to choosing an operator 0 ≤ T ≤ 1 such that

C(T) = Tr(A(1-T)) + Tr(BT)

is a minimum. Now,

C(T) = 1 + Tr((B-A)T)

It is clear that this is a minimum iff

T = {B-A ≤ 0} = {A-B ≥ 0}

i.e., T is the projection onto the space spanned by those eigenvectors of B-A that have non-positive eigenvalues, or equivalently the space spanned by those eigenvectors of A-B that have non-negative eigenvalues. Here {A < B} denotes the spectral projection of A-B corresponding to its negative eigenvalues, and similarly for the other brace expressions. The minimum value of the cost function is then

C(min) = Tr(A{A < B}) + Tr(B{A > B})

Note that the minimum is attained when T is a projection.

An inequality: Let 0 ≤ s ≤ 1. Then if A, B are any two positive operators, we have

Tr(A{A < B}) ≤ Tr(A^{1-s}B^s)

To see this we note that

Tr(A^{1-s}B^s) ≤ Tr((B + (A-B)_+)^{1-s}B^s) ≤ Tr((B + (A-B)_+)^{1-s}(B + (A-B)_+)^s)

= Tr(B + (A-B)_+) = Tr(B) + Tr((A-B)_+) = Tr(B) + Tr((A-B){A-B > 0})

= Tr(B{A-B ≤ 0}) + Tr(A{A-B > 0})

Other related inequalities are as follows:

Tr(A^{1-s}B^s) ≥ Tr((B - (B-A)_+)^{1-s}B^s) ≥ Tr((B - (B-A)_+)^{1-s}(B - (B-A)_+)^s) = Tr(B - (B-A)_+)

= Tr(B) - Tr((B-A)_+) = Tr(B) - Tr((B-A){B > A}) = Tr(B{A > B}) + Tr(A{B > A})

In particular, when A, B are states, the minimum probability of discrimination error has the following upper bound:

P(error) = Tr(A{B > A}) + Tr(B{A > B}) ≤ min_{0≤s≤1} Tr(A^{1-s}B^s) = min_{0≤s≤1} exp(φ(s|A|B))

where

φ(s|A|B) = log(Tr(A^{1-s}B^s))

To obtain the minimum in the upper bound, we set the derivative w.r.t. s to zero to get

(d/ds)Tr(A^{1-s}B^s) = 0

or

Tr(log(A)A^{1-s}B^s) = Tr(A^{1-s}B^s log(B))

It is not clear how to solve this equation for s. Now define

ψ_1(s) = log(Tr(A^{1-s}B^s))/s, ψ_2(s) = log(Tr(A^{1-s}B^s))/(1-s)

Then an elementary application of L'Hopital's rule gives us

ψ_1(0+) = lim_{s↓0} ψ_1(s) = -D(A|B) = -Tr(A(log(A) - log(B))),

ψ_2(1-) = lim_{s↑1} ψ_2(s) = -D(B|A) = -Tr(B(log(B) - log(A)))
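The discrimination bound above is easy to check numerically. The sketch below (my own illustration, not from the text; it uses random qubit states and a grid search over s) verifies P(error) ≤ min_{0≤s≤1} Tr(A^{1-s}B^s):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_state(n):
    # random density matrix
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = X @ X.conj().T
    return R / np.trace(R).real

def mpow(M, a):
    # M^a for a Hermitian PSD matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0, None) ** a) @ V.conj().T

A, B = rand_state(2), rand_state(2)

# exact minimum discrimination error: Tr(A{B>A}) + Tr(B{A>B})
w, V = np.linalg.eigh(A - B)
P_pos = V[:, w > 0] @ V[:, w > 0].conj().T        # projection {A > B}
p_err = np.trace(A @ (np.eye(2) - P_pos)).real + np.trace(B @ P_pos).real

# quantum Chernoff-type upper bound: min over s in [0,1] of Tr(A^(1-s) B^s)
s_grid = np.linspace(0.0, 1.0, 201)
bound = min(np.trace(mpow(A, 1 - s) @ mpow(B, s)).real for s in s_grid)
```

The grid minimum over s is an over-estimate of the true minimum, so the inequality p_err ≤ bound must still hold.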

Consider now the problem of distinguishing between tensor product states A^{⊗n} and B^{⊗n}. Let P_n(e) denote the minimum error probability. Then, by the above,

P_n(e) ≤ min_{0≤s≤1} exp(log Tr((A^{⊗n})^{1-s}(B^{⊗n})^s))

Now,

Tr((A^{⊗n})^{1-s}(B^{⊗n})^s) = (Tr(A^{1-s}B^s))^n

Therefore, we can write

limsup_{n→∞} n^{-1}log(P_n(e)) ≤ min_{0≤s≤1} log Tr(A^{1-s}B^s)

Define

F(s) = Tr(A^{1-s}B^s)

Then,

F'(s) = -Tr(A^{1-s}log(A)B^s) + Tr(A^{1-s}B^s log(B))

F''(s) = Tr(A^{1-s}(log A)^2 B^s) + Tr(A^{1-s}B^s(log B)^2) - 2Tr(A^{1-s}log(A)B^s log(B))

Expanding in the eigenbases of A and B (with eigenvalues a_i, b_j and eigenvectors |e_i>, |f_j>), this reads

F''(s) = Σ_{i,j} |<e_i|f_j>|^2 a_i^{1-s} b_j^s (log(a_i/b_j))^2 ≥ 0

Hence F(s) is a convex function and hence F'(s) is increasing. Further,

F'(0) = -Tr(A log(A)) + Tr(A log(B)) = -D(A|B) ≤ 0, F'(1) = -Tr(B log(A)) + Tr(B log(B)) = D(B|A) ≥ 0

Hence F'(s) changes sign in [0,1] and hence it must be zero at some s ∈ [0,1], say at s_0, and since F is convex, it attains its minimum at s_0.

86CHAPTER 5. SOME PROBLEMS IN QUANTUM INFORMATION THEORY

Let ρ, σ be two states and let

ρ = Σ_i p(i)|e_i><e_i|, σ = Σ_i q(i)|f_i><f_i|

be their respective spectral decompositions. Define

P(i,j) = p(i)|<e_i|f_j>|^2, Q(i,j) = q(j)|<e_i|f_j>|^2

Then P, Q are both classical probability distributions. We have for 0 ≤ T ≤ 1,

Tr(ρ(1-T)) = Σ_{i,j} <e_i|ρ|f_j><f_j|1-T|e_i>

= Σ_{i,j} p(i)<e_i|f_j>(<f_j|e_i> - <f_j|T|e_i>)

= 1 - Σ_{i,j} p(i)<e_i|f_j>t(j,i)

and likewise,

Tr(σT) = Σ_{i,j} <e_i|σ|f_j>t(j,i) = Σ_{i,j} q(j)<e_i|f_j>t(j,i)

where

t(j,i) = <f_j|T|e_i>

So,

Tr(ρ(1-T)) + Tr(σT) = 1 + Σ_{i,j} (q(j) - p(i))<e_i|f_j>t(j,i)

Now, by the Cauchy-Schwarz inequality,

|Σ_{i,j} p(i)<e_i|f_j>t(j,i)|^2 ≤ Σ_{i,j} p(i)|<e_i|f_j>|^2 · Σ_{i,j} p(i)|t(j,i)|^2 = Σ_{i,j} p(i)|t(j,i)|^2

Thus,

Tr(ρ(1-T)) ≥ 1 - (Σ_{i,j} p(i)|t(j,i)|^2)^{1/2}

Likewise,

Tr(σT) = Σ_{i,j} q(j)<e_i|f_j>t(j,i)

so

Tr(σT)^2 ≤ Σ_{i,j} q(j)|t(j,i)|^2


Then, if c is any positive real number,

Tr(ρ(1-T)) - c·Tr(σT) ≥ 1 - (Σ_{i,j} p(i)|t(j,i)|^2)^{1/2} - c·(Σ_{i,j} q(j)|t(j,i)|^2)^{1/2}

Suppose now that T is a projection. Then,

Tr(ρ(1-T)) = Σ_i p(i)<e_i|1-T|e_i> = Σ_i p(i)<e_i|(1-T)^2|e_i>

= Σ_{i,j} p(i)|<e_i|1-T|f_j>|^2 ≥ Σ_{i,j} a(i,j)|<e_i|1-T|f_j>|^2

and likewise,

Tr(σT) = Σ_{i,j} q(j)|<f_j|T|e_i>|^2 ≥ Σ_{i,j} a(i,j)|<f_j|T|e_i>|^2

where

a(i,j) = min(p(i), q(j))

Now,

|<e_i|1-T|f_j>|^2 + |<e_i|T|f_j>|^2 ≥ (1/2)|<e_i|1-T|f_j> + <e_i|T|f_j>|^2 = (1/2)|<e_i|f_j>|^2

Thus,

Tr(ρ(1-T)) + Tr(σT) ≥ (1/2)Σ_{i,j} a(i,j)|<e_i|f_j>|^2

= (1/2)Σ_{i,j} P(i,j)χ_{p(i)≤q(j)} + (1/2)Σ_{i,j} Q(i,j)χ_{p(i)>q(j)}

= (1/2)Σ_{i,j} P(i,j)χ_{P(i,j)≤Q(i,j)} + (1/2)Σ_{i,j} Q(i,j)χ_{P(i,j)>Q(i,j)}

Now, for tensor product states, this result gives

Tr(ρ^{⊗n}(1-T_n)) + Tr(σ^{⊗n}T_n) ≥ (1/2)P^n(P^n ≤ Q^n) + (1/2)Q^n(P^n > Q^n)

for any measurement T_n in the tensor product space, i.e., 0 ≤ T_n ≤ 1. Now by the large deviation principle as n → ∞, we have

n^{-1}log(P^n(P^n ≤ Q^n)) ≈ -inf_{x≤0} I_1(x)

where I_1 is the rate function under P^n of the random family n^{-1}log(P^n/Q^n). Evaluating the moment generating function of this r.v. at argument -ns gives us

M_n(-ns) = E_{P^n} exp(-s·log(P^n/Q^n)) = E_{P^n}(Q^n/P^n)^s

88CHAPTER 5. SOME PROBLEMS IN QUANTUM INFORMATION THEORY

= (E_P(Q/P)^s)^n = (Σ P^{1-s}Q^s)^n = M_s(P,Q)^n

and hence the rate function is

I_1(x) = sup_s(-sx - log M_s(P,Q)) = sup_s(-sx - log(Σ_ω P(ω)^{1-s}Q(ω)^s))

Then,

inf_{x≤0} I_1(x) = sup_{s≥0}(-log(Σ_ω P(ω)^{1-s}Q(ω)^s)) = -inf_{s≥0} log(Σ_ω P(ω)^{1-s}Q(ω)^s)

This asymptotic result can also be expressed as

liminf_{n→∞} inf_{0≤T_n≤1}[n^{-1}log(Tr(ρ^{⊗n}(1-T_n)) + Tr(σ^{⊗n}T_n))] ≥ inf_{0≤s≤1}[log(Σ P^{1-s}Q^s)] = inf_{0≤s≤1}[log(Σ Q^{1-s}P^s)]

the two infima being equal since the substitution s ↔ 1-s interchanges the two sums.

Relationship between quantum relative entropy and classical relative entropy:

Tr(ρ·log(ρ)) = Σ_i p(i)log(p(i))

Tr(ρ·log(σ)) = Σ_{i,j} p(i)log(q(j))|<e_i|f_j>|^2

D(ρ|σ) = Tr(ρ·log(ρ) - ρ·log(σ))

Therefore,

D(ρ|σ) = Σ_{i,j} p(i)|<e_i|f_j>|^2(log(p(i)) + log(|<e_i|f_j>|^2))

- Σ_{i,j} p(i)|<e_i|f_j>|^2(log(q(j)) + log(|<e_i|f_j>|^2))

(note the cancellation of the log(|<e_i|f_j>|^2) terms)

= Σ_{i,j} P(i,j)log(P(i,j)) - Σ_{i,j} P(i,j)log(Q(i,j)) = D(P|Q)
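The identity D(ρ|σ) = D(P|Q) can be confirmed numerically. In the numpy sketch below (my own illustration; the random-state generator is an assumption of the example), the two relative entropies are computed independently and compared:

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_state(n):
    # random full-rank density matrix
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = X @ X.conj().T
    return R / np.trace(R).real

def relative_entropy(rho, sigma):
    # D(rho|sigma) = Tr(rho log rho) - Tr(rho log sigma), natural log
    wr, Vr = np.linalg.eigh(rho)
    ws, Vs = np.linalg.eigh(sigma)
    log_rho = Vr @ np.diag(np.log(wr)) @ Vr.conj().T
    log_sig = Vs @ np.diag(np.log(ws)) @ Vs.conj().T
    return np.trace(rho @ (log_rho - log_sig)).real

n = 3
rho, sigma = rand_state(n), rand_state(n)
p, E = np.linalg.eigh(rho)          # rho   = sum_i p(i)|e_i><e_i|
q, F = np.linalg.eigh(sigma)        # sigma = sum_j q(j)|f_j><f_j|
overlap = np.abs(E.conj().T @ F) ** 2        # |<e_i|f_j>|^2
P = p[:, None] * overlap                     # P(i,j)
Q = q[None, :] * overlap                     # Q(i,j)
D_classical = np.sum(P * (np.log(P) - np.log(Q)))
D_quantum = relative_entropy(rho, sigma)
```

The two numbers agree to machine precision, as the cancellation in the derivation predicts.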

Let X be a r.v. with moment generating function

M(λ) = E[exp(λX)]


Its logarithmic moment generating function is

Λ(λ) = log(M(λ))

Let μ = E(X) be its mean. Then,

μ = Λ'(0)

Further,

Λ''(λ) ≥ 0 ∀λ

i.e., Λ is a convex function. Hence Λ'(λ) is increasing. Now suppose x ≥ μ. Consider

ψ(x,λ) = λx - Λ(λ)

Since

∂_λ^2 ψ(x,λ) = -Λ''(λ) ≤ 0

it follows that ∂_λψ(x,λ) is decreasing, or equivalently that ψ(x,λ) is concave in λ. Suppose λ_0 is such that

∂_λψ(x,λ_0) = 0

Then for fixed x, ψ(x,λ) attains its maximum at λ_0: since ∂_λψ(x,λ) is decreasing in λ, we have ∂_λψ(x,λ) ≤ 0 for λ ≥ λ_0, so ψ(x,λ) is decreasing for λ ≥ λ_0, and ∂_λψ(x,λ) ≥ 0 for λ < λ_0, so ψ(x,λ) is increasing for λ < λ_0. Now

∂_λψ(x,0) = x - μ ≥ 0

by hypothesis. Since ∂_λψ(x,λ) is decreasing in λ, it follows that ∂_λψ(x,λ) ≥ 0 for all λ ≤ 0. This means that ψ(x,λ) is increasing for λ ≤ 0. In particular,

ψ(x,λ) ≤ ψ(x,0) = 0, ∀λ ≤ 0

Thus,

I(x) = sup_{λ∈R} ψ(x,λ) = sup_{λ≥0} ψ(x,λ)

in the case when x ≥ μ. Consider now the case when x ≤ μ.
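For a concrete case, take X ~ N(μ_0, 1), for which Λ(λ) = μ_0 λ + λ^2/2 and I(x) = (x - μ_0)^2/2. The grid computation below (my own illustration, not from the text) confirms that for x ≥ μ the supremum over λ ≥ 0 agrees with the supremum over all λ:

```python
import numpy as np

# X ~ N(mu0, 1): Lambda(lam) = mu0*lam + lam^2/2, so I(x) = (x - mu0)^2/2
mu0 = 0.5
lam = np.linspace(-10, 10, 4001)
Lambda = mu0 * lam + 0.5 * lam ** 2

def rate(x, lam_mask):
    # grid approximation of sup over the allowed lambda of (lam*x - Lambda(lam))
    psi = lam * x - Lambda
    return psi[lam_mask].max()

x = 1.7                                   # a point with x >= mu = mu0
I_all = rate(x, np.full(lam.size, True))  # sup over all lambda
I_pos = rate(x, lam >= 0)                 # sup over lambda >= 0 only
```

The maximizing λ = x - μ_0 is non-negative here, so both grid suprema coincide and match the analytic rate (x - μ_0)^2/2.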

5.1 Quantum information theory and quantum blackhole physics

Let ρ_1(0) be the state of the system of particles within the event horizon of a Schwarzschild blackhole and ρ_2(0) the state outside. At time t = 0, the state of the total system, comprising the interior and the exterior of the blackhole, is

ρ(0) = ρ_1(0) ⊗ ρ_2(0)


Here, as a prototype example, we are considering the Klein-Gordon field of particles in the blackhole as well as outside it. Let χ(x) denote this KG field. Its action functional is given by

S[χ] = (1/2)∫ (g^{μν}(x)√{-g(x)}χ_{,μ}(x)χ_{,ν}(x) - μ^2√{-g(x)}χ(x)^2)d^4x

= (1/2)∫ [g^{μν}(x)χ_{,μ}(x)χ_{,ν}(x) - μ^2χ(x)^2]√{-g(x)}d^4x

which in the case of the Schwarzschild blackhole reads

S[χ] = (1/2)∫ [α(r)^{-1}χ_{,t}^2 - α(r)χ_{,r}^2 - r^{-2}χ_{,θ}^2 - r^{-2}sin(θ)^{-2}χ_{,φ}^2 - μ^2χ^2]r^2 sin(θ)dtdrdθdφ = ∫ L d^4x

The corresponding KG Hamiltonian can also be written down. The canonical position field is χ and the canonical momentum field is

π = δS/δχ_{,t} = α(r)^{-1}χ_{,t}

so that the Hamiltonian density is

H = πχ_{,t} - L = (1/2)[α(r)^{-1}χ_{,t}^2 + α(r)χ_{,r}^2 + r^{-2}χ_{,θ}^2 + r^{-2}sin(θ)^{-2}χ_{,φ}^2 + μ^2χ^2]

= (1/2)[α(r)π^2 + α(r)χ_{,r}^2 + r^{-2}χ_{,θ}^2 + r^{-2}sin(θ)^{-2}χ_{,φ}^2 + μ^2χ^2]

The KG field Hamiltonian is then

H = ∫ H d^3x = ∫ H r^2 sin(θ)drdθdφ

The radial Hamiltonian: Assume that the angular oscillations of the KG field are as per the classical equations of motion of the field corresponding to an angular momentum square of l(l+1). Then, we can write

χ(t,r,θ,φ) = ψ(t,r)Y_{lm}(θ,φ)

and substituting this into the action functional, performing integration by parts using

L^2 Y_{lm}(θ,φ) = l(l+1)Y_{lm}(θ,φ), L^2 = -(sin(θ)^{-1}(∂/∂θ)sin(θ)(∂/∂θ) + (1/sin^2(θ))(∂^2/∂φ^2))

gives us the "radial action functional" as

S_l[ψ] = (1/2)∫ (α(r)^{-1}ψ_{,t}^2 - α(r)ψ_{,r}^2 - l(l+1)ψ^2/r^2 - μ^2ψ^2)r^2 dtdr

and the corresponding radial Hamiltonian as

H_l(ψ,π) = (1/2)∫_0^∞ [α(r)(π^2 + ψ_{,r}^2) + (μ^2 + l(l+1)/r^2)ψ^2]r^2 dr

where ψ = ψ(t,r), π = π(t,r); the canonical position and momentum fields are functions of only time and the radial coordinate.
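As an illustration, the radial Hamiltonian H_l can be evaluated numerically for a sample field configuration. In this numpy sketch, the lapse factor α(r) = 1 - r_s/r and the Gaussian profile for ψ are my own illustrative assumptions (the text leaves α unspecified here):

```python
import numpy as np

# Evaluate H_l = (1/2) ∫ [alpha(r)(pi^2 + psi_r^2) + (mu^2 + l(l+1)/r^2) psi^2] r^2 dr
# on a finite grid outside the horizon.
rs, mu, l = 1.0, 0.1, 2
r = np.linspace(1.5 * rs, 20.0, 2000)      # stay outside r = rs
alpha = 1.0 - rs / r                       # Schwarzschild factor (assumption)

# sample field configuration and its conjugate momentum (illustrative choice)
psi = np.exp(-(r - 5.0) ** 2)
pi = np.zeros_like(r)                      # momentarily static field
psi_r = np.gradient(psi, r)

integrand = 0.5 * (alpha * (pi ** 2 + psi_r ** 2)
                   + (mu ** 2 + l * (l + 1) / r ** 2) * psi ** 2) * r ** 2
H_l = np.trapz(integrand, r)
```

Since every term of the integrand is non-negative outside the horizon, the computed energy is positive.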


5.2 Harmonic analysis on the Schwartz space of G = SL(2,R)

Let f_{mn}(x,μ) denote the matrix elements of the unitary principal series appearing in the Plancherel formula and let φ^k_{mn}(x) denote the matrix elements of the discrete series appearing in the Plancherel formula. Then we have a splitting of the Schwartz space into the direct sum of the discrete series Schwartz space and the principal series Schwartz space. Thus, for any f ∈ L^2(G), we have the Plancherel formula

∫_G |f(x)|^2 dx = Σ_{kmn} d(k)|<f|φ^k_{mn}>|^2 + ∫_R |<f|f_{mn}(.,μ)>|^2 c(mn,μ)dμ

where the d(k)'s are the formal dimensions of the discrete series while the c(mn,μ) are determined from the wave packet theory for the principal series, i.e., from the asymptotic behaviour of the principal series matrix elements. The asymptotic behaviour of the principal and discrete series matrix elements is obtained from the differential equations

Ω f_{mn}(x,μ) = -μ^2 f_{mn}(x,μ)

where Ω is the Casimir element for SL(2,R). We can express x ∈ G as x = u(θ_1)a(t)u(θ_2) and Ω as a second order linear partial differential operator in ∂/∂t, ∂/∂θ_k, k = 1,2. Observing that

f_{mn}(u(θ_1)a(t)u(θ_2),μ) = exp(i(mθ_1 + nθ_2))f_{mn}(a(t),μ)

and noting that the restriction of the Casimir operator to R.H is given by

(Ωf)(a(t)) = Δ_L(t)^{-1}(d^2/dt^2)Δ_L(t)f(a(t))

where

Δ_L(t) = e^t - e^{-t}

(as follows by applying both sides to the finite dimensional irreducible characters of SL(2,R) restricted to R.H, on noting that the restriction of such a character is given by (exp(mt) - exp(-mt))/(exp(t) - exp(-t))), we get a differential equation of the form

(d^2/dt^2)f_{mn}(a(t),μ) - q_{mn}(t)f_{mn}(a(t),μ) = -μ^2 f_{mn}(a(t),μ)

where q_{mn}(t) is expressed in terms of hyperbolic sinh functions. From this equation, the asymptotic behaviour of the matrix elements f_{mn}(a(t),μ) as |t| → ∞ may be determined. It should be noted that the discrete series modules can be realized within the principal series modules, and hence the asymptotic behaviour of the matrix elements f_{mn}(a(t),μ) will determine that of the principal series when μ is non-integral and that of the discrete series when μ is integral.
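The qualitative behaviour of such an equation can be explored numerically. In the RK4 sketch below, the potential q(t) = 1/sinh^2(t) is a hypothetical stand-in for the actual q_{mn}(t) (which depends on m, n); since q(t) → 0 as t → ∞, the solution settles into a bounded oscillation of frequency μ:

```python
import numpy as np

# integrate f''(t) - q(t) f(t) = -mu^2 f(t) outward from t = 0.1
mu = 1.5
q = lambda t: 1.0 / np.sinh(t) ** 2        # hypothetical q_mn(t)

def rhs(t, y):
    f, fp = y
    return np.array([fp, (q(t) - mu ** 2) * f])

t, h = 0.1, 1e-3
y = np.array([0.0, 1.0])                   # f(0.1) = 0, f'(0.1) = 1
fs = [y[0]]
while t < 10.0:
    # classical 4th-order Runge-Kutta step
    k1 = rhs(t, y)
    k2 = rhs(t + h / 2, y + h / 2 * k1)
    k3 = rhs(t + h / 2, y + h / 2 * k2)
    k4 = rhs(t + h, y + h * k3)
    y = y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h
    fs.append(y[0])

amplitude = np.max(np.abs(fs[-2000:]))     # amplitude of the tail oscillation
```

The boundedness of the tail amplitude reflects the oscillatory large-t asymptotics used in the wave packet theory.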


Specifically, consider a principal series representation π(x). Let e_m denote its highest weight vector and define

f_m(x) = <π(x)e_m|e_m>

We have for any a ∈ U(g), the universal enveloping algebra of the Lie algebra g of G, the fact that

π(a)e_m = 0 ⟹ af_m(x) = f_m(x;a) = 0

This means that if we consider the U(g)-module V = U(g)f_m and the map T: π(U(g))e_m → V defined by T(π(a)e_m) = a·f_m, then it is clear that T is a well defined linear map with kernel

K_T = {π(a)e_m : a ∈ U(g), a·f_m = 0}

Note that π(a)e_m ∈ K_T implies a·f_m = 0 implies b·a·f_m = 0 implies π(b)π(a)e_m = π(ba)e_m ∈ K_T for any a, b ∈ U(g). Thus, K_T is U(g) invariant. Then, the U(g)-module V is isomorphic to the quotient module π(U(g))e_m/K_T. However, π(U(g))e_m is an irreducible (discrete series) module. Hence, the submodule K_T of it must be zero, and hence we obtain the fundamental conclusion that V = U(g)f_m is an irreducible discrete series module. Note that f_m is a principal series (reducible since m is an integer) matrix element, and this means that we have realized the discrete series representations within the principal series. This also means that in order to study the asymptotic properties of the matrix elements of the discrete series, it suffices to study asymptotic properties of differential operators in U(g) acting on the principal series matrix elements f_m.

5.3 Monotonicity of quantum entropy under pinching

Let |ψ_1>,...,|ψ_n> be normalized pure states, not necessarily orthogonal. Let p(i), i = 1,2,...,n be a probability distribution and let |i>, i = 1,2,...,n be an orthonormal set. Consider the pure state

|ψ> = Σ_{i=1}^n √p(i)|ψ_i> ⊗ |i>

Note that this is a unit vector because

<ψ|ψ> = Σ_{i,j} √(p(i)p(j))<ψ_j|ψ_i><j|i> = Σ_{i,j} √(p(i)p(j))<ψ_j|ψ_i>δ(j,i) = Σ_i p(i) = 1

We have

ρ_1 = Tr_2(|ψ><ψ|) = Σ_i p(i)|ψ_i><ψ_i|


ρ_2 = Tr_1(|ψ><ψ|) = Σ_{i,j} √(p(i)p(j))<ψ_j|ψ_i>|i><j|

Then by Schmidt's theorem,

H(ρ_1) = H(ρ_2) ≤ H(p)

Now let σ_1, σ_2 be two mixed states and let p > 0, p + q = 1. Consider the spectral decompositions

σ_1 = Σ_i p(i)|e_i><e_i|, σ_2 = Σ_i q(i)|f_i><f_i|

Then,

pσ_1 + qσ_2 = Σ_i (pp(i)|e_i><e_i| + qq(i)|f_i><f_i|)

So by the above result,

H(pσ_1 + qσ_2) ≤ H({pp(i), qq(i)})

= -Σ_i (pp(i)log(pp(i)) + qq(i)log(qq(i)))

= -p·log(p) - q·log(q) + pH({p(i)}) + qH({q(i)}) = H(p,q) + pH(σ_1) + qH(σ_2)

where

H(p,q) = -p·log(p) - q·log(q)

Now consider several mixed states σ_1,...,σ_n and a probability distribution p(i), i = 1,2,...,n. Then

Σ_{i=1}^n p(i)σ_i = p(1)σ_1 + (1-p(1))Σ_{i=2}^n (p(i)/(1-p(1)))σ_i

So the same argument and induction gives

H(Σ_{i=1}^n p(i)σ_i) ≤ H(p(1), 1-p(1)) + p(1)H(σ_1) + (1-p(1))[Σ_{i=2}^n (p(i)/(1-p(1)))H(σ_i) + H({p(i)/(1-p(1)), i = 2,3,...,n})]

= H({p(i), i = 1,2,...,n}) + Σ_{i=1}^n p(i)H(σ_i)

Remark: Let ρ be a state and {P_i} a PVM. Define the state

σ = Σ_i P_i ρ P_i = 1 - Σ_i P_i(1-ρ)P_i

Then

0 ≤ D(ρ|σ) = Tr(ρ·log(ρ)) - Tr(ρ·log(σ))

= -H(ρ) - Σ_i Tr(ρ·P_i·log(P_i ρ P_i)P_i) = -H(ρ) - Σ_i Tr(P_i ρ P_i·log(P_i ρ P_i))

= -H(ρ) - Σ_i Tr(P_i ρ P_i·log(Σ_j P_j ρ P_j)) = -H(ρ) + H(σ)

Thus,

H(σ) ≥ H(ρ)

which, stated in words, reads: pinching increases the entropy of a state.

Remark: We have made use of the following facts from matrix theory:

log(Σ_i P_i ρ P_i) = Σ_i P_i·log(P_i ρ P_i)P_i

because the P_i's satisfy P_j P_i = P_i δ(i,j). In fact, we write

Σ_i P_i ρ P_i = 1 - Σ_i P_i(1-ρ)P_i = 1 - A

and use

log(1-z) = -z - z^2/2 - z^3/3 - ...

combined with P_j P_i = P_i δ(i,j) to get

log(Σ_i P_i ρ P_i) = -A - A^2/2 - A^3/3 - ...

where

A = Σ_i P_i(1-ρ)P_i, A^2 = Σ_i (P_i(1-ρ)P_i)^2, ..., A^n = Σ_i (P_i(1-ρ)P_i)^n

so that, summing the series blockwise,

log(1-A) = Σ_i P_i·log(P_i ρ P_i)P_i

A particular application of this result has been used above in the form

H(Σ_{i,j} √(p(i)p(j))<ψ_j|ψ_i>|i><j|) ≤ H(Σ_{i,j,k} √(p(i)p(j))<ψ_j|ψ_i><k|i><j|k>|k><k|)

= H(Σ_k p(k)|k><k|) = H({p(k)}) = -Σ_k p(k)·log(p(k))

Chapter 6

Lectures on Statistical Signal Processing

[1] Large deviation principle applied to the least squares problem of estimating vector parameters from sequential iid vector measurements in a linear model:

X(n) = H(n)θ + V(n), n = 1,2,...

The V(n)'s are iid and have the pdf p_V. Then θ is estimated based on the measurements X(n), n = 1,2,...,N using the maximum likelihood method as

θ(N) = argmax_θ N^{-1}Σ_{n=1}^N log(p_V(X(n) - H(n)θ))

Consider now the case when H(n) is a matrix valued stationary process independent of V(.). Then, the above maximum likelihood estimate is replaced by

θ(N) = argmax_θ ∫ [Π_{n=1}^N p_V(X(n) - H(n)θ)]p_H(H(1),...,H(N))Π_{j=1}^N dH(j)

In the special case when the H(n)'s are iid, the above reduces to

θ(N) = argmax_θ N^{-1}Σ_{n=1}^N log(∫ p_V(X(n) - Hθ)p_H(H)dH)

The problem is to calculate the large deviation properties of θ(N), namely the rate at which P(|θ(N) - θ| > δ) converges to zero as N → ∞.



6.1 Large deviation analysis of the RLS algorithm

Consider the problem of estimating θ sequentially from the linear model

X(n) = H(n)θ + V(n), n ≥ 1

The estimate based on data collected up to time N is

θ(N) = [Σ_{n=1}^N H(n)^T H(n)]^{-1}[Σ_{n=1}^N H(n)^T X(n)]

To cast this algorithm in recursive form, we define

R(N) = Σ_{n=1}^N H(n)^T H(n), r(N) = Σ_{n=1}^N H(n)^T X(n)

Then,

R(N+1) = R(N) + H(N+1)^T H(N+1), r(N+1) = r(N) + H(N+1)^T X(N+1)

and, by the matrix inversion lemma,

R(N+1)^{-1} = R(N)^{-1} - R(N)^{-1}H(N+1)^T(I + H(N+1)R(N)^{-1}H(N+1)^T)^{-1}H(N+1)R(N)^{-1}

so that

θ(N+1) = θ(N) + K(N+1)(X(N+1) - H(N+1)θ(N))

where K(N+1) is a Kalman gain matrix expressible as a function of R(N), H(N+1). Our aim is to do a large deviation analysis of this recursive algorithm. Define

δθ(N) = θ(N) - θ

Then we have

δθ(N+1) = δθ(N) + K(N+1)(X(N+1) - H(N+1)θ - H(N+1)δθ(N))

= δθ(N) + K(N+1)(V(N+1) - H(N+1)δθ(N))

We assume that the noise process V(.) is small and perhaps even non-Gaussian. This fact is emphasized by replacing V(n) with ε·V(n). Then, the problem is to calculate the relationship between the LDP rate functions of δθ(N) and δθ(N+1).
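The recursion above can be implemented directly. The numpy sketch below (my own illustration; the simulated model dimensions are assumptions) runs the matrix-inversion-lemma update of R(N)^{-1} and checks the result against the batch least squares solution:

```python
import numpy as np

rng = np.random.default_rng(4)
p, N = 3, 200
theta = rng.standard_normal(p)

# simulate the linear model X(n) = H(n) theta + V(n) with 2-dim observations
H = rng.standard_normal((N, 2, p))
X = np.einsum('nij,j->ni', H, theta) + 0.01 * rng.standard_normal((N, 2))

# recursive update of R(N)^{-1} via the matrix inversion lemma
Rinv = np.eye(p) * 1e6                 # large initial "covariance" (tiny prior)
r = np.zeros(p)
for n in range(N):
    Hn, Xn = H[n], X[n]
    G = Rinv @ Hn.T @ np.linalg.inv(np.eye(2) + Hn @ Rinv @ Hn.T)
    Rinv = Rinv - G @ Hn @ Rinv        # R(N+1)^{-1} from R(N)^{-1}
    r = r + Hn.T @ Xn                  # r(N+1) = r(N) + H^T X
theta_rls = Rinv @ r

# batch least squares for comparison
Hs = H.reshape(-1, p)
theta_batch, *_ = np.linalg.lstsq(Hs, X.reshape(-1), rcond=None)
```

Up to the negligible effect of the initial regularization, the recursive and batch estimates coincide.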


6.2 Problem suggested by my colleague Prof. Dhananjay

Large deviation analysis of a magnet with time varying magnetic moment falling through a coil. Assume that the coil is wound around a tube with n_0 turns per unit length, extending from a height of h to h+b. The tube is cylindrical and extends from z = 0 to z = d. Taking relativity into account via the retarded potentials, if m(t) denotes the magnetic moment of the magnet, then the vector potential generated by it is approximately obtained from the retarded potential formula as

A(t,r) = μ m(t - r/c) × r/4πr^3

where we assume that the magnet is located near the origin. This approximates to

A(t,r) = μ m(t) × r/4πr^3 - μ m'(t) × r/4πcr^2

Now assume that the magnet falls through the central axis with its axis always oriented along the -z direction. Then, after time t, let its z coordinate be ξ(t). We have the equation of motion

Mξ''(t) = F(t) - Mg

where F(t) is the z component of the force exerted by the magnetic field generated by the current induced in the coil upon the magnet. Now, the flux of the magnetic field generated by the magnet through the coil is given by

Φ(t) = n_0 ∫_{X^2+Y^2≤R_T^2, h≤Z≤h+b} B_z(t,X,Y,Z)dXdYdZ

where

B_z(t,X,Y,Z) = ẑ·B(t,X,Y,Z), B(t,X,Y,Z) = curl A(t, R - ξ(t)ẑ)

where

R = (X,Y,Z)

and where R_T is the radius of the tube. The current through the coil is then

I(t) = Φ'(t)/R_0

where R_0 is the coil resistance. The force of the magnetic field generated by this current upon the falling magnet can now be computed easily. To do this, we first compute the magnetic field B_c(t,X,Y,Z) generated by the coil using the usual retarded potential formula

B_c(t,X,Y,Z) = curl A_c(t,X,Y,Z)

A_c(t,X,Y,Z) = (μ/4π)∫_0^{2n_0π} I(t - |R - R_T(x̂ cos(φ) + ŷ sin(φ)) - ẑ(b/2n_0π)φ|/c)·(-x̂ sin(φ) + ŷ cos(φ))/|R - R_T(x̂ cos(φ) + ŷ sin(φ)) - ẑ(b/2n_0π)φ| R_T dφ

Note that A_c(t,X,Y,Z) is a function of ξ(t) and also of m(t) and m'(t). It follows that I(t) is a function of ξ(t), ξ'(t), m(t), m'(t), m''(t). Then we calculate the interaction energy between the magnetic fields generated by the magnet and by the coil as

U_{int}(ξ(t), ξ'(t), t) = (2μ)^{-1}∫ (B_c(t,X,Y,Z), B(t,X,Y,Z))dXdYdZ

and then we get for the force exerted by the coil's magnetic field upon the falling magnet

F(t) = -∂U_{int}(ξ(t), ξ'(t), t)/∂ξ(t)

Remark: Ideally, since the interaction energy between the two magnetic fields appears in the Lagrangian ∫ (E^2/c^2 - B^2)d^3R/2μ, we should write down the Euler-Lagrange equations using the Lagrangian

L(t, ξ(t), ξ'(t)) = Mξ'(t)^2/2 - U_{int}(ξ(t), ξ'(t), t) - Mgξ(t)

in the form

(d/dt)∂L/∂ξ' - ∂L/∂ξ = 0

which yields

Mξ''(t) - (d/dt)∂U_{int}/∂ξ' + ∂U_{int}/∂ξ + Mg = 0
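A minimal numerical sketch of the resulting equation of motion Mξ'' = F - Mg follows; the interaction energy U_int(ξ) used here is purely hypothetical (a Gaussian well localized at the coil), standing in for the field integrals above, which are not computed:

```python
import numpy as np

# Euler integration of M xi''(t) = F(t) - M g with F = -dU_int/dxi.
M, g = 0.05, 9.81                      # magnet mass (kg), gravity (m/s^2)
h_coil = 0.5                           # coil location on the tube axis (m)
U = lambda xi: -1e-3 * np.exp(-((xi - h_coil) / 0.05) ** 2)   # hypothetical U_int
dU = lambda xi: (U(xi + 1e-6) - U(xi - 1e-6)) / 2e-6          # numerical dU/dxi

xi, v, dt = 1.0, 0.0, 1e-4             # start above the coil, at rest
for _ in range(20000):                 # integrate for 2 seconds
    a = -dU(xi) / M - g                # acceleration from F = -dU/dxi
    v += a * dt
    xi += v * dt
```

The magnet falls through the coil region, with the coupling producing only a small perturbation about free fall for these illustrative parameter values.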

6.3 Fidelity between two quantum states

Let ρ, σ be two quantum states in the same Hilbert space H_1. Let P(ρ), P(σ) denote respectively the sets of all purifications of ρ and σ w.r.t. the same reference Hilbert space H_2. Thus, if |u> ∈ H_1 ⊗ H_2 is a purification of ρ, i.e., |u> ∈ P(ρ), then

ρ = Tr_2|u><u|

and likewise for σ. Then, the fidelity between ρ and σ is defined as

F(ρ,σ) = sup_{|u>∈P(ρ), |v>∈P(σ)} |<u|v>|

Theorem:

F(ρ,σ) = Tr(|√ρ·√σ|)

Proof: Let |u> ∈ P(ρ). Then, we can write

|u> = Σ_i |u_i ⊗ w_i>


where |w_i> is an onb for H_2 and

ρ = Σ_i |u_i><u_i|

Likewise if |v> is a purification of σ, we can write

|v> = Σ_i |v_i ⊗ w_i>

where

σ = Σ_i |v_i><v_i|

But then

<u|v> = Σ_i <u_i|v_i>

Define the matrices

X = Σ_i |u_i><w_i|, Y = Σ_i |v_i><w_i|

Then we have

ρ = XX^*, σ = YY^*

Hence, by the polar decomposition theorem, we can write

X = √ρ U, Y = √σ V

where U, V are unitary matrices. Then,

<u|v> = Σ_i <u_i|v_i> = Tr Σ_i |v_i><u_i| = Tr(YX^*) = Tr(√σ V U^* √ρ)

= Tr(√ρ·√σ V U^*) = Tr(√ρ·√σ W)

where W = VU^* is a unitary matrix. It is then clear from the singular value decomposition of √ρ·√σ that the maximum of |<u|v>|, which equals the maximum of |Tr(√ρ·√σ W)| as W varies over all unitary matrices, is simply the sum of the singular values of √ρ·√σ, i.e.,

F(ρ,σ) = Tr|√ρ·√σ| = Tr[(√ρ σ √ρ)^{1/2}] = Tr[(√σ ρ √σ)^{1/2}]
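The equivalent forms of the fidelity can be computed and cross-checked numerically (numpy sketch, my own illustration; the random-state generator is an assumption):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_state(n):
    # random density matrix
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = X @ X.conj().T
    return R / np.trace(R).real

def sqrtm_psd(M):
    # square root of a Hermitian PSD matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(rho, sigma):
    # F = Tr|sqrt(rho) sqrt(sigma)| = sum of singular values
    return np.linalg.svd(sqrtm_psd(rho) @ sqrtm_psd(sigma),
                         compute_uv=False).sum()

rho, sigma = rand_state(3), rand_state(3)
F1 = fidelity(rho, sigma)
# equivalent form Tr[(sqrt(rho) sigma sqrt(rho))^(1/2)]
sr = sqrtm_psd(rho)
F2 = np.trace(sqrtm_psd(sr @ sigma @ sr)).real
```

The singular-value form and the symmetrized trace form agree, and F(ρ,ρ) = Tr(ρ) = 1.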


6.4 Stinespring's representation of a TPCP map, i.e., of a quantum operation

Let K be a quantum operation. Thus, for any state ρ,

K(ρ) = Σ_{a=1}^N E_a ρ E_a^*, Σ_{a=1}^N E_a^* E_a = 1

We claim that K can be represented as

K(ρ) = Tr_2(U(ρ ⊗ |u><u|)U^*)

where ρ acts in H_1, |u> ∈ H_2 and U is a unitary operator in H_1 ⊗ H_2. To see this, we define

V^* = [E_1^*, E_2^*, ..., E_N^*]

Then,

V^*V = Σ_{a=1}^N E_a^* E_a = I_p

In other words, the columns of V are orthonormal. We assume that

ρ ∈ C^{p×p}

so

V ∈ C^{Np×p}

Note that

VρV^* = ((E_a ρ E_b^*))_{1≤a,b≤N}

as an Np×Np block structured matrix, each block being of size p×p. Now define

|u> = [1, 0, ..., 0]^T ∈ C^N

Then |u><u| ⊗ ρ is the Np×Np block matrix with ρ in the top-left p×p block and zeros elsewhere. Since the columns of V are orthonormal, we can add columns W to make a square Np×Np unitary matrix

U = [V, W] ∈ C^{Np×Np}, U^*U = I_{Np}

Then,

U(|u><u| ⊗ ρ)U^* = VρV^*

from which it follows that

Tr_1(U(|u><u| ⊗ ρ)U^*) = Tr_1(VρV^*) = Σ_a E_a ρ E_a^*

where Tr_1 denotes the partial trace over the first (N-dimensional) factor, in the notation of the following remark.


Remark: Consider the block structured matrix

X = ((B_{ij}))_{1≤i,j≤N}

where each B_{ij} is a p×p matrix. Then

Tr_1(X) = Σ_i B_{ii} ∈ C^{p×p}

and

Tr_2(X) = ((Tr(B_{ij})))_{1≤i,j≤N} ∈ C^{N×N}
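The dilation construction can be verified numerically: build a random Kraus family from an isometry, complete it to a unitary, and check that the partial trace of the dilated state reproduces K(ρ). This numpy sketch is my own illustration (the unitary-completion trick via QR is an implementation choice, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 2, 3

# random Kraus family {E_a} with sum_a E_a^* E_a = I_p, from a random
# isometry V in C^{Np x p} (columns orthonormal), V = [E_1; ...; E_N]
A = rng.standard_normal((N * p, p)) + 1j * rng.standard_normal((N * p, p))
V, _ = np.linalg.qr(A)
E = [V[a * p:(a + 1) * p, :] for a in range(N)]

X = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
rho = X @ X.conj().T
rho /= np.trace(rho).real
K_rho = sum(Ea @ rho @ Ea.conj().T for Ea in E)     # K(rho)

# complete V to a unitary U whose first p columns are exactly V
M = np.concatenate([V, rng.standard_normal((N * p, (N - 1) * p))
                    + 1j * rng.standard_normal((N * p, (N - 1) * p))], axis=1)
Q, _ = np.linalg.qr(M)
U = Q.copy()
U[:, :p] = V            # same column span as Q[:, :p], so U stays unitary

# |u><u| (x) rho with |u> = e_1: rho sits in the top-left p x p block
rho_big = np.zeros((N * p, N * p), dtype=complex)
rho_big[:p, :p] = rho
out = U @ rho_big @ U.conj().T

# partial trace over the N-dimensional factor = sum of diagonal p x p blocks
ptrace = sum(out[a * p:(a + 1) * p, a * p:(a + 1) * p] for a in range(N))
```

The recovered ptrace equals Σ_a E_a ρ E_a^* as the block computation above predicts.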

6.5 RLS lattice algorithm, statistical performanceanalysis

The RLS lattice algorithm involves updating the forward and backward pre-diction parameters simultaneously in time and order. In order to carry outa statistical performance analysis, we first simply look at the block processingproblem involving estimating the forward prediction filter parameter vector aN,pof order p based on data collected upto time N . Let

x(n) = [x(n), x(n−1), ..., x(0)]T , z−rx(n) = [x(n−r), x(n−r−1), ..., x(0), 0, ..., 0]T , r = 1, 2, ...

where these vectors are all of size N + 1. Form the data matrix of order p attime N :

XN,p = [z−1x(n), ..., z−px(n)] ∈ RN+1×p

Then the forward prediction filter coefficient estimate is

aN,p = (XTN,pXN,p)

−1XTN,px(N)

If x(n), n ∈ Z is a zero mean stationary Gaussian process with autocorrelationR(τ) = E(x(n+ τ)x(n)), then the problem is to calculate the statistics of aN,p.To this end, we observe that

N−1(z−ix(N))T (z−jx(N)) =

N−1N∑

n=max(i,j)

x(n− i)x(n− j) ≈ R(j − i)

So we write

N−1(z−ix(N))T (z−jx(N)) = R(j − i) + δRN (j, i)


where now δR_N(j,i) is a random variable that is a quadratic function of the Gaussian process x(n) plus a constant scalar. Equivalently,

N^{-1}X_{N,p}^T X_{N,p} = R_p + δR_{N,p}

where

R_p = ((R(j-i)))_{1≤i,j≤p}

is the statistical correlation matrix of size p and δR_{N,p} is a random matrix. Likewise,

N^{-1}(z^{-i}x(N))^T x(N) = N^{-1}Σ_{n=i}^N x(n-i)x(n) = R(i) + δR_N(0,i)

Thus,

N^{-1}X_{N,p}^T x(N) = r(1:p) + δr_N(0, 1:p)

where

r(1:p) = [R(1), ..., R(p)]^T, δr_N(1:p) = [δR_N(0,1), ..., δR_N(0,p)]^T
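The block-processing estimate a_{N,p} can be computed directly from the data matrix. In the numpy sketch below (my own illustration), the process is an AR(1) process, for which the order-1 forward prediction coefficient should approach the AR coefficient:

```python
import numpy as np

rng = np.random.default_rng(7)

# AR(1) process x(n) = a x(n-1) + w(n)
a_true, N, p = 0.8, 5000, 1
x = np.zeros(N + 1)
for n in range(1, N + 1):
    x[n] = a_true * x[n - 1] + rng.standard_normal()

def shifted(x, r):
    # z^{-r} x(N): the process delayed by r samples, zero padded
    v = np.zeros_like(x)
    v[r:] = x[: x.size - r]
    return v

X_Np = np.column_stack([shifted(x, r) for r in range(1, p + 1)])
a_Np = np.linalg.solve(X_Np.T @ X_Np, X_Np.T @ x)   # (X^T X)^{-1} X^T x(N)
```

With N = 5000 samples, the fluctuation term δR_N is small and a_{N,1} is close to the true coefficient 0.8.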

6.6 Approximate computation of the matrix elements of the principal and discrete series for G = SL(2,R)

These matrix elements f_{mn}(x,μ) for μ non-integral are the irreducible principal series matrix elements, and f_{mm}(x,m) = f_m(x) for m integral generate the discrete series D_m-modules when acted upon by U(g), the universal enveloping algebra of G. These matrix elements satisfy the equations

f_{mn}(u(θ_1)xu(θ_2),μ) = exp(i(mθ_1 + nθ_2))f_{mn}(x,μ),

Ω f_{mn}(x,μ) = -μ^2 f_{mn}(x,μ)

the second of which, in view of the first and the expression of the Casimir operator Ω in terms of t, θ_1, θ_2 in the singular value decomposition x = u(θ_1)a(t)u(θ_2), simplifies to an equation of the form

d^2 f_{mn}(t,μ)/dt^2 - q_{mn}(t)f_{mn}(t,μ) + μ^2 f_{mn}(t,μ) = 0

where f_{mn}(t,μ) = f_{mn}(a(t),μ). This determines the matrix elements of the principal series. For the discrete series matrix elements, we observe that these are

φ^k_{mn}(x) = <π_k(x)e_m|e_n>


where π_k is the reducible principal series for integral k. Replacing k by -k where k = 1,2,..., the discrete series matrix elements are

φ^{-k}_{mn}(x) = <π_{-k}(x)e_m|e_n>, m,n = -k-1, -k-2, ...

with

π_{-k}(X)e_{-k-1} = 0

Note that we can, using the method of lowering operators, write

π_{-k}(Y)^r e_{-k-1} = c(k,r)e_{-k-1-r}, r = 0,1,2,...

where the c(k,r) are some real constants. Thus, writing

f_{k+1}(x) = <π_{-k}(x)e_{-k-1}|e_{-k-1}>, x ∈ G,

we have for the discrete series matrix elements,

φ^{-k}_{-k-1-r, -k-1-s}(x) = <π_{-k}(x)π_{-k}(Y)^r e_{-k-1}|π_{-k}(Y)^s e_{-k-1}>

= <π_{-k}(Y^{*s}xY^r)e_{-k-1}|e_{-k-1}> = f_{k+1}(Y^{*s}; x; Y^r)

In this way, all the discrete series matrix elements can be determined by applying differential operators to f_{k+1}(x).

Now given an image field h(x) on SL(2,R), in order to extract invariant features, or to estimate the group transformation element parameter from a noisy version of it, we must first evaluate its matrix elements w.r.t. the discrete and principal series, i.e.,

∫_G h(x)φ^{-k}_{-k-1-r, -k-1-s}(x)dx, ∫_G h(x)f_{mn}(x,μ)dx

Now,

∫_G h(x)φ^{-k}_{-k-1-r, -k-1-s}(x)dx = ∫_G h(x)f_{k+1}(Y^{*s}; x; Y^r)dx = ∫_G h(Y^s; x; Y^{*r})f_{k+1}(x)dx

6.7 Proof of the converse part of the Cq coding theorem

Let φ(i), i = 1,2,...,N be a code. Note that the size of the code is N. Let s ≤ 0 and define

p_s = argmax_p I_s(p,W)


where

I_s(p,W) = min_σ D_s(p⊗W|p⊗σ)

where

p⊗W = diag[p(x)W(x), x ∈ A], p⊗σ = diag[p(x)σ, x ∈ A]

Note that

D_s(p⊗W|p⊗σ) = log(Tr((p⊗W)^{1-s}(p⊗σ)^s))/(-s)

= (1/(-s))log Σ_x Tr((p(x)W(x))^{1-s}(p(x)σ)^s)

= (1/(-s))log Σ_x p(x)Tr(W(x)^{1-s}σ^s)

so that

lim_{s→0} D_s(p⊗W|p⊗σ) = Σ_x p(x)Tr(W(x)log(W(x))) - Tr(W_p·log(σ))

where

W_p = Σ_x p(x)W_x

Note that this result can also be expressed as

lim_{s→0} D_s(p⊗W|p⊗σ) = Σ_x p(x)D(W(x)|σ)

We define

f(t) = Tr[(tW_x^{1-s} + (1-t)Σ_y p_s(y)W_y^{1-s})^{1/(1-s)}], t ≥ 0

Note that

I_s(p,W) = (-1/s)log[(Tr(Σ_y p(y)W_y^{1-s})^{1/(1-s)})^{1-s}]

and, for s ≤ 0, this is an increasing function of Tr(Σ_y p(y)W_y^{1-s})^{1/(1-s)}. Since by definition of p_s, we have

I_s(p_s,W) ≥ I_s(p,W) ∀p,

it follows that

f(t) ≤ f(0) ∀t ≥ 0

Thus

f'(0) ≤ 0

This gives

Tr[(W_x^{1-s} - Σ_y p_s(y)W_y^{1-s})·(Σ_y p_s(y)W_y^{1-s})^{s/(1-s)}] ≤ 0


or equivalently,

Tr(W_x^{1-s}·(Σ_y p_s(y)W_y^{1-s})^{s/(1-s)}) ≤ Tr(Σ_y p_s(y)W_y^{1-s})^{1/(1-s)}

Now define the state

σ_s = (Σ_y p_s(y)W_y^{1-s})^{1/(1-s)} / Tr(Σ_y p_s(y)W_y^{1-s})^{1/(1-s)}

Then we can express the above inequality as

Tr(W_x^{1-s}σ_s^s) ≤ [Tr(Σ_y p_s(y)W_y^{1-s})^{1/(1-s)}]^{1-s}

Remark: Recall that

D_s(p⊗W|p⊗σ) = (1/(-s))log Σ_x p(x)Tr(W_x^{1-s}σ^s) = (1/(-s))log(Tr(Aσ^s))

where

A = Σ_x p(x)W_x^{1-s}

Note that we are throughout assuming s ≤ 0. Application of the reverse Holder inequality gives, for all states σ,

Tr(Aσ^s) ≥ (Tr(A^{1/(1-s)}))^{1-s}·(Tr(σ))^s = (Tr(A^{1/(1-s)}))^{1-s}

with equality iff σ is proportional to A^{1/(1-s)}, i.e., iff

σ = A^{1/(1-s)}/Tr(A^{1/(1-s)})

Thus,

min_σ D_s(p⊗W|p⊗σ) = (-1/s)log[(Tr(Σ_x p(x)W_x^{1-s})^{1/(1-s)})^{1-s}]

The maximum of this over all p is attained when p = p_s, and we denote the corresponding value of σ by σ_s. Thus,

σ_s = (Σ_x p_s(x)W_x^{1-s})^{1/(1-s)} / Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)}

Now let φ(i), i = 1,2,...,N be a code with φ(i) ∈ A^n. Let Y_i, i = 1,2,...,N be detection operators, i.e., Y_i ≥ 0, Σ_i Y_i = 1. Let ε(φ) be the error probability corresponding to this code and these detection operators. Then

1 - ε(φ) = N^{-1}Σ_{i=1}^N Tr(W_{φ(i)}Y_i)


= Tr(S(φ)T)

where

S(φ) = N^{-1}diag[W_{φ(i)}, i = 1,2,...,N], T = diag[Y_i, i = 1,2,...,N]

Also define for any state σ, the state

S(σ) = N^{-1}diag[σ^n, ..., σ^n] = N^{-1}·I_N ⊗ σ^n

We observe that

Tr(S(σ^n)T) = N^{-1}Σ_i Tr(σ^n Y_i) = Tr(σ^n·N^{-1}·Σ_{i=1}^N Y_i) = N^{-1}Tr(σ^n) = 1/N

where σ^n = σ^{⊗n}. Then for s ≤ 0 we have, by monotonicity of the quantum Renyi entropy (the relative entropy between the pinched states cannot exceed the relative entropy between the original states),

N^{-s}(1 - ε(φ))^{1-s} = (Tr(S(φ)T))^{1-s}(Tr(S(σ_{ns})T))^s

≤ (Tr(S(φ)T))^{1-s}(Tr(S(σ_{ns})T))^s + (Tr(S(φ)(1-T)))^{1-s}(Tr(S(σ_{ns})(1-T)))^s

≤ Tr(S(φ)^{1-s}S(σ_{ns})^s)

Here the notation σ_{ns} = σ_s^{⊗n} is being used. Now, we have

Tr(S(φ)^{1-s}S(σ_{ns})^s) = N^{-1}Σ_i Tr(W_{φ(i)}^{1-s}σ_{ns}^s)

and

log(Tr(W_{φ(i)}^{1-s}σ_{ns}^s)) = Σ_{l=1}^n log(Tr(W_{φ_l(i)}^{1-s}σ_s^s))

= n·n^{-1}Σ_{l=1}^n log(Tr(W_{φ_l(i)}^{1-s}σ_s^s))

≤ n·log(n^{-1}Σ_{l=1}^n Tr(W_{φ_l(i)}^{1-s}σ_s^s))

≤ n·log([Tr(Σ_y p_s(y)W_y^{1-s})^{1/(1-s)}]^{1-s})

= n(1-s)·log(Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)})

Thus,

Tr(S(φ)^{1-s}S(σ_{ns})^s) ≤ (Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)})^{n(1-s)}


and hence

N^{-s}(1 - ε(φ))^{1-s} ≤ (Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)})^{n(1-s)}

or equivalently,

1 - ε(φ) ≤ exp((s/(1-s))log(N) + n·log(Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)}))

= exp(ns((1-s)^{-1}log(N)/n + s^{-1}log(Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)}))) --- (a)

where s ≤ 0. Now, for any probability distribution p(x) on A, we have

d/ds Tr[(Σ_x p(x)W_x^{1-s})^{1/(1-s)}] = (1/(1-s)^2)Tr[(Σ_x p(x)W_x^{1-s})^{1/(1-s)}·log(Σ_x p(x)W_x^{1-s})]

- (1/(1-s))Tr[(Σ_x p(x)W_x^{1-s}log(W_x))·(Σ_x p(x)W_x^{1-s})^{s/(1-s)}]

and as s → 0, this converges to

d/ds Tr[(Σ_x p(x)W_x^{1-s})^{1/(1-s)}]|_{s=0} = Σ_x p(x)Tr(W_x·(log(W_p) - log(W_x))) = -Σ_x p(x)D(W_x|W_p)

= -(H(W_p) - Σ_x p(x)H(W_x)) = -(H(Y) - H(Y|X)) = -I_p(X,Y)

where

W_p = Σ_x p(x)W_x
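The limiting derivative can be verified numerically: the central difference of Tr[(Σ_x p(x)W_x^{1-s})^{1/(1-s)}] at s = 0 should equal minus the Holevo quantity H(W_p) - Σ_x p(x)H(W_x). This numpy sketch is my own illustration; the small cq channel is an assumption of the example:

```python
import numpy as np

rng = np.random.default_rng(8)

def rand_state(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = X @ X.conj().T
    return R / np.trace(R).real

def mpow(M, a):
    # M^a for a Hermitian positive matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 1e-15, None) ** a) @ V.conj().T

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

# a small cq channel: two inputs, qubit outputs
p = np.array([0.3, 0.7])
W = [rand_state(2), rand_state(2)]
Wp = sum(px * Wx for px, Wx in zip(p, W))
holevo = entropy(Wp) - sum(px * entropy(Wx) for px, Wx in zip(p, W))

def G(s):
    # Tr[(sum_x p(x) W_x^(1-s))^(1/(1-s))]
    M = sum(px * mpow(Wx, 1 - s) for px, Wx in zip(p, W))
    return np.trace(mpow(M, 1.0 / (1 - s))).real

h = 1e-5
dG0 = (G(h) - G(-h)) / (2 * h)        # central difference at s = 0
```

The derivative at s = 0 indeed equals -I_p(X,Y), here computed as the Holevo quantity.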

Write N = N(n) and define

R = limsup_n log(N(n))/n, φ = φ_n

Also define

I_s(X,Y) = -((1-s)/s)·log(Tr(Σ_x p_s(x)W_x^{1-s})^{1/(1-s)})

Then, we have just seen that

lim_{s→0} I_s(X,Y) = I_0(X,Y) ≤ I(X,Y)


where I_0(X,Y) = I_{p_0}(X,Y) and I(X,Y) = sup_p I_p(X,Y), and

1 - ε(φ_n) ≤ exp((ns/(1-s))(log(N(n))/n - I_s(X,Y)))

It thus follows that if

R > I(X,Y)

then by taking s < 0 and s sufficiently close to zero we would have

R > I_s(X,Y)

and then we would have

limsup_n(1 - ε(φ_n)) ≤ limsup_n exp((ns/(1-s))(R - I_s(X,Y))) = 0

or equivalently,

liminf_n ε(φ_n) = 1

This proves the converse of the Cq coding theorem, namely, that if the rate of information transmission via any coding scheme exceeds the Cq capacity I(X,Y), then the asymptotic probability of error in decoding using any sequence of decoders converges to unity.

6.8 Proof of the direct part of the Cq coding theorem

Our proof is based on Shannon's random coding method. Consider the same scenario as above. Define the projections

π_i = {W_{φ(i)} ≥ 2NW_{p^n}}, i = 1,2,...,N

where N = N(n), φ(i) ∈ A^n, with φ(i), i = 1,2,...,N being iid with probability distribution p^n = p^{⊗n} on A^n. Define the detection operators

Y(φ(i)) = (Σ_{j=1}^N π_j)^{-1/2}·π_i·(Σ_{j=1}^N π_j)^{-1/2}, i = 1,2,...,N

Then 0 ≤ Y_i ≤ 1 and Σ_{i=1}^N Y_i = 1. Write

S = π_i, T = Σ_{j≠i} π_j

Then,

Y(φ(i)) = (S+T)^{-1/2}S(S+T)^{-1/2}, 0 ≤ S ≤ 1, T ≥ 0

and hence we can use the inequality

1 - (S+T)^{-1/2}S(S+T)^{-1/2} ≤ 2 - 2S + 4T


Then,

1 − Y(φ(i)) ≤ 2 − 2π_i + 4∑_{j≠i} π_j

and hence

ǫ(φ) = N^{−1}∑_{i=1}^N Tr(W_{φ(i)}(1 − Y(φ(i)))) ≤ N^{−1}∑_{i=1}^N Tr(W_{φ(i)}(2 − 2π_i)) + 4N^{−1}∑_{i≠j} Tr(W_{φ(i)}π_j)

Note that since for i ≠ j, φ(i) and φ(j) are independent random variables in Aⁿ, it follows that W_{φ(i)} and π_j are also independent operator-valued random variables. Thus,

E(ǫ(φ)) ≤ (2/N)∑_{i=1}^N E(Tr(W_{φ(i)}{W_{φ(i)} ≤ 2N·W_{p^n}})) + (4/N)∑_{i≠j, i,j=1,...,N} Tr(E(W_{φ(i)})·E{W_{φ(j)} > 2N·W_{p^n}})

= 2∑_{x^n∈A^n} p^n(x^n)Tr(W(x^n){W(x^n) ≤ 2N·W(p^n)}) + 4(N−1)∑_{x^n,y^n∈A^n} p^n(x^n)p^n(y^n)Tr(W(x^n){W(y^n) > 2N·W(p^n)})

= 2∑_{x^n} p^n(x^n)Tr(W(x^n){W(x^n) ≤ 2N·W(p^n)}) + 4(N−1)∑_{y^n∈A^n} p^n(y^n)Tr(W(p^n){W(y^n) > 2N·W(p^n)})

≤ 2∑_{x^n} p^n(x^n)Tr(W(x^n)^{1−s}(2N·W(p^n))^s) + 4N∑_{y^n} p^n(y^n)Tr(W(y^n)^{1−s}W(p^n)^s(2N)^{s−1})

for 0 ≤ s ≤ 1. This gives

E(ǫ(φ_n)) ≤ 2^{s+2}N^s∑_{x^n∈A^n} p^n(x^n)Tr(W(x^n)^{1−s}W(p^n)^s)

= 2^{s+2}N^s[∑_{x∈A} p(x)Tr(W(x)^{1−s}W(p)^s)]^n

= 2^{s+2}·exp(sn(log(N(n))/n + s^{−1}·log(∑_x p(x)Tr(W(x)^{1−s}W(p)^s))))
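The operator inequality 1 − (S+T)^{−1/2}S(S+T)^{−1/2} ≤ 2 − 2S + 4T used above can be checked numerically on random matrices; a minimal sketch (test harness and dimensions are illustrative, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_psd(d):
    A = rng.standard_normal((d, d))
    return A @ A.T

def inv_sqrt(M):
    # M^{-1/2} via the eigendecomposition of a symmetric positive definite M
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

d = 4
for _ in range(200):
    S = rand_psd(d)
    S = S / (np.max(np.linalg.eigvalsh(S)) + 1e-9)   # scale so that 0 <= S <= 1
    T = rand_psd(d) + 1e-6 * np.eye(d)               # T >= 0, made invertible
    R = inv_sqrt(S + T)
    lhs = np.eye(d) - R @ S @ R
    rhs = 2 * (np.eye(d) - S) + 4 * T
    # rhs - lhs should be positive semidefinite
    assert np.min(np.linalg.eigvalsh(rhs - lhs)) > -1e-8
print("operator inequality verified on random matrices")
```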


6.9 Induced representations

Let G be a group and H a subgroup. Let σ be a unitary representation of H in a Hilbert space V. Let F be the space of all f : G → V for which

f(gh) = σ(h)^{−1}f(g), h ∈ H, g ∈ G

Then define

U(g)f(x) = ρ(g, g^{−1}xH)f(g^{−1}x), g, x ∈ G, f ∈ F

where

ρ(g, xH) = (dµ(xH)/dµ(gxH))^{1/2}, g, x ∈ G

with µ a measure on G/H. U is called the representation of G induced by the representation σ of H. We define

‖f‖² = ∫_{G/H} |f(x)|²dµ(xH)

Then, unitarity of U(g) is easy to see:

∫_{G/H} ‖U(g)f(x)‖²dµ(xH) = ∫_{G/H} (ρ(g, g^{−1}xH))²|f(g^{−1}x)|²dµ(xH)

= ∫_{G/H} (dµ(g^{−1}xH)/dµ(xH))|f(g^{−1}x)|²dµ(xH) = ∫_{G/H} |f(g^{−1}x)|²dµ(g^{−1}xH)

= ∫_{G/H} |f(x)|²dµ(xH)

proving the claim. Note that

(ρ(g₁, g₂xH)ρ(g₂, xH))² = (dµ(g₂xH)/dµ(g₁g₂xH))(dµ(xH)/dµ(g₂xH)) = dµ(xH)/dµ(g₁g₂xH) = ρ(g₁g₂, xH)²

ie

ρ(g₁g₂, xH) = ρ(g₁, g₂xH)ρ(g₂, xH), g₁, g₂, x ∈ G

ie, ρ is a scalar cocycle.

Remark: |f(x)|² can be regarded as a function on G/H since it assumes the same value on xH for any x ∈ G. Indeed, for x ∈ G, h ∈ H, |f(xh)|² = |σ(h)^{−1}f(x)|² = |f(x)|² since σ(h) is a unitary operator.

There is yet another way to describe the induced representation, ie, in a way that yields a unitary representation of G that is equivalent to U. The method is as follows: Let X = G/H and let F₁ be the space of all square integrable functions f : X → V w.r.t. the measure µ_X induced by µ on X. Let γ be any cocycle of (G, X) with values in the space of unitary operators on V, ie,

γ : G × X → U(V)

satisfies

γ(g₁g₂, x) = γ(g₁, g₂x)γ(g₂, x), g₁, g₂ ∈ G, x ∈ X

Then define a representation U₁ of G in F₁ by

U₁(g)f(x) = ρ(g, g^{−1}x)γ(g, g^{−1}x)f(g^{−1}x), g ∈ G, x ∈ X, f ∈ F₁

Let s : X → G be a section of X = G/H such that s(H) = e. This means that for any x ∈ X, s(x)H = x, ie, s(x) belongs to the coset x. Define

b(g, x) = s(gx)^{−1}g·s(x) ∈ G, g ∈ G, x ∈ X

Then,

b(g, x)H = s(gx)^{−1}gx = H

because

g·s(x)H = gx, s(gx)H = gx

Now observe that

b(g₁, g₂x)b(g₂, x) = s(g₁g₂x)^{−1}g₁s(g₂x)·s(g₂x)^{−1}g₂s(x) = s(g₁g₂x)^{−1}g₁g₂s(x) = b(g₁g₂, x), g₁, g₂ ∈ G, x ∈ X

implying that γ = σ∘b is a (G, X, V) cocycle, ie,

γ(g₁g₂, x) = σ(b(g₁g₂, x)) = σ(b(g₁, g₂x)b(g₂, x)) = σ(b(g₁, g₂x))σ(b(g₂, x)) = γ(g₁, g₂x)γ(g₂, x)

with γ assuming values in the set of unitary operators in V. Now let

f(g) = γ(g, H)^{−1}f₁(gH) = T(f₁)(g) ∈ V, f₁ ∈ F₁, g ∈ G

Then

U(g)T(f₁)(g₁) = U(g)f(g₁) = ρ(g, g^{−1}g₁H)f(g^{−1}g₁)

= ρ(g, g^{−1}g₁H)γ(g^{−1}g₁, H)^{−1}f₁(g^{−1}g₁H)

= ρ(g, g^{−1}g₁H)(γ(g^{−1}, g₁H)γ(g₁, H))^{−1}f₁(g^{−1}g₁H)

On the other hand,

T(U₁(g)f₁)(g₁) = γ(g₁, H)^{−1}U₁(g)f₁(g₁H)

= γ(g₁, H)^{−1}ρ(g, g^{−1}g₁H)γ(g, g^{−1}g₁H)f₁(g^{−1}g₁H)

= ρ(g, g^{−1}g₁H)γ(g₁, H)^{−1}γ(g, g^{−1}g₁H)f₁(g^{−1}g₁H)


Now,

γ(g₁, H) = γ(g·g^{−1}g₁, H) = γ(g, g^{−1}g₁H)γ(g^{−1}g₁, H)

or equivalently,

γ(g₁, H)^{−1}γ(g, g^{−1}g₁H) = γ(g^{−1}g₁, H)^{−1}

Thus,

T(U₁(g)f₁)(g₁) = ρ(g, g^{−1}g₁H)γ(g^{−1}g₁, H)^{−1}f₁(g^{−1}g₁H) = ρ(g, g^{−1}g₁H)T(f₁)(g^{−1}g₁) = U(g)T(f₁)(g₁)

In other words,

T∘U₁(g) = U(g)∘T

Thus, U₁ and U are equivalent unitary representations.
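The algebra of the map b(g, x) = s(gx)^{−1}g s(x) can be checked mechanically on a small example. The sketch below takes G = S₃, H the order-2 subgroup generated by a transposition, and a section picking the lexicographically smallest coset representative (the group, subgroup and section are illustrative choices):

```python
from itertools import permutations

def compose(a, b):            # (a∘b)(i) = a[b[i]]
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

G = list(permutations(range(3)))      # S3
e = (0, 1, 2)
H = [e, (1, 0, 2)]                    # subgroup generated by the transposition (01)

def coset(g):                         # the left coset gH, as a frozenset
    return frozenset(compose(g, h) for h in H)

X = {coset(g) for g in G}             # G/H: three cosets

def s(x):                             # section X -> G: smallest representative, s(H) = e
    return min(x)

def b(g, x):                          # b(g, x) = s(gx)^{-1} g s(x)
    gx = coset(compose(g, s(x)))
    return compose(inverse(s(gx)), compose(g, s(x)))

# b(g, x) always lands in H, and b is a cocycle: b(g1 g2, x) = b(g1, g2 x) b(g2, x)
for g1 in G:
    for g2 in G:
        for x in X:
            assert b(g1, x) in H
            g2x = coset(compose(g2, s(x)))
            assert b(compose(g1, g2), x) == compose(b(g1, g2x), b(g2, x))
print("cocycle identity verified on S3 / <(01)>")
```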

6.10 Estimating non-uniform transmission line parameters and line voltage and current using the extended Kalman filter

Abstract: The distributed parameters of a non-uniform transmission line are assumed to depend on a finite set of unknown parameters, for example their spatial Fourier series coefficients. The line equations are set up taking random voltage and current line loading into account, with the random loading being assumed to be white Gaussian noise in the time and spatial variables. By passing over to the spatial Fourier series domain, these line stochastic differential equations are expressed as a truncated sequence of coupled stochastic differential equations for the spatial Fourier series components of the line voltage and current. The coefficients that appear in these stochastic differential equations are the spatial Fourier series coefficients of the distributed line parameters, and these coefficients in turn are linear functions of the unknown parameters. By separating these complex stochastic differential equations into their real and imaginary parts, the line equations become a finite set of sde's for the extended state, defined to be the real and imaginary components of the line voltage and current as well as the parameters on which the distributed parameters depend. The measurement model is assumed to be white-Gaussian-noise-corrupted versions of the line voltage and current sampled at a discrete set of spatial points on the line, and this measurement model can be expressed as a linear transformation of the state variables, ie, of the spatial Fourier series components of the line voltage and current, plus a white measurement noise vector. The extended Kalman filter equations are derived for this state and measurement model, and our real time line voltage and current and distributed parameter estimates show excellent agreement with the true values based on MATLAB simulations.


6.11 Multipole expansion of the magnetic field produced by a time varying current distribution

Let J(t, r) be the current density. The magnetic vector potential generated by it is given by

A(t, r) = (µ/4π)∫ J(t − |r − r′|/c, r′)d³r′/|r − r′|

= (µ/4π)∑_{n≥0} ((−1)ⁿ/cⁿn!)∫ (∂ₜⁿJ(t, r′))|r − r′|^{n−1}d³r′

For |r| greater than the source dimension and with θ denoting the angle between the vectors r and r′, we have the expansion

|r − r′|^α = (r² + r′² − 2r·r′)^{α/2} = r^α(1 + (r′/r)² − 2(r′/r)cos(θ))^{α/2} = r^α∑_{m≥0} P_{m,α}(cos(θ))(r′/r)^m

where the P_{m,α}(x), m = 0, 1, 2, ..., are defined by the generating function ∑_{m≥0} P_{m,α}(x)tᵐ = (1 + t² − 2tx)^{α/2}. Then,

A(t, r) = (µ/4π)∑_{n,m≥0} ((−1)ⁿ/cⁿn!)r^{n−1−m}∫ ∂ₜⁿJ(t, r′)·r′^m·P_{m,n−1}(r·r′/(rr′))d³r′

A useful approximation to this exact expansion is obtained by taking

|r − r′|^α ≈ (r² − 2r·r′)^{α/2} ≈ r^α(1 − α·r·r′/r²) = r^α − α·r^{α−2}·r·r′

Further,

∂ₜⁿJ(t, r′)(r·r′) = (r′ × ∂ₜⁿJ(t, r′)) × r + (∂ₜⁿJ(t, r′)·r)r′
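The vector decomposition above is just the BAC-CAB identity (A×B)×C = B(A·C) − A(B·C); a quick numerical check on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    r, rp, J = rng.standard_normal((3, 3))   # r, r', and ∂ₜⁿJ as arbitrary 3-vectors
    lhs = J * np.dot(r, rp)                  # J (r·r')
    rhs = np.cross(np.cross(rp, J), r) + np.dot(J, r) * rp   # (r'×J)×r + (J·r) r'
    assert np.allclose(lhs, rhs)
print("vector identity verified")
```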

In the special case when the source is a wire carrying a time varying current I(t), we can write

J(t, r′)d³r′ = I(t)dr′

and then the above equation gets replaced by the corresponding line integral along the wire.

Chapter 7

Some aspects of stochastic processes

End semester exam, M.Tech Ist semester, Stochastic processes and queueing theory

Attempt any five questions, each question carries ten marks

[1] In a single server queue, let the arrivals take place at the times Sₙ = ∑_{k=1}ⁿ Xₖ, n ≥ 1, let the service time of the nth customer be Yₙ and let his waiting time in the queue be Wₙ. Derive the recursive relation

W_{n+1} = max(0, W_n + Y_n − X_{n+1}), n ≥ 1

where W₁ = 0. If the Xₙ's are iid and the Yₙ's are also iid and independent of the Xₙ's, then define Tₙ = Yₙ − X_{n+1} and prove that W_{n+1} has the same distribution as that of max_{0≤k≤n} Fₖ, where Fₙ = ∑_{k=1}ⁿ Tₖ, n ≥ 1, and F₀ = 0. When Xₙ, Yₙ are exponentially distributed with parameters λ, µ respectively, explain the sequence of steps based on ladder epochs and ladder heights by which you will calculate the distribution of the equilibrium waiting time W∞ = limₙ→∞ Wₙ in the case when µ > λ. What happens when µ ≤ λ?

Solution: The (n+1)th customer arrives at the time S_{n+1}. The previous, ie, the nth customer departs at time Sₙ + Wₙ + Yₙ. If this departure time is smaller than S_{n+1} (ie, Sₙ + Wₙ + Yₙ − S_{n+1} = Wₙ + Yₙ − X_{n+1} < 0), then the (n+1)th customer finds the queue empty on arrival and hence his waiting time W_{n+1} = 0. If on the other hand the departure time Sₙ + Wₙ + Yₙ of the nth customer is greater than S_{n+1}, the arrival time of the (n+1)th customer, then the (n+1)th customer has to wait for a time

W_{n+1} = (Sₙ + Wₙ + Yₙ) − S_{n+1} = Wₙ + Yₙ − X_{n+1}

before his service commences. Therefore, we have the recursion relation

W_{n+1} = max(0, Wₙ + Yₙ − X_{n+1}), n ≥ 1


Note that W₁ = 0. This can be expressed as

W_{n+1} = max(0, Wₙ + Tₙ), Tₙ = Yₙ − X_{n+1}

By iteration, we find

W₂ = max(0, T₁), W₃ = max(0, T₂ + max(0, T₁)) = max(0, max(T₂, T₁ + T₂)) = max(0, T₂, T₁ + T₂)

W₄ = max(0, W₃ + T₃) = max(0, T₃ + max(0, T₂, T₁ + T₂)) = max(0, T₃, T₂ + T₃, T₁ + T₂ + T₃)

Continuing, we find by induction that

W_{n+1} = max(0, ∑_{k=r}ⁿ Tₖ : r = 1, 2, ..., n)

Now the Tₖ's are iid. Therefore, W_{n+1} has the same distribution as

max(0, ∑_{k=r}ⁿ T_{n+1−k} : r = 1, 2, ..., n) = max(0, ∑_{k=1}^r Tₖ : r = 1, 2, ..., n)

In particular, the equilibrium distribution of the waiting time, ie of W∞ = limₙ→∞ Wₙ, is the same as that of

max(0, sup_{r≥1} ∑_{k=1}^r Tₖ)

provided that this limiting r.v. exists, ie, provided that the supremum of the partial sums is finite w.p.1. Let

Fₙ = ∑_{k=1}ⁿ Tₖ, n ≥ 1, F₀ = 0

Let τ₁ denote the first time k ≥ 1 at which Fₖ > F₀ = 0. For n ≥ 1, let τ_{n+1} denote the first time k ≥ 1 at which Fₖ > F_{τₙ}, ie, the (n+1)th time k at which Fₖ exceeds all of Fⱼ, j = 0, 1, ..., k − 1. Then, it is clear that W∞ has the same distribution as that of lim F_{τₙ}, provided that τₙ exists for all n. In the general case, we have

P(W∞ ≤ x) = limₙ→∞ P(F_{τₙ} ≤ x)

where if for a given ω, τ_N(ω) exists for some positive integer N = N(ω) but τ_{N+1}(ω) does not exist, then we set τₖ(ω) = τ_N(ω) for all k > N. It is also clear, using stopping time theory, that the r.v's U_{n+1} = F_{τ_{n+1}} − F_{τₙ}, n = 1, 2, ... are iid and non-negative, and we can write

W∞ = ∑_{n=0}^∞ Uₙ, U₀ = 0


Thus, the distribution of W∞ is formally the limit of F_U^{*n}, where F_U is the distribution of U₁ = F_{τ₁} with τ₁ = min{k ≥ 1 : Fₖ > 0}. Note also that for x > 0,

1 − F_U(x) = P(U > x) = ∑_{n=0}^∞ P(max_{k≤n} Fₖ ≤ 0, F_{n+1} > x)
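The recursion and its equilibrium can be explored by simulation. The sketch below iterates the Lindley recursion for exponential interarrival and service times and compares the empirical tail of W∞ against the known M/M/1 waiting time formula P(W > x) = (λ/µ)e^{−(µ−λ)x}, used here only as an external check (the rates are illustrative):

```python
import random, math

random.seed(1)
lam, mu = 1.0, 2.0          # arrival rate λ < service rate µ, so the queue is stable

# Iterate W_{n+1} = max(0, W_n + Y_n - X_{n+1})
W, samples = 0.0, []
for n in range(200000):
    Y = random.expovariate(mu)      # service time of customer n
    X = random.expovariate(lam)     # next interarrival time
    W = max(0.0, W + Y - X)
    if n > 10000:                   # discard burn-in before equilibrium
        samples.append(W)

x = 1.0
empirical = sum(w > x for w in samples) / len(samples)
theory = (lam / mu) * math.exp(-(mu - lam) * x)
print(empirical, theory)            # should agree to roughly two decimal places
assert abs(empirical - theory) < 0.02
```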

[2] Consider a priority queue with two kinds of customers labeled C₁, C₂ and a single server. Customers of type C₁ arrive at the rate λ₁ and customers of type C₂ arrive at the rate λ₂. A customer of type C₁ is served at the rate µ₁, while a customer of type C₂ is served at the rate µ₂. If a customer of type C₁ is present in the queue, then his service takes immediate priority over a customer of type C₂; otherwise the queue runs on a first come first served basis. Let X₁(t) denote the number of type C₁ customers and X₂(t) the number of type C₂ customers in the queue at time t. Set up the Chapman-Kolmogorov differential equations for

P(t, n₁, n₂) = Pr(X₁(t) = n₁, X₂(t) = n₂), n₁, n₂ = 0, 1, ...

and also set up the equilibrium equations for

P(n₁, n₂) = lim_{t→∞} P(t, n₁, n₂)

Solution: Let θ(n) = 1 if n ≥ 0 and θ(n) = 0 if n < 0, and let δ(n, m) = 1 if n = m, else δ(n, m) = 0. Then

P(t + dt, n₁, n₂) = P(t, n₁ − 1, n₂)λ₁dt·θ(n₁ − 1) + P(t, n₁, n₂ − 1)λ₂dt·θ(n₂ − 1)

+ P(t, n₁ + 1, n₂)µ₁dt + P(t, n₁, n₂ + 1)µ₂dt·δ(n₁, 0)

+ P(t, n₁, n₂)(1 − (λ₁ + λ₂)dt − µ₁dt·θ(n₁ − 1) − µ₂dt·θ(n₂ − 1)·δ(n₁, 0))

Essentially, the term δ(n₁, 0) has been inserted to guarantee that a C₂ type customer is currently being served only when there is no C₁ type customer in the queue, and the term θ(n₂ − 1) has been inserted to guarantee that a C₂ type customer can depart from the queue only if there is at least one such customer in the queue. The product term δ(n₁, 0)θ(n₂ − 1) corresponds to the situation in which there is no C₁ type customer and at least one C₂ type customer, which means that a C₂ type customer is currently being served and hence will depart immediately after his service is completed. From these equations follow the Chapman-Kolmogorov equations:

∂ₜP(t, n₁, n₂) = P(t, n₁ − 1, n₂)λ₁θ(n₁ − 1) + P(t, n₁, n₂ − 1)λ₂θ(n₂ − 1)

+ P(t, n₁ + 1, n₂)µ₁ + P(t, n₁, n₂ + 1)µ₂δ(n₁, 0)

− P(t, n₁, n₂)((λ₁ + λ₂) + µ₁θ(n₁ − 1) + µ₂θ(n₂ − 1)δ(n₁, 0))

The equilibrium equations are obtained by setting the rhs to zero, ie, ∂ₜP(n₁, n₂) = 0.
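The equilibrium equations can be solved numerically on a truncated state space. A minimal sketch (rates and truncation level are illustrative) that builds the generator implied by the Chapman-Kolmogorov equations above and solves πQ = 0:

```python
import numpy as np

lam1, lam2, mu1, mu2, K = 0.3, 0.2, 1.0, 0.8, 15   # illustrative rates, truncated at K
states = [(n1, n2) for n1 in range(K) for n2 in range(K)]
idx = {s: i for i, s in enumerate(states)}
Q = np.zeros((len(states), len(states)))

for (n1, n2), i in idx.items():
    def add(dest, rate):
        if dest in idx:                    # arrivals beyond the truncation are blocked
            Q[i, idx[dest]] += rate
            Q[i, i] -= rate
    add((n1 + 1, n2), lam1)                # C1 arrival
    add((n1, n2 + 1), lam2)                # C2 arrival
    if n1 >= 1:
        add((n1 - 1, n2), mu1)             # C1 service (priority)
    if n1 == 0 and n2 >= 1:
        add((n1, n2 - 1), mu2)             # C2 served only when no C1 is present

# Equilibrium: solve pi Q = 0 together with sum(pi) = 1
A = np.vstack([Q.T, np.ones(len(states))])
b = np.zeros(len(states) + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(pi @ Q, 0, atol=1e-8) and abs(pi.sum() - 1) < 1e-8
print("P(empty queue) =", pi[idx[(0, 0)]])
```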

[3] Write short notes on any three of the following:

[a] Construction of Brownian motion on [0, 1] from the Haar wavelet functions, including the proof of almost sure continuity of the sample paths.

Solution: Define the Haar wavelet functions as H_{n,k}(t) = 2^{(n−1)/2} for (k − 1)/2ⁿ ≤ t ≤ k/2ⁿ and H_{n,k}(t) = −2^{(n−1)/2} for k/2ⁿ < t ≤ (k + 1)/2ⁿ, whenever k is in the set I(n) = {k : k odd, k = 0, 1, ..., 2ⁿ − 1}, for n = 1, 2, ..., and put H₀₀(t) = 1. Here t ∈ [0, 1]. It is then easily verified that the functions H_{nk}, k ∈ I(n), n ≥ 1, together with H₀₀, form an orthonormal basis for L²[0, 1]. We set I(0) = {0}.

Define the Schauder functions

S_{nk}(t) = ∫₀ᵗ H_{nk}(s)ds

Then, from basic properties of orthonormal bases, we have

∑_{n,k} H_{nk}(t)H_{nk}(s) = δ(t − s)

and hence

∑_{n,k} S_{nk}(t)S_{nk}(s) = min(t, s)

Let ξ(n, k), k ∈ I(n), n ≥ 1, together with ξ(0, 0), be iid N(0, 1) random variables, and define the random functions

B_N(t) = ∑_{n=0}^N ∑_{k∈I(n)} ξ(n, k)S_{nk}(t), N ≥ 1

Then since the S_{nk}(t) are continuous functions, B_N(t) is also continuous for all N. Further, the S_{nk}'s for a fixed n are non-overlapping functions and

max_t |S_{nk}(t)| = 2^{−(n+1)/2}

Thus,

max_t ∑_{k∈I(n)} |S_{nk}(t)| = 2^{−(n+1)/2}

Let

b(n) = max{|ξ(n, k)| : k ∈ I(n)}

Then, by the union bound,

P(b(n) > n) ≤ 2ⁿ·P(|ξ(n, k)| > n) = 2^{n+1}∫ₙ^∞ exp(−x²/2)dx/√(2π) ≤ K·2^{n+1}·exp(−n²/2)


Thus,

∑_{n≥1} P(b(n) > n) < ∞

which means that by the Borel-Cantelli lemma, for a.e. ω there is a finite N(ω) such that for all n > N(ω), b(n, ω) < n. This means that for a.e. ω,

∑_{n=1}^N ∑_{k∈I(n)} |ξ(n, k, ω)||S_{nk}(t)| ≤ ∑_{n=1}^N n·2^{−(n+1)/2}

which converges as N → ∞. This result implies that the sequence of continuous processes B_N(t), t ∈ [0, 1] converges uniformly to a continuous process B(t), t ∈ [0, 1] a.e. Note that continuity of each term and uniform convergence imply continuity of the limit. That B(t), t ∈ [0, 1] is a Gaussian process with zero mean and covariance min(t, s) can be seen from the joint characteristic function of the time samples of B_N(.). Alternately, B(.), being an infinite linear superposition of Gaussian r.v's, must be a Gaussian process; its mean is zero since the ξ(n, k) have zero mean, and the covariance follows from

E(B_N(t)B_N(s)) = ∑_{n,m=0}^N ∑_{k∈I(n), r∈I(m)} E(ξ(nk)ξ(mr))S_{nk}(t)S_{mr}(s)

= ∑_{n,m=0}^N ∑_{k∈I(n), r∈I(m)} δ(n, m)δ(k, r)S_{nk}(t)S_{mr}(s)

= ∑_{n=0}^N ∑_{k∈I(n)} S_{nk}(t)S_{nk}(s) → ∑_{n=0}^∞ ∑_{k∈I(n)} S_{nk}(t)S_{nk}(s) = min(t, s)
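The construction can be reproduced numerically: the sketch below evaluates the Schauder functions in closed form and checks that the truncated sum ∑ S_{nk}(t)S_{nk}(s) is already close to min(t, s) (the truncation level is an illustrative choice; the tail of the sum is bounded by ∑_{n>12} 2^{−(n+1)} ≈ 1.2·10⁻⁴):

```python
import numpy as np

rng = np.random.default_rng(0)

def schauder(n, k, t):
    """S_{nk}(t) = integral of H_{nk}: a tent of height 2^{-(n+1)/2} peaking at k/2^n."""
    if n == 0:
        return t                       # S_{00}(t) = ∫_0^t 1 ds = t
    R = lambda u: np.maximum(u, 0.0)   # ramp
    h = 2.0 ** ((n - 1) / 2)
    a, m, b = (k - 1) / 2**n, k / 2**n, (k + 1) / 2**n
    return h * (R(t - a) - 2 * R(t - m) + R(t - b))

t = np.linspace(0, 1, 201)
levels = 12
terms = [(0, 0)] + [(n, k) for n in range(1, levels + 1)
                    for k in range(1, 2**n, 2)]        # I(n) = odd k < 2^n

# a sample partial sum B_N(t): a continuous approximate Brownian path
B = sum(rng.standard_normal() * schauder(n, k, t) for n, k in terms)

# the covariance identity: sum_{n,k} S_{nk}(t) S_{nk}(s) -> min(t, s)
S = np.array([schauder(n, k, t) for n, k in terms])
cov = S.T @ S
target = np.minimum.outer(t, t)
assert np.max(np.abs(cov - target)) < 1e-3
print("max covariance error:", np.max(np.abs(cov - target)))
```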

[b] L²-construction of Brownian motion from the Karhunen-Loeve expansion.

The autocorrelation of B(t) is

R(t, s) = E(B(t)B(s)) = min(t, s)

Its eigenfunctions φ(t) satisfy

∫₀¹ R(t, s)φ(s)ds = λφ(t), t ∈ [0, 1]

Thus,

∫₀ᵗ sφ(s)ds + t∫ₜ¹ φ(s)ds = λφ(t)


Differentiating,

tφ(t) + ∫ₜ¹ φ(s)ds − tφ(t) = λφ′(t)

Again differentiating,

−φ(t) = λφ″(t)

Let λ = 1/ω². Then, the solution is

φ(t) = A·cos(ωt) + B·sin(ωt)

and

∫₀ᵗ sφ(s)ds = A((t/ω)sin(ωt) + (cos(ωt) − 1)/ω²) + B(−t·cos(ωt)/ω + sin(ωt)/ω²)

∫ₜ¹ φ(s)ds = A(sin(ω) − sin(ωt))/ω + (B/ω)(cos(ωt) − cos(ω))

Substituting all these expressions into the original integral equation gives

A((t/ω)sin(ωt) + (cos(ωt) − 1)/ω²) + B(−t·cos(ωt)/ω + sin(ωt)/ω²)

+ tA(sin(ω) − sin(ωt))/ω + (Bt/ω)(cos(ωt) − cos(ω)) = (A·cos(ωt) + B·sin(ωt))/ω²

On making cancellations and comparing coefficients,

A·sin(ω) = 0, B·cos(ω) = 0, A/ω² = 0

Therefore,

A = 0, ω = (n + 1/2)π, n ∈ Z

and for normalization

∫₀¹ φ²(t)dt = 1

we get

B²∫₀¹ sin²(ωt)dt = 1

or

(B²/2)(1 − sin(2ω)/(2ω)) = 1

or, since sin(2ω) = 0 for ω = (n + 1/2)π,

B = √2

Thus, the KL expansion for Brownian motion is given by

B(t) = ∑_{n≥0} √λₙ·ξ(n)φₙ(t) = ∑_{n≥0} ((n + 1/2)π)^{−1}ξ(n)·√2·sin((n + 1/2)πt)

where ξ(n), n = 0, 1, 2, ... are iid N(0, 1) r.v's.

[c] Reflection principle for Brownian motion with application to the evaluation of the distribution of the first hitting time of the process at a given level.

[d] The Borel-Cantelli lemmas with application to proving almost sure convergence of a sequence of random variables.

[e] The Feynman-Kac formula for solving Schrodinger's equation using Brownian motion with complex time.

[f] Solving Poisson's equation ∇²u(x) = s(x) in an open connected subset of Rⁿ with Dirichlet boundary conditions using Brownian motion stop times.

[4] A bulb in a bulb holder is changed when it gets fused. Let Xₙ denote the time for which the nth bulb lasts. If n is even, then Xₙ has the exponential density λ·exp(−λx)u(x), while if n is odd then Xₙ has the exponential density µ·exp(−µx)u(x). Calculate (a) the probability distribution of the number of bulbs N(t) = max{n : Sₙ ≤ t} changed in [0, t], where Sₙ = X₁ + ... + Xₙ, n ≥ 1 and S₀ = 0. Define

δ(t) = t − S_{N(t)}, γ(t) = S_{N(t)+1} − t

Pictorially illustrate and explain the meaning of the r.v's δ(t), γ(t). What happens to the distributions of these r.v's as t → ∞?

Solution: Let n ≥ 1. Then

P(N(t) ≥ n) = P(Sₙ ≤ t) = P(∑_{1≤k≤n, k even} Xₖ + ∑_{1≤k≤n, k odd} Xₖ ≤ t)

Specifically,

P(N(t) ≥ 2n) = P(∑_{k=1}ⁿ X_{2k} + ∑_{k=1}ⁿ X_{2k−1} ≤ t) = F^{*n} * G^{*n}(t)

where

F(t) = 1 − exp(−λt), G(t) = 1 − exp(−µt)

Note that with f(t) = F′(t), g(t) = G′(t), we have

f^{*n}(t) = λⁿtⁿ⁻¹exp(−λt)/(n − 1)!,

g^{*n}(t) = µⁿtⁿ⁻¹exp(−µt)/(n − 1)!

Equivalently, in terms of Laplace transforms,

L(f(t)) = λ/(λ + s), L(g(t)) = µ/(µ + s)


so

L(f^{*n}(t)) = λⁿ/(λ + s)ⁿ, L(g^{*n}(t)) = µⁿ/(µ + s)ⁿ

and hence

f^{*n}(t) * g^{*n}(t) = L^{−1}((λµ)ⁿ/((λ + s)ⁿ(µ + s)ⁿ)) = ψₙ(t)

say. Then,

P(N(t) ≥ 2n) = ∫₀ᵗ ψₙ(x)dx

Likewise,

P(N(t) ≥ 2n − 1) = P(S_{2n−1} ≤ t) = ∫₀ᵗ φₙ(x)dx

where

φₙ(t) = g^{*n}(t) * f^{*(n−1)}(t) = L^{−1}(λⁿ⁻¹µⁿ/((λ + s)ⁿ⁻¹(µ + s)ⁿ))

δ(t) = t − S_{N(t)} is the time elapsed from the last change of the bulb to the current time, and γ(t) = S_{N(t)+1} − t is the time from the current time up to the next change of the bulb. Thus, for 0 ≤ x ≤ t,

P(δ(t) ≤ x) = P(S_{N(t)} > t − x) = ∑_{n≥1} P(Sₙ > t − x, N(t) = n)

= ∑_{n≥1} P(t − x < Sₙ ≤ t < S_{n+1}) = ∑_{n≥1} P(t − x < Sₙ ≤ t, X_{n+1} > t − Sₙ)

= ∑_{n≥1} ∫_{t−x}^t P(X_{n+1} > t − s)P(Sₙ ∈ ds)

Likewise,

P(γ(t) ≤ x) = P(S_{N(t)+1} ≤ t + x) = ∑_{n≥0} P(S_{n+1} ≤ t + x, Sₙ ≤ t < S_{n+1})

= ∑_{n≥0} P(Sₙ ≤ t < S_{n+1} ≤ t + x) = ∑_{n≥0} ∫₀ᵗ P(Sₙ ∈ ds)P(t − s < X_{n+1} ≤ t − s + x)
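The distribution of N(t) can be checked by simulation; here P(N(t) ≥ 2) = P(X₁ + X₂ ≤ t) is compared against the closed-form hypoexponential cdf obtained by inverting the Laplace transform λµ/((λ + s)(µ + s)) (the rates and t are illustrative):

```python
import random, math

random.seed(2)
lam, mu, t = 1.0, 2.0, 1.5   # even-indexed bulbs ~ Exp(λ), odd-indexed ~ Exp(µ)

def n_changes(t):
    s, n = 0.0, 0
    while True:
        # bulb n+1 is odd-indexed when n is even, so it burns at rate µ
        x = random.expovariate(mu if (n + 1) % 2 == 1 else lam)
        if s + x > t:
            return n
        s, n = s + x, n + 1

trials = 200000
emp = sum(n_changes(t) >= 2 for _ in range(trials)) / trials

# P(N(t) >= 2) = P(X1 + X2 <= t), the cdf of an Exp(µ) + Exp(λ) sum:
theory = 1 - (mu * math.exp(-lam * t) - lam * math.exp(-mu * t)) / (mu - lam)
print(emp, theory)
assert abs(emp - theory) < 0.01
```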

[5] [a] Let X(t), t ∈ R be a real stochastic process with autocorrelation E(X(t₁)X(t₂)) = R(t₁, t₂). Assume that there exists a finite positive real number T such that R(t₁ + T, t₂ + T) = R(t₁, t₂) ∀t₁, t₂ ∈ R. Prove that X(t) is a mean square periodic process, ie,

E[(X(t + T) − X(t))²] = 0 ∀t ∈ R

Hence show that X(t) has a mean square Fourier series:

lim_{N→∞} E[|X(t) − ∑_{n=−N}^N c(n)exp(jnω₀t)|²] = 0

where ω₀ = 2π/T and

c(n) = T^{−1}∫₀ᵀ X(t)exp(−jnω₀t)dt

[b] Define a cyclostationary and a wide sense cyclostationary process, with an example. Give an example of a wide sense cyclostationary process that is not cyclostationary.

Solution: A process X(t), t ∈ R is said to be cyclostationary if its finite dimensional probability distribution functions are invariant under shifts by integer multiples of some fixed positive T, ie, if

P(X(t₁) ≤ x₁, ..., X(t_N) ≤ x_N) = P(X(t₁ + T) ≤ x₁, ..., X(t_N + T) ≤ x_N)

for all x₁, ..., x_N, t₁, ..., t_N, N ≥ 1. For example, the process

X(t) = ∑ₙ c(n)exp(jnωt)

where the c(n) are any r.v's, is cyclostationary with period 2π/ω. In fact, this process is a periodic process. A wide sense cyclostationary process is a process whose mean and autocorrelation function are periodic with period T for some fixed T > 0, ie, if

µ(t) = E(X(t)), R(t, s) = E(X(t)X(s))

then

µ(t + T) = µ(t), R(t + T, s + T) = R(t, s) ∀t, s ∈ R

An example of a WSS process that is not stationary is

X(t) = cos(ω₁t + φ₁) + cos(ω₂t + φ₂) + cos(ω₃t + φ₁ + φ₂)

where ω₁ + ω₂ ≠ ω₃ and φ₁, φ₂ are iid r.v's uniformly distributed over [0, 2π). This process has second moments that are invariant under time shifts, but its third moments are not invariant under time shifts. Likewise, an example of a WS-cyclostationary process that is not cyclostationary is the process X(t) + Y(t), with Y(t) having the same structure as X(t) but independent of it, with frequencies ω₄, ω₅, ω₆ (and phases φ₄, φ₅, φ₄ + φ₅, with φ₁, φ₂, φ₄, φ₅ being independent and uniform over [0, 2π)) such that (ω₁ + ω₂ − ω₃)/(ω₄ + ω₅ − ω₆) is irrational. This process is also in fact WSS. To see why it is not cyclostationary, we compute its third moments:

E(X(t)X(t + t₁)X(t + t₂)) = E(cos(ω₃t + φ₁ + φ₂)·cos(ω₁(t + t₁) + φ₁)·cos(ω₂(t + t₂) + φ₂))

+ E(cos(ω₃t + φ₁ + φ₂)·cos(ω₁(t + t₂) + φ₁)·cos(ω₂(t + t₁) + φ₂))

= (1/4)(cos((ω₁ + ω₂ − ω₃)t + ω₁t₁ + ω₂t₂) + cos((ω₁ + ω₂ − ω₃)t + ω₂t₁ + ω₁t₂))

and likewise

E(Y(t)Y(t + t₁)Y(t + t₂)) = (1/4)(cos((ω₄ + ω₅ − ω₆)t + ω₄t₁ + ω₅t₂) + cos((ω₄ + ω₅ − ω₆)t + ω₅t₁ + ω₄t₂))

Thus, if

Z(t) = X(t) + Y(t)

then

E(Z(t)Z(t + t₁)Z(t + t₂)) = E(X(t)X(t + t₁)X(t + t₂)) + E(Y(t)Y(t + t₁)Y(t + t₂))

= (1/4)(cos((ω₁ + ω₂ − ω₃)t + ω₁t₁ + ω₂t₂) + cos((ω₁ + ω₂ − ω₃)t + ω₂t₁ + ω₁t₂))

+ (1/4)(cos((ω₄ + ω₅ − ω₆)t + ω₄t₁ + ω₅t₂) + cos((ω₄ + ω₅ − ω₆)t + ω₅t₁ + ω₄t₂))

which is clearly not periodic in t since (ω₁ + ω₂ − ω₃)/(ω₄ + ω₅ − ω₆) is irrational.
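This can be verified numerically. The sketch below averages over a uniform grid of phases, which computes the expectations exactly for trigonometric polynomials, and checks that the second moments of X are shift invariant while the third moment oscillates in t at the frequency d = ω₁ + ω₂ − ω₃ (so shifting t by π/d flips its sign); the frequencies are illustrative:

```python
import numpy as np

w1, w2, w3 = 1.0, np.sqrt(2.0), 0.7          # ω1 + ω2 ≠ ω3
M = 64
p1, p2 = np.meshgrid(np.arange(M) * 2*np.pi/M, np.arange(M) * 2*np.pi/M)

def X(t):
    return np.cos(w1*t + p1) + np.cos(w2*t + p2) + np.cos(w3*t + p1 + p2)

E = lambda a: float(np.mean(a))   # phase-grid average = exact expectation here

# second moments do not depend on t (the process is WSS):
tau = 0.4
assert abs(E(X(0.0)*X(tau)) - E(X(5.0)*X(5.0 + tau))) < 1e-10

# but the third moment does depend on t: every surviving term oscillates in t
# at the frequency d = ω1 + ω2 − ω3, so shifting t by π/d flips its sign
d = w1 + w2 - w3
t1, t2 = 0.5, 1.1
m3 = lambda t: E(X(t) * X(t + t1) * X(t + t2))
assert abs(m3(0.3)) > 1e-3
assert abs(m3(0.3 + np.pi/d) + m3(0.3)) < 1e-10
print("WSS but not strictly stationary: third moment varies with t")
```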

[6] Explain the construction of the Lebesgue measure as a countably additive non-negative set function on the Borel subsets of R whose value on an interval coincides with the interval's length.

Solution: We first define the Lebesgue measure µ on the field F of finite disjoint unions of intervals of the form (a, b] in (0, 1] as the sum of the lengths. Then, we define the outer measure µ* on the class of all subsets of (0, 1] as

µ*(E) = inf{∑ₙ µ(Eₙ) : Eₙ ∈ F, E ⊂ ∪ₙ Eₙ}

We prove countable subadditivity of µ*. Then, we say that E is a measurable subset of (0, 1] if

µ*(E ∩ A) + µ*(Eᶜ ∩ A) = µ*(A) ∀A ⊂ (0, 1]

Then, if M denotes the class of all measurable subsets, we prove that M is a field and that µ* is finitely additive on M. We then prove that M is a σ-field and that µ* is countably additive on M. Finally, we prove that M contains the field F and hence the σ-field generated by F, namely, the Borel σ-field on (0, 1].


[7] [a] What is a compact subset of Rⁿ? Give an example and characterize it by means of (a) the Heine-Borel property and (b) the Bolzano-Weierstrass property. Prove that the countable intersection of a sequence of decreasing non-empty compact subsets of Rⁿ is non-empty.

A compact subset E of Rⁿ is a closed and bounded subset. By bounded, we mean that

sup{|x| : x ∈ E} < ∞

By closed, we mean that if xₙ ∈ E is a sequence in E such that lim xₙ = x ∈ Rⁿ exists, then x ∈ E. For example,

E = {x ∈ Rⁿ : |x| ≤ 1}

It can be shown that E is compact iff every open cover of it has a finite subcover, ie, if Oα, α ∈ I are open sets in Rⁿ such that

E ⊂ ∪_{α∈I} Oα

then there is a finite subset

F = {α₁, ..., α_N} ⊂ I

such that

E ⊂ ∪_{k=1}^N O_{αₖ}

(b) What is a regular measure on (Rⁿ, B(Rⁿ))? Show that every probability measure on (Rⁿ, B(Rⁿ)) is regular.

Solution: A measure µ on (Rⁿ, B(Rⁿ)) is said to be regular if given any B ∈ B(Rⁿ) and ǫ > 0, there exists an open set F and a closed set C in Rⁿ such that C ⊂ B ⊂ F and µ(F − C) < ǫ. Let us now show that any probability measure (and hence any finite measure) on (Rⁿ, B(Rⁿ)) is regular. Let P be a probability measure on the above space. Define C to be the family of all sets B ∈ B(Rⁿ) such that ∀ǫ > 0 there exists a closed set C and an open set F such that C ⊂ B ⊂ F and P(F − C) < ǫ. We prove that C is a σ-field, and then since it contains all closed sets (if C is a closed set and Fₙ = {x ∈ Rⁿ : d(x, C) < 1/n}, then Fₙ is open, C ⊂ C ⊂ Fₙ and P(Fₙ − C) → 0 because Fₙ ↓ C), it will follow that C = B(Rⁿ), which will complete the proof of the result. Let B ∈ C and let ǫ > 0. Then, there exists a closed C and an open F such that C ⊂ B ⊂ F and P(F − C) < ǫ. Then Fᶜ is closed, Cᶜ is open, Fᶜ ⊂ Bᶜ ⊂ Cᶜ and P(Cᶜ − Fᶜ) = P(F − C) < ǫ, thereby proving that Bᶜ ∈ C, ie, C is closed under complementation. Now, let Bₙ ∈ C, n = 1, 2, ... and let ǫ > 0. Then, for each n we can choose a closed Cₙ and an open Fₙ such that Cₙ ⊂ Bₙ ⊂ Fₙ with P(Fₙ − Cₙ) < ǫ/2ⁿ. Then ∪_{n=1}^N Cₙ is closed for any finite integer N, ∪_{n=1}^∞ Fₙ is open and, for B = ∪ₙ Bₙ, we have that

∪_{n=1}^N Cₙ ⊂ B ⊂ ∪ₙ Fₙ, N ≥ 1


and further

P(∪ₙ Fₙ − ∪_{n=1}^N Cₙ) ≤ P((∪_{n=1}^N (Fₙ − Cₙ)) ∪ (∪_{n>N} Fₙ))

≤ ∑_{n=1}^N P(Fₙ − Cₙ) + ∑_{n>N} P(Fₙ) ≤ ∑_{n=1}^N ǫ/2ⁿ + ∑_{n>N} P(Fₙ)

which converges as N → ∞ to ǫ, and hence for sufficiently large N it must be smaller than 2ǫ, proving thereby that B ∈ C, ie, C is closed under countable unions. This completes the proof that C is a σ-field and hence the proof of the result.

(c) What is a consistent family of finite dimensional probability distributions on (R^{Z₊}, B(R^{Z₊}))?

Solution: Such a family is a family of probability distributions F(x₁, ..., x_N, n₁, ..., n_N) on R^N, for each set of positive integers n₁, ..., n_N and N = 1, 2, ..., satisfying

(a) F(x_{σ1}, ..., x_{σN}, n_{σ1}, ..., n_{σN}) = F(x₁, ..., x_N, n₁, ..., n_N)

for all permutations σ of (1, 2, ..., N), and

(b) lim_{x_N→∞} F(x₁, ..., x_N, n₁, ..., n_N) = F(x₁, ..., x_{N−1}, n₁, ..., n_{N−1})

(d) Prove Kolmogorov's consistency theorem: given any consistent family of finite dimensional probability distributions on (R^{Z₊}, B(R^{Z₊})), there exists a probability space and a real valued stochastic process defined on this space whose finite dimensional distributions coincide with the given ones.

Let F(x₁, ..., x_N; n₁, ..., n_N), n₁, ..., n_N ≥ 0, x₁, ..., x_N ∈ R, N ≥ 1, be a consistent family of probability distributions. Thus F(x₁, ..., x_N; 1, 2, ..., N) = F_N(x₁, ..., x_N) is a probability distribution on R^N with the property that

lim_{x_N→∞} F_N(x₁, ..., x_N) = F_{N−1}(x₁, ..., x_{N−1})

The aim is to construct a (countably additive) probability measure P on (R^{Z₊}, B(R^{Z₊})) such that

P(B × R^{Z₊}) = P_N(B), B ∈ B(R^N)

where

P_N(B) = ∫_B dF_N(x₁, ..., x_N)

Define

Q(B × R^{Z₊}) = P_N(B), B ∈ B(R^N)

Then by the consistency of F_N, N ≥ 1, Q is a well defined finitely additive probability measure on the field

F = ∪_{N≥1}(B(R^N) × R^{Z₊})


Note that

𝔅_N = B(R^N) × R^{Z₊}

is a field on R^{Z₊} for every N ≥ 1 and that

𝔅_N ⊂ 𝔅_{N+1}, N ≥ 1

It suffices to prove that Q is countably additive on F, or equivalently, that if B̃_N ∈ 𝔅_N is a sequence such that B̃_N ↓ φ, then Q(B̃_N) ↓ 0. Equivalently, it suffices to show that if B_N ∈ B(R^N) is a sequence such that

B̃_N = B_N × R^{Z₊} ↓ φ

then

P_N(B_N) ↓ 0

Suppose that this is false. Then, without loss of generality, we may assume that there is a δ > 0 such that P_N(B_N) ≥ δ ∀N ≥ 1. By regularity of probability measures on R^N for N finite, we can find for each N ≥ 1 a compact set K_N ∈ B(R^N) such that K_N ⊂ B_N and P_N(B_N − K_N) < δ/2^{N+1}. Now consider

K̃_N = (K₁ × R^{N−1}) ∩ (K₂ × R^{N−2}) ∩ ... ∩ (K_{N−1} × R) ∩ K_N

Then

K̃_N ⊂ K_N ⊂ R^N

and K̃_N, being a closed subset of the compact set K_N, is also compact. K̃_N is closed because each K_j, j = 1, 2, ..., N is compact and hence closed. Define

C_N = K̃_N × R^{Z₊}, N ≥ 1

Then, it is clear that C_N is decreasing and that C_N ⊂ B̃_N = B_N × R^{Z₊}. Further,

Q(C_N) = Q(B̃_N) − Q(B̃_N − C_N)

Now,

Q(B̃_N) = P_N(B_N) ≥ δ

and

Q(B̃_N − C_N) = P_N(B_N − ∩_{k=1}^N (K_k × R^{N−k})) = P_N(∪_{k=1}^N (B_N − (K_k × R^{N−k})))

≤ P_N(∪_{k=1}^N ((B_k − K_k) × R^{N−k})) ≤ ∑_{k=1}^N P_k(B_k − K_k) ≤ ∑_{k=1}^N δ/2^{k+1} ≤ δ/2

Thus,

Q(C_N) ≥ δ/2

and hence C_N is non-empty for each N. Hence, so is K̃_N for each N. Now C_N ⊂ B̃_N ↓ φ. Hence C_N ↓ φ, and we shall obtain a contradiction to this as follows: Since K̃_N is non-empty, we may choose

(x_{N1}, ..., x_{NN}) ∈ K̃_N, N = 1, 2, ...

Then,

x_{N1} ∈ K₁, (x_{N1}, x_{N2}) ∈ K₂, ..., (x_{N1}, ..., x_{NN}) ∈ K_N

Since K₁ is compact, x_{N1}, N ≥ 1 has a convergent subsequence, say x_{N1,1}. Now, (x_{N1,1}, x_{N1,2}) ∈ K₂ and since K₂ is compact, this has a convergent subsequence, say (x_{N2,1}, x_{N2,2}). Then, x_{N2,1} is a subsequence of x_{N1,1} and hence also converges. Continuing this way, we finally obtain a convergent sequence (y_{n,1}, ..., y_{n,N}), n = 1, 2, ... in K̃_N such that (y_{n,1}, ..., y_{n,j}) ∈ K_j, j = 1, 2, ..., n and of course (y_{n,1}, ..., y_{n,j}), n ≥ 1 converges. Let

y_j = limₙ y_{n,j}, j = 1, 2, ...

Then, it is clear that since K_j is closed,

(y₁, ..., y_j) ∈ K_j, j = 1, 2, ...

Hence

(y₁, ..., y_N) ∈ K̃_N, N ≥ 1

and hence,

(y₁, y₂, ...) ∈ K̃_N × R^{Z₊}, N ≥ 1

which means that

(y₁, y₂, ...) ∈ ∩_{N≥1}(K̃_N × R^{Z₊})

which contradicts the fact that the rhs is an empty set. This proves the Kolmogorov consistency theorem.


7.1 Problem suggested by Prof. Dhananjay

When the magnet falls within the tube of radius R₀, with the coil having n₀ turns per unit length and extending from z = h to z = h + a, and with the tube extending from z = 0 to z = d, we write down the equations of motion of the magnet. We usually may assume that at time t the magnetic moment vector of the magnet is m and that its centre does not deviate from the central axis of the tube. This may be safely assumed provided that the tube's radius is small. However, suppose we do not make this assumption, so that we aim at the most general possible non-relativistic situation in which the magnetic fields are strong, which may cause the position of the CM of the magnet to deviate from the central axis of the cylinder. At time t, assume that the CM of the magnet is located at r₀(t) = (ξ₁(t), ξ₂(t), ξ₃(t)). We require a differential equation for ξ(t). The magnet at time t is assumed to have experienced a rotation R(t) (a 3 × 3 rotation matrix). Thus, its moment at time t is given by

m(t) = m₀R(t)ẑ

assuming that at time t = 0 the moment pointed along the positive z direction. The magnetic field produced by the magnet at time t is given by

B_m(t, r) = curl((µ/4π)(m(t) × (r − r₀(t))/|r − r₀(t)|³))

and the flux of this magnetic field through the coil is

Φ(t) = n₀∫_h^{h+a} dz∫_{x²+y²≤R₀²} B_{mz}(t, x, y, z)dxdy

The angular momentum vector of the magnet is proportional to its magnetic moment. So by the torque equation, we have

dm(t)/dt = γ·m(t) × B_c(t, r₀(t)) − − −(1)

where B_c(t, r) is the magnetic field produced by the coil. The coil current is

I(t) = Φ′(t)/R_L

where R_L is the coil resistance. Then, the magnetic field produced by the coil is

B_c(t, r) = (µ/4π)I(t)∫₀^{2n₀π} [(−x̂·sin(φ) + ŷ·cos(φ)) × (r − R₀(cos(φ)x̂ + sin(φ)ŷ) − (a/2πn₀)φẑ)] / |r − R₀(cos(φ)x̂ + sin(φ)ŷ) − (a/2πn₀)φẑ|³ R₀dφ

This expression can be used in (1). Note that I(t) depends on Φ(t), which in turn is a function of m(t). Since in fact I(t) is proportional to Φ′(t), it follows that I(t) is a function of m(t) and m′(t). Note that B_c(t, r) is proportional to I(t) and is therefore also a function of m(t) and m′(t). However, we also note that B_m(t, r) is also a function of r₀(t). Hence, Φ(t) is also a function of r₀(t), and hence I(t) is also a function of r₀(t) and r₀′(t). Thus, B_c(t, r) is a function of r₀(t), r₀′(t), m(t), m′(t), ie, we can express it as

B_c(t, r | r₀(t), r₀′(t), m(t), m′(t))

Thus, we write (1) as

dm(t)/dt = γ·m(t) × B_c(t, r₀(t) | r₀(t), r₀′(t), m(t), m′(t)) − − −(2)

Next, the force exerted on the magnet by the magnetic field produced by thecoil is given by

F(t) = ∇r0(t)(m(t).Bc((t, r0(t)|r0(t), r′0(t),m(t),m′(t))

which means that the rectilinear equation of motion of the magnet must be given by

Mr′′0 (t) = F(t)−Mgz

However, to be more precise, the term V = −m(t)·Bc(t, r0(t)|r0(t), r′0(t), m(t), m′(t)) represents a potential energy of interaction between the magnet and the reaction field produced by the coil. So its rectilinear equation of motion should be given by

d/dt∂L/∂r′0(t)− ∂L/∂r0(t) = 0

where L, the Lagrangian is given by

L = M|r′0(t)|²/2 − V − Mgz0(t)

Thus the rectilinear equation of motion of the magnet is given by

Mr′′0(t) + d/dt(∇_{r′0(t)}(m(t)·Bc(t, r0(t)|r0(t), r′0(t), m(t), m′(t)))) − ∇_{r0(t)}(m(t)·Bc(t, r0(t)|r0(t), r′0(t), m(t), m′(t))) + Mgẑ = 0 −−− (3)

(2) and (3) constitute the complete set of six coupled ordinary differential equations for the six components of m(t), r0(t).
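The coupled system (2), (3) has no closed form, but a reduced version can be integrated numerically. The sketch below is an assumption-laden simplification, not the full model: the magnet is kept on the axis with a fixed moment m0, the coil is replaced by a single effective loop of radius R at z = 0, and the braking force F = −(dΦ/dz)²v/RL from the induced current is stepped with explicit Euler. The functions `dphi_dz`, `drop` and every numeric parameter are hypothetical illustrations.

```python
import math

# Hedged 1-D sketch (assumption: magnet stays on the axis, single effective loop):
# flux through the loop from a dipole m0 at height z, induced current
# I = -Phi'(t)/RL, and eddy-current braking force F = -(dPhi/dz)^2 v / RL.
def dphi_dz(z, m0=1.0, R=0.05):
    # derivative of Phi(z) = mu0*m0*R^2 / (2*(R^2 + z^2)^{3/2}) with respect to z
    mu0 = 4e-7 * math.pi
    return -3 * mu0 * m0 * R**2 * z / (2 * (R**2 + z**2) ** 2.5)

def drop(RL, T=1.0, dt=1e-4, M=0.01, g=9.81, z0=0.2):
    z, v = z0, 0.0                      # height above the loop; v < 0 means falling
    for _ in range(int(T / dt)):
        k = dphi_dz(z)
        a = -g - (k ** 2) * v / (RL * M)   # gravity plus eddy-current braking
        v += a * dt
        z += v * dt
    return v

v_free = drop(RL=1e9)     # essentially no coupling: free fall
v_braked = drop(RL=1e-9)  # strong coupling: the magnet crawls through the loop
```

With strong coupling the magnet is dramatically slowed near the loop, which is the familiar magnet-in-a-copper-tube demonstration that equations (2), (3) generalize.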

7.2 Converse of the Shannon coding theorem

Let A be an alphabet. Let u1, ..., uM ∈ A^n with

P_{ui} = P

where, for u ∈ A^n, Pu(x) = N(x|u)/n, x ∈ A, with N(x|u) denoting the number of times that x has occurred in u. Let νx(y), x ∈ A, y ∈ B be the discrete memoryless channel transition probability from the alphabet A to the alphabet B. Define

Q(y) = ∑_x P(x)νx(y), y ∈ B


Let D1, ..., DM be disjoint sets in B^n such that ν_{uk}(Dk) > 1 − ǫ, k = 1, 2, ..., M,

where νu(v) = Πnk=1νu(k)(v(k)) for u = (u(1), ..., u(n)), v = (v(1), ..., v(n)). Let

Ek = Dk ∩ T (n, νuk, δ), k = 1, 2, ...,M

where T(n, f, δ) is the set of δ-Bernoulli typical sequences of length n having probability distribution f. For u ∈ A^n with Pu = P, define T(n, νu, δ) to consist of all v ∈ B^n for which

vx ∈ T (N(x|u), νx, δ)∀x ∈ A

Then we claim that if v ∈ T (n, νu, δ), then

v ∈ T (n,Q, δ√a)

The notation used is as follows. For x ∈ A, and u ∈ An, v ∈ Bn, define

vx = (v(k) : u(k) = x, k = 1, 2, ..., n)

Note that vx ∈ BN(x|u). To prove this claim, we note that for y ∈ B,

|N(y|v) − nQ(y)| = |∑_x N(y|vx) − n∑_x P(x)νx(y)|

= |∑_x N(y|vx) − ∑_x N(x|u)νx(y)| ≤ ∑_x |N(y|vx) − N(x|u)νx(y)|

≤ ∑_x δ√(N(x|u)νx(y)(1 − νx(y)))

≤ δ√a·√(∑_x N(x|u)νx(y)(1 − νx(y)))

= δ√(an)·√(∑_x P(x)νx(y)(1 − νx(y)))

Now,

∑_x P(x)νx(y) = Q(y), ∑_x P(x)νx(y)² ≥ (∑_x P(x)νx(y))² = Q(y)²

so that ∑_x P(x)νx(y)(1 − νx(y)) ≤ Q(y)(1 − Q(y)), and therefore,

|N(y|v) − nQ(y)| ≤ δ√(an)·√(Q(y)(1 − Q(y)))

proving the claim. For v ∈ Ek, we have v ∈ T(n, ν_{uk}, δ) and hence, by the result just proved, v ∈ T(n, Q, δ√a).

Now let v ∈ T(n, Q, δ).


It follows that, since

Q^n(v) = Π_{y∈B} Q(y)^{N(y|v)}

and since

|N(y|v) − nQ(y)| ≤ δ√(nQ(y)(1 − Q(y))), y ∈ B

we have

nQ(y) − δ√(nQ(y)(1 − Q(y))) ≤ N(y|v) ≤ nQ(y) + δ√(nQ(y)(1 − Q(y))), y ∈ B

and therefore,

Π_y Q(y)^{nQ(y)+δ√(nQ(y)(1−Q(y)))} ≤ Q^n(v) ≤ Π_y Q(y)^{nQ(y)−δ√(nQ(y)(1−Q(y)))}

Thus summing over all v ∈ T (n,Q, δ), we get

2^{−nH(Q)−c√n}·µ(T(n, Q, δ)) ≤ Q^n(T(n, Q, δ)) ≤ 2^{−nH(Q)+c√n}·µ(T(n, Q, δ))

where

c = −δ∑_y √(Q(y)(1 − Q(y)))·log2(Q(y))

and where µ(E) denotes the cardinality of a set E. In particular,

µ(T(n, Q, δ)) ≤ 2^{nH(Q)+c√n}

It follows that

∑_{k=1}^M µ(Ek) = µ(∪_k Ek) ≤ µ(T(n, Q, δ√a)) ≤ 2^{nH(Q)+c√(an)}

On the other hand,

ν_{uk}(Dk) > 1 − ǫ

and by the Chebyshev inequality combined with the union bound,

ν_{x,N(x|u)}(T(N(x|u), νx, δ)) ≥ 1 − b/δ²

and hence

ν_u(T(n, νu, δ)) ≥ 1 − ab/δ²

for any u ∈ An. In particular,

νuk(Ek) ≥ 1− ab/δ2 − ǫ, k = 1, 2, ...,M

Further, by the above argument, we have for v ∈ T(n, νu, δ),

ν_{x,N(x|u)}(vx) ≤ 2^{−N(x|u)H(νx)+c√n}

and hence,

ν_u(v) ≤ 2^{−∑_x N(x|u)H(νx)+c√n}


= 2^{−nI(P,ν)+c√n}

where

I(P, ν) = ∑_x P(x)H(νx), P = Pu

From this we easily deduce that

ν_{uk}(Ek) ≤ 2^{−nI(P,ν)+c√n}·µ(Ek), k = 1, 2, ..., M

Thus,

(1 − ab/δ² − ǫ)·2^{nI(P,ν)−c√n} ≤ µ(Ek), k = 1, 2, ..., M

so that

M(1 − ab/δ² − ǫ)·2^{nI(P,ν)−c√n} ≤ ∑_k µ(Ek) ≤ 2^{nH(Q)+c√n}

or equivalently,

log(M)/n ≤ H(Q)− I(P, ν) +O(1/√n)

which proves the converse of Shannon’s coding theorem.
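The bound log(M)/n ≤ H(Q) − ∑_x P(x)H(νx) + O(1/√n) says the achievable rate is limited by the input-output mutual information. A small numeric sketch (my own illustration, assuming a binary symmetric channel, which is not a case worked in the text) evaluates this bound; the function names are hypothetical.

```python
import math

# Sketch: for a binary symmetric channel with crossover eps and input law P,
# evaluate H(Q) - sum_x P(x) H(nu_x), the rate bound in the converse above.
def H(p):  # entropy in bits of a probability vector
    return sum(-q * math.log2(q) for q in p if q > 0)

def rate_bound(P, eps):
    nu = {0: [1 - eps, eps], 1: [eps, 1 - eps]}            # channel nu_x(y)
    Q = [sum(P[x] * nu[x][y] for x in (0, 1)) for y in (0, 1)]
    return H(Q) - sum(P[x] * H(nu[x]) for x in (0, 1))

# with uniform input the bound equals the BSC capacity 1 - h(eps)
bound = rate_bound([0.5, 0.5], 0.11)
```

For a uniform input the two expressions coincide with the familiar capacity formula 1 − h(ǫ), which is a quick sanity check on the algebra above.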


Chapter 8

Problems in electromagnetics and gravitation

We consider the magnetic field produced by a magnet of given magnetic dipole moment in a background gravitational field, taking space-time curvature into account. The Maxwell equations in curved space-time are

(F^{µν}√−g)_{,ν} = J^µ√−g

or equivalently

(g^{µα}g^{νβ}√−g·F_{αβ})_{,ν} = J^µ√−g

Writing

g_{µν}(x) = η_{µν} + h_{µν}(x)

where η is the Minkowski metric and h is a small perturbation, we have approximately

g^{µν} = η^{µν} − h^{µν}, h^{µν} = η^{µα}η^{νβ}h_{αβ}

and

√−g = 1 − h/2, h = η^{µν}h_{µν}

We have

F_{µν} = A_{ν,µ} − A_{µ,ν}

and writing

A_µ = A^{(0)}_µ + A^{(1)}_µ

where A^{(0)}_µ is the zeroth order solution, ie, without gravity, and A^{(1)}_µ the first order correction to the solution, ie, a linear functional of h_{µν}, we obtain, using first order perturbation theory,

F_{µν} = F^{(0)}_{µν} + F^{(1)}_{µν}


where

F^{(k)}_{µν} = A^{(k)}_{ν,µ} − A^{(k)}_{µ,ν}, k = 0, 1

Now,

g^{µα}g^{νβ}√−g = (η^{µα} − h^{µα})(η^{νβ} − h^{νβ})(1 − h/2) = η^{µα}η^{νβ} − f^{µναβ}

where

f^{µναβ} = η^{µα}η^{νβ}h/2 + η^{µα}h^{νβ} + η^{νβ}h^{µα}

Then

g^{µα}g^{νβ}√−g·F_{αβ} = η^{µα}η^{νβ}F^{(0)}_{αβ} + η^{µα}η^{νβ}F^{(1)}_{αβ} − f^{µναβ}F^{(0)}_{αβ} = F^{0µν} + F^{1µν}

where

F^{0µν} = η^{µα}η^{νβ}F^{(0)}_{αβ}, F^{1µν} = η^{µα}η^{νβ}F^{(1)}_{αβ} − f^{µναβ}F^{(0)}_{αβ}

Also

J^µ√−g = J^{0µ} + J^{1µ}, J^{0µ} = J^µ, J^{1µ} = −J^µh/2

The gauge condition

(A^µ√−g)_{,µ} = 0

becomes

0 = (g^{µν}√−g·A_ν)_{,µ} = [(η^{µν} − h^{µν})(1 − h/2)A_ν]_{,µ} = [(η^{µν} − k^{µν})(A^{(0)}_ν + A^{(1)}_ν)]_{,µ}

where

k^{µν} = h^{µν} − hη^{µν}/2

and hence equating coefficients of zeroth and first orders of smallness on bothsides gives

η^{µν}A^{(0)}_{ν,µ} = 0, (k^{µν}A^{(0)}_ν)_{,µ} = η^{µν}A^{(1)}_{ν,µ}
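The first-order formula g^{µν} ≈ η^{µν} − h^{µν} used above can be checked numerically: the exact inverse of η + h differs from the linearized one only at order h². The snippet below is a sanity-check sketch of my own (the random h and the tolerance are assumptions, not part of the text).

```python
import numpy as np

# Numeric check (sketch): for g = eta + h with small symmetric h, the exact
# inverse metric agrees with the first-order expression
# eta^{-1} - eta^{-1} h eta^{-1} up to O(h^2) corrections.
eta = np.diag([1.0, -1.0, -1.0, -1.0])
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
h = 1e-6 * (M + M.T) / 2                 # small symmetric perturbation

g = eta + h
g_inv_exact = np.linalg.inv(g)
eta_inv = np.linalg.inv(eta)             # equals eta itself for Minkowski
g_inv_first = eta_inv - eta_inv @ h @ eta_inv
err = np.abs(g_inv_exact - g_inv_first).max()   # expected to be O(h^2)
```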

Chapter 9

Some more aspects of quantum information theory

9.1 An expression for state detection error

Let ρ, σ be two states and let their spectral decompositions be

ρ = ∑_a |ea > p(a) < ea|, σ = ∑_a |fa > q(a) < fa|

so that < ea|eb > = δ(a, b), < fa|fb > = δ(a, b). The relative entropy between these two states is

D(ρ|σ) = Tr(ρ.(log(ρ)− log(σ)))

= ∑_a p(a)log(p(a)) − ∑_{a,b} p(a)log(q(b))| < ea|fb > |²

= ∑_{a,b} p(a)| < ea|fb > |²·(log(p(a)| < ea|fb > |²) − log(q(b)| < ea|fb > |²))

= ∑_{a,b} (P(a, b)log(P(a, b)) − P(a, b)log(Q(a, b))) = D(P|Q)

where P,Q are two bivariate probability distributions defined by

P(a, b) = p(a)| < ea|fb > |², Q(a, b) = q(b)| < ea|fb > |²

Now let 0 ≤ T ≤ 1, ie, T is a detection operator. Consider the detection error probability

2P(ǫ) = Tr(ρ(1 − T)) + Tr(σT) = ∑_a (p(a) < ea|1 − T|ea > + q(a) < fa|T|fa >)


= ∑_a p(a)(1 − < ea|T|ea >) + ∑_{a,b} q(a) < fa|eb >< eb|T|fa >

Suppose T is a projection, ie, T 2 = T ∗ = T . Then,

Tr(ρ(1 − T)) = ∑_a p(a) < ea|(1 − T)|ea > = ∑_a p(a) < ea|(1 − T)²|ea > = ∑_{a,b} p(a)| < ea|1 − T|fb > |²

Likewise,

Tr(σT) = ∑_{a,b} q(b)| < ea|T|fb > |²

Hence,

2P(ǫ) = ∑_{a,b} (p(a)| < ea|1 − T|fb > |² + q(b)| < ea|T|fb > |²)

≥ ∑_{a,b} min(p(a), q(b))(| < ea|1 − T|fb > |² + | < ea|T|fb > |²)

≥ ∑_{a,b} (1/2)min(p(a), q(b))| < ea|fb > |²

where we have used

2(|z|² + |w|²) ≥ |z + w|²

for any two complex numbers z, w. Note that

∑_{a,b} (1/2)min(p(a), q(b))| < ea|fb > |² = (1/2)min_{0≤t(a,b)≤1} ∑_{a,b} [p(a)| < ea|fb > |²(1 − t(a, b)) + q(b)| < ea|fb > |²t(a, b)]

because if u, v are positive real numbers, then, as t varies over [0, 1], tu + (1 − t)v attains its minimum value, namely v at t = 0 if u ≥ v, and u at t = 1 if u < v. In other words,

min_{t∈[0,1]}(tu + (1 − t)v) = min(u, v)

Thus we have proved that

2P(ǫ) ≥ (1/2)min_{0≤t(a,b)≤1} ∑_{a,b} [P(a, b)(1 − t(a, b)) + Q(a, b)t(a, b)]

The quantity on the rhs is the minimum error probability for the classical hypothesis testing problem between the two bivariate probability distributions P, Q.
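The lower bound just derived can be compared numerically with the exact minimum of 2P(ǫ), which for two states is 1 − (1/2)‖ρ − σ‖₁ (the Helstrom formula, which the text does not invoke here; I bring it in only as a cross-check). The snippet is a sketch with random qubit states; all names are mine.

```python
import numpy as np

# Numeric sanity check of 2P(err) >= (1/2) sum_{a,b} min(p(a), q(b)) |<e_a|f_b>|^2
# for random qubit states; the exact optimum of 2P(err) is 1 - ||rho - sigma||_1 / 2.
rng = np.random.default_rng(1)

def rand_state():
    A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rho, sigma = rand_state(), rand_state()
p, E = np.linalg.eigh(rho)       # columns of E are the eigenvectors |e_a>
q, F = np.linalg.eigh(sigma)     # columns of F are the eigenvectors |f_b>

overlap = np.abs(E.conj().T @ F) ** 2        # |<e_a|f_b>|^2
lower = 0.5 * sum(min(p[a], q[b]) * overlap[a, b]
                  for a in range(2) for b in range(2))
helstrom = 1 - 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
```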


9.2 Operator convex functions and operator monotone functions

[1] Let f be operator convex with f(0) ≤ 0. Then, if K is a contraction matrix, we have

f(K*XK) ≤ K*f(X)K

for all Hermitian matrices X. By a contraction we mean that K*K ≤ I. To see this, define the following positive (ie, non-negative) matrices

L = √(1 − K*K), M = √(1 − KK*)

Then, by Taylor expansion and the identity

K(K*K)^m = (KK*)^m K, m = 1, 2, ...,

we have

KL = MK, LK* = K*M

Define the matrix

U =
( K  −M )
( L   K* )

and observe that, since L* = L, M* = M and

K*K + L² = I, −K*M + LK* = 0, −MK + KL = 0, M² + KK* = I,

it follows that U is a unitary matrix. Likewise define the unitary matrix

V =
( K   M )
( L  −K* )

Also define, for two Hermitian matrices A, B,

T =
( A  0 )
( 0  B )

Then we obtain the identity

(1/2)(U*TU + V*TV) =
( K*AK + LBL       0       )
(      0      KBK* + MAM )

Using operator convexity of f , we have

f((1/2)(U∗TU+V ∗TV )) ≤ (1/2)(f(U∗TU)+f(V ∗TV )) = (1/2)(U∗f(T )U+V ∗f(T )V )

which in view of the above identity, results in

f(K∗AK + LBL) ≤ K∗f(A)K + Lf(B)L

and the second gives

f(KBK* + MAM) ≤ Kf(B)K* + Mf(A)M

140CHAPTER 9. SOMEMORE ASPECTS OFQUANTUM INFORMATIONTHEORY

In particular, taking B = 0, the first gives

f(K∗AK) ≤ K∗f(A)K + f(0)L2

and choosing A = 0, the second gives

f(KBK∗) ≤ Kf(B)K∗ + f(0)M2

In particular, if in addition to being operator convex, f satisfies f(0) ≤ 0, then we get

f(K∗AK) ≤ K∗f(A)K, f(KBK∗) ≤ Kf(B)K∗

for any contraction K. Now we show that if f is operator convex and f(0) ≤ 0, then for operators K1, ..., Kp that satisfy

∑_{j=1}^p K_j*K_j ≤ I

we have

f(∑_{j=1}^p K_j*A_jK_j) ≤ ∑_{j=1}^p K_j*f(A_j)K_j

In fact, defining

K =
( K1  0  ...  0 )
( K2  0  ...  0 )
( ..  ..  ...  .. )
( Kp  0  ...  0 )

we observe that

K*K =
( ∑_{j=1}^p K_j*K_j  0 )
(         0          0 )
≤ I

ie K is a contraction and hence by the above result

f(K∗TK) ≤ K∗f(T )K

Taking

T = diag[A1, A2, ..., Ap]

we get

K*TK =
( ∑_{j=1}^p K_j*A_jK_j  0 )
(          0            0 )

and hence

( f(∑_{j=1}^p K_j*A_jK_j)   0   )       ( ∑_{j=1}^p K_j*f(A_j)K_j  0 )
(           0             f(0) )   ≤   (           0              0 )

which results in the desired inequality. Note that from the above discussion, we also have

f(∑_j K_jB_jK_j*) ≤ ∑_j K_jf(B_j)K_j*
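The operator Jensen inequality above can be exercised numerically with the operator convex function f(x) = x² (which has f(0) = 0): for any contraction K and Hermitian A, the matrix K*f(A)K − f(K*AK) should be positive semidefinite. The sketch below is my own illustration with a random real symmetric A.

```python
import numpy as np

# Numeric check (sketch) of f(K*AK) <= K* f(A) K for f(x) = x^2, an operator
# convex function with f(0) = 0, and a random contraction K (spectral norm < 1).
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                          # Hermitian (real symmetric) matrix
K = rng.standard_normal((n, n))
K /= np.linalg.norm(K, 2) * 1.01           # scale so that ||K|| < 1

lhs = (K.T @ A @ K) @ (K.T @ A @ K)        # f(K*AK) with f(x) = x^2
rhs = K.T @ (A @ A) @ K                    # K* f(A) K
gap = np.linalg.eigvalsh(rhs - lhs).min()  # smallest eigenvalue; should be >= 0
```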


9.3 A selection of matrix inequalities

[1] If A,B are square matrices the eigenvalues of AB are the same as those ofBA. If A is non-singular, then this is obvious since

A−1(AB)A = BA

Likewise if B is non-singular then also it is obvious since

B−1(BA)B = AB

If A is singular, we can choose a sequence cn → 0 so that A+cnI is non-singularfor every n and then

(A + c_nI)^{−1}[(A + c_nI)B](A + c_nI) = B(A + c_nI) ∀n

Thus, (A+ cnI)B has the same eigenvalues as those of B(A+ cnI) for each n.Taking the limit n → ∞, the result follows since the eigenvalues of (A + cI)Bare continuous functions of c.

[2] Let A be any square diagonable matrix. Let λ1, ..., λn be its eigenvalues. We have, for any unit vector u,

| < u, Au > |² ≤ < u, A*Au >

Let e1, ..., en be unit eigenvectors of A corresponding to the eigenvalues λ1, ..., λn respectively. Assume that

|λ1| ≥ |λ2| ≥ ... ≥ |λn|

Let M be any k-dimensional subspace and consider the (n − k + 1)-dimensional subspace W = span{ek, ek+1, ..., en}. Then dim(M ∩ W) ≥ 1 and hence we can choose a unit vector u ∈ M ∩ W. Then, writing u = ck ek + ... + cn en, it follows that

| < u, Au > | ≤ ‖Au‖ = ‖ck λk ek + ... + cn λn en‖ ≤ |λk|·(|ck| + ... + |cn|)

Suppose we Gram-Schmidt orthonormalize ek, ek+1, ..., en in that order. The resulting vectors are

wk = ek, wk+1 ∈ span(ek, ek+1), ..., wn ∈ span(ek, ..., en)

Specifically,

wk+1 = r(k + 1, k)ek + r(k + 1, k + 1)ek+1, ..., wn = r(n, k)ek + ...+ r(n, n)en

Then we can write

u = dk wk + ... + dn wn, |dk|² + ... + |dn|² = 1, < wr, ws > = δ(r, s), r, s = k, k + 1, ..., n

142CHAPTER 9. SOMEMORE ASPECTS OFQUANTUM INFORMATIONTHEORY

and we find that

Awk = λk wk, Awk+1 = r(k + 1, k)λk ek + r(k + 1, k + 1)λk+1 ek+1, ..., Awn = r(n, k)λk ek + ... + r(n, n)λn en

Diagonalize A so that

A = ∑_{k=1}^n λk ek f_k*, f_k* ej = δ(k, j)

Then,

< fk, Aek >= λk

Thus,

|λk| ≤ ‖fk‖·‖Aek‖

Let A^{⊗a k} denote the k-fold tensor product of A with itself, acting on the k-fold antisymmetric tensor product of C^n. Then the eigenvalues of A^{⊗a k} are λ_{i1}···λ_{ik} with n ≥ i1 > i2 > ... > ik ≥ 1, and likewise its singular values are s_{i1}···s_{ik}, n ≥ i1 > i2 > ... > ik ≥ 1, where s1 ≥ s2 ≥ ... ≥ sn ≥ 0 are the singular values of A. Now, if T is any operator whose eigenvalue of maximum magnitude is µ1 and whose maximum singular value is s1, then we have

|µ1| = spr(T) = lim_n ‖T^n‖^{1/n} ≤ ‖T‖ = s1

Applying this result to T = A^{⊗a k} gives us

Π_{i=1}^k |λi| ≤ Π_{i=1}^k si, k = 1, 2, ..., n

[3] If A,B are such that AB is normal, then

‖ AB ‖≤‖ BA ‖

In fact since AB is normal, its spectral radius coincides with its spectral normand further, since AB and BA have the same eigenvalues, the spectral radii ofAB and BA are the same and finally since the spectral radius of any matrixis always smaller than its spectral norm (‖ T n ‖1/n≤‖ T ‖), the result followsimmediately:

‖ AB ‖= spr(AB) = spr(BA) ≤‖ BA ‖

[1] Show that if S, T ≥ 0 and S ≤ 1, then

1 − (S + T)^{−1/2}S(S + T)^{−1/2} ≤ 2 − 2S + 4T


[2] Show that if Sk, Tk, Rk, k = 1, 2 are positive operators in a Hilbert space such that (a) [S1, S2] = [T1, T2] = [R1, R2] = 0 and (b) Sk + Tk ≤ Rk, k = 1, 2, then for any 0 ≤ s ≤ 1, we have Lieb's inequality:

R1^{1−s}R2^s ≥ S1^{1−s}S2^s + T1^{1−s}T2^s

First we show that this inequality holds for s = 1/2; then we show that if it holds for s = µ, ν, then it also holds for s = (µ + ν)/2, and that will complete the proof. For any vectors x, y,

| < y|S1^{1/2}S2^{1/2} + T1^{1/2}T2^{1/2}|x > |²

≤ (| < S1^{1/2}y, S2^{1/2}x > | + | < T1^{1/2}y, T2^{1/2}x > |)²

≤ [‖S1^{1/2}y‖·‖S2^{1/2}x‖ + ‖T1^{1/2}y‖·‖T2^{1/2}x‖]²

≤ [‖S1^{1/2}y‖² + ‖T1^{1/2}y‖²]·[‖S2^{1/2}x‖² + ‖T2^{1/2}x‖²]

= < y, (S1 + T1)y >·< x, (S2 + T2)x > ≤ < y, R1y >·< x, R2x >

Then, replacing y by R1^{−1/4}R2^{1/4}x and x by R2^{−1/4}R1^{1/4}x in this inequality gives us (recall that R1 and R2 are assumed to commute)

< x|R1^{−1/4}R2^{1/4}(S1^{1/2}S2^{1/2} + T1^{1/2}T2^{1/2})R2^{−1/4}R1^{1/4}|x > ≤ < x, R1^{1/2}R2^{1/2}x >

Now define the matrices

B = R1^{−1/4}R2^{1/4}(S1^{1/2}S2^{1/2} + T1^{1/2}T2^{1/2}), A = R2^{−1/4}R1^{1/4}

Then it is clear that

AB = S1^{1/2}S2^{1/2} + T1^{1/2}T2^{1/2}

is a Hermitian positive matrix because [S1, S2] = [T1, T2] = 0, and we also have the result that the eigenvalues of AB are the same as those of BA. Further, from the above inequality, the kth largest eigenvalue of BA is ≤ the kth largest eigenvalue of R1^{1/2}R2^{1/2}. It follows therefore that the kth largest eigenvalue of the Hermitian matrix AB is ≤ the kth largest eigenvalue of the Hermitian matrix R1^{1/2}R2^{1/2}. However, from this result alone we cannot deduce the desired inequality.

To do so, we note that, writing

C = S1^{1/2}S2^{1/2} + T1^{1/2}T2^{1/2},

we have proved that

| < y,Cx > |2 ≤< y,R1y >< x,R2x >

This implies

| < R1^{−1/2}y, C R2^{−1/2}x > | ≤ ‖y‖·‖x‖


for all x, y and hence

| < y, R1^{−1/2}C R2^{−1/2}x > |/(‖y‖·‖x‖) ≤ 1

for all nonzero x, y. Taking supremum over x, y yields

‖R1^{−1/2}CR2^{−1/2}‖ ≤ 1

and by normality of

R1^{−1/4}R2^{−1/4}CR1^{−1/4}R2^{−1/4} = (R1^{1/4}R2^{−1/4})(R1^{−1/2}CR1^{−1/4}R2^{−1/4}) = PQ

where

P = R1^{1/4}R2^{−1/4}, Q = R1^{−1/2}CR1^{−1/4}R2^{−1/4}

we get

‖PQ‖ ≤ ‖QP‖ = ‖R1^{−1/2}CR2^{−1/2}‖ ≤ 1

which implies, since PQ is positive, that

PQ ≤ I

or equivalently

R1^{−1/4}R2^{−1/4}CR1^{−1/4}R2^{−1/4} ≤ I

from which we deduce (note that R1 and R2 commute) that

C ≤ R1^{1/2}R2^{1/2}

proving the claim. Now suppose the inequality is valid for s = µ and s = ν. Then we have

R1^{1−µ}R2^µ ≥ S1^{1−µ}S2^µ + T1^{1−µ}T2^µ

and

R1^{1−ν}R2^ν ≥ S1^{1−ν}S2^ν + T1^{1−ν}T2^ν

Then, replacing R1 by R1^{1−µ}R2^µ, R2 by R1^{1−ν}R2^ν, S1 by S1^{1−µ}S2^µ, S2 by S1^{1−ν}S2^ν, T1 by T1^{1−µ}T2^µ and T2 by T1^{1−ν}T2^ν, and noting that these replacements satisfy all the required conditions of the theorem, gives us

R1^{1−(µ+ν)/2}R2^{(µ+ν)/2} ≥ S1^{1−(µ+ν)/2}S2^{(µ+ν)/2} + T1^{1−(µ+ν)/2}T2^{(µ+ν)/2}

ie, the result also holds for s = (µ + ν)/2. Since it holds for s = 0, 1/2, 1, we deduce by induction that it holds for all dyadic rationals in [0, 1], ie, for all s of the form k/2^n, k = 0, 1, ..., 2^n, n = 0, 1, .... Now every real number in [0, 1] is a limit of dyadic rationals, and hence, by continuity of the inequality, the theorem is proved.


An application example. Remark: let T be a Hermitian matrix of size n × n, let M be an (n − k + 1)-dimensional subspace, and let λ1 ≥ ... ≥ λn be the eigenvalues of T arranged in decreasing order. Then, with the sup and inf below taken over unit vectors x,

sup_{x∈M} < x, Tx > ≥ λk

and thus

inf_{dim N = n−k+1} sup_{x∈N} < x, Tx > = λk

Likewise, if M is a k-dimensional subspace, then

inf_{x∈M} < x, Tx > ≤ λk

and hence

sup_{dim N = k} inf_{x∈N} < x, Tx > = λk

Thus,

λk(AB) = λk(BA) ≤ λk(C), k = 1, 2, ..., n

where

C = R1^{1/2}R2^{1/2}

Let A, B be two Hermitian matrices. Choose a subspace M of dimension n − k + 1 so that

λk(A)↓ = sup_{x∈M} < x, Ax >

Then,

λk(A − B)↓ = inf_{dim N = n−k+1} sup_{x∈N} < x, (A − B)x > ≤ sup_{x∈M} < x, (A − B)x >

= sup_{x∈M}(< x, Ax > − < x, Bx >) ≤ sup_{x∈M} < x, Ax > − inf_{x∈M} < x, Bx > = λk(A)↓ − inf_{x∈M} < x, Bx >

so that

λk(A − B)↓ ≤ λk(A)↓ − inf_{x∈M} < x, Bx > ≤ λk(A)↓ − λn(B)↓

Now let M be any (n − k + 1)-dimensional subspace such that

inf_{x∈M} < x, Bx > = λ_{n−k+1}(B)↓ = λk(B)↑

Then,

λk(A − B)↓ ≤ sup_{x∈M} < x, (A − B)x > ≤ sup_{x∈M} < x, Ax > − inf_{x∈M} < x, Bx > ≤ λk(A)↓ − λk(B)↑

This is a stronger result than the previous one.
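The first of the two bounds, λk(A − B)↓ ≤ λk(A)↓ − λn(B)↓, is easy to exercise numerically (the snippet below is my own check, not from the text; note that numpy's `eigvalsh` returns eigenvalues in ascending order, so they must be reversed to match the ↓ convention).

```python
import numpy as np

# Numeric check (sketch) of lambda_k(A - B)↓ <= lambda_k(A)↓ - lambda_n(B)↓
# for random real symmetric A, B.
rng = np.random.default_rng(3)
n = 5
sym = lambda M: (M + M.T) / 2
A, B = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))

lA = np.linalg.eigvalsh(A)[::-1]       # descending order
lB = np.linalg.eigvalsh(B)[::-1]
lAB = np.linalg.eigvalsh(A - B)[::-1]
ok = all(lAB[k] <= lA[k] - lB[-1] + 1e-10 for k in range(n))
```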


9.4 Matrix Holder Inequality

Now let A, B be two matrices. Consider (AB)^{⊗a k}. Its maximum singular value squared is

‖(AB)^{⊗a k}‖²

which is the maximum eigenvalue of ((AB)*(AB))^{⊗a k}. This maximum eigenvalue equals the product of the k largest eigenvalues of (AB)*(AB), ie, the product of the squares of the k largest singular values of AB. Thus, we can write

‖(AB)^{⊗a k}‖ = Π_{j=1}^k sj(AB)↓

On the other hand, (AB)^{⊗a k} = A^{⊗a k}B^{⊗a k}, so that

‖(AB)^{⊗a k}‖ ≤ ‖A^{⊗a k}‖·‖B^{⊗a k}‖

Now, in the same way, ‖A^{⊗a k}‖ equals Π_{j=1}^k sj(A)↓, and likewise for B. Thus we obtain the fundamental inequality

Π_{j=1}^k sj(AB)↓ ≤ Π_{j=1}^k sj(A)↓sj(B)↓, k = 1, 2, ..., n

and hence we easily deduce that

s(AB) ≤ s(A)·s(B)

which is actually a shorthand notation for

∑_{j=1}^k sj(AB)↓ ≤ ∑_{j=1}^k sj(A)↓·sj(B)↓, k = 1, 2, ..., n

In particular, taking k = n gives us, for p, q ≥ 1 and 1/p + 1/q = 1,

Tr(|AB|) = ∑_{j=1}^n sj(AB) ≤ (∑_j sj(A)^p)^{1/p}·(∑_j sj(B)^q)^{1/q}

Now,

Tr(|A|^p) = ∑_j sj(A)^p

and likewise for B. Therefore, we have proved the Matrix Holder inequality

Tr(|AB|) ≤ (Tr(|A|p))1/p.(Tr(|B|q))1/q

for any two matrices A, B such that AB exists and is a square matrix. In fact, the method of our proof implies immediately that if ψ(·) is any unitarily invariant matrix norm, then

ψ(AB) ≤ ψ(|A|^p)^{1/p}·ψ(|B|^q)^{1/q}


This is seen as follows. Since ψ is a unitarily invariant matrix norm,

ψ(A) = Φ(s(A))

where Φ(·) is a convex mapping from R^n_+ into R_+ that satisfies the triangle inequality and in fact all the properties of a symmetric vector norm. Also, since

a^p/p + b^q/q ≥ ab, a, b > 0

we get, for t > 0,

Φ(x·y) = Φ(tx·y/t) ≤ Φ(t^p x^p)/p + Φ(y^q/t^q)/q ≤ t^pΦ(x^p)/p + Φ(y^q)/(q t^q)

Now u t^p + v/t^q is minimized when

p u t^{p−1} − q v/t^{q+1} = 0

or equivalently, when

t^{p+q} = qv/(pu)

Substituting this value of t immediately gives

Φ(x·y) ≤ Φ(x^p)^{1/p}·Φ(y^q)^{1/q}

and hence, since

s(A·B) ≤ s(A)·s(B)

(in the sense of majorization), it follows that for two matrices A, B,

ψ(A·B) = Φ(s(A·B)) ≤ Φ(s(A)·s(B)) ≤ Φ(s(A)^p)^{1/p}·Φ(s(B)^q)^{1/q} = ψ(|A|^p)^{1/p}·ψ(|B|^q)^{1/q}

Note that the eigenvalues (or equivalently the singular values) of |A|^p are the same as s(A)^p.

Remark: We are assuming that if x, y are positive vectors such that x < y in the sense of majorization, then

Φ(x) ≤ Φ(y)

This result follows since if a < b are two positive real numbers, then

a = tb + (1 − t)·0

for t = a/b ∈ [0, 1], and we may use the convexity of Φ and the fact that Φ(0) = 0. If x < y, then ∑_{i=1}^k xi ≤ ∑_{i=1}^k yi, k = 1, 2, ..., n, and hence, writing ξk = ∑_{i=1}^k xi, ηk = ∑_{i=1}^k yi, k = 1, 2, ..., n, we get, by regarding Φ(x) as a function of ξk, k = 1, 2, ..., n (noting that convexity of Φ in x implies convexity of Φ in ξ) and the above result, that

Φ(ξ) = Φ(ξ1, ..., ξn) ≤ Φ(η1, ξ2, ..., ξn) ≤ Φ(η1, η2, ξ3, ..., ξn) ≤ ... ≤ Φ(η1, ..., ηn) = Φ(η)
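The trace form of the matrix Hölder inequality is easy to exercise numerically; for p = q = 2 it reduces to Tr|AB| ≤ ‖A‖_F ‖B‖_F. The snippet below is my own numeric sketch (the nuclear norm is computed from the singular values).

```python
import numpy as np

# Numeric check (sketch) of Tr|AB| <= (Tr|A|^p)^{1/p} (Tr|B|^q)^{1/q} with
# p = q = 2, where Tr|A|^2 is the sum of squared singular values = ||A||_F^2.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()   # Tr|M|, nuclear norm
lhs = nuc(A @ B)
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
```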


Chapter 10

Lecture Plan for Statistical Signal Processing

Course Instructor: Harish Parthasarathy.

[1] Revision of basic probability theory: probability spaces, random variables, expectation, variance, covariance, characteristic function, convergence of random variables, conditional expectation. (four lectures)

[2] Revision of basic random process theory: Kolmogorov's existence theorem, stationary stochastic processes, autocorrelation and power spectral density, higher order moments and higher order spectra, Brownian motion and Poisson processes, an introduction to stochastic differential equations. (four lectures)

[3] Linear estimation theory: estimation of parameters in linear models and sequential linear models using least squares and weighted least squares methods, estimation of parameters in linear models with coloured Gaussian measurement noise using the maximum likelihood method, estimation of parameters in time series models (AR, MA, ARMA) using the least squares method with forgetting factor, estimation of parameters in time series models using the minimum mean square method when signal correlations are known. Linear estimation theory as the computation of an orthogonal projection of a Hilbert space onto a subspace, the orthogonality principle and normal equations. (four lectures)

[4] Estimation of non-random parameters using the maximum likelihood method, estimation of random parameters using the maximum a posteriori method, basic statistical hypothesis testing, the Cramer-Rao lower bound. (four lectures)

[5] Non-linear estimation theory: optimal nonlinear minimum mean square estimation as a conditional expectation, applications to non-Gaussian models. (four lectures)

[6] Finite order linear prediction theory: the Yule-Walker equations for estimating the AR parameters, fast order recursive prediction using the Levinson-Durbin algorithm, reflection coefficients and lattice filters, stability in terms of reflection coefficients. (four lectures)

[7] The recursive least squares algorithm for parameter estimation in linear models, simultaneous time and order recursive prediction of a time series using the RLS-Lattice filter. (four lectures)

[8] The non-causal Wiener filter, causal Wiener filtering using Wiener's spectral factorization method and using Kolmogorov's innovations method, Kolmogorov's formula for the mean square infinite order prediction error. (four lectures)

[9] An introduction to stochastic nonlinear filtering: the Kushner-Kallianpur filter, the extended Kalman filter approximation in continuous and discrete time, the Kalman-Bucy filter for linear state variable stochastic models. (four lectures)

[10] Power spectrum estimation using the periodogram and the windowed periodogram, mean and variance of the periodogram estimate, high resolution power spectrum estimation using eigensubspace methods like Pisarenko harmonic decomposition, MUSIC and ESPRIT algorithms. Estimation of the power spectrum using time series models. (four lectures)

[11] Some new topics:

[a] Quantum detection and estimation theory, quantum Shannon coding theory. (four lectures)

[b] Quantum filtering theory (four lectures)

Total: Forty eight lectures.

[1] Large deviations for the empirical distribution of a stationary Markov process.

[2] Proof of the Gartner-Ellis theorem.

[3] Large deviations in the iid hypothesis testing problem.

[4] Large deviations for the empirical distribution of the Brownian motion process.

Let Xn, n ≥ 0 be a Markov process in discrete time. Consider

M_N(f) = E[exp(∑_{n=0}^N f(Xn))]

Then, given X0 = x,

M_N(f) = (π_f)^N(1)

where π is the transition probability operator kernel:

π(g)(x) = ∫ π(x, dy)g(y)

and

π_f(g)(x) = exp(f(x))∫ π(x, dy)g(y)

ie,

π_f(x, dy) = exp(f(x))π(x, dy)

151

If we assume that the state space E is discrete, then π(x, y) is a stochastic matrix and π_f(x, y) = exp(f(x))π(x, y). Let Df denote the diagonal matrix diag[f(x) : x ∈ E]. Then

π_f = exp(Df)π

and

M_N(f) = (exp(Df)π)^N(1)

The Gartner-Ellis limiting logarithmic moment generating function is then

Λ(f) = lim_{N→∞} N^{−1}·log((exp(Df)π)^N(1)) = log(λm(f))

where λm(f) is the maximum eigenvalue of the matrix exp(Df)π. Now let

I(µ) = sup_{u>0} ∑_x µ(x)log(u(x)/(πu)(x))

where µ is a probability distribution on E.

where µ is a probability distribution on E. We wish to show that

I(µ) = supf(∑

x

f(x)µ(x) − λm(f))

= supf (< f, µ > −λm(f))
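The limit Λ(f) = log λm(f) can be checked directly on a small chain: iterating the tilted kernel exp(Df)π on the all-ones vector and taking N^{−1}log should converge to the log of its Perron eigenvalue. The sketch below is my own illustration; the chain and f are arbitrary assumed values.

```python
import numpy as np

# Sketch of the Gartner-Ellis limit for a 2-state chain: N^{-1} log M_N(f)
# approaches log lambda_max(exp(D_f) pi) as N grows.
pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])              # transition matrix pi(x, y)
f = np.array([0.3, -0.5])
pif = np.diag(np.exp(f)) @ pi            # tilted kernel exp(D_f) pi

N = 2000
v = np.linalg.matrix_power(pif, N) @ np.ones(2)
approx = np.log(v[0]) / N                # N^{-1} log M_N(f) starting at state 0
exact = np.log(np.linalg.eigvals(pif).real.max())   # log of Perron eigenvalue
```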


10.1 End-Semester Exam, duration: four hours, Pattern Recognition

[1] [a] Consider N iid random vectors Xn, n = 1, 2, ..., N, where Xn has the probability density ∑_{k=1}^K π(k)N(x|µk, Σk), ie, a Gaussian mixture density. Derive the optimal ML equations for estimating the parameters θ = {(π(k), µk, Σk) : k = 1, 2, ..., K} from the measurements X = {Xn : n = 1, 2, ..., N} by maximizing

ln(p(X|θ)) = ∑_{n=1}^N ln(∑_{k=1}^K π(k)N(Xn|µk, Σk))

[b] By introducing latent random vectors z(n) = (z(n, k) : k = 1, 2, ..., K), n = 1, 2, ..., N, which are iid with P(z(n, k) = 1) = π(k) and such that for each n exactly one of the z(n, k)'s equals one and the others are zero, cast the ML parameter estimation equation in the recursive EM form, namely as

θ_{m+1} = argmax_θ Q(θ, θm)

where

Q(θ, θm) = ∑_Z ln(p(X, Z|θ))·p(Z|X, θm)

with Z = {z(n) : n = 1, 2, ..., N}. Specifically, show that

Q(θ, θm) = ∑_{n,k} γ_{n,k}(X, θm)·ln(π(k)N(Xn|µ(k), Σk))

where

γ_{n,k}(X, θm) = E(z(n, k)|X, θm) = ∑_Z z(n, k)p(Z|X, θm) = ∑_{z(n,k)} z(n, k)p(z(n, k)|X, θm)

Derive a formula for p(z(n, k)|X, θm). Derive the optimal equations for θ_{m+1} by maximizing Q(θ, θm).
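The γ_{n,k} / Q(θ, θm) recursions above take a very compact form in one dimension. The sketch below is a minimal illustration of the E and M steps (the function name, initialization and data are my own assumptions, not part of the exam problem).

```python
import numpy as np

# Minimal EM sketch for a 1-D, K-component Gaussian mixture, following the
# gamma_{n,k} (E-step) and Q-maximization (M-step) recursions above.
def em_gmm(X, K, iters=50):
    pi = np.full(K, 1.0 / K)
    mu = np.array([X.min(), X.max()], dtype=float)  # spread-out init (K = 2 here)
    var = np.full(K, X.var())
    for _ in range(iters):
        # E-step: responsibilities gamma[n, k] = p(z(n,k) = 1 | X_n, theta_m)
        dens = np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of Q(theta, theta_m)
        Nk = gamma.sum(axis=0)
        pi = Nk / len(X)
        mu = (gamma * X[:, None]).sum(axis=0) / Nk
        var = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = em_gmm(X, K=2)   # recovers weights ~0.5 and means near -3, 3
```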

[2] Let Xn, n = 1, 2, ..., N be vectors in R^N. Choose K << N, vectors µ(k) : k = 1, 2, ..., K, and a partition

{1, 2, ..., N} = E1 ∪ E2 ∪ ... ∪ EK

into K disjoint non-empty subsets such that

E = ∑_{k=1}^K ∑_{n∈Ek} ‖Xn − µ(k)‖²


is a minimum. Interpret this "clustering principle" from the viewpoint of data compression and image segmentation. Explain how you would obtain an iterative solution to this clustering problem by defining

r(n, k) = 1, n ∈ Ek; r(n, k) = 0, n ∉ Ek

and expressing

E = ∑_{n,k} r(n, k)‖Xn − µ(k)‖²

Relate this clustering problem to the problem of calculating the µ(k)'s and the π(k)'s in the Gaussian mixture model. Specifically, compare the equations for calculating r(n, k), µ(k) in this clustering problem to the equations for calculating the same in the GMM. Explain how the GMM is a "smoothened version" of the clustering model by remarking that the event Xn ∈ Ek is to be interpreted in the GMM context by saying that Xn has the N(x|µ(k), Σk) distribution, and that ∑_{n=1}^N r(n, k)/N = card(Ek)/N corresponds to the probability π(k) of the kth component of the GMM occurring. Derive the explicit quantity that r(n, k) corresponds to in the GMM by showing that the optimal π(k) in the GMM satisfies

π(k) = (1/N)∑_{n=1}^N π(k)N(Xn|µk, Σk)/∑_{m=1}^K π(m)N(Xn|µ(m), Σm)

Compare this with the quantities π(k) = ∑_{n=1}^N r(n, k)/N, ∑_{n,k} r(n, k) = N in the clustering problem.

[3] Let Z(n) : n = 1, 2, ..., N be a Markov chain with discrete state space and let π(Z(n), Z(n + 1)|θ) be its one-step transition probability distribution. Let π0(Z(0)) denote its initial distribution. Let X = (X(n) : n = 0, 1, ..., N) be the HMM emission random variables, with the conditional probability density p(X(n)|Z(n), θ) defined.

[a] Show that the EM algorithm for estimating θ recursively amounts to maximizing Q(θ, θ0) where

Q(θ, θ0) = ∑_{n=0}^N E(ln(p(X(n)|Z(n), θ))|X, θ0) + ∑_{n=0}^{N−1} E(ln(π(Z(n), Z(n + 1)|θ))|X, θ0)

[b] Show that for evaluating the conditional expectations in [a], we require the conditional probabilities p(Z(n)|X, θ0) and p(Z(n), Z(n + 1)|X, θ0). Explain how you would evaluate these conditional probabilities recursively by first proving the relations

p(X|Z(n), θ0) = p(X(N), X(N − 1), ..., X(n)|Z(n), θ0)·p(X(n − 1), X(n − 2), ..., X(0)|Z(n), θ0)

and

p(X|Z(n + 1), Z(n), θ0) = p(X(N), ..., X(n + 1)|Z(n + 1), θ0)·p(X(n), ..., X(0)|Z(n), θ0)
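The recursive evaluation asked for in [b] is the classical forward-backward algorithm. A compact sketch for discrete emissions (per-step normalization is used for numerical stability; all names and the toy model are my own assumptions):

```python
import numpy as np

# Forward-backward sketch for the smoothing probabilities p(Z(n) | X, theta_0)
# used in the E-step: alpha is the (normalized) forward filter, beta the
# backward message; their normalized product gives the smoother.
def forward_backward(pi0, P, B, obs):
    N, S = len(obs), len(pi0)
    alpha = np.zeros((N, S)); beta = np.ones((N, S))
    alpha[0] = pi0 * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for n in range(1, N):                        # forward (filtering) pass
        alpha[n] = (alpha[n - 1] @ P) * B[:, obs[n]]
        alpha[n] /= alpha[n].sum()
    for n in range(N - 2, -1, -1):               # backward pass
        beta[n] = P @ (B[:, obs[n + 1]] * beta[n + 1])
        beta[n] /= beta[n].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

P = np.array([[0.95, 0.05], [0.05, 0.95]])       # sticky 2-state chain
B = np.array([[0.9, 0.1], [0.1, 0.9]])           # emission probabilities
post = forward_backward(np.array([0.5, 0.5]), P, B, [0, 0, 0, 1, 1, 1])
```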

[4] [a] How would you approximate (based on matching only first and second order statistics) an N-dimensional random vector X by a p << N dimensional random vector S using the linear model

X ≈ WS + µ + V = X̂ + V, X̂ = WS + µ

where S, V are zero mean mutually uncorrelated random vectors with covariances ΣS, ΣV respectively and W is an N × p matrix to be determined? To solve this problem, you must assume that E(X) is well approximated by µ and Cov(X) is well approximated by WΣSW^T + ΣV.

[b] Now consider the maximum likelihood version of the above PCA, in which the data is modeled by an iid sequence

Xn = WSn + µ + Vn, n = 1, 2, ..., N

where the Sn's are iid N(0, ΣS) random vectors and the Vn's are iid N(0, ΣV) random vectors, the two sequences being mutually independent. Calculate the equations that determine the optimal choice of W, µ that would maximize p(Xn, n = 1, 2, ..., N|W, µ). Also, by treating the Sn's as latent random vectors, derive the EM algorithm recursive equations for estimating W, µ from the Xn, n = 1, 2, ..., N.

Solution:

p({Xn}_{n=1}^N, {Sn}_{n=1}^N|W, µ) = p({Xn}_{n=1}^N|{Sn}, W, µ)·p({Sn}_{n=1}^N) = Π_{n=1}^N [fV(Xn − WSn − µ)fS(Sn)]

So in the EM algorithm, the function to be maximized is Q(θ, θ0), where

θ = (W, µ), θ0 = (W0, µ0)

and

and

Q(θ, θ0) =

N∑

n=1

E[log(fV(Xn −WSn − µ))|XmNm=1,W0, µ0)]

+

N∑

n=1

log(fS(Sn))|XmNm=1,W0, µ0)

10.1. END-SEMESTEREXAM, DURATION:FOUR HOURS, PATTERNRECOGNITION155

=

N∑

n=1

E[log(fV(Xn −WSn − µ))|Xn,W0, µ0)]

+

N∑

n=1

log(fS(Sn))|Xn,W0, µ0)

Note that only the first sum in this expression must be maximized, since the second one does not depend on W, µ. However, if ΣS, ΣV also have to be estimated, then the parameter vector would be

θ = (W, µ, ΣS, ΣV)

and then the second term would also count. Note further that

and then the second term would also count. Note further, that

p(SnNn=1|XnNn=1,W, µ) =

[ΠNn=1fV(Xn −WSn − µ)fS(Sn)]/ΠNn=1fX(Xn|W, µ)

with fX being N(µ,WΣsWT +Σv)

[c] If ΣV = σ²I, then derive the EM recursions for estimating W, µ, ΣS, σ² and compare with the exact maximum likelihood equations. Explain why this process is called principal component analysis, and explain how much data compression we have achieved by this process.

[d] Consider now the following data matrix version of PCA. X is a given N × M real matrix. Show that it has a singular value decomposition (SVD)

X = UΣV^T

where U and V are respectively N × N and M × M real orthogonal matrices and Σ is a diagonal N × M matrix with min(N, M) non-negative diagonal entries. Let r < min(N, M), and then, using the SVD, construct an N × M matrix Y of rank r so that ‖X − Y‖²_F is a minimum, where ‖Z‖_F = √(Tr(Z^TZ)) is the Frobenius norm of the real matrix Z.


[5] Consider the binary hypothesis testing problem in which, under the hypothesis H1, each sample of the iid sequence X(n), n ≥ 1 has the probability density p1(X), while under H0 each sample of this iid sequence has the probability density p0(X). Show that the Neyman-Pearson criterion of decision based on the observations X(n), n = 1, 2, ..., N leads to the test of deciding H1 when N^{−1}∑_{n=1}^N L(X(n)) ≥ η and deciding H0 when N^{−1}∑_{n=1}^N L(X(n)) < η, where

L(X) = ln(p1(X)/p0(X))

and the threshold η = η(N) is such that

0 < α = ∫_{Z1} Π_{n=1}^N (p0(X(n))dX(n)) = P(N^{−1}∑_{n=1}^N L(X(n)) > η|H0)

Here, Z1 is the region where H1 is decided. Estimate this probability by the LDP and calculate, using the LDP, the value of η for given α. Hence use this threshold η to calculate the other error probability

β(N) = ∫_{Z1^c} Π_{n=1}^N (p1(X(n))dX(n)) = P(N^{−1}∑_{n=1}^N L(X(n)) < η|H1)

and show that for any α > 0, however small, if η = η(N) is obtained as above for the given α, then β(N) converges to zero at the rate D(p0|p1), ie,

N^{−1}log(β(N)) → −D(p0|p1), N → ∞

where

D(p0|p1) = ∫ p0(X)·log(p0(X)/p1(X))·dX
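The quantity driving the error exponent is E_0[L(X)] = −D(p0|p1): under H0 the normalized log-likelihood sum concentrates at −D(p0|p1), which the LDP sharpens into the decay rate of β(N). A Monte Carlo sketch for unit-variance Gaussians p0 = N(0, 1), p1 = N(1, 1), where D(p0|p1) = 1/2 (my own illustrative choice, not a case worked in the text):

```python
import random

# Monte Carlo sketch: under H0, the sample mean of L(X) = ln(p1(X)/p0(X))
# approaches -D(p0|p1). For N(0,1) vs N(1,1), L(x) = x - 1/2 and D = 1/2.
random.seed(0)

def L(x):
    return x - 0.5      # log p1(x) - log p0(x) for these two Gaussians

N = 200000
mean_L_under_H0 = sum(L(random.gauss(0, 1)) for _ in range(N)) / N
# expected value: -D(p0|p1) = -0.5
```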

[6] This problem deals with non-parametric density estimation. Let p1(X), p0(X) be two different probability densities of a random vector X ∈ R^n. To estimate p1 at a given point X, we take samples from this distribution and draw a sphere, of minimum radius R1 with centre at X, for which the first sample point falls within this sphere. Let V(R1) denote the volume of such a sphere. Then estimate p1(X) as 1/V(R1). Likewise do so for estimating p0(X); let the minimum radius for this be R0. Justify, using the law of large numbers, that such a non-parametric density estimate is reliable. Show that, given a sample point X, in order to test whether X came from p1 or from p0, we use the ML criterion, namely decide p1 if the estimate of p1(X) is ≥ the estimate of p0(X) and decide p0 otherwise, where the estimates are obtained using the above scheme. Show that this MLE method is the same as the minimum distance method, in the sense that we decide p1 if R1(X) < R0(X) and decide p0 otherwise, where R1(X) is the distance from X of the first sample point drawn from p1 and likewise R0(X) is the distance from X of the first sample point drawn from p0.

[7] Let a group G act on a set M and let f0, f1 : M → R be two image fields defined on M such that

f_1(x) = f_0(g_0^{-1}x) + w(x), \quad x \in M

where g0 ∈ G and w is a zero mean Gaussian noise field. Choose a point x0 ∈ M and replace the above model by

f_1(g) = f_0(g_0^{-1}g) + w(g), \quad g \in G

where

f1(g) = f1(gx0), f0(g) = f0(gx0)

Let π be a unitary representation of G in the vector space C^p. Define the Fourier transform of a function F on G evaluated at π by

\mathcal{F}(F)(\pi) = \int_G F(g)\pi(g)\,dg

where dg is the left invariant Haar measure on G. Show that the above image model on G gives, on taking Fourier transforms,

F(f1)(π) = π(g0)F(f0)(π) + F(w)(π)

Let H be the subgroup of G consisting of all h ∈ G for which hx0 = x0. Show that fk(gh) = fk(g), k = 0, 1, g ∈ G, h ∈ H. Hence, deduce that if

P_H = \int_H \pi(h)\,dh

where dh denotes the Haar measure on H, then PH is the orthogonal projection onto the set of all vectors v ∈ C^p for which π(h)v = v for all h ∈ H, provided that H is unimodular, ie, its left invariant Haar measure coincides with its right invariant Haar measure. Deduce further that

\mathcal{F}(f_k)(\pi) = \mathcal{F}(f_k)(\pi)P_H, \quad k = 0, 1

Hence show that the above model for the image Fourier transforms is equivalent to

\mathcal{F}(f_1)(\pi)P_H = \pi(g_0)\mathcal{F}(f_0)(\pi)P_H + \mathcal{F}(w)(\pi)P_H

From this model, explain how, by varying the representation π, we can estimate the transformation g0 by minimizing

E(g_0) = \sum_k W(k)\,\| \mathcal{F}(f_1)(\pi_k)P_H - \pi_k(g_0)\mathcal{F}(f_0)(\pi_k)P_H \|^2


where ‖·‖ is the Frobenius norm and W(k), k ≥ 1 are positive weights. By starting with an initial guess estimate g00 for g0 and writing g0 = (1 + δX)g00, where δX is small and belongs to the Lie algebra of G, derive a set of linear equations for δX and solve them. For this, you can use the approximate identity

\pi((1 + \delta X)g_{00}) = \pi(exp(\delta X)g_{00}) = exp(d\pi(\delta X))\pi(g_{00}) = (1 + d\pi(\delta X))\pi(g_{00})

where dπ is the differential of π, ie, the representation of the Lie algebra of G corresponding to the representation π of G. Note that dπ is a linear map, and hence, if L1, ..., LN denotes a basis for the Lie algebra of G, then

\delta X = \sum_{k=1}^N x(k)L_k, \quad d\pi(\delta X) = \sum_{k=1}^N x(k)\,d\pi(L_k)

The problem of estimating δX is then equivalent to estimating x(k), k = 1, 2, ..., N. Also explain how you would approximately evaluate the mean and covariance of the random variables x(k), k = 1, 2, ..., N in terms of the noise correlations.
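A minimal numerical sketch of this linearized estimation of g0, under illustrative assumptions: G = SO(2) with its defining 2×2 representation, so the Lie algebra has a single basis element L with dπ(L) = L, and a matrix F0 plays the role of the transform-domain data:

```python
import numpy as np

# Sketch (assumed setup): estimate the rotation g0 in F1 = pi(g0) F0 + noise
# by the linearization g0 = (1 + x L) g00 described above, iterated as a
# scalar least squares (Gauss-Newton) update.
L = np.array([[0.0, -1.0], [1.0, 0.0]])   # so(2) basis element

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

rng = np.random.default_rng(1)
F0 = rng.normal(size=(2, 5))              # stand-in for F(f0)(pi) data
t0 = 0.3                                  # true rotation angle (assumed)
F1 = rot(t0) @ F0 + 0.01 * rng.normal(size=F0.shape)

t_hat = 0.0                               # initial guess g00 = identity
for _ in range(10):
    G00 = rot(t_hat)
    resid = (F1 - G00 @ F0).ravel()       # F1 - pi(g00) F0
    J = (L @ G00 @ F0).ravel()            # derivative of pi(g) F0 wrt x
    x = float(J @ resid) / float(J @ J)   # linear equation for delta X
    t_hat += x                            # update: exp(x L) g00
print(t_hat)                              # close to the true angle 0.3
```

The mean and covariance of the estimate x can then be propagated from the noise covariance through the same linear least squares formula.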

Estimating the EEG parameters from the speech parameters using the EKF. The EEG model is

XE(t+ 1) = A(θ)XE(t) +WE(t+ 1) ∈ RN

where

A(\theta) = A_0 + \sum_{k=1}^p \theta(k)A_k

and

\theta = (\theta(k) : k = 1, 2, ..., p)^T \in \mathbb{R}^p

is the EEG parameter vector to be estimated. The speech model, on the other hand, is

X_S(t) = \sum_{k=1}^m r_s(k)X_S(t-k) + r_s(m+1) + W_S(t) \in \mathbb{R}

The speech parameter vector is

r_s = (r_s(k) : k = 1, 2, ..., m+1)^T \in \mathbb{R}^{m+1}

We assume that the speech and EEG parameters are correlated, ie, there exists an affine relationship connecting the two of the form

rs = Cθ + d

The regression matrices C, d are assumed to be fixed, ie, they do not vary from patient to patient. These matrices can be estimated using the following procedure: choose a set of K patients numbered 1, 2, ..., K having various kinds of brain diseases. For each patient, collect his EEG and speech data and


apply the EKF separately to these data using the above models and noisy measurement models of the form

Z_E(t) = X_E(t) + V_E(t), \quad Z_S(t) = X_S(t) + V_S(t)

Denote the parameter estimates for the kth patient by \hat\theta_k, \hat r_{sk} for k = 1, 2, ..., K. Then the regression matrices C, d are estimated by minimizing

E(C, d) = \sum_{k=1}^K \| r_{sk} - C\theta_k - d \|^2

For convenience of notation, we write θk for the estimate \hat\theta_k and r_{sk} for \hat r_{sk},

and then the optimality equations for the regression matrices, obtained by setting the gradient of E(C, d) to zero, are given by

\sum_{k=1}^K (r_{sk} - C\theta_k - d)\theta_k^T = 0, \quad \sum_{k=1}^K (r_{sk} - C\theta_k - d) = 0

which can be expressed as

C R_\theta + d\mu_\theta^T = R_{s\theta}, \quad C\mu_\theta + d = \mu_s

where

R_\theta = K^{-1}\sum_{k=1}^K \theta_k\theta_k^T \in \mathbb{R}^{p \times p}, \quad \mu_\theta = K^{-1}\sum_{k=1}^K \theta_k \in \mathbb{R}^{p \times 1}

R_{s\theta} = K^{-1}\sum_{k=1}^K r_{sk}\theta_k^T \in \mathbb{R}^{(m+1) \times p}, \quad \mu_s = K^{-1}\sum_{k=1}^K r_{sk} \in \mathbb{R}^{(m+1) \times 1}

Eliminating d from these equations gives

C R_\theta + (\mu_s - C\mu_\theta)\mu_\theta^T = R_{s\theta}

giving

C = (R_{s\theta} - \mu_s\mu_\theta^T)(R_\theta - \mu_\theta\mu_\theta^T)^{-1}

and

d = \mu_s - C\mu_\theta
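The closed-form solution for the regression matrices can be sketched numerically as follows (all dimensions, the true C, d, and the noise level are assumed test values):

```python
import numpy as np

# Sketch of the regression step: recover C and d from per-patient
# parameter estimates theta_k, r_sk via
# C = (R_stheta - mu_s mu_theta^T)(R_theta - mu_theta mu_theta^T)^{-1},
# d = mu_s - C mu_theta.
rng = np.random.default_rng(2)
p, m, K = 3, 4, 200
C_true = rng.normal(size=(m + 1, p))
d_true = rng.normal(size=m + 1)

theta = rng.normal(size=(K, p))                    # theta_k estimates
rs = theta @ C_true.T + d_true + 0.01 * rng.normal(size=(K, m + 1))

mu_theta = theta.mean(axis=0)
mu_s = rs.mean(axis=0)
R_theta = theta.T @ theta / K                      # K^{-1} sum theta theta^T
R_stheta = rs.T @ theta / K                        # K^{-1} sum rs theta^T

C = (R_stheta - np.outer(mu_s, mu_theta)) @ np.linalg.inv(
    R_theta - np.outer(mu_theta, mu_theta))
d = mu_s - C @ mu_theta
# C, d agree with C_true, d_true up to the noise level
```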

This solves the problem of determining the regression coefficient matrices. Now, given a fresh patient, we wish to estimate his EEG parameters from only his speech data. To this end, we substitute into the speech model the expression for the speech parameters in terms of the EEG parameters using our regression scheme. First transform the speech model into a state variable model by introducing the state variables

\xi_1(t) = X_S(t-1), ..., \xi_m(t) = X_S(t-m)

and

\xi(t) = [\xi_1(t), ..., \xi_m(t)]^T

The speech model then becomes in state variable form

\xi(t+1) = B(r_s)\xi(t) + u\,r_{s,m+1} + bW_s(t)

where

B(r_s) = \begin{pmatrix} r_{s1} & r_{s2} & r_{s3} & ... & r_{sm} \\ 1 & 0 & 0 & ... & 0 \\ .. & .. & .. & ... & .. \\ 0 & 0 & ... & 1 & 0 \end{pmatrix}

We can write

B(r_s) = \sum_{k=1}^m r_{sk}B_k

and hence our speech model becomes

\xi(t+1) = \Big(\sum_{k=1}^m r_{sk}B_k\Big)\xi(t) + u\,r_{s,m+1} + bW_s(t)

Now

r_{sk} = (C\theta)_k + d_k, \quad k = 1, 2, ..., m+1

Writing

F_1(\xi, \theta) = \sum_{k=1}^m r_{sk}B_k\xi + u\,r_{s,m+1}

= \sum_{k=1}^m ((C\theta)_k + d_k)B_k\xi + u((C\theta)_{m+1} + d_{m+1})

we find that

\partial F_1/\partial\xi = \Big[\sum_{k=1}^m ((C\theta)_k + d_k)B_k\Big]


\partial F_1/\partial\theta = \Big[\sum_{k=1}^m C_{k1}B_k\xi + C_{m+1,1}u, \; \sum_{k=1}^m C_{k2}B_k\xi + C_{m+1,2}u, \; ..., \; \sum_{k=1}^m C_{kp}B_k\xi + C_{m+1,p}u\Big]
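A small sketch of the companion-form matrix B(rs) and a finite-difference check of the Jacobian ∂F1/∂θ above (illustrative assumptions: B_k = e_1 e_k^T selects the first row, the sub-diagonal shift is kept as a fixed part of B(rs), and the constant-term vector u and all sizes are test values):

```python
import numpy as np

# Build B(rs) in companion form and verify dF1/dtheta numerically.
m, p = 4, 3
rng = np.random.default_rng(3)
C = rng.normal(size=(m + 1, p))
d = rng.normal(size=m + 1)
u = np.zeros(m); u[0] = 1.0          # assumed constant-term input vector

def B(rs):
    Bm = np.zeros((m, m))
    Bm[0, :] = rs[:m]                # first row: rs1 ... rsm
    Bm[1:, :-1] = np.eye(m - 1)      # sub-diagonal shift
    return Bm

def F1(xi, theta):
    rs = C @ theta + d               # regression rs = C theta + d
    return B(rs) @ xi + u * rs[m]

xi = rng.normal(size=m); theta = rng.normal(size=p)
# analytic Jacobian: column j is sum_k C_{kj} B_k xi + C_{m+1,j} u,
# and with B_k = e1 e_k^T only the first row of B depends on theta
J = np.zeros((m, p))
for j in range(p):
    col = np.zeros(m)
    col[0] = C[:m, j] @ xi
    J[:, j] = col + C[m, j] * u
# finite-difference check (F1 is affine in theta, so this is exact)
eps = 1e-6
for j in range(p):
    dth = np.zeros(p); dth[j] = eps
    fd = (F1(xi, theta + dth) - F1(xi, theta)) / eps
    assert np.allclose(fd, J[:, j], atol=1e-5)
```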

10.2 Quantum Stein’s theorem

We wish to discriminate between two states ρ, σ using a sequence of iid measurements, ie, measurements taken when the states are ρ⊗n and σ⊗n, n = 1, 2, .... Let 0 ≤ Tn ≤ 1 be the detection operators. If the state is ρ⊗n, then the probability of a wrong decision is P1(n) = Tr(ρ⊗n(1 − Tn)), while if the state is σ⊗n, the probability of a wrong decision is P2(n) = Tr(σ⊗nTn). The theorem that we prove is the following: if the measurement operators Tn are such that P2(n) → 0, then P1(n) cannot go to zero at a rate faster than exp(−nD(σ|ρ)), ie,

\liminf_{n\to\infty} n^{-1}\log(P_1(n)) \ge -D(\sigma|\rho)

and conversely, for any δ > 0, there exists a sequence Tn of detection operators such that P2(n) → 0 and simultaneously

\limsup_n n^{-1}\log(P_1(n)) \le -D(\sigma|\rho) + \delta

To prove this, we first use the monotonicity of the Renyi entropy to deduce that for any detection operator Tn,

(Tr(\rho^{\otimes n}(1 - T_n)))^s\,(Tr(\sigma^{\otimes n}(1 - T_n)))^{1-s} \le Tr((\rho^{\otimes n})^s(\sigma^{\otimes n})^{1-s})

for any s ≤ 0. Therefore, if

Tr(\sigma^{\otimes n}T_n) \to 0

then

Tr(\sigma^{\otimes n}(1 - T_n)) \to 1

and we get from the above on taking logarithms,

\limsup n^{-1}\log(Tr(\rho^{\otimes n}(1 - T_n))) \ge \log(Tr(\rho^s\sigma^{1-s}))/s

or equivalently, writing

\alpha = \limsup n^{-1}\log(P_1(n))

we get

\alpha \ge s^{-1}\log Tr(\rho^s\sigma^{1-s}) \quad \forall s \le 0

Taking \lim_{s\uparrow 0} in this expression gives us

\alpha \ge -D(\sigma|\rho) = -Tr(\sigma(\log\sigma - \log\rho))

This proves the first part. For the second part, we choose Tn in accordance with the optimal Neyman-Pearson test. Specifically, choose

T_n = \{\exp(nR)\rho^{\otimes n} \ge \sigma^{\otimes n}\}

ie, the spectral projection onto the non-negative part of \exp(nR)\rho^{\otimes n} - \sigma^{\otimes n}, where the real number R will be selected appropriately.
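A numerical check of the limit used in the first part, on a pair of assumed qubit density matrices (not from the text):

```python
import numpy as np

# Check that s^{-1} log Tr(rho^s sigma^{1-s}) approaches -D(sigma|rho)
# as s -> 0-, with D(sigma|rho) = Tr(sigma(log sigma - log rho)).

def mpow(A, t):
    # power of a positive definite Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * w**t) @ V.conj().T

def mlog(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.conj().T

rho = np.array([[0.7, 0.2], [0.2, 0.3]])      # example density matrices
sigma = np.array([[0.5, 0.1], [0.1, 0.5]])

D = np.trace(sigma @ (mlog(sigma) - mlog(rho))).real   # D(sigma|rho)

for s in (-0.2, -0.05, -0.01):
    bound = np.log(np.trace(mpow(rho, s) @ mpow(sigma, 1 - s)).real) / s
    print(s, bound)    # tends to -D as s increases to 0
```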


10.3 Question paper, short test, Statistical signal processing, M.Tech

[1] Consider the AR(p) process

x[n] = -\sum_{k=1}^p a[k]\,x[n-k] + w[n]

where w[n] is white noise with autocorrelation

E(w[n]w[n-k]) = \sigma_w^2\,\delta[k]

Assume that

A(z) = 1 + \sum_{k=1}^p a[k]z^{-k} = \prod_{m=1}^r (1 - z_m z^{-1})\prod_{m=r+1}^{r+l}(1 - z_m z^{-1})(1 - \bar z_m z^{-1})

where p = r + 2l and z1, ..., zr+l all lie within the unit circle, with z1, ..., zr real and zr+1, ..., zr+l complex with non-zero imaginary parts. Assume that the measured process is

y[n] = x[n] + v[n]

where v[.] is white, independent of w[.], and with autocorrelation σ_v^2 δ[n]. Assume that

\sigma_w^2 + \sigma_v^2 A(z)A(z^{-1}) = B(z)B(z^{-1})

with

B(z) = K\prod_{m=1}^p (1 - p_m z^{-1})

where p1, ..., pp are complex numbers within the unit circle. Find the value of K in terms of σ_w^2, σ_v^2, a[1], ..., a[p]. Determine the transfer functions of the optimal non-causal and causal Wiener filters for estimating the process x[.] based on y[.], and cast the causal Wiener filter in time recursive form. Finally, determine the mean square estimation errors for both filters.

[2] Consider the state model

X[n+1] = AX[n] + b\,w[n+1]

and an associated measurement model

z[n] = CX[n] + v[n]

Given an AR(2) process

x[n] = -a[1]x[n-1] - a[2]x[n-2] + w[n]


and a measurement model

z[n] = x[n] + v[n]

with w[.] and v[.] being white and mutually uncorrelated processes with variances σ_w^2 and σ_v^2 respectively. Cast this time series model in the above state variable form, with A a 2×2 matrix, X[n] a 2×1 state vector, b a 2×1 vector and C a 1×2 matrix. Write down explicitly the Kalman filter equations for this model and prove that in the steady state, ie, as n → ∞, the Kalman filter converges to the optimal causal Wiener filter. For proving this, you may assume an appropriate factorization of σ_v^2 A(z)A(z^{-1}) + σ_w^2 into a minimum phase and a non-minimum phase part, where A(z) = 1 + a[1]z^{-1} + a[2]z^{-2} is assumed to be minimum phase. Note that the steady state Kalman filter is obtained by setting the state estimation error conditional covariance matrices for prediction and filtering, namely P[n+1|n], P[n|n], as well as the Kalman gain K[n], equal to constant matrices, and thus expressing the Kalman filter in steady state form as

\hat X[n+1|n+1] = D\hat X[n|n] + f\,z[n+1]

where D, f are functions of the steady state values of P[n|n] and P[n+1|n], which satisfy the steady state Riccati equation.
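A sketch of this steady-state filter, with the Riccati recursion iterated to its fixed point (the AR(2) coefficients and noise variances are assumed test values):

```python
import numpy as np

# AR(2)-plus-noise model in companion state-space form; the steady-state
# prediction covariance is found by iterating the Riccati recursion.
a1, a2 = -1.2, 0.5                      # stable AR(2): roots inside circle
sw2, sv2 = 1.0, 0.25                    # process / measurement noise vars

A = np.array([[-a1, -a2], [1.0, 0.0]])  # companion form
b = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.0]])

P = np.eye(2)                           # iterate for P[n+1|n]
for _ in range(500):
    K = P @ C.T / (C @ P @ C.T + sv2)   # Kalman gain
    Pf = P - K @ C @ P                  # filtered covariance P[n|n]
    P = A @ Pf @ A.T + sw2 * b @ b.T    # next prediction covariance

# steady-state filter: X[n+1|n+1] = D X[n|n] + f z[n+1]
D = (np.eye(2) - K @ C) @ A
f = K
```

At the fixed point, D has all eigenvalues inside the unit circle, so the recursion defines a stable time-invariant filter, which is what is compared with the causal Wiener filter.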

[3] Let X_k, k = 1, 2, ..., N and θ be random vectors defined on the same probability space, with the conditional distribution of X_1, ..., X_N given θ being that of iid N(θ, R_1) random vectors. Assume further that the distribution of θ is N(μ, R_2). Calculate (a) p(X_1, ..., X_N|θ), (b) p(θ|X_1, ..., X_N), (c) the ML estimate of θ given X_1, ..., X_N and (d) the MAP estimate of θ given X_1, ..., X_N. Also evaluate

Cov(\hat\theta_{ML} - \theta), \quad Cov(\hat\theta_{MAP} - \theta)

Which one has the larger trace? Explain why this must be so.
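A quick Monte Carlo comparison of the ML and MAP error variances in the scalar case (all parameter values assumed), consistent with the trace comparison asked for:

```python
import numpy as np

# Scalar case: X_k | theta iid N(theta, r1), theta ~ N(mu, r2).
# ML estimate = sample mean; MAP estimate shrinks it toward mu.
rng = np.random.default_rng(4)
r1, r2, mu, N, trials = 4.0, 1.0, 0.0, 5, 20000

theta = rng.normal(mu, np.sqrt(r2), size=trials)
X = rng.normal(theta[:, None], np.sqrt(r1), size=(trials, N))

ml = X.mean(axis=1)
w = (N / r1) / (N / r1 + 1 / r2)          # posterior weight on the data
map_ = w * ml + (1 - w) * mu

var_ml = np.var(ml - theta)                # ~ r1/N = 0.8
var_map = np.var(map_ - theta)             # ~ 1/(N/r1 + 1/r2) = 0.444...
assert var_map < var_ml                    # MAP error variance is smaller
```

The MAP error variance equals the posterior variance 1/(N/r1 + 1/r2), strictly smaller than the ML error variance r1/N, because the prior contributes extra information.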

[4] Let X_1, ..., X_N be iid N(μ, R). Evaluate the Fisher information matrix elements

-E\Big[\frac{\partial^2}{\partial\mu_a\partial\mu_b}\ln p(X_1, ..., X_N|\mu, R)\Big]

-E\Big[\frac{\partial^2}{\partial\mu_a\partial R_{bc}}\ln p(X_1, ..., X_N|\mu, R)\Big]

-E\Big[\frac{\partial^2}{\partial R_{ab}\partial R_{cd}}\ln p(X_1, ..., X_N|\mu, R)\Big]

Taking the special case when the X_k's are 2-dimensional random vectors, evaluate the Cramer-Rao lower bound for Var(\hat\mu_a), a = 1, 2 and Var(\hat R_{ab}), a, b = 1, 2, where \hat\mu and \hat R are respectively unbiased estimates of μ and R.


[5] Let ρ(θ) denote a mixed quantum state in C^N dependent upon a classical parameter vector θ = (θ_1, ..., θ_p)^T. Let X, Y be two observables with eigendecompositions

X = \sum_{a=1}^N |e_a > x(a) < e_a|, \quad Y = \sum_{a=1}^N |f_a > y(a) < f_a|, \quad < e_a|e_b > = < f_a|f_b > = \delta_{ab}

First measure X and note its outcome x(a). Then, allowing for state collapse, let the state of the system evolve from the collapsed state ρ_c to the new state

T(\rho_c) = \sum_{k=1}^m E_k\rho_c E_k^*, \quad \sum_{k=1}^m E_k^* E_k = I

Then measure Y in this evolved state and note the outcome y(b). Calculate the joint probability distribution P(x(a), y(b)|θ). Explain, by linearizing about some θ_0, how you will calculate the maximum likelihood estimate of θ as a function of the measurements x(a), y(b). Also explain how you will calculate the CRLB for θ. Finally, explain how you will select the observables X, Y to be measured in such a way that the CRLB is minimum when θ is a scalar parameter.
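A sketch of the joint distribution P(x(a), y(b)|θ) for a qubit (the parametrization of ρ(θ), the two measurement bases and the amplitude-damping Kraus operators are illustrative assumptions):

```python
import numpy as np

# Joint law of sequential measurements with an intervening CPTP map T:
# P(x(a), y(b)) = <f_b| T(P_a rho P_a) |f_b>, with P_a = |e_a><e_a|.
th = 0.3
rho = 0.5 * np.array([[1 + th, 0.2], [0.2, 1 - th]])   # example rho(theta)

E = np.eye(2)                                          # X eigenbasis
F = np.array([[1, 1], [1, -1]]) / np.sqrt(2)           # Y eigenbasis

g = 0.4                                                # damping strength
K = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),          # amplitude-damping
     np.array([[0, np.sqrt(g)], [0, 0]])]              # Kraus operators

def T(r):
    return sum(k @ r @ k.conj().T for k in K)

P = np.zeros((2, 2))
for a in range(2):
    Pa = np.outer(E[:, a], E[:, a])                    # projector |e_a><e_a|
    post = T(Pa @ rho @ Pa)                            # unnormalized state
    for b in range(2):
        P[a, b] = (F[:, b] @ post @ F[:, b]).real
print(P, P.sum())                                      # columns of P sum to 1 overall
```

Because T is trace preserving and the P_a resolve the identity, the entries of P form a genuine joint distribution; the ML and CRLB computations then proceed from P(x(a), y(b)|θ) as a classical likelihood.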

[6] For calculating time recursive updates of parameters in a linear AR model, we require the matrix inversion lemma applied to a matrix of the form

(\lambda R + bb^T)^{-1}

while for calculating order recursive updates, we require the projection operator update formula in the form

P_{span(W,\xi)} = P_W + P_{P_W^\perp\xi}

and also to invert block structured matrices of the form

\begin{pmatrix} A & b \\ b^T & c \end{pmatrix} \quad and \quad \begin{pmatrix} c & b^T \\ b & A \end{pmatrix}

in terms of A^{-1}. Explain these statements in the light of the recursive least squares lattice algorithm for simultaneous time and order recursive updates of the prediction filter parameters.
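The time-recursive update can be checked numerically with the matrix inversion (Sherman-Morrison) lemma; the sizes and forgetting factor below are assumed values:

```python
import numpy as np

# Rank-one update used in time-recursive RLS:
# (lam R + b b^T)^{-1} = P/lam - (P b b^T P) / (lam^2 (1 + b^T P b / lam)),
# where P = R^{-1}.
rng = np.random.default_rng(5)
n = 4
M = rng.normal(size=(n, n))
R = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.normal(size=(n, 1))
lam = 0.95                               # forgetting factor (assumed)

P = np.linalg.inv(R)
lhs = np.linalg.inv(lam * R + b @ b.T)
rhs = P / lam - (P @ b @ b.T @ P) / (lam**2 * (1 + (b.T @ P @ b) / lam))
assert np.allclose(lhs, rhs)
```

This is why the RLS recursion never inverts a matrix explicitly: each new data vector b updates the existing inverse at O(n²) cost.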


10.4 Question paper on statistical signal processing, long test

10.4.1 Atom excited by a quantum field

[1] Let the state at time t = 0 of an atom be ρA(0) and that of the field be the coherent state |φ(u) >< φ(u)|, so that the state of both the atom and the field at time t = 0 is

\rho(0) = \rho_A(0) \otimes |\phi(u) >< \phi(u)|

Let the Hamiltonian of the atom and field in interaction be

H(t) = H_A + \sum_k \omega(k)a(k)^*a(k) + \delta V(t)

where H_A acts in the atom Hilbert space, H_F = \sum_k \omega(k)a(k)^*a(k) acts in the field Boson Fock space, and the interaction Hamiltonian V(t) acts in the tensor product of both spaces. Assume that V(t) has the form

V(t) = \sum_k (X_k(t) \otimes a(k) + X_k(t)^* \otimes a(k)^*)

where Xk(t) acts in the atom Hilbert space. Note that writing

exp(itH_A)X_k(t)exp(-itH_A) = Y_k(t), \quad exp(itH_F)a(k)exp(-itH_F) = a(k)exp(-i\omega(k)t) = a_k(t)

we have, on defining the unperturbed (ie, in the absence of interaction) system plus field Hamiltonian

H_0 = H_A + H_F = H_A + \sum_k \omega(k)a(k)^*a(k)

that

W(t) = exp(itH_0)V(t)exp(-itH_0) = \sum_k (Y_k(t) \otimes a_k(t) + Y_k(t)^* \otimes a_k(t)^*)

Show that, up to O(δ²) terms, the total system ⊗ field evolution operator U(t) can be expressed as

U(t) = U0(t).S(t)

where

U_0(t) = exp(-itH_0) = exp(-itH_A)exp(-itH_F)

is the unperturbed evolution operator and

S(t) = 1 - i\delta\int_0^t W(s)\,ds - \delta^2\int_{0<s_2<s_1<t} W(s_1)W(s_2)\,ds_1ds_2 = 1 - i\delta S_1(t) - \delta^2 S_2(t)

Show that with this approximation, the state of the system and bath at time T is given by

\rho(T) = U_0(T)(1 - i\delta S_1(T) - \delta^2 S_2(T))\rho(0)(1 + i\delta S_1(T) - \delta^2 S_2(T)^*)U_0(T)^*

Evaluate, using this expression, the atomic state at time T as

\rho_A(T) = Tr_2\,\rho(T)

Use the fact that ρ(0) = ρ_A(0) ⊗ |φ(u) >< φ(u)| and that

Tr(a(k)|\phi(u) >< \phi(u)|) = < \phi(u)|a(k)|\phi(u) > = u(k),

Tr(a(k)^*|\phi(u) >< \phi(u)|) = < \phi(u)|a(k)^*|\phi(u) > = \bar u(k),

Tr(a(k)a(m)|\phi(u) >< \phi(u)|) = < \phi(u)|a(k)a(m)|\phi(u) > = u(k)u(m),

Tr(a(k)a(m)^*|\phi(u) >< \phi(u)|) = < \phi(u)|[a(k), a(m)^*]|\phi(u) > + < \phi(u)|a(m)^*a(k)|\phi(u) > = \delta(k,m) + u(k)\bar u(m)

and

Tr(a(k)^*a(m)|\phi(u) >< \phi(u)|) = < \phi(u)|a(k)^*a(m)|\phi(u) > = \bar u(k)u(m)

Remark: more generally, we can expand any general initial state of the system (atom) and bath (field) as

\rho(0) = \int F_A(u, \bar u) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u = \int F_A(u, \bar u)exp(-|u|^2) \otimes |e(u) >< e(u)|\,du\,d\bar u

In this case also, assuming ρ(0) is the initial state, we can calculate the final state at time T using the formula

\rho(T) = U_0(T)(1 - i\delta S_1(T) - \delta^2 S_2(T))\rho(0)(1 + i\delta S_1(T) - \delta^2 S_2(T)^*)U_0(T)^*

by evaluating this using the fact that if Y_1, Y_2 are system operators, then

(Y_1 \otimes a_k)\Big(\int F(u, \bar u) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u\Big)(Y_2 \otimes a_m) = \int Y_1 F(u, \bar u)Y_2\,exp(-|u|^2) \otimes a_k|e(u) >< e(u)|a_m\,du\,d\bar u \quad --- (1)


Now

a_k|e(u) > = u(k)|e(u) >, \quad a_k^*|e(u) > = \partial|e(u) > /\partial u(k)

so, after using integration by parts, (1) evaluates to

\int Y_1 F(u, \bar u)Y_2\,u(k)exp(-|u|^2) \otimes (\partial/\partial\bar u(m))(|e(u) >< e(u)|)\,du\,d\bar u

= \int (-\partial/\partial\bar u(m))(Y_1 F(u, \bar u)Y_2\,u(k)\,exp(-|u|^2)) \otimes |e(u) >< e(u)|\,du\,d\bar u

= \int (-\partial/\partial\bar u(m) + u(m))(Y_1 F(u, \bar u)Y_2\,u(k)) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u

Likewise,

(Y_1 \otimes a_k^*)\Big(\int F(u, \bar u) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u\Big)(Y_2 \otimes a_m)

= \int Y_1 F(u, \bar u)Y_2 \otimes a_k^*|\phi(u) >< \phi(u)|a_m\,du\,d\bar u

= \int Y_1 F(u, \bar u)Y_2\,exp(-|u|^2) \otimes (\partial/\partial u(k))(\partial/\partial\bar u(m))(|e(u) >< e(u)|)\,du\,d\bar u

= \int (\partial^2/\partial u(k)\partial\bar u(m))(Y_1 F(u, \bar u)Y_2\,exp(-|u|^2)) \otimes |e(u) >< e(u)|\,du\,d\bar u

= \int exp(|u|^2)(\partial^2/\partial u(k)\partial\bar u(m))(Y_1 F(u, \bar u)Y_2\,exp(-|u|^2)) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u

= \int (\partial/\partial u(k) - \bar u(k))(\partial/\partial\bar u(m) - u(m))(Y_1 F(u, \bar u)Y_2) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u

and likewise the terms

(Y_1 \otimes a_k^*)\Big(\int F(u, \bar u) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u\Big)(Y_2 \otimes a_m^*)

and

(Y_1 \otimes a_k)\Big(\int F(u, \bar u) \otimes |\phi(u) >< \phi(u)|\,du\,d\bar u\Big)(Y_2 \otimes a_m^*)

We leave it as an exercise to the reader to evaluate these terms and hence to derive, up to the second order Dyson series approximation for the evolution operator, the approximate final state of the atom and field, and then, by partial tracing over the field variables, the approximate final atomic state under this interaction of atom and field (ie, system and bath).

[2] If the state of the system is given by ρ = ρ0 + δ·ρ1, then evaluate the Von Neumann entropy of ρ as a power series in δ.

Hint:


10.4.2 Estimating the power spectral density of a stationary Gaussian process using an AR model: performance analysis of the algorithm

The process x[n] has mean zero and autocorrelation R(τ). Its pth order prediction error process is given by

e[n] = x[n] + \sum_{k=1}^p a[k]x[n-k]

where the a[k]'s are chosen to minimize E(e[n]²). The optimal normal equations are

E(e[n]x[n-m]) = 0, \quad m = 1, 2, ..., p

and these yield the Yule-Walker equations

\sum_{k=1}^p R[m-k]a[k] = -R[m], \quad m = 1, 2, ..., p

or equivalently in matrix notation,

a = -R_p^{-1}r_p

where

R_p = ((R[m-k]))_{1\le m,k\le p}, \quad r_p = ((R[m]))_{m=1}^p

The minimum mean square prediction error is then

\sigma^2 = E(e[n]^2) = E(e[n]x[n]) = R[0] + \sum_{k=1}^p a[k]R[k] = R[0] + a^T r_p = R[0] - r_p^T R_p^{-1} r_p

If the order p of the AR model is large, then e[n] is nearly a white process; in fact, in the limit p → ∞, e[n] becomes white, ie, E(e[n]x[n−m]) becomes zero for all m ≥ 1 and hence so does E(e[n]e[n−m]). The ideal known statistics AR spectral estimate is given by

S_{AR,p}(\omega) = S(\omega) = \frac{\sigma^2}{|A(\omega)|^2}, \quad A(\omega) = 1 + a^T e(\omega), \quad e(\omega) = ((exp(-j\omega m)))_{m=1}^p
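A sketch of this AR spectral estimate on a simulated AR(2) process (coefficients are example values; here the autocorrelations are estimated from a long realization rather than given exactly):

```python
import numpy as np

# Solve the Yule-Walker equations a = -R_p^{-1} r_p for an AR(2) process
# and form S_AR(w) = sigma^2 / |A(w)|^2.
a_true = np.array([-1.2, 0.5])   # x[n] = 1.2 x[n-1] - 0.5 x[n-2] + w[n]
sw2 = 1.0

rng = np.random.default_rng(6)
N = 200_000
x = np.zeros(N)
for n in range(2, N):
    x[n] = -a_true[0]*x[n-1] - a_true[1]*x[n-2] + rng.normal(0, np.sqrt(sw2))
R = np.array([x[:N-m] @ x[m:] / N for m in range(3)])   # R[0], R[1], R[2]

Rp = np.array([[R[0], R[1]], [R[1], R[0]]])
rp = R[1:3]
a = -np.linalg.solve(Rp, rp)     # close to a_true
sigma2 = R[0] + a @ rp           # prediction error power, close to sw2

w_grid = np.linspace(0, np.pi, 512)
A = 1 + np.exp(-1j * np.outer(w_grid, [1, 2])) @ a
S = sigma2 / np.abs(A)**2        # AR spectral estimate over the grid
```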

Since in practice we do not have the exact correlations available with us, we estimate the AR parameters using the least squares method, ie, by minimizing

\sum_{n=1}^N (x[n] + a^T x_n)^2


where

x_n = ((x[n-m]))_{m=1}^p

The finite data estimate of R_p is

\hat R_p = N^{-1}\sum_{n=1}^N x_n x_n^T

and that of r_p is

\hat r_p = N^{-1}\sum_{n=1}^N x[n]x_n

We write R for R_p and \hat R = R + δR for \hat R_p. Likewise, for notational convenience, we write r for r_p and r + δr for \hat r_p. Note that \hat R and \hat r are unbiased estimates of R and r respectively, and hence δR and δr have zero mean. Now the least squares estimate \hat a = a + δa of a is given by

\hat a = a + \delta a = -\hat R^{-1}\hat r = -(R + \delta R)^{-1}(r + \delta r)

= -(R^{-1} - R^{-1}\delta R R^{-1} + R^{-1}\delta R R^{-1}\delta R R^{-1})(r + \delta r)

= a - R^{-1}\delta r - R^{-1}\delta R\,a + R^{-1}\delta R R^{-1}\delta r + R^{-1}\delta R R^{-1}\delta R\,a

up to quadratic orders in the data correlation estimate errors. Thus, we can write

\delta a = -R^{-1}\delta r - R^{-1}\delta R\,a + R^{-1}\delta R R^{-1}\delta r + R^{-1}\delta R R^{-1}\delta R\,a

Likewise, the estimate of the prediction error energy σ² is given by

\hat\sigma^2 = \sigma^2 + \delta\sigma^2 = \hat R[0] + \hat a^T\hat r

and hence

\delta\sigma^2 = \delta R[0] + a^T\delta r + \delta a^T r + \delta a^T\delta r

= \delta R[0] + a^T\delta r + (-R^{-1}\delta r - R^{-1}\delta R\,a + R^{-1}\delta R R^{-1}\delta r + R^{-1}\delta R R^{-1}\delta R\,a)^T r + (-R^{-1}\delta r - R^{-1}\delta R\,a)^T\delta r

again up to quadratic orders in the data correlation estimate errors. In terms of the fourth moments of the process x[n], we can now evaluate the bias and mean square fluctuations of \hat a and \hat\sigma^2 with respect to their known statistics values a, σ²

respectively. First we observe that the bias of \hat a is

E(\delta a) = E(-R^{-1}\delta r - R^{-1}\delta R\,a + R^{-1}\delta R R^{-1}\delta r + R^{-1}\delta R R^{-1}\delta R\,a) = E(R^{-1}\delta R R^{-1}\delta r) + E(R^{-1}\delta R R^{-1}\delta R\,a)

Specifically, for the ith component of this, we find that, with P = R^{-1} and using Einstein's summation convention over repeated indices,

E(\delta a_i) = E(P_{ij}\delta R_{jk}P_{km}\delta r_m) + E(P_{ij}\delta R_{jk}P_{km}\delta R_{ml}a_l) = P_{ij}P_{km}E(\delta R_{jk}\delta r_m) + P_{ij}P_{km}a_l E(\delta R_{jk}\delta R_{ml})

Now,

E(\delta R_{jk}\delta r_m) = N^{-2}\sum_{n,l=1}^N E[(x[n-k]x[n-j] - R[j-k])(x[l]x[l-m] - R[m])]

= N^{-2}\sum_{n,l=1}^N (E(x[n-k]x[n-j]x[l]x[l-m]) - R[j-k]R[m])

up to fourth degree terms in the data. Note that here we have not yet made any assumption that x[n] is a Gaussian process, although we are assuming that the process is stationary at least up to fourth order. Writing

E(x[n1]x[n2]x[n3]x[n4]) = C(n2 − n1, n3 − n1, n4 − n1)

we get

E(\delta R_{jk}\delta r_m) = N^{-2}\sum_{n,l=1}^N C(k-j,\; l-n+k,\; l-n+k-m)

10.4.3 Statistical performance analysis of the MUSIC and ESPRIT algorithms

Let

\hat R = R + \delta R

be an estimate of the signal correlation matrix R. Let v_1, ..., v_p denote the signal eigenvectors and v_{p+1}, ..., v_N the noise eigenvectors. The signal subspace eigen-matrix is

V_S = [v_1, ..., v_p] \in \mathbb{C}^{N \times p}

and the noise subspace eigen-matrix is

VN = [vp+1, ..., vN ]

so that

V = [VS |VN ]

is an N ×N unitary matrix. The signal correlation matrix model is

R = EDE∗ + σ2I


σ² is estimated as \hat\sigma^2, the average of the N − p smallest eigenvalues of \hat R. In this expression,

E = [e(\omega_1), ..., e(\omega_p)]

is the signal manifold matrix, where

e(\omega) = [1, exp(j\omega), ..., exp(j(N-1)\omega)]^T

is the steering vector at frequency ω. Note that since N > p, E has full column rank p. The signal eigenvalues of R are

λi = µi + σ2, i = 1, 2, ..., p

while the noise eigenvalues are

λi = σ2, i = p+ 1, ..., N

Thus,

Rv_i = \lambda_i v_i, \quad i = 1, 2, ..., N

The known statistics exact MUSIC spectral estimator is

P(\omega) = (1 - N^{-1}\| V_S^* e(\omega)\|^2)^{-1}

which is infinite when and only when ω ∈ {ω_1, ..., ω_p}. Alternately, the zeros of the function

Q(\omega) = 1/P(\omega) = 1 - N^{-1}\| V_S^* e(\omega)\|^2

are precisely ω_1, ..., ω_p, because V_N^* e(ω) = 0 iff e(ω) belongs to the signal subspace span{e(ω_i) : 1 ≤ i ≤ p} iff {e(ω), e(ω_1), ..., e(ω_p)} is a linearly dependent set iff ω ∈ {ω_1, ..., ω_p}, the last statement being a consequence of the fact that N ≥ p+1 and the Vandermonde determinant theorem. Now, owing to finite data effects, v_i gets perturbed to v_i + δv_i for each i = 1, 2, ..., p, so that correspondingly V_S gets perturbed to \hat V_S = V_S + δV_S. Then Q gets perturbed (up to first order of smallness, ie, up to O(δR)) to

Q(\omega) + \delta Q(\omega) = 1 - N^{-1}\|(V_S + \delta V_S)^* e(\omega)\|^2

so that

\delta Q(\omega) = (-2/N)Re(e(\omega)^*\delta V_S V_S^* e(\omega))

Let ω be one of the zeros of Q(ω), ie, one of the frequencies of the signal. Then, owing to finite data effects, it gets perturbed to ω + δω, where

Q'(\omega)\delta\omega + \delta Q(\omega) = 0

ie,

\delta\omega = -\delta Q(\omega)/Q'(\omega)

and

Q'(\omega) = (-2/N)Re(e(\omega)^* P_S e'(\omega)) = (-2/N)Re < e(\omega)|P_S|e'(\omega) >


where

P_S = V_S V_S^*

is the orthogonal projection onto the signal subspace. Note that by first order perturbation theory,

|\delta v_i > = \sum_{j\ne i}\frac{|v_j >< v_j|\delta R|v_i >}{\lambda_i - \lambda_j}, \quad i = 1, 2, ..., N

and

\delta\lambda_i = < v_i|\delta R|v_i >

So

\delta V_S = \Big[\sum_{j\ne i}|v_j >< v_j|\delta R|v_i > /(\lambda_i - \lambda_j) : i = 1, 2, ..., p\Big] = [Q_1\delta R|v_1 >, ..., Q_p\delta R|v_p >]

where

Q_i = \sum_{j\ne i}|v_j >< v_j|/(\lambda_i - \lambda_j)

We can express this as

\delta V_S = [Q_1, ..., Q_p](I_p \otimes \delta R\,V_S)

and hence

\delta Q(\omega) = (-2/N)Re(< e(\omega)|\delta V_S V_S^*|e(\omega) >) = (-2/N)Re(< e(\omega)|S(I_p \otimes \delta R)V_S V_S^*|e(\omega) >)

= (-2/N)Re(< e(\omega)|S(I_p \otimes \delta R)P_S|e(\omega) >)

where, as before, P_S = V_S V_S^* is the orthogonal projection onto the signal subspace and

S = [Q_1, Q_2, ..., Q_p]

The evaluation of E((δω)²) can therefore easily be done once we evaluate E(δR(i,j)δR(k,m)), and this is what we now propose to do.
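A minimal MUSIC sketch on simulated data (frequencies, array size, snapshot count and SNR are assumed values); the estimated frequencies are read off as the deepest local minima of Q(ω):

```python
import numpy as np

# Two complex sinusoids in white noise: form the sample correlation
# matrix, split signal/noise eigenvectors, and evaluate the null
# spectrum Q(w) = 1 - ||V_S^* e(w)||^2 / N.
rng = np.random.default_rng(7)
Nant, p, T = 8, 2, 2000
w_true = np.array([0.7, 1.9])

n = np.arange(Nant)
E = np.exp(1j * np.outer(n, w_true))                  # steering vectors
s = rng.normal(size=(p, T)) + 1j * rng.normal(size=(p, T))
x = E @ s + 0.1 * (rng.normal(size=(Nant, T)) + 1j * rng.normal(size=(Nant, T)))
R = x @ x.conj().T / T

lam, V = np.linalg.eigh(R)                            # ascending eigenvalues
VS = V[:, -p:]                                        # signal subspace

w_grid = np.linspace(0, np.pi, 2000)
e = np.exp(1j * np.outer(n, w_grid))
Q = 1 - np.sum(np.abs(VS.conj().T @ e) ** 2, axis=0) / Nant

# frequency estimates: the p deepest local minima of Q
loc = np.where((Q[1:-1] < Q[:-2]) & (Q[1:-1] < Q[2:]))[0] + 1
est = np.sort(w_grid[loc[np.argsort(Q[loc])[:p]]])    # close to w_true
```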


Statistical performance analysis of the ESPRIT algorithm. Here we have two correlation matrices R, R_1 having the structures

R = EDE^* + \sigma^2 I, \quad R_1 = ED\Phi^* E^* + \sigma^2 Z

where E has full column rank, D is a square, non-singular positive definite matrix, and Φ is a diagonal unitary matrix. The structure of Z is known. We have to estimate the diagonal entries of Φ, which give us the signal frequencies. Let, as before, V_S denote the matrix formed from the signal eigenvectors of R and V_N that formed from the noise eigenvectors of R. V = [V_S|V_N] is unitary. When we form finite data estimates, R is replaced by \hat R = R + δR and R_1 by \hat R_1 = R_1 + δR_1. The estimate of σ² is \hat\sigma^2 = \sigma^2 + \delta\sigma^2, obtained by averaging the N − p smallest eigenvalues of \hat R. The estimate of R_S = EDE^* is then

\hat R_S = R_S + \delta R_S = \hat R - \hat\sigma^2 I = R - \sigma^2 I + \delta R - \delta\sigma^2 I

Thus,

\delta R_S = \delta R - \delta\sigma^2 I

Likewise, the estimate of R_{S1} = ED\Phi^* E^* is

\hat R_{S1} = R_{S1} + \delta R_{S1} = \hat R_1 - \hat\sigma^2 Z = R_1 - \sigma^2 Z + \delta R_1 - \delta\sigma^2 Z

so that

\delta R_{S1} = \delta R_1 - \delta\sigma^2 Z

The rank reducing numbers of the matrix pencil

R_S - \gamma R_{S1} = ED(I - \gamma\Phi^*)E^*

are clearly the diagonal entries of Φ. Since R(E) = R(V_S), it follows that these rank reducing numbers are the zeros of the polynomial equation

det(V_S^*(R_S - \gamma R_{S1})V_S) = 0

or equivalently of

det(M - \gamma V_S^* R_{S1}V_S) = 0

where

M = V_S^* R_S V_S = diag[\mu_1, ..., \mu_p]

with

\mu_k = \lambda_k - \sigma^2, \quad k = 1, 2, ..., p

being the nonzero eigenvalues of R_S. When finite data effects are taken into consideration, these become the rank reducing numbers of

det(M + \delta M - \gamma(V_S + \delta V_S)^*(R_{S1} + \delta R_{S1})(V_S + \delta V_S)) = 0

or equivalently, up to first order perturbation theory, the rank reducing number γ gets perturbed to \hat\gamma = \gamma + \delta\gamma, where

det(M - \gamma X + \delta M - \gamma\,\delta X - \delta\gamma\,X) = 0


where

X = V_S^* R_{S1}V_S, \quad \delta X = \delta V_S^* R_{S1}V_S + V_S^* R_{S1}\delta V_S + V_S^*\delta R_{S1}V_S

Let ξ be a right eigenvector of M − γX and η^* a left eigenvector of the same, corresponding to the zero eigenvalue. Note that M, X, δM, δX are all Hermitian, but M − γX is not, because γ = exp(jω_i) is generally not real. Then, let ξ + δξ be a right eigenvector of

M − γX + δM − γδX − δγX

corresponding to the zero eigenvalue. In terms of first order perturbation theory,

0 = (M − γX + δM − γδX − δγX)(ξ + δξ)

= (M − γX)δξ + (δM − γδX)ξ − δγXξ

Premultiplying by η∗ gives us

δγη∗Xξ = η∗(δM − γδX)ξ

or equivalently,

\delta\gamma = \frac{< \eta|\delta M - \gamma\delta X|\xi >}{< \eta|X|\xi >}

which is the desired formula.

10.5 Quantum neural networks for estimating EEG pdf from speech pdf based on Schrodinger's equation

Reference: paper by Vijay Upreti, Harish Parthasarathy, Vijayant Agarwal and Guguloth Sagar.

Quantum neural networks for estimating the marginal pdf of one signal from the marginal pdf of another signal. The idea is that the essential characteristics of either speech or EEG signals are contained in their empirical pdf's, ie, the proportion of time that the signal amplitude spends within any Borel subset of the real line. This hypothesis leads one to believe that by estimating the pdf of the EEG data from that of speech data alone, we can firstly do away with the elaborate apparatus required for measuring EEG signals, and secondly we can extract from the EEG pdf the essential characteristics of the EEG signal parameters which determine the condition of a diseased brain. It is a fact that since the Hamiltonian operator H(t) in the time dependent Schrodinger equation is Hermitian, the resulting evolution operator U(t, s), t > s, is unitary, causing thereby the wave function modulus square to integrate to unity at any time t if at time t = 0 it does integrate to unity. Thus, the Schrodinger equation naturally generates a family of pdf's. Further, based on the assumption that the speech pdf and the EEG pdf for any patient are correlated in the same way (a particular kind of brain disease causes a certain distortion in the EEG signal and a correlated distortion in the speech process, with the correlation being patient independent), we include a potential term in Schrodinger's equation involving the speech pdf; hence, while tracking a given EEG pdf owing to the adaptation algorithm of the QNN, the QNN also ensures that the output pdf coming from Schrodinger's equation will maintain a certain degree of correlation with the speech pdf. The basic idea behind the quantum neural network is to modulate the potential by a "weight process", with the weight increasing whenever the desired pdf exceeds the pdf generated by Schrodinger's equation using the modulus of the wave function squared. A heuristic proof of why the pdf generated by the Schrodinger equation should increase when the weight process, and hence the modulated potential, increases is provided at the end of the paper.

First consider the following model for estimating the pdf p(t, x) of one signal via Schrodinger's wave equation, which naturally generates a continuous family of pdf's via the modulus square of the wave function:

ih\partial_t\psi(t, x) = (-h^2/2)\partial_x^2\psi(t, x) + W(t, x)V(x)\psi(t, x)

The weight process W(t, x) is generated in such a way that |ψ(t, x)|² tracks the given pdf p(t, x). The adaptation scheme is

∂tW (t, x) = −βW (t, x) + α(p(t, x)− |ψ(t, x)|2)

where α, β are positive constants. This latter equation guarantees that if W(t, x) is very large and positive (negative), then the term −βW(t, x) is very large and negative (positive), causing W to decrease (increase). In other words, −βW is a threshold term that prevents the weight process from getting too large in magnitude, which could cause the break-down of the quantum neural network. We are assuming that V(x) is a non-negative potential. Thus, an increase in W will cause an increase in WV, which, as we shall see by an argument given below, will cause a corresponding increase in |ψ|², and likewise a decrease in W will cause a corresponding decrease in |ψ|². Now we add another potential term p_0(t, x)V_0(x) to the above Schrodinger dynamics, where p_0 is the pdf of the input signal. This term is added because we have the understanding that the output pdf p is in some way correlated with the input pdf. Then the dynamics becomes

ih\partial_t\psi(t, x) = (-h^2/2)\partial_x^2\psi(t, x) + W(t, x)V(x)\psi(t, x) + p_0(t, x)V_0(x)\psi(t, x)
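A minimal simulation sketch of these dynamics (the grid, the potentials V, V0, the constants α, β, h and all pdf shapes are assumed values; a split-step Fourier integrator is used so that the Schrodinger evolution stays unitary, as the text requires):

```python
import numpy as np

# 1D Schrodinger equation with potential W(t,x)V(x) + p0(x)V0(x),
# evolved by unit-modulus factors in position and Fourier space (so
# |psi|^2 keeps total probability 1), plus Euler steps of the weight
# adaptation dW/dt = -beta W + alpha (p - |psi|^2).
h = 1.0
L, n = 20.0, 256
x = np.linspace(-L/2, L/2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L/n)
dt, alpha, beta = 0.005, 5.0, 1.0

def gauss(x, m, s):
    g = np.exp(-(x - m)**2 / (2 * s**2))
    return g / (g.sum() * (L/n))

p = gauss(x, 1.0, 1.2)          # target (EEG) pdf, assumed shape
p0 = gauss(x, -1.0, 1.5)        # input (speech) pdf, assumed shape
V, V0 = x**2 / 20.0, 0.5        # non-negative potentials (assumed)

psi = np.sqrt(gauss(x, 0.0, 2.0)).astype(complex)
W = np.zeros(n)
kin = np.exp(-1j * (h/2) * k**2 * dt)   # kinetic factor of the splitting

for _ in range(4000):
    U = np.exp(-1j * (W * V + p0 * V0) * dt / h)
    psi = np.fft.ifft(kin * np.fft.fft(U * psi))
    W += dt * (-beta * W + alpha * (p - np.abs(psi)**2))

err = np.sum(np.abs(np.abs(psi)**2 - p)) * (L/n)   # L1 tracking error
```

The tracking quality depends on the chosen parameters; what the sketch guarantees by construction is the unitarity of the wave evolution (total probability stays 1) and the bounded weight process enforced by the −βW term.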

Statistical performance analysis of the QNN: suppose p_0(t, x) fluctuates to p_0(t, x) + δp_0(t, x) and p(t, x) to p(t, x) + δp(t, x). Then the corresponding QNN output wave function ψ(t, x) will fluctuate to ψ(t, x) + δψ(t, x), where δψ satisfies the linearized Schrodinger equation, and likewise the weight process W(t, x) will fluctuate to W(t, x) + δW(t, x). The linearized dynamics for these fluctuations are

ih\partial_t\delta\psi(t, x) = (-h^2/2)\partial_x^2\delta\psi(t, x) + V(x)(W\delta\psi + \psi\,\delta W) + V_0(x)(p_0\delta\psi + \psi\,\delta p_0)

\partial_t\delta W = -\beta\,\delta W + \alpha(\delta p - \bar\psi\,\delta\psi - \psi\,\delta\bar\psi)

Writing

\psi(t, x) = A(t, x)\,exp(iS(t, x)/h)

we get

\delta\psi = (\delta A + iA\,\delta S/h)exp(iS/h)

so that, with prime denoting the spatial derivative,

\delta\psi' = (\delta A' + iA'\delta S/h + iA\delta S'/h + iS'\delta A/h - AS'\delta S/h^2)exp(iS/h)

\delta\psi'' = (\delta A'' + iA''\delta S/h + 2iA'\delta S'/h + iA\delta S''/h + 2iS'\delta A'/h + iS''\delta A/h - 2AS'\delta S'/h^2 - 2A'S'\delta S/h^2 - AS''\delta S/h^2 - S'^2\delta A/h^2 - iAS'^2\delta S/h^3)exp(iS/h)

Thus we obtain the following differential equation for the wave function fluctuations caused by random fluctuations in the EEG and speech pdf's:

ih\partial_t\delta A - \partial_t(A\delta S) - \delta A\,\partial_t S - iA\,\delta S\,\partial_t S/h

= (-h^2/2)\delta A'' - ihA''\delta S/2 - ihA'\delta S' - ihA\,\delta S''/2 - ihS'\delta A' - ihS''\delta A/2 + AS'\delta S' + A'S'\delta S + AS''\delta S/2 + S'^2\delta A/2 + iAS'^2\delta S/2h

+ V_0(x)p_0(t, x)(\delta A + iA\delta S/h) + V_0(x)A\,\delta p_0(t, x) + V(x)W(t, x)(\delta A + iA\delta S/h) + V(x)A\,\delta W

On equating real and imaginary parts in this equation, we get two pde's for the two functions A and S. However, it is easier to derive these equations by first deriving the unperturbed pde's for A, S and then forming the perturbations. The unperturbed equations are (see Appendix A.1)

h\partial_t A = -h\,\partial_x A\,\partial_x S - (1/2)A\,\partial_x^2 S

-A\partial_t S = -((h^2/2)\partial_x^2 A - (1/2)A(\partial_x S)^2) + WVA + p_0V_0A

\partial_t W(t, x) = -\beta(W(t, x) - W_0(x)) + \alpha(p(t, x) - A^2)

Forming the variation of this equation around a fixed operating point p_0(x), p(x) = |ψ(t, x)|² and W(x), with ψ(t, x) = A(x)exp(i(−Et + S_0(x))/h), gives us

h\partial_t\delta A = -h\,\delta A'S_0' - hA'\delta S' - (1/2)S_0''\delta A - (1/2)A\,\delta S'',

-A\partial_t\delta S + E\,\delta A = (-h^2/2)\delta A'' + (1/2)S_0'^2\,\delta A + AS_0'\,\delta S' + VW\delta A + VA\,\delta W + V_0A\,\delta p_0 + V_0p_0\,\delta A,

\partial_t\delta W = -\beta\,\delta W + \alpha\,\delta p - 2\alpha A\,\delta A


Appendix

A.1 Proof that an increase in potential causes an increase in the probability density determined as the modulus square of the wave function.

Consider Schrodinger's equation in the form

ih\partial_t\psi(t, x) = (-h^2/2)\partial_x^2\psi(t, x) + V(x)\psi(t, x)

We write

\psi(t, x) = A(t, x)\,exp(iS(t, x)/h)

where A, S are real functions with A non-negative. Thus |ψ|² = A². Substituting this expression into Schrodinger's equation and equating real and imaginary parts gives

h\partial_t A = -h\,\partial_x A\,\partial_x S - (1/2)A\,\partial_x^2 S

-A\partial_t S = -((h^2/2)\partial_x^2 A - (1/2)A(\partial_x S)^2) + VA

We write

S(t, x) = -Et + S_0(x)

so that E is identified with the total energy of the quantum particle. Then the second equation gives, on neglecting O(h²) terms,

\partial_x S_0 = \pm\sqrt{2(E - V(x))}

and then substituting this into the first equation with ∂tA = 0 gives

S_0''/S_0' = -2hA'/A

or

\ln(S_0') = -2h\,\ln(A) + C

so

A = K_1(S_0')^{-1/2} = K_2(E - V(x))^{-1/4}

which shows that as long as V(x) < E, A(x), and hence A(x)² = |ψ(x)|², will increase with increasing V(x), confirming the fact that by causing the driving potential to increase when p_E − |ψ|² is large and positive, we can ensure that |ψ|² will increase, and vice versa, thereby confirming the validity of our adaptive algorithm for tracking the EEG pdf.

A.2 Calculating the empirical distribution of a stationary stochastic process. Let x(t) be a stationary stochastic process. The empirical density estimate of N samples of this process spaced τ_k, k = 1, 2, ..., N−1 seconds apart from the first sample, ie, an estimate of the joint density of x(t), x(t+τ_k), k = 1, 2, ..., N−1, is given by

p_T(x₁, ..., x_N; τ₁, ..., τ_{N−1}) = T⁻¹ ∫₀^T Π_{k=1}^{N} δ(x(t + τ_{k−1}) − x_k) dt

178 CHAPTER 10. LECTURE PLAN FOR STATISTICAL SIGNAL PROCESSING

where τ₀ = 0. The Gartner-Ellis theorem in the theory of large deviations provides a reliability estimate of how this empirical density performs. Specifically, if f is a function of N variables, then the moment generating functional of p_T(·; τ₁, ..., τ_{N−1}) is given by

E[exp(∫ p_T(x₁, ..., x_N; τ₁, ..., τ_{N−1}) f(x₁, ..., x_N) dx₁...dx_N)]

= E[exp(T⁻¹ ∫₀^T f(x(t), x(t+τ₁), ..., x(t+τ_{N−1})) dt)]

= M_T(f|τ₁, ..., τ_{N−1}),

say. Then, the rate functional for the family of random fields p_T(·; τ₁, ..., τ_{N−1}), T → ∞, is given by

I(p) = sup_f (∫ f(x₁, ..., x_N) p(x₁, ..., x_N) dx₁...dx_N − Λ(f))

where

Λ(f) = lim_{T→∞} T⁻¹ log(M_T(T·f)),

assuming that this limit exists. It follows that as T → ∞, the asymptotic probability of p_T taking values in a set E is given by

T⁻¹ log(Pr(p_T(·; τ₁, ..., τ_{N−1}) ∈ E)) ≈ −inf(I(p) : p ∈ E)

or equivalently,

Pr(p_T(·; τ₁, ..., τ_{N−1}) = p) ≈ exp(−T·I(p)), T → ∞
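The time-average empirical density above is, for N = 1, just a histogram of the sample path. A simulation sketch for a stationary AR(1) process (the model, coefficient and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stationary AR(1): x[t] = a*x[t-1] + w[t], stationary law N(0, 1/(1-a^2)).
# The one-dimensional empirical density (time average over the path) should
# approach this Gaussian as T grows.
a, T = 0.7, 200_000
w = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + w[t]

bins = np.linspace(-4, 4, 41)
hist, edges = np.histogram(x[1000:], bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
var = 1.0 / (1.0 - a**2)
true_pdf = np.exp(-centers**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
err = np.max(np.abs(hist - true_pdf))
print(err)  # shrinks as T grows
```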

Appendix A.3. We can also use the electroweak theory or, more specifically, the Dirac-Yang-Mills non-Abelian gauge theory for matter and gauge fields to track a given pdf. Such an algorithm enables us to make use of elementary particles in a particle accelerator to simulate probability densities for learning about the statistics of signals.

Let A^a_μ(x)τ_a be a non-Abelian gauge field assuming values in a Lie algebra g which is a Lie sub-algebra of u(N). Here, τ_a, a = 1, 2, ..., N are Hermitian generators of g. Denote their structure constants by C(abc), ie,

[τ_a, τ_b] = iC(abc)τ_c

Then the gauge covariant derivative is

∇_μ = ∂_μ + ieA^a_μτ_a


and the corresponding Yang-Mills antisymmetric gauge field tensor, also called the curvature of the gauge covariant derivative, is given by

ieF_μν = [∇_μ, ∇_ν] = ie((A^a_{ν,μ} − A^a_{μ,ν}) − eC(abc)A^b_μA^c_ν)τ_a

so that F_μν = F^a_μν τ_a

where F^a_μν = A^a_{ν,μ} − A^a_{μ,ν} − eC(abc)A^b_μA^c_ν

The action functional for the matter field ψ(x) having 4N components and the gauge potentials A^a_μ(x) is then given by (S. Weinberg, The Quantum Theory of Fields, vol. II)

S[A, ψ] = ∫ L d⁴x,

L = ψ̄(γ^μ(i∂_μ + eA^a_μτ_a) − m)ψ − (1/4)F^a_μν F^{aμν}

where ψ̄ = ψ*γ⁰

The equations of motion obtained by setting to zero the variation of S w.r.t the matter wave function ψ and the gauge potentials A are given by

[γ^μ(i∂_μ + eA^a_μτ_a) − m]ψ = 0,

D^ν F_μν = J_μ

where D_μ is the gauge covariant derivative in the adjoint representation, ie,

[∂^ν + ieA^ν, F_μν] = J_μ

or equivalently, in terms of components,

∂^ν F^a_μν − eC(abc)A^{bν} F^c_μν = J^a_μ

with J^a_μ being the Dirac-Yang-Mills current:

J^a_μ = −eψ̄(γ_μ ⊗ τ_a)ψ

It is clear that ψ satisfies a Dirac-Yang-Mills matter wave equation, and it can be expressed in the form of Dirac's equation with a self-adjoint Hamiltonian, owing to which J^a_μ is a conserved current; in particular, |ψ|² integrates over space to a constant at any time, which can be set to unity by the initial condition. In other words, |ψ(x)|² is a probability density at any time t, where x = (t, r). Now it is easy to make the potential in this wave equation adaptive, ie, depend on the error p(x) − |ψ(x)|²: simply add an interaction term W(x)(p(x) − |ψ(x)|²)V(x) to the Hamiltonian. This is more easily accomplished using the electroweak theory, where in addition to the


Yang-Mills gauge potential terms there is a photon four-potential B_μ with B⁰(x) = W(x)(p(x) − |ψ(x)|²)V(x). This results in the following wave equation

[γ^μ(i∂_μ + eA^a_μτ_a + eB_μ) − m]ψ = 0

The problem with this formalism is that a photon Lagrangian term (−1/4)(B_{ν,μ} − B_{μ,ν})(B^{ν,μ} − B^{μ,ν}) also has to be added to the total Lagrangian, which results in the Maxwell field equation

F^{ph}_{μν,ν} = −eψ̄γ_μψ

where F^{ph}_μν = B_{ν,μ} − B_{μ,ν}

is the electromagnetic field. In order not to violate these Maxwell equations, we must add another external charge source term to the rhs so that the Maxwell equations become

F^{ph}_{μν,ν} = −eψ̄γ_μψ + ρ(x)δ_{μ0}

where ρ(x) is an external charge source term that is given by

−eψ*ψ + ρ(x) = −∇²(W(x)(p(x) − |ψ(x)|²)V(x)) = −∇²B⁰

so that B⁰ = W(p − |ψ|²)V

The change in the Lagrangian produced by this source term is −ρ(x)B⁰(x). This control charge term can be adjusted for appropriate adaptation. Thus, in effect, our Lagrangian acquires two new terms, the first being

eψ̄γ⁰B⁰ψ = eψ*ψ·B⁰

and the second being −ρ·B⁰

where ρ = −∇²(W(p − ψ*ψ)) + eψ*ψ

The dynamics of the wave function ψ then acquires two new terms: the first comes from the variation w.r.t ψ of the term eψ*ψ·B⁰ = eψ̄γ⁰ψ·B⁰, namely eB⁰γ⁰ψ; the second comes from the variation w.r.t ψ of the term −ρ·B⁰ = B⁰∇²(W(p − ψ̄γ⁰ψ)). This variation gives

(∇²B⁰)(Wγ⁰ψ)

The resulting equation of motion for the wave function gets modified to

[γ^μ(i∂_μ + eA^a_μτ_a + eB_μ) − m + ∇²B⁰·Wγ⁰]ψ = 0

This equation is to be supplemented with the Maxwell and Yang-Mills gauge field equations and, of course, the weight update equation for W(x).


10.6 Statistical Signal Processing, M.Tech, Long Test. Max marks: 50

[1] [a] Prove that the quantum relative entropy is convex, ie, if (ρᵢ, σᵢ), i = 1, 2 are pairs of mixed states in C^d, then

D(p₁ρ₁ + p₂ρ₂|p₁σ₁ + p₂σ₂) ≤ p₁D(ρ₁|σ₁) + p₂D(ρ₂|σ₂)

for p₁, p₂ ≥ 0, p₁ + p₂ = 1.

hint: First prove using the following steps that if K is a quantum operation, ie, a TPCP map, then for two states ρ, σ, we have

D(K(ρ)|K(σ)) ≤ D(ρ|σ) --- (a)

Then define

ρ = [ p₁ρ₁   0
       0   p₂ρ₂ ],

σ = [ p₁σ₁   0
       0   p₂σ₂ ],

and K to be the operation given by

K(ρ) = E₁ρE₁* + E₂ρE₂*

where E₁ = [I|0], E₂ = [0|I] ∈ C^{d×2d}

Note that E₁*E₁ + E₂*E₂ = I_{2d}

To prove (a), make use of the result on the asymptotic error rate in hypothesis testing on tensor product states: given that the probability of error of the first kind, ie, the probability of detecting ρ^{⊗n} given that the true state is σ^{⊗n}, goes to zero, the minimum (Neyman-Pearson test) probability of error of the second kind goes to zero at the rate −D(ρ|σ) (this result is known as the quantum Stein lemma). Now consider any sequence of tests T_n, ie, 0 ≤ T_n ≤ I on (C^d)^{⊗n}. Then consider using this test for testing between the pair of states K^{⊗n}(ρ^{⊗n}) and K^{⊗n}(σ^{⊗n}). Observe that the performance of this sequential hypothesis testing problem is the same as that of the dual problem of testing ρ^{⊗n} versus σ^{⊗n} using the dual test K*^{⊗n}(T_n). This is because

Tr(K^{⊗n}(ρ^{⊗n})T_n) = Tr(ρ^{⊗n}K*^{⊗n}(T_n))

etc. It follows that the optimum Neyman-Pearson error probability of the first kind for testing ρ^{⊗n} versus σ^{⊗n} will be no larger than that of testing K^{⊗n}(ρ^{⊗n}) versus K^{⊗n}(σ^{⊗n}), since K*^{⊗n}(T_n) varies only over a subset of tests as T_n varies over all tests on (C^d)^{⊗n}. Hence going to the limit gives the result

exp(−nD(ρ|σ)) ≤ exp(−nD(K(ρ)|K(σ)))


asymptotically as n → ∞, which actually means that

D(ρ|σ) ≥ D(K(ρ)|K(σ))

[b] Prove using [a] that the Von Neumann entropy H(ρ) = −Tr(ρ·log(ρ)) is a concave function, ie, if ρ, σ are two states, then

H(p₁ρ + p₂σ) ≥ p₁H(ρ) + p₂H(σ), p₁, p₂ ≥ 0, p₁ + p₂ = 1

hint: Let K be any TPCP map. Let

ρ = [ p₁ρ₁   0
       0   p₂ρ₂ ] ∈ C^{2d×2d},

and σ = I_{2d}/2d

where ρ_k ∈ C^{d×d}, k = 1, 2. Then choose

E₁ = [I|0], E₂ = [0|I] ∈ C^{d×2d}

and K(X) = E₁XE₁* + E₂XE₂* ∈ C^{d×d}, X ∈ C^{2d×2d}

Note that E₁*E₁ + E₂*E₂ = I_{2d}

Show that K(σ) = K(I_{2d}/2d) = I_d/d

Then we get, using [a],

D(K(ρ)|K(σ)) ≤ D(ρ|σ)

Show that this yields

H(p₁ρ₁ + p₂ρ₂) + log(d) ≥ p₁H(ρ₁) + p₂H(ρ₂) − H(p) + log(2d)

where H(p) = −p₁log(p₁) − p₂log(p₂)

Show that H(p) ≤ log(2)

and hence deduce the result.
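The convexity inequality in [1][a] can be spot-checked numerically with random density matrices; a minimal sketch (the dimension, weights and random-state construction are arbitrary assumptions, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_state(d):
    """Random full-rank density matrix via a Wishart-type construction."""
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    R = X @ X.conj().T
    return R / np.trace(R).real

def logm_h(A):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.conj().T

def rel_ent(r, s):
    """Quantum relative entropy D(r|s) = Tr r (log r - log s)."""
    return np.trace(r @ (logm_h(r) - logm_h(s))).real

d = 3
p1, p2 = 0.4, 0.6
r1, r2, s1, s2 = (rand_state(d) for _ in range(4))

# Convexity: D(p1 r1 + p2 r2 | p1 s1 + p2 s2) <= p1 D(r1|s1) + p2 D(r2|s2)
lhs = rel_ent(p1 * r1 + p2 * r2, p1 * s1 + p2 * s2)
rhs = p1 * rel_ent(r1, s1) + p2 * rel_ent(r2, s2)
print(lhs <= rhs + 1e-12)  # True
```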

[2] [a] Let X[n], n ≥ 0 be an M-vector valued AR process, ie, it satisfies the first order vector difference equation

X[n] = AX[n−1] + W[n], n ≥ 1


Assume that W[n], n ≥ 1 is an iid N(0, Q) process independent of X[0] and that X[0] is a Gaussian random vector. Calculate the value of

R[0] = E(X[0]X[0]^T)

in terms of A and Q so that X[n], n ≥ 0 is wide-sense stationary in the sense that

E(X[n]X[n−m]^T) = R[m]

does not depend on n. In such a case, evaluate the ACF R[m] of X[n] in terms of A and Q. Assume that A is diagonable with all eigenvalues strictly inside the unit circle. Express your answer in terms of the eigenvalues and eigen-projections of A.
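For reference, the stationarity condition here is the discrete Lyapunov equation R[0] = A R[0] Aᵀ + Q, with R[m] = Aᵐ R[0] for m ≥ 0. A quick numerical sketch (the matrices A, Q are illustrative; the solve uses vectorization):

```python
import numpy as np

# Stationarity requires R0 = A R0 A^T + Q; row-major vectorization gives
# (I - A ⊗ A) vec(R0) = vec(Q). Then the ACF is R[m] = A^m R0 for m >= 0.
A = np.array([[0.5, 0.2], [0.0, 0.3]])   # eigenvalues inside unit circle
Q = np.array([[1.0, 0.1], [0.1, 0.5]])
M = A.shape[0]
vecR0 = np.linalg.solve(np.eye(M * M) - np.kron(A, A), Q.reshape(-1))
R0 = vecR0.reshape(M, M)
R = [np.linalg.matrix_power(A, m) @ R0 for m in range(4)]  # ACF R[m]
print(R0)
```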

[b] Assuming X[0] known, evaluate the maximum likelihood estimator of A given the data X[n] : 1 ≤ n ≤ N. Also evaluate the approximate mean and covariance of the estimates of the matrix elements of A upto quadratic orders in the correlations R[m] based on the data X[n], 1 ≤ n ≤ N and upto O(1/N). In other words, calculate using perturbation theory E(δA_ij) and E(δA_ij δA_km) upto O(1/N), where

Â − A = δA

and Â is the maximum likelihood estimator of A.

hint: Write

N⁻¹ Σ_{n=1}^{N} X[n−i]X[n−j]^T − R[j−i] = δR(j, i)

and then evaluate δA upto quadratic orders in δR(j, i). Finally, use the fact that since X[n], n ≥ 0 is a Gaussian process,

E(X_a[n₁]X_b[n₂]X_c[n₃]X_d[n₄]) = R_ab[n₁−n₂]R_cd[n₃−n₄] + R_ac[n₁−n₃]R_bd[n₂−n₄] + R_ad[n₁−n₄]R_bc[n₂−n₃]

where ((R_ab(τ)))_{1≤a,b≤M} = R(τ) = E(X[n]X[n−τ]^T)
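Under Gaussian noise the ML estimator of A is the least-squares regression of X[n] on X[n−1]. A simulation sketch (the true A, noise level and sample size are assumed values):

```python
import numpy as np

rng = np.random.default_rng(2)

# ML estimate for Gaussian W[n]:
#   A_hat = (sum_n X[n] X[n-1]^T) (sum_n X[n-1] X[n-1]^T)^{-1}
A = np.array([[0.6, 0.1], [0.2, 0.4]])
N = 50_000
X = np.zeros((N + 1, 2))
for n in range(1, N + 1):
    X[n] = A @ X[n - 1] + rng.standard_normal(2)

num = X[1:].T @ X[:-1]      # sum of X[n] X[n-1]^T
den = X[:-1].T @ X[:-1]     # sum of X[n-1] X[n-1]^T
A_hat = num @ np.linalg.inv(den)
print(np.max(np.abs(A_hat - A)))  # O(1/sqrt(N)) estimation error
```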

[c] Calculate the Fisher information matrix for the parameters A_ij and hence the corresponding Cramer-Rao lower bound.

[3] Let X[n] be a stationary M-vector valued zero mean Gaussian process with ACF R(τ) = E(X[n]X[n−τ]^T) ∈ R^{M×M}. Define its power spectral density matrix by

S(ω) = Σ_{τ=−∞}^{∞} R(τ)exp(−jωτ) ∈ C^{M×M}

Prove that S(ω) is a positive definite matrix. Define its windowed periodogram estimate by

S_N(ω) = X_N(ω)(X_N(ω))*


where

X_N(ω) = N^{−1/2} Σ_{n=0}^{N−1} W(n)X(n)exp(−jωn)

and W(n) is an M × M matrix valued window function. Calculate

E S_N(ω), Cov((S_N(ω₁))_ab, (S_N(ω₂))_cd)

and discuss the limits of these as N → ∞. Express these means and covariances in terms of

W(ω) = Σ_{τ=−∞}^{∞} W(τ)exp(−jωτ)
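A scalar (M = 1) numerical sketch of the windowed periodogram defined above, with a Hann-type window and white-noise input (all choices illustrative; for white noise the expected periodogram is the mean of the squared window):

```python
import numpy as np

rng = np.random.default_rng(3)

def periodogram(x, w, omega):
    """Windowed periodogram S_N(omega) = |X_N(omega)|^2 for scalar data."""
    n = np.arange(len(x))
    XN = len(x) ** -0.5 * np.sum(w * x * np.exp(-1j * omega * n))
    return np.abs(XN) ** 2

N = 4096
x = rng.standard_normal(N)                               # white noise, psd = 1
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)     # Hann window
omegas = np.linspace(0.1, 3.0, 30)
S = np.array([periodogram(x, w, om) for om in omegas])
# For white noise, E S_N(omega) = (1/N) sum w[n]^2 (= 3/8 for Hann).
print(S.mean(), np.mean(w**2))
```

Averaging the periodogram over frequencies (or over independent segments) is what tames its well-known non-vanishing variance as N → ∞.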

[4] This problem deals with causal Wiener filtering of vector valued stationary processes. Let X(n), S(n), n ∈ Z be two jointly WSS processes with correlations

R_XX(τ) = E(X(n)X(n−τ)^T), R_SX(τ) = E(S(n)X(n−τ)^T)

Let

S_XX(z) = Σ_τ R_XX(τ)z^{−τ}

Prove that R_XX(−τ) = R_XX(τ)^T

and hence S_XX(z⁻¹) = S_XX(z)^T

Assume that this power spectral density matrix can be factorized as

S_XX(z) = L(z)L(z⁻¹)^T

where L(z) and G(z) = L(z)⁻¹ admit expansions of the form

L(z) = Σ_{n≥0} l[n]z^{−n}, |z| ≥ 1,

G(z) = Σ_{n≥0} g[n]z^{−n}, |z| ≥ 1

with

Σ_{n≥0} ‖l[n]‖ < ∞, Σ_{n≥0} ‖g[n]‖ < ∞

Then suppose

H(z) = Σ_{n≥0} h[n]z^{−n}


is the optimum causal matrix Wiener filter that takes X[.] as input and outputs an estimate Ŝ[.] of S[.] in the sense that

Ŝ[n] = Σ_{k≥0} h[k]X[n−k]

with h[k] chosen so that

E[‖S[n] − Ŝ[n]‖²]

is a minimum. Then prove that h[.] satisfies the matrix Wiener-Hopf equation

Σ_{k≥0} h[k]R_XX[m−k] = R_SX[m], m ≥ 0

Show that this equation is equivalent to

[H(z)S_XX(z) − S_SX(z)]₊ = 0

Prove that the solution to this equation is given by

H(z) = [S_SX(z)L(z⁻¹)^{−T}]₊ L(z)⁻¹ = [S_SX(z)G(z⁻¹)^T]₊ G(z)

where

[S_SX(z)G(z⁻¹)^T]₊ = [Σ_{n,m} R_SX[n]g[m]^T z^{−n+m}]₊ = Σ_{n≥0} z^{−n} Σ_{m≥0} R_SX[n+m]g[m]^T

Prove that for this to be a valid solution for a stable causal matrix Wiener filter, it is sufficient that

Σ_{n≥0} ‖R_SX[n]‖ < ∞
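A truncated scalar version of the Wiener-Hopf system above can be solved directly as a Toeplitz linear system. A sketch with an assumed AR(1) signal observed in additive white noise (pole, variances and tap count are illustrative):

```python
import numpy as np

# Scalar causal Wiener filter: solve sum_k h[k] R_XX[m-k] = R_SX[m], m >= 0,
# truncated to K taps. Signal S is AR(1) with pole a; X = S + white noise.
a, sig2, r, K = 0.8, 1.0, 0.5, 64
tau = np.arange(-K, K + 1)
R_SS = sig2 * a ** np.abs(tau) / (1 - a**2)
R_XX = R_SS + r * (tau == 0)     # observation autocorrelation
R_SX = R_SS                      # cross-correlation (noise independent of S)

# Build the K x K Toeplitz system over m, k = 0..K-1 (index K + tau = lag tau).
T = np.array([[R_XX[K + m - k] for k in range(K)] for m in range(K)])
h = np.linalg.solve(T, R_SX[K:2 * K])
res = np.max(np.abs(T @ h - R_SX[K:2 * K]))
print(res)  # residual of the truncated Wiener-Hopf equations
```

The truncation is justified precisely by the summability condition Σ‖R_SX[n]‖ < ∞ stated in the problem: the taps h[k] decay, so a finite K captures the filter.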

[5] Write short notes on any two of the following:

[a] Recursive least squares lattice filter for prediction with time and order updates of the prediction filter coefficients.

[b] Kolmogorov's formula for the entropy rate of a stationary Gaussian process in discrete time in terms of the power spectral density, and its relationship to the infinite order one step prediction error variance.

[c] The Cq Shannon coding theorem and its converse, and a derivation of the classical Shannon noisy coding theorem and its converse from it.

[d] Cramer-Rao lower bound for vector parameters and vector observations.

[e] The optimum detector in a binary quantum hypothesis testing problem between two mixed states.


10.7 Monotonicity of quantum relative Renyi entropy

Let A, B be positive definite matrices and let 0 ≤ t ≤ 1. Then prove that

(tA + (1−t)B)⁻¹ ≤ t·A⁻¹ + (1−t)·B⁻¹ --- (1)

ie, the map A → A⁻¹ is operator convex on the space of positive matrices.

Solution: Proving (1) is equivalent to proving that

A^{1/2}(tA + (1−t)B)⁻¹A^{1/2} ≤ tI + (1−t)A^{1/2}B⁻¹A^{1/2}

which is the same as proving that

(tI + (1−t)X)⁻¹ ≤ tI + (1−t)X⁻¹ --- (2)

where X = A^{−1/2}BA^{−1/2}

It is enough to prove (2) for any positive matrix X. By the spectral theorem for positive matrices, to prove (2), it is enough to prove

(t + (1−t)x)⁻¹ ≤ t + (1−t)/x --- (3)

for all positive real numbers x. Proving (3) is equivalent to proving that

(t + (1−t)x)(tx + 1−t) ≥ x

for all x > 0, or equivalently,

(t² + (1−t)²)x + t(1−t)x² + t(1−t) ≥ x

or equivalently, adding and subtracting 2t(1−t)x on both sides to complete the square,

−2t(1−t)x + t(1−t)x² + t(1−t) ≥ 0

or equivalently, 1 + x² − 2x ≥ 0

which is true.
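The operator inequality (1) is easy to spot-check numerically with random positive definite matrices (a sketch; the dimension, weight t and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_pd(d):
    """Random well-conditioned positive definite matrix."""
    X = rng.standard_normal((d, d))
    return X @ X.T + d * np.eye(d)

# Check (t A + (1-t) B)^{-1} <= t A^{-1} + (1-t) B^{-1} in the Loewner order,
# i.e. the difference rhs - lhs has nonnegative eigenvalues.
d, t = 4, 0.3
A, B = rand_pd(d), rand_pd(d)
lhs = np.linalg.inv(t * A + (1 - t) * B)
rhs = t * np.linalg.inv(A) + (1 - t) * np.linalg.inv(B)
gap = np.linalg.eigvalsh(rhs - lhs)
print(gap.min() >= -1e-10)  # True: rhs - lhs is positive semidefinite
```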

Problem: For p ∈ (0, 1), show that x → −x^p, x → x^{p−1} and x → x^{p+1} are operator convex on the space of positive matrices.

Solution: Consider the integral

I(a, x) = ∫₀^∞ u^{a−1}du/(u + x)

It is clear that this integral is convergent when x > 0 and 0 < a < 1. Now

I(a, x) = x⁻¹ ∫₀^∞ u^{a−1}(1 + u/x)⁻¹du


= x^{a−1} ∫₀^∞ y^{a−1}(1 + y)⁻¹dy

on making the change of variables u = xy. Again make the change of variables

1 − 1/(1+y) = y/(1+y) = z, y = 1/(1−z) − 1 = z/(1−z)

Then

I(a, x) = x^{a−1} ∫₀¹ z^{a−1}(1−z)^{−a+1}(1−z)(1−z)^{−2}dz = x^{a−1} ∫₀¹ z^{a−1}(1−z)^{−a}dz

= x^{a−1}β(a, 1−a) = x^{a−1}Γ(a)Γ(1−a) = x^{a−1}π/sin(πa)

Thus,

x^{a−1} = (sin(πa)/π)I(a, x) = (sin(πa)/π) ∫₀^∞ u^{a−1}du/(u + x)

and since for u > 0, as we just saw, x → (u + x)⁻¹ is operator convex, this proves that f(x) = x^{a−1} is operator convex when a ∈ (0, 1). Now,

u^{a−1}x/(u + x) = u^{a−1}(1 − u/(u + x)) = u^{a−1} − u^a/(u + x)

Integrating from u = 0 to u = ∞ and making use of the previous result gives

x^a = (sin(πa)/π) ∫₀^∞ (u^{a−1} − u^a/(u + x))du

from which it immediately becomes clear that x → −x^a is operator convex on the space of positive matrices. Now consider x^{a+1}. We have

u^{a−1}x²/(u + x) = u^{a−1}((u + x)² − u² − 2ux)/(u + x)

= u^{a−1}(u + x) − u^{a+1}/(u + x) − 2u^a x/(u + x) = u^{a−1}(u + x) − u^{a+1}/(u + x) − 2u^a(1 − u/(u + x))

and integration thus gives

x^{a+1} = (sin(πa)/π) ∫₀^∞ (u^{a−1}x + u^{a+1}/(u + x) − u^a)du

from which it becomes immediately clear that x^{a+1} is operator convex.

Monotonicity of Renyi entropy: Let s ∈ R. For two positive matrices A, B, consider the function

D_s(A|B) = log(Tr(A^{1−s}B^s))

Our aim is to show that for s ∈ (0, 1) and a TPCP map K, we have for two states A, B,

D_s(K(A)|K(B)) ≥ D_s(A|B)


while for s ∈ (−1, 0), we have

D_s(K(A)|K(B)) ≤ D_s(A|B)

For a ∈ (0, 1) and x > 0 we've seen that

x^{a+1} = (sin(πa)/π) ∫₀^∞ (u^{a−1}x + u^{a+1}/(u + x) − u^a)du

and hence for s ∈ (−1, 0), replacing a by −s gives

x^{1−s} = −(sin(πs)/π) ∫₀^∞ (u^{−s−1}x + u^{1−s}/(u + x) − u^{−s})du

Let s ∈ (−1, 0). Then the above formula shows that x^{1−s} is operator convex. Hence, if A, B are any two density matrices and K is a TPCP map, then

K(A)^{1−s} ≤ K(A^{1−s})

whence B^{s/2}K(A)^{1−s}B^{s/2} ≤ B^{s/2}K(A^{1−s})B^{s/2}

for any positive B, and taking trace gives

Tr(K(A)^{1−s}B^s) ≤ Tr(K(A^{1−s})B^s)

Replacing B by K(B) here gives

Tr(K(A)^{1−s}K(B)^s) ≤ Tr(K(A^{1−s})K(B)^s)

Further, since s ∈ (−1, 0), it follows that x^s is operator convex. Hence

K(B)^s ≤ K(B^s)

and so A^{1/2}K(B)^sA^{1/2} ≤ A^{1/2}K(B^s)A^{1/2}

for any positive A, and replacing A by K(A^{1−s}) and then taking trace gives

Tr(K(A^{1−s})K(B)^s) ≤ Tr(K(A^{1−s})K(B^s))

More generally, we have the following: if X is any matrix, s ∈ (0, 1), A, B are positive, and K is any TPCP map, then

Tr(X*K(A)^{1−s}XK(B)^s) ≥ Tr(X*K(A)^{1−s}XK(B^s))

(since K(B)^s ≥ K(B^s) because B → B^s is operator concave, and since X*K(A)^{1−s}X is positive)


(since K(A)^{1−s} ≥ K(A^{1−s}) because A → A^{1−s} is operator concave and since XK(B^s)X* is positive)

= Tr(X*K(A^{1−s})XK(B^s))

Since X is an arbitrary complex square matrix, this inequality can immediately be written as

K(A)^{1−s} ⊗ K(B)^s ≥ K(A^{1−s}) ⊗ K(B^s)

for all positive A, B and s ∈ (0, 1). We can express this equation as

K(A)^{1−s} ⊗ K(B)^s ≥ (K ⊗ K)(A^{1−s} ⊗ B^s)

and hence, if I denotes the identity matrix of the same size as A, B, then

Tr(K(A)^{1−s}K(B)^s) ≥ Tr((A^{1−s} ⊗ B^s)(K ⊗ K)(Vec(I)·Vec(I)*))

Note that if e_k : k = 1, 2, ..., n is an onb for C^n, the Hilbert space on which A, B act, then

I = Σ_k |e_k >< e_k|, Vec(I) = Σ_k |e_k ⊗ e_k > = |ξ >

Vec(I)·Vec(I)* = |ξ >< ξ| = Σ_{ij} E_ij ⊗ E_ij

where E_ij = |e_i >< e_j|

Thus, for s ∈ (0, 1),

Tr(K(A)^{1−s}K(B)^s) ≥ Σ_{i,j} Tr(K_ij A^{1−s}K_ji B^s)

where K_ij = K(E_ij)

Likewise, if s ∈ (0, 1), so that s−1 ∈ (−1, 0) and 2−s ∈ (1, 2), then both the maps A → A^{s−1} and A → A^{2−s} are operator convex, and hence for any TPCP map K, K(A)^{s−1} ≤ K(A^{s−1}), K(A)^{2−s} ≤ K(A^{2−s}), from which we immediately deduce that

Tr(X*K(A)^{2−s}XK(B)^{s−1}) ≤ Tr(X*K(A^{2−s})XK(B^{s−1}))

or equivalently,

K(A)^{2−s} ⊗ K(B)^{s−1} ≤ K(A^{2−s}) ⊗ K(B^{s−1})

which results in the inequality

Tr(K(A)^{2−s}K(B)^{s−1}) ≤ Σ_{i,j} Tr(K_ij A^{2−s}K_ji B^{s−1})


Remark: Lieb’s inequality states that for s ∈ (0, 1), the map

(A,B) → A1−t ⊗Bt = ft(A,B)

is joint operator concave, ie, if A1, A2, B1, B2 are positive and p, q > 0, p+ q = 1then with

A = pA1 + qA2, B = pB1 + qB2

we haveft(A,B) ≥ pft(A1, B1) + qft(A2, B2)

From this, we get the result that

(A,B) → Tr(X∗A1−tXBt) = Ht(A,B|X)

is jointly concave. Now let K denote the pinching

K(A) = (A+ UAU∗)/2

where U is a unitary matrix. Then, we get

Ht(K(A),K(B)|X) ≥ (1/2)Ht(A,B|X) + (1/2)Ht(UAU∗, UBU∗|X)

and replacing X by I in this inequality gives

Tr(K(A)1−tK(B)t) ≥ Tr(A1−tBt)

which proves the monotonicity of the Renyi entropy when t ∈ (0, 1) underpinching operations K of the above kind. Now, this inequality will also hold forany pinching K since any pinching can be expressed as a composition of suchelementary pinchings of the above kind. We now also wish to show the reverseinequality when t ∈ (−1, 0), ie show that if t ∈ (0, 1), then

Tr(K(A)2−tK(B)t−1) ≤ Tr(A2−tBt−1)
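The pinching inequality above can be spot-checked numerically, with fractional matrix powers computed by eigendecomposition (a sketch; the dimension, exponent t and the diagonal-phase unitary are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_state(d):
    """Random full-rank density matrix."""
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    R = X @ X.conj().T
    return R / np.trace(R).real

def powm(A, p):
    """Fractional power of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.conj().T

d, t = 4, 0.5
A, B = rand_state(d), rand_state(d)
# Elementary pinching K(A) = (A + U A U*)/2 with a diagonal-phase unitary U.
U = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, d)))
K = lambda M: 0.5 * (M + U @ M @ U.conj().T)
lhs = np.trace(powm(K(A), 1 - t) @ powm(K(B), t)).real
rhs = np.trace(powm(A, 1 - t) @ powm(B, t)).real
print(lhs >= rhs - 1e-12)  # True: Tr K(A)^{1-t} K(B)^t >= Tr A^{1-t} B^t
```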

Chapter 11

My commentary on the Ashtavakra Gita

A couple of years ago, my colleague and friend presented me with a verse to verse translation of the celebrated Ashtavakra Gita in English along with a commentary on each verse. I was struck by the grandeur and the simplicity of the Sanskrit used by Ashtavakra for explaining to King Janaka the true meaning of self-realization, how to attain it immediately without any difficulty, and also King Janaka's reply to Ashtavakra about how he has attained immeasurable joy and peace by understanding this profound truth. Ashtavakra was a sage who lived during the time in which the incidents of the Ramayana took place, and hence his Gita dates much, much before even the Bhagavad Gita spoken by Lord Krishna to Arjuna was written down by Sage Vyasa through Ganapati the scribe. One can in fact see that many of the slokas in the Bhagavad Gita have been adapted from the Ashtavakra Gita, and hence I feel that it is imperative for any seeker of the absolute truth and everlasting peace, joy and tranquility to read the Ashtavakra Gita, so relevant in our modern times where there are so many disturbances. Unlike the Bhagavad Gita, Ashtavakra does not bring in the concept of a personal God to explain true knowledge and the process of self-realization. He simply says that you attain self-realization immediately, at this moment, when you understand that the physical universe that you see around you is illusory and that your soul is the entire universe. Every living being's soul is the same, it fills the entire universe, and all the physical events that take place within our universe as perceived by our senses are false, ie, they do not occur. In the language of quantum mechanics, these events occur only because of the presence of the observer, namely the living individuals with their senses. The absolute truth is that each individual is a soul and not a body with his/her senses, and this soul neither acts nor causes to act.
Knowledge is obtained immediately at the moment when we understand thetruth that the impressions we have from our physical experiences are illusory,ie false and that the truth is that our soul fills the entire universe and that this



soul is indestructible and imperishable, as even the Bhagavad Gita states. The Bhagavad Gita however distinguishes between two kinds of souls, the jivatma and the paramatma, the former belonging to the living entities in our universe and the latter to the supreme personality of Godhead. The Bhagavad Gita tells us that within each individual entity both the jivatma and the paramatma reside, and when the individual becomes arrogant, he believes that his jivatma is superior to the paramatma; this then results in self conflict, owing to which the individual suffers. By serving the paramatma with devotion as outlined in the chapter on Bhakti yoga, the jivatma is able to befriend the paramatma, and then the individual lives in peace and tranquility. To explain self-knowledge more clearly without bringing in the notion of a God, Ashtavakra tells Raja Janaka that there are two persons who are doing exactly the same kind of work every day, both in their offices as well as in their family home. However, one of them is very happy and tranquil while the other is depressed and is always suffering. What is the reason for this? The reason, Ashtavakra says, is that the second person believes that his self is doing these works, and consequently the reactions generated by his works affect him and make him either very happy or very sad. He is confused and does not know how to escape from this knot of physical experiences. The first person, on the other hand, knows very well that he is compelled to act in the way he is acting owing to the effects of his past life's karmas, and that in fact his soul is pure and has nothing at all to do with the results and effects of his actions. He therefore neither rejoices nor does he get depressed. The physical experiences that he goes through do not leave any imprints on his self, because he knows that the physical experiences are illusory and that his soul, which is pure and without any blemish, fills the entire universe and is the truth.
Ashtavakra further clarifies his point by saying that the person in ignorance will practice meditation and try to cultivate detachment from the objects of this world, thinking that this will give him peace. Momentarily such a person will get joy, but when he comes out of the meditation, immediately all the desires of this illusory physical world will come back to him, causing even more misery. He will try to run to a secluded place like a forest and meditate to forget about his experiences, but again when his meditation is over, misery will come back to him. However, once he realizes that he is actually not his body but his soul, and that the soul is free of any physical quality and is the immeasurable truth, then he will neither try forcefully to cultivate detachment nor will he try forcefully to acquire possessions. Both the extremes of attachment and detachment will not affect him. In short, he attains peace even living in this world, immediately, at this moment when true knowledge comes to him. According to Ashtavakra, the seeker of attachment as well as the seeker of detachment are both unhappy, because they have not realized that the mind must not be forced to perform an activity. Knowledge is already present within the individual; he just has to remain in his worldly position as it is, performing his duties with the full knowledge that this compulsion is due to his past karmas, and then simultaneously be conscious of the fact that his soul is the perfect Brahman and it pervades the entire universe. At one point, Ashtavakra tells Janaka that let even Hari (Vishnu) or Hara (Shiva) be your

193

guru, even then you cannot be happy unless you have realized at this moment the falsity of the physical world around you and the reality of your soul as that which pervades the entire universe. Such a person who has understood this will, according to Ashtavakra, be unmoved whether death in some form approaches him or temptations of various kinds appear before him. He will remain neutral to both of these extremes. Such a realized soul has eliminated all the thought processes that occur within his mind. When true knowledge dawns upon a person, his mind will become empty, and like a dry leaf in a strong wind, he will appear to be blown from one place to another doing different kinds of karma, but always with the full understanding that these karmas have nothing at all to do with his soul, which is ever at bliss, imperturbable, imperishable and omniscient. In the quantum theory of matter and fields, we learn about how the observer perturbs a system in order to measure it and gain knowledge about the system. When the system is thus perturbed, one measures not the true system but only the perturbed system. This is the celebrated Heisenberg uncertainty principle. Likewise, in Ashtavakra's philosophy, if we attempt to gain true knowledge about the universe and its living entities from our sensory experience, we are in effect perturbing the universe, and hence we would gain only perturbed knowledge, not true knowledge. True knowledge comes only with the instant realization that we are not these bodies; we are all permanent souls. Even from a scientific standpoint, as we grow older, all our body cells eventually get replaced by new ones; then what is it in our new body after ten years that enables us to retain our identity? What scientific theory of the DNA or RNA can explain this fact? What is the difference between the chemical composition of our body just before it dies and just after it dies? Absolutely no difference. Then what is it that distinguishes a dead body from a living body?
That something which distinguishes the two must be the soul, which is not composed of any chemical. It must be an energy, the life force, which our modern science is incapable of explaining.


Chapter 12

LDP-UKF based controller

State equations:

x(n+1) = f(x(n), u(n)) + w(n+1)

Measurement equations:

z(n) = h(x(n), u(n)) + v(n)

w, v are independent white Gaussian processes with mean zero and covariances Q, R respectively. x_d(n) is the desired process to be tracked and it satisfies

xd(n+ 1) = f(xd(n), u(n))

UKF equations: Let ξ(k), 1 ≤ k ≤ K and η(k), 1 ≤ k ≤ K be independent iid N(0, I) random vectors. The UKF equations are

x̂(n+1|n) = K⁻¹ Σ_{k=1}^{K} f(x̂(n|n) + √P(n|n)·ξ(k), u(n))

x̂(n+1|n+1) = x̂(n+1|n) + K(n)(z(n+1) − ẑ(n+1|n))

where

K(n+1) = Σ_XY Σ_YY⁻¹

where

Σ_XY = K⁻¹ Σ_{k=1}^{K} [√P(n+1|n)·ξ(k)·(h(x̂(n+1|n) + √P(n+1|n)·η(k)) − ẑ(n+1|n))^T],

ẑ(n+1|n) = K⁻¹ Σ_{k=1}^{K} h(x̂(n+1|n) + √P(n+1|n)·η(k), u(n+1))

Σ_YY = K⁻¹ Σ_{k=1}^{K} (h(x̂(n+1|n) + √P(n+1|n)·η(k)) − ẑ(n+1|n))(h(x̂(n+1|n) + √P(n+1|n)·η(k)) − ẑ(n+1|n))^T

P(n+1|n) = K⁻¹ Σ_{k=1}^{K} [(f(x̂(n|n) + √P(n|n)·ξ(k), u(n)) − x̂(n+1|n))(f(x̂(n|n) + √P(n|n)·ξ(k), u(n)) − x̂(n+1|n))^T] + Q

P(n+1|n+1) = Σ_XX − Σ_XY Σ_YY⁻¹ Σ_XY^T

where Σ_XX = P(n+1|n)
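The Monte Carlo sigma-point recursions above can be sketched for a scalar model. This is an illustrative toy implementation, not the book's algorithm verbatim: f, h and all parameter values are assumptions, and the measurement-noise variance R is added to Σ_YY, a standard step the text leaves implicit:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy scalar system x(n+1) = f(x) + w, z(n) = h(x) + v, filtered with
# K-sample Monte Carlo sigma points per step.
f = lambda x: 0.9 * x + 0.1 * np.sin(x)
h = lambda x: x + 0.05 * x**2
Q, R, Ksamp, T = 0.01, 0.04, 500, 200

x_true, xh, P = 0.5, 0.0, 1.0
errs = []
for n in range(T):
    x_true = f(x_true) + np.sqrt(Q) * rng.standard_normal()
    z = h(x_true) + np.sqrt(R) * rng.standard_normal()
    # Predict: propagate K perturbed samples through f.
    xi = rng.standard_normal(Ksamp)
    fx = f(xh + np.sqrt(P) * xi)
    x_pred = fx.mean()
    P_pred = np.mean((fx - x_pred) ** 2) + Q
    # Update: propagate K perturbed samples through h.
    eta = rng.standard_normal(Ksamp)
    hx = h(x_pred + np.sqrt(P_pred) * eta)
    z_pred = hx.mean()
    S_xy = np.mean(np.sqrt(P_pred) * eta * (hx - z_pred))
    S_yy = np.mean((hx - z_pred) ** 2) + R
    gain = S_xy / S_yy
    xh = x_pred + gain * (z - z_pred)
    P = max(P_pred - gain * S_xy, 1e-8)
    errs.append((x_true - xh) ** 2)
print(np.mean(errs))  # filtered MSE, well below the measurement variance R
```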

Tracking error feedback controller: Let

f(n) = x_d(n) − x̂(n|n), f(n+1|n) = x_d(n+1) − x̂(n+1|n)

Controlled dynamics:

x(n+1) = f(x(n), u(n)) + w(n+1) + K_c f(n)

Aim: To choose the controller coefficient K_c so that the probability

P(max_{1≤n≤N}(|e(n)|² + |f(n)|²) > ε)

is minimized, where

e(n) = x(n) − x̂(n), e(n+1|n) = x(n+1) − x̂(n+1|n), x̂(n) = x̂(n|n)

Linearized difference equations for e(n), f(n) (approximate):

e(n+1|n) = x(n+1) − x̂(n+1|n) = f(x(n), u(n)) + w(n+1) + K_c f(n) − K⁻¹ Σ_k f(x̂(n) + √P(n|n)·ξ(k), u(n))

= f(x(n), u(n)) − f(x̂(n), u(n)) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) + w(n+1) + K_c f(n)

= f′(x̂(n), u(n))e(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) + w(n+1) + K_c f(n) --- (1)

f(n+1|n) = x_d(n+1) − x̂(n+1|n) = f(x_d(n), u(n)) − K⁻¹ Σ_k f(x̂(n) + √P(n|n)·ξ(k), u(n))

= f′(x̂(n), u(n))f(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) --- (2)

e(n+1) = x(n+1) − x̂(n+1|n+1) =

e(n+1|n) + K_c f(n+1|n) − K(n)(z(n+1) − ẑ(n+1|n))

= e(n+1|n) + K_c f(n+1|n) − K(n)(h(x(n+1), u(n+1)) + v(n+1) − K⁻¹ Σ_k h(x̂(n+1|n) + √P(n+1|n)·η(k), u(n+1)))

= e(n+1|n) + K_c f(n+1|n) − K(n)(h′(x̂(n+1|n), u(n+1))e(n+1|n) + v(n+1) − h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k))

= (I − K(n)h′(x̂(n+1|n), u(n+1)))e(n+1|n) + K_c f(n) − K(n)v(n+1) + K(n)h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k) --- (3)

Finally,

f(n+1) = x_d(n+1) − x̂(n+1|n+1) = f(n+1|n) − K(n)(z(n+1) − ẑ(n+1|n))

= f(n+1|n) − K(n)(h(x(n+1), u(n+1)) + v(n+1) − K⁻¹ Σ_k h(x̂(n+1|n) + √P(n+1|n)·η(k), u(n+1)))

= f(n+1|n) − K(n)(h′(x̂(n+1|n), u(n+1))e(n+1|n) − h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k)) − K(n)v(n+1)

= f(n+1|n) − K(n)h′(x̂(n+1|n), u(n+1))e(n+1|n) + K(n)h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k) − K(n)v(n+1) --- (4)

Substituting for e(n+1|n) and f(n+1|n) from (1) and (2) respectively into (3) and (4) results in the following recursion for e(n), f(n):

e(n+1) = f′(x̂(n), u(n))f(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) − K(n)h′(x̂(n+1|n), u(n+1))·(f′(x̂(n), u(n))e(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) + w(n+1) + K_c f(n)) + K(n)h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k) − K(n)v(n+1) --- (5)

and

f(n+1) = f′(x̂(n), u(n))f(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) − K(n)h′(x̂(n+1|n), u(n+1))·(f′(x̂(n), u(n))e(n) − f′(x̂(n), u(n))√P(n|n)·K⁻¹ Σ_k ξ(k) + w(n+1) + K_c f(n)) + K(n)h′(x̂(n+1|n), u(n+1))·√P(n+1|n)·K⁻¹ Σ_k η(k) − K(n)v(n+1) --- (6)

If we regard the error processes e(n), f(n) and the noise processes w(n), v(n), ξ₁(n) = K⁻¹ Σ_{k=1}^{K} ξ(k) and η₁(n) = K⁻¹ Σ_k η(k) to be all of the first order of smallness in perturbation theory, then to the same order we can replace x̂(n) appearing on the rhs of (5) and (6) by x_d(n), and the error thus incurred will be only of the second order of smallness. Doing so, we can express (5) and (6), after equilibrium has been attained, in the form

e(n+1) = A₁₁e(n) + (A₁₂ + A₁₈K_c)f(n) + A₁₃w(n+1) + A₁₄A₁₅v(n+1) + A₁₆ξ₁(n+1) + A₁₇η₁(n+1)

f(n+1) = A₂₁e(n) + (A₂₂ + A₂₈K_c)f(n) + A₂₃w(n+1) + A₂₄A₂₅v(n+1) + A₂₆ξ₁(n+1) + A₂₇η₁(n+1)

Define the aggregate noise process

W[n] = [w(n)^T, v(n)^T, ξ₁(n)^T, η₁(n)^T]^T

It is a zero mean vector valued white Gaussian process and its covariance matrix is block diagonal:

E(W(n)W(m)^T) = R_W(n)δ(n − m)

where

R_W(n) = E(W(n)W(n)^T) = diag[Q, R, P(n|n)/K, P(n+1|n)/K]

Let P denote the limiting equilibrium value of P(n|n) as n → ∞. The limiting equilibrium value of P(n+1|n) is then given approximately by

P₊ = f′(x, u)Pf′(x, u)^T + Q

where x may be replaced by the limiting value of x_d, assuming it to be asymptotically d.c., and f′(x, u) is the Jacobian matrix of f(x, u) w.r.t x. We write f′(x_d, u) = F and then

P₊ = FPF^T + Q

If we desire to use a more accurate value of P₊, then we must replace it by

P₊ = Cov(f(x_d + P·ξ₁)) + Q

where ξ₁ is an N(0, K⁻¹P) random vector. Now, let R_W denote the limiting equilibrium value of R_W(n), ie,

R_W = diag[Q, R, P/K, P₊/K]

Then, the above equations for

χ[n] = [e[n]^T, f[n]^T]^T

can be expressed as

χ[n+1] = (G₀ + G₁K_cG₂)χ[n] + G₃W[n+1]

and its solution is given by

χ[n] = Σ_{k=0}^{n−1} (G₀ + G₁K_cG₂)^{n−1−k} G₃W[k+1]


This process is zero mean Gaussian with an autocorrelation given by

E(χ[n]χ[m]^T) = Σ_{k=0}^{min(n,m)-1} (G0 + G1KcG2)^{n-1-k} G3 RW G3^T ((G0 + G1KcG2)^T)^{m-1-k} = Rχ[n, m|Kc],

say. The LDP rate function of the process χ[·] over the time interval [0, N] is then given by

I[χ[n] : 0 ≤ n ≤ N] = (1/2) Σ_{n,m=0}^{N} χ[n]^T Qχ[n, m|Kc] χ[m]

where Qχ = ((Rχ[n, m|Kc]))^{-1}_{0≤n,m≤N}

According to the LDP for Gaussian processes, the probability that Σ_{n=0}^{N} ‖χ[n]‖² will exceed a threshold δ after we scale χ by a small number √ǫ is given approximately by

exp(−ǫ^{-1} min(I(χ) : Σ_{n=0}^{N} ‖χ[n]‖² > δ))

and in order to minimize this deviation probability, we must choose Kc so that the maximum eigenvalue λmax(Kc) of ((Rχ[n, m|Kc]))^{-1}_{0≤n,m≤N} is a minimum.


Chapter 13

Abstracts for a book on quantum signal processing using quantum field theory

Chapter 1

An introduction to quantum electrodynamics

[1] The Maxwell and Dirac equations.

[2] Quantization of the electromagnetic field using Bosonic creation and annihilation operators.

[3] Second quantization of the Dirac field using Fermionic creation and annihilation operators.

[4] The interaction term between the Dirac field and the Maxwell field.

[5] Propagators for the photon and electron-positron fields.

[7] The Dyson series and its application to the calculation of amplitudes for absorption, emission and scattering processes between electrons and positrons.

[8] Representing the terms in the Dyson series for interactions between photons, electrons and positrons using Feynman diagrams.

[9] The role of propagators in Feynman diagrammatic calculations.

[10] Quantum electrodynamics using the Feynman path integral method.

[11] Calculating the propagators of fields using Feynman path integrals.

[12] Electron self-energy, vacuum polarization, Compton scattering and the anomalous magnetic moment of the electron.

Chapter 2

The design of quantum gates using quantum field theory

[1] Design of quantum gates using first quantized quantum mechanics by controlled modulation of potentials.

[2] Design of quantum gates using harmonic oscillators with charges perturbed by an electric field.


[3] Design of quantum gates using 3-D harmonic oscillators perturbed by control electric and magnetic fields.

[4] Representing the Klein-Gordon Hamiltonian with perturbation using truncated spatial Fourier modes within a box.

[5] Design of the perturbation potential in Klein-Gordon quantization for realizing very large sized quantum gates.

[6] Representing the Hamiltonian of the Dirac field plus electromagnetic field in terms of photon and electron-positron creation and annihilation operators.

[7] Design of control classical electromagnetic fields and control classical current sources in quantum electrodynamics for approximating very large sized quantum unitary gates.

[8] The statistical performance of quantum gate design methods in the presence of classical noise.

Chapter 3

The quantum Boltzmann equation with applications to the design of quantum noisy gates, i.e., TPCP channels.

[1] The classical Boltzmann equation for a plasma of charged particles interacting with an external electromagnetic field.

[2] Calculating the conductivity and permeability of a plasma interacting with an electromagnetic field using first order perturbation theory.

[3] The quantum Boltzmann equation for the one particle density operator for a system of N identically charged particles interacting with a classical electromagnetic field. The effect of quantum charge and current densities on the dynamics of the Maxwell field, and the effect of the Maxwell field on the dynamics of the one particle density operator.

[4] The quantum nonlinear channel generated by evolving the one particle density operator under an external field in the quantum Boltzmann equation.

Chapter 4

The Hudson-Parthasarathy quantum stochastic calculus with application to quantum gate and quantum TPCP channel design.

[1] Boson and Fermion Fock spaces.

[2] Creation, annihilation and conservation operator fields in Boson and Fermion Fock spaces.

[3] Creation, annihilation and conservation processes in Boson Fock space.

[4] The Hudson-Parthasarathy quantum Ito formulae.

[5] The Hudson-Parthasarathy-Schrodinger equation for describing the evolution of a quantum system in the presence of bath noise.

[6] The GKSL (Lindblad) master equation for open quantum systems.

[7] Design of the Lindblad operators of an open quantum system for generation of a given TPCP map, i.e., design of noisy quantum channels.

[8] Schrodinger's equation with Lindblad noise operators in the presence of an external electromagnetic field. Design of the electromagnetic field so that the evolution matches a given TPCP map in the least squares sense.


[9] Quantum Boltzmann equation in the presence of an external electromagnetic field with Lindblad noise terms. Design of the Lindblad operators and the electromagnetic field so that the evolution matches a given nonlinear TPCP map.

Chapter 5

Non-Abelian matter and gauge fields applied to the design of quantum gates.

[1] Lie algebra of a Lie group.

[2] Gauge covariant derivatives.

[3] The non-Abelian matter and gauge field Lagrangian that is invariant under local gauge transformations.

[4] Application of non-Abelian matter and gauge field theory to the design of large sized quantum gates in the first quantized formalism; the gauge fields are assumed to be classical, and the control fields and currents are also assumed to be classical.

Chapter 6

The performance of quantum gates in the presence of quantum noise.

Quantum plasma physics: For an N particle system with identical particles,

iρ,t = [Σ_k Hk, ρ] + [Σ_{i<j} Vij, ρ] + Lindblad noise terms

where

Hk = −(∇k − ieA(t, rk))²/2m + eΦ(t, rk) = H0k + Vk

where

H0k = −∇k²/2m, Vk = (ie/2m)(div Ak + 2(Ak, ∇k)) + e²Ak²/2m + eΦk

The approximate one particle Boltzmann equation is obtained by partial tracing over the remaining N − 1 particles:

iρ1,t = [H1, ρ1] + (N − 1)Tr2[V12, ρ12] + Lindblad noise terms

iρ12,t = [H1 + H2 + V12, ρ12] + (N − 2)Tr3[V13 + V23, ρ123] + Lindblad noise terms

Writing

ρ12 = ρ1 ⊗ ρ1 + g12

we get from the above two equations,

i(ρ1,t ⊗ ρ1 + ρ1 ⊗ ρ1,t) + ig12,t

= [H1, ρ1] ⊗ ρ1 + ρ1 ⊗ [H1, ρ1] + ig12,t

= [H1, ρ1] ⊗ ρ1 + ρ1 ⊗ [H1, ρ1] + [H1 + H2 + V12, g12] + Lindblad noise

Here, we have neglected the cubic terms, i.e., the terms involving ρ123. After making the appropriate cancellations, we get

ig12,t = [H1 + H2 + V12, g12] + Lindblad noise

Thus we get for the one particle density,

iρ1,t = [H1, ρ1] + (N − 1)Tr2[V12, ρ1 ⊗ ρ1] + (N − 1)Tr2[V12, g12] + Lindblad noise

We can express this approximate equation as

iρ1,t = [H1, ρ1] + (N − 1)Tr2[V12, ρ1 ⊗ ρ1] + (N − 1)Tr2[V12, exp(−it ad(H1 + H2 + V12))(g12(0))]

Since the nonlinear term is small, we can approximate the ρ1(t) occurring in it by exp(−it ad(H1))(ρ1(0)). Thus,

iρ1,t(t) = [H1, ρ1(t)] + (N − 1)Tr2[V12, exp(−it.ad(H1 +H2))(ρ1(0)⊗ ρ1(0))]

+(N − 1)Tr2[V12, exp(−itad(H1 +H2 + V12))(g12(0))] + θ(ρ1(t))

where θ(ρ1) is the Lindblad noise term. The solution is

ρ1(t) = exp(−it ad(H1))(ρ1(0))

+ (N − 1) ∫_0^t exp(−i(t − s)ad(H1))(Tr2[V12, exp(−is.ad(H1 + H2))(ρ1(0) ⊗ ρ1(0))])ds

+ (N − 1) ∫_0^t exp(−i(t − s)ad(H1))(Tr2[V12, exp(−is.ad(H1 + H2 + V12))(g12(0))])ds

+ Lindblad noise terms.

The first term is the initial condition term, the second term is the collision term, and the third term is the scattering term. Note that taking the trace gives us

i∂t(Tr(ρ1(t))) = 0

which means that the above quantum Boltzmann equation, as it evolves the one particle state, determines a nonlinear trace preserving mapping. This suggests that the quantum Boltzmann equation can be used to design nonlinear TPCP maps by controlling the electromagnetic potentials A, Φ as well as the Lindblad terms.

Now let ρ1(t, r, r′) denote the position space representation of the one particle density operator. Then the charge density corresponding to this operator is given by eρ1(t, r, r) and the current density is

J(t, r) = (e/m)∇r′ Im(ρ1(t, r, r′))|r′=r + (e/m)A(t, r)ρ1(t, r, r)


This is based on the fact that if ψ(t, r) denotes a pure state in the position representation, i.e., a wave function, then the mixed state is an average of ψ(t, r)ψ(t, r′)* over ensembles of such pure states, i.e.,

ρ1(t, r, r′) =< ψ(t, r)ψ(t, r′)∗ >

In particular, the charge density is

e < |ψ(t, r)|2 >= eρ1(t, r, r)

Likewise the current density for a pure state is

J(t, r) = (ie/2m)(ψ(t, r)*(∇ − ieA(t, r))ψ(t, r) − ψ(t, r)(∇ + ieA(t, r))ψ(t, r)*)

= (e/m)A(t, r)|ψ(t, r)|² − (e/m)Im(ψ(t, r)*∇ψ(t, r))

which on taking the average gives

J(t, r) = (e/m)A(t, r)ρ1(t, r, r) − (e/m)∇r′ Im(ρ1(t, r′, r))|r′=r

= (e/m)A(t, r)ρ1(t, r, r) + (e/m)∇r′ Im(ρ1(t, r, r′))|r′=r

The classical em field then satisfies the Maxwell equations

□A(t, r) = µ0 J(t, r) = µ0((e/m)A(t, r)ρ1(t, r, r) − (e/m)∇r′ Im(ρ1(t, r′, r))|r′=r)

□Φ(t, r) = ρ(t, r)/ǫ0 = ǫ0^{-1}(eρ1(t, r, r))

where □ = c^{-2}∂t² − ∇² is the wave operator. The complete system of the quantum Boltzmann, or rather quantum Vlasov, equations (quantum for the density operator and classical for the em field) is therefore summarized below:

H1 = −(∇1 + ieA(t, r1))2/2m+ eΦ(t, r1)

V12 = V(|r1 − r2|)

ρ1(t) = exp(−it ad(H1))(ρ1(0))

+ (N − 1) ∫_0^t exp(−i(t − s)ad(H1))(Tr2[V12, exp(−is.ad(H1 + H2))(ρ1(0) ⊗ ρ1(0))])ds

+ (N − 1) ∫_0^t exp(−i(t − s)ad(H1))(Tr2[V12, exp(−is.ad(H1 + H2 + V12))(g12(0))])ds

+ Lindblad noise terms

□A(t, r) = µ0((e/m)A(t, r)ρ1(t, r, r) − (e/m)∇r′ Im(ρ1(t, r′, r))|r′=r)

□Φ(t, r) = ǫ0^{-1}(eρ1(t, r, r))
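The trace preserving character of this evolution can be checked on a finite-dimensional toy model: the collision term (N − 1)Tr2[V12, ρ1 ⊗ ρ1] is traceless, so Tr(ρ1) stays constant. A sketch, where the dimension, H1 and V12 are illustrative random Hermitian matrices rather than the physical operators of the text:

```python
import numpy as np

# Finite-dimensional check that the collision term (N-1) Tr2[V12, rho1⊗rho1]
# is traceless, so the quantum Boltzmann evolution
#   i rho1,t = [H1, rho1] + (N-1) Tr2[V12, rho1 ⊗ rho1]
# preserves Tr(rho1).  H1 and V12 are illustrative, not the physical operators.
rng = np.random.default_rng(1)
d, N = 3, 5
H1 = rng.normal(size=(d, d)); H1 = H1 + H1.T              # Hermitian one-body term
V12 = rng.normal(size=(d * d, d * d)); V12 = V12 + V12.T  # Hermitian two-body term

def partial_trace2(M, d):
    # trace over the second tensor factor of a (d*d) x (d*d) matrix
    return np.trace(M.reshape(d, d, d, d), axis1=1, axis2=3)

def rhs(rho1):
    rho12 = np.kron(rho1, rho1)
    coll = partial_trace2(V12 @ rho12 - rho12 @ V12, d)
    return -1j * (H1 @ rho1 - rho1 @ H1) - 1j * (N - 1) * coll

rho1 = np.eye(d) / d                     # maximally mixed initial state
dt = 1e-3
for _ in range(200):                     # forward-Euler integration
    rho1 = rho1 + dt * rhs(rho1)
```

Since the right hand side is exactly traceless and Hermiticity preserving, both properties survive the Euler iteration up to rounding error.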


Second quantization of the quantum plasma equations. Consider the Dirac field of electrons and positrons. The wave function ψ(t, r) is a Fermionic field operator satisfying the Dirac equation

[γµ(i∂µ + eAµ) − m]ψ(t, r) = 0

The unperturbed field ψ0(t, r), i.e., when Aµ = 0, can be expanded as a superposition of annihilation and creation operator fields in the momentum space of electrons and positrons. This is the case when we second quantize the motion of a single electron-positron pair. When however we second quantize the motion of N identical electron-positron pairs and then form the marginal density operator of a single electron-positron pair, we obtain the quantum Boltzmann equation for the one particle density operator field ρ1(t, r, r′), which in the absence of mutual interactions V(r1 − r2) is simply ψ0(t, r)ψ0(t, r′)*, expressible as a quadratic combination of Fermionic creation and annihilation operator fields in momentum space. The single particle Dirac Hamiltonian is

HD = (α,−i∇+ eA) + βm− eΦ

which in the absence of electromagnetic interactions reads

HD0 = (α,−i∇) + βm

The second quantized density operator field in the position space representation that corresponds to this unperturbed Dirac Hamiltonian is given by

ρ0(t, r, r′) = ψ0(t, r)ψ0(t, r′)*

where

ψ0(t, r) = ∫ [a(P, σ)u(P, σ)exp(−i(E(P)t − P.r)) + b(P, σ)*v(P, σ)exp(i(E(P)t − P.r))]d³P

with u(P, σ) and v(−P, σ) being eigenvectors of HD0(P) = (α, P) + βm with eigenvalues ±E(P) respectively, where E(P) = √(P² + m²). Thus, after discretization in the momentum domain, we can write

ρ100(t, r, r′) = Σ_{k,m} [a(k)a(m)f1(t, r, r′|k, m) + a(k)a(m)*f2(t, r, r′|k, m)] + H.c.

where a(k), a(k)* are respectively Fermionic annihilation and creation operators satisfying the canonical anticommutation relations

[a(k), a(m)∗]+ = δ(k,m), [a(k), a(m)]+ = 0 = [a(k)∗, a(m)∗]+

After interaction with the quantum em field, the density operator ρ10(t, r, r′) satisfies the equation

i∂tρ10(t) = [HD0 + Hint(t), ρ10(t)]


where

Hint(t) = e(α, A(t, r)) − eΦ(t, r)

where A(t, r), Φ(t, r) are now both Bosonic quantum operator fields expressible up to zeroth order approximations as linear combinations of Bosonic annihilation and creation operators c(k), c(k)*, k = 1, 2, ..., that satisfy the canonical commutation relations:

[c(k), c(m)∗] = δ(k,m), [c(k), c(m)] = [c(k)∗, c(m)∗] = 0

We have, up to first order perturbation theory,

ρ10 = ρ100 + ρ101

where

i∂tρ101(t) = [HD0, ρ101(t)] + [Hint(t), ρ100(t)]

This has the solution

ρ101(t) = ∫_0^t exp(−i(t − s)ad(HD0))([Hint(s), ρ100(s)])ds

It is clear that this expression for ρ101(t) is a trilinear combination of the photonic creation and annihilation operators and the electron-positron creation and annihilation operators, such that it is linear in the former and quadratic in the latter. Thus, we can write approximately

ρ101(t, r, r′) = Σ_{k,l,m} [c(k)a(l)a(m)g1(t, r, r′|klm) + c(k)*a(l)a(m)g2(t, r, r′|klm) + c(k)a(l)*a(m)g3(t, r, r′|klm) + c(k)*a(l)*a(m)g4(t, r, r′|klm)] + H.c.

After taking into account self interactions of the plasma of electrons and positrons, the one particle density operator field ρ1(t, r, r′) satisfies the quantum Boltzmann equation

iρ1,t(t, r1, r′1) = ∫ (HD(t, r1, r′′)ρ1(t, r′′, r′1) − ρ1(t, r1, r′′)HD(t, r′′, r′1))d³r′′

+ (N − 1)ρ10(t, r1, r′1) ∫ (V(r1, r2) − V(r′1, r2))ρ10(t, r2, r2)d³r2

+ (N − 1) ∫ (V(r1, r2) − V(r′1, r2))g12(t, r1, r2|r′1, r2)d³r2 −−− (a)

where g12(t, r1, r2|r′1, r′2) is the position space representation of the two particle correlation g12 introduced above.

Note that in perturbation theory,

ρ1 = ρ10 + ρ11

where ρ10, as calculated above, satisfies (a) with V = 0.


Chapter 14

Quantum neural networks and learning using mixed state dynamics for open quantum systems

Let ρ(t) be the state of an evolving quantum system according to the dynamics

iρ′(t) = [H0 + V (t), ρ(t)] + θ(ρ(t))

where V(t) is the control potential and θ(ρ) is the Lindblad noise term. Our aim is to design this control potential V(t) so that the probability density ρ(t, r, r) in the position representation tracks a given probability density function p(t, r), and the quantum current (e/m)∇r′ Im(ρ(t, r, r′))|r′=r tracks a given current density J(t, r). By analogy with pure state dynamics and fluid dynamical systems, the corresponding problem is formulated as follows: Consider the wave function ψ(t, r) that satisfies Schrodinger's equation for a charged particle in an electromagnetic field:

i∂tψ(t, r) = (−(∇+ ieA(t, r))2/2− eΦ(t, r) +W (t, r)V (t, r))ψ(t, r)

The aim is to control the neural weight W(t, r) so that pq(t, r) = |ψ(t, r)|² tracks the fluid density ρ(t, r) that is normalized so that ∫ ρ(t, r)d³r = 1, and further that the probability current density

Jq(t, r) = Im(ψ(t, r)∇ψ(t, r)∗) +A(t, r)|ψ(t, r)|2

tracks a given current density J(t, r) of the fluid. Note that we are assuming that ρ, J satisfy the equation of continuity:

∂tρ + div(J) = 0


This is an effective QNN algorithm for simulating the motion of a fluid, because the corresponding quantum mechanical probability and current density pairs (pq, Jq) automatically, by virtue of Schrodinger's equation, satisfy the equation of continuity, and simultaneously guarantee that pq is a probability density, i.e., it is non-negative and integrates over space to unity. Thus rather than using a QNN for tracking only the density of a fluid or a probability density, we can in addition use it to track also the fluid current density by adjusting the control weights W(t, r). Note that for open quantum systems, i.e., in the presence of Lindblad noise terms, the equation of continuity for the QNN does not hold. This means that when bath noise interacts with our QNN, we would not get an accurate simulation of the fluid mass and current density. However, if there are sources of mass generation in the fluid, so that the equation of continuity reads

∂tρ(t, r) + div J(t, r) = g(t, r)

then we can use the Lindblad operator coefficient terms as additional control parameters to simulate ρ, J for such a system with mass generation g > 0 or mass destruction g < 0.

|J(t, r) − Jq(t, r)|2 = |Im(ψ(t, r)∇ψ(t, r)∗) +A(t, r)|ψ(t, r)|2 − J(t, r)|2

is the instantaneous space-time squared norm error between the true fluid current density and that predicted by the QNN. Its gradient w.r.t. A(t, r) is

δ|J(t, r) − Jq(t, r)|²/δA(t, r) = 2|ψ(t, r)|²(Jq(t, r) − J(t, r))

Note that this is a vector. This means that for tracking the current density, the update equation for the control vector potential must be

∂tA(t, r) = −µ δ|J(t, r) − Jq(t, r)|²/δA(t, r) = 2µ|ψ(t, r)|²(J(t, r) − Jq(t, r))

We can also derive a corresponding update equation for the scalar potential in Schrodinger's equation, based on matching the true mass density of the fluid to that predicted by the QNN, as follows.

|p(t, r) − pq(t, r)|2 = |p(t, r) − |ψ(t, r)|2|2

so

∂tΦ(t, r) = −µ δ|p(t, r) − |ψ(t, r)|²|²/δΦ(t, r) = 2µ[ψ(t, r)* δψ(t, r)/δΦ(t, r) + ψ(t, r) δψ(t, r)*/δΦ(t, r)][p(t, r) − |ψ(t, r)|²]

Now using the Dyson series expansion,

ψ(t, r) = U0(t)ψ(0, r) + Σ_{n≥1} (−i)^n ∫_{0<tn<...<t1<t} U0(t − t1)V(t1)U0(t1 − t2)...V(tn)U0(tn)ψ(0, r)dt1...dtn


giving

δψ(t, r)/δV(t, r) = V(t, r) Σ_{n≥1} (−i)^n ∫_{0<tn<...<t2<t} U0(t − t2)V(t2)U0(t2 − t3)...U0(tn−1 − tn)ψ(0, r)dt2...dtn

= −iV(t, r)(ψ(t, r) − ψ(0, r))

Taking the conjugate of this equation gives us

δψ(t, r)*/δV(t, r) = iV(t, r)(ψ(t, r)* − ψ(0, r)*)

Thus the update equation for Φ(t, r) = V (t, r)/e becomes

∂tΦ(t, r) = 2µ.Φ(t, r)Im(ψ(t, r) − ψ(0, r)).(p(t, r) − |ψ(t, r)|2)
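The two update rules (for A and for Φ) can be combined into a simple simulation loop. A 1-D sketch, assuming a split-step Schrodinger solver, illustrative target densities and currents, and a simplified descent rule for Φ in place of the exact variational expression above (the grid, the targets and the learning rate µ are all illustrative assumptions):

```python
import numpy as np

# 1-D sketch of the QNN tracking loop.  psi evolves under a split-step
# Schrodinger solver with H = -(1/2) d^2/dx^2 + Phi; the controls are updated
# by gradient-style rules, with a simplified descent rule for Phi.
M = 128
x = np.linspace(-10.0, 10.0, M, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(M, d=dx)
psi = np.exp(-x**2 / 2).astype(complex)
psi /= np.sqrt((np.abs(psi)**2).sum() * dx)          # normalize
A = np.zeros(M)                                      # control vector potential
Phi = np.zeros(M)                                    # control scalar potential
p_target = np.exp(-(x - 1.0)**2 / 2)
p_target /= (p_target * dx).sum()                    # target fluid density
J_target = np.zeros(M)                               # target fluid current
mu, dt = 0.5, 1e-2

for _ in range(100):
    # split-step evolution (A is treated as a weak control, not put in H here)
    psi = np.exp(-1j * dt * Phi / 2) * psi
    psi = np.fft.ifft(np.exp(-1j * dt * k**2 / 2) * np.fft.fft(psi))
    psi = np.exp(-1j * dt * Phi / 2) * psi
    # QNN outputs p_q = |psi|^2 and J_q = Im(psi grad psi*) + A |psi|^2
    pq = np.abs(psi)**2
    Jq = np.imag(psi * np.gradient(np.conj(psi), dx)) + A * pq
    # gradient-style updates of the controls
    A += dt * 2 * mu * pq * (J_target - Jq)
    Phi += dt * 2 * mu * (pq - p_target)
```

The split-step propagator is unitary, so pq remains a normalized probability density throughout the control iteration, which is exactly the property the text exploits.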

Note that here we can more generally consider the error energy to be a weighted linear combination of the error energy between the fluid mass density and its QNN estimate and that between the fluid current density and its QNN estimate:

E = w1|p(t, r) − |ψ(t, r)|²|² + w2|J(t, r) − Im(ψ(t, r)∇ψ(t, r)*) − A(t, r)|ψ(t, r)|²|²

While calculating the updates for the vector and scalar potentials using this energy function, we must take into account the dependence of ψ(t, r) and ψ(t, r)* on both A(t, r) and Φ(t, r). For this we write the Hamiltonian as

H(t) = H0 + eV1(t) + e2V2(t)

where

H0 = −∇²/2, V1(t) = Φ(t, r) + (1/2)(div A(t, r) + 2(A(t, r), ∇)), V2(t) = A(t, r)²/2

Then in terms of variational derivatives, from the Dyson series expansion,

δψ(t, r)/δA(t, r) = e((i/2)∇ψ(t, r) − i∇ψ(t, r)) = (−i/2)∇ψ(t, r)

and taking the conjugate,

δψ(t, r)*/δA(t, r) = (i/2)∇ψ(t, r)*

so we get

δ|ψ(t, r)|²/δA(t, r) = (−i/2)(ψ(t, r)*∇ψ(t, r) − ψ(t, r)∇ψ(t, r)*) = Im(ψ(t, r)*∇ψ(t, r))

Exercise: Using the above formulas, evaluate

δE/δA(t, r), δE/δΦ(t, r)


QNN in the presence of Hudson-Parthasarathy noise: Let |ψ(t) > be a pure state in the Hilbert space h ⊗ Γs(L²(R+)), evolving according to the Hudson-Parthasarathy noisy Schrodinger equation, where h = L²(R+):

d|ψ(t) >= (−(iH + LL*/2)dt + LdA − L*dA*)|ψ(t) >

where H,L are system operators. Specifically, choose

H = −∇2/2 + V (t, r), L = (a,∇) + (b, r)

where a, b are constant complex 3-vectors. For a specific representation of the creation and annihilation processes A(t)*, A(t), we choose an onb |en > for L²(R+) and then write

A(t) = Σn an < χ[0,t]|en >, A(t)* = Σn a*n < en|χ[0,t] >

where an are operators in a Hilbert space satisfying the CCR

[an, am] = 0, [an, a*m] = δ[n − m], [a*n, a*m] = 0

Then an easy calculation shows that

[A(t), A(s)*] = Σn < χ[0,t]|en >< en|χ[0,s] > = < χ[0,t]|χ[0,s] > = min(t, s)
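The kernel identity just derived can be checked numerically: for any orthonormal basis of L²([0, T]), the partial sums of Σn ⟨χ[0,t]|en⟩⟨en|χ[0,s]⟩ should converge to min(t, s). A sketch using a cosine basis on [0, T] (the interval, grid size and truncation order are illustrative):

```python
import numpy as np

# Toy check: with chi_[0,t] the indicator of [0, t] in L^2([0, T]) and {e_n}
# the cosine ONB, sum_n <chi_[0,t]|e_n><e_n|chi_[0,s]> -> min(t, s).
T, M = 1.0, 4096
grid = (np.arange(M) + 0.5) * (T / M)    # midpoint quadrature nodes
w = T / M                                # quadrature weight

def e(n):
    # ONB of L^2([0, T]): e_0 = 1/sqrt(T), e_n = sqrt(2/T) cos(n pi x / T)
    if n == 0:
        return np.full(M, 1.0 / np.sqrt(T))
    return np.sqrt(2.0 / T) * np.cos(n * np.pi * grid / T)

def kernel(t, s, nmax=200):
    chi_t = (grid <= t).astype(float)
    chi_s = (grid <= s).astype(float)
    return sum(((chi_t * e(n)).sum() * w) * ((e(n) * chi_s).sum() * w)
               for n in range(nmax))

val = kernel(0.3, 0.7)                   # should be close to min(0.3, 0.7) = 0.3
```

The truncation error decays like 1/nmax, so a few hundred basis functions already reproduce min(t, s) to better than one percent.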

from which we easily deduce the quantum Ito formula of Hudson and Parthasarathy. We can also introduce a conservation process Λ(t):

Λ(t) = Σ_{n,m} a*n am < en|χ[0,t]|em >

and deduce that

[A(t), Λ(s)] = Σ_{n,m} am < χ[0,t]|en >< en|χ[0,s]|em > = Σm am < χ[0,min(t,s)]|em > = A(min(t, s))

and taking the adjoint of this equation then gives us

[Λ(s), A(t)∗] = A(min(t, s))∗

These equations yield the other quantum Ito formulas involving the products of the creation and annihilation operator differentials with the conservation operator differentials. Now suppose that our wave function on the tensor product of the system and bath spaces has the form

|ψ(t) >= |ψs(t) > ⊗|φ(u) >


where |φ(u) > is the bath coherent state and |ψs(t) > is the system state. Then substituting this into the Hudson-Parthasarathy noisy Schrodinger equation and using

dA(t)|φ(u) >= dt.u(t)|φ(u) >

dA(t)*|φ(u) >= (dB(t) − u(t)dt)|φ(u) >

where

B(t) = A(t) + A(t)*

is a classical Brownian motion process (since the bath is in a coherent state, it should be regarded as a Brownian motion with drift), the HPS equation, for this special class of pure states on the system and bath space, assumes the form of a stochastic Schrodinger equation

d|ψs(t) >= (−(iH + LL*/2)dt + u(t)dtL − (dB(t) − u(t)dt)L*)|ψs(t) >

= ((−iH − LL*/2 + u(t)(L + L*))dt − L*dB(t))|ψs(t) >

This stochastic Schrodinger equation is parametrized by the coherent state parameter function u ∈ L²(R+). We can now use this stochastic Schrodinger equation to track a given probability density p(t, r) using control on all of H, L, u(t). This would thus define a QNN in the Hudson-Parthasarathy stochastic Schrodinger framework. The idea is that the Brownian noise is undesired, but one cannot prevent it from entering the picture, and hence one must track the given pdf in spite of its presence.
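A finite-dimensional Euler-Maruyama discretization of this stochastic Schrodinger equation can be sketched as follows (the dimension, H, L and the coherent state function u(t) are illustrative; the state is renormalized at each step because the raw equation does not preserve the norm):

```python
import numpy as np

# Euler-Maruyama sketch of
#   d psi = ((-iH - L L*/2 + u(t)(L + L*)) dt - L* dB(t)) psi .
rng = np.random.default_rng(2)
d = 4
H = rng.normal(size=(d, d)); H = H + H.T                 # Hermitian system Hamiltonian
L = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Ls = L.conj().T
u = lambda t: 0.1 * np.cos(t)                            # coherent state parameter
drift = -1j * H - L @ Ls / 2

psi = np.zeros(d, dtype=complex); psi[0] = 1.0
dt, nsteps = 1e-3, 1000
for n in range(nsteps):
    dB = np.sqrt(dt) * rng.normal()                      # Brownian increment
    psi = psi + ((drift + u(n * dt) * (L + Ls)) @ psi) * dt - (Ls @ psi) * dB
    psi = psi / np.linalg.norm(psi)                      # renormalize the trajectory
```

In a tracking application, H, L and u would be adjusted between runs so that the ensemble-averaged position density of psi follows the target pdf p(t, r).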

Using quantum electrodynamics (QED) to track joint probability densities in a QNN.

The Lagrangian of QED is given by the sum of a Maxwell term, a Dirac term and an interaction term:

L = LM + LD + Lint = (−1/4)FµνF^µν + ψ̄(γµ(i∂µ + eAµ) − m)ψ

= (−1/4)FµνF^µν + ψ̄(iγµ∂µ − m)ψ + eψ̄γµψ.Aµ

Thus we identify

LM = (−1/4)FµνF^µν, LD = ψ̄(iγµ∂µ − m)ψ, Lint = eψ̄γµψ.Aµ

To proceed further, we write down the Schrodinger equation corresponding to this second quantized system as

iU ′(t) = H(t)U(t)

where H(t) is the second quantized Hamiltonian of the system expressed in terms of the creation and annihilation operator fields of the photons and Fermions. To get at this Hamiltonian, we must first calculate, to a given order of perturbation, the Dirac wave field and the Maxwell wave field in terms of these Bosonic and Fermionic creation and annihilation operators. Let us, as an example, derive these expressions up to second order of perturbation. The Maxwell and Dirac equations derived from the Lagrangian are

∂α∂αAµ = eψ*αµψ,

(iγµ∂µ − m)ψ = −eγµψ.Aµ

where we have applied the Lorentz gauge condition

∂µAµ = 0

in deriving the former. In the absence of interactions, we can express the solutions as

ψ0(x) = Σk (a(k)fk(x) + a(k)*gk(x)),

Aµ(x) = Σk (c(k)φ^µ_k(x) + c(k)*φ^µ_k(x)*)

where c(k), c(k)* satisfy the Bosonic CCR while a(k), a(k)* satisfy the Fermionic CAR. Writing

ψ = ψ0 + ψ1 + ψ2, Aµ = Aµ0 + Aµ1 + Aµ2,

we get:

a. Zeroth order perturbation equations.

(iγµ∂µ −m)ψ0 = 0, ∂α∂αAµ0 = 0,

b. First order perturbation equations.

(iγµ∂µ − m)ψ1 = −eγµψ0Aµ0,

∂α∂αAµ1 = eψ0*αµψ0

c. Second order perturbation equations.

(iγµ∂µ − m)ψ2 = −eγµ(ψ0Aµ1 + ψ1Aµ0)

∂α∂αAµ2 = e(ψ0*αµψ1 + ψ1*αµψ0)

In general, writing the nth order perturbation series

Aµ = Σ_{k=0}^{n} Aµk, ψ = Σ_{k=0}^{n} ψk,


we obtain the nth order perturbation equations as

(iγµ∂µ − m)ψn = −eγµ Σ_{k=0}^{n-1} ψk Aµ,n−1−k,

∂α∂αAµn = e Σ_{k=0}^{n-1} ψk*αµψn−1−k


Chapter 15

Quantum gate design, magnet motion, quantum filtering, Wigner distributions

15.1 Designing a quantum gate using QED with a control c-number four potential

Consider the interaction Lagrangian between the em four potential and the Dirac current:

Lint = eψ̄γµψ.Aµ

The corresponding contribution of this interaction Lagrangian to the interaction Hamiltonian is its negative (in view of the Legendre transformation) integrated over space:

Hint = −∫ Lint d³x = −e ∫ ψ̄γµψ.Aµ d³x

The contribution of this term, up to second order in the em potential, to the scattering amplitude between two states |p1, σ1, p2, σ2 > and |p′1, σ′1, p′2, σ′2 >, in which there are two electrons with the specified four momenta and z-components of the spins (or equivalently two positrons, or one electron and one positron), is given by

A2(p′1, σ′1, p′2, σ′2|p1, σ1, p2, σ2) = −e² < p′1, σ′1, p′2, σ′2|T(∫ Hint(x1)Hint(x2)d⁴x1d⁴x2)|p1, σ1, p2, σ2 >

Let u(p, σ) denote the free particle Dirac wave function in the momentum-spin domain. Thus it satisfies

(γµpµ − m)u(p, σ) = 0, σ = 1, 2


Taking the adjoint of this equation after premultiplying by γ0, and making use of the self-adjointness of the matrices γ0γµ, gives us

ū(p, σ)(γµpµ − m) = 0

where

ū(p, σ) = u(p, σ)*γ0

Now observe that for one particle momentum-spin states of an electron,

a(p′, σ′)|p, σ >= δ(σ′, σ)δ3(p′ − p)|0 >

and therefore,

ψ(x)|p, σ >e = exp(−ip.x)u(p, σ)|0 >

and for a one particle positron state,

ψ(x)*|p, σ >p = exp(ip.x)v(p, σ)*|0 >

or equivalently,

ψ̄(x)|p, σ >p = ψ(x)*γ0|p, σ >p = exp(ip.x)v̄(p, σ)|0 >

Thus one of the terms in A2(p′1, σ′1, p′2, σ′2|p1, σ1, p2, σ2) for two electron momentum state transitions is given by

−e² ∫ < p′1, σ′1, p′2, σ′2|(ψ̄(x1) ⊗ ψ̄(x2))(γµ ⊗ γν)(ψ(x1) ⊗ ψ(x2))|p1, σ1, p2, σ2 > Dµν(x1 − x2)d⁴x1d⁴x2

= ∫ Dµν(x1 − x2)(ū(p′1, σ′1) ⊗ ū(p′2, σ′2))(γµ ⊗ γν)(u(p1, σ1) ⊗ u(p2, σ2))exp(−i(p1 − p′1).x1 − i(p2 − p′2).x2)d⁴x1d⁴x2

= Dµν(p1 − p′1)δ⁴(p1 + p2 − p′1 − p′2).(ū(p′1, σ′1) ⊗ ū(p′2, σ′2))(γµ ⊗ γν)(u(p1, σ1) ⊗ u(p2, σ2))

This evaluates to

[(ū(p′1, σ′1)γ^µ u(p1, σ1)).(ū(p′2, σ′2)γµ u(p2, σ2))/(p′1 − p1)²]δ⁴(p1 + p2 − p′1 − p′2)

Likewise another term having a different structure also comes from the same expression for A2, and that is

[(ū(p′2, σ′2)γ^µ u(p1, σ1)).(ū(p′1, σ′1)γµ u(p2, σ2))/(p′2 − p1)²]δ⁴(p1 + p2 − p′1 − p′2)

The total amplitude is a superposition of these two amplitudes. Here, Dµν(x) is the photon propagator and Dµν(p) is its four dimensional space-time Fourier transform, given by ηµν/p² where p² = pµp^µ. This propagator is defined by

Dµν(x1 − x2) = < 0|T(Aµ(x1)Aν(x2))|0 >


Now we consider a situation when, apart from the quantum em field, there is also an external classical control em field described by the four potential Acµ. Then in the nth order term in the Dyson series for the scattering of M electrons, there will be a term of the form

∫ < p′k, σ′k, k = 1, 2, ..., M|T(Hint(x1)...Hint(xn))|pk, σk, k = 1, 2, ..., M > d⁴x1...d⁴xn

where

Hint(x) = eψ̄(x)γµψ(x)(Aµ(x) + Acµ(x))

Here, Aµ is the four potential of the quantum electromagnetic field while Acµ(x) is the four potential of the classical c-number control electromagnetic field. Expanding these, it is clear that this term will be a sum of terms of the form

∫ < p′k, σ′k, k = 1, 2, ..., M|T(Π_{k=1}^{n} ψ̄(xk)γ^{µk}ψ(xk))|pk, σk, k = 1, 2, ..., M > . < 0|T(Aµ1(x1)...Aµr(xr))|0 > Π_{m=r+1}^{n} Acµm(xm) d⁴x1...d⁴xn

15.2 On the motion of a magnet through a pipe with a coil under gravity and the reaction force exerted by the induced current through the coil on the magnet

Problem formulation

Assume that the coil is wound around a tube with n0 turns per unit length, extending from a height of h to h + b. The tube is cylindrical and extends from z = 0 to z = d. Taking relativity into account via the retarded potentials, if m(t) denotes the magnetic moment of the magnet, then the vector potential generated by it is approximately obtained from the retarded potential formula as

A(t, r) = µm(t − r/c) × r/4πr³ −−− (1)

where we assume that the magnet is located near the origin. This approximates to

A(t, r) = µm(t)× r/4πr3 − µm′(t)× r/4πcr2 −−− (2)

Now assume that the magnet falls through the central axis with its axis always oriented along the −z direction. Then, after time t, let its z coordinate be ξ(t). We have the equation of motion

Mξ′′(t) = F(t) − Mg −−− (3)


where F(t) is the z component of the force exerted by the magnetic field generated by the current induced in the coil upon the magnet. Now, the flux of the magnetic field generated by the magnet through the coil is given by

Φ(t) = n0 ∫_{X²+Y²≤R²_T, h≤Z≤h+b} Bz(t, X, Y, Z)dXdYdZ −−− (4)

where

Bz(t,X, Y, Z) = z.B(t,X, Y, Z)−−− (5a)

B(t,X, Y, Z) = curlA(t,R− ξ(t)z)−−− (5b)

where

R = (X,Y, Z)

and where RT is the radius of the tube. The current through the coil is then

I(t) = Φ′(t)/R0 −−− (6)

where R0 is the coil resistance. The force of the magnetic field generated by this current upon the falling magnet can now be computed easily. To do this, we first compute the magnetic field Bc(t, X, Y, Z) generated by the coil using the usual retarded potential formula

Bc(t, X, Y, Z) = curl Ac(t, X, Y, Z) −−− (7a)

where

Ac(t, X, Y, Z) = (µ/4π) ∫_0^{2n0π} (I(t − |R − RT(x cos(φ) + y sin(φ)) − z(h + bφ/2n0π)|/c).(−x sin(φ) + y cos(φ))/|R − RT(x cos(φ) + y sin(φ)) − z(h + bφ/2n0π)|)RT dφ −−− (7b)

Note that Ac(t, X, Y, Z) is a function of ξ(t) and also of m(t) and m′(t). It follows that I(t) is a function of ξ(t), ξ′(t), m(t), m′(t), m′′(t). Then we calculate the interaction energy between the magnetic fields generated by the magnet and by the coil as

Uint(ξ(t), ξ′(t), t) = (2µ)^{-1} ∫ (Bc(t, X, Y, Z), B(t, X, Y, Z))dXdYdZ −−− (8)

and then we get, for the force exerted by the coil's magnetic field upon the falling magnet,

F (t) = −∂Uint(ξ(t), ξ′(t), t)/∂ξ(t)−−− (9)

Remark: Ideally, since the interaction energy between the two magnetic fields appears in the Lagrangian ∫(E²/c² − B²)d³R/2µ, we should write down the Euler-Lagrange equations using the Lagrangian

L(t, ξ(t), ξ′(t)) = Mξ′(t)²/2 − Uint(ξ(t), ξ′(t), t) − Mgξ(t) −−− (10)


in the form

(d/dt)∂L/∂ξ′ − ∂L/∂ξ = 0 −−− (11)

which yields

Mξ′′(t) − (d/dt)∂Uint/∂ξ′ + ∂Uint/∂ξ + Mg = 0 −−− (12)

3. Numerical simulations

Define the force function as in (9):

F(ξ(t), ξ′(t)) = −∂Uint(ξ(t), ξ′(t))/∂ξ(t)

Neglecting the term involving the gradient of Uint w.r.t. ξ′ (this is justified because the dominant reaction force of the coil field on the magnet is due to the spatial gradient of the dipole interaction energy), we can discretize (3) with a time step of δ to get

M(ξ[n + 1] − 2ξ[n] + ξ[n − 1])/δ² + Fint(ξ[n], (ξ[n] − ξ[n − 1])/δ) + Mg = 0

or equivalently, in recursive form,

ξ[n + 1] = 2ξ[n] − ξ[n − 1] − (δ²/M)Fint(ξ[n], (ξ[n] − ξ[n − 1])/δ) − δ²g
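The recursion above can be implemented directly. A sketch in which the interaction force Fint is replaced by an illustrative velocity-proportional braking term (a stand-in for the true dipole-coil force obtained from Uint); the mass, damping coefficient and time step are likewise illustrative:

```python
import numpy as np

# Sketch of the discretized equation of motion
#   xi[n+1] = 2 xi[n] - xi[n-1] - (delta^2/M) Fint(xi[n], v[n]) - delta^2 g .
Mmag, g, delta = 0.05, 9.81, 1e-3        # magnet mass (kg), gravity, time step
gamma = 0.8                              # illustrative eddy-current damping

def Fint(xi, v):
    return gamma * v                     # braking force opposing the motion

nsteps = 2000
xi = np.zeros(nsteps)
xi[0] = xi[1] = 1.0                      # released from rest at height 1 m
for n in range(1, nsteps - 1):
    v = (xi[n] - xi[n - 1]) / delta
    xi[n + 1] = (2 * xi[n] - xi[n - 1]
                 - (delta**2 / Mmag) * Fint(xi[n], v) - delta**2 * g)
```

With a damping term of this kind, the trajectory approaches a terminal velocity Mg/γ, which is the qualitative behavior one expects from the induced-current braking described in this section.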

Fig.1 shows the plot of ξ[n] obtained using a "for loop". The current through the coil as a function of time has also been plotted in fig.2 using the formula

I(t) = R0^{-1} (d/dt)(n0 ∫_{X²+Y²≤R²_T, h≤Z≤h+b} Bz(t, X, Y, Z)dXdYdZ) = I(t|ξ(t), ξ′(t))

where

Bz(t, X, Y, Z) = z.curl A(t, X, Y, Z − ξ(t)) = ∂Ay(X, Y, Z)/∂X − ∂Ax(X, Y, Z)/∂Y

with

A(X, Y, Z) = µ0 m × R/4πR³, R = (X, Y, Z)

Here relativistic corrections have been neglected. The formula for the interaction energy between the magnet and the field generated by the coil discussed in the previous section can be simplified, after making appropriate approximations, to −m.Bc = mBcz, where Bcz, the z-component of the reaction magnetic field produced by the coil, is evaluated at the site of the magnet, i.e., at (0, 0, ξ(t)). Thus, the formula for the force of the field on the magnet is approximated as

Fint(ξ, ξ′) = Fint(ξ) = −m ∂Bcz(t, 0, 0, ξ|ξ, ξ′)/∂ξ


The coil magnetic field is evaluated using the approximate non-relativistic Biot-Savart formula

Bc(t, X, Y, Z|ξ, ξ′) = (µ0 I(t|ξ, ξ′)/4π) ∫_0^{2n0π} RT dφ (−x sin(φ) + y cos(φ)) × ((X − RT cos(φ))x + (Y − RT sin(φ))y + (Z − h − bφ/2n0π)z)/[(X − RT cos(φ))² + (Y − RT sin(φ))² + (Z − h − bφ/2n0π)²]^{3/2}

These approximations are shown to produce excellent agreement with hardwareexperiments as a comparison of figs.() shows.

Taking into account relativistic corrections: The magnetic field generated by the coil has the form Bc(t, X, Y, Z|ξ(t), ξ′(t)). More precisely, if I0(t) is the current of the electrons within the magnet, the magnetic vector potential generated by this current is given by

A(t, R) = (µ/4π) ∫_magnet I0(t − |R − ξ(t) − R′|/c)dR′/|R − ξ(t) − R′|

In this expression, ξ(t) is the position vector of the centre of the magnet at time t and R′ is the position on the magnet electron loop relative to its centre. Since the magnet current is a constant in the case of a permanent magnet, the above simplifies to

A(t,R) = (µI0/4π) ∫_{magnet} dR′/|R − ξ(t) − R′|

and hence by making the infinitesimal loop approximation, this becomes

A(t,R) = (µ/4π) m × (R − ξ(t))/|R − ξ(t)|³

In other words, relativistic corrections to the magnetic field produced by the magnet will arise only if the magnet is an electromagnet, so that the current through it varies with time. However, the current through the coil wound on the tube surface varies with time, and the magnetic field produced by it will have to be calculated using the retarded potential. Hence relativistic corrections will arise in this case.

Estimating the magnetic moment m of the magnet using the following scheme: The magnetic field through the coil, and hence the magnetic flux, and hence the induced current in the coil, are all functions of m. We've already noted that the induced current can be expressed in the form given by equations (2b), (4), (5) and (6). We write this as

I(t) = I(m, ξ(t), ξ′(t))

in the special case where m is a constant. It is also clear from the above formulas that for given ξ(t), ξ′(t), the coil current is a linear function of m. This is because

15.2. ON THE MOTION OF A MAGNET THROUGH A PIPE WITH A COIL UNDER GRAVITY AND THE REACTION FORCE EXERTED BY THE INDUCED CURRENT THROUGH THE COIL ON THE MAGNET

the magnetic field produced by the magnet is a linear function of m. Hence, we can write

I(t) = P (ξ(t), ξ′(t))m

where P(ξ(t), ξ′(t)) is an easily calculable function. Thus, if we take measurements of the coil current and the displacement of the magnet at a sequence of times t1 < t2 < ... < tN, then m may be estimated by the least squares method:

m̂ = argmin_m Σ_{k=1}^N (I(t_k) − P(ξ(t_k), (ξ(t_{k+1}) − ξ(t_k))/(t_{k+1} − t_k)) m)²

Carrying out this minimization results in the estimate

m̂ = Σ_k I_k P_k / Σ_k P_k²

where

I_k = I(t_k), P_k = P(ξ(t_k), (ξ(t_{k+1}) − ξ(t_k))/(t_{k+1} − t_k))

The estimates of m obtained in this way for different measurements are shown in Table(.).
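The least squares estimate m̂ = Σ_k I_k P_k / Σ_k P_k² can be sketched on synthetic data; the P_k values below are made up for illustration (in the text they come from the coil geometry), and the "measured" currents are I_k = P_k·m plus noise:

```python
import numpy as np

# Minimal sketch of the least squares estimate m = sum_k I_k P_k / sum_k P_k^2,
# on synthetic data. P here plays the role of P(xi(t_k), xi'(t_k)); the values
# are hypothetical.

rng = np.random.default_rng(0)
m_true = 0.3
P = rng.uniform(0.5, 2.0, size=200)               # hypothetical regressors P_k
I = P * m_true + 0.01 * rng.standard_normal(200)  # noisy current measurements

m_hat = np.sum(I * P) / np.sum(P**2)
print(abs(m_hat - m_true) < 0.01)
```

Since the model is linear in m, this is the exact minimizer of the quadratic cost of the previous display.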

Remark: m is the z component of the magnetic moment, which equals in magnitude the total magnetic moment since the magnet is assumed to be oriented along the z direction. The flux through the coil can be expressed as

Φ(t) = m (µn0/4π) ∫_{X²+Y²≤R_T², h≤Z≤h+b} curl(ẑ × (R − ξ(t)ẑ)/|R − ξ(t)ẑ|³)·ẑ dX dY dZ

so this provides the immediate identification

P(ξ(t), ξ′(t)) = (d/dt)[(µn0/4πR0) ∫_{X²+Y²≤R_T², h≤Z≤h+b} curl(ẑ × (R − ξ(t)ẑ)/|R − ξ(t)ẑ|³)·ẑ dX dY dZ]

Quantization of the motion of a magnet in the vicinity of the coil. The magnetic dipole is replaced by a constrained sea of Dirac electrons and positrons within the boundary of the magnet. This Dirac field can be expressed in the form

ψ(x) = Σ_k (a(k)f_k(x) + a(k)∗g_k(x))

where the functions f_k(x), g_k(x) are eigenfunctions of the Dirac equation with boundary conditions determined by the vanishing of the Dirac wave function on the boundary of the magnet. These are four component wave functions. In other words, they satisfy

(iγµ∂µ −m)(fk(x), gk(x)) = 0


for all x within the bounding walls of the magnet, and they vanish on the boundary; or rather, the charge density corresponding to these functions, as well as the normal component of the spatial current density, vanishes on the boundary. The magnetic dipole moment of the magnet after second quantization is given by

m(t) = ∫_M ψ(x)∗(eJ/2m)ψ(x) d³x

where

J = L + gσ/2

is the total orbital plus spin angular momentum of an electron, taking the g-factor into account; normally we put g = 1. It is a conserved vector operator for the Dirac Hamiltonian. The a(k), a(k)∗ are the annihilation and creation operators of the electrons and positrons. They satisfy the CAR

[a(k), a(m)∗]+ = δ(k,m)

It is easily noted that m(t) is a quadratic function of the electron-positron creation and annihilation operators.
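The CAR [a(k), a(m)∗]_+ = δ(k,m) can be verified in a finite-dimensional representation. A minimal numpy sketch for two Fermionic modes built by the Jordan-Wigner construction (an illustration, not the field-theoretic operators of the text):

```python
import numpy as np

# Two Fermionic modes via Jordan-Wigner: a1 = s (x) I, a2 = Z (x) s,
# with s = |0><1| and Z = diag(1, -1). These satisfy the CAR
# [a(k), a(m)*]_+ = delta(k, m), [a(k), a(m)]_+ = 0.

s = np.array([[0, 1], [0, 0]], dtype=complex)   # single-mode annihilator
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

a1 = np.kron(s, I2)
a2 = np.kron(Z, s)

def anti(A, B):
    return A @ B + B @ A

assert np.allclose(anti(a1, a1.conj().T), np.eye(4))   # [a1, a1*]_+ = 1
assert np.allclose(anti(a2, a2.conj().T), np.eye(4))
assert np.allclose(anti(a1, a2.conj().T), 0)           # k != m gives 0
assert np.allclose(anti(a1, a2), 0)
```

The Z factor in a2 supplies the sign that makes distinct modes anticommute rather than commute.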

The magnetic field produced by this magnet at the site of the coil is given by the usual formula

B(t,R) = (µ/4π) curl(m(t) × (R − ξ(t))/|R − ξ(t)|³)

When integrated over the cross-section of the coil and differentiated w.r.t. time and then divided by the coil resistance, this gives the induced current as a function of the position ξ(t) and velocity ξ′(t) of the magnet, and this expression for the current is a quadratic form in the Fermion creation and annihilation operators. The magnetic field produced by the coil current at the site of the magnet can be computed, and hence the force exerted by the coil's magnetic field on the magnet, obtained by taking the dot product of m(t) with this magnetic field followed by the spatial gradient, will be a fourth degree polynomial in the Fermionic creation and annihilation operators. The Heisenberg equation of motion of the magnet will therefore have the form

ξ′′(t) = F (t, ξ(t), ξ′(t), a(k), a(k)∗)− g

where g is the acceleration due to gravity and the rhs is a homogeneous fourth degree polynomial in a, a∗. We can thus express this operator equation as

ξ′′(t) = F1(t, ξ(t), ξ′(t)|kmpq) a(k)a(m)a(p)a(q) + F2(t, ξ(t), ξ′(t)|kmpq) a(k)∗a(m)a(p)a(q) + F3(t, ξ(t), ξ′(t)|kmpq) a(k)∗a(m)∗a(p)a(q) + F4(t, ξ(t), ξ′(t)|km) a(k)∗a(m) + ... (sixteen terms in all) − g

These equations are solved approximately using perturbation theory as follows. We express the solution as

ξ(t) = d− gt2/2 + δξ(t) = ξ0(t) + δξ(t), ξ0(t) = d− gt2/2

where δξ(t) satisfies the first order perturbed equation

δξ′′(t) = F(t, ξ0(t), ξ0′(t), a, a∗)


giving

δξ(t) = ∫_0^t (t − s) F(s, ξ0(s), ξ0′(s), a, a∗) ds

Statistical properties of the quantum fluctuations δξ(t) of the magnet's position in a coherent state of the Fermions within the magnet can now be evaluated. For example, if |ψ(γ) > is such a coherent state, then

a(k)|ψ(γ) >= γ(k)|ψ(γ) >

a(k)∗|ψ(γ) > = ∂/∂γ(k)|ψ(γ) >

where γ(k), γ(k)∗ are Grassmannian numbers, ie, Fermionic parameters. They are assumed to all mutually anticommute, ie,

[γ(k), γ(l)]+ = 0 = [γ(k)∗, γ(l)∗]+,

[γ(k), γ(l)∗]+ = 0

and further they anticommute with the a(m), a(m)∗. To check the consistency of these operations, we have

γ(m)a(k)|ψ(γ) >= γ(m)γ(k)|ψ(γ) >

on the one hand and on the other,

γ(k)γ(m)|ψ(γ) >= −γ(m)γ(k)|ψ(γ) >= −γ(m)a(k)|ψ(γ) >

in agreement with

γ(m)γ(k) = −γ(k)γ(m)

Further,

a(k)a(m)∗|ψ(γ) > = (δ(k,m) − a(m)∗a(k))|ψ(γ) > = (δ(k,m) − a(m)∗γ(k))|ψ(γ) > = (δ(k,m) + γ(k)a(m)∗)|ψ(γ) >

while on the other hand,

a(m)∗a(k)|ψ(γ) >= a(m)∗γ(k)|ψ(γ) >

= −γ(k)a(m)∗|ψ(γ) >

which is a consistent picture.

Let us for example take a single CAR pair a, a∗, so that a² = a∗² = 0, aa∗ + a∗a = 1. Let |0 > be the vacuum state and |1 > the single particle state, so that

a|0 > = 0, a∗|1 > = 0, a∗|0 > = |1 >, a|1 > = |0 >

Then the number operator N = a∗a obviously satisfies

N |0 >= 0, N |1 >= |1 >


Suppose we assume that

|ψ(γ) > = (c1γ + c2γ∗ + c3)|0 > + (c4γ + c5γ∗ + c6)|1 >

Then since aγ = −γa, aγ∗ = −γ∗a, we get

a|ψ(γ) > = (−c4γ − c5γ∗ + c6)|0 >

and for this to equal γ|ψ(γ) >, we require that c2 = c5 = c6 = 0 and c3 = −c4. In other words,

|ψ(γ) > = (c1γ + c3)|0 > − c3γ|1 >

We then get

a∗|ψ(γ) > = (−c1γ + c3)|1 >

Note that we require that

γγ∗ + γ∗γ = 1

We have

∂/∂γ|ψ(γ) > = c1|0 > − c3|1 >

which cannot equal a∗|ψ(γ) >. So to get agreement with this equation also, wemust suppose that

|ψ(γ) > = (c1γ + c2γ∗ + c3γ∗γ + c4)|0 > + (d1γ + d2γ∗ + d3γ∗γ + d4)|1 >

Note that by hypothesis,

γ∗γ + γγ∗ = 0

Then,

a|ψ(γ) > = (−d1γ − d2γ∗ + d3γ∗γ + d4)|0 >

γ|ψ(γ) > = (c2γγ∗ + c4γ)|0 > + (d2γγ∗ + d4γ)|1 >

For equality of the two expressions, we must thus assume that

d2 = d4 = 0, d3 = −c2, d1 = −c4

This gives

|ψ(γ) > = (c1γ + c2γ∗ + c3γ∗γ + c4)|0 > + (−c4γ − c2γ∗γ)|1 >

Now we also require that

a∗|ψ(γ) >= −∂|ψ(γ) > /∂γ

and this gives

−(c1 − c3γ∗)|0 > + (c4 − c2γ∗)|1 > = (−c1γ − c2γ∗ + c3γ∗γ + c4)|1 >

Thus,

c1 = c3 = 0


and we get

|ψ(γ) > = (c2γ∗ + c4)|0 > + (−c4γ − c2γ∗γ)|1 >

As a check, we compute

a|ψ(γ) > = (c4γ − c2γ∗γ)|0 >,

γ|ψ(γ) > = (c2γγ∗ + c4γ)|0 >,

a∗|ψ(γ) > = (−c2γ∗ + c4)|1 >,

∂|ψ(γ) >/∂γ = (−c4 + c2γ∗)|1 >

Now suppose that we have a term, for example a(k)∗a(m)a(n)∗a(r), and we wish to evaluate its expectation value in the coherent state |ψ(γ) >. It will be given by

< ψ(γ)|a(k)∗a(m)a(n)∗a(r)|ψ(γ) > = −γ(r) < ψ(γ)|a(k)∗a(m)a(n)∗|ψ(γ) >

= −γ(r)γ(k)∗ < ψ(γ)|a(m)a(n)∗|ψ(γ) > = −γ(r)γ(k)∗ < ψ(γ)|δ(m,n) − a(n)∗a(m)|ψ(γ) >

= −δ(m,n)γ(r)γ(k)∗ + γ(r)γ(k)∗γ(n)∗γ(m)

The problem is how to interpret this in terms of real numbers.

15.3 Quantum filtering theory applied to quantum field theory for tracking a given probability density function

Consider for example the Klein-Gordon field φ(x) in the presence of an external potential V(φ), whose dynamics after it gets coupled to the electron-positron Dirac field is given by the Lagrangian

L = ψ̄(iγ^µ∂_µ − eφ − m_e)ψ + (1/2)∂_µφ ∂^µφ − m_k²φ²/2 − V(φ) = L_e + L_kg + L_int

where Le, the electronic Lagrangian, is given by

Le = ψ(iγµ∂µ −me)ψ,

L_kg, the KG Lagrangian, is given by

L_kg = (1/2)∂_µφ ∂^µφ − m_k²φ²/2 − V(φ)


and L_int, the interaction Lagrangian between the KG field and the electron field, is given by

L_int = −eψ̄ψφ

Note that we could also include interaction terms of the form given by the real and imaginary parts of

ψ̄γ^{µ1}...γ^{µk}ψ ∂_{µ1}...∂_{µk}φ

The Schrodinger evolution equation for such a system of quantum fields is obtained by introducing Fermionic and Bosonic creation and annihilation operators to describe the non-interacting fields in the presence of zero potential V, then applying perturbation theory to approximately solve for the interacting fields in the presence of the potential V, and then using this approximate solution to write down an expression for the total second quantized field Hamiltonian in terms of the creation and annihilation operators.

The electron-positron field interacting with a KG field and the photon field:

L = (−1/4)F_{µν}F^{µν} + ψ̄(γ^µ(i∂_µ + e1A_µ) + e2φ − m1)ψ + (1/2)∂_µφ ∂^µφ − m²φ²/2

The interaction Hamiltonian derived from this expression is

H_int(t) = e1 ∫ ψ̄(x)γ^µψ(x)A_µ(x) d³x + e2 ∫ ψ̄(x)ψ(x)φ(x) d³x

One of the terms for the scattering amplitude of two electrons in the Dyson series formulation is

e1² < p1′, p2′| ∫ T(ψ̄(x1)γ^µψ(x1) ψ̄(x2)γ^νψ(x2)) D^A_{µν}(x1 − x2) d⁴x1 d⁴x2 |p1, p2 >

Another term is

e2² < p1′, p2′| ∫ T(ψ̄(x1)ψ(x1) ψ̄(x2)ψ(x2)) D^φ(x1 − x2) d⁴x1 d⁴x2 |p1, p2 >

Yet another term is obtained from

e1²e2² ∫ ψ̄(x1)γ^µψ(x1) ψ̄(x2)γ^νψ(x2) ψ̄(x3)ψ(x3) ψ̄(x4)ψ(x4) D^A_{µν}(x1 − x2) D^φ(x3 − x4) dx1 dx2 dx3 dx4

This last term leads to a term of the form

e1²e2² ∫ < p1′, p2′| ψ̄(x4)S_e(x4 − x1)γ^µ S_e(x1 − x2)γ^ν ψ(x2) ψ̄(x3)ψ(x3) |p1, p2 > D^A_{µν}(x1 − x2) D^φ(x3 − x4) dx1 dx2 dx3 dx4


This term can be evaluated easily by expanding in terms of matrix elements orequivalently by using the Kronecker tensor product:

e1²e2² ∫ < p1′, p2′| (ψ̄(x4) ⊗ ψ̄(x3))(S_e(x4 − x1)γ^µ S_e(x1 − x2)γ^ν ⊗ I)(ψ(x2) ⊗ ψ(x3)) |p1, p2 > D^A_{µν}(x1 − x2) D^φ(x3 − x4) dx1 dx2 dx3 dx4

One of the terms in the scattering amplitude of two electrons in the presence of an external electromagnetic potential is

A(p1′, p2′|p1, p2) = < p1′, p2′| ∫ A^c_µ(x1) T(ψ̄(x1)γ^µψ(x1) ψ̄(x2)γ^νψ(x2) ψ̄(x3)γ^ρψ(x3)) D_{νρ}(x2 − x3) dx1 dx2 dx3 |p1, p2 >

= ∫ A^c_µ(x1) < p1′, p2′| T(ψ̄(x1)γ^µ S_e(x1 − x2)γ^ν ψ(x2) ψ̄(x3)γ^ρψ(x3)) |p1, p2 > D_{νρ}(x2 − x3) dx1 dx2 dx3

This can be expressed as

∫ A^c_µ(x1) < p1′, p2′| T((ψ̄(x1) ⊗ ψ̄(x3))(γ^µ S_e(x1 − x2)γ^ν ⊗ γ^ρ)(ψ(x2) ⊗ ψ(x3))) |p1, p2 > D_{νρ}(x2 − x3) dx1 dx2 dx3

which evaluates to

∫ A^c_µ(x1) exp(i(p1′·x1 + p2′·x3 − p1·x2 − p2·x3)) ū(p1′)γ^µ S_e(x1 − x2)γ^ν u(p1) · ū(p2′)γ^ρ u(p2) D_{νρ}(x2 − x3) dx1 dx2 dx3

By the change of variables

x1 − x2 = x, x2 − x3 = y

and by expressing A^c_µ(x1) in terms of its momentum domain Fourier integral, this integral can be expressed as

15.4 Lecture on the role of statistical signal processing in general relativity and quantum mechanics

[1] Using the extended Kalman filter to calculate the estimate of the fluid velocity field when the fluid dynamical equations are described by the general relativistic versions of the Navier-Stokes equation and the equation of continuity and external noise is present. The fluid equations are

((ρ + p)v^µv^ν − pg^{µν})_{:ν} = f^µ


where fµ is a random field. From these equations, we can derive

((ρ + p)v^ν)_{:ν}v^µ + (ρ + p)v^νv^µ_{:ν} − p^{,µ} = f^µ

which, on using v_µv^µ = 1, results in

((ρ + p)v^ν)_{:ν} − p_{,µ}v^µ − f_µv^µ = 0

This represents the general relativistic equation of continuity, and on substituting it into the previous equation, we get

(ρ + p)v^νv^µ_{:ν} − p^{,µ} − f^µ + v^µ(p_{,ν}v^ν + f_νv^ν) = 0

This equation is the general relativistic version of the Navier-Stokes equation. This system of equations can be expressed as a system of four differential equations for v_r, r = 1, 2, 3, and ρ, assuming that the equation of state p = p(ρ) is known. Then, the measurement process consists of measuring the velocity field v_r at a set of N discrete spatial points, taking into account measurement noise, and then applying the EKF to estimate the fields v_r, ρ at all spatial points on a real time basis.
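The predict/update cycle of the EKF described above can be sketched in Python. The two-dimensional state and the dynamics f below are hypothetical stand-ins for the discretized fields (the relativistic Navier-Stokes drift is far larger); only the filter structure is the point:

```python
import numpy as np

# Minimal extended Kalman filter sketch: nonlinear state evolution with
# process noise, partial noisy observation, linearization through the
# Jacobian F_jac. The dynamics here is a made-up pendulum-like map, not
# the relativistic fluid equations.

rng = np.random.default_rng(1)

def f(x):                      # hypothetical nonlinear dynamics (one step)
    return x + 0.01 * np.array([x[1], -np.sin(x[0])])

def F_jac(x):                  # Jacobian of f at x
    return np.eye(2) + 0.01 * np.array([[0, 1], [-np.cos(x[0]), 0]])

H = np.array([[1.0, 0.0]])     # observe the first component only
Qn, Rn = 1e-6 * np.eye(2), 1e-2 * np.eye(1)

x_true = np.array([0.5, 0.0])
x_hat, P = np.array([0.0, 0.0]), np.eye(2)

for _ in range(500):
    x_true = f(x_true) + rng.multivariate_normal(np.zeros(2), Qn)
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), Rn)
    Fk = F_jac(x_hat)                                  # predict
    x_hat, P = f(x_hat), Fk @ P @ Fk.T + Qn
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Rn)      # update
    x_hat = x_hat + K @ (y - H @ x_hat)
    P = (np.eye(2) - K @ H) @ P

print(np.linalg.norm(x_hat - x_true))
```

For the fluid problem, x would collect the values of v_r and ρ at the N grid points and H would select the measured velocity components.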

Applications of quantum statistical signal processing to quantum fluid dy-namics.

Consider the Navier-Stokes equation along with the equation of continuity in the form

v,t(t, r) = F1(t, r, v(t, r),∇v(t, r), ρ(t, r),∇ρ(t, r)) + w(t, r)

ρ_{,t}(t,r) = F2(t, r, v(t,r), ∇ρ(t,r), ∇v(t,r))

Denoting the four component time-varying vector field [ρ(t,r), v(t,r)^T]^T

by ξ(t, r), these equations can be expressed in the form

ξ,t(t, r) = F(t, r, ξ(t, r),∇ξ(t, r)) + w(t, r)

where w is the noise field. Now to quantize this system of equations, there are two available routes. The first is to introduce an auxiliary four vector field η(t,r) which acts as a Lagrange multiplier and write down the Lagrangian density for the pair of fields ξ, η as

L(ξ, η, ξ,t,∇ξ,∇η) = η(t, r)T (ξ,t(t, r) − F (t, r, ξ(t, r),∇ξ(t, r)))

The equations of motion of η are derived from

δξL = 0

which result in

η,t(t, r) + η(t, r)T (∂F/∂ξ)− div(η(t, r)T ∂F/∂∇ξ) = 0


Now we can express the Hamiltonian density of this system of two fields as follows. The canonical momenta are

Pη = ∂L/∂η,t = 0

Pξ = ∂L/∂ξ,t = η

H = ηξ,t − L

It is impossible to bring this to Hamiltonian form owing to the strange nature of the constraints involved. Indeed, to bring this to Hamiltonian form, ξ_{,t} must be expressed in terms of the canonical momenta, ξ, η and their spatial gradients, which is not possible. Therefore, we add a regularizing term to give the Lagrangian density the following form:

L = η^T(ξ_{,t} − F) + ε·ξ_{,t}²/2 + µ·η_{,t}²/2

The canonical momenta now are

P_ξ = ∂L/∂ξ_{,t} = η + ε·ξ_{,t},

P_η = ∂L/∂η_{,t} = µ·η_{,t}

Thus,

H = Pξ.ξ,t + Pη.η,t − L =

15.5 Problems in classical and quantum stochastic calculus

[1] Let P : 0 = t0 < t1 < ... < tn = T be a partition of [0, T]. Let B be standard one dimensional Brownian motion. Let f be adapted to the Brownian filtration and bounded, continuous over [0, T]. Let Q : 0 = s0 < s1 < ... < sm = T be another partition of [0, T]. Define

I(P, f) = Σ_{k=0}^{n−1} f(t_k)(B(t_{k+1}) − B(t_k))

Show that if R = P ∪Q, then

E(I(P, f) − I(P ∪ Q, f))² ≤ T · max_k sup_{t_k<s<t_{k+1}} E(f(s) − f(t_k))²

and that this quantity converges to zero as |P | → 0. Deduce using the inequality

E(I(P, f) − I(Q, f))² ≤ 2E(I(P, f) − I(P ∪ Q, f))² + 2E(I(Q, f) − I(P ∪ Q, f))²


that as |P |, |Q| → 0, we get

E(I(P, f) − I(Q, f))² → 0

and hence that

lim_{|P|→0} I(P, f) = ∫_0^T f(t) dB(t)

exists.
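The mean-square convergence in problem [1] can be illustrated by Monte Carlo, taking f = B (adapted and continuous). For this choice the limiting Ito integral is (B(T)² − T)/2, so its mean is 0 and its second moment is T²/2; a sketch under these assumptions:

```python
import numpy as np

# Monte Carlo sketch: Riemann-Ito sums I(P, f) with f = B over a coarse
# partition versus a fine one. E(I_coarse - I_fine)^2 shrinks with the mesh,
# and E[(int_0^T B dB)^2] ≈ T^2/2.

rng = np.random.default_rng(2)
T, n_fine, n_paths = 1.0, 2**12, 2000
dt = T / n_fine

dB = np.sqrt(dt) * rng.standard_normal((n_paths, n_fine))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

def ito_sum(stride):
    # I(P, f) on the partition with spacing stride*dt, evaluating f = B
    # at the left endpoints (the Ito choice)
    idx = np.arange(0, n_fine + 1, stride)
    incr = B[:, idx[1:]] - B[:, idx[:-1]]
    return np.sum(B[:, idx[:-1]] * incr, axis=1)

coarse, fine = ito_sum(64), ito_sum(1)
msq = np.mean((coarse - fine)**2)          # -> 0 as the mesh |P| -> 0
print(msq, np.mean(fine**2))
```

For f = B the exact value of E(I(P, f) − ∫B dB)² is T·|P|/2, which the printed msq approximates.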

[2] Derive the quantum Fokker-Planck or Master equation or GKSL equation from Hamiltonian quantum mechanics along the following lines: Let H(t) = H0 + V(t) be the Hamiltonian, where H0 is non-random and V(t) is an operator valued Hermitian stochastic process with dt²·E(V(t) ⊗ V(t)) being of O(dt), just as in Ito's formula for Brownian motion. Here V(t) is to be regarded as operator valued white noise, so that V(t)dt becomes an operator valued Brownian differential. Then consider the Schrodinger equation

iρ′(t) = [H0 + V (t), ρ(t)] = [H(t), ρ(t)]

Show that this implies approximately for small τ that

i(ρ(t+τ) − ρ(t)) = ∫_t^{t+τ} [H(s), ρ(s)] ds = ∫_t^{t+τ} [H(s), ρ(t) + ρ′(t)(s − t)] ds + O(τ³)

= [H(t), ρ(t)]τ − i[H(t), [H(t), ρ(t)]]τ²/2

which gives on taking average over the probability distribution of V (t),

ρ(t+τ) − ρ(t) = −i[H0, ρ(t)]τ − (τ²/2) < [V(t), [V(t), ρ(t)]] >

= −i[H0, ρ(t)]τ − (τ²/2)(< V(t)²ρ(t) + ρ(t)V(t)² − 2V(t)ρ(t)V(t) >)

Now, write

V (t) =∑

k

ξ(k)Lk

where the ξ(k) are iid r.v.'s with zero mean and variance 1/τ, in accordance with the assumption that V(t) is operator valued white noise. The L_k's are assumed to be self-adjoint operators in the Hilbert space. Then show that the above equation gives

ρ′(t) = −i[H0, ρ(t)] − (1/2)Σ_k (L_k²ρ(t) + ρ(t)L_k² − 2L_kρ(t)L_k)

which is a special case of the master equation for open quantum systems.
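This master equation can be integrated numerically. A minimal sketch for a single qubit with one self-adjoint L (Euler steps; the operators chosen below are illustrative): the trace is preserved and the populations relax toward a mixture.

```python
import numpy as np

# Euler integration of rho' = -i[H0, rho] - (1/2)(L^2 rho + rho L^2 - 2 L rho L)
# for a qubit with a self-adjoint Lindblad operator L (here proportional to
# sigma_x; an illustrative choice).

H0 = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
L = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)   # self-adjoint
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)       # pure |0><0|

dt = 1e-3
for _ in range(5000):
    diss = L @ L @ rho + rho @ L @ L - 2 * L @ rho @ L
    rho = rho + dt * (-1j * (H0 @ rho - rho @ H0) - 0.5 * diss)

print(np.trace(rho).real)        # stays 1: the evolution is trace preserving
print(rho[0, 0].real)            # relaxes toward the mixed value 1/2
```

Each Euler increment has exactly zero trace (both the commutator and the dissipator are traceless), which is why the normalization survives the time stepping.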

[3] Quantum fluctuation dissipation theorem: The general Lindblad term in the master equation has the form

Σ_k (L_k∗L_kρ + ρL_k∗L_k − 2L_kρL_k∗)


For ρ = C(β)exp(−βH0) to satisfy the GKSL equation as an equilibrium state, we require the quantum fluctuation-dissipation theorem to hold, ie,

Σ_k (L_k∗L_k exp(−βH) + exp(−βH)L_k∗L_k − 2L_k exp(−βH)L_k∗) = 0

Problem: let

H = P²/2m + U(Q)

in L2(R) where P = −id/dQ and put

Lk = gk(Q,P ) = αkQ+ βkP

Evaluate the conditions of the quantum fluctuation-dissipation theorem.

Hint:

exp(−βH)·F(Q,P)·exp(βH) = exp(−β adH)(F) = F + Σ_{n≥1} ((−β)^n/n!)(adH)^n(F)

Now,

[H,Q] = −iP/m, [H,P] = iU′(Q)

Make the induction hypothesis

ad(H)^n(Q) = Σ_{k=0}^n F_{nk}(Q)P^k, ad(H)^n(P) = Σ_{k=0}^n G_{nk}(Q)P^k

Deduce recursion formulae for the {F_{nk}(Q)} and {G_{nk}(Q)}.

[4] Evaluate exp(−βP²/2m)·exp(−βU(Q)) in terms of exp(−β(P²/2m + U(Q))) and vice versa and multiple commutators between P and Q.

hint: Let A,B be two operators. Put

exp(tA).exp(tB) = exp(Z(t))

Then,

exp(Z)g(adZ)(Z′) = A exp(Z) + exp(Z)B

where

g(z) = (1 − exp(−z))/z

Thus,

g(adZ)(Z′) = exp(−adZ)(A) + B

or equivalently

Z′ = f1(adZ)(A) + f2(adZ)(B)

where

f1(z) = g(z)^{−1} exp(−z), f2(z) = g(z)^{−1}
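The relation exp(tA)exp(tB) = exp(Z(t)) can be checked numerically to leading orders: the BCH expansion gives Z(t) = t(A + B) + (t²/2)[A,B] + O(t³). A sketch with truncated power series for the matrix exponential and logarithm (adequate for small t):

```python
import numpy as np

# Verify Z(t) = log(exp(tA) exp(tB)) ≈ t(A+B) + (t^2/2)[A,B] for small t,
# computing exp and log by truncated power series.

def mexp(M, terms=30):
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def mlog(M, terms=30):          # log(I + X), valid for small X
    X = M - np.eye(M.shape[0])
    out, power = np.zeros_like(X), np.eye(M.shape[0])
    for k in range(1, terms):
        power = power @ X
        out = out + ((-1)**(k + 1) / k) * power
    return out

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
t = 1e-2

Z = mlog(mexp(t * A) @ mexp(t * B))
bch2 = t * (A + B) + (t**2 / 2) * (A @ B - B @ A)
print(np.linalg.norm(Z - bch2))  # O(t^3) residual
```

The residual is dominated by the third-order BCH terms ([A,[A,B]] + [B,[B,A]])/12 scaled by t³.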


15.6 Some remarks on the equilibrium Fokker-Planck equation for the Gibbs distribution

The quantum fluctuation-dissipation theorem for a single Lindblad operator is

L∗Lexp(−βH) + exp(−βH)L∗L− 2Lexp(−βH)L∗ = 0

This is the condition on the Lindblad operator L so that ρ = exp(−βH)/Tr(exp(−βH)) satisfies the steady state GKSL equation:

0 = i∂tρ = [H, ρ]− (1/2)(L∗Lρ+ ρL∗L− 2LρL∗)

Here

H = P²/2m + U(Q), P = −i∂/∂Q

The classical version of this is

0 = ∂tf(Q,P ) = −(P/m)∂Qf + U ′(Q)∂P f + γ∂P (Pf) + (σ2/2)∂2P f

with

f = C exp(−βH), H = P²/2m + U(Q)

Note that this classical Fokker-Planck equation is derived from the Ito sde

dQ = Pdt/m, dP = −γPdt− U ′(Q)dt+ σdB(t)

Now,

∂_P f = −βPf/m, ∂²_P f = −βf/m + β²P²f/m², ∂_Q f = −βU′(Q)f

So the above classical equilibrium condition becomes

0 = (βP/m)U′(Q) + U′(Q)(−βP/m) + γ(1 − βP²/m) + (σ²/2)(−β/m + β²P²/m²)

This is satisfied iff

γ − σ²β/2m = 0

or equivalently,

2kTmγ = σ²

which is the classical fluctuation-dissipation theorem as first derived by Einstein. In the quantum context, the dissipation coefficient γ and the diffusion/fluctuation coefficient σ² are both represented by the Lindblad operator L, as follows from the derivation of the master equation from the Hudson-Parthasarathy-Schrodinger equation

dU(t) = (−(iH + L∗L/2)dt+ LdA− L∗dA∗)U(t)

by partial tracing of the density evolution over the bath state. Now consider the Heisenberg quantum equations of motion for Q, P for the master equation: for any Heisenberg observable X, the dual of the master equation is

X′ = i[H,X] − (1/2)(XL∗L + L∗LX − 2L∗XL) = i[H,X] − (1/2)([X,L∗]L + L∗[L,X])


In particular for X = Q,

Q′ = P/m− (1/2)([Q,L∗]L+ L∗[L,Q])

and for X = P ,

P ′ = −U ′(Q)− (1/2)([P,L∗]L+ L∗[L, P ])

Now to get an equation that describes the motion of a particle in a potential U(Q) with linear velocity damping, we assume that

L = aQ + bP, L∗ = a∗Q + b∗P

Then,

[L,Q] = [aQ + bP, Q] = −ib, [Q,L∗] = [Q, a∗Q + b∗P] = ib∗

[Q,L∗]L + L∗[L,Q] = ib∗(aQ + bP) + (a∗Q + b∗P)(−ib) = 2 Im(a∗b) Q

[P,L∗]L + L∗[L,P] = (−ia∗)(aQ + bP) + (a∗Q + b∗P)(ia) = 2 Im(a∗b) P

Thus the Heisenberg-Lindblad equations read

Q′ = P/m − Im(a∗b) Q,

P′ = −U′(Q) − Im(a∗b) P

Eliminating P between these two equations gives us

Q′′ = (−U′(Q) − m·Im(a∗b)(Q′ + Im(a∗b)Q))/m − Im(a∗b)Q′

or

mQ′′ = −U′(Q) − m(Im(a∗b))²Q − 2m·Im(a∗b)Q′

which is the velocity damped equation of motion in a potential U(Q) + mγ²Q²/8 with velocity damping coefficient γ = 2Im(a∗b). The harmonic correction to the potential U(Q) is actually a quantum correction, as can be verified by not assuming Planck's constant h = 1 as we have done here. (Here h stands for Planck's constant divided by 2π.) Now, the quantum fluctuation-dissipation theorem can also be expressed as

L∗L+ exp(−βadH)(L∗L)− 2Lexp(−βadH)(L∗) = 0

For high temperatures or equivalently small β, this reads

2[L∗, L] − β([H,L∗L] − 2L[H,L∗]) = 0

where O(β2) terms have been neglected. Now for L = aQ+ bP , we get

[L∗, L] = [a∗Q + b∗P, aQ + bP] = ia∗b − iab∗ = −2 Im(a∗b)

[H,L∗L] = [H,L∗]L+ L∗[H,L]

so

[H,L∗L] − L[H,L∗] = [[H,L∗], L] + L∗[H,L]
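The commutator identities used above for L = aQ + bP, L∗ = a∗Q + b∗P can be checked numerically in a truncated harmonic-oscillator basis. The truncation spoils [Q,P] = i only in the last row and column, so the identities are verified on the upper-left block; a sketch:

```python
import numpy as np

# Check [Q,L*]L + L*[L,Q] = 2 Im(a* b) Q  and the analogous P identity,
# with Q, P represented by N-level truncated oscillator matrices.

N = 30
levels = np.arange(1, N)
adag = np.diag(np.sqrt(levels), -1)        # creation operator
aop = adag.T                               # annihilation operator
Q = (aop + adag) / np.sqrt(2)
P = (aop - adag) / (1j * np.sqrt(2))

a, b = 0.7 + 0.2j, 0.3 - 0.5j
L = a * Q + b * P
Ls = np.conj(a) * Q + np.conj(b) * P       # L*

def comm(A, B):
    return A @ B - B @ A

k = N - 2                                  # avoid the truncation edge
lhs = comm(Q, Ls) @ L + Ls @ comm(L, Q)
rhs = 2 * np.imag(np.conj(a) * b) * Q
assert np.allclose(lhs[:k, :k], rhs[:k, :k])

lhsP = comm(P, Ls) @ L + Ls @ comm(L, P)
rhsP = 2 * np.imag(np.conj(a) * b) * P
assert np.allclose(lhsP[:k, :k], rhsP[:k, :k])
```

The damping coefficient γ = 2Im(a∗b) of the Heisenberg-Lindblad equations is exactly the scalar appearing on the right-hand sides.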


15.7 Wigner-distributions in quantum mechanics

Since Q, P do not commute in quantum mechanics, we cannot talk of a joint probability density of these two at any time, unlike the classical case. So how do we interpret the Lindblad equation

i∂tρ = [H, ρ]− (1/2)(L∗Lρ+ ρL∗L− 2LρL∗)

as a quantum Fokker-Planck equation for the joint density of (Q, P)? Wigner suggested the following method: Let ρ(t,Q,Q′) = < Q|ρ(t)|Q′ > denote ρ(t) in the position representation. Then ρ(t) in the momentum representation is given by

ρ(t,P,P′) = < P|ρ(t)|P′ > = ∫ < P|Q >< Q|ρ(t)|Q′ >< Q′|P′ > dQ dQ′ = ∫ ρ(t,Q,Q′) exp(−iQP + iQ′P′) dQ dQ′

ρ(t,Q,Q) is the probability density of Q(t) and ρ(t,P,P) is the probability density of P(t). Both are non-negative and integrate to unity. Now Wigner defined the function

W(t,Q,P) = ∫ ρ(t, Q + q/2, Q − q/2) exp(−iPq) dq/(2π)

This is in general a complex function of Q, P and therefore cannot be a probability density. However,

∫ W(t,Q,P) dP = ρ(t,Q,Q),

the marginal density of Q(t). Further, we can write

W(t,Q,P) = ∫ ρ(t,P1,P2) exp(iP1(Q + q/2) − iP2(Q − q/2)) exp(−iPq) dP1 dP2 dq/(2π)²

= ∫ ρ(t,P1,P2) exp(i(P1 − P2)Q) δ((P1 + P2)/2 − P) dP1 dP2/(2π)

so that

∫ W(t,Q,P) dQ = ∫ ρ(t,P1,P2) δ(P1 − P2) δ((P1 + P2)/2 − P) dP1 dP2 = ρ(t,P,P)

is the marginal density of P(t). This suggests strongly that the complex valued function W(t,Q,P) should be regarded as the quantum analogue of the joint probability density of (Q(t), P(t)). Our aim is to derive the pde satisfied by W


when ρ(t) satisfies the quantum Fokker-Planck equation. To this end, observe that

[P², ρ]ψ(Q) = −∫ ρ,11(Q,Q′)ψ(Q′) dQ′ + ∫ ρ(Q,Q′)ψ′′(Q′) dQ′ = ∫ (−ρ,11 + ρ,22)(Q,Q′)ψ(Q′) dQ′

which means that the position space kernel of [P 2, ρ] is

[P 2, ρ](Q,Q′) = (−ρ,11 + ρ,22)(Q,Q′)

Also

[U, ρ](Q,Q′) = (U(Q) − U(Q′))ρ(Q,Q′)

L∗Lρ(Q,Q′) = (a∗Q + b∗P)(aQ + bP)ρ(Q,Q′)

= |a|²Q²ρ(Q,Q′) − |b|²ρ,11(Q,Q′) − ia∗b Qρ,1(Q,Q′) − iab∗(ρ(Q,Q′) + Qρ,1(Q,Q′))

How to transform from W (Q,P ) to ρ(Q,Q′)?

W(Q,P) = ∫ ρ(Q + q/2, Q − q/2) exp(−iPq) dq/(2π)

gives

∫ W(Q,P) exp(iPq) dP = ρ(Q + q/2, Q − q/2)

or equivalently,

∫ W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP = ρ(Q,Q′)

So

ρ,1(Q,Q′) = ∫ ((1/2)W,1((Q + Q′)/2, P) + iPW((Q + Q′)/2, P)) exp(iP(Q − Q′)) dP

Also note that

Qρ(Q,Q′) = ∫ QW((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

= ∫ ((Q + Q′)/2 + (Q − Q′)/2) W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

= ∫ (((Q + Q′)/2)W((Q + Q′)/2, P) + (i/2)W,2((Q + Q′)/2, P)) exp(iP(Q − Q′)) dP

on using integration by parts. Consider now the term

(U(Q)− U(Q′))ρ(Q,Q′)


In the classical case, ρ(Q,Q′) is diagonal and so such a term contributes only when Q′ is in the vicinity of Q. For example, if we have

ρ(Q,Q′) = ρ1(Q)δ(Q′ −Q) + ρ2(Q)δ′(Q′ −Q)

then,

(U(Q′)− U(Q))ρ(Q,Q′) = ρ2(Q)(U(Q′)− U(Q))δ′(Q′ −Q)

or equivalently,

∫ (U(Q′) − U(Q))ρ(Q,Q′)f(Q′) dQ′ = −∂_{Q′}(ρ2(Q)(U(Q′) − U(Q))f(Q′))|_{Q′=Q} = −ρ2(Q)U′(Q)f(Q)

which is equivalent to saying that

(U(Q′)− U(Q))ρ(Q,Q′) = −ρ2(Q)U ′(Q)δ(Q′ −Q)

just as in the classical case. In the general quantum case, we can write

ρ(Q,Q′)f(Q′)dQ′ =∑

n≥0

ρn(Q)∂nQf(Q)

This can be seen by writing

f(Q′) = Σ_n f^{(n)}(Q)(Q′ − Q)^n/n!

and then defining

∫ ρ(Q,Q′)(Q′ − Q)^n dQ′/n! = ρ_n(Q)

Now, this is equivalent to saying that the kernel of ρ in the position representation is given by

ρ(Q,Q′) = Σ_n ρ_n(Q)δ^{(n)}(Q − Q′)

and hence

∫ (U(Q′) − U(Q))ρ(Q,Q′)f(Q′) dQ′ = Σ_n ρ_n(Q) ∫ δ^{(n)}(Q − Q′)(U(Q′) − U(Q))f(Q′) dQ′

= Σ_n ρ_n(Q)(−1)^n ∂^n_{Q′}((U(Q′) − U(Q))f(Q′))|_{Q′=Q}

= Σ_{n≥0, 1≤k≤n} ρ_n(Q)(−1)^n C(n,k) U^{(k)}(Q) f^{(n−k)}(Q)


Thus,

(U(Q′) − U(Q))ρ(Q,Q′) = Σ_{n≥0, 1≤k≤n} ρ_n(Q)(−1)^k C(n,k) U^{(k)}(Q) δ^{(n−k)}(Q′ − Q)

This is the quantum generalization of the classical term

U′(Q)∂_P f(Q,P)

which appears in the classical Fokker-Planck equation. Another way to express this kernel is to directly write

(U(Q′) − U(Q))ρ(Q,Q′) = Σ_{n≥1} U^{(n)}(Q)(Q′ − Q)^n ρ(Q,Q′)/n!

From the Wigner-distribution viewpoint, it is more convenient to write

U(Q′) − U(Q) = U((Q + Q′)/2 − (Q − Q′)/2) − U((Q + Q′)/2 + (Q − Q′)/2)

= −Σ_{k≥0} U^{(2k+1)}((Q + Q′)/2)(Q − Q′)^{2k+1}/((2k+1)! 2^{2k})

Consider a particular term U^{(n)}((Q + Q′)/2)(Q − Q′)^n ρ(Q,Q′) in this expansion:

U^{(n)}((Q + Q′)/2)(Q − Q′)^n ρ(Q,Q′) = U^{(n)}((Q + Q′)/2)(Q − Q′)^n ∫ W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

= i^n ∫ U^{(n)}((Q + Q′)/2) ∂^n_P(W((Q + Q′)/2, P)) exp(iP(Q − Q′)) dP

and thus

(U(Q′) − U(Q))ρ(Q,Q′) = −2 ∫ exp(iP(Q − Q′)) Σ_{k≥0} (i/2)^{2k+1}((2k+1)!)^{−1} U^{(2k+1)}((Q + Q′)/2) ∂^{2k+1}_P W((Q + Q′)/2, P) dP

Also note that

ρ,11(Q,Q′) = ∂²_Q ∫ W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

= ∫ ((1/4)W,11((Q + Q′)/2, P) − P²W((Q + Q′)/2, P) + iPW,1((Q + Q′)/2, P)) exp(iP(Q − Q′)) dP

and likewise,

ρ,22(Q,Q′) = ∫ ((1/4)W,11((Q + Q′)/2, P) − P²W((Q + Q′)/2, P) − iPW,1((Q + Q′)/2, P)) exp(iP(Q − Q′)) dP


Thus, the noiseless quantum Fokker-Planck equation (or equivalently, the Liouville-Schrodinger-von-Neumann equation)

iρ′(t) = [(P 2/2m+ U(Q)), ρ(t)]

is equivalent to

i∂_t W(t, (Q + Q′)/2, P) = (−iP/m)W,1(t, (Q + Q′)/2, P) + 2Σ_{k≥0} (i/2)^{2k+1}((2k+1)!)^{−1} U^{(2k+1)}((Q + Q′)/2) ∂^{2k+1}_P W(t, (Q + Q′)/2, P)

or equivalently,

i∂_t W(t,Q,P) = (−iP/m)W,1(t,Q,P) + iU′(Q)∂_P W(t,Q,P) + 2Σ_{k≥1} (i/2)^{2k+1}((2k+1)!)^{−1} U^{(2k+1)}(Q) ∂^{2k+1}_P W(t,Q,P)

This can be expressed in the form of a classical Liouville equation with quantum correction terms:

∂_t W(t,Q,P) = (−P/m)∂_Q W(t,Q,P) + U′(Q)∂_P W(t,Q,P) + Σ_{k≥1} ((−1)^k/(2^{2k}(2k+1)!)) U^{(2k+1)}(Q) ∂^{2k+1}_P W(t,Q,P)

The last term is the quantum correction term.

Remark: To see that the last term is indeed a quantum correction term, we have to use general units in which Planck's constant is not set to unity. Thus, we define

W(Q,P) = C ∫ ρ(Q + q/2, Q − q/2) exp(−iPq/h) dq

For ∫ W(Q,P) dP = ρ(Q,Q), we must set

2πhC = 1

Then,

ρ(Q + q/2, Q − q/2) = ∫ W(Q,P) exp(iPq/h) dP

Thus, with P = −ih∂_Q, we get

[P²/2m, ρ](Q,Q′) = (h²/2m)(−ρ,11(Q,Q′) + ρ,22(Q,Q′))

= (h²/2m)(−∂²_Q + ∂²_{Q′}) ∫ W((Q + Q′)/2, P) exp(iP(Q − Q′)/h) dP

= (−ih/m) ∫ P W,1((Q + Q′)/2, P) exp(iP(Q − Q′)/h) dP


Further,

(U(Q) − U(Q′))ρ(Q,Q′) = Σ_{k≥0} 2^{−2k}((2k+1)!)^{−1} U^{(2k+1)}((Q + Q′)/2)(Q − Q′)^{2k+1} ρ(Q,Q′)

= Σ_k ∫ 2^{−2k}((2k+1)!)^{−1}(ih)^{2k+1} U^{(2k+1)}((Q + Q′)/2) ∂^{2k+1}_P W((Q + Q′)/2, P) exp(iP(Q − Q′)/h) dP

and hence, from the Schrodinger equation

ih∂_t ρ = [H, ρ] = [P²/2m, ρ] + [U(Q), ρ],

we get the equation

ih∂_t W(t,Q,P) = (−ih/m)P∂_Q W(t,Q,P) + ihU′(Q)∂_P W(t,Q,P) + ih Σ_{k≥1} (−1)^k 2^{−2k} h^{2k}((2k+1)!)^{−1} U^{(2k+1)}(Q) ∂^{2k+1}_P W(t,Q,P)

or equivalently,

∂_t W(t,Q,P) = (−P/m)∂_Q W(t,Q,P) + U′(Q)∂_P W(t,Q,P) + Σ_{k≥1} (−h²/4)^k((2k+1)!)^{−1} U^{(2k+1)}(Q) ∂^{2k+1}_P W(t,Q,P)

which clearly shows that the quantum correction to the Liouville equation is a power series in h², beginning with the first power of h². Note that the quantum corrections involve ∂^{2k+1}_P W(t,Q,P), k = 1, 2, ..., but a diffusion term of the form ∂²_P W is absent. It will be present only when we include the Lindblad terms, for it is these terms that describe the effect of a noisy bath on our quantum system.
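The Wigner construction above can be illustrated numerically (with h = 1) for the pure Gaussian state ψ(Q) = π^{−1/4}e^{−Q²/2}, for which W(Q,P) = (1/π)e^{−Q²−P²}; both marginals should come out as Gaussians of variance 1/2. A sketch:

```python
import numpy as np

# Compute W(Q,P) = (1/2π) ∫ rho(Q+q/2, Q-q/2) e^{-iPq} dq on a grid for
# rho(Q,Q') = psi(Q) psi(Q') with the Gaussian ground state psi, and check
# that the P-marginal gives |psi(Q)|^2 and the Q-marginal gives the
# momentum density. The integrand is even in q, so cos replaces e^{-iPq}.

Nq, Lq = 256, 20.0
q = np.linspace(-Lq/2, Lq/2, Nq, endpoint=False)
dq = q[1] - q[0]
P = np.linspace(-6, 6, 121)
dP = P[1] - P[0]

psi = np.pi**-0.25 * np.exp(-q**2 / 2)

Qgrid = np.linspace(-4, 4, 81)
W = np.zeros((len(Qgrid), len(P)))
for i, Q0 in enumerate(Qgrid):
    integ = np.interp(Q0 + q/2, q, psi) * np.interp(Q0 - q/2, q, psi)
    W[i] = (integ[None, :] * np.cos(np.outer(P, q))).sum(axis=1) * dq / (2*np.pi)

margQ = W.sum(axis=1) * dP                     # ≈ |psi(Q)|^2
assert np.allclose(margQ, np.interp(Qgrid, q, psi)**2, atol=1e-3)
margP = W.sum(axis=0) * (Qgrid[1] - Qgrid[0])  # ≈ e^{-P^2}/sqrt(pi)
assert np.allclose(margP, np.exp(-P**2) / np.sqrt(np.pi), atol=1e-3)
```

For this state W happens to be nonnegative; for non-Gaussian states it generally takes negative values, which is why it is only an analogue of a joint density.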

Construction of the quantum Fokker-Planck equation for a specific choice of the Lindblad operator

Here,

H = P²/2m + U(Q), L = g(Q)

Assume that g is a real function. Then, the Schrodinger equation for the mixed state ρ is given by

∂tρ = −i[H, ρ]− (1/2)θ(ρ)

where

θ(ρ) = L∗Lρ + ρL∗L − 2LρL∗

Thus, the kernel of this in the position representation is given by

θ(ρ)(Q,Q′) = (g(Q)² + g(Q′)² − 2g(Q)g(Q′))ρ(Q,Q′) = (g(Q) − g(Q′))²ρ(Q,Q′)
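This kernel formula is easy to verify on a position grid, where L = g(Q) becomes a diagonal matrix; a minimal sketch:

```python
import numpy as np

# Check theta(rho)(Q,Q') = (g(Q) - g(Q'))^2 rho(Q,Q') elementwise, with
# L = g(Q) diagonal in the position representation and rho an arbitrary
# Hermitian matrix standing in for the density kernel.

rng = np.random.default_rng(3)
n = 50
Qs = np.linspace(-1, 1, n)
g = np.diag(Qs**2)                          # L = g(Q), a real function of Q

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho = A + A.conj().T                        # Hermitian "density kernel"

theta = g @ g @ rho + rho @ g @ g - 2 * g @ rho @ g
kernel = (Qs[:, None]**2 - Qs[None, :]**2)**2 * rho
assert np.allclose(theta, kernel)
```

The diagonal of θ(ρ) vanishes, so such dephasing Lindblad operators leave the position density untouched while damping the off-diagonal coherences.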


More generally, if

θ(ρ) = Σ_k (L_k∗L_kρ + ρL_k∗L_k − 2L_kρL_k∗)

where

Lk = gk(Q)

are real functions, then

θ(ρ)(Q,Q′) = F (Q,Q′)ρ(Q,Q′)

where

F(Q,Q′) = Σ_k (g_k(Q) − g_k(Q′))²

We define

G(Q, q) = F (Q + q/2, Q− q/2)

or equivalently,

F (Q,Q′) = G((Q +Q′)/2, Q−Q′)

Then,

θ(ρ)(Q,Q′) = G((Q + Q′)/2, Q − Q′)ρ(Q,Q′) = Σ_{n≥0} G_n((Q + Q′)/2)(Q − Q′)^n ∫ W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

= Σ_{n≥0} G_n((Q + Q′)/2) i^n ∫ ∂^n_P W((Q + Q′)/2, P) exp(iP(Q − Q′)) dP

and this contributes a term

−(1/2) Σ_{n≥0} i^n G_n(Q) ∂^n_P W(Q,P)

= (−1/2)G_0(Q)W(Q,P) − (i/2)G_1(Q)∂_P W(Q,P) + (1/2)G_2(Q)∂²_P W(Q,P) − (1/2) Σ_{n>2} i^n G_n(Q) ∂^n_P W(Q,P)

The term (1/2)G_2(Q)∂²_P W(Q,P) is the quantum analogue of the classical diffusion term in the Fokker-Planck equation. Here, we are assuming an expansion

G(Q,q) = Σ_n G_n(Q)q^n

ie,

G_n(Q) = (n!)^{−1} ∂^n_q G(Q,q)|_{q=0}


Problems in quantum corrections to classical theories in probability theory and in mechanics

Consider the Lindblad term in the density evolution problem:

θ(ρ) = L∗Lρ + ρL∗L − 2LρL∗

where

L = g0(Q) + g1(Q)P

We have

L∗ = g0(Q)∗ + P g1(Q)∗

L∗Lψ(Q) = (g0∗ + P g1∗)(g0 + g1P)ψ(Q) = |g0(Q)|²ψ(Q) + P|g1(Q)|²Pψ(Q) + g0∗g1Pψ(Q) + P g1∗g0ψ(Q)

Now, defining

|g1(Q)|² = f1(Q), |g0(Q)|² = f0(Q),

we get

P f1(Q) P ψ(Q) = −∂_Q(f1(Q)ψ′(Q)) = −f1′(Q)ψ′(Q) − f1(Q)ψ′′(Q)

P g1(Q)∗ g0(Q) ψ(Q) = −i(g1∗g0)′(Q)ψ(Q) − i g1(Q)∗ g0(Q)ψ′(Q)

Likewise for the other terms. Thus, L∗L has the following kernel in the position representation:

L∗L(Q,Q′) = −f1(Q)δ′′(Q − Q′) + f2(Q)δ′(Q − Q′) + f3(Q)δ(Q − Q′)

where f1(Q) is real positive and f2, f3 are complex functions. Thus,

L∗Lρ(Q,Q′) = −f1(Q)∂²_Q ρ(Q,Q′) + f2(Q)∂_Q ρ(Q,Q′) + f3(Q)ρ(Q,Q′)

ρL∗L(Q,Q′) = ∫ ρ(Q,Q′′)L∗L(Q′′,Q′) dQ′′ = −∂²_{Q′}(ρ(Q,Q′)f1(Q′)) − ∂_{Q′}(ρ(Q,Q′)f2(Q′)) + ρ(Q,Q′)f3(Q′)

Likewise,

LρL∗(Q,Q′) = g1(Q)g1(Q′)∗ ∂_Q∂_{Q′} ρ(Q,Q′) + T

where T contains terms of the form a function of Q,Q′ times first order partial derivatives in Q,Q′. Putting all the terms together, we find that the Lindblad contribution to the master equation has the form

(−1/2)θ(ρ)(Q,Q′) = (1/2)(f1(Q)∂²_Q + f1(Q′)∂²_{Q′})ρ(Q,Q′) + g1(Q)g1(Q′)∗ ∂_Q∂_{Q′} ρ(Q,Q′) + h1(Q,Q′)∂_Q ρ(Q,Q′) + h2(Q,Q′)∂_{Q′} ρ(Q,Q′) + h3(Q,Q′)ρ(Q,Q′)


15.8 Motion of an element having both an electric and a magnetic dipole moment in an external electromagnetic field plus a coil

Let m, p denote respectively the magnetic and electric dipole moments of the element at time t = 0. Let E(t,r), B(t,r) denote the external electric and magnetic fields. After time t, the element undergoes a rotation S(t) ∈ SO(3), so that its magnetic moment becomes

m(t) = S(t)m,

and its electric dipole moment becomes

p(t) = S(t)p

Assume that after time t, the centre of the element is located at the position R(t). It should be noted that if n(t) denotes the unit vector along the length of the element at time t, then we can write

m(t) = K1n(t), p(t) = K2n(t)

where K1,K2 are constants. Thus, we can write

m(t) = K1S(t)n(0), p(t) = K2S(t)n(0)

The angular velocity tensor Ω(t) in the lab frame can be expressed using

S′(t) = Ω(t)S(t)

or equivalently,

Ω(t) = S′(t)S(t)^{−1} = S′(t)S(t)^T

The angular velocity tensor in the rest frame of the element is

Ωr(t) = S(t)−1Ω(t)S(t) = S(t)−1S′(t) = S(t)TS′(t)

The total kinetic energy of the element due to translational motion of its cm and rotational motion around its cm is given by

K(t) = (1/2)Tr(S′(t)JS′(t)^T) + M R′(t)^2/2

The interaction potential energy between the magnetic moment of the element and the magnetic field, plus that between the electric dipole moment and the electric field, is given by

V(t) = −(m(t), B(t, R(t))) − (p(t), E(t, R(t))) = −m^T S(t)^T B(t, R(t)) − p^T S(t)^T E(t, R(t))

and hence, taking into account the gravitational potential energy, the total Lagrangian of the element is given by

L(R(t), R′(t), S(t), S′(t)) = (1/2)Tr(S′(t)^T JS′(t)) + M R′(t)^2/2 + m^T S(t)^T B(t, R(t)) + p^T S(t)^T E(t, R(t)) − Mg e3^T R(t)


where e3 is the unit vector along the z direction. While writing the Euler-Lagrange equations of motion from this Lagrangian, we must take into account the constraint S(t)^T S(t) = I by adding a Lagrange multiplier term −Tr(Λ(t)(S(t)^T S(t) − I)) to the above Lagrangian.

Remark: The magnetic vector potential produced by a current loop Γ carrying a current I(t) is given by the retarded potential formula in the form of a line integral

A(t, r) = (µ/4π)∮_Γ I(t − |r − r′|/c)dr′/|r − r′|

This approximates in the far field zone |r| >> |r′|, using |r − r′| ≈ r − r·r′/r, to

∮_Γ I(t − r/c + r·r′/(cr))(1 + r·r′/r^2)dr′/r

which further approximates to

(I(t − r/c)/r^3)∮_Γ (r·r′)dr′ + (I′(t − r/c)/(cr^2))∮_Γ (r·r′)dr′

which can be expressed in terms of the magnetic moment

m(t) = (I(t)/2)∮_Γ r′ × dr′

as

A(t, r) = (µ/4π)[m(t − r/c) × r/r^3 + m′(t − r/c) × r/(cr^2)]

Remark: The interaction energy between a system of charged particles in motion and a magnetic field is derived from the Hamiltonian term

Σ_i (p_i − e_i A(t, r_i))^2/2m_i

The interaction term in this expression is

−Σ_i e_i(p_i, A(t, r_i))/m_i

However, noting the relationship between momentum and velocity in a magnetic field: p_i = m_i v_i + e_i A(t, r_i), it follows that the interaction energy should be taken as

−Σ_i e_i(v_i, A(t, r_i))


Now if the system of charged particles spans a small region of space around the point r, and in this region the magnetic field does not vary too rapidly, then we can approximate the above interaction energy term by

U = −Σ_i e_i(v_i, A(t, r + r′_i)) ≈ −Σ_i e_i(v_i, (r′_i, ∇)A(t, r))

where r′i = ri − r. Now assuming A to be time independent,

d/dt(r′_i, (r′_i, ∇)A) = (v_i, (r′_i, ∇)A) + (r′_i, (v_i, ∇)A)

For bounded motion, the time average of the rhs is zero, and denoting time averages by < . >, we get

< (v_i, (r′_i, ∇)A) > = − < (r′_i, (v_i, ∇)A) >

Hence,

< (v_i, (r′_i, ∇)A) > = (1/2)(< (v_i, (r′_i, ∇)A) − (r′_i, (v_i, ∇)A) >)

On the other hand,

(r′_i × v_i)·(∇ × A) = (v_i, (r′_i, ∇)A) − (r′_i, (v_i, ∇)A)

and hence we deduce that

< U > = −Σ_i (e_i r′_i × v_i/2)·curl A(t, r)

provided that A does not vary too rapidly with time. This formula can be expressed as

U = −m·B

where

m = (1/2)Σ_i e_i r′_i × v_i

is the magnetic moment of the system of particles.
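This definition of m can be sanity-checked against the usual m = I × (loop area) for a single charge on a circular orbit, for which no time averaging is even needed. A minimal numerical sketch (the values of e, a, ω below are arbitrary choices):

```python
import numpy as np

# single charge e on a circular orbit of radius a with angular velocity w;
# the equivalent loop current is I = e*w/(2*pi) and the loop area is pi*a^2
e, a, w = 1.6e-19, 1e-3, 2*np.pi*50.0
r = np.array([a, 0.0, 0.0])          # position on the orbit
v = np.array([0.0, a*w, 0.0])        # velocity, tangent to the orbit

m_vec = 0.5 * e * np.cross(r, v)     # m = (1/2) e r x v
I_times_A = (e*w/(2*np.pi)) * (np.pi*a**2)
print(m_vec[2], I_times_A)           # the two values agree
```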

Chapter 16

Large deviations for Markov chains, quantum master equation, Wigner distribution for the Belavkin filter

16.1 Large deviation rate function for the empirical distribution of a Markov chain

Assume that X(n), n ≥ 0 is a discrete time Markov chain with finite state space E = {1, 2, ..., K} and with transition probabilities

P (X(n+ 1) = y|X(n) = x) = π(x, y), x, y ∈ E

Let µ be the distribution of X(1). The empirical distribution of the chain based on data collected upto time N is given by

νN = N^{−1}Σ_{n=1}^N δ_{X(n)}

Then the logarithmic generating function of νN is given by

ΛN(f) = log E exp(∫f(x)dνN(x)) = log E[exp(N^{−1}Σ_{n=1}^N f(X(n)))]

and hence,

ΛN(Nf) = log E[exp(Σ_{n=1}^N f(X(n)))] = log(µ^T π_f^N 1)

where

π_f = ((exp(f(x))π(x, y)))_{1≤x,y≤K}

and

π_f u(x) = Σ_y π_f(x, y)u(y)

It follows from the spectral theorem for matrices that

lim_{N→∞} N^{−1}ΛN(Nf) = log(λ(f))

where λ(f) is the maximum eigenvalue of π_f. By the Perron-Frobenius theory, this is real. Equivalently, this fact follows from the above relation, since the limit is real because each term is real, and the limit equals the eigenvalue of π_f having maximum magnitude. Thus, if q is a probability distribution on E, then the LDP rate function of the process evaluated at q is given by

I(q) = sup_f (Σ_x f(x)q(x) − log λ(f))

We claim that

I(q) = J(q) = sup_{u>0} Σ_x q(x) log(u(x)/πu(x))

where

πu(x) = Σ_y π(x, y)u(y)

To see this, we first observe that for any f, u, we have

Σ_x f(x)q(x) − Σ_x q(x) log(u(x)/πu(x)) = −Σ_x q(x) log(u(x)/π_f u(x))

Now fix f and choose u = u0 so that

π_f u0(x) = λ(f)u0(x), x ∈ E

ie u0 is an eigenvector of πf corresponding to its maximum eigenvalue. Then,

Σ_x f(x)q(x) − J(q) = inf_u (Σ_x f(x)q(x) − Σ_x q(x) log(u(x)/πu(x)))

= inf_u (−Σ_x q(x) log(u(x)/π_f u(x))) ≤ −Σ_x q(x) log(u0(x)/π_f u0(x)) = log λ(f)

Thus, for any f,

Σ_x f(x)q(x) − log λ(f) ≤ J(q)


and taking the supremum over all f gives us

I(q) ≤ J(q)

In order to show that

J(q) ≤ I(q)

it suffices to show that for every u, there exists an f such that

Σ_x q(x) log(u(x)/πu(x)) ≤ Σ_x f(x)q(x) − log λ(f)

or equivalently, such that

−Σ_x q(x) log(u(x)/π_f u(x)) = log λ(f) −−− (a)

To show this, fix u and define

f(x) = log(u(x)/πu(x))

Then, clearly

πfu(x) = u(x)

and hence for every n ≥ 1,

π_f^n u = u

and hence

log(λ(f)) = lim_n n^{−1} log(µ^T π_f^n u) = 0

ie

λ(f) = 1

But then, both sides of (a) equal zero and the proof is complete.
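The key step of the proof — that f(x) = log(u(x)/πu(x)) forces π_f u = u and hence λ(f) = 1 — can be checked numerically for a small chain. A minimal sketch, where the random 4-state chain and the positive vector u are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
pi = rng.random((K, K))
pi /= pi.sum(axis=1, keepdims=True)      # transition probabilities pi(x, y)

def lam(f):
    """Perron eigenvalue of pi_f(x, y) = exp(f(x)) pi(x, y)."""
    return np.max(np.real(np.linalg.eigvals(np.exp(f)[:, None] * pi)))

# for any u > 0 and f(x) = log(u(x)/(pi u)(x)), pi_f u = u holds exactly,
# so the Perron eigenvalue lambda(f) equals 1
u = rng.random(K) + 0.1
f = np.log(u / (pi @ u))
print(lam(f))   # ~1.0 up to roundoff
```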

16.2 Quantum Master Equation in general relativity

Assume that the metric of space-time is restricted to have the form

dτ^2 = (1 + δ.V0(x))dt^2 − Σ_{k=1}^3 (1 + δ.Vk(x))(dx^k)^2

where V0, Vk, k = 1, 2, 3 are some functions of x = (t, x^k, k = 1, 2, 3). The corresponding metric is given by

g00 = 1 + δ.V0, gkk = −(1 + δ.Vk), k = 1, 2, 3


and

gµν = 0, µ ≠ ν

We substitute this into the Einstein-Hilbert Lagrangian density

Lg = gµν√−g(ΓαµνΓβαβ − ΓαµβΓβνα)

and retain terms only upto O(δ^2). Likewise, we compute the Lagrangian density of the electromagnetic field

Lem = (−1/4)FµνF^{µν}

and retain terms only upto O(δ), ie, terms linear in the Vµ. The resulting Lagrangian of the gravitational field interacting with the external em field will have the form

L = Lg + Lem,int = δ^2 C1(µναβ)Vµ,αVν,β + δ.C2(µαβρσ)VµFαβFρσ

where we regard the covariant components Aµ of the electromagnetic four potential as being fundamental, and the contravariant components as being derived from the covariant components after interaction with the gravitational field. Specifically, with neglect of O(δ^2) terms,

A^0 = g^{0µ}Aµ = g^{00}A0 = (1 − δ.V0)A0

A^r = g^{rr}Ar = −(1 − δ.Vr)Ar

We regard the field

B^µ = C2(µαβρσ)FαβFρσ

as a new c-number external field with which the gravitational field interacts, and then write down our gravitational field Lagrangian as

L = δ^2 C1(µναβ)Vµ,αVν,β + δ.VµB^µ

Keeping this model in mind, it can be abstracted into the Lagrangian of a set of Klein-Gordon fields interacting with an external c-number field. The canonical momenta are

P^µ = ∂L/∂Vµ,0 = 2δ^2 C1(µν0β)Vν,β = 2δ^2 C1(µν00)Vν,0 + 2δ^2 C1(µν0r)Vν,r

which can easily be solved to express Vµ,0 as a linear combination of P^ν and Vν,r. In this way our Hamiltonian density has the form

H = C1(µν)P^µP^ν + C2(µνr)P^µVν,r + C3(µνrs)Vµ,rVν,s + C4(µν)VµB^ν


Abstracting this to one dimension, we get a Hamiltonian of the form

H = P^2/2 + ω^2Q^2/2 + β(PQ + QP) + γE(t)Q

where E(t) is an external c-number function of time. Application of a canonicaltransformation then gives us a Hamiltonian of the form

H = P 2/2 + ω2Q2/2 + E(t)(aP + bQ)

If E(t) is taken as quantum noise coming from a bath, then we can derive the following Hudson-Parthasarathy noisy Schrodinger equation for the evolution operator U(t):

dU(t) = (−(iH + LL∗/2)dt + LdA(t) − L∗dA(t)∗)U(t)

where

L = aQ+ bP

16.3 Belavkin filter for the Wigner distribution function

The HPS dynamics is

dU(t) = (−(iH + LL∗/2)dt+ LdA(t)− L∗dA(t)∗)U(t)

where

L = aQ + bP, H = P^2/2 + U(Q)

The coherent state in which the Belavkin filter is designed to operate is

|φ(u) > = exp(−|u|^2/2)|e(u) >, u ∈ L^2(R+)

The measurement process is

Yo(t) = U(t)∗Yi(t)U(t), Yi(t) = c̄A(t) + cA(t)∗

Then,

dYo(t) = dYi(t) − jt(c̄L + cL∗)dt

where

jt(X) = U(t)∗XU(t)

Note that

c̄L + cL∗ = c̄(aQ + bP) + c(āQ + b̄P) = 2Re(ac̄)Q + 2Re(bc̄)P


Thus, our measurement model corresponds to measuring the observable 2Re(ac̄)Q(t) + 2Re(bc̄)P(t) plus white Gaussian noise. Let

πt(X) = E(jt(X)|ηo(t)), ηo(t) = σ(Yo(s) : s ≤ t)

Let

dπt(X) = Ft(X)dt + Gt(X)dYo(t)

with

Ft(X), Gt(X) ∈ ηo(t)

Let

dC(t) = f(t)C(t)dYo(t), C(0) = 1

The orthogonality principle gives

E[(jt(X)− πt(X))C(t)] = 0

and hence, by quantum Ito's formula and the arbitrariness of the complex valued function f(t), we have

E[(djt(X)− dπt(X))|ηo(t)] = 0,

E[(jt(X)− πt(X))dYo(t)|ηo(t)] + E[(djt(X)− dπt(X))dYo(t)|ηo(t)] = 0

Note that

djt(X) = jt(θ0(X))dt + jt(θ1(X))dA(t) + jt(θ2(X))dA(t)∗

where

θ0(X) = i[H,X] − (1/2)(LL∗X + XLL∗ − 2LXL∗),

θ1(X) = −LX + XL = [X,L], θ2(X) = L∗X − XL∗ = [L∗, X]

Then the above conditions give

πt(θ0(X)) + πt(θ1(X))u(t) + πt(θ2(X))ū(t) = Ft(X) + Gt(X)(c̄u(t) + cū(t) − πt(c̄L + cL∗))

and

−πt(X(c̄L + cL∗)) + πt(X)πt(c̄L + cL∗) + cπt(θ1(X)) − Gt(X)|c|^2 = 0

This second equation simplifies to

Gt(X) = |c|^{−2}(−πt(cXL∗ + c̄LX) + πt(c̄L + cL∗)πt(X))

Then,

dπt(X) = (πt(θ0(X)) + πt(θ1(X))u(t) + πt(θ2(X))ū(t))dt + Gt(X)(dYo(t) + πt(c̄L + cL∗ − c̄u(t) − cū(t))dt)


In the special case when c = −1, we get

dπt(X) = (πt(θ0(X)) + πt(θ1(X))u(t) + πt(θ2(X))ū(t))dt + (πt(XL∗ + LX) − πt(L + L∗)πt(X))(dYo(t) − (πt(L + L∗) − 2Re(u(t)))dt)

Consider the simplified case when u = 0, ie, filtering is carried out in the vacuum coherent state. Then, the above Belavkin filter simplifies to

dπt(X) = πt(θ0(X))dt + (πt(XL∗ + LX) − πt(L + L∗)πt(X))(dYo(t) − πt(L + L∗)dt)

Equivalently, in the conditional density domain, the dual of the above equation gives the Belavkin filter for the conditional density

dρ(t) = θ∗0(ρ(t))dt + (L∗ρ(t) + ρ(t)L − Tr(ρ(t)(L + L∗))ρ(t))(dYo(t) − Tr(ρ(t)(L + L∗))dt)

where

θ∗0(ρ) = −i[H, ρ] − (1/2)(LL∗ρ + ρLL∗ − 2L∗ρL)

Now let us take

L = aQ + bP, a, b ∈ C

and

H = P^2/2 + U(Q)

and translate this Belavkin equation to the Wigner-distribution domain, and then compare the resulting differential equation for W with the Kushner-Kallianpur filter for the classical conditional density. We write

ρ(t, Q,Q′) = ∫W(t, (Q + Q′)/2, P)exp(iP(Q − Q′))dP

Note that the Kushner-Kallianpur filter for the corresponding classical probabilistic problem

dQ = Pdt, dP = −U ′(Q)dt− γPdt+ σdB(t),

dY (t) = (2Re(a)Q(t) + 2Re(b)P (t))dt+ σdV (t)

where B, V are independent identical Brownian motion processes, is given by

dp(t, Q, P) = L∗p(t, Q, P)dt + ((αQ + βP)p(t, Q, P) − (∫(αQ + βP)p(t, Q, P)dQdP)p(t, Q, P))(dY(t) − dt∫(αQ + βP)p(t, Q, P)dQdP)

where

α = 2Re(a), β = 2Re(b)

and where

L∗p = −P∂_Q p + U′(Q)∂_P p + γ.∂_P(Pp) + (σ^2/2)∂_P^2 p


Let us see how this Kushner-Kallianpur equation translates in the quantum case using the Wigner distribution W in place of p. As noted earlier, the equation

ρ′(t) = −i[H, ρ(t)]

is equivalent to

∂tW(t, Q, P) = (−P/m)∂_QW(t, Q, P) + U′(Q)∂_PW(t, Q, P) + Σ_{k≥1}(−h^2/4)^k((2k + 1)!)^{−1}U^{(2k+1)}(Q)∂_P^{2k+1}W(t, Q, P) −−− (a)

while the Lindblad correction term to (a) is (−1/2) times

LL∗ρ + ρLL∗ − 2L∗ρL = (aQ + bP)(āQ + b̄P)ρ + ρ(aQ + bP)(āQ + b̄P) − 2(āQ + b̄P)ρ(aQ + bP)

which in the position representation has a kernel of the form

[−|b|^2(∂_Q^2 + ∂_{Q′}^2 + 2∂_Q∂_{Q′}) − |a|^2(Q − Q′)^2 + (c1Q + c2Q′ + c3)∂_Q + (c4Q + c5Q′ + c6)∂_{Q′}]ρ(t, Q,Q′)

This gives a Lindblad correction to (a) of the form

(σ1^2/2)∂_Q^2 W(t, Q, P) + (σ2^2/2)∂_P^2 W(t, Q, P) + d1∂_Q∂_PW(t, Q, P) + (d3Q + d4P + d5)∂_QW(t, Q, P)

Thus, the master/Lindblad equation (in the absence of measurements) has the form

∂tW(t, Q, P) = (−P/m)∂_QW(t, Q, P) + U′(Q)∂_PW(t, Q, P) + Σ_{k≥1}(−h^2/4)^k((2k + 1)!)^{−1}U^{(2k+1)}(Q)∂_P^{2k+1}W(t, Q, P) + (σ1^2/2)∂_Q^2 W(t, Q, P) + (σ2^2/2)∂_P^2 W(t, Q, P) + d1∂_Q∂_PW(t, Q, P) + (d3Q + d4P + d5)∂_QW(t, Q, P)

Remark: Suppose b = 0 and a is replaced by a/h. Then, taking into account the fact that in general units the factor exp(iP(Q − Q′)) must be replaced by exp(iP(Q − Q′)/h) in the expression for ρ(t, Q,Q′) in terms of W(t, Q, P), we get the Lindblad correction as

−(|a|^2/2)(Q − Q′)^2ρ(t, Q,Q′)

or equivalently, in the Wigner-distribution domain, as

(|a|^2/2)∂_P^2 W(t, Q, P)


and the master equation simplifies to

∂tW(t, Q, P) = (−P/m)∂_QW(t, Q, P) + U′(Q)∂_PW(t, Q, P) + (|a|^2/2)∂_P^2 W(t, Q, P) + Σ_{k≥1}(−h^2/4)^k((2k + 1)!)^{−1}U^{(2k+1)}(Q)∂_P^{2k+1}W(t, Q, P)

This equation clearly displays the classical terms and the quantum correction terms.
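The classical part of such a master equation can be integrated by elementary finite differences. The sketch below uses an explicit scheme on a periodic grid with an assumed harmonic potential U(Q) = Q^2/2 (for which every U^(2k+1), k ≥ 1, vanishes, so the quantum correction terms drop out) and momentum diffusion only, and checks that total probability is conserved; the grid sizes and coefficients are arbitrary choices:

```python
import numpy as np

# explicit finite-difference sketch of
#   dW/dt = -(P/m) dW/dQ + U'(Q) dW/dP + (sig2/2) d2W/dP2
# for the assumed harmonic case U(Q) = Q^2/2
m, sig2 = 1.0, 0.2
N, box, dt = 64, 8.0, 2e-4
x = np.linspace(-box/2, box/2, N, endpoint=False)
h = x[1] - x[0]
Q, P = np.meshgrid(x, x, indexing='ij')

W = np.exp(-(Q**2 + P**2))               # initial Wigner function
W /= W.sum() * h * h                     # normalize to total probability 1

def d(a, axis):                          # centred difference on a periodic grid
    return (np.roll(a, -1, axis) - np.roll(a, 1, axis)) / (2*h)

for _ in range(500):
    W = W + dt * (-(P/m) * d(W, 0) + Q * d(W, 1)
                  + 0.5 * sig2 * (np.roll(W, -1, 1) - 2*W + np.roll(W, 1, 1)) / h**2)

print(W.sum() * h * h)                   # total probability stays ~1
```

Centred differences with periodic wrap-around conserve the grid sum exactly, which is why the printed total stays at 1 up to roundoff.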

Exercise: Now express the terms L∗ρ(t), ρ(t)L∗ and Tr(ρ(t)(L + L∗)) in terms of the Wigner distribution for ρ(t) assuming L = aQ + bP, and complete the formulation of the Belavkin filter in terms of the Wigner distribution. Compare with the corresponding classical Kushner-Kallianpur filter for the conditional density of (Q(t), P(t)) given noisy measurements of aQ(s) + bP(s), s ≤ t, by identifying the precise form of the quantum correction terms.


Chapter 17

String and field theory with large deviations, stochastic algorithm convergence

17.1 String theoretic corrections to field theories

Let

Xµ(τ, σ) = xµ + pµτ − iΣ_{n≠0} aµ(n)exp(in(τ − σ))/n

be a string field where

aµ(−n) = aµ(n)∗

and

[aµ(n), aν(m)] = ηµνδ(n + m)

When this string interacts with an external c-number gauge field described by the antisymmetric potentials Bµν(X(τ, σ)), the total string Lagrangian is given by

L = (1/2)ηαβXµ,αXµ,β − (1/2)Bµν ǫαβ Xµ,α Xν,β

The string equations are given by

ηαβXµ,αβ = −ǫαβ(Bµν(X)Xν,β),α = −ǫαβBµν,ρ(X)Xρ,αXν,β

This is the string theoretic version of the following point particle theory for a charged particle in an electromagnetic field:

MXµ,ττ = q.Fµν(X)Xν,τ


In the case when the field Bµν,ρ is a constant, this gives corrections to the string field that are quadratic functions of the aµ(n):

Xµ = X(0)µ + X(1)µ

where

X(0)µ = xµ + pµτ − iΣ_{n≠0} aµ(n)exp(in(τ − σ))/n − iΣ_{n≠0} bµ(n)exp(in(τ + σ))/n

and

ηαβX(1)µ,αβ = −ǫαβBµν,ρX(0)ρ,αX(0)ν,β

= ǫαβBµν,ρ Σ_{n,m}(aρ(n)bν(m)uαvβ exp(i(n(τ − σ) + m(τ + σ))) + bρ(n)aν(m)vαuβ exp(i(n(τ + σ) + m(τ − σ))))

where

(uα)α=0,1 = (1,−1), (vα)α=0,1 = (1, 1)

17.2 Large deviations problems in string theory

Consider first the motion of a point particle with Poisson noise. Its equation of motion is

dX(t) = V(t)dt, dV(t) = −γV(t)dt − U′(X(t))dt + σǫdN(t)

The rate function of the Poisson process is computed as follows. Let λ/ǫ be the mean arrival rate of the Poisson process N. Then

Mǫ(f) = E[exp(ǫ∫_0^T f(t)dN(t))] = exp((λ/ǫ)∫_0^T (exp(ǫf(t)) − 1)dt)

Thus the Gartner-Ellis logarithmic moment generating function of the family of processes ǫ.N(.) over the time interval [0, T] is given by

Λ(f) = ǫ.log(Mǫ(f/ǫ)) = λ∫_0^T (exp(f(t)) − 1)dt

So the rate function of this family of processes is given by

IT(ξ) = sup_f (∫_0^T f(t)ξ′(t)dt − Λ(f))


= ∫_0^T sup_x(ξ′(t)x − λ(exp(x) − 1))dt = ∫_0^T η(ξ′(t))dt

where

η(y) = sup_x(xy − λ(exp(x) − 1))
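For this η, the supremum can be evaluated in closed form: the maximizer is x = log(y/λ), giving η(y) = y log(y/λ) − y + λ for y > 0. A short numerical check of this Legendre transform (λ = 2 is an arbitrary choice):

```python
import numpy as np

lam = 2.0   # arbitrary Poisson rate parameter

def eta_numeric(y):
    # eta(y) = sup_x ( x*y - lam*(exp(x) - 1) ), maximized over a fine grid
    xs = np.linspace(-10.0, 10.0, 200001)
    return np.max(xs * y - lam * np.expm1(xs))

def eta_closed(y):
    # the supremum is attained at x = log(y/lam)
    return y * np.log(y / lam) - y + lam

for y in (0.5, 1.0, 3.0):
    print(y, eta_numeric(y), eta_closed(y))   # the two columns agree
```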

By the contraction principle, the rate function of the process X(t), 0 ≤ t ≤ T is given by

JT(X) = ∫_0^T η(X′′(t) + γX′(t) + U′(X(t)))dt

assuming σ = 1.

Application to control: let the potential U(x|θ) have some control parameters θ. We choose θ so that

inf_{‖X−Xd‖>δ} JT(X)

is as large as possible. This would guarantee that the probability of deviation of X from the desired noiseless trajectory Xd, obtained with θ = θ0, by an amount more than δ is as small as possible. To proceed further, we write

e(t) = X(t)−Xd(t)

and then with θ = θ0 + δθ,

η(X′′ + γX′ + U′(X|θ)) = η(e′′ + γe′ + U′(X|θ) − U′(Xd|θ0))

≈ η(e′′ + γe′ + U′′(Xd|θ0)e + U1(Xd|θ0)^T δθ + δθ^T U2(Xd|θ0)δθ)

where

U1(Xd|θ0) = ∂θU′(Xd|θ0), U2(Xd|θ0) = (1/2)∂θ∂θ^T U′(Xd|θ0)

In particular, consider the string field equations in the presence of a Poisson source:

∂0^2 Xµ − ∂1^2 Xµ = fµk(τ, σ)dNk(τ)/dτ

Here

σ0 = τ, σ1 = σ

By writing down the Fourier series expansion

Xµ(τ, σ) = Σ_n aµ(n)exp(in(τ − σ)) + δXµ(τ, σ)

where δXµ is the Poisson contribution, evaluate the rate function of the field δXµ and hence calculate the approximate probability that the quantum average of the string field in a coherent state will deviate from its unforced value by an amount more than a threshold over a given time duration, with the spatial region extending over the entire string.


17.3 Convergence analysis of the LMS algorithm

X(n) ∈ R^N, d(n), n ∈ Z are jointly iid Gaussian, ie, (X(n), d(n)), n ∈ Z is an iid sequence of jointly Gaussian zero mean random vectors. Let

E(X(n)X(n)T ) = R,E(X(n)d(n)) = r

The LMS algorithm is

h(n + 1) = h(n) − µ.∂_{h(n)}(d(n) − h(n)^T X(n))^2 = (I − 2µX(n)X(n)^T)h(n) + 2µd(n)X(n)

h(n) is a function of h(0), (X(k), d(k)), k ≤ n − 1 and is thus independent of (X(n), d(n)). Then, taking means with λ(n) = E(h(n)), we get

λ(n+ 1) = (I − 2µR)λ(n) + 2µr

and hence

λ(n) = (I − 2µR)^n λ(0) + Σ_{k=0}^{n−1}(I − 2µR)^{n−k−1} 2µr = (I − 2µR)^n λ(0) + (I − (I − 2µR)^n)h0

where

h0 = R^{−1}r

is the optimal Wiener solution. It follows that if all the eigenvalues of I − 2µR are smaller than unity in magnitude, then

limn→∞λ(n) = h0
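This mean convergence is easy to check by simulation. A minimal sketch, with an assumed model d(n) = h_true^T X(n) + noise, so that R = I and the Wiener solution h0 = R^{−1}r equals h_true:

```python
import numpy as np

rng = np.random.default_rng(1)
N_taps, mu = 3, 0.01

# assumed model: d(n) = h_true^T X(n) + small measurement noise;
# with X(n) standard normal, R = I and r = h_true, so h0 = h_true
h_true = np.array([1.0, -2.0, 0.5])
h = np.zeros(N_taps)
for _ in range(20000):
    X = rng.standard_normal(N_taps)
    d = h_true @ X + 0.1 * rng.standard_normal()
    # h(n+1) = (I - 2 mu X X^T) h(n) + 2 mu d(n) X(n)
    h = h + 2 * mu * (d - h @ X) * X

print(h)   # hovers near h0 = h_true, with residual fluctuation of order mu
```

The residual fluctuation around h0 is precisely the nonvanishing covariance discussed next.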

However, we shall show that Cov(h(n)) does not converge to zero. It converges to a non-zero matrix. We write

δh(n) = h(n)− λ(n)

and then get

λ(n+ 1) + δh(n+ 1) = (I − 2µX(n)X(n)T )(λ(n) + δh(n)) + 2µd(n)X(n)

or equivalently,

δh(n + 1) = ((I − 2µX(n)X(n)^T) − (I − 2µR))λ(n) + (I − 2µX(n)X(n)^T)δh(n) + 2µ(d(n)X(n) − r)

= −2µ(X(n)X(n)^T − R)λ(n) + (I − 2µX(n)X(n)^T)δh(n) + 2µ(d(n)X(n) − r)

and hence

E(δh(n + 1) ⊗ δh(n + 1)) =


4µ^2 E[(X(n)X(n)^T − R) ⊗ (X(n)X(n)^T − R)](λ(n) ⊗ λ(n))

+E[(I − 2µX(n)X(n)^T) ⊗ (I − 2µX(n)X(n)^T)]E(δh(n) ⊗ δh(n))

+4µ^2 E[(d(n)X(n) − r) ⊗ (d(n)X(n) − r)]

−4µ^2(E[(X(n)X(n)^T − R) ⊗ (d(n)X(n) − r)] + E[(d(n)X(n) − r) ⊗ (X(n)X(n)^T − R)])λ(n)

Note that so far no assumption regarding Gaussianity of the process (d(n), X(n)) has been made. We have only assumed that this process is iid. Now making the Gaussianity assumption gives us

E(Xi(n)Xj(n)Xk(n)Xm(n)) = RijRkm + RikRjm +RimRjk
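This is the Gaussian (Isserlis) fourth-moment identity, which can be verified by Monte Carlo for an arbitrary covariance R; the matrix and the index choice below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# an arbitrary positive definite covariance R
R = np.array([[2.0, 1.0, 0.5],
              [1.0, 2.0, 1.0],
              [0.5, 1.0, 2.0]])
X = rng.multivariate_normal(np.zeros(3), R, size=1_000_000)

i, j, k, m = 0, 1, 2, 1
emp = np.mean(X[:, i] * X[:, j] * X[:, k] * X[:, m])   # Monte Carlo estimate
iss = R[i, j]*R[k, m] + R[i, k]*R[j, m] + R[i, m]*R[j, k]
print(emp, iss)   # close, up to Monte Carlo error
```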

Exercise: Using this assumption, evaluate

limn→∞E(δh(n)δh(n)T )

Note that in the general non-Gaussian iid case, if E(δh(n) ⊗ δh(n)) converges, say to V, then the above equation implies that

V = (I − E[(I − 2µX(0)X(0)^T) ⊗ (I − 2µX(0)X(0)^T)])^{−1}.[4µ^2 E[(X(0)X(0)^T − R) ⊗ (X(0)X(0)^T − R)](h0 ⊗ h0) + 4µ^2 E[(d(0)X(0) − r) ⊗ (d(0)X(0) − r)] − 4µ^2(E[(X(0)X(0)^T − R) ⊗ (d(0)X(0) − r)] + E[(d(0)X(0) − r) ⊗ (X(0)X(0)^T − R)])h0]

It should be noted that a sufficient condition for this convergence to occur is that all the eigenvalues of the real symmetric matrix

E[(I − 2µX(0)X(0)^T) ⊗ (I − 2µX(0)X(0)^T)]

be smaller than unity in magnitude.

17.4 String interacting with scalar field, gauge field and a gravitational field

The total action is

S[X,Φ, g] = ∫(1/2)ηαβgµν(X)Xµ,αXν,β d^2σ + ∫Φ(X)d^2σ + ∫ǫαβXµ,αXν,βBµν(X)d^2σ

The string field equations are

−ηαβ(gµν(X)Xν,β),α + Φ,µ(X) − ǫαβBµν,ρ(X)Xρ,αXν,β = 0


In flat space-time, gµν = ηµν and the approximate string equations simplify to

ηαβXµ,αβ = −Φ,µ(X) + ǫαβBµν,ρ(X)Xρ,αXν,β

The propagator of the string

Dµν(σ|σ′) = < T(Xµ(σ)Xν(σ′)) >

thus depends on Φ, Bµν,ρ. Let us evaluate this propagator approximately:

∂0Dµν(σ|σ′) = δ(σ0 − σ0′) < [Xµ(σ), Xν(σ′)] > + < T(Xµ,0(σ)Xν(σ′)) > = < T(Xµ,0(σ)Xν(σ′)) >

Further,

∂0^2 Dµν(σ|σ′) = δ(σ0 − σ0′) < [Xµ,0(σ), Xν(σ′)] > + < T(Xµ,00(σ)Xν(σ′)) >

= −iδ^2(σ − σ′) + ∂1^2 < T(Xµ(σ)Xν(σ′)) > + < T((−Φ,µ(X(σ)) + ǫαβBµγ,ρ(X(σ))Xρ,α(σ)Xγ,β(σ))Xν(σ′)) >

Thus our exact string propagator equation in the presence of the scalar and gauge fields becomes

(∂0^2 − ∂1^2)Dµν(σ|σ′) = −iδ^2(σ − σ′) − < T(Φ,µ(X(σ))Xν(σ′)) > + ǫαβ < T(Bµγ,ρ(X(σ))Xρ,α(σ)Xγ,β(σ)Xν(σ′)) >

We now make approximations by expanding the scalar and gauge fields around the centre of the string position xµ. This approximation gives

< T(Φ,µ(X(σ))Xν(σ′)) > = Φ,µρ(x) < T(Xρ(σ)Xν(σ′)) > = Φ,µρ(x)D(0)ρν(σ|σ′)

where D(0)µν is the unperturbed string propagator and is given by ηµν ln(|σ − σ′|) where

|σ − σ′| = √((σ0 − σ0′)^2 − (σ1 − σ1′)^2)

Likewise,

< T(Bµγ,ρ(X(σ))Xρ,α(σ)Xγ,β(σ)Xν(σ′)) > = Bµγ,ρδ(x) < T(Xδ(σ)Xρ,α(σ)Xγ,β(σ)Xν(σ′)) >

Since three of the four factors of X or its partial derivatives in this last expression are evaluated at the same argument, it is clear that this term cannot contribute anything to the propagator except singularities. So we neglect this term. Thus our corrected propagator equation becomes

(∂0^2 − ∂1^2)Dµν(σ|σ′) = −iδ^2(σ − σ′) − Φ,µρ(x)D(0)ρν(σ|σ′)

Note that this is a local propagator equation since it depends on the coordinates x of the centre of the string. We solve this to get

Dµν(σ − σ′) = D(0)µν(σ − σ′) − Φ,µν(x)∫D(0)(σ − σ′′)D(0)(σ′′ − σ′)d^2σ′′

where

D(0)(σ) = ln|σ|

Now we raise the question of what field equations should be satisfied by Bµν and Φ so that the string action has conformal invariance. For solving this problem, we must replace the above action by

S[X,Φ, g] = ∫(1/2)ηαβ exp(φ(σ))gµν(X)Xµ,αXν,β d^2σ + ∫exp(φ(σ))Φ(X)d^2σ + ∫ǫαβ exp(φ(σ))Xµ,αXν,βBµν(X)d^2σ

since this corresponds to using the string sheet metric exp(φ(σ))ηαβ and correspondingly replacing the string sheet area element d^2σ by the invariant area element exp(φ(σ))d^2σ. For small φ, conformal invariance then requires the quantum average of the above action to be independent of φ. Taking into account the larger singularity involved in the product of four string field terms at the same point on the string world sheet, as compared with the product of two string field terms, this means that the quantity

ηαβgµν,ργ(x) < Xρ(σ)Xγ(σ)Xµ,α(σ)Xν,β(σ) > + Φ,µν(x) < Xµ(σ)Xν(σ) > + ǫαβBµν,ρ(x) < Xρ(σ)Xµ,α(σ)Xν,β(σ) > + ǫαβBµν,ργ(x) < Xρ(σ)Xγ(σ)Xµ,α(σ)Xν,β(σ) >

must vanish.

Another way to approach this problem is to start with the string equations of motion:

(∂0^2 − ∂1^2)Xµ = −Φ,µ(X) + ǫαβBµν,ρ(X)Xρ,αXν,β

and express it in terms of perturbation theory by writing

Xµ = xµ + δXµ = xµ + Y µ


as

(∂0^2 − ∂1^2)Yµ = −Φ,µν(x)Yν + ǫαβ(Bµν,ρ(x) + Bµν,ργ(x)Yγ)Yρ,αYν,β

giving

Yµ = Zµ − Φ,µν(x)D(0).Zν + ǫαβBµν,ρ(x)D(0).(Zρ,αZν,β) + ǫαβBµν,ργ(x)D(0).(ZγZρ,αZν,β)

where x + Z is the string field in the absence of any interactions. With this expansion in mind, we then require that

ηαβgµν,ργ(x) < Yρ(σ)Yγ(σ)Yµ,α(σ)Yν,β(σ) > + Φ,µν(x) < Yµ(σ)Yν(σ) > + ǫαβBµν,ρ(x) < Yρ(σ)Yµ,α(σ)Yν,β(σ) > + ǫαβBµν,ργ(x) < Yρ(σ)Yγ(σ)Yµ,α(σ)Yν,β(σ) >

should vanish.

Another problem in information theory: Let pi, i = 1, 2, ..., N be a probability distribution. Let p↓i, i = 1, 2, ..., N denote the permutation of these probabilities in decreasing order. Consider the set

E = {i : pi > e^b}

Clearly the size |E| of this set satisfies, for 0 ≤ s ≤ 1,

|E| ≤ Σ_i (pi/e^b)^{1−s} = exp(−(1 − s)b + ψ(s)) = exp(sR) ≤ exp(R)

where

ψ(s) = log Σ_i pi^{1−s}

and where

b = b(s,R) = (ψ(s) − sR)/(1 − s)

and it is clear that

lim_{s→0} ψ(s)/s = H(p) = −Σ_i pi log(pi)

Hence, if R > H(p), by choosing s positive and sufficiently close to zero, it follows that if p(n) denotes the product probability measure of the pi on {1, 2, ..., N}^n, then with

bn(s,R) = n(ψ(s) − sR)/(1 − s)

it follows that bn(s,R) → −∞ and hence exp(bn(s,R)) → 0 and further,

|{p(n) > exp(bn(s,R))}| ≤ exp(nsR) ≤ exp(nR)


Now define

P(p, L) = Σ_{i=1}^L p↓i

Then, it is clear that since

|{p > exp(b(s,R))}| ≤ exp(R)

for all s ∈ [0, 1], we have

1 − P(p, exp(R)) ≤ exp(b(s,R))

and this implies for the product probability distribution,

1 − P(p(n), exp(nR)) ≤ exp(bn(s,R)) = exp(n(ψ(s) − sR)/(1 − s)) → 0, n → ∞

provided that s is chosen sufficiently close to zero and that R > H(p). In other words, we have proved that if R > H(p), then

P(p(n), exp(nR)) → 1
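This convergence can be checked exactly for a Bernoulli source, where product sequence probabilities depend only on the number of ones, so the mass of the exp(nR) most probable sequences can be accumulated from binomial counts without enumerating sequences. A minimal sketch (p = 0.2 and the margin R = H(p) + 0.1 are arbitrary choices):

```python
import numpy as np
from math import comb, exp, log

def P_top(p, n, R):
    """Mass P(p^(n), floor(e^{nR})) of the most probable Bernoulli(p)^n sequences."""
    budget = int(np.floor(exp(n * R)))
    # sequences with k ones all share the probability p^k (1-p)^(n-k);
    # visit the n+1 probability levels in decreasing order of probability
    ks = sorted(range(n + 1), key=lambda k: -(k * log(p) + (n - k) * log(1 - p)))
    mass = 0.0
    for k in ks:
        take = min(comb(n, k), budget)
        mass += take * p**k * (1 - p)**(n - k)
        budget -= take
        if budget == 0:
            break
    return mass

p = 0.2
H = -p * log(p) - (1 - p) * log(1 - p)   # entropy H(p) in nats
R = H + 0.1                              # any rate above H(p) works
vals = [P_top(p, n, R) for n in (10, 50, 200)]
print(vals)   # increases toward 1 as n grows
```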

Motion of a string field in the presence of an external gauge field with potentials that are linear in the space-time coordinates. Evaluation of the low energy field action in terms of the statistics of the string field.

L = (1/2)∂αXµ∂αXµ + ǫαβBµν(X)∂αXµ∂βXν

The equations of motion are

∂α∂αXµ = ǫαβBµν,ρ(X)∂βXν∂αXρ

Writing

Bµν,ρ(X) = Σ_n An(µνρ|µ1, ..., µn)Xµ1Xµ2...Xµn

we find that for the expansion

Xµ = xµ − iΣ_n a(n)µ exp(inu_aσ^a)/n − iΣ_n b(n)µ exp(inv_aσ^a)/n

where

(ua) = (1,−1), (va) = (1, 1)

we find that

ǫαβBµν,ρ(X)∂βXν∂αXρ = Σ_{r,s,n1,...,nr,m1,...,ms,l1,l2} ǫαβ(−i)^{n1+...+nr+m1+...+ms} Crs(µνρ|µ1, ..., µr, ν1, ..., νs)[Π_{k=1}^r (aµk(nk)/nk)][Π_{k=1}^s (bνk(mk)/mk)](aν(l1)bρ(l2)exp(i(l1u + l2v).σ) − bν(l1)aρ(l2)exp(i(l1v + l2u).σ))u^T ǫv exp(i((n1 + ... + nr)u + (m1 + ... + ms)v).σ)


where the coefficients Crs(.) are determined in terms of An(µνρ|µ1, ..., µn). In particular, if Bµν,ρ are constants, then this gauge term contributes only quadratic terms in the creation and annihilation operators and hence can be expressed as

Bµνρ = Σ_{l1,l2} Bµν,ρ(u^T ǫv)(aν(l1)bρ(l2)exp(i(l1u + l2v).σ) − bν(l1)aρ(l2)exp(i(l1v + l2u).σ))

In short, in this special case, we can express the gauge field interaction corrected string field upto first order perturbation terms as

Xµ(σ) = xµ + Σ_n aµ(n)fn(σ) + Fµνρ Σ_{n,m} aν(n)aρ(m)gnm(σ) = xµ + δXµ(σ)

where

[aµ(n), aν(m)] = ηµνδ(n + m)

Now consider a point field φ(x). Assume that its Lagrangian density is given by L(φ(x), φ,µ(x)). The average action of this field, when string theoretic corrections are taken into account in a coherent state |φ(u) > of the string, is given by

< φ(u)|∫L(φ(x + δX(σ)), φ,µ(x + δX(σ)))dσd^4x|φ(u) >

We can evaluate this action upto quadratic terms in the string perturbation by using the Taylor approximations

φ(x + δX(σ)) = φ(x) + φ,µ(x)δXµ(σ) + (1/2)φ,µν(x)δXµ(σ)δXν(σ)

φ,µ(x + δX(σ)) = φ,µ(x) + φ,µν(x)δXν(σ) + (1/2)φ,µνρ(x)δXν(σ)δXρ(σ)

and then use the following identities:

< φ(u)|δXµ(σ)|φ(u) > = Σ_n < φ(u)|aµ(n)|φ(u) > fn(σ) + Fµνρ Σ_{n,m} < φ(u)|aν(n)aρ(m)|φ(u) > gnm(σ)

< φ(u)|δXµ(σ)δXν(σ)|φ(u) > is expressible as a linear combination of

< φ(u)|aµ1(n1)aµ2(n2)|φ(u) >, < φ(u)|aµ1(n1)aµ2(n2)aµ3(n3)|φ(u) >, and < φ(u)|aµ1(n1)aµ2(n2)aµ3(n3)aµ4(n4)|φ(u) >

Finally,

< φ(u)|aµ(n)|φ(u) > = uµ(n), uµ(−n) = uµ(n)∗


For n + m ≠ 0,

< φ(u)|aµ(n)aν(m)|φ(u) > = uµ(n)uν(m)

For n > 0,

< φ(u)|aµ(n)aν(−n)|φ(u) > = ηµν + uµ(n)uν(−n)

and so on.

Problems on the Fokker-Planck equation

[1] Derive the quantum Fokker-Planck equation for the Wigner distribution of the position-momentum pair of a particle moving in a potential with quantum noise. The HPS evolution equation is

dU(t) = (−(iH + LL∗/2)dt + LdA − L∗dA∗)U(t)

so that the master equation, obtained as the equation for ρs(t) where

ρs(t) = Tr2(U(t)(ρs(0) ⊗ |φ(0) >< φ(0)|)U(t)∗),

is given by

ρ′s(t) = −i[H, ρs(t)]− (1/2)(LL∗ρs(t) + ρs(t)LL∗ − 2L∗ρs(t)L)

[2] Given a master equation of the above type, how would you adapt the parameters θk(t), k = 1, 2, ..., d, where

H = H0 + Σ_{k=1}^d θk(t)Vk, L = L0 + Σ_{k=1}^d θk(t)Lk,

so that the probability density ρ(t, q, q) in the position representation tracks a given pdf p0(t, q)?

hint: Express the master equation as

ρ′(t) = T0(ρ(t)) + Σ_k θk(t)Tk(ρ(t)) + Σ_{k,m} θk(t)θm(t)Tkm(ρ(t))

where T0, Tk, Tkm are linear operators on the space of mixed states in the underlying Hilbert space. It follows then that

A remark about supersymmetry, large deviations and stochastic control: Suppose we are given a set of Bosonic and Fermionic fields and a Lagrangian functional of these fields such that this Lagrangian functional is invariant under global supersymmetry transformations. Suppose we now break the supersymmetry of this Lagrangian by adding terms to it that can be represented as a coupling between stochastic c-number fields and the Bosonic and Fermionic fields appearing in the original Lagrangian. Then, we write down the equations of motion corresponding to this stochastically perturbed Lagrangian. The equations of motion will be stochastic partial differential equations which are non-supersymmetric because of the stochastic perturbations. Now we calculate the change in the modified action functional under an infinitesimal supersymmetry transformation and require that this change be bounded by a certain small number, in the sense of the expected value of modulus square, calculated by assuming that the Bosonic and Fermionic fields are solutions to the unperturbed supersymmetric equations of motion. This condition will impose a restriction on the amplitude of the c-number stochastic fields. Keeping this restriction, we then try to calculate, using the theory of large deviations, the probability that the solution of the stochastically perturbed field equations, which are non-supersymmetric, will deviate from the supersymmetric unperturbed solutions by an amount more than a certain threshold. We then repeat this calculation by adding error feedback control terms to the perturbed dynamical equations and choose the control coefficients so that the resulting deviation probability is minimized. This, in effect, amounts to attempting to restore supersymmetry by dynamic feedback control when the supersymmetry has been broken.