
Transcript of arXiv:1702.07906v2 [cond-mat.stat-mech] 6 Jul 2017


Nonequilibrium thermodynamics and information theory:

Basic concepts and relaxing dynamics

Bernhard Altaner

Complex Systems and Statistical Mechanics, Physics and Materials Science Research Unit, University of Luxembourg, Luxembourg

Thermodynamics is based on the notions of energy and entropy. While energy is the elementary quantity governing physical dynamics, entropy is the fundamental concept in information theory. In this work, starting from first principles, we give a detailed didactic account of the relations between energy and entropy and thus between physics and information theory. We show that thermodynamic process inequalities, like the Second Law, are equivalent to the requirement that an effective description for physical dynamics is strongly relaxing. From the perspective of information theory, strongly relaxing dynamics govern the irreversible convergence of a statistical ensemble towards the maximally non-committal probability distribution that is compatible with thermodynamic equilibrium parameters. In particular, Markov processes that converge to a thermodynamic equilibrium state are strongly relaxing. Our framework generalizes previous results to arbitrary open and driven systems, yielding novel thermodynamic bounds for idealized and real processes.

A. Introduction

The basic connections between equilibrium thermodynamics and probability predate modern information theory and go back to Gibbs, Boltzmann and Einstein [1]. Shannon’s information theory has formalized the uncertainty associated with a probabilistic description in an axiomatic way [2]. The basic ideas of how thermodynamics and information theory are connected have enabled a modern epistemic perspective on thermodynamics [3–6]. Extending this perspective to consistently include modern aspects of nonequilibrium thermodynamics is an active field of research [7–15].

The purpose of this article is to describe a universal self-consistent approach to modern nonequilibrium thermodynamics based on the rationale of information theory. In order to be clear about our scope, we first recall the three basic notions of nonequilibrium processes:

1. Irreversible (transient) relaxation of a non-equilibrium system to an equilibrium state specified by thermodynamic parameters.

2. Driving processes where these thermodynamic parameters are changed.

3. Non-equilibrium steady states (NESS).

Here, we are concerned with the first two aspects of nonequilibrium thermodynamics. The third aspect is not covered. The reason for this is that the thermodynamic consistency of NESS requires a clear notion of local equilibrium (and local detailed balance), which builds upon but ultimately goes beyond this work.

Importantly, the relation between non-equilibrium and equilibrium thermodynamics is not that of, for example, the study of elephant and non-elephant metabolism. The concept of an elephant has nothing to do with the concept of metabolism. In contrast, thermodynamics always refers back to equilibrium, where there is a clear connection between energy and entropy.

In his autobiographical notes, Einstein expressed his high regard for thermodynamics as a universal theory [16]: “A theory is the more impressive the greater the simplicity of its premises, the more different kinds of things it relates, and the more extended its area of applicability. Therefore the deep impression that classical thermodynamics made upon me. It is the only physical theory of universal content which I am convinced will never be overthrown, within the framework of applicability of its basic concepts.”

Here, we show that even beyond equilibrium, energy and entropy constitute these basic concepts. In particular, we demonstrate how nonequilibrium thermodynamics emerges as a universal epistemic theory that connects the evolution of energy and entropy to irreversibility and thus to the loss of information in effective physical descriptions.

1. Information theory and statistical mechanics

In 1957, Jaynes published a controversial article entitled “Information Theory and Statistical Mechanics” [3]. In that work, he argues that statistical mechanics can be interpreted as a theory of statistical inference that enables the construction of probabilistic descriptions on the basis of partial knowledge. He then concludes that the emerging “subjective” statistical mechanics should not be regarded as a physical theory. In particular, according to his view, statistical mechanics does not rely on additional physical dynamical or statistical assumptions. In order to relate the present work to Jaynes’ ideas we briefly revisit his argument now, while a more detailed discussion will follow later.

The connection between a statistical description of a system and a quantitative notion of uncertainty has been formalized by Shannon’s pioneering work on information theory [2]: Given a probability distribution, its uncertainty (or information entropy) quantifies the expected amount of information acquired when drawing a sample from that distribution. Operationally speaking, uncertainty (if expressed in multiples of 1 bit := ln 2) is the average number of yes-or-no questions about a system that need to be answered in order to identify the sample unambiguously.
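This operational reading is easy to make concrete. The following minimal Python sketch (all names and numbers are illustrative, not from the paper) computes the uncertainty of a discrete distribution in bits:

```python
import numpy as np

def uncertainty_bits(p):
    """Shannon uncertainty H[p] = -sum_i p_i ln p_i, expressed in bits (1 bit = ln 2)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # terms with p_i = 0 contribute nothing
    return -np.sum(p * np.log(p)) / np.log(2)

print(uncertainty_bits([0.5, 0.5]))     # a fair coin: 1.0, i.e. one yes-or-no question
print(uncertainty_bits([0.25] * 4))     # four equally likely microstates: 2.0 questions
```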

In statistical mechanics, samples correspond to distinct microscopic states of a system. Usually, the microstate specifies (up to a given resolution) the positions and momenta (and other relevant degrees of freedom) of all entities making up the system. In contrast, a macrostate is defined by a small number of macroscopic parameters, like temperature, volume or particle number. Using a physical model, a macrostate can be inferred from a microstate. However, the reverse is not true: In general, there are many microstates that are compatible with a given macrostate. Consequently, the best we can do is assign a probability distribution to the microstates that is consistent with our macroscopic observations. Jaynes argues that the least biased, i.e. maximally honest, guess amongst all these compatible distributions should maximize its uncertainty. This so-called maximum entropy (MaxEnt) principle thus constitutes a modern variant of Ockham’s razor, which states that the most plausible explanation is the one that requires the fewest additional assumptions.

In his article, Jaynes shows that this maximum entropy distribution coincides with well-known equilibrium distributions, if the statistical averages of macroscopic observables like energy and volume are constrained to observed macroscopic values. While he does not explicitly motivate his choice of these specific observables, he is well aware that his argument rests on the assumption that these quantities exhibit a very sharp peak. Ultimately, he motivates his choices a posteriori by remarking that they are good as long as the resulting statistical theory agrees with experimental behaviour. He continues by showing that the maximum entropy distributions derived by constraining the average energy (and in another example, the volume) correspond to the well-known equilibrium distributions. Likewise, the thermodynamic entropy then coincides with the information-theoretic entropy. He concludes that entropy in the sense of Shannon is the fundamental concept in thermodynamics, even more than the concept of energy itself. Since entropy does not require any reference to physics, he thus claims that statistical mechanics is simply statistical inference by entropy maximization. In particular, he argues that his view on statistical mechanics does not require additional physical dynamical and statistical assumptions like ergodicity and equal a priori probabilities.

Not surprisingly, Jaynes’ approach has attracted a lot of criticism from the physics community. While the MaxEnt perspective provides an elegant way to circumvent certain difficult questions in statistical mechanics, the “subjective” view dismissing the physical nature of the theory is not easily accepted by (statistical) physicists. At first glance, the subjective nature of entropy seems to be problematic. After all, the notion of entropy has been used for about a century to construct real machines. So how can a central thermodynamic quantity depend on the knowledge of the experimenter that measures it?

The answer is that entropy is never measured. Even in traditional thermodynamics, the equilibrium entropy of a system is only well defined after one chooses the macroscopic parameters to be used in its description. Or in Jaynes’ words [17]: “It is clearly meaningless to ask, ’What is the entropy of a crystal?’ unless we first specify the set of parameters which define its thermodynamic state.” As such, entropy is not an ontological aspect of a system. Rather, we should view it as an epistemic property that emerges only in the context of a (useful) description. Moreover, entropy is never measured in a thermodynamic experiment. Instead, what one measures in thermodynamics are changes of the energy of a system that come in the form of work and heat.

While this is a strong argument for Jaynes’ epistemic view of entropy and statistical mechanics, it is similarly a strong argument against Jaynes’ conclusion that statistical mechanics is not rooted in physics. The most elementary concept in both thermodynamics and physics in general is energy. Energy and entropy, heat and work are amongst the “basic concepts” Einstein spoke about. Rejecting energy as a fundamental principle of statistical mechanics means disconnecting it from thermodynamics. Viewing energy as just another extensive quantity whose value is constrained in a MaxEnt principle thus ignores its physical significance. In particular, energy is different from other extensive quantities. In thermodynamics, the energy content of a system, usually called internal energy, is, unlike other extensive quantities like volume, not directly accessible.

Moreover, energy is intimately related to physical dynamics. Thus any approach to thermodynamics that disregards dynamics is at best incomplete and at worst misleading. After all, in practice we use thermodynamics to describe physical processes. In that light, Jaynes’ perspective on statistical mechanics as a mere theory of statistical inference seems indeed questionable. Moreover, as we will discuss in detail below, the additional statistical assumption of equal a priori probabilities used in the traditional derivation of statistical mechanics is also intimately connected to the notion of energy.

In the light of the arguments above, there seems to be only one consistent stance on the connection of information theory and statistical mechanics, which will be adopted throughout this work: We agree with Jaynes’ view of statistical mechanics as a meta-theory based on statistical inference. As such, we accept (and indeed hope to promote) the epistemic interpretation of entropy. However, we firmly disagree with Jaynes’ view that statistical mechanics is independent from physics in general and physical dynamics in particular. In contrast, we will show how the connection between energy, dynamics, statistics and information is at the heart of building a modern self-consistent thermodynamic framework.

2. Towards non-equilibrium thermodynamics

Our goal is to extend and formalize the ideas mentioned above to non-equilibrium, i.e. to situations where the state of the system is not appropriately described by an equilibrium (MaxEnt) distribution. As equilibrium distributions are stationary, a true thermodynamics is necessarily a theory that treats distributions that vary in time. Temporal changes of the distribution are either due to an autonomous relaxation or due to temporal changes in the thermodynamic parameters that characterize the system and its environment. The energy of a system may change due to work and heat exchanged with its environment. The Second Law appears as an inequality relating these quantities.

In order to proceed we thus need a self-consistent approach to the Second Law of Thermodynamics from an information-theoretic perspective rooted in physics: In particular, our goal is to formalize the epistemic meaning of (irreversible) thermodynamic relaxation as a property of the effective dynamics we use to describe the evolution of a probability distribution. Intuitively, we treat what we call strongly relaxing dynamics, which are defined by the property that at any point in time, they discard information about the system’s microstate. More precisely, we consider dynamics that always bring the instantaneous distribution closer to the maximally uncertain one, in accordance with the (instantaneous) thermodynamic control parameters. Mathematically, this property comes in the form of an inequality for the evolution of an information-theoretic measure for the difference of a (nonequilibrium) distribution from the corresponding reference equilibrium state [18]. After all, relaxation is intimately related with the notion of irreversible processes, i.e. changes in the state of the system that discard information about the system’s past.

From this single information-theoretic inequality, we will be able to derive the Second Law of Thermodynamics and other related inequalities, like the law of exergy degradation, a general version of Landauer’s principle and other work inequalities. In fact, we show that these inequalities are all equivalent. While some of the results that we present here have been derived previously in more specialized contexts [4, 5, 8, 19–21], we believe that our presentation comprises the most general consistent formulation of nonequilibrium thermodynamics so far. In particular, it goes beyond the canonical or grandcanonical ensemble to arbitrary situations where a system can exchange a conserved extensive quantity (like particles, volume, energy) with its environment. Moreover, our formulation applies equally to situations where a system is driven (i.e. where the control parameters of the system are changed with time) and to systems that reside in changing environments.

In particular, our theory predicts the (non-integrable) non-isothermal corrections that appear if the temperature of the environment is changed during a non-isothermal process. It further emphasizes the special role played by (inverse) temperature if compared to other intensive environment parameters (like chemical potentials or pressure). As a consequence, the special role and general importance of the free energy (which appears as a generating function in the canonical ensemble) becomes evident. Further, we introduce equilibrium and nonequilibrium versions of a “natural” thermodynamic potential, a general notion for the “heat content” of a system and a notion of “exergy” referring to the potential of a system to perform work in an autonomous relaxation process.

A crucial point to keep in mind during the rest of this work is the distinction between thermodynamic quantities that are controlled by an experimenter, and thermodynamic quantities that are measured, and thus show an autonomous dynamics prescribed by the system relaxation dynamics. This leads to a new separation of the power (i.e. the rate of work) into a “driven” and an “autonomous” part, which both can have either sign. Although this distinction is related to the distinction between “input” and “output” power in macroscopic heat engines, it is conceptually different. We will further see that the equalities that hold for reversible processes have analogous inequalities for real (irreversible) processes. As the present framework is built on information-theoretic ideas, we also hope to convince the sceptical reader that the conversion of information to work (and vice versa) is a very real and fundamental property of any thermodynamic description.

After having provided the basic intuitive concepts of our work, the main part of this text is dedicated to a formal, yet didactic mathematical treatment of these ideas. Nothing more complicated than (multidimensional) real analysis is required. As such, the present work aims to be self-contained. We hope it is comprehensible to readers with an undergraduate background in physics or mathematics.

Our work is structured as follows: In section B we quickly review the basic ideas of Gibbs that lead to the canonical ensemble for situations where a system exchanges energy with its environment [22]. Following the derivation of the grandcanonical ensemble for open systems that can exchange matter with their environment, we formulate equilibrium ensembles for general open systems in section C. The formal notions of information theory are introduced in section D. With these prerequisites, in section E we formalize the relaxation condition and see how it naturally appears for Markovian, i.e. memoryless, dynamics. Section F defines non-equilibrium state variables and process quantities like work and heat. Using these definitions, we show the equivalence of the information-theoretic relaxation condition with the Second Law, where the latter appears as the statement that the irreversible entropy production is never negative. In section G we treat further thermodynamic notions like that of heat content and derive (in-)equalities for (ir-)reversible processes that relate (ir-)reversible work and information. We discuss our results in a broader context and conclude in section H.

B. Canonical statistical mechanics

In this section we briefly revisit the historic approaches to equilibrium statistical thermodynamics, which date back to Gibbs and Boltzmann. We will not dive into the technical details, but rather use this quick review in order to introduce our notation. For more details, we refer to standard textbooks like Refs. [23, 24]. However, understanding the notion of generalized ensembles in the next section requires a basic intuition about the derivation of the canonical ensemble, i.e. of systems that exchange energy with their environment.

The historic and still most common approach towards a statistical formulation of thermodynamics starts from Hamiltonian mechanics as the fundamental reversible dynamics on the lowest level. A system with microscopic states z is characterized by the Hamiltonian H0(z), which specifies the energy of a physical configuration. Since energy is a conserved quantity, the phase space volume W(U) of all states with a fixed energy H0(z) = U is a measure for the number of (physically distinguishable) states that are accessible to the system. One proceeds by defining the function S(U) := ln W(U) and its derivative β := dS(U)/dU. Note that throughout this work we set Boltzmann’s constant to unity, kB ≡ 1.

In his seminal work on statistical mechanics [22], Gibbs showed that β−1, S and U fulfil the phenomenological relations for temperature, entropy and internal energy that were already known from phenomenological (equilibrium) thermodynamics. Today, we call this the microcanonical derivation and understand it as a first-principles approach to the thermodynamics of isolated systems.

A useful application of thermodynamics requires us to go beyond isolated systems and treat systems in contact with an environment that acts as an energy reservoir. To that end, we split a large isolated system with degrees of freedom z = (x, y) into a system with degrees of freedom x and environment degrees of freedom y. In general, the Hamiltonian H0(z) = Hs(x) + Hr(y) + Hi(x, y) of the large system is split into the energy function of the system and the environment, Hs(x) and Hr(y), and an interaction term Hi(x, y).

One is interested in the marginal distribution ρ(x) ∝ ∫ ρ(x, y) dy of the system degrees of freedom. Without going into the details of the derivation, the typical approach is to start with the microcanonical (i.e. fixed-energy) ensemble of the large combined system at energy U0. Because ρ(x) is proportional to the number of possible configurations of the environment for a given value of the system energy E(x), it follows that the marginal distribution for the system degrees of freedom reads

ρ(x) = (1/Zc) exp(−βE(x)), (1)

where the canonical partition function

Zc = ∫ exp(−βE(x)) dx (2)

ensures normalization. Usually, the energy function of the system is simply the system part of the Hamiltonian, E(x) = Hs(x). More precisely, the argument uses the extensivity of the energy (i.e. its scaling in the thermodynamic limit) and requires the assumption of short-range interactions, meaning that in this limit the energetic contribution Hi(x, y) due to system–bath interactions is negligible compared to both Hs(x) and Hr(y). Moreover, and crucially, the argument (often done using a Taylor expansion of the bath entropy Sr around the expectation value of the bath energy) uses the fact that energy is conserved. Similar to the microcanonical case, the factor β := ∂S0/∂U0 relates changes of phase space volume to energy changes in the isolated combined system and is thus interpreted as its inverse temperature. Under the assumption that the environment is much larger than the system, β is determined by the inverse temperature of the heat bath alone. From the system’s perspective, β fixes the expected value of the system energy

U := 〈E〉, (3)

where the equilibrium (or reference) average of a phase-space function O(x) is defined as

〈O〉 := ∫ ρ(x) O(x) dx. (4)

Introducing the entropy functional H of a probability density ρ(x) as

H[ρ] := − ∫ ρ(x) ln ρ(x) dx, (5)

the equilibrium (or Gibbsian) entropy of the system reads

S := H[ρ]. (6)

Mathematically, the negative logarithm of the canonical partition function Zc,

Φc := − ln Zc, (7)

is a cumulant-generating function [25]. We will thus refer to it as the natural dimensionless or generating “potential”. Thermodynamically, it appears as a dimensionless potential defined by the Legendre duality relation

S + Φc = βU. (8)

Multiplying this dimensionless potential by the temperature β−1, we obtain the free energy

F := U − β−1S ≡ β−1Φc (9)

as the natural dimensional potential for the canonical situation.

Notice that we use the first equality to define the free energy. This definition will remain valid and important in the rest of this work, even in the context of open systems, where the natural generating potential is not obtained from a canonical partition function.
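These canonical relations are easy to verify numerically. The following Python sketch (a toy three-level system with made-up energies, not from the paper) checks the Legendre duality (8) and the identity (9):

```python
import numpy as np

E = np.array([0.0, 1.0, 2.5])   # hypothetical energy levels (kB = 1, as in the text)
beta = 0.7                      # inverse temperature

Zc  = np.sum(np.exp(-beta * E))          # canonical partition function, Eq. (2)
rho = np.exp(-beta * E) / Zc             # canonical distribution, Eq. (1)

U   = np.sum(rho * E)                    # internal energy U = <E>, Eq. (3)
S   = -np.sum(rho * np.log(rho))         # Gibbsian entropy S = H[rho], Eqs. (5)-(6)
Phi = -np.log(Zc)                        # generating potential Phi_c, Eq. (7)

print(np.isclose(S + Phi, beta * U))         # Legendre duality, Eq. (8): True
print(np.isclose(U - S / beta, Phi / beta))  # F = U - S/beta = Phi_c/beta, Eq. (9): True
```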

C. Open systems and generalized ensembles

In general, the energy function of the system E(x) depends on a set of parameters. On the one hand, there are given unchangeable microscopic parameters like fundamental physical constants. On the other hand, there are thermodynamic system parameters like its volume V, the numbers N = (Nk)k of particles of different species k, external fields h = (hl)l, etc. In the following, we will summarize these macroscopic control parameters by the symbol λ and occasionally make the dependence of the system energy function explicit by writing E = E(x; λ).

Amongst the system control parameters, a special role is taken by extensive quantities (like volume and matter). While in a closed system the value of these quantities is fixed, a corresponding open system can spontaneously exchange such extensive quantities with its environment. Consequently, an extensive system control parameter for a closed system becomes a measurable observable in the corresponding open system. In particular, we are interested in the exchange of conserved extensive quantities, which cannot be created or destroyed by either the system or the environment. Then, any change of these quantities in the system must come from an exchange of the quantity with the system’s environment acting as a corresponding thermodynamic reservoir. As such, the present treatment excludes systems that allow for chemical reactions, where one species of matter may be transformed into another one.

While this situation has a strong analogy with the transition from the microcanonical to the canonical ensemble, notice that there is a crucial difference: The energy function E (or more precisely, its average U) is not measurable, while the quantities associated with macroscopic extensive control parameters are. Moreover, the energy is not just another phase space function, but, as we have mentioned in section B, a quantity that is intimately related to the fundamental microscopic reversible dynamics and thus the microcanonical definition of entropy itself. Keeping this special role of the energy in mind, we may not be surprised by the emergence of peculiar terms for systems in contact with environments that undergo temperature changes. A more detailed discussion of the special role of energy is postponed to section H.

For concreteness, we briefly review the case of the grandcanonical ensemble for a system that can exchange both energy and particles with its environment. For simplicity, we assume that there is only a single species of particles. In the grandcanonical ensemble, a system microstate x is an element of an extended phase-space, sometimes called Fock space. The Fock space contains microscopic configurations for any number of particles, and the system may switch between different “particle shells”. As said above, the number of particles N then becomes an observable N(x) and thus a function of the microstate, instead of a system parameter.

The number of particles is an extensive quantity and thus its asymptotic statistics behave regularly in an appropriate thermodynamic limit. Accordingly, there is a corresponding intensive parameter called the chemical potential µ. Assuming the environment to be much larger than the system, i.e. pretending that it behaves as an infinitely large particle reservoir, we characterize it solely by the value of µ. Because the number of particles is conserved, the chemical potential characterizing the equilibrium configuration for the system degrees of freedom needs to be the same in the system and its environment. Marginalization to the system degrees of freedom thus yields the grandcanonical distribution function

ρ(x) := (1/Zgc) exp(−β(E(x; λ) − µN(x))), (10)

with the grandcanonical partition function

Zgc := ∫ exp(−β(E(x; λ) − µN(x))) dx. (11)

The equilibrium entropy is again given by the entropy functional (6), where this time we insert the grandcanonical distribution, Eq. (10). For any particular value of µ and β it determines the equilibrium expectation value of the particle number

N := 〈N〉. (12)

The grandcanonical dimensionless generating potential Φgc := − ln Zgc obeys the Legendre duality relation

S + Φgc = β(U − µN). (13)

Its dimensional version is obtained by multiplication with the temperature β−1 and is called the grand potential

Ω := U − µN − β−1S = β−1Φgc. (14)

The same logic used in deriving the grandcanonical ensemble can also be applied by turning the volume V from a control parameter to an observable V(x). Then, the corresponding intensive quantity is the (negative) pressure, which specifies the equilibrium expectation value V = 〈V〉 in the so-called isobaric ensemble. For generality and ease of notation, we will work with a general ensemble where some control parameters describing conserved extensive quantities have been transformed into extensive observables. To that end, we consider a system in a heat bath at inverse temperature β. As before, system control parameters are collectively denoted by λ. All uncontrolled extensive quantities (other than the energy E(x) with 〈E〉 = U) have associated observables C(x), with corresponding averages

C := 〈C〉. (15)

The conjugate intensive environment parameters are denoted by γ. The generalized distribution thus reads

ρν(x) = (1/Z) exp(−β(E(x; λ) − γC(x))), (16)

where the vector ν = (β, γ, λ) summarizes all thermodynamic parameters. In the grandcanonical example, the system parameters are the volume V and the fields h. Hence, we have λ = (V, h) and (γ, C(x)) = (µ, N(x)).

Again, in the thermodynamic limit, the appropriate entropy is given by Eq. (6) and obeys the general Legendre duality

S + Φ = β(U − γC) (17)

for the generating potential

Φ := − ln Z. (18)

Previously, variants of this potential have been called (in analogy to the free energy) the (negative) free entropy or the Massieu(–Planck) potential, cf. Refs. [23, 25, 26]. However, in order to avoid confusion in situations where this nomenclature requires the definition of a specific physical situation, we will just call it the “natural” or “generating” adimensional potential. Its dimensional variant

G := β−1Φ ≡ U − γC − β−1S (19)

is called the natural energetic potential or natural energy. It can be understood as the generalization of the grand potential, Eq. (14), to generalized open systems. Notice that even though structurally similar, the free energy F := U − β−1S is a distinct well-defined quantity for any generalized ensemble. As we will see below, it is crucially related to the notion of work in isothermal processes. In contrast, the definition of the natural dimensional potential G depends on the ensemble, i.e. on the question of which extensive macroscopic parameters are controlled and which are measured. Hence, the natural energy G must not be understood as a mere generalization of the free energy F.

The intensive bath parameters γ are (energetically) conjugate to the extensive quantities that we obtained by changing from system control parameters to system observables. In addition, there are energetically conjugate quantities for the remaining control parameters, defined as

Ri = 〈∂E/∂λi〉 = β−1 ∂Φ/∂λi, (20)

where i denotes the ith component of λ. For the grandcanonical ensemble with E = E(x; V, h), they are the (negative) internal system pressure

−p := 〈∂E/∂V〉 = T ∂Φ/∂V (21)

and the polarisations

M := 〈∂E/∂h〉 = T ∂Φ/∂h. (22)

Notice that Eq. (20) is one of the parameter derivatives of the generalized potential:

∂Φ/∂β = U − γC, (23a)
∂Φ/∂λi = βRi, (23b)
∂Φ/∂γi = −βCi. (23c)

Assuming analyticity of Φ in the parameters ν, we further obtain the Maxwell relations:

∂Ri/∂λj = ∂Rj/∂λi, (24a)
∂Ci/∂γj = ∂Cj/∂γi, (24b)
∂Ri/∂γj = −∂Cj/∂λi, (24c)
Ri + β ∂Ri/∂β = ∂U/∂λi − γ ∂C/∂λi, (24d)
β ∂Ci/∂β = −∂U/∂γi + γ ∂C/∂γi. (24e)

While the above relations are universal and follow directly from the elementary theory, the dependence of equilibrium state variables on parameters is specific to the system and its environment. In traditional thermodynamics, the algebraic relations between parameters and observables are called equations of state. They are uniquely determined by Φ as a function of ν.
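The derivative relations (23) are straightforward to check numerically. The sketch below (a hypothetical two-level open system; all numbers are illustrative, not from the paper) compares finite-difference derivatives of Φ with the corresponding ensemble averages:

```python
import numpy as np

# Hypothetical two-level open system: energies E(x) and particle numbers N(x).
E = np.array([0.0, 1.0])
N = np.array([0, 1])

def Phi(beta, gamma):
    """Generating potential Phi = -ln Z of the generalized ensemble, Eq. (18)."""
    return -np.log(np.sum(np.exp(-beta * (E - gamma * N))))

def averages(beta, gamma):
    rho = np.exp(-beta * (E - gamma * N)); rho /= rho.sum()   # Eq. (16)
    return np.sum(rho * E), np.sum(rho * N)                   # U and C

beta, gamma, h = 1.3, 0.4, 1e-6
U, C = averages(beta, gamma)

# Eq. (23a): dPhi/dbeta = U - gamma*C
print(np.isclose((Phi(beta + h, gamma) - Phi(beta - h, gamma)) / (2*h), U - gamma * C))
# Eq. (23c): dPhi/dgamma = -beta*C
print(np.isclose((Phi(beta, gamma + h) - Phi(beta, gamma - h)) / (2*h), -beta * C))
```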

D. Information theory and thermodynamics

The canonical treatment of systems that can exchange energy and other extensive quantities with a large equilibrated environment is usually considered as a derivation of thermodynamics from first principles. But what are these first principles exactly?

While Jaynes argued that the nature of the dynamics (and thus questions of ergodicity) do not matter for the notion of thermodynamics as a physical (meta-)theory, we disagree with this statement. In contrast, we argue that for a consistent thermodynamics on some effective level, we always need to refer back to an appropriate microscopic dynamics, which is reversible and has a fundamental constant of motion (which we call energy), which generates the dynamics. From that perspective, it does not matter whether we choose quantum mechanics or classical Hamiltonian mechanics as the corresponding underlying reversible theory.

While the fundamental requirement on a microscopic dynamics connects energy and reversibility, we need a second requirement that introduces probability into the description. In the canonical derivation, this is achieved by the assumption of equal a priori probabilities for isolated systems, where the dynamics is completely specified by the energy function attributed to all degrees of freedom. As such, it amounts to the choice of a “fair” reference measure for the microstates of a microscopic dynamics. We will discuss this perspective in more detail in Sec. H.

In the derivation of the canonical (generalized) ensembles for (open) systems, the statistical quantity that is consistent with the concept of thermodynamic entropy has emerged as the universal functional S = H[ρ] of the corresponding equilibrium distribution ρ. The independence of this observation from the details of the physical situation has its origin in the exponential scaling of the marginal probability density functions for extensive quantities in the thermodynamic limit. It has recently been understood and mathematically formalized using the theory of large deviations [25, 27, 28]. Similarly, the physical interpretation of the parameters β and γ as intensive parameters (temperature, chemical potentials, pressure) requires the thermodynamic limit for the corresponding reservoirs of extensive quantities (energy, particles, volume).

As we explained in some detail in the introduction, the epistemic approach is rooted in Shannon’s information theory [2, 18, 29]. In information theory, the entropy functional H[ρ] is an axiomatically defined quantitative measure of the uncertainty of a probability distribution. Or differently interpreted, the entropy of a probability distribution represents the average information one obtains from sampling a state x according to ρ(x). In the epistemic perspective, equilibrium statistical mechanics is a theory of statistical inference, which assigns a probability to a microstate given the prior knowledge of a microscopic physical model (specified by an energy function E(x) and other observables), macroscopic parameters and macroscopic observations. The equilibrium reference distribution is obtained as the maximally non-committal (i.e. the least biased) distribution that is consistent with given knowledge about the expectation values of the global extensive quantities. Consequently, this distribution should have the maximal uncertainty amongst all distributions that yield the correct macroscopic averages.

Formally, amongst all ρ′ that obey the constraints 〈E〉ρ′ = U and 〈C〉ρ′ = C, the maximally uncertain equilibrium distribution is defined as

ρ := argmaxρ′ H[ρ′]. (25)

In this so-called MaxEnt approach, the intensive environment parameters β and γ enter as the Lagrange multipliers in the maximization problem. Further, the logarithm of the partition function, − ln Z, appears as the Lagrange multiplier that ensures normalization.
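The variational problem (25) can be solved numerically and compared with the analytic Gibbs form. The following sketch (hypothetical energies and target average, using SciPy's general-purpose optimizer rather than anything from the paper) maximizes H[p] under the constraints and recovers exp(−βE)/Z:

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 2.0, 4.0])   # hypothetical energy levels
U = 1.2                              # prescribed average energy <E> = U

def neg_entropy(p):                  # minimize -H[p] = sum p ln p
    return np.sum(p * np.log(np.clip(p, 1e-300, None)))

cons = ({"type": "eq", "fun": lambda p: np.sum(p) - 1.0},    # normalization
        {"type": "eq", "fun": lambda p: np.dot(p, E) - U})   # energy constraint
p = minimize(neg_entropy, x0=np.full(4, 0.25),
             bounds=[(0, 1)] * 4, constraints=cons).x

# The numerical MaxEnt solution has the Gibbs form p_i = exp(-beta*E_i)/Z,
# with beta the Lagrange multiplier of the energy constraint:
beta = -np.polyfit(E, np.log(p), 1)[0]          # slope of ln p versus E is -beta
Z = np.sum(np.exp(-beta * E))
print(np.allclose(p, np.exp(-beta * E) / Z, atol=1e-4))   # True
```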

The main difficulty with the generalization of this idea to non-equilibrium situations is evident: in equilibrium, the macroscopic state of a system is characterized by a set of independent thermodynamic state variables, which fix the reference expectation values. Equilibrium equations of state then unambiguously give the values of the remaining thermodynamic variables. Out of equilibrium, there is no general notion of the “correct” macroscopic expectation values or their mutual relations.

Information theory nevertheless provides us with a means to quantify the disparity of a distribution ρ from some reference distribution ρ⋆ by means of the Kullback–Leibler (KL) divergence [18, 29]:

D[ρ‖ρ⋆] := ∫ ρ(x) ln(ρ(x)/ρ⋆(x)) dx ≥ 0. (26)

Non-negativity of this quantity follows from Jensen’s inequality, and (ignoring the more subtle notions of measure theory) D[ρ‖ρ⋆] = 0 holds if and only if ρ = ρ⋆.

The KL-divergence has a natural interpretation in terms of coding theory, where we think of events (or, in the present context, microstates) as encoded by symbolic sequences. In binary coding, such a symbol sequence can be thought of as the answers to a specific set of yes-or-no questions about the microstate. With that idea in mind, the Shannon–Kraft–McMillan coding theorem states that for any probability distribution ρ there exists an optimal natural “encoding” (or, equivalently, an optimal set of questions), such that the expected sequence length is proportional to H[ρ] [29]. The KL-divergence quantifies the additional average message length (i.e. the additional average number of questions that need to be answered) when one draws samples from a distribution ρ, but uses an encoding that was optimized for another distribution ρ⋆ [18].
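This coding interpretation amounts to the identity that the cross entropy −∑ ρ ln ρ⋆ exceeds the optimal value H[ρ] by exactly D[ρ‖ρ⋆]. A minimal sketch with made-up distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """D[p||q] = sum_i p_i ln(p_i/q_i), Eq. (26); nonnegative, zero iff p == q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.7, 0.2, 0.1])     # distribution actually sampled
q = np.array([1/3, 1/3, 1/3])     # distribution the encoding was optimized for

H_p   = -np.sum(p * np.log(p))    # optimal expected message length (in nats)
cross = -np.sum(p * np.log(q))    # expected length with the mismatched encoding
print(np.isclose(cross - H_p, kl_divergence(p, q)))   # True: excess = D[p||q]
```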

With this interpretation it is not surprising that the KL-divergence will play an important role in formulating nonequilibrium thermodynamics: after all, we intend to make meaningful statements about nonequilibrium states (i.e., nonequilibrium probability distributions) by referring to the statistical notions that arise in the context of equilibrium systems (with their corresponding equilibrium distributions).

E. Relaxing dynamics

A crucial property of an equilibrium distribution is its stationarity. Hence, a true thermodynamics is necessarily a nonequilibrium theory that deals with time-dependent processes. As said above, here we consider two aspects of thermodynamic processes. Relaxation processes govern the convergence of a non-equilibrium distribution to its equilibrium state. The latter is defined by the values of the thermodynamic parameters ν = (β, γ, λ) that characterize the system and its environment. In addition, an external agent may drive the system by applying a thermodynamic protocol ν = ν(t) that specifies how the thermodynamic parameters change with time.

In what follows, we characterize the instantaneous (non-equilibrium) state of the system at time τ by a time-dependent probability distribution ρ(τ). Its evolution to a later state is governed by the first-order (partial) differential equation

∂τ ρ(τ) = Lν(τ)[ρ(τ)], (27)

where the dynamics is encoded in the operator Lν. At this point, we demand nothing of the operator besides that it maps probability densities to probability densities and that its value Lν(τ)[ρ(τ)] is a state function, i.e., that it depends only on the instantaneous ρ(τ). If the dynamics is a model for the evolution of the state of a thermodynamic system, the operator necessarily depends on the value of the physical control parameters ν.

A thermodynamic system out of equilibrium that is left on its own in a fixed environment will eventually equilibrate in what we call the relaxation process. We say that a dynamics is weakly relaxing if for any constant value of the physical parameters ν we have

ρ(τ) → ρ⋆ν as τ → ∞. (28)

Naturally, the asymptotic distribution ρ⋆ν is the stationary equilibrium distribution for the appropriate physical situation, which is defined by the right-hand side of Eq. (16). In what follows, we emphasize the role of ρ⋆ν as a thermodynamic reference distribution (for given parameter values ν) by using the ⋆-symbol as a superscript.

In the last section, we introduced the Kullback–Leibler divergence D[ρ‖ρ⋆] as the information-theoretic distance of a distribution ρ from a reference distribution ρ⋆. Thus, it is natural to define the thermodynamic distance

D(τ)ν := D[ρ(τ)‖ρ⋆ν]. (29)

It measures the disparity of the instantaneous non-equilibrium distribution ρ(τ), which evolves according to Lν, from the corresponding equilibrium reference distribution ρ⋆ν. In the context of the present interpretation, it first appeared in Ref. [5] under the name of “entropy deficiency”, but it is also a central quantity in more recent work [8, 20, 21].

With the definition of the thermodynamic distance, we reformulate the condition of a weakly relaxing dynamics at constant parameters ν as

D(τ)ν → 0 as τ → ∞. (30)

Weakly relaxing dynamics are thus only constrained by their time-asymptotic behaviour. For driven systems, where parameters ν may change with time, the weak relaxation condition is not appropriate. Even for non-driven systems that only relax, a meaningful thermodynamics for transient behaviour should require a constraint formulated on the dynamics at all times, rather than only referring to its final destination.

The central property of thermodynamic relaxation is its irreversibility: it is impossible to revert the effects of relaxation without spending a form of energy that thermodynamics calls work. However, as the goal of the epistemic approach is to understand thermodynamic concepts from probabilistic and information-theoretic concepts, and not the other way around, we will not introduce work yet. Instead we propose more abstractly that irreversibility simply encodes the loss of information, i.e. the increase of uncertainty.

We already have everything in place to define what we call a strongly relaxing dynamics: A dynamics (27) is strongly relaxing if and only if at any moment in time and for any probability distribution, we have

∂τD(τ)ν ≤ 0, (31)

with equality if and only if ρ(τ) = ρ⋆ν, and thus D(τ)ν = 0.

This condition is equally meaningful for driven and non-driven situations. We emphasize that on the left-hand side of this condition, a partial time-derivative appears. As such, the time-derivative acts only on ρ(τ), which is in turn calculated using the evolution rule Eq. (27). Practically, it means that the evolution operator Lν appearing in Eq. (27), as well as the reference distribution ρ⋆ν, are evaluated at their instantaneous values ν = ν(τ). For driven systems, a total time-derivative would include an additional term proportional to ν̇(τ) multiplied by a parameter gradient. We will encounter and analyse such parameter gradients in more detail below. For now let us just mention that for constant parameters strong relaxation implies weak relaxation.

Let us briefly stop here to emphasize the following: The strong relaxation condition formulated as the inequality (31) is the single dynamical assumption we will use in the remainder of this work. In particular, it is the single condition that we will need to derive all the well-known thermodynamic inequalities like the Second Law. As we will see below, all these thermodynamic process inequalities are in fact equivalent.

The crucial point about the inequality (31) is that it has an unambiguous information-theoretic meaning: a strongly relaxing dynamics constantly discards information. Or more mathematically: The distribution ρ(τ+δτ) obtained after an infinitesimal time-step δτ by applying the generator Lν(τ) to a nonequilibrium distribution ρ(τ) is information-theoretically closer to the equilibrium distribution than ρ(τ):

D[ρ(τ+δτ)‖ρ⋆ν(τ)] ≤ D[ρ(τ)‖ρ⋆ν(τ)]. (32)

Asymptotically, the probability distribution reaches the maximally non-committal distribution in its stationary equilibrium state.

While the strong relaxation condition may come as an appealing starting assumption for a formulation of relaxing dynamics, we still have not said anything about dynamics that implement that assumption. In particular, we have not specified anything but time-locality for the evolution equation (27). In physics, we mostly deal with evolution equations for probabilities where Lν is a linear operator. For discrete and continuous state spaces, Equation (27) is called the Master or Fokker–Planck equation, respectively. Such linear equations arise naturally as the equations governing the change of probability in Markov processes, in which case Lν is called the (backward) generator of the Markovian dynamics.

Oliver Penrose suggested that Markov processes are ubiquitous as stochastic models for physical processes, because they ensure statistical regularity for the stochastic dynamics [24]. In particular, Markov processes are memoryless, which is an important criterion for the reproducibility of statistical experiments: All relevant information for the future evolution of the system is encoded in the instantaneous distribution. In particular, it does not matter how (and in which laboratory throughout the world) the state was prepared.

More precisely, for Markovian processes the conditional probability of a transition x → x′ depends only on the states x and x′, but not on the stochastic history of the process. Not surprisingly, this lack of memory manifests itself in information-theoretic terms: Any Markov process that converges to a unique invariant distribution ρ∞ monotonically decreases the KL-divergence of the instantaneous distribution ρ(τ) from ρ∞:

∂τD[ρ(τ)‖ρ∞] ≤ 0. (33)

Or differently stated: For any convergent Markov process, the relative entropy D[ρ(τ)‖ρ∞] is a Liapunov functional for the dynamics [29, 30].
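This monotonicity is easy to observe in simulation. The sketch below (a hypothetical three-state Markov jump process with Metropolis-type detailed-balance rates; none of the specifics come from the paper) integrates the master equation (27) with an Euler scheme and checks that the KL-divergence from the Gibbsian stationary state never increases:

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])                 # toy energy levels
beta = 1.0
rho_star = np.exp(-beta * E); rho_star /= rho_star.sum()   # stationary state

# Transition-rate matrix: W[i, j] is the rate for the jump j -> i (Metropolis choice).
W = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            W[i, j] = min(1.0, rho_star[i] / rho_star[j])  # detailed balance
np.fill_diagonal(W, -W.sum(axis=0))                        # columns sum to zero

def kl(p, q):
    return np.sum(p * np.log(p / q))

rho, dt, D_prev = np.array([0.9, 0.05, 0.05]), 1e-3, np.inf
for _ in range(20000):
    rho = rho + dt * (W @ rho)        # Euler step of the master equation (27)
    D = kl(rho, rho_star)
    assert D <= D_prev + 1e-12        # Eqs. (31)/(33): D decreases monotonically
    D_prev = D
print(D_prev)                         # close to 0: weak relaxation, Eq. (30)
```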

Keep in mind that this is a purely mathematical property. In order to call a Markov process strongly relaxing in the present thermodynamic context, the stationary distribution ρ⋆ν(x) must originate from a physical model defined by means of extensive physical quantities E(x; λ) and C(x). More precisely, it requires that the anti-symmetric part of the generator Lν be obtained as the gradient of a phase space function

ϑ(x; λ, γ) = β(E(x; λ) − γC(x)) (34)

subject to a global noise level characterized by a noise parameter β, see also [31, 32]. This requirement implies the condition of detailed balance, i.e. the absence of probability currents in the stationary state, and thus forward and backward trajectories are equally likely. In mathematical terms, detailed balance requires the generator Lν to be similar to a symmetric operator and thus to have a real spectrum.

While a consistent thermodynamics can also be formulated for Markovian dynamics that do not fulfil detailed balance, a consistent interpretation in that case requires a careful analysis of the notions of local equilibrium and local detailed balance [33–35], which are beyond the scope of this paper.

For completeness, we also mention that Liapunov functionals resembling relative entropies exist also for non-linear evolution equations, see for instance Refs. [15, 36]. For the purpose of the present article it is sufficient to keep in mind that any physical Markovian dynamics converging to a unique steady state constructed from a physical equilibrium potential fulfils the strong relaxation criterion: Independently of the exact instantaneous state, it always approaches the maximally uncertain equilibrium distribution that is consistent with the macroscopic thermodynamic parameters.

F. Nonequilibrium Thermodynamics

With the thermodynamic, information-theoretic and dynamic prerequisites introduced in the previous sections, we have everything in place to proceed to formulate our nonequilibrium thermodynamics for driven and relaxing systems. We start with the definition of nonequilibrium state variables and potentials and then consider their dynamics. After that, we introduce process quantities like (rates of) work and heat, which allows us to define entropy production and finally formulate the Second Law of Thermodynamics.

1. Nonequilibrium state variables and potentials, exergy

State variables are obtained as the expectation values of phase space functions O(x; ν) which may explicitly depend on thermodynamic system and environment parameters ν. While we exclude phase space functions that depend explicitly on time, we explicitly include the time-dependence of state functions for driven systems where ν̇ = (d/dτ)ν(τ) may be non-zero. In previous sections, we have only considered equilibrium state variables (cf. Eq. (4)) that were obtained as equilibrium (reference) averages. From now on, we will decorate such reference averages by the symbol ⋆:

O⋆ν := 〈O〉⋆ := ∫ O(x; ν) ρ⋆ν(x) dx. (35)

Non-equilibrium state functions are obtained by averaging over the (explicitly time-dependent) nonequilibrium distribution ρ(τ):

O(τ)ν := 〈O〉(τ) := ∫ O(x; ν) ρ(τ)(x) dx. (36)

While reference averages always depend on ν, non-equilibrium averages O(τ) = 〈O〉(τ) depend on ν only if the phase space function O(x) does. For instance, the non-equilibrium averages of the macroscopic extensive observables C(x),

C(τ) := 〈C〉(τ) = ∫ C(x) ρ(τ)(x) dx, (37)

do not depend on the control parameters. Keep also in mind that the non-equilibrium internal energy

U(τ)λ := 〈E〉(τ) = ∫ E(x; λ) ρ(τ)(x) dx (38)

depends only on the system control parameters λ, but not on the intensive parameters of the environment. In the following, we make the parameter dependence explicit, unless we find it (notationally) convenient to simply write O⋆ instead of O⋆ν. In case we need to make the time dependence of the parameters explicit, we may also write O⋆ν(τ) and O(τ)ν(τ).

The non-equilibrium (information) entropy characterizes the uncertainty of the instantaneous nonequilibrium distribution ρ(τ):

S(τ) := H[ρ(τ)] = − ∫ ρ(τ)(x) ln ρ(τ)(x) dx. (39)

In analogy to the general Legendre duality (17), we define the nonequilibrium variant of the natural dimensionless potential:

Φ(τ)ν := β(U(τ)λ − γC(τ)) − S(τ). (40)

The corresponding nonequilibrium natural energy accordingly reads

G(τ)ν := β−1Φ(τ)ν = U(τ)λ − γC(τ) − β−1S(τ). (41)

It is straightforward to show that the difference between the natural non-equilibrium potential and its equilibrium reference yields the thermodynamic distance (29):

Φ(τ)ν − Φ⋆ν = D(τ)ν ≥ 0. (42)

By multiplying this difference with the temperature β−1, one obtains a quantity with the dimensions of energy:

B(τ)ν := G(τ)ν − G⋆ν ≡ β−1D(τ)ν ≥ 0. (43)

This quantity has a strong resemblance to the quantity commonly called exergy in an engineering (and systems theory) context [37, 38]. The exergy of an equilibrium system vanishes, while it is positive for non-equilibrium distributions. It is the maximum work that can be extracted from a system during a relaxation process. For relaxing dynamics in the sense of Eq. (31), it directly follows that exergy is destroyed during thermodynamic relaxation:

∂τB(τ) ≤ 0. (44)

The law of exergy degradation (44) is usually derived as a reformulation of the Second Law. Starting from the information-theoretic perspective, it appears naturally from the start. Its equivalence with the Second Law will become evident shortly.
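Equations (42) and (43) can be checked directly for a discrete toy ensemble. The sketch below (made-up levels and an arbitrary nonequilibrium state, not from the paper) verifies that the nonequilibrium potential exceeds its equilibrium reference by exactly the thermodynamic distance:

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])     # hypothetical energies
N = np.array([0, 1, 2])           # hypothetical particle numbers
beta, mu = 1.0, 0.3               # bath parameters (gamma = mu here)

w = np.exp(-beta * (E - mu * N)); Z = w.sum()
rho_star = w / Z                  # reference distribution, Eq. (16)
Phi_star = -np.log(Z)             # equilibrium potential, Eq. (18)

rho = np.array([0.5, 0.3, 0.2])   # arbitrary nonequilibrium state
U, C = np.sum(rho * E), np.sum(rho * N)
S = -np.sum(rho * np.log(rho))
Phi_neq = beta * (U - mu * C) - S                  # Eq. (40)

D = np.sum(rho * np.log(rho / rho_star))           # thermodynamic distance, Eq. (29)
print(np.isclose(Phi_neq - Phi_star, D))           # Eq. (42): True
print(D / beta >= 0)                               # exergy B = D/beta >= 0, Eq. (43): True
```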

2. Time derivatives of state variables

The reference distribution ρ⋆ν does not depend explicitly on time. Thus the time-derivative of an equilibrium average is only due to changes in the control parameters ν:

Ȯ⋆ν := (d/dτ) O⋆ν(τ) = (∇νO⋆ν) ν̇. (45)

The parameter gradient ∇ν is a vector containing the partial derivatives with respect to the control parameters. For example, in the grandcanonical situation described above we have ∇ν = (∂β, ∂µ, ∂V, ∂h). For equilibrium averages, the parameter gradient

∇νO⋆ = 〈∇ν O〉⋆ − cov[O, ∇νϑ]⋆ (46)

can be re-written in terms of the average of the state function ∇ν O(x) and an equilibrium co-variance

cov[O, O′]⋆ = 〈OO′〉⋆ − 〈O〉⋆〈O′〉⋆. (47)

The second argument in the co-variance, ∇νϑ(x; ν), is the gradient of the phase space function (34) appearing in the exponential of the reference density (cf. Eq. (16)). Below, we will see its interpretation as a natural (nondimensional) state-function describing the heat content of a system.

If a state function O(x) does not explicitly depend on the parameters, the first term in Eq. (46) vanishes and the co-variance is the only contribution. In particular, we obtain the equilibrium stability relations

∂C⋆k/∂γk = β cov[Ck, Ck] ≥ 0. (48)

In contrast, the total time derivative of a non-equilibrium state variable O(τ)ν reads

Ȯ(τ)ν := (d/dτ) O(τ)ν(τ) = ∂τO(τ)ν + (∇νO(τ)ν) ν̇. (49)

It generally features two terms: The first one, which we write as a partial time-derivative, can be attributed to the (autonomous) relaxation of the non-equilibrium probability distribution:

∂τO(τ)ν := ∫ O(x; ν(τ)) ρ̇(τ)(x) dx. (50)

For parameter-dependent observables O(x; ν) we also have the gradient part

(∇νO(τ)ν) ν̇ = 〈∇ν O〉(τ) ν̇, (51)

which cannot easily be decomposed as we did in Eq. (45).


3. Process quantities: work and heat

Apart from changes of state variables, thermodynamics deals with process quantities that generally cannot be written as the time-derivative of a state function. In traditional thermodynamics, process quantities are associated with inexact differentials, often denoted by the symbols δ or đ. Here, Eq. (27) provides us with an explicit evolution rule for the non-equilibrium probability density and we do not need to work with inexact differentials.

A process quantity Ψ[0, τ] is defined as the integral over the time interval [0, τ] of a rate function Ψ̇(τ):

Ψ[0, τ] := ∫_0^τ Ψ̇(t) dt. (52)

In general, the integrands $\dot{\Psi}^{(\tau)} \equiv \frac{d}{d\tau}\Psi[0,\tau]$ may contain the control parameters $\nu$ as well as their derivatives $\dot{\nu}$. In what follows, we will usually work with the integrands $\dot{\Psi}^{(\tau)}$ rather than with their integrals. However, we will see that for systems in environments of constant temperature, some process quantities for reference processes become integrable.

The central process quantities in thermodynamics are the rate of heat flow $\dot{Q}$ and the power $\dot{W}$, i.e. the rate at which work is performed on the system. Work and power are attributed to changes in macroscopic extensive system control parameters $\lambda$ and system observables $C$. Because of the distinction between control parameters and measurable observables, the power $\dot{W}^{(\tau)}$ consists of two terms:

\[
\dot{W}^{(\tau)} := \dot{W}^{(\tau)}_{\mathrm{drv}} + \dot{W}^{(\tau)}_{\mathrm{aut}}, \tag{53}
\]
which we will call the driving and the autonomous power, respectively. Both work terms have a positive sign if they add a positive amount of energy to the system.

The driving power is the rate of energy change obtained by changing the control parameters that appear in the energy function of the system:
\[
\dot{W}^{(\tau)}_{\mathrm{drv}} := \Big\langle \frac{\partial E_\lambda}{\partial \lambda} \Big\rangle^{(\tau)}\,\dot{\lambda} = \sum_i R^{(\tau)}_i \dot{\lambda}_i \quad \Big[= -p^{(\tau)}\dot{V} + M^{(\tau)}\dot{h}\Big]. \tag{54}
\]

The exact form of the driving power depends on the physical situation. For instance, in the grandcanonical case where the volume (and potentially a field $h$) is controlled, the driving work contains the mechanical work (and the change in the internal field energy due to a change in the bare field $h$). As in equilibrium, the quantities

\[
R^{(\tau)}_i = \Big\langle \frac{\partial E}{\partial \lambda_i} \Big\rangle^{(\tau)} \tag{55}
\]

are the non-equilibrium system observables that are conjugate to the system control parameters $\lambda_i$; they can be either intensive or extensive, cf. Eq. (20).

The autonomous power is the rate of energy that enters (positive sign) or leaves (negative sign) the system as a result of the autonomous system dynamics (27). It amounts to the exchange of conserved extensive quantities $C^{(\tau)}$ with the reservoirs against their intensive parameters $\gamma$:
\[
\dot{W}^{(\tau)}_{\mathrm{aut}} := \gamma\,\dot{C}^{(\tau)} \quad \Big[= \mu\,\dot{N}^{(\tau)}\Big]. \tag{56}
\]

It is the power that can be extracted from the non-equilibrium state of the system by consuming exergy that was, for instance, generated by driving the system out of an equilibrium state. As before, the equality in square brackets holds for the grandcanonical ensemble, where the autonomous work is due to the autonomous flow of matter between the system and the environment.

In standard thermodynamics, work contributions are usually split into input and output work, with the distinction made by sign. Our distinction instead relies on whether extensive quantities are controlled or may autonomously relax against an environment. Well-known macroscopic thermodynamic cycles (e.g. the Carnot, Otto or Diesel processes) are constructed by combining four partial processes in which different system and environment parameters are controlled. Often, the work during one of these partial processes has a well-defined sign, and is thus either pure input or pure output work. Using the present distinction, we may still find a unique attribution of the work along a partial process to either the driving or the autonomous type. Notice, however, that reversing the thermodynamic cycle reverses the sign of the power. It thus exchanges input with output work, whereas the distinction into driving and autonomous work remains the same.

4. The First and Second Law of Thermodynamics

Besides through work, the internal energy of a system may change due to a heat flow $\dot{Q}^{(\tau)}$ from the environment to the system. The First Law of Thermodynamics expresses energy conservation: the internal energy of a system can change either by a heat flow or through power:

\[
\dot{U}^{(\tau)} = \dot{Q}^{(\tau)} + \dot{W}^{(\tau)}. \tag{57}
\]

Notice that the First Law and the definition of work unambiguously define the rate of heat flow: it is the rate of change of the energy of the system that cannot be attributed to changes in the macroscopic properties we use to specify the equilibrium state of the system.

If the environment is at an inverse temperature $\beta$, the heat flow corresponds to an entropy flow $\dot{S}^{(\tau)}_{e}$ from the environment to the system:
\[
\dot{S}^{(\tau)}_{e} := \beta\dot{Q}^{(\tau)} = \beta(\dot{U}^{(\tau)} - \dot{W}^{(\tau)}) \quad \Big[= \beta(\dot{U}^{(\tau)} - \mu\dot{N}^{(\tau)} + p^{(\tau)}\dot{V} - M^{(\tau)}\dot{h})\Big]. \tag{58}
\]


The interaction of the system with its environment changes the nonequilibrium entropy, i.e. the instantaneous uncertainty of the distribution. However, it also changes the true microscopic state of the environment by introducing correlations between the system and the environment. The latter are not part of our description and lead to an irreversible production of entropy $\dot{S}^{(\tau)}_{i}$.

In the present work, this entropy production emerges as the change of system entropy that cannot be attributed to an entropy flow $\dot{S}^{(\tau)}_{e}$:
\[
\dot{S}^{(\tau)}_{i} := \dot{S}^{(\tau)} - \dot{S}^{(\tau)}_{e}. \tag{59}
\]

Keep in mind that although the time integrals $S_i[0,\tau]$ and $S_e[0,\tau]$ are well defined, there is no corresponding state observable for these entropy changes. Using the definitions of heat (57) and autonomous power (56), we can rewrite the irreversible entropy production as
\begin{align*}
\dot{S}^{(\tau)}_{i} &= \dot{S}^{(\tau)} - \beta(\dot{U}^{(\tau)}_{\lambda} - \dot{W}^{(\tau)}) \\
&= \dot{S}^{(\tau)} - \beta \dot{U}^{(\tau)}_{\lambda} + \beta \dot{W}^{(\tau)}_{\mathrm{aut}} + \beta \dot{W}^{(\tau)}_{\mathrm{drv}} \\
&= \dot{S}^{(\tau)} - \beta(\dot{U}^{(\tau)}_{\lambda} - \gamma\dot{C}^{(\tau)}) + \beta \dot{W}^{(\tau)}_{\mathrm{drv}}.
\end{align*}

In order to formulate the usual notion of the Second Law, we consider the gradient part of the time derivative of the natural non-equilibrium potential (40), which we call the gradient entropy flow:
\begin{align}
\dot{S}^{(\tau)}_{\nabla} &:= (\nabla_\nu \Phi^{(\tau)}_{\nu})\,\dot{\nu} \tag{60}\\
&= (U^{(\tau)}_{\lambda} - \gamma C^{(\tau)})\,\dot{\beta} - \beta C^{(\tau)}\dot{\gamma} + \beta\,\frac{\partial U^{(\tau)}_{\lambda}}{\partial\lambda}\,\dot{\lambda} \nonumber\\
&= (U^{(\tau)}_{\lambda} - \gamma C^{(\tau)})\,\dot{\beta} - \beta C^{(\tau)}\dot{\gamma} + \beta\,\dot{W}^{(\tau)}_{\mathrm{drv}}. \tag{61}
\end{align}

Using the product rule ("partial integration" in time), we realize that the irreversible entropy production
\begin{align}
\dot{S}^{(\tau)}_{i} &= -\dot{\Phi}^{(\tau)}_{\nu} + (U^{(\tau)}_{\lambda} - \gamma C^{(\tau)})\,\dot{\beta} - \beta C^{(\tau)}\dot{\gamma} + \beta\,\dot{W}^{(\tau)}_{\mathrm{drv}} \nonumber\\
&\equiv -\dot{\Phi}^{(\tau)}_{\nu} + \dot{S}^{(\tau)}_{\nabla} \nonumber\\
&= -\dot{\Phi}^{(\tau)}_{\nu} + (\nabla_\nu \Phi^{(\tau)}_{\nu})\,\dot{\nu} \tag{62}
\end{align}
is nothing but the autonomous relaxational part of the evolution of the natural potential, represented by the partial time derivative:
\[
\dot{S}^{(\tau)}_{i} = -\partial_\tau \Phi^{(\tau)}_{\nu}. \tag{63}
\]

In order to formulate the common statement of the Second Law of Thermodynamics, we use that the natural reference potential $\Phi^{\star}_{\nu}$ does not explicitly depend on time, i.e. that $\partial_\tau \Phi^{\star}_{\nu} = 0$. Hence, we can equally express the irreversible entropy production using the thermodynamic distance (42):
\[
\dot{S}^{(\tau)}_{i} = -\partial_\tau\big[\Phi^{(\tau)}_{\nu} - \Phi^{\star}_{\nu}\big] = -\partial_\tau D^{(\tau)}_{\nu}. \tag{64}
\]

Equation (64) is the main result of this paper. It establishes a fundamental relation between the definition of thermodynamic entropy production (appearing on the left-hand side) and the abstract definition of an information-theoretic thermodynamic distance (on the right-hand side). While structurally similar results have appeared in various more specialized contexts [4, 5, 8, 19–21], the present general derivation is the main original conceptual result of this work. In particular, we emphasize that we did not rely on any implicit assumptions. The thermodynamic entropy production is defined operationally and consistently with phenomenological thermodynamics: it originates from the (physically unambiguous) definition of power, Eq. (53), which specifies energy changes due to changes in experimentally accessible, i.e. measurable and controllable, macroscopic quantities. In contrast, the right-hand side of (64) has an unambiguous information-theoretic meaning: it measures the instantaneous rate of discarded information with respect to a physical equilibrium reference measure.

The above definitions do not rely on the strong relaxation condition (31). Yet, assuming strongly relaxing dynamics, Eq. (64) directly yields the Second Law in its most common form:
\[
\dot{S}^{(\tau)}_{i} = -\partial_\tau D^{(\tau)}_{\nu} \geq 0. \tag{65}
\]

The framework presented in this work thus provides a proof of the Second Law using only strong relaxation as an assumption. In particular, we have proven the Second Law for arbitrary Markovian dynamics that, in the absence of driving, relax to a thermodynamic equilibrium distribution. To the knowledge of the author, this is the most general derivation of this fundamental law based on first principles.

G. Derived concepts

So far we have illustrated the connection between information theory and the elementary notions of thermodynamics in a very general way. The present section focuses on generalizations of other derived thermodynamic concepts like enthalpy. We further derive and discuss other (equivalent) thermodynamic inequalities, including Landauer's principle of information-to-work conversion.

1. Enthalpy as heat content and mediate work

Another important concept in traditional thermodynamics is that of enthalpy, which arises naturally in the context of chemical physics. In traditional (bulk) chemistry it is natural to work with the isobaric-isothermal ensemble, where pressure and temperature of the environment are controlled and the volume can fluctuate. The equilibrium enthalpy $H^{\star} = U^{\star} + pV^{\star}$ can be interpreted as the energy that is necessary to create a system minus the amount of work performed by the system (notice the sign) to establish its volume in a given environment. Thus, enthalpy can be understood as the "heat content" of a system, i.e. the part of the internal energy that is not attributed to energy in macroscopic degrees of freedom.

It seems natural to generalize the notion of heat content to our ensembles for general open systems. For a system with an autonomously fluctuating particle number (and all other extensive parameters constant), the analogous equilibrium heat content reads $U^{\star} - \mu N^{\star}$: it is the energy needed to create the system plus the amount of work needed to establish a certain particle number within a large environment characterized by a chemical potential $\mu$.

For the general case, we thus define the natural "non-free" potential
\[
\Theta := \beta U - \beta\gamma C = \langle \vartheta \rangle. \tag{66}
\]

The natural non-free potential is the dimensionless version of what we call the natural heat content or generalized enthalpy
\[
H := \beta^{-1}\Theta = U - \gamma C. \tag{67}
\]

The heat content thus emerges as an energetic "non-free" state function: it specifies an energy content that is not available to perform work. Notice that the natural non-free potential appears both as the Legendre transform of $\beta U$ with respect to the observed extensive quantities $\beta\gamma C$ and as the (non-equilibrium or reference) average of the state function $\vartheta$ defined in Eq. (34).
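As a consistency check, the generalized enthalpy (67) reproduces both special cases mentioned above: the grandcanonical identification $\gamma C = \mu N$ used throughout, and, for a fluctuating volume, the pair $(\gamma, C) = (-p, V)$, where the sign of this second identification (not spelled out in the text above) is chosen such that the classical enthalpy is recovered:
\[
\gamma C = \mu N \;\Rightarrow\; H = U - \mu N, \qquad \gamma C = -pV \;\Rightarrow\; H = U + pV.
\]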

To see more clearly how heat content and heat flow are related, consider the following situation: a system in a given state (i.e. with a given distribution $\varrho^{(\tau)}$) resides in an environment at temperature $\beta$ and other intensive parameters $\gamma$. The state uniquely determines the expectation values of the extensive observables $C^{(\tau)}$. The heat content $H^{(\tau)} = U^{(\tau)} - \gamma C^{(\tau)}$ characterizes the part of the internal energy that we cannot attribute to macroscopic degrees of freedom $\gamma C^{(\tau)}$. Now suppose we instantaneously change the intensive parameters from $\gamma$ to $\gamma'$. The internal energy of the system has not changed, but the attribution of what constitutes the macroscopic part of the energy is now given by $\gamma' C^{(\tau)}$. As such, the heat content of the system will have changed without heat flowing or work being performed on the system. Instead, in order to change the parameters of the reservoirs, work has to be performed on the environment. Since the environment is imagined as infinitely large, it is not sensible to ask for the total amount of work needed to change its state. However, the (negative) power needed due to the instantaneous state of the system is well defined:

\[
\dot{W}^{(\tau)}_{\mathrm{med}} := -C^{(\tau)}\dot{\gamma}. \tag{68}
\]

We suggest calling $\dot{W}^{(\tau)}_{\mathrm{med}}$ the mediate power. We apply the negative sign convention because energy changes are formulated with respect to the system. The word "mediate" here is understood in contrast to the immediate driving power $\dot{W}_{\mathrm{drv}}$, which directly changes the internal energy of the system by changing the parameters that govern its energy function. The mediate power, in contrast, reflects how our distinction between energy in hidden (microscopic) and observable (macroscopic) degrees of freedom changes with time as we vary the environment parameters. The autonomous and the mediate power together account for the change of the energy attributed to macroscopic, observable degrees of freedom:

\[
\dot{W}_{\mathrm{aut}} - \dot{W}_{\mathrm{med}} = \frac{d}{d\tau}\big[\gamma C^{(\tau)}\big] \equiv \frac{d}{d\tau}\big[U^{(\tau)} - H^{(\tau)}\big]. \tag{69}
\]

The mediate power appears naturally in the $\gamma$-derivatives of the thermodynamic potentials. In fact, subtracting the entropy flow $\dot{S}^{(\tau)}_{e} = \beta\dot{Q}^{(\tau)}$ from the total change of the "non-free" natural potential $\Theta$, we recover the gradient entropy flow that appeared in Eq. (61):

\begin{align}
\dot{\Theta} - \beta\dot{Q}^{(\tau)} &= (U - \gamma C)\,\dot{\beta} - \beta C\dot{\gamma} + \beta\dot{W}_{\mathrm{drv}} \nonumber\\
&= (\nabla_\nu \Phi^{(\tau)}_{\nu})\,\dot{\nu} \equiv \dot{S}^{(\tau)}_{\nabla}. \tag{70}
\end{align}

Notice that the corresponding dimensional relation
\[
\dot{H}^{(\tau)} - \dot{Q}^{(\tau)} = (\nabla_\nu G^{(\tau)}_{\nu})\,\dot{\nu} - \beta^{-1} H^{(\tau)}\dot{\beta} \tag{71}
\]

features an additional non-gradient term for non-isothermal processes. Still, the following relation is valid for arbitrary processes:

\[
\dot{H}^{(\tau)} = \dot{Q}^{(\tau)} + \dot{W}^{(\tau)}_{\mathrm{drv}} + \dot{W}^{(\tau)}_{\mathrm{med}}. \tag{72}
\]

In the above equation, we see that a change of enthalpy can be due to three physically distinct mechanisms: heat flow $\dot{Q}^{(\tau)}$, (immediate) driving power $\dot{W}^{(\tau)}_{\mathrm{drv}}$, and the mediate power associated with changing the intensive parameters characterizing the environment. As such, Eq. (72) acts as a "First Law" for the (generalized) enthalpy. Different formulations (although similar in spirit) have been proposed for chemical reaction networks in Refs. [15, 39].

2. Work inequalities

For arbitrary processes between equilibrium states, a common result in traditional thermodynamics is that the work performed on the system is necessarily larger than the difference of the corresponding equilibrium free energies. Here, we generalize this inequality to a non-equilibrium version for driven systems in a straightforward way. To that end, notice that by definition we have

\begin{align}
\dot{W}^{(\tau)} &\equiv \dot{U}^{(\tau)} - \beta^{-1}\dot{S}^{(\tau)}_{e} \nonumber\\
&\equiv \dot{U}^{(\tau)} - \beta^{-1}\dot{S}^{(\tau)} + \beta^{-1}\dot{S}^{(\tau)}_{i} \nonumber\\
&\geq \dot{U}^{(\tau)} - \beta^{-1}\dot{S}^{(\tau)}, \tag{73}
\end{align}


where in the last line we used the Second Law, Eq. (65). Again, this inequality is equivalent to the Second Law and thus to our initial information-theoretic assumption of relaxing dynamics. Using the definition of the non-equilibrium free energy $F^{(\tau)} \equiv U^{(\tau)} - \beta^{-1} S^{(\tau)}$, we use the chain rule to find the relation
\[
\dot{W}^{(\tau)} \geq \dot{F}^{(\tau)} - \beta^{-2} S^{(\tau)}\dot{\beta} \equiv \dot{F}^{(\tau)} + S^{(\tau)}\dot{T}. \tag{74}
\]

This is the non-equilibrium version of the work inequality. In particular, integrating this relation along an isothermal process with $\dot{\beta} = 0$, we have
\[
W[0,\tau] \geq F^{(\tau)}_{\nu(\tau)} - F^{(0)}_{\nu(0)}. \tag{75}
\]

For any isothermal process, the work done on the system is at least as large as the difference between the non-equilibrium free energies at the end and at the beginning of the process. For a non-isothermal process, the lower bound for the power contains a non-integrable rate term weighing the rate of temperature change $\dot{T}$ with the non-equilibrium entropy $S^{(\tau)}$.
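Integrating Eq. (74) without the isothermal assumption makes this explicit; the bound then carries a protocol-dependent integral that cannot be reduced to boundary terms:
\[
W[0,\tau] \geq F^{(\tau)}_{\nu(\tau)} - F^{(0)}_{\nu(0)} + \int_0^{\tau} S^{(t)}\,\dot{T}(t)\, dt.
\]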

Sometimes, the work inequality (75) is stated with the difference of the equilibrium free energies appearing on the right-hand side. However, such an inequality is not valid in arbitrary non-equilibrium situations. While we have already seen that it usually requires an isothermal process, we also need to be in a physical situation where there are no autonomous measurable macroscopic extensive observables. Hence, the work is purely driving work and does not have an autonomous contribution. Assuming that driving only occurs during the interval $[0,\tau]$, we have $\nu(t) = \nu_f$ and thus no driving power for $t \geq \tau$. It follows that

\begin{align}
W[0,\tau] = W_{\mathrm{drv}}[0,\tau] = W_{\mathrm{drv}}[0,\infty] &\geq F^{(\infty)}_{\nu(\infty)} - F^{(0)}_{\nu(0)} \nonumber\\
&= F^{\star}_{\nu_f} - F^{(0)}_{\nu_i}. \tag{76}
\end{align}

Consequently, even in the canonical isothermal case, a general inequality that involves only the equilibrium free energies requires the system to start in equilibrium [19].

This example reminds us that care is needed when using well-known thermodynamic inequalities involving differences of state functions at different times. While Eq. (74) is always valid, the more subtle issues arising for open systems in this formulation are implicit in the definition of work. In the following, we make this more explicit by introducing the notion of reversible work.

3. Reversible processes and reversible work

Traditional thermodynamics uses the notion of an idealized reversible process in order to formulate thermodynamic inequalities. The reversible work is defined as the net work that can be extracted from a system in such a reversible process, where the system remains at equilibrium at every point in time. Note that this condition requires the process to be quasistatic, so that it would take an infinite amount of time to perform.

In order to see how reversible processes enter the present picture, we start with the definition of the reversible power. In analogy to the definitions (54) and (56), we separate it into two terms:
\[
\dot{W}^{(\tau)}_{\star} := \dot{W}^{(\tau)}_{\mathrm{drv},\star} + \dot{W}^{(\tau)}_{\mathrm{aut},\star}, \tag{77}
\]

with the reference driving power $\dot{W}^{(\tau)}_{\mathrm{drv},\star}$ and the reference autonomous power $\dot{W}^{(\tau)}_{\mathrm{aut},\star}$ defined as
\begin{align}
\dot{W}^{(\tau)}_{\mathrm{drv},\star} &:= \Big\langle \frac{\partial E_\lambda}{\partial \lambda}\Big\rangle^{\star}\dot{\lambda} = \sum_i R^{\star}_i \dot{\lambda}_i, \tag{78a}\\
\dot{W}^{(\tau)}_{\mathrm{aut},\star} &:= \gamma\,\dot{C}^{\star}. \tag{78b}
\end{align}

Formally, we have exchanged non-equilibrium averages for their corresponding equilibrium reference values. Keep in mind that while $R^{(\tau)} = \partial_\lambda U^{(\tau)}$, the Maxwell relations (24) show that in general $R^{\star}_i \neq \partial_\lambda U^{\star}$. Moreover, the term "autonomous" power might not be entirely appropriate: in a reversible process, there is no autonomous relaxation by definition. For practical purposes, the concrete expressions for $R^{\star}_i$ and

\[
\dot{C}^{\star} = \frac{d}{d\tau}\big[C^{\star}(\nu(\tau))\big] \tag{79}
\]

are fully determined by the equilibrium state functions $C(\nu)$ describing a (traditional) thermodynamic system. The irreversible power is defined as the difference between the actual work and its reversible analogue:
\[
\dot{W}^{(\tau)}_{\mathrm{irr}} := \dot{W}^{(\tau)} - \dot{W}^{(\tau)}_{\star}. \tag{80}
\]

As equilibrium state variables serve as reference values for their non-equilibrium analogues, reversible processes can be understood as (idealized) reference processes for real processes. It is a well-known fact that in an isothermal process (i.e. a process where the bath temperature remains constant, $\dot{\beta} = 0$, in the interval $[0,\tau]$), the difference between equilibrium free energies is given by the integrated reversible work. In order to prove this statement in the present situation, we use Eq. (23) and the definitions (78) to find:

\begin{align}
\dot{W}^{(\tau)}_{\star} &= (\partial_\lambda G^{\star}_{\nu})\,\dot{\lambda} + (\partial_\gamma G^{\star}_{\nu})\,\dot{\gamma} + \frac{d}{d\tau}\big[\gamma C^{\star}\big] \nonumber\\
&= (\nabla_\nu G^{\star}_{\nu})\,\dot{\nu} - (\partial_\beta G^{\star}_{\nu})\,\dot{\beta} + \frac{d}{d\tau}\big[\gamma C^{\star}\big]. \tag{81}
\end{align}

The $\beta$-derivative of the natural potential is easily recognized as
\begin{align}
\partial_\beta G^{\star}_{\nu} &= \partial_\beta(\beta^{-1}\Phi^{\star}_{\nu}) = \beta^{-1}\partial_\beta\Phi^{\star}_{\nu} - \beta^{-2}\Phi^{\star}_{\nu} \nonumber\\
&= \beta^{-1}(U^{\star} - \gamma C^{\star}) - \beta^{-2}(\beta U^{\star} - \beta\gamma C^{\star} - S^{\star}) \nonumber\\
&= \beta^{-2} S^{\star}. \tag{82}
\end{align}


Recognizing the first term of Eq. (81) as the total time derivative of the natural equilibrium energy $G^{\star}_{\nu}$, we thus find
\begin{align}
\dot{W}^{(\tau)}_{\star} &= \frac{d}{d\tau}\big[G^{\star} + \gamma C^{\star}\big] - (\beta^{-2}S^{\star})\,\dot{\beta} \nonumber\\
&= \dot{F}^{\star} + S^{\star}\dot{T}. \tag{83}
\end{align}

We immediately recognize the structural similarity of the equality (83) for the reversible work to the inequality (74) for the actual work. By integration, we obtain the traditional result mentioned above for isothermal processes:

\[
W_{\star}[0,\tau] = F^{\star}_{\nu(\tau)} - F^{\star}_{\nu(0)}. \tag{84}
\]

For non-isothermal processes, we cannot integrate Eq. (83) and thus have no direct relation between reversible work and free-energy changes.

4. Thermodynamics and information processing

While in the previous section we derived some connections between work and free energy, our main result (64) also allows us to relate work and information. Originally, this question was raised by Landauer regarding the thermodynamic cost of irreversible computation [4].

Landauer's idea is summarized as follows. Consider the state of a single bit, i.e. a single binary digit of information, as part of the microscopic state of the system. Suppose that there is no energy bias associated with the state of this bit. Without any a-priori knowledge about the bit, it will be in one of its two states (say 0 or 1) with equal probability. We thus have $p_0 = p_1 = 0.5$, with an associated information entropy $S_{\mathrm{bit}} = \ln 2$, i.e. one bit. The elementary irreversible single-bit operation is deletion, i.e. the reset of the bit to a reference state, say 1. After erasure, the bit is certainly ($p_1 = 1$) in state 1 and the entropy of the bit vanishes. Thus, the erasure of a bit leads to a decrease of the system entropy by the amount $\ln 2$. By the Second Law, the irreversibly produced entropy must be positive.

Landauer's principle is the statement that the erasure of a bit at temperature $\beta^{-1}$ comes at a thermodynamic cost of at least $Q_L := \beta^{-1}\ln 2$ of dissipation. The energy for this dissipation must enter the system as an amount of work $W_{\to 1}$. Consequently, the work necessary for the erasure of a single bit at constant temperature obeys the inequality
\[
W_{\to 1} \geq Q_L = \beta^{-1}\ln 2. \tag{85}
\]

We can easily derive Landauer's principle from the isothermal work inequality (75). Consider an arbitrary isothermal process in the time interval $[0,\tau]$, where at the initial time $t = 0$ the bit is in states 1 and 0 with equal probability, and at the final time $t = \tau$ it is surely in state 1. If this is the only effect of the process, the difference in nonequilibrium entropies is $S^{(\tau)} - S^{(0)} = -\ln 2$. Now assume that the bit is energetically neutral, meaning the internal energy does not depend on its state ($U^{(\tau)} = U^{(0)}$), and thus

\begin{align}
W[0,\tau] &\geq F^{(\tau)} - F^{(0)} \nonumber\\
&= U^{(\tau)} - U^{(0)} - \beta^{-1}(S^{(\tau)} - S^{(0)}) \nonumber\\
&= \beta^{-1}\ln 2, \tag{86}
\end{align}

which is exactly the statement of the original formulation (85).
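The bound (85) can also be checked, and in the quasi-static limit saturated, numerically. The following sketch is an illustration rather than part of the derivation: it assumes a two-level bit whose state 1 has fixed energy zero, while the energy $E_0$ of state 0 is raised slowly from $0$ to a large value $E_{\max}$, so that the final state is 1 with near certainty. In the quasi-static limit the occupation follows the equilibrium value $p_0(E_0) = 1/(1 + e^{\beta E_0})$, and the work is $W = \int p_0\, dE_0$.

\begin{verbatim}
import numpy as np

# Quasi-static erasure of a two-level bit (illustrative sketch).
# State 1 has energy 0; the energy E0 of state 0 is ramped from 0
# to E_max.  With equilibrium occupations p0 = 1/(1 + exp(beta*E0)),
# the work W = \int p0 dE0 approaches the Landauer bound, Eq. (85).
beta = 1.0
E_max = 30.0                                   # large final bias
E0 = np.linspace(0.0, E_max, 100_000)
p0 = 1.0 / (1.0 + np.exp(beta * E0))           # starts at 1/2, ends ~0

# trapezoidal rule for W = \int p0 dE0
W = np.sum(0.5 * (p0[1:] + p0[:-1]) * np.diff(E0))

print(f"W           = {W:.6f}")                # ~0.693147
print(f"ln(2)/beta  = {np.log(2)/beta:.6f}")   # Landauer bound
\end{verbatim}

The two printed numbers agree to high precision, confirming that the quasi-static protocol saturates the bound; any finite-time protocol dissipates additional work.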

Recently, related inequalities have been obtained in various contexts [8, 15, 20, 21], relating the irreversible work to the change in the thermodynamic distance, Eq. (29). To connect to these formulations, it will be convenient to define the shorthand notation

\[
\Delta^{(\tau)}_{\star} O := O^{(\tau)}_{\nu(\tau)} - O^{\star}_{\nu(\tau)} \tag{87}
\]

for the difference between a non-equilibrium state function $O^{(\tau)}$ and its corresponding equilibrium reference value $O^{\star}$. We subtract Eq. (83) from the work inequality (74) and multiply by $\beta$ to find

\begin{align}
\beta\dot{W}^{(\tau)}_{\mathrm{irr}} &\geq \beta\,\frac{d}{d\tau}\big[\Delta^{(\tau)}_{\star}F\big] - \beta^{-1}(\Delta^{(\tau)}_{\star}S)\,\dot{\beta} \nonumber\\
&= \beta\,\frac{d}{d\tau}\big[\Delta^{(\tau)}_{\star}G\big] + \beta\,\frac{d}{d\tau}\big[\gamma\,\Delta^{(\tau)}_{\star}C\big] - \beta^{-1}(\Delta^{(\tau)}_{\star}S)\,\dot{\beta}, \tag{88}
\end{align}

where we used the symbol $\Delta^{(\tau)}_{\star}$ defined in Eq. (87) for brevity of notation. The chain rule lets us express the first term using the total time derivative of the thermodynamic distance (42):

\begin{align}
\beta\,\frac{d}{d\tau}\big[\Delta^{(\tau)}_{\star}G\big] &= \frac{d}{d\tau}\big[\Delta^{(\tau)}_{\star}\Phi\big] - (\Delta^{(\tau)}_{\star}G)\,\dot{\beta} \nonumber\\
&= \dot{D}^{(\tau)}_{\nu} - (\Delta^{(\tau)}_{\star}G)\,\dot{\beta}. \tag{89}
\end{align}

Inserting Eq. (89) into Eq. (88) and realizing that $\Delta^{(\tau)}_{\star}[G + \beta^{-1}S] = \Delta^{(\tau)}_{\star}H$, we find the general inequality

\[
\beta\dot{W}^{(\tau)}_{\mathrm{irr}} \geq \dot{D}^{(\tau)}_{\nu} + \beta\,\frac{d}{d\tau}\big[\gamma\,\Delta^{(\tau)}_{\star}C\big] - (\Delta^{(\tau)}_{\star}H)\,\dot{\beta}. \tag{90}
\]

While this result is equivalent to Eq. (64), we consider it another main result of this work. It is valid for arbitrary driven processes, as long as the dynamics is strongly relaxing. In order to get a better intuition for the terms appearing on the right-hand side of inequality (90), we consider some special cases.

5. Process-specific bounds for the irreversible work

First, consider isothermal processes in which all extensive quantities are controlled. Then there are no autonomous extensive observables $C(x)$ or conjugate intensive parameters $\gamma$. The only remaining term in the bound is the thermodynamic distance, and the bound reads
\[
\beta\dot{W}^{(\tau)}_{\mathrm{irr}} \geq \dot{D}^{(\tau)}_{\nu}. \tag{91}
\]


The right-hand side can be integrated to reproduce the "nonequilibrium Landauer principle" from Ref. [8]:
\[
\beta\,W_{\mathrm{irr}}[0,\tau] \geq D^{(\tau)}_{\nu(\tau)} - D^{(0)}_{\nu(0)}. \tag{92}
\]
It states that in a canonical isothermal scenario the (dimensionless) irreversible work is at least as large as the difference of the thermodynamic distances at the end and at the beginning of the process.

For general open isothermal systems the bound reads
\[
\beta\dot{W}^{(\tau)}_{\mathrm{irr}} \geq \dot{D}^{(\tau)}_{\nu} + \beta\,\frac{d}{d\tau}\big[\gamma\,\Delta^{(\tau)}_{\star}C\big]. \tag{93}
\]

The additional term contains the rate of change of the deviation of the instantaneous energy attributed to macroscopic degrees of freedom, $\gamma C^{(\tau)}$, from its value in the corresponding equilibrium state, $\gamma C^{\star}$. It can still be integrated to yield
\[
\beta\,W_{\mathrm{irr}}[0,\tau] \geq D^{(\tau)}_{\nu(\tau)} - D^{(0)}_{\nu(0)} + \beta\big(\gamma(\tau)\,\Delta^{(\tau)}_{\star}C - \gamma(0)\,\Delta^{(0)}_{\star}C\big).
\]
For an isothermal process where the observable macroscopic extensive quantities are at their equilibrium values at the beginning and at the end of the process, we thus recover the bound (92). Notice that this result does not depend on the value of this deviation at any point in between. Moreover, it does not require the system to be in equilibrium at the beginning or the end of the process: we only require that the observable extensive quantities are at their equilibrium values, irrespective of the value of the internal energy or the exact form of the distributions.

For non-isothermal processes, the general lower bound for the irreversible work, Eq. (90), cannot be integrated. The bound thus depends on the values of all observables and parameters during the process. In particular, unlike in the integrable case, it will crucially depend on the speed of the driving and on its total duration. Disregarding the question of whether such processes exist, we can still formulate notions of quasi-static processes.

A phenomenologically quasi-static process is the limit of a process such that at any instant during its duration we have $\Delta^{(t)}_{\star}C = 0$. The bound for such a process is formally identical to the bound for closed systems, i.e. for physical situations where there are no autonomous macroscopic extensive observables. In both situations, only the internal energy can deviate from its equilibrium value and thus $\Delta^{(t)}_{\star}H = \Delta^{(t)}_{\star}U$. The inequality for closed systems and phenomenologically quasi-static processes thus reads
\[
\beta\dot{W}^{(\tau)}_{\mathrm{irr}} \geq \dot{D}^{(\tau)}_{\nu} - (\Delta^{(\tau)}_{\star}U)\,\dot{\beta}. \tag{94}
\]

We further define a full quasi-static process as a limiting process where, in addition to $C$, the internal energy $U$ is always at its equilibrium value. In that case, we recover the isothermal canonical bound, Eq. (91). Notice that while a reversible process is by definition a full quasi-static process, the reverse is not true: a (full) quasi-static process constrains only the average values of the extensive quantities during the process, whereas a reversible process specifies the entire instantaneous distribution as $\varrho^{(t)} = \varrho^{\star}_{\nu(t)}$.

H. Discussion

In the previous two sections, we showed how thermodynamic inequalities can be derived in a self-consistent information-theoretic framework based on basic physical concepts. For a reader who has been following the recent developments in non-equilibrium thermodynamics, the validity of these results may not come as a huge surprise. In particular, some notions of work and heat in open (but non-driven) systems have been analysed in Ref. [5].

While the present work is more general and contains a full treatment of driven systems, its main purpose is to clarify the logical structure of thermodynamics. In particular, we tried to be very clear in disentangling the different elementary building blocks entering this presentation:

• The basic concept of physics and classical mechanics is energy. Equilibrium statistical mechanics relates macroscopic averages of a microscopic energy function E(x) and other (observable) extensive quantities C(x). The theory of large deviations ensures that these extensive averages capture the typical behaviour (Sections B and C).

• Probability theory formalizes plausible reasoning; information theory allows us to quantify and compare the uncertainty of thermodynamic states formalized as statistical ensembles. Applying these notions to physical systems, we obtain the MaxEnt principle (Sec. D).

• Markov processes discard information and thus formalize abstract irreversibility. Strongly relaxing dynamics connect this notion of irreversibility to the physical irreversibility of the macroscopic relaxation process (Sec. E).

• Apart from energy, which is the fundamental physical quantity, all other relevant phase-space functions are either physical parameters or observable quantities. Consequently, defining non-equilibrium state functions does not extend the physical description (Sec. F).

• The connection to traditional thermodynamics only requires the First Law, Eq. (57), which expresses energy conservation between macroscopic and microscopic forms of energy.

To the best knowledge of the author, such a presentation has so far been missing from the literature. However, the present work goes beyond merely reproducing already-known results. In particular, we presented new inequalities for arbitrary driven processes, which also shed light on the more subtle distinctions between several idealized process variants.

Before we conclude, some additional comments on the conceptual basis of thermodynamics as a meta-physical theory seem appropriate.

1. On energy, time and probability

At first glance, the generalized equilibrium distributions
\[
\varrho^{\star}(x) \propto \exp\big(-\beta(E(x) - \gamma C(x))\big) \tag{95}
\]
exhibit an apparent symmetry between the system energy $E(x)$ and the phase-space functions $C(x)$. Indeed, for any non-driven process, the conjugate pairs $(\beta, E)$ and $(-\beta\gamma_i, C_i)$ of intensive and extensive quantities are mathematically equivalent. This symmetry is then broken by the definition of work and heat via the time derivative of the expectation value $\langle E \rangle = U$. Hence, looking only at Eq. (95), a natural question comes to mind: what makes energy special?

The answer to this question must necessarily come from physics, not from mathematics. In fact, it should arise from a fundamental physical theory that captures universal physical laws on a sufficiently low level. Energy is special because energy and dynamics, and thus the notion of time, are fundamentally related.

"Fundamental" here means on a sufficiently low level, say Hamiltonian or quantum mechanics. In these low-level descriptions, energy is characterized by the Hamiltonian, which is responsible for the time evolution. More abstractly, Noether's theorem shows that energy is the conserved quantity associated with the symmetry of physical laws with respect to time translations. In the microcanonical derivation of statistical physics (cf. Section B), the importance of the Hamiltonian (and thus energy) is evident from the start: one defines the entropy, the central quantity of statistical physics, via the (logarithm of the) phase-space volume $\mathcal{W}(U)$ of all (physically distinct) microstates that are compatible with a given value $U$ of the system's energy.

In contrast, the MaxEnt approach to statistical mechanics directly yields Eq. (95) without the need to refer to a microscopic Hamiltonian. Even though the connection between energy and entropy is more obscure, it can still be recovered. Recall that uncertainty enters the microcanonical derivation in the form of equal a-priori probabilities for all microscopic states on an energy shell. Mathematically, this means that the volume of the energy shell is calculated using the natural normalized (Lebesgue or Liouville) measure. The same assumption enters the MaxEnt approach when we calculate the entropy of a probability density $\varrho(x)$: the definition of probability densities already requires a natural reference measure, which in the present case is the natural Lebesgue measure. Consequently, we calculate all integrals using the natural phase-space volume $dx$. Hence, using MaxEnt for (natural) phase-space densities is as fundamental (or as arbitrary) as starting the microcanonical derivation from the assumption of equal a-priori probabilities.

However, this seems to be just another step down the rabbit hole. One may ask for the origin of the natural measure on phase space. Couldn't we equally well use another measure? Coming back to Noether's theorem connecting energy and time, we might also ask: how does the natural measure (and thus probability) connect to the notion of time?

There is also a consistent and simple answer to that fundamental question: the concept of time is nothing else than a fair weighting of dynamically accessible microstates in an inertial frame of reference. Recall that in a Newtonian inertial frame of reference, a body remains at rest or moves with constant speed unless forces are present. Hence, for a force-free particle, the notion of time simply expresses a uniform (and thus fair or unbiased) way of splitting the phase-space states that are occupied by the trajectory of the particle.

2. On reversibility and the arrow of time

Another feature of fundamental evolution rules is their reversibility. Leaving the weak force aside, on such a microscopic level we are not able to infer the direction of time from the dynamics of the elementary constituents. Moreover, the operation that reverses the microscopic arrow of time is a measure-preserving involution on the microscopic phase space and thus conserves entropy. Notice that in Hamiltonian mechanics, the conservation of entropy is ensured by Liouville's theorem.

Relaxing irreversible dynamics are always coarse-grained descriptions of reality, formulated on a level that captures an observer's phenomenology. Like macroscopic entropy production, the thermodynamic arrow of time is an epistemic concept. In the author's opinion, these connections between space and time, probability and information might provide an interesting perspective on recent advances in cosmology [40–43]. Taking the universal discreteness of quantum mechanics (and the more fundamental CPT invariance) into account, this might even provide novel perspectives on quantum gravity. However, at this point we leave the rabbit hole in order to conclude the present work.

3. Outlook and conclusion

In this work, we presented a first-principles approach to non-equilibrium thermodynamics based on the assumption of relaxing dynamics. Such dynamics are forgetful or strictly non-anticipating, meaning that the distribution $\varrho^{(\tau)}$ of microscopic degrees of freedom gets less certain over time. More precisely, "less certain" here means "closer to the generalized equilibrium distribution $\varrho^{\star}_{\nu}$ for a physical system described by a set of thermodynamic parameters $\nu$". Acknowledging the fundamental meaning of energy and (microscopic) entropy, such distributions can similarly be obtained as the maximally non-committal distributions for a given set of phenomenological first-order constraints on extensive thermodynamic quantities.

Within our framework, we consistently identified general notions of a natural (generating) potential and the associated natural heat content, which generalizes the notion of enthalpy. The process inequalities relating work, heat and information (like the Second Law, the law of exergy degradation or Landauer's principle) were shown to be equivalent to the assumption of relaxing dynamics, which can be rationalized in an information-theoretic framework.

A careful distinction between driving and autonomous power emphasizes the importance of working with the correct (non-equilibrium) potentials if non-equilibrium inequalities are to be applied beyond the canonical, isothermal setup. We have introduced a general bound for the irreversible work and discussed it for several real and idealized processes. Further, and maybe most importantly, we emphasized the role of energy and entropy as the basic concepts of thermodynamics and discussed how they are intimately connected to dynamics and probability.

We have purposefully excluded the treatment of non-equilibrium steady states, because this requires a more detailed discussion of the notion of local equilibrium, which is beyond the scope of this paper. For the same reason, we have not included a formulation of thermodynamics on the level of individual trajectories realized by a (Markovian) stochastic process. A more detailed treatment of so-called Stochastic Thermodynamics (cf. Refs. [44–46] for an introduction) will be the subject of a future publication. We believe that applications of the present framework to so-called fluctuation relations, which arise from comparing the probabilities of forward and backward trajectories [47, 48], can bring new insights. In particular, studying the fluctuation relations for the process quantities introduced here (like driving, autonomous and mediate work, as well as the gradient entropy flow) seems promising.

In conclusion, we see this work as a contribution towards a modern epistemic view of nonequilibrium thermodynamics, i.e. a view of thermodynamics as a general physical theory of statistical inference. It relates macroscopic (observable and controllable) and microscopic (hidden and uncontrollable) degrees of freedom. This is very much in line with the atomistic notion of heat that emerged around 1900. After all, heat is an epistemic concept in the first place: the energy of a system attributed to operationally inaccessible degrees of freedom.

ACKNOWLEDGMENTS

The author thanks the editors for the invitation to contribute to this special issue. This work would not have been possible without insightful and sometimes fierce discussions with many people throughout the last years. In particular, we acknowledge exchanges with Lukas Geyrhofer, Massimiliano Esposito, David Hofmann, Alexandre Lazarescu, Matteo Polettini, Riccardo Rao, Hugo Touchette, Jürgen Vollmer and Artur Wachtel. Various ThinkCamps hosted by the MPI-DS in Göttingen and the Institute of Entropology reassured the author to pursue these more conceptual issues. The author is further indebted to his host institutions, which enabled and encouraged a freedom of scientific thought, even if these thoughts were at times slightly tangential to project descriptions formulated a priori. This research was supported by the National Research Fund Luxembourg in the frame of the project FNR/A11/02.

[1] L. Peliti and R. Rechtman, Journal of Statistical Physics 167, 1020 (2017).
[2] C. E. Shannon, AT&T Technical Journal 27, 379 (1948), many reprints available.
[3] E. T. Jaynes, Physical Review 106, 620 (1957).
[4] R. Landauer, IBM Journal of Research and Development 5, 183 (1961).
[5] I. Procaccia and R. Levine, The Journal of Chemical Physics 65, 3357 (1976).
[6] C. H. Bennett, International Journal of Theoretical Physics 21, 905 (1982).
[7] S. Toyabe, T. Sagawa, M. Ueda, E. Muneyuki, and M. Sano, Nat. Phys. 6, 988 (2010).
[8] M. Esposito and C. Van den Broeck, EPL (Europhysics Letters) 95, 40004 (2011).
[9] T. Sagawa, Thermodynamics of Information Processing in Small Systems (Springer Verlag, 2012).
[10] A. Bérut, A. Arakelyan, A. Petrosyan, S. Ciliberto, R. Dillenschneider, and E. Lutz, Nature 483, 187 (2012).
[11] J. M. Horowitz, T. Sagawa, and J. M. Parrondo, Physical Review Letters 111, 010602 (2013).
[12] A. C. Barato, D. Hartich, and U. Seifert, New J. Phys. 16, 103024 (2014).
[13] J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, Nat. Phys. 11, 131 (2015).
[14] P. Sartori and S. Pigolotti, Phys. Rev. X 5, 041039 (2015).
[15] R. Rao and M. Esposito, Phys. Rev. X 6, 041064 (2016).
[16] P. A. Schilpp and H. Jehle, American Journal of Physics 19, 252 (1951).
[17] E. T. Jaynes, American Journal of Physics 33, 391 (1965).
[18] J. Shore and R. Johnson, IEEE Transactions on Information Theory 26, 26 (1980).
[19] C. Jarzynski, Phys. Rev. Lett. 78, 2690 (1997).
[20] S. Vaikuntanathan and C. Jarzynski, EPL (Europhysics Letters) 87, 60005 (2009).
[21] K. Takara, H.-H. Hasegawa, and D. Driebe, Physics Letters A 375, 88 (2010).
[22] J. W. Gibbs, Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundations of Thermodynamics (C. Scribner's sons, New York, 1902).
[23] H. B. Callen, Thermodynamics and an Introduction to Thermostatistics (AAPT, 1998).
[24] O. Penrose, Foundations of Statistical Mechanics; a Deductive Treatment (Pergamon Press, Oxford, 1970).
[25] H. Touchette, Phys. Rep. 478, 1 (2009).
[26] R. Balian, From Microphysics to Macrophysics: Methods and Applications of Statistical Physics, Vol. 2 (Springer Science & Business Media, 2007).
[27] R. S. Ellis, Entropy, Large Deviations, and Statistical Mechanics, Vol. 271 (Springer Science & Business Media, 2012).
[28] R. Chetrite and H. Touchette, Annales Henri Poincaré 16, 2005 (2014).
[29] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, 2006).
[30] J. Schnakenberg, Rev. Mod. Phys. 48, 571 (1976).
[31] M. Baiesi and C. Maes, New Journal of Physics 15, 013004 (2013).
[32] A. Wachtel, J. Vollmer, and B. Altaner, Phys. Rev. E 92, 042132 (2015).
[33] M. Esposito and C. Van den Broeck, Phys. Rev. E 82, 011143 (2010).
[34] M. Esposito, Phys. Rev. E 85, 041125 (2012).
[35] U. Seifert, Eur. Phys. J. E 34, 26 (2011).
[36] P. A. Markowich and C. Villani, Mat. Contemp. 19, 1 (2000).
[37] M. Lozano and A. Valero, Energy 18, 939 (1993).
[38] I. Dincer and M. A. Rosen, Exergy: Energy, Environment and Sustainable Development (Newnes, 2012).
[39] T. Schmiedl and U. Seifert, J. Chem. Phys. 126, 044101 (2007).
[40] R. Bousso, Rev. Mod. Phys. 74, 825 (2002).
[41] J. Bekenstein, Sci. Am. 289, 58 (2003).
[42] S. W. Hawking, M. J. Perry, and A. Strominger, Phys. Rev. Lett. 116, 231301 (2016).
[43] S. W. Hawking, arXiv preprint arXiv:1401.5761 (2014).
[44] S. Ciliberto, S. Joubaud, and A. Petrosyan, J. Stat. Mech. 2010, P12003 (2010).
[45] U. Seifert, Rep. Prog. Phys. 75, 126001 (2012).
[46] C. Van den Broeck and M. Esposito, Physica A 418, 6 (2015), proceedings of the 13th International Summer School on Fundamental Problems in Statistical Physics.
[47] C. Maes, in Poincaré Seminar 2003: Bose-Einstein Condensation–Entropy (Birkhäuser, Basel, 2004) p. 145.
[48] U. Seifert, Phys. Rev. Lett. 95, 40602 (2005).